NOTES
ON SCHEME
[A
summary of the first four chapters of
Essentials
of Programming Languages]
(last
updated: January 6, 1999)
If you find
any ERRORS or have COMMENTS,
e-mail me
<abatko AT cs.mcgill.ca> ASAP.
Thanks to all those who have sent such mail in the past.
A statement
is a programming language construct that is evaluated only for its effect.
Ex:
assignment statements, input/output statements, control statements,
... etc.
An expression
is a construct that is evaluated to obtain a value.
Ex:
arithmetic expressions.
EXPRESSIONS
A literal
(or constant)
always returns the indicated value.
Ex:
strings, Boolean values, characters.
A variable
reference
is the value currently associated with, or bound
to, the
variable. A variable is said to denote
the value of its binding. The data that can be bound to variables
constitute the denoted
values.
Identifiers
represent variables.
Procedures
that are the values of standard bindings are called standard
procedures.
Functions
return values, while expressions and procedures do not return values.
Function
calls are expressions and procedure calls are statements but Scheme
does not have statements, thus it does not make the distinction.
Nevertheless, Scheme functions are usually called procedures, so its
function calls are usually referred to as procedure calls.
Procedure (or function) calls are sometimes referred to as applications or combinations.
The
general syntax of procedure calls is (operator
operand_1 operand_2 . . . operand_n).
The operator
and each operand
are components that are themselves expressions, called subexpressions.
The operator subexpression is evaluated to obtain a procedure, while
the operand
subexpressions are evaluated to obtain the arguments of the call before
invoking the procedure.
Arguments are also referred to as
actual parameters,
or simply parameters.
In Scheme, the order in which the operator and operand
subexpressions are evaluated is not specified. Procedures that return
procedures are call higher-order
procedures,
and expressions that return procedures are called higher-order
expressions.
Special
form identifiers
called keywords
express operations that cannot be expressed as procedure calls. Each
special form has its own sequencing
rule.
Conditional expressions are a second situation in which a special
form is required.
The
typical programming
environment
in which the Scheme language is used is the interactive
mode,
during which the read-eval-print
loop is
repeatedly executed. By making a clear distinction between a
programming language and programming environments that support it, we
treat the language itself as an abstraction allowing the same
language to be used in many different environments.
" >
" is the Scheme prompt, " ;
" is the comment marker, and " #<procedure>
" indicates that a procedure has been returned. Characters that
are visible when printed are represented as literals by preceding
them with #\.
Nonprinting characters also have representations, such as #\space
and #\newline.
Indexing
is zero based.
When
identifiers are treated as values, they are called symbols.
The only literals that are "self quoting" are numbers, booleans, strings, and characters.
Lists
A list
is an ordered sequence of elements,
which may be of arbitrary types. Lists provide a way of combining
multiple values into a single compound object. Lists are represented
by surrounding representations of its elements with a pair of parentheses.
constructors:
list, cons
cons always takes two arguments. The first may be any Scheme value, and the second must be a list. The name cons stands for construct, because cons constructs a new compound object. If the first argument is v and its second argument is the list (v0 v1 ... vn-1), then cons returns the list (v v0 v1 ... vn-1).
(Actually, cons's second argument may be anything, but if it is not a list, then the value returned will not be list either).
(cons 'a
'(c d))
=> (a c d)
Lists are
a derived
data type because they are built using primitive
data types: the cons cell and the empty list. The simplest way to
divide a list is between the first element and the rest of the list.
selectors:
car, cdr
The first element of a list is known as its car and the rest of the list is known as its cdr.
Car and cdr undo what cons does. You can take compositions of car and cdr by creating procedures such as cadr or caddar.
Take note
that it is an error to call car
or cdr
with the empty list.
You can test for an empty list by using the predicate null.
Pairs
Nonempty lists are represented as pairs. A pair is a structure with two fields, called car and cdr. The procedure cons creates a new pair with the car and cdr fields initialized to the values of its first and second arguments, respectively. In fact, these cells may be used to construct cells of arbitrary length. Note that a pair is sometimes called a dotted pair or cons cell.
Box diagrams can be used to illustrate structures of values built from pairs. Thus a list is represented by either the empty list or a chain of pairs, linked by their cdr fields, with the empty list in the cdr field of the last pair of the chain.
A cdr-linked chain of cons cells that does not end in the empty list is called an improper list, even though it is not a list at all.
We can denote such data structures in a linear format by writing (a . d) for a pair whose car is a and whose cdr is d. Dot notation may be intermixed with conventional list notation. Dot notation is required only when writing improper lists.
Pairs may be shared, since different variable bindings and pair fields may refer to the same pair. Standard printed notation does not represent sharing, but box diagrams do. Note that the sharing of literals is not specified.
Vectors
Heterogeneous means that the elements of an ADT may differ in their type.
Homogeneous means that the elements of an ADT must be the same type.
An ADT provides random access to its components if each of its elements be accessed in the same amount of time.
Lists do
not provide names for all their elements or random access to them,
while Vectors provide random access via index numbers, and may be
heterogeneous. The standard procedure vector
takes an arbitrary number of arguments, and constructs a vector whose
elements contain the argument values. Vectors are written like lists,
but with hash (#) immediately preceding the left parenthesis. Vectors
must be quoted when they appear in programs as literals.
selector: vector-ref takes a vector and a zero-based index and returns the value of the element indicated by the index.
Cells
A cell is a "one-element" data structure consisting of a one-element vector. Procedure make-cell, constructs a cell, cell-ref references a cell, and cell? determines if its argument is a cell.
Procedures
It is
possible to create a new procedure, which may not be bound to
variables. Naming can be accomplished by another expression, such as define.
The special form for creating a new procedure is lambda.
Its most common syntax is (lambda
formals body).
Here formals
is a (possibly empty) list of variables, and body
is any expression. The listed variables are said to be formal
parameters,
or bound variables,
of the procedure. Scheme automatically keeps track of types at run
time, so declarations are not required. When the procedure is called,
the formal parameters (if any) are first bound
to
(associated with) the arguments, and then the body is evaluated.
Within the body, the argument values may be obtained by variables
that correspond to the formal parameters. Lambda bindings are not
accessible outside the body of the procedure: they are said to be
local to the procedure's body.
Procedures
without names, which are not the binding of a variable, are said to
be anonymous,
and are often used as arguments. Procedures map
and andmap
are both anonymous.
A value is
said to be first
class if
it may be passed to and returned from procedures and stored in data
structures. In Scheme, all values are first class, including procedures.
Having functions of more than one argument is certainly convenient, but it is not absolutely necessary. Any procedure p of n ³ 2 arguments can always be transformed into a procedure p' of one argument that returns a procedure of n-1 arguments such that ((p' x1) x2 ... xn) = (p x1 x2 ... xn). By repeating this transformation n-1 times, we obtain a procedure p" such that (... ((p" x1) x2) ... xn) = (p x1 x2 ... xn). This transformation is known as currying.
A curried
version of a procedure takes only one argument. If an existing
procedure is to be replaced by a curried version, all calls to the
procedure must be changed.
The arity of a procedure is the number of arguments that it takes. Most lambda expressions have fixed arity, although it is possible to define new procedures of variable arity with a lambda expression of the form (lambda formal body) where formal is a single variable. When the resulting procedure is invoked, this variable is bound to a list of the argument values.
Inductively Specified Data
A data type consists of a set of values along with a set of operations on those values. Inductive specification is a powerful method of specifying a set of values.
For example, the data type list-of-numbers is the smallest set of values satisfying the two properties:
The empty
list is a list-of-numbers, and
A notation called Backus-Naur Form, or BNF has been developed to express the idea of inductive specification. BNF can be used to inductively define a number of sets at once. These sets are called syntactic categories, or sometimes nonterminals, and are customarily written with angle brackets around the name of the set: <list-of-numbers>.
Thus the BNF definition of list-of-numbers has two rules that correspond to the two properties:
<list-of-numbers> ::= ( )
<list-of-numbers> ::= (<number> . <list-of-numbers>)
Each rule begins by naming the syntactic category being defined, followed by ::= (read is). The right-hand side of each rule specifies a method for constructing members of the syntactic category in terms of other syntactic categories and terminal symbols, such as the left and right parentheses, and the period in the previous example.
BNF can be simplified by combining two rules into one, using the special symbol | (read or):
<list-of-numbers> ::= ( ) | (<number> . <list-of-numbers>)
Another simplification to BNF is the Kleene star, expressed by the notation {. . .}*. When this appears on a right-hand side of a rule, it indicates a sequence of zero or more instances of whatever appears between the braces. If there are zero instances, we get the empty list.
Using Kleene star, the definition of <list-of-numbers> in list notation is simply:
<list-of-numbers> ::= ({<number>}*)
To
indicate a sequence
of one or more instances,
we use a variant of the star notation: Kleene
plus
{. . .}+.
A syntactic derivation may be used to prove that a given data value is a member of the type. Such a derivation starts with the nonterminal corresponding to the type. At each step, indicated by an arrow =>, a nonterminal is replaced by the right-hand side of a corresponding rule, or with a known member of its syntactic class if the class was left undefined.
Following is a syntactic derivation demonstrating that (14 . ( )) is a list of numbers:
<list-of-numbers>
=> (<number> . <list-of-numbers>)
=> (<14> . <list-of-numbers>)
=> (<14> . ( ))
The term datum
refers to any literal data representation. BNF may be used to specify
concisely the syntactic category of data in Scheme. The following is
a set of BNF rules, also called a grammar:
<list> ::= ({<datum>}*)
<dotted-datum> ::= ({<datum>}+) . <datum>)
<vector> ::= #({<datum>}*)
<datum>
::= <number> | <symbol> | <boolean> |
<string> | <list> | <dotted-datum> | <vector>
BNF rules are said to be context free because a rule defining a given syntactic category may be applied in any context that makes reference to that syntactic category.
Particular
data types may have context
sensitive
constraints, but for simplicity we usually only specify a
context-free grammar using BNF.
Following
are BNF definitions of common data types:
Many symbol manipulation procedures are designed to operate on lists that contain only symbols and other similarly restricted lists:
<s-list> ::= ({<symbol-expression>}*)
<symbol-expression>
::= <symbol> | <s-list>
A binary tree with numeric leaves and interior nodes labelled with symbols:
<tree>
::= <number> | (<symbol> <tree> <tree>)
A node in a binary search tree is either empty or contains a key and two subtrees:
<bin-search-tree>
::= ( ) | (<key> <bin-search-tree> <bin-search-tree>)
Recursive procedures are defined so that
the
structure of a program reflects the structure of it data and
Two or more procedures are mutually recursive when they call one another in a recursive manner.
It may be helpful or necessary to introduce auxiliary procedures to solve a problem.
Static Properties of Variables
Those properties of a program that can be determined by analysing the text of a program are said to be static, as opposed to the dynamic properties that are determined by run-time inputs.
Static properties can be analysed by a compiler to detect errors before run time and to improve the efficiency of object code.
In Scheme the relation between a variable reference and the formal parameter to which it refers is a static property.
Lambda Calculus
In order to study variable binding in an abstract context, we introduce a new language called Lambda Calculus. This language has only variable references, lambda expression with a single formal parameter, and procedure calls:
<exp>
::= <varref>
|
(lambda (<var>) <exp>)
|
(<exp> <exp>)
A variable reference is said to be bound in an expression if it refers to a formal parameter introduced in the expression. A variable reference that is not bound to a formal parameter in the expression is said to be free.
A variable is said to occur bound in an expression if the expression contains a bound reference to the variable. A variable is said to occur free in an expression if the expression contains a free reference to the variable.
All variable references must have some associated value when they are evaluated at run time if they are bound to a formal parameter, they are said to be lexically bound. Otherwise, they must either be bound at top level by definitions or be supplied by the system. In this case, they are said to be globally bound. It is an error to reference a variable that is neither lexically nor globally bound.
The value of an expression depends only on the values associated with the variables that occur free within the expression.
The value
off an expression is independent of bindings for variables that do
not occur free in the expression.
Lambda
expressions without free variables are called combinators.
A variable x occurs free in an expression E iff
E is a
variable reference and E is the same as x; or
A variable x occurs bound in an expression E iff
E is of
the from (E1 E2)
and x occurs bound in E1
or E2;
or
No variable occurs bound in an expression consisting of just a single variable.
Scope and Lexical Address
The area
of program within which a variable declaration is effective is a region.
In Scheme the region of a formal parameter is the body of the lambda
expression, and the region off a top-level definition is the entire
program. The scope
of a variable declaration is the text within which references to the
variable refer to the declaration. The region and scope of a variable
may be the same, but the scope does not include the entire region.
Declarations have a limited scope so that the same variable name may
be sued different purposes in different parts of a program. In most
languages, including Scheme, a declaration's region and scope can be
determined statically. Such languages are said to be lexically
or statically
scoped.
Most modern languages allow regions to be nested
within each other, as when one lambda expression appears in the body
of another. Regions are sometimes called blocks,
and such languages are said to be block
structured.
The
declaration of a variable v
has a scope that includes all references to that occur
free in
the region associated with declaration. Those references to v
that occur bound
in the region associated with its declaration are shadowed by inner declarations.
Boarders of regions are called contours. A number of contours may be crossed before arriving at the associated declaration. The number of contours crossed is called the lexical (or static) depth of the variable reference.
The declarations associated with a region may be numbered in the order of their appearance in the text. Each variable reference may then be associated with two numbers: its lexical depth and the position of its declaration in the declaring contour (its declaration position).
Taken
together, these two numbers are the lexical
address
of the variable reference. Thus a variable reference v
in an expression, may be replaced by (v
: d p),
where d
is its lexical depth and p
is its declaration position, (using zero-base indexing for both d and p).
We wish to
obtain a general program transformation rule, but we encounter two
difficulties. First, the new variable name should not conflict with
any other variable names used in that part of the program. The second
difficulty occurs if an inner formal parameter declaration creates a
hole in the scope of the outer formal parameter: the references to
the inner declaration should not be changed.
The result of substituting an expression y for all free occurrences of a variable x in an expression exp is written exp[y/x], read "expression with y for x." The general variable renaming rule for lambda expressions with one formal parameter may be expressed as (lambda (var) exp) = (lambda (var') exp[var'/var]) where var' is any variable that does not occur free in exp. In the lambda calculus this is called a-conversion (alpha-conversion).
If you're going
to get anything out of this entire page, take note of these amazing
tables that segregate many forms of syntax used in Scheme.
PROCEDURES |
SYNTAX |
FUNCTION |
zero? |
tests whether its argument is zero |
|
append |
(append list_1 list_2) |
Add all elements of list_1 to the front of list_2 |
car |
Car (list) |
Extracts the first element of a list |
cdr |
Cdr (list) |
Extracts everything following the first element of a list |
length |
(length list) |
Determines the number of elements in a list |
list-ref |
(list-ref list index) |
Returns the element indicated by the index |
vector-ref |
(vector-ref vector index) |
Returns the value of the element indicated by the index |
vector-length |
(vector-length vector) |
Returns the number of elements in a vector |
vector->list |
Converts the vector compound data type into a list |
|
list->vector |
Converts a list compound data type into a vector |
|
apply |
Takes a procedure and a list and returns the result of calling the procedure with the values given in the list |
|
map |
(map procedure list) |
The procedure must take one argument and the list may be of any length. It builds a new list whose elements are obtained by calling the procedure with the elements of the original list |
andmap |
(andmap procedure list) |
The procedure must take one argument and the list may be of any length. It applies the procedure to each element of the list and returns true if all are true |
error |
(error "message to std output") |
Signals an error y printing its arguments, in this case a single string, and then aborting the computation. Error is not a standard procedure |
SPECIAL FORMS |
SYNTAX |
FUNCTION |
define |
(define variable expression) |
binds variable to the value of the expression |
if |
(if test-exp then-exp else-exp) |
if test-exp is true, then-exp is evaluated, otherwise, else-exp is evaluated |
Quoted literal |
(quote datum) or 'datum |
Symbolic literals are specially marked, otherwise they would be indistinguishable from variable references |
lambda |
(lambda formals body) |
Creates a procedure |
and |
(and test1 test2 ... test_n) |
Forms a logical expression that is true iff all its subexpressions are true |
or |
(or test1 test2 ... test_n) |
Forms a logical expression that is true if any of its subexpressions are true |
TYPE PREDICATES |
FUNCTION |
|
symbol? |
Type predicate |
Tests an arbitrary value to see if it is a symbol |
number? |
Type predicate |
Takes an arbitrary value to see if it is a number |
boolean? |
Type predicate |
Tests an arbitrary value to see if it is a boolean |
list? |
Type predicate |
Tests an arbitrary value to see if it is a list |
pair? |
Type predicate |
Tests an arbitrary value to see if it is a pair |
vector? |
Type predicate |
Tests an arbitrary value to see if it is a vector |
null? |
Predicate |
Tests its argument to see if it is the empty list |
procedure? |
Type predicate |
Tests its argument to see if it is a procedure |
eq? |
Tests if any two objects are the same object |
|
not |
Performs logical negations |
CHARACTER PREDICATES |
FUNCTION |
|
char? |
Type predicate |
Takes an arbitrary value and returns true if its argument is a character and false otherwise |
char=? |
Procedure |
Compares two characters for equality |
char<? |
Procedure |
Determines the order of characters |
char->integer |
Procedure |
Takes a character and returns an integer representation of the character |
char-alphabetic? |
Procedure |
Determines the class of a character |
char-numeric? |
Procedure |
Determines the class of a character |
char-whitespace? |
Procedure |
Determines the class of a character by returning true when its argument is a space, return or linefeed character |
STRING PREDICATES |
FUNCTION |
|
string? |
Type predicate |
Takes an arbitrary value and returns true if its argument is a string and false otherwise |
sting-length |
Procedure |
Takes a string and returns an integer indicating the number of characters in the string |
string-append |
Procedure |
Concatenates its arguments to form a new string |
string->symbol |
Procedure |
Converts a string into a symbol |
string->number |
Procedure |
Converts a string into a number |
string->list |
Procedure |
Converts a string into a list of characters |
string |
Procedure |
Converts characters into a string |
string-ref |
Procedure |
Takes a string and a nonnegative integer less than the length of the string and returns the character indexed by the integer |
Compiled by Alex Batko
<abatko AT cs.mcgill.ca> during May and June, 1998.
Please send all errors, additions, and comments to me.
Thanks to all those who have sent such mail in the past.
Summary of Essentials
of Programming Languages
(8th printing)
by Daniel
P. Friedman, Mitchell
Wand, and Christopher
T. Haynes.
Visit the EOPL
ftp site.
©1992
by The
MIT Press