NOTES ON SCHEME
[A summary of the first four chapters of
Essentials of Programming Languages]

(last updated: January 6, 1999)

If you find any ERRORS or have COMMENTS,
e-mail me <abatko AT cs.mcgill.ca> ASAP.
Thanks to all those who have sent such mail in the past.


A statement is a programming language construct that is evaluated only for its effect.
Ex: assignment statements, input/output statements, control statements, ... etc.

An expression is a construct that is evaluated to obtain a value.
Ex: arithmetic expressions.


EXPRESSIONS

A literal (or constant) always returns the indicated value.
Ex: strings, Boolean values, characters.

A variable reference is the value currently associated with, or bound to, the variable. A variable is said to denote the value of its binding. The data that can be bound to variables constitute the denoted values.

Identifiers represent variables.

Procedures that are the values of standard bindings are called standard procedures.

Functions return values, while expressions and procedures do not return values.

Function calls are expressions and procedure calls are statements but Scheme does not have statements, thus it does not make the distinction. Nevertheless, Scheme functions are usually called procedures, so its function calls are usually referred to as procedure calls.

Procedure (or function) calls are sometimes referred to as applications or combinations.

The general syntax of procedure calls is (operator operand_1 operand_2 . . . operand_n). The operator and each operand are components that are themselves expressions, called subexpressions. The operator subexpression is evaluated to obtain a procedure, while the operand subexpressions are evaluated to obtain the arguments of the call before invoking the procedure. Arguments are also referred to as actual parameters, or simply parameters. In Scheme, the order in which the operator and operand subexpressions are evaluated is not specified. Procedures that return procedures are call higher-order procedures, and expressions that return procedures are called higher-order expressions.

Special form identifiers called keywords express operations that cannot be expressed as procedure calls. Each special form has its own sequencing rule. Conditional expressions are a second situation in which a special form is required.

The typical programming environment in which the Scheme language is used is the interactive mode, during which the read-eval-print loop is repeatedly executed. By making a clear distinction between a programming language and programming environments that support it, we treat the language itself as an abstraction allowing the same language to be used in many different environments.

" > " is the Scheme prompt, " ; " is the comment marker, and " #<procedure> " indicates that a procedure has been returned. Characters that are visible when printed are represented as literals by preceding them with #\. Nonprinting characters also have representations, such as #\space and #\newline.

Indexing is zero based.

When identifiers are treated as values, they are called symbols.

The only literals that are "self quoting" are numbers, booleans, strings, and characters.


Lists

A list is an ordered sequence of elements, which may be of arbitrary types. Lists provide a way of combining multiple values into a single compound object. Lists are represented by surrounding representations of its elements with a pair of parentheses.

constructors: list, cons

cons always takes two arguments. The first may be any Scheme value, and the second must be a list. The name cons stands for construct, because cons constructs a new compound object. If the first argument is v and its second argument is the list (v0 v1 ... vn-1), then cons returns the list (v v0 v1 ... vn-1).

(Actually, cons's second argument may be anything, but if it is not a list, then the value returned will not be list either).

(cons 'a '(c d))
=> (a c d)

Lists are a derived data type because they are built using primitive data types: the cons cell and the empty list. The simplest way to divide a list is between the first element and the rest of the list.

selectors: car, cdr

The first element of a list is known as its car and the rest of the list is known as its cdr.

Car and cdr undo what cons does. You can take compositions of car and cdr by creating procedures such as cadr or caddar.

Take note that it is an error to call car or cdr with the empty list.

You can test for an empty list by using the predicate null.


Pairs

Nonempty lists are represented as pairs. A pair is a structure with two fields, called car and cdr. The procedure cons creates a new pair with the car and cdr fields initialized to the values of its first and second arguments, respectively. In fact, these cells may be used to construct cells of arbitrary length. Note that a pair is sometimes called a dotted pair or cons cell.

Box diagrams can be used to illustrate structures of values built from pairs. Thus a list is represented by either the empty list or a chain of pairs, linked by their cdr fields, with the empty list in the cdr field of the last pair of the chain.

A cdr-linked chain of cons cells that does not end in the empty list is called an improper list, even though it is not a list at all.

We can denote such data structures in a linear format by writing (a . d) for a pair whose car is a and whose cdr is d. Dot notation may be intermixed with conventional list notation. Dot notation is required only when writing improper lists.

Pairs may be shared, since different variable bindings and pair fields may refer to the same pair. Standard printed notation does not represent sharing, but box diagrams do. Note that the sharing of literals is not specified.


Vectors

Heterogeneous means that the elements of an ADT may differ in their type.

Homogeneous means that the elements of an ADT must be the same type.

An ADT provides random access to its components if each of its elements be accessed in the same amount of time.

Lists do not provide names for all their elements or random access to them, while Vectors provide random access via index numbers, and may be heterogeneous. The standard procedure vector takes an arbitrary number of arguments, and constructs a vector whose elements contain the argument values. Vectors are written like lists, but with hash (#) immediately preceding the left parenthesis. Vectors must be quoted when they appear in programs as literals.

selector: vector-ref takes a vector and a zero-based index and returns the value of the element indicated by the index.


Cells

A cell is a "one-element" data structure consisting of a one-element vector. Procedure make-cell, constructs a cell, cell-ref references a cell, and cell? determines if its argument is a cell.


Procedures

It is possible to create a new procedure, which may not be bound to variables. Naming can be accomplished by another expression, such as define. The special form for creating a new procedure is lambda. Its most common syntax is (lambda formals body). Here formals is a (possibly empty) list of variables, and body is any expression. The listed variables are said to be formal parameters, or bound variables, of the procedure. Scheme automatically keeps track of types at run time, so declarations are not required. When the procedure is called, the formal parameters (if any) are first bound to (associated with) the arguments, and then the body is evaluated. Within the body, the argument values may be obtained by variables that correspond to the formal parameters. Lambda bindings are not accessible outside the body of the procedure: they are said to be local to the procedure's body.

Procedures without names, which are not the binding of a variable, are said to be anonymous, and are often used as arguments. Procedures map and andmap are both anonymous.

A value is said to be first class if it may be passed to and returned from procedures and stored in data structures. In Scheme, all values are first class, including procedures.

Having functions of more than one argument is certainly convenient, but it is not absolutely necessary. Any procedure p of n ³ 2 arguments can always be transformed into a procedure p' of one argument that returns a procedure of n-1 arguments such that ((p' x1) x2 ... xn) = (p x1 x2 ... xn). By repeating this transformation n-1 times, we obtain a procedure p" such that (... ((p" x1) x2) ... xn) = (p x1 x2 ... xn). This transformation is known as currying.

A curried version of a procedure takes only one argument. If an existing procedure is to be replaced by a curried version, all calls to the procedure must be changed.

The arity of a procedure is the number of arguments that it takes. Most lambda expressions have fixed arity, although it is possible to define new procedures of variable arity with a lambda expression of the form (lambda formal body) where formal is a single variable. When the resulting procedure is invoked, this variable is bound to a list of the argument values.


Inductively Specified Data

A data type consists of a set of values along with a set of operations on those values. Inductive specification is a powerful method of specifying a set of values.

For example, the data type list-of-numbers is the smallest set of values satisfying the two properties:

  1. The empty list is a list-of-numbers, and

  2. If l is a list-of-numbers and n is a number, then the pair (n . l) is a list-of-numbers.

A notation called Backus-Naur Form, or BNF has been developed to express the idea of inductive specification. BNF can be used to inductively define a number of sets at once. These sets are called syntactic categories, or sometimes nonterminals, and are customarily written with angle brackets around the name of the set: <list-of-numbers>.

Thus the BNF definition of list-of-numbers has two rules that correspond to the two properties:

<list-of-numbers> ::= ( )

<list-of-numbers> ::= (<number> . <list-of-numbers>)

Each rule begins by naming the syntactic category being defined, followed by ::= (read is). The right-hand side of each rule specifies a method for constructing members of the syntactic category in terms of other syntactic categories and terminal symbols, such as the left and right parentheses, and the period in the previous example.

BNF can be simplified by combining two rules into one, using the special symbol | (read or):

<list-of-numbers> ::= ( ) | (<number> . <list-of-numbers>)

Another simplification to BNF is the Kleene star, expressed by the notation {. . .}*. When this appears on a right-hand side of a rule, it indicates a sequence of zero or more instances of whatever appears between the braces. If there are zero instances, we get the empty list.

Using Kleene star, the definition of <list-of-numbers> in list notation is simply:

<list-of-numbers> ::= ({<number>}*)

To indicate a sequence of one or more instances, we use a variant of the star notation: Kleene plus {. . .}+.

A syntactic derivation may be used to prove that a given data value is a member of the type. Such a derivation starts with the nonterminal corresponding to the type. At each step, indicated by an arrow =>, a nonterminal is replaced by the right-hand side of a corresponding rule, or with a known member of its syntactic class if the class was left undefined.

Following is a syntactic derivation demonstrating that (14 . ( )) is a list of numbers:

<list-of-numbers>
  => (<number> . <list-of-numbers>)
  => (<14> . <list-of-numbers>)
  => (<14> . ( ))

 
The term datum refers to any literal data representation. BNF may be used to specify concisely the syntactic category of data in Scheme. The following is a set of BNF rules, also called a grammar:

<list> ::= ({<datum>}*)

<dotted-datum> ::= ({<datum>}+) . <datum>)

<vector> ::= #({<datum>}*)

<datum> ::= <number> | <symbol> | <boolean> | <string> | <list> | <dotted-datum> | <vector>

BNF rules are said to be context free because a rule defining a given syntactic category may be applied in any context that makes reference to that syntactic category.

Particular data types may have context sensitive constraints, but for simplicity we usually only specify a context-free grammar using BNF.

Following are BNF definitions of common data types:

Many symbol manipulation procedures are designed to operate on lists that contain only symbols and other similarly restricted lists:

<s-list> ::= ({<symbol-expression>}*)

<symbol-expression> ::= <symbol> | <s-list>

A binary tree with numeric leaves and interior nodes labelled with symbols:

<tree> ::= <number> | (<symbol> <tree> <tree>)

A node in a binary search tree is either empty or contains a key and two subtrees:

<bin-search-tree> ::= ( ) | (<key> <bin-search-tree> <bin-search-tree>)

Recursive procedures are defined so that

  1. the structure of a program reflects the structure of it data and

  2. recursive calls are employed at points where recursion is used in the data type's inductive definition.

Two or more procedures are mutually recursive when they call one another in a recursive manner.

It may be helpful or necessary to introduce auxiliary procedures to solve a problem.


Static Properties of Variables

Those properties of a program that can be determined by analysing the text of a program are said to be static, as opposed to the dynamic properties that are determined by run-time inputs.

Static properties can be analysed by a compiler to detect errors before run time and to improve the efficiency of object code.

In Scheme the relation between a variable reference and the formal parameter to which it refers is a static property.


Lambda Calculus

In order to study variable binding in an abstract context, we introduce a new language called Lambda Calculus. This language has only variable references, lambda expression with a single formal parameter, and procedure calls:

<exp> ::= <varref>
  | (lambda (<var>) <exp>)
  | (<exp> <exp>)

A variable reference is said to be bound in an expression if it refers to a formal parameter introduced in the expression. A variable reference that is not bound to a formal parameter in the expression is said to be free.

A variable is said to occur bound in an expression if the expression contains a bound reference to the variable. A variable is said to occur free in an expression if the expression contains a free reference to the variable.

All variable references must have some associated value when they are evaluated at run time if they are bound to a formal parameter, they are said to be lexically bound. Otherwise, they must either be bound at top level by definitions or be supplied by the system. In this case, they are said to be globally bound. It is an error to reference a variable that is neither lexically nor globally bound.

The value of an expression depends only on the values associated with the variables that occur free within the expression.

The value off an expression is independent of bindings for variables that do not occur free in the expression.

Lambda expressions without free variables are called combinators.

A variable x occurs free in an expression E iff

  1. E is a variable reference and E is the same as x; or

  2. E is of the form (E1 E2) and x occurs free in E1 or E2; or
  3. E is of the form (lambda (y) E'), where y is different from x and x occurs free in E'.

A variable x occurs bound in an expression E iff

  1. E is of the from (E1 E2) and x occurs bound in E1 or E2; or

  2. E is off the from (lambda (y) E'), where x occurs bound in E' or x and y are the same variable and y occurs free in E'.

No variable occurs bound in an expression consisting of just a single variable.


Scope and Lexical Address

The area of program within which a variable declaration is effective is a region. In Scheme the region of a formal parameter is the body of the lambda expression, and the region off a top-level definition is the entire program. The scope of a variable declaration is the text within which references to the variable refer to the declaration. The region and scope of a variable may be the same, but the scope does not include the entire region. Declarations have a limited scope so that the same variable name may be sued different purposes in different parts of a program. In most languages, including Scheme, a declaration's region and scope can be determined statically. Such languages are said to be lexically or statically scoped. Most modern languages allow regions to be nested within each other, as when one lambda expression appears in the body of another. Regions are sometimes called blocks, and such languages are said to be block structured.

The declaration of a variable v has a scope that includes all references to that occur free in the region associated with declaration. Those references to v that occur bound in the region associated with its declaration are shadowed by inner declarations.

Boarders of regions are called contours. A number of contours may be crossed before arriving at the associated declaration. The number of contours crossed is called the lexical (or static) depth of the variable reference.

The declarations associated with a region may be numbered in the order of their appearance in the text. Each variable reference may then be associated with two numbers: its lexical depth and the position of its declaration in the declaring contour (its declaration position).

Taken together, these two numbers are the lexical address of the variable reference. Thus a variable reference v in an expression, may be replaced by (v : d p), where d is its lexical depth and p is its declaration position, (using zero-base indexing for both d and p).

We wish to obtain a general program transformation rule, but we encounter two difficulties. First, the new variable name should not conflict with any other variable names used in that part of the program. The second difficulty occurs if an inner formal parameter declaration creates a hole in the scope of the outer formal parameter: the references to the inner declaration should not be changed.

The result of substituting an expression y for all free occurrences of a variable x in an expression exp is written exp[y/x], read "expression with y for x." The general variable renaming rule for lambda expressions with one formal parameter may be expressed as (lambda (var) exp) = (lambda (var') exp[var'/var]) where var' is any variable that does not occur free in exp. In the lambda calculus this is called a-conversion (alpha-conversion).


If you're going to get anything out of this entire page, take note of these amazing tables that segregate many forms of syntax used in Scheme.


PROCEDURES

SYNTAX

FUNCTION

zero?

tests whether its argument is zero

append

(append list_1 list_2)

Add all elements of list_1 to the front of list_2

car

Car (list)

Extracts the first element of a list

cdr

Cdr (list)

Extracts everything following the first element of a list

length

(length list)

Determines the number of elements in a list

list-ref

(list-ref list index)

Returns the element indicated by the index

vector-ref

(vector-ref vector index)

Returns the value of the element indicated by the index

vector-length

(vector-length vector)

Returns the number of elements in a vector

vector->list

Converts the vector compound data type into a list

list->vector

Converts a list compound data type into a vector

apply

Takes a procedure and a list and returns the result of calling the procedure with the values given in the list

map

(map procedure list)

The procedure must take one argument and the list may be of any length.

It builds a new list whose elements are obtained by calling the procedure with the elements of the original list

andmap

(andmap procedure list)

The procedure must take one argument and the list may be of any length.

It applies the procedure to each element of the list and returns true if all are true

error

(error "message to std output")

Signals an error y printing its arguments, in this case a single string, and then aborting the computation. Error is not a standard procedure

 

SPECIAL FORMS

SYNTAX

FUNCTION

define

(define variable expression)

binds variable to the value of the expression

if

(if test-exp then-exp else-exp)

if test-exp is true, then-exp is evaluated, otherwise, else-exp is evaluated

Quoted literal

(quote datum) or 'datum

Symbolic literals are specially marked, otherwise they would be indistinguishable from variable references

lambda

(lambda formals body)

Creates a procedure

and

(and test1 test2 ... test_n)

Forms a logical expression that is true iff all its subexpressions are true

or

(or test1 test2 ... test_n)

Forms a logical expression that is true if any of its subexpressions are true

 

TYPE PREDICATES

FUNCTION

symbol?

Type predicate

Tests an arbitrary value to see if it is a symbol

number?

Type predicate

Takes an arbitrary value to see if it is a number

boolean?

Type predicate

Tests an arbitrary value to see if it is a boolean

list?

Type predicate

Tests an arbitrary value to see if it is a list

pair?

Type predicate

Tests an arbitrary value to see if it is a pair

vector?

Type predicate

Tests an arbitrary value to see if it is a vector

null?

Predicate

Tests its argument to see if it is the empty list

procedure?

Type predicate

Tests its argument to see if it is a procedure

eq?

Tests if any two objects are the same object

not

Performs logical negations

 

CHARACTER PREDICATES

FUNCTION

char?

Type predicate

Takes an arbitrary value and returns true if its argument is a character and false otherwise

char=?

Procedure

Compares two characters for equality

char<?

Procedure

Determines the order of characters

char->integer

Procedure

Takes a character and returns an integer representation of the character

char-alphabetic?

Procedure

Determines the class of a character

char-numeric?

Procedure

Determines the class of a character

char-whitespace?

Procedure

Determines the class of a character by returning true when its argument is a space, return or linefeed character

 

STRING PREDICATES

FUNCTION

string?

Type predicate

Takes an arbitrary value and returns true if its argument is a string and false otherwise

sting-length

Procedure

Takes a string and returns an integer indicating the number of characters in the string

string-append

Procedure

Concatenates its arguments to form a new string

string->symbol

Procedure

Converts a string into a symbol

string->number

Procedure

Converts a string into a number

string->list

Procedure

Converts a string into a list of characters

string

Procedure

Converts characters into a string

string-ref

Procedure

Takes a string and a nonnegative integer less than the length of the string and returns the character indexed by the integer


Compiled by Alex Batko <abatko AT cs.mcgill.ca> during May and June, 1998.
Please send all errors, additions, and comments to me.
Thanks to all those who have sent such mail in the past.

Summary of Essentials of Programming Languages (8th printing)
by Daniel P. Friedman, Mitchell Wand, and Christopher T. Haynes.
Visit the EOPL ftp site.
©1992 by The MIT Press