From deleeuw@stat.ucla.edu Wed Mar  8 10:25:04 2000
Date: Sun, 5 Mar 2000 10:13:55 -0800
From: Jan de Leeuw 
To: stat-lisp-news@stat.umn.edu
Subject: The complete idiot's guide to special variables

          The complete idiot's guide to special variables.
                           by Erann gat
                      DRAFT -- Comments welcome.

In order to understand special variables you first have to understand
variables, and to explain variables requires some unusual terminological
precision.  So I have to begin by defining some terms.

Symbols and names

A SYMBOL is an object that is associated with (among other things) a
NAME, which is a string of characters.  A symbol object is an instance
of a class called SYMBOL.  You can make symbols by calling the function
MAKE-SYMBOL, which takes one argument, the symbol's name.  For example:
(make-symbol "FOO") returns a symbol whose name is the string "FOO".
(There is also a function, SYMBOL-NAME, which is sort of the opposite of
make-symbol: given a symbol it returns the string that is the symbol's
name.)

Different symbols can have the same name.  If you call (make-symbol
"FOO") again you will get a second, distinct symbol whose name is also
"FOO". (Different symbols can even have names that are EQ.)

Packages

Symbols can be collected together in aggregations called PACKAGES, which
have the useful property that they may not contain distinct symbols with
the same name.  To add a symbol to a package you use a function called
INTERN.  INTERN works kind of like MAKE-SYMBOL in that it takes a string
as an argument and returns a symbol whose name is that string.  But
unlike MAKE-SYMBOL, two different calls to INTERN with the same name
will return the SAME symbol, not two different symbols with the same
name.  That is:

(eq (make-symbol "FOO") (make-symbol "FOO")) --> NIL
(eq (intern "FOO") (intern "FOO")) --> T

Uninterned symbols are printed with a leading "#:" to alert you to the
fact that two of them may not be the same symbol even though they look
alike.

Bindings

A BINDING is an association between an object and some other object.
Strictly speaking, the association between a symbol and its name is a
binding, although it's not usually spoken of that way.  Symbols also
have other bindings.  For example, in addition to its name (which must
be a string) a symbol is associated with a VALUE (which can be any
object) and a FUNCTION (which must be a function object) and a PROPERTY
LIST.  That is, a symbol has a value binding, and a function binding,
and a property list binding (and a name binding).  Symbols can also have
other kinds of bindings as well, as we'll see shortly.

N.B.  One of the major differences between Common Lisp and Scheme is
that Common Lisp symbols have all these bindings simultaneously, while
Scheme symbols only have one binding, the value binding.

Now, it is important to distinguish between a BINDING, which you can
think of as a memory location associated with an object (typically a
symbol) and the CONTENTS of that binding or memory location.  (The
contents of the binding is often referred to as the "value" of the
binding, but I want to avoid that term to avoid confusion with a
symbol's value binding.  The Standard uses the phrase "that which the
binding denotes," but that much too pedantic for my tastes.)  You
can change the contents of a binding without changing the binding
itself.  That's what you do when you use SETQ.  For example: (setq x 1)
changes the contents of the value binding of the symbol whose name is
"X" to the fixnum 1.

In theory, when a symbol is created it doesn't *have* a symbol-value
slot (i.e. it doesn't have a value binding), and so is said to be
"unbound".  In practice, newly created symbols have a symbol-value slot
(same as a dynamic or special binding), but the contents of that slot
are set to a privileged  object.  If an attempt is made to
access the contents of a symbol-value slot that contains the 
object the system traps this condition and signals an "Unbound variable"
error.

If a symbol truly didn't have a value binding then it would be an error
to assign a value to the symbol.  SETQ only CHANGES an EXISTING binding,
it doesn't CREATE a binding.  In practice, SETQ will usually replace the
 object in the (actually existing) binding of a symbol.

The only really kosher way to CREATE a (global) binding for a symbol is
to use DEFVAR, but that has some unfortunate side-effects as we will see
shortly.  The people who wrote CMUCL apparently were purists, so when
you try to SETQ a symbol it gets automatically DEFVARed for you.  Most
people consider this highly annoying (for reasons that will become plain
shortly) but strictly speaking it's correct behavior.

Dynamic, Special and Lexical bindings

Symbols can have two different kinds of value bindings.  There are
DYNAMIC or SPECIAL value bindings (dynamic and special mean exactly the
same thing in this context), and LEXICAL value bindings.  The value
binding is the only binding that has this distinction, so the "value"
qualifier is often dropped from discussions and one speaks only of
dynamic (or special -- same thing) bindings and lexical bindings of
symbols.

Now we come to Very Important Fact #1:

   The (current) dynamic/special binding of a symbol is the same
   thing as its symbol-value slot.

Install that fact indellibly into your mind before proceeding.

Now consider a simple function call in Lisp:

(defun foo (x) (+ x x))

The result of calling, say, (FOO 3) is the result of evaluating the
body of FOO with the value of the symbol X bound to the number 3.  There
are three ways we can make this happen.

Method 1:  Set X to 3 and evaluate the body of the lambda expression.

But this is obviously wrong because is destroys the previous value of x.
If we do:

(setq x 1)
(+ (foo 3) x)

we would expect to get back 7, but using method 1 we will get 9
instead because calling the function changes (the contents of the
dynamic binding of) X.

(N.B.  Despite method 1 being obviously wrong it was nonetheless used
by early versions of languages like BASIC and FORTRAN.)

We can fix the problems with method 1 by using METHOD 2:

1.  Allocate some memory.
2.  Store the current value of X in this newly allocated memory.
3.  Change the value of X to 3 and evaluate the body of the lambda experession.
4.  Before returning, restore the value of X saved in step 2.

This is the method used in the original Lisp invented by John
McCarthy.  The problem with method 2 is that it doesn't quite do the
Right Thing when you try to write a function that returns another
function, e.g.:

(defun make-adder (x)
   (lambda (y) (+ x y)))

We would like (make-adder 5) to return a function that takes one
argument and adds 5 to it.  But if we use method 2 then when make-adder
returns the value of X will be restored to whatever value it had when
make-adder was called.  The 5 that we passed in is gone forever.
This is the infamous FUNARG problem, and it led to the invention of...

METHOD 3:

1.  Allocate some memory like before.
2.  Store the value of the argument INTO THIS NEWLY ALLOCATED MEMORY
3.  Change the BINDING of X (not its value) to associate X with this
     newly allocated storage
4.  Evaluate the body.
5.  Before returning, change the BINDING of X to associate X with its
     previous binding (and thus its previous value)

Now MAKE-ADDER will work as expected because when it is called
it allocates new storage for X, stores the argument into that new
storage, and arranges for X to refer to that new storage.  When MAKE-ADDER
returns, that new binding continues to exist (through the magic of lexical
closures), and the symbol X inside the body of the lambda continues to refer
to it.

Method 3 is called LEXICAL BINDING because the places in the program
where the symbol X refers to the newly created binding (this is called
the binding's SCOPE) is determined by the lexical structure of the program.
All occurrences of the symbol X that are lexically within the body of
a form that binds X refer to the new binding.  All FREE occurrences of
the symbol X (those that are not within any form that creates a lexical
binding for X) refer to the dynamic binding of X.

Scheme uses lexical binding exclusively.  Common Lisp has both lexical
and dynamic binding.  Lexical binding is the default.  There are two
ways to get dynamic binding. The first is to declare a symbol SPECIAL, e.g.

(defun foo (x)
   (declare (special x)) ; Use dynamic binding for X
   ...)

The second is to use DEFVAR.  When you DEFVAR a symbol, ALL occurrences of
that symbol ANYWHERE in the program use dynamic binding.  There is no way
to undo the pervasive effect of a DEFVAR, so to avoid running into
trouble there is a universal convention is to use DEFVAR only on symbols
whose names begin and end with an asterisk, e.g. *X*.

Variables

Now, at long last, we can start to talk about variables.

The Common Lisp Standard defines a variable as "a binding in the variable
namespace."  What this really means is that a variable IS a symbol's
value binding (the binding itself, not the contents of the binding --
that's the value of the variable).  This can be either a dynamic binding
(i.e. an association between the symbol and its symbol-value slot) or a
lexical binding (i.e. an association between the symbol and a newly
allocated storage location).  If the binding is dynamic then we say
that the variable is dynamic or special, and if it's lexical then the
variable is lexical.

Note that a symbol that has a lexical binding doesn't lose its dynamic
binding.  It's still there, and you can access it with the SYMBOL-VALUE
function, e.g.:

(setq x 1)
(funcall (lambda (x) (list x (symbol-value 'x))) 2) --> (1 2)

The 'default' value binding for a symbol is its lexical binding, if it
has one.

Note also that when the standard says, e.g. that LET "creates new variable
bindings" it's only telling three-quarters of the truth.  It creates new
LEXICAL bindings, but saying that it creates a new DYNAMIC binding is
very misleading.  When you write (let ( (x 1) ) (declare (special x))
you are allocating new storage (for the old contents of the symbol-value
slot), but the contents of this storage is INACCESSIBLE except for
one thing: it is used to restore the old CONTENTS of the SYMBOL-VALUE slot
upon exiting the (dynamic) scope of the binding.

So you can never "rebind" a dynamic variable.  A reference to a dynamic
variable ALWAYS refers to the CURRENT dynamic binding which is ALWAYS the
symbol-value slot of the symbol.  When you "rebind" a dynamic variable
what you are really doing is creating a THIRD TYPE of binding that is
CALLED a dynamic binding, but is not the same thing as the CURRENT dynamic
binding.  It's just temporary storage to hold the old value of the
symbol-value slot.

In other words:

(lambda (x)
   (declare (special x))
   ...)

can be implemented:

(lambda (#1=:#temp)
   (unwind-protect
     (progn
       (rotatef (symbol-value 'x) #1#) ; Store the old value and swap in the new
       (symbol-macrolet ( (x (symbol-value 'x)) )
         ...))
     (rotatef (symbol-value 'x) #1#) ; Restore the old value
)))

The "new" dynamic binding is the (lexical!) binding of the uninterned
symbol #:temp.  Clearly this is a completely different beast than a
symbol-value slot.

Editorial comments

You can really screw yourself (and others) by DEFVARing the wrong thing.
There is no way to undo a DEFVAR, and its effects are pervasive.  This
is the reason that there is the universal convention to only use DEFVAR
on symbols whose names start and end with asterisks.  This is the only
way to insure that you are getting a lexical binding.

IMO, any language feature that relies on programmers adhering to a
lexical convention in order to insure that the semantics of their program
are correct is a bad feature.  In my book, therefore, DEFVAR is EVIL
and should not be used at all.  (The same goes for DEFINE-SYMBOL-MACRO.)
You should ALWAYS use (declare (special ...)) anywhere you want a dynamic
binding.  IMO.

But there is a deeper philosphical issue.  The semantics of (lambda (x) ...)
are *radically* different from those of (lambda (x) (declare (special x)) ...)
as shown above.  This is the reason that the Standard specifically calls
out SPECIAL as one of the few declarations that implementations may not
ignore.  Whenever you start writing exceptions to a rule that's a good
indication that you've gotten something wrong.  Having one kind of
declaration among many that changes the semantics of the program (in a
really big way) is an inelegance of the first order and ought to be
changed.

IMO.
-- 
===
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
US mail: 8142 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550;  fax (310)-206-5658;  email: deleeuw@stat.ucla.edu
    http://www.stat.ucla.edu/~deleeuw and http://home1.gte.net/datamine/
============================================================================
          No matter where you go, there you are. --- Buckaroo Banzai
============================================================================