From deleeuw@stat.ucla.edu Wed Mar 8 10:25:04 2000 Date: Sun, 5 Mar 2000 10:13:55 -0800 From: Jan de LeeuwTo: stat-lisp-news@stat.umn.edu Subject: The complete idiot's guide to special variables The complete idiot's guide to special variables. by Erann gat DRAFT -- Comments welcome. In order to understand special variables you first have to understand variables, and to explain variables requires some unusual terminological precision. So I have to begin by defining some terms. Symbols and names A SYMBOL is an object that is associated with (among other things) a NAME, which is a string of characters. A symbol object is an instance of a class called SYMBOL. You can make symbols by calling the function MAKE-SYMBOL, which takes one argument, the symbol's name. For example: (make-symbol "FOO") returns a symbol whose name is the string "FOO". (There is also a function, SYMBOL-NAME, which is sort of the opposite of make-symbol: given a symbol it returns the string that is the symbol's name.) Different symbols can have the same name. If you call (make-symbol "FOO") again you will get a second, distinct symbol whose name is also "FOO". (Different symbols can even have names that are EQ.) Packages Symbols can be collected together in aggregations called PACKAGES, which have the useful property that they may not contain distinct symbols with the same name. To add a symbol to a package you use a function called INTERN. INTERN works kind of like MAKE-SYMBOL in that it takes a string as an argument and returns a symbol whose name is that string. But unlike MAKE-SYMBOL, two different calls to INTERN with the same name will return the SAME symbol, not two different symbols with the same name. That is: (eq (make-symbol "FOO") (make-symbol "FOO")) --> NIL (eq (intern "FOO") (intern "FOO")) --> T Uninterned symbols are printed with a leading "#:" to alert you to the fact that two of them may not be the same symbol even though they look alike. Bindings A BINDING is an association between an object and some other object. Strictly speaking, the association between a symbol and its name is a binding, although it's not usually spoken of that way. Symbols also have other bindings. For example, in addition to its name (which must be a string) a symbol is associated with a VALUE (which can be any object) and a FUNCTION (which must be a function object) and a PROPERTY LIST. That is, a symbol has a value binding, and a function binding, and a property list binding (and a name binding). Symbols can also have other kinds of bindings as well, as we'll see shortly. N.B. One of the major differences between Common Lisp and Scheme is that Common Lisp symbols have all these bindings simultaneously, while Scheme symbols only have one binding, the value binding. Now, it is important to distinguish between a BINDING, which you can think of as a memory location associated with an object (typically a symbol) and the CONTENTS of that binding or memory location. (The contents of the binding is often referred to as the "value" of the binding, but I want to avoid that term to avoid confusion with a symbol's value binding. The Standard uses the phrase "that which the binding denotes," but that much too pedantic for my tastes.) You can change the contents of a binding without changing the binding itself. That's what you do when you use SETQ. For example: (setq x 1) changes the contents of the value binding of the symbol whose name is "X" to the fixnum 1. In theory, when a symbol is created it doesn't *have* a symbol-value slot (i.e. it doesn't have a value binding), and so is said to be "unbound". In practice, newly created symbols have a symbol-value slot (same as a dynamic or special binding), but the contents of that slot are set to a privileged object. If an attempt is made to access the contents of a symbol-value slot that contains the object the system traps this condition and signals an "Unbound variable" error. If a symbol truly didn't have a value binding then it would be an error to assign a value to the symbol. SETQ only CHANGES an EXISTING binding, it doesn't CREATE a binding. In practice, SETQ will usually replace the object in the (actually existing) binding of a symbol. The only really kosher way to CREATE a (global) binding for a symbol is to use DEFVAR, but that has some unfortunate side-effects as we will see shortly. The people who wrote CMUCL apparently were purists, so when you try to SETQ a symbol it gets automatically DEFVARed for you. Most people consider this highly annoying (for reasons that will become plain shortly) but strictly speaking it's correct behavior. Dynamic, Special and Lexical bindings Symbols can have two different kinds of value bindings. There are DYNAMIC or SPECIAL value bindings (dynamic and special mean exactly the same thing in this context), and LEXICAL value bindings. The value binding is the only binding that has this distinction, so the "value" qualifier is often dropped from discussions and one speaks only of dynamic (or special -- same thing) bindings and lexical bindings of symbols. Now we come to Very Important Fact #1: The (current) dynamic/special binding of a symbol is the same thing as its symbol-value slot. Install that fact indellibly into your mind before proceeding. Now consider a simple function call in Lisp: (defun foo (x) (+ x x)) The result of calling, say, (FOO 3) is the result of evaluating the body of FOO with the value of the symbol X bound to the number 3. There are three ways we can make this happen. Method 1: Set X to 3 and evaluate the body of the lambda expression. But this is obviously wrong because is destroys the previous value of x. If we do: (setq x 1) (+ (foo 3) x) we would expect to get back 7, but using method 1 we will get 9 instead because calling the function changes (the contents of the dynamic binding of) X. (N.B. Despite method 1 being obviously wrong it was nonetheless used by early versions of languages like BASIC and FORTRAN.) We can fix the problems with method 1 by using METHOD 2: 1. Allocate some memory. 2. Store the current value of X in this newly allocated memory. 3. Change the value of X to 3 and evaluate the body of the lambda experession. 4. Before returning, restore the value of X saved in step 2. This is the method used in the original Lisp invented by John McCarthy. The problem with method 2 is that it doesn't quite do the Right Thing when you try to write a function that returns another function, e.g.: (defun make-adder (x) (lambda (y) (+ x y))) We would like (make-adder 5) to return a function that takes one argument and adds 5 to it. But if we use method 2 then when make-adder returns the value of X will be restored to whatever value it had when make-adder was called. The 5 that we passed in is gone forever. This is the infamous FUNARG problem, and it led to the invention of... METHOD 3: 1. Allocate some memory like before. 2. Store the value of the argument INTO THIS NEWLY ALLOCATED MEMORY 3. Change the BINDING of X (not its value) to associate X with this newly allocated storage 4. Evaluate the body. 5. Before returning, change the BINDING of X to associate X with its previous binding (and thus its previous value) Now MAKE-ADDER will work as expected because when it is called it allocates new storage for X, stores the argument into that new storage, and arranges for X to refer to that new storage. When MAKE-ADDER returns, that new binding continues to exist (through the magic of lexical closures), and the symbol X inside the body of the lambda continues to refer to it. Method 3 is called LEXICAL BINDING because the places in the program where the symbol X refers to the newly created binding (this is called the binding's SCOPE) is determined by the lexical structure of the program. All occurrences of the symbol X that are lexically within the body of a form that binds X refer to the new binding. All FREE occurrences of the symbol X (those that are not within any form that creates a lexical binding for X) refer to the dynamic binding of X. Scheme uses lexical binding exclusively. Common Lisp has both lexical and dynamic binding. Lexical binding is the default. There are two ways to get dynamic binding. The first is to declare a symbol SPECIAL, e.g. (defun foo (x) (declare (special x)) ; Use dynamic binding for X ...) The second is to use DEFVAR. When you DEFVAR a symbol, ALL occurrences of that symbol ANYWHERE in the program use dynamic binding. There is no way to undo the pervasive effect of a DEFVAR, so to avoid running into trouble there is a universal convention is to use DEFVAR only on symbols whose names begin and end with an asterisk, e.g. *X*. Variables Now, at long last, we can start to talk about variables. The Common Lisp Standard defines a variable as "a binding in the variable namespace." What this really means is that a variable IS a symbol's value binding (the binding itself, not the contents of the binding -- that's the value of the variable). This can be either a dynamic binding (i.e. an association between the symbol and its symbol-value slot) or a lexical binding (i.e. an association between the symbol and a newly allocated storage location). If the binding is dynamic then we say that the variable is dynamic or special, and if it's lexical then the variable is lexical. Note that a symbol that has a lexical binding doesn't lose its dynamic binding. It's still there, and you can access it with the SYMBOL-VALUE function, e.g.: (setq x 1) (funcall (lambda (x) (list x (symbol-value 'x))) 2) --> (1 2) The 'default' value binding for a symbol is its lexical binding, if it has one. Note also that when the standard says, e.g. that LET "creates new variable bindings" it's only telling three-quarters of the truth. It creates new LEXICAL bindings, but saying that it creates a new DYNAMIC binding is very misleading. When you write (let ( (x 1) ) (declare (special x)) you are allocating new storage (for the old contents of the symbol-value slot), but the contents of this storage is INACCESSIBLE except for one thing: it is used to restore the old CONTENTS of the SYMBOL-VALUE slot upon exiting the (dynamic) scope of the binding. So you can never "rebind" a dynamic variable. A reference to a dynamic variable ALWAYS refers to the CURRENT dynamic binding which is ALWAYS the symbol-value slot of the symbol. When you "rebind" a dynamic variable what you are really doing is creating a THIRD TYPE of binding that is CALLED a dynamic binding, but is not the same thing as the CURRENT dynamic binding. It's just temporary storage to hold the old value of the symbol-value slot. In other words: (lambda (x) (declare (special x)) ...) can be implemented: (lambda (#1=:#temp) (unwind-protect (progn (rotatef (symbol-value 'x) #1#) ; Store the old value and swap in the new (symbol-macrolet ( (x (symbol-value 'x)) ) ...)) (rotatef (symbol-value 'x) #1#) ; Restore the old value ))) The "new" dynamic binding is the (lexical!) binding of the uninterned symbol #:temp. Clearly this is a completely different beast than a symbol-value slot. Editorial comments You can really screw yourself (and others) by DEFVARing the wrong thing. There is no way to undo a DEFVAR, and its effects are pervasive. This is the reason that there is the universal convention to only use DEFVAR on symbols whose names start and end with asterisks. This is the only way to insure that you are getting a lexical binding. IMO, any language feature that relies on programmers adhering to a lexical convention in order to insure that the semantics of their program are correct is a bad feature. In my book, therefore, DEFVAR is EVIL and should not be used at all. (The same goes for DEFINE-SYMBOL-MACRO.) You should ALWAYS use (declare (special ...)) anywhere you want a dynamic binding. IMO. But there is a deeper philosphical issue. The semantics of (lambda (x) ...) are *radically* different from those of (lambda (x) (declare (special x)) ...) as shown above. This is the reason that the Standard specifically calls out SPECIAL as one of the few declarations that implementations may not ignore. Whenever you start writing exceptions to a rule that's a good indication that you've gotten something wrong. Having one kind of declaration among many that changes the semantics of the program (in a really big way) is an inelegance of the first order and ought to be changed. IMO. -- === Jan de Leeuw; Professor and Chair, UCLA Department of Statistics; US mail: 8142 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554 phone (310)-825-9550; fax (310)-206-5658; email: deleeuw@stat.ucla.edu http://www.stat.ucla.edu/~deleeuw and http://home1.gte.net/datamine/ ============================================================================ No matter where you go, there you are. --- Buckaroo Banzai ============================================================================