Old FSO Page with SAS code


N.B.  This is the old page.  I no longer maintain it or update it.  It is here only as a service to those who would like the old SAS code. 

I've developed SAS code to perform fuzzy set ordination (FSO) on data sets comprising binary (presence/absence) data and relative abundance data. I'm publishing the programs on the web so that they will be widely and freely available. SAS is not the optimal language for ordination, as its capabilities for matrix handling are limited. However, most researchers have access to a mainframe computer that runs UNIX or some other popular operating system, and SAS is available at most universities (a version of SAS for desktop computers is also available; check http://www.sas.com/ for more details). Also, a program written in BASIC or C++ for the Macintosh won't run in Windows, and something written for Windows 95 might not run in any other Windows version. If someone wants to adapt these SAS programs into something else, go right ahead!

All of the programs below will have to be modified to run with your data. The matrix sizes must be changed to match your number of sites and species, as SAS does not currently allow matrix sizes to be defined by variable names. I've included comment statements showing where changes should be made.

Sample data files are included for each kind of analysis. For the abundance data, I used part of the data set used by Boyce (1998), while data from Ellison & Boyce (2001) were used for the binary data set. The CPU time that each program took when I ran it on my mainframe is listed to give you some idea of how long your own data will take. For large data sets like the abundance set, you may run out of memory if you run under UNIX. One way to avoid this is to use the following command when invoking SAS:

sas -memsize 64M
Click here for more information if using SAS version 6 or here if using 7 or higher. You may also need to use a DATA _NULL_ step for large data sets; click here for more information.

I recommend that you use the step-across version of each program, because it eliminates the "curlover" distortion inherent to FSO; this is FSO's version of the arch or horseshoe effect. What does curlover look like? This figure shows simulated abundance data from 11 equally spaced sites on an elevational gradient. There are 7 species with symmetric, quasi-Gaussian distributions that differ only in where the maxima are located. So, when the actual elevation is plotted against the apparent (ordinated) elevation, we should see a straight line. Using the percent-similarity index with the step-across routine, there is little distortion. Without the step-across routine, however, the line curls over at the ends.

Simulated abundance data after FSO with and without step-across
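
For readers who want to experiment with a similar test case outside SAS, here is a rough Python sketch of how such a simulated data set could be built and how percent similarity behaves on it. It is not the code that produced the figure. The 11 sites and 7 species come from the description above; the curve width, the cutoff that makes the curves "quasi-Gaussian", and the function names are my own assumptions for illustration.

```python
import math

def simulate_abundances(n_sites=11, n_species=7, width=1.5, cutoff=0.05):
    """Return a sites x species table of relative abundances.

    width and cutoff are assumed values, not taken from the original study.
    """
    # Species optima are spread evenly along the same 0..n_sites-1 gradient.
    optima = [k * (n_sites - 1) / (n_species - 1) for k in range(n_species)]
    table = []
    for site in range(n_sites):
        raw = [math.exp(-((site - opt) ** 2) / (2 * width ** 2)) for opt in optima]
        # "Quasi-Gaussian": tails below the cutoff are treated as absences.
        raw = [v if v >= cutoff else 0.0 for v in raw]
        total = sum(raw)
        table.append([v / total for v in raw])
    return table

def percent_similarity(p, q):
    """Percent similarity of two relative-abundance rows (0 = nothing shared)."""
    return sum(min(a, b) for a, b in zip(p, q))

sites = simulate_abundances()
# Similarity of the lowest site to every site along the gradient.
first_site_similarities = [percent_similarity(sites[0], row) for row in sites]
```

With these assumed settings, the lowest site shares no species with the highest sites, so their similarity is exactly 0 no matter how far apart they are. That loss of information at long distances is what the step-across routine is meant to repair.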

What is step-across? It's a way of more accurately determining the similarity (or its complement, the distance) between sites that have no species in common. Good references are Williamson (1978), who developed it, and Bradfield and Kenkel (1987); I use a Dijkstra shortest-path algorithm which I adapted from Minieka (1978). However, if you use the step-across version, make sure that every site has at least one species in common with at least one other site. If there is no connection in species composition between a particular site and at least one other, the program will run forever trying to find a shortest path that does not exist. Note also that for very large data sets, the step-across routine will take significant amounts of computer time. For example, with a data set of 383 sites and 20 species, it took me almost 1.5 h of CPU time on a UNIX mainframe!
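
To make the idea concrete, here is a minimal Python sketch of step-across using Dijkstra's algorithm. It is not a translation of my SAS routine, and the function name step_across and the too_far threshold are hypothetical. It assumes a symmetric distance matrix (1 minus similarity) in which a distance of 1 means "no species in common". Unlike the SAS programs, this sketch stops with an error when a site cannot be reached instead of looping.

```python
import heapq
import math

def step_across(dist, too_far=1.0):
    """Replace distances with shortest-path distances through linked sites.

    dist is a symmetric list-of-lists of distances in 0..1; pairs at or above
    too_far (an assumed threshold) are treated as having no direct link.
    """
    n = len(dist)
    # Keep only informative links: pairs of sites that share some species.
    neighbors = [
        [(j, dist[i][j]) for j in range(n) if j != i and dist[i][j] < too_far]
        for i in range(n)
    ]
    result = []
    for source in range(n):
        best = [math.inf] * n
        best[source] = 0.0
        queue = [(0.0, source)]
        while queue:
            d, u = heapq.heappop(queue)
            if d > best[u]:
                continue  # stale queue entry, a shorter path was already found
            for v, w in neighbors[u]:
                if d + w < best[v]:
                    best[v] = d + w
                    heapq.heappush(queue, (best[v], v))
        if math.inf in best:
            raise ValueError(f"site {source} is not connected to every other site")
        result.append(best)
    return result

# Four sites along a gradient; sites 0 and 3 share no species (distance 1.0).
example = [
    [0.0, 0.5, 0.9, 1.0],
    [0.5, 0.0, 0.5, 0.9],
    [0.9, 0.5, 0.0, 0.5],
    [1.0, 0.9, 0.5, 0.0],
]
stepped = step_across(example)
```

In this example the saturated distance between sites 0 and 3 is replaced by the shortest chain through intermediate sites (0.9 + 0.5), while the other distances are left as they were. Note that stepped distances can exceed 1; how they are rescaled before ordination is not shown here.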
 

FSO requires the use of a similarity index that falls between 0 and 1. There are about a dozen that work for abundance data and probably about the same number for binary data. For binary data, I now (Boyce & Ellison 2001) recommend any of the following five: Baroni-Urbani & Buser, Jaccard, Kulczynski, Ochiai or Sørensen (Krebs 1989; Legendre & Legendre 1998). I've obtained good preliminary results using percent similarity for relative abundance data, but I'm currently working on some simulations to determine which index works best. I've not tried to do anything yet with absolute abundances.
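
As a quick reference, here is a Python sketch of the five binary indices using their usual textbook definitions, where a is the number of species present at both sites, b and c are the numbers present at only one site, and d is the number absent from both. The function names are hypothetical. I am assuming the Kulczynski form that is bounded by 0 and 1 (the mean of a/(a+b) and a/(a+c)), and that neither site is empty.

```python
import math

def match_counts(x, y):
    """Count shared presences (a), unique presences (b, c), shared absences (d)."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if q and not p)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d

def binary_similarities(x, y):
    """Return the five indices for two presence/absence vectors.

    Assumes each site holds at least one species, so no denominator is zero.
    """
    a, b, c, d = match_counts(x, y)
    root_ad = math.sqrt(a * d)
    return {
        "baroni_urbani_buser": (root_ad + a) / (root_ad + a + b + c),
        "jaccard": a / (a + b + c),
        "kulczynski": 0.5 * (a / (a + b) + a / (a + c)),
        "ochiai": a / math.sqrt((a + b) * (a + c)),
        "sorensen": 2 * a / (2 * a + b + c),
    }

site_1 = [1, 1, 0, 0, 1, 0]
site_2 = [1, 0, 1, 0, 1, 0]
indices = binary_similarities(site_1, site_2)
```

For these two sites a = 2, b = 1, c = 1 and d = 2, so Jaccard is 0.5 and the other four indices all come out at 2/3. Only Baroni-Urbani & Buser uses the shared absences d.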


SAS Programs

*N.B.: These programs were updated on 18 January 1999, after I discovered that the relative abundance elevation data set was not transformed. The abundance programs now take raw elevations and transform them to fuzzy sets that range from 0 to 1. I have also trimmed the abundance data set down to 100 sites, as it runs much faster. The elevation data set for the binary programs has already been transformed to a fuzzy set; if you need to use raw elevations, just copy the transformation routine from the abundance programs.
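
If you are preparing elevations outside SAS, the following Python sketch shows one simple way to turn raw elevations into fuzzy-set memberships between 0 and 1. It assumes a linear rescaling from the lowest to the highest site, which may not be exactly what the transformation routine in the abundance programs does; the function name and the elevations are made up for illustration.

```python
def fuzzy_membership(elevations):
    """Linearly rescale raw elevations to memberships in 0..1.

    Assumption: the lowest site gets 0 and the highest gets 1.
    """
    low, high = min(elevations), max(elevations)
    if high == low:
        raise ValueError("all sites have the same elevation; nothing to rescale")
    return [(e - low) / (high - low) for e in elevations]

# Hypothetical raw elevations in metres.
memberships = fuzzy_membership([1200, 1500, 1800, 1350])
```

Here the lowest site (1200) maps to 0, the highest (1800) maps to 1, and 1500 falls halfway at 0.5.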


Disclaimer and Request: All of these programs are works-in-progress and probably have some hidden bugs I haven't found yet. You download and use them at your own risk. I am not responsible for any injury, real or imagined, that occurs to you if you download or use them. After all, they are free! All I do ask in return is that you reference me if you publish any results that use these programs. Feedback on anything and everything on this page is most welcome.

Click here to email Rick Boyce