This lab is an introduction to the geostatistical way of mapping. There are two versions of this lab: the Geo-EAS version (this one), and a GS+ version. The first is Unix software, while the second is Windows software.
This lab covers two separate and very important topics, reflecting the discussion of the geostatistics module. The first is variography, the study of spatial autocorrelation in a data set using the variogram; the second is kriging interpolation, or the creation of grids (maps) from scattered data.
While many software packages such as ArcView offer you the possibility of interpolation, we must be wary of it for several reasons:
In this lab, you will
ArcView your data:
Once you've added the table, you can "Add Event Theme" to the view, which you do from the View menu of the View window. Make sure that you use the X and Y variables for geographic coordinates, rather than lat and long (which will be ArcView's default choice). The X and Y in the file are in UTM coordinates.
Other than that, just hit "Ok" buttons! ArcView will process your request, interpolate the data, and add the contour theme to your view.
Now on to Geoeas, and the geostatistical approach to the same problem. The big picture here is that, rather than making ad hoc and generally inappropriate assumptions about the spatial autocorrelation evidenced in the data, we are incorporating it into the interpolation process via the variogram. More information should result in better maps.
Interpolation like ArcView's IDW (Inverse Distance Weighting) actually does make an assumption about the spatial autocorrelation - it's just that the assumption is based on a whim. It does not reflect (except by chance) the realities of your data, or of the process you are studying.
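To see where the "whim" enters, here is a minimal sketch of inverse distance weighting in Python. It is not ArcView's implementation: the function name, the sample points, and the default power of 2 are all assumptions made for illustration. The point to notice is that the exponent is simply chosen by the analyst; nothing in the data informs it.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    points: list of (x, y) sample locations
    values: list of measurements, one per point
    power:  exponent picked by the analyst (ASSUMPTION: 2 as a default)
    """
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target sits on a sample: return that sample's value
        w = 1.0 / d ** power  # weight depends only on distance and `power`
        num += w * v
        den += w
    return num / den

# Hypothetical samples (NOT the Illinois data)
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vals = [10.0, 20.0, 30.0]
print(idw(pts, vals, (0.5, 0.5)))             # all three samples equally far away
print(idw(pts, vals, (0.9, 0.1), power=1.0))  # same data, different "whim"
print(idw(pts, vals, (0.9, 0.1), power=4.0))
```

Changing `power` changes the map, and nothing in the procedure tells you which value is right for your data.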
Variogram modelling is the crucial step in which we analyze and model the spatial autocorrelation structure. If your project data lend themselves to this kind of analysis, then by all means take some time before you're done with your projects to check out the spatial autocorrelation via its variogram structure. In the meantime, we'll practice on the Illinois data.
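For those who like to see the arithmetic, here is a rough sketch of how an empirical semivariogram can be computed: every pair of points is compared, pairs are binned by separation distance (the "lag"), and half the average squared difference is reported for each bin. The function name, the binning scheme, and the toy data are assumptions for illustration; this is not the code Geo-EAS runs.

```python
import math

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return a list of (lag_center, gamma, pair_count) for non-empty lags.

    gamma = sum of squared differences / (2 * number of pairs) in each bin.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # each pair of points is visited once
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            k = int(d // lag_width)  # index of the lag bin for this pair
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            center = (k + 0.5) * lag_width
            result.append((center, sums[k] / (2 * counts[k]), counts[k]))
    return result

# Hypothetical transect (NOT the Illinois data)
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0, 7.0]
for center, gamma, npairs in empirical_semivariogram(pts, vals, 1.5, 3):
    print(center, gamma, npairs)
```

In this toy example the semivariance grows with lag distance: nearby points are more alike than distant ones, which is exactly the spatial autocorrelation we want to model.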
mkdir geoeas
cd geoeas
cp ~aelon/Public/Html/etc/data/illinois/lab/ill.dat .
You now have the data you'll need for this part of the exercise.
setenv GEO_EAS /group/dengue/unix_version
setenv PATH ~aelon/bin:$PATH
geoeas prevar
A screen should pop up with a little info about prevar. For the most part, just follow directions.
When you "press any button" you should get a screen with a menu at the bottom. Your arrow keys work to move between menu items. Alternatively you can select a menu item by typing the first letter.
Type an "f", and the file menu will be selected. Type in the data
file name: ill.dat. Follow the directions to create a file
ill.pcf.
ill.pcf will be created, and then you may quit. Not very
exciting, is it!? It's just to prepare the pair comparison file
(.pcf) for the other programs. Each pair of data points is represented in
this file.
Fire up vario:
geoeas vario
ill.pcf by default. It automatically
moves to the variable menu, as you must choose a variable. Space through
until you get to scv1 (we'll replicate the analysis of the geostatistics module, to some
extent).
Go back to the Model screen, to see the model that the automated procedure came up with (bottom right of your screen). You'll note that the model is nested: that is, it is a weighted (positive) linear combination of several models, each with a sill and a range. There is also a substantial nugget (the nugget is the "y-intercept" of your model). (Do you remember the practical implication of having a large nugget in a variogram model?)
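A nested model with a nugget can be sketched in a few lines. The spherical form used here is an assumption (this page doesn't say which model types the automated procedure picks), and the nugget, sills, and ranges are made-up numbers, not the ones on your screen.

```python
def spherical(h, sill, rng):
    """Spherical structure: rises from 0 and levels off at `sill` when h >= rng."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

def nested_variogram(h, nugget, structures):
    """Nugget plus a positive linear combination of structures.

    structures: list of (sill, range) pairs, one per nested model.
    By convention gamma(0) = 0; the nugget is the jump just past h = 0.
    """
    if h == 0:
        return 0.0
    return nugget + sum(spherical(h, s, a) for s, a in structures)

# Hypothetical parameters (NOT the fitted Illinois model)
nugget = 2.0
structures = [(3.0, 10.0), (5.0, 40.0)]  # (sill, range) for each structure
for h in (0.0, 0.001, 10.0, 40.0, 100.0):
    print(h, nested_variogram(h, nugget, structures))
```

Look at the value at h = 0.001: even points that are almost on top of each other differ by roughly the nugget. The total sill, reached at the largest range, is the nugget plus the individual sills.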
ill.kpf).
When you're done goofing off, quit vario by hitting the "q" key several times, until you're excused.
There are lots of things we haven't done (and should have): we didn't adjust lag sizes; we didn't check for anisotropy (by looking at different directions); we didn't look at other variables. So little time! These are really important steps, and I hope that you will someday have an opportunity to try them out.
You realize, I hope, that this is the heart and soul of the difference between our approach and what ArcView did to create its surface/contours: we have actually modelled the spatial autocorrelation structure of the data, and so can hope that the characterization of the spatial autocorrelation will result in an improved map. While flipping through the defaults with ArcView we made no such attempt.
geoeas krig
We'll Load parameters (the ill.kpf file we just
created). The Krige Options screen appears, and if you're happy with what
you see - and why on earth wouldn't you be, for pity's sake! - we're ready to
execute. Notice that according to this screen, we're doing Ordinary, point
kriging by default, like they did in the papers we read. You did read the
papers, didn't you?!
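If you're curious what "Ordinary, point kriging" amounts to, here is a bare-bones sketch: the weights come from solving a linear system built from the variogram model, with one extra row forcing the weights to sum to one. Everything named here (the functions, the spherical model and its parameters, the toy data) is an assumption for illustration; it is not krig's code, and it uses every data point rather than a search neighborhood.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def gamma(h, nugget=1.0, sill=4.0, rng=10.0):
    """Spherical semivariogram with nugget (HYPOTHETICAL parameters)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + sill
    r = h / rng
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

def ordinary_kriging(points, values, target):
    """Return (estimate, kriging_variance, weights) at `target`."""
    n = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # Left-hand side: semivariances between data points, bordered by ones
    a = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # constraint row: weights sum to 1
    # Right-hand side: semivariances between data points and the target
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

# Hypothetical data (NOT the Illinois data): corners of a square
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
vals = [1.0, 2.0, 3.0, 6.0]
print(ordinary_kriging(pts, vals, (1.0, 1.0)))
print(ordinary_kriging(pts, vals, (0.5, 0.5)))
```

Unlike IDW, the weights here depend on the variogram model, and you get a kriging variance at each location along with the estimate.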
This is a cheap sort of contour map. Okay: "q" your way out of krig.
We're ready to bring our results into ArcView.
/group/dengue/bin/geoeas2arcview ill.grd
which should produce a file called
grid.txt
grid.txt) to your itd account,
from your telnet session:
ftp login.itd
login with username and password
ascii
put grid.txt
The file can now be found in your IFS home directory. Alternatively, you can use a Windows ftp program to ftp to your sph account and bring the file back to your local machine.
Include the ill.csv theme and the two contour plots in your view. Flip one contour off and the other on, to see how differently they've mapped the data. As you see here, and as you saw in the geostatistics module, the results can be wildly different!
Why would you choose one map over the other? If one seems to "trust the data" less, why is that?
Optional exercises (for the courageous):
geoeas xvalid
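The idea behind cross-validation can be sketched as follows: drop each data point in turn, predict it from the remaining points, and summarize the errors. This is only an illustration of the leave-one-out idea, not xvalid itself; the function names and data are made up, and a simple inverse-distance predictor stands in for kriging to keep the sketch short.

```python
import math

def idw(points, values, target, power=2.0):
    """Stand-in predictor (ASSUMPTION: inverse distance weighting, power 2)."""
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def leave_one_out(points, values, predict):
    """Return (errors, mean_absolute_error, rmse) for a predictor.

    predict(points, values, target) must return an estimate at `target`.
    """
    errors = []
    for i in range(len(points)):
        rest_pts = points[:i] + points[i + 1:]   # hide point i
        rest_vals = values[:i] + values[i + 1:]
        errors.append(predict(rest_pts, rest_vals, points[i]) - values[i])
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return errors, mae, rmse

# Hypothetical transect (NOT the Illinois data)
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
vals = [0.0, 1.0, 2.0]
errors, mae, rmse = leave_one_out(pts, vals, idw)
print(errors, mae, rmse)
```

Any interpolator with the same `predict(points, values, target)` signature can be passed in, so competing methods or variogram models can be compared on the same error measures.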
Owosina et al. [1992] compare two multivariate kernel regression estimators, MARS, LOESS, TPSS and Kriging for reconstructing spatial surfaces from a variety of irregularly sampled synthetic (with varying signal to noise ratios) and ground water data sets. Model parameters were chosen automatically using cross validatory measures in all cases. In terms of RMSE and Mean Absolute Deviation, overall algorithm ordering (best to worst) across the data sets was TPSS, LOESS, KERNEL, MARS, KRIGING. The differences between the best and worst algorithm were dramatic in some cases. Methods for interpolating ground water data irregularly sampled in space and time were also illustrated.
Andy's note: I haven't read this one, but am anxious to see why kriging did so poorly!
It discusses some common misconceptions concerning cross-validation. For example, use of statistical criteria supposedly yields an optimal semivariogram from among competing models. But Davis states that the semivariogram is only "best" with respect to "choice of discrepancy measure, partition set size, predictive function, and number of models to be evaluated."
Page by Andy Long. Comments appreciated.
aelon@sph.umich.edu