## The General G(d) Statistic

The General G(d) statistic is a multiplicative measure of overall spatial association of values which fall within a given distance of each other. It was developed by Getis and Ord (1992).

Input

1. The input data file, which should contain the X and Y coordinates and the value at each point.
2. The maximum distance of study.
3. The number of distance increments within the maximum distance of study.
4. The output file name.
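As a minimal sketch of how inputs 2 and 3 might combine, the snippet below assumes the increments are equally spaced and end at the maximum distance. The function name `distance_bands` is hypothetical and is not part of the program described here.

```python
def distance_bands(max_distance, n_increments):
    """Return equally spaced critical distances ending at max_distance.

    Assumption: the increments divide the maximum distance evenly.
    """
    if n_increments < 1:
        raise ValueError("n_increments must be at least 1")
    step = max_distance / n_increments
    return [step * k for k in range(1, n_increments + 1)]


# A maximum distance of 250 split into 5 increments.
bands = distance_bands(250, 5)
print(bands)
```

Under this assumption, a maximum distance of 250 with 5 increments yields the distances 50, 100, 150, 200, and 250.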

Analysis

A G(d) value higher than the expected G(d) indicates a clustering of high values, and a G(d) value lower than the expected G(d) indicates a clustering of low values. The variance of G(d) and a Z-value (standard normal variate) are calculated to determine the level of significance.
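The interpretation rule can be sketched in a few lines of Python. This assumes the Z-value is compared against the standard normal distribution with a two-tailed test; the function names `two_tailed_p` and `interpret` and the default `alpha` of 0.05 are illustrative choices, not part of the program.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value of a standard normal variate z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def interpret(g, expected_g, z, alpha=0.05):
    """Classify a G(d) result as high clustering, low clustering, or neither."""
    if two_tailed_p(z) >= alpha:
        return "no significant clustering"
    if g > expected_g:
        return "clustering of high values"
    return "clustering of low values"


# Hypothetical inputs chosen only to exercise the three branches.
print(interpret(0.30, 0.20, 2.5))
print(interpret(0.10, 0.20, -2.5))
print(interpret(0.21, 0.20, 0.3))
```

The sign of the Z-value and the comparison of G(d) with its expected value always agree, so either could be used to pick the direction once significance is established.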

Formula

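For a chosen critical distance d, G(d) is

$$G(d) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}(d)\, x_i x_j}{\sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j}, \qquad j \neq i$$

where $x_i$ is the value of the ith point and $w_{ij}(d)$ is the weight for points i and j for distance d, which is 1 when the two points lie within distance d of each other and 0 otherwise.

The expected mean value of G(d) is

$$E[G(d)] = \frac{W}{n(n-1)}, \qquad W = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}(d), \quad j \neq i$$

The variance of G(d) is

$$\mathrm{Var}[G(d)] = E[G(d)^2] - \left[\frac{W}{n(n-1)}\right]^2$$

where

$$E[G(d)^2] = \frac{B_0 m_2^2 + B_1 m_4 + B_2 m_1^2 m_2 + B_3 m_1 m_3 + B_4 m_1^4}{(m_1^2 - m_2)^2 \, n(n-1)(n-2)(n-3)}$$

$$m_k = \sum_{i=1}^{n} x_i^k, \qquad k = 1, 2, 3, 4$$

$$B_0 = (n^2 - 3n + 3) S_1 - n S_2 + 3 W^2$$

$$B_1 = -\left[(n^2 - n) S_1 - 2n S_2 + 6 W^2\right]$$

$$B_2 = -\left[2n S_1 - (n + 3) S_2 + 6 W^2\right]$$

$$B_3 = 4(n - 1) S_1 - 2(n + 1) S_2 + 8 W^2$$

$$B_4 = S_1 - S_2 + W^2$$

$$S_1 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(w_{ij} + w_{ji}\right)^2, \quad j \neq i, \qquad S_2 = \sum_{i=1}^{n} \left(\sum_{j=1}^{n} w_{ij} + \sum_{j=1}^{n} w_{ji}\right)^2$$

The Z-value is calculated as:

$$Z = \frac{G(d) - E[G(d)]}{\sqrt{\mathrm{Var}[G(d)]}}$$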
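The calculation can be sketched in plain Python as follows. This is an illustrative implementation, not the program's own code. It assumes binary weights (1 when two distinct points are within distance d, 0 otherwise) and uses the randomization moments of Getis and Ord (1992), with the coefficient of W² in B1 taken as 6 so that the variance is zero when every pair is within d. The function name `general_g` and the four-point data set are hypothetical.

```python
import math


def general_g(points, d):
    """Return (G, E[G], Var[G], Z) at distance d for (x, y, value) tuples."""
    n = len(points)
    if n < 4:
        raise ValueError("the variance formula needs at least 4 points")
    vals = [float(p[2]) for p in points]

    # Assumed binary weights: 1 if two distinct points are within d, else 0.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.hypot(points[i][0] - points[j][0],
                                     points[i][1] - points[j][1]) <= d:
                w[i][j] = 1.0

    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    g = (sum(w[i][j] * vals[i] * vals[j] for i, j in pairs)
         / sum(vals[i] * vals[j] for i, j in pairs))

    total_w = sum(w[i][j] for i, j in pairs)
    expected = total_w / (n * (n - 1))

    # Weight summaries S1 and S2, and power sums m1..m4 of the values.
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i, j in pairs)
    s2 = sum((sum(w[i]) + sum(w[k][i] for k in range(n))) ** 2
             for i in range(n))
    m1, m2, m3, m4 = (sum(v ** k for v in vals) for k in (1, 2, 3, 4))

    w2 = total_w ** 2
    b0 = (n * n - 3 * n + 3) * s1 - n * s2 + 3 * w2
    b1 = -((n * n - n) * s1 - 2 * n * s2 + 6 * w2)
    b2 = -(2 * n * s1 - (n + 3) * s2 + 6 * w2)
    b3 = 4 * (n - 1) * s1 - 2 * (n + 1) * s2 + 8 * w2
    b4 = s1 - s2 + w2

    n4 = n * (n - 1) * (n - 2) * (n - 3)
    eg2 = ((b0 * m2 ** 2 + b1 * m4 + b2 * m1 ** 2 * m2
            + b3 * m1 * m3 + b4 * m1 ** 4)
           / ((m1 ** 2 - m2) ** 2 * n4))
    variance = eg2 - expected ** 2

    # Z is undefined when the variance is zero (e.g. every pair within d).
    z = (g - expected) / math.sqrt(variance) if variance > 0 else float("nan")
    return g, expected, variance, z


# Hypothetical data: four points on a line, one unit apart, values 1 to 4.
pts = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (3, 0, 4)]
g, expected, variance, z = general_g(pts, 1.0)
print(g, expected, variance, z)
```

For the four-point example with d = 1, only neighbouring points are linked, giving G(d) = 40/70 ≈ 0.571 against an expected value of 0.5. The pairwise loops are quadratic in the number of points, which is adequate for a few hundred points; larger data sets would call for a spatial index.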

Output

1. The number of points
2. G(d), Expected G(d), Var(G), and Z-value for each specified distance

Example

For this example we will consider the distribution of AIDS rates for the counties of California. The data are taken from the Department of Health Services of the State of California (1999). The rates are cumulative incidences since 1981 per 100,000 population. The data are shown in Table 1. A map showing the AIDS rates by county is shown in Figure 1.

Table 1: Cumulative AIDS rates of California Counties 1981-1999

| County | X | Y | Rate (per 100,000) |
|---|---|---|---|
| Alameda | 195 | 500 | 389.13 |
| Alpine | 318 | 560 | 0.00 |
| Amador | 265 | 550 | 99.37 |
| Butte | 220 | 630 | 85.86 |
| Calaveras | 280 | 530 | 29.70 |
| Colusa | 195 | 598 | 62.38 |
| Contra Costa | 192 | 515 | 222.30 |
| Del Norte | 100 | 790 | 61.57 |
| El Dorado | 260 | 580 | 87.65 |
| Fresno | 320 | 425 | 119.26 |
| Glenn | 180 | 630 | 31.57 |
| Humboldt | 90 | 705 | 132.20 |
| Imperial | 648 | 56 | 75.39 |
| Inyo | 450 | 403 | 56.38 |
| Kern | 396 | 256 | 130.65 |
| Kings | 315 | 380 | 144.30 |
| Lake | 155 | 597 | 184.04 |
| Lassen | 270 | 710 | 141.50 |
| Los Angeles | 436 | 168 | 403.26 |
| Madera | 315 | 455 | 71.84 |
| Marin | 175 | 510 | 568.01 |
| Mariposa | 305 | 485 | 67.43 |
| Mendocino | 125 | 602 | 173.02 |
| Merced | 285 | 470 | 56.90 |
| Modoc | 265 | 765 | 9.23 |
| Mono | 380 | 515 | 18.48 |
| Monterey | 212 | 415 | 186.08 |
| Napa | 185 | 545 | 151.79 |
| Nevada | 255 | 610 | 118.37 |
| Orange | 468 | 112 | 188.78 |
| Placer | 270 | 595 | 57.22 |
| Plumas | 272 | 660 | 27.49 |
| Riverside | 600 | 120 | 239.32 |
| Sacramento | 235 | 548 | 219.73 |
| San Benito | 220 | 430 | 63.14 |
| San Bernardino | 584 | 216 | 140.66 |
| San Diego | 544 | 52 | 353.04 |
| San Francisco | 185 | 503 | 3041.87 |
| San Joaquin | 236 | 520 | 120.20 |
| San Luis Obispo | 272 | 260 | 177.28 |
| San Mateo | 190 | 490 | 246.13 |
| Santa Barbara | 300 | 200 | 151.54 |
| Santa Clara | 202 | 475 | 177.01 |
| Santa Cruz | 200 | 450 | 185.09 |
| Shasta | 197 | 712 | 65.25 |
| Sierra | 275 | 630 | 119.40 |
| Siskiyou | 180 | 782 | 68.14 |
| Solano | 192 | 540 | 252.59 |
| Sonoma | 170 | 535 | 352.65 |
| Stanislaus | 265 | 491 | 108.42 |
| Sutter | 210 | 590 | 61.69 |
| Tehama | 193 | 680 | 37.35 |
| Trinity | 140 | 702 | 77.64 |
| Tulare | 365 | 385 | 57.78 |
| Tuolumne | 303 | 515 | 92.80 |
| Ventura | 372 | 176 | 99.11 |
| Yolo | 205 | 570 | 91.98 |
| Yuba | 228 | 604 | 71.66 |

Figure 1: Cumulative AIDS Rates of California Counties

The G(d) statistic is computed in 50-mile increments from 50 to 250 miles. The output file is shown as Table 2. The highest Z-value (4.93) is found at a distance of 50 miles, and the Z-values decrease as the distance increases. The critical Z-value from tables of the normal distribution for α = 0.05 (two-tailed) is ±1.96. At the α = 0.05 level, there is significant clustering of high AIDS rates for distances of 50 and 100 miles (Z = 4.93 and 1.98, respectively). This clustering is most evident in the San Francisco Bay area (Figure 1). As the distance of study is increased, the clustering tendencies of high AIDS rates decrease.
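As a quick arithmetic check, the Z-values can be recomputed from the G(d), expected G(d), and variance columns of Table 2 and compared with the critical value of 1.96. The sketch below copies the printed figures by hand; because the printed variance is rounded to five decimals, the recomputed Z-values differ slightly from the reported ones. The variable names are illustrative.

```python
import math

# (distance, G(d), expected G(d), variance, reported Z) copied from Table 2.
table2 = [
    (50.0, 0.19054, 0.0587, 0.00071, 4.9314),
    (100.0, 0.35785, 0.2202, 0.00483, 1.9814),
    (150.0, 0.49061, 0.3975, 0.00983, 0.9393),
    (200.0, 0.57358, 0.5299, 0.01149, 0.4070),
    (250.0, 0.64102, 0.6231, 0.01116, 0.1696),
]

CRITICAL_Z = 1.96  # two-tailed, alpha = 0.05

recomputed = []
significant = []
for dist, g, expected_g, variance, z_reported in table2:
    # Z = (G - E[G]) / sqrt(Var[G]), using the rounded printed values.
    z = (g - expected_g) / math.sqrt(variance)
    recomputed.append(z)
    if abs(z_reported) > CRITICAL_Z:
        significant.append(dist)
    print(f"{dist:6.0f}  Z recomputed = {z:.3f}  Z reported = {z_reported:.4f}")

print("Significant distances:", significant)
```

Only the 50- and 100-mile rows exceed the critical value, which matches the conclusion drawn in the text.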

Table 2: Output

```
The input data file: aids.dat
The total number of points:  58
Distance      G(d)    Expected G(d)  Variance      Z-value
 50.0000    0.19054      0.0587       0.00071       4.9314
100.0000    0.35785      0.2202       0.00483       1.9814
150.0000    0.49061      0.3975       0.00983       0.9393
200.0000    0.57358      0.5299       0.01149       0.4070
250.0000    0.64102      0.6231       0.01116       0.1696
```

References

Getis, A. and Ord, J.K. (1992). The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, 24: 189-206.

Getis, A. and Ord, J.K. (1998). Spatial Modelling of Disease Dispersion Using a Local Statistic: The Case of AIDS. Chapter 12 in D.A. Griffith, C.G. Amrhein, and J-M. Huriot (eds.), Econometric Advances in Spatial Modelling and Methodology: Essays in Honour of Jean Paelinck, Kluwer.

State of California Department of Health Services (1999). 1998 Report Health Data Summaries for California Counties.