Local Gi(d) and Gi*(d)
Gi(d) and Gi*(d) are described by Ord and Getis (1995). They indicate the extent to which a location is surrounded by a cluster of high or low values. The Gi(d) statistic excludes the value at i from the summation and is used for spread or diffusion studies, while the Gi*(d) includes the value at i in the summation and is most often used for studies of clustering.
Input
You’ll be asked to enter the input data file. This file should contain N rows, each giving the coordinates of a point and the corresponding value of the test variable (x).
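As an illustration only, the sketch below reads such rows into memory. The exact file layout is not specified here, so the whitespace-separated `x y value` order and the helper name `parse_points` are assumptions, not the program's documented format. The sample rows are the first three counties of Table 1.

```python
def parse_points(lines):
    """Parse rows of 'x y value' into a list of (x, y, value) float tuples.

    Assumed layout (hypothetical): three whitespace-separated fields per row.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
        x, y, value = map(float, fields)
        rows.append((x, y, value))
    return rows


# First three rows of Table 1, written in the assumed layout.
sample = ["219 338 40.29", "027 189 33.12", "418 156 76.91"]
print(parse_points(sample))
```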
Analysis
The null hypothesis in both of these tests is that there is no association between the value found at one site and the values at its neighbors within the specified distance. Both statistics are standardized so that, under the null hypothesis, the expected value is 0 and the variance is 1. The Gi(d) or Gi*(d) values may therefore be examined as standard normal variates. Positive values indicate spatial association of high values, whereas negative values indicate spatial association of low values.
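Reading a value as a standard normal variate can be sketched as follows. The function name `two_sided_p` is an illustrative assumption, and the sketch treats each value as a single test, ignoring any adjustment for examining many sites at once.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal variate z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Example values of the kind reported for individual points.
for z in (3.95, 1.81, -1.46):
    print(f"z = {z:+.2f}  p = {two_sided_p(z):.4f}")
```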
Formula
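$$G_i(d) = \frac{\sum_{j \neq i} w_{ij}(d)\, x_j - W_i\, \bar{x}(i)}{s(i)\, \left\{ \left[ (n-1)\, S_{1i} - W_i^2 \right] / (n-2) \right\}^{1/2}} \tag{1}$$

where $w_{ij}(d) = 1$ if point $j$ lies within distance $d$ of point $i$ and $0$ otherwise, and

$$W_i = \sum_{j \neq i} w_{ij}(d), \qquad S_{1i} = \sum_{j \neq i} w_{ij}^2(d), \qquad \bar{x}(i) = \frac{\sum_{j \neq i} x_j}{n-1}, \qquad s^2(i) = \frac{\sum_{j \neq i} x_j^2}{n-1} - \left[\bar{x}(i)\right]^2$$

$$G_i^*(d) = \frac{\sum_{j} w_{ij}(d)\, x_j - W_i^*\, \bar{x}}{s\, \left\{ \left[ n\, S_{1i}^* - W_i^{*2} \right] / (n-1) \right\}^{1/2}} \tag{2}$$

where the sums run over all $j$, including $j = i$ (so $w_{ii} = 1$), and

$$W_i^* = \sum_{j} w_{ij}(d), \qquad S_{1i}^* = \sum_{j} w_{ij}^2(d), \qquad \bar{x} = \frac{\sum_{j} x_j}{n}, \qquad s^2 = \frac{\sum_{j} x_j^2}{n} - \bar{x}^2$$
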
Output
The output file includes the total number of points in the analysis, and a listing of the Gi(d) or Gi*(d) value for each point at each distance specified.
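The computation and listing described above can be sketched as follows. This is a minimal illustration of the standardized statistics of Ord and Getis (1995), not the program's own code. It assumes binary weights (1 when two points are no more than d apart, 0 otherwise), and the function name `local_g`, the four toy points, and the choice to report NaN when a statistic is undefined are all assumptions.

```python
import math


def local_g(points, values, d, star=True):
    """Return Gi*(d) (star=True) or Gi(d) (star=False) for every point.

    points: list of (x, y) tuples; values: list of floats; d: distance.
    Assumes binary weights: w_ij = 1 when dist(i, j) <= d, else 0.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 points")
    results = []
    for i in range(n):
        # Points entering the mean and variance: all for Gi*, all but i for Gi.
        pool = [j for j in range(n) if star or j != i]
        m = len(pool)
        mean = sum(values[j] for j in pool) / m
        var = max(sum(values[j] ** 2 for j in pool) / m - mean ** 2, 0.0)
        # Neighbours of i within distance d (i itself counts only for Gi*).
        nbrs = [j for j in pool if math.dist(points[i], points[j]) <= d]
        w = len(nbrs)  # sum of weights; equals sum of squared weights too
        num = sum(values[j] for j in nbrs) - w * mean
        den = math.sqrt(var) * math.sqrt((m * w - w * w) / (m - 1))
        # Undefined when i has no neighbours or every point is a neighbour.
        results.append(num / den if den > 0 else float("nan"))
    return results


# Hypothetical toy data: three nearby low values and one isolated high value.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
values = [1.0, 2.0, 3.0, 10.0]
for d in (1.5, 3.0):
    print(f"Distance: {d:f}")
    print("# Gi*(d)")
    for i, g in enumerate(local_g(points, values, d), start=1):
        print(f"{i} {g:.5f}")
```

Passing `star=False` gives Gi(d) instead. In that case a point with no neighbors within d has no defined value, which this sketch reports as NaN.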
Example Gi*(d)
For this example we will examine the clustering of high versus low cancer rates in New Mexico’s counties using the Gi*(d) statistic. The data are taken from the National Cancer Institute Biometry Research Group Datasets and are available on their website (http://dcp.nci.nih.gov/bb/datasets.html). For each county, the total number of cases diagnosed between 1980 and 1989 was divided by the 1985 population estimate to determine the cancer rate (cases per 10,000 population). These data are shown in Table 1. The points used for this analysis are the county seats, and the coordinate units are kilometers. Figure 1 is a map of these county seats.
Table 1: Cancer Rates of New Mexico Counties
| ID Number | County | X (km) | Y (km) | Cancer Rate (cases per 10,000) |
|---|---|---|---|---|
| 1 | Bernalillo | 219 | 338 | 40.29 |
| 2 | Catron | 027 | 189 | 33.12 |
| 3 | Chaves | 418 | 156 | 76.91 |
| 4 | Colfax | 408 | 538 | 47.37 |
| 5 | Curry | 534 | 262 | 35.19 |
| 6 | De Baca | 438 | 272 | 73.98 |
| 7 | Dona Ana | 212 | 037 | 36.67 |
| 8 | Eddy | 451 | 037 | 68.35 |
| 9 | Grant | 073 | 083 | 37.90 |
| 10 | Guadalupe | 428 | 322 | 31.24 |
| 11 | Harding | 458 | 418 | 27.23 |
| 12 | Hidalgo | 033 | 040 | 44.43 |
| 13 | Lea | 528 | 103 | 41.38 |
| 14 | Lincoln | 295 | 179 | 63.39 |
| 15 | Los Alamos | 246 | 425 | 25.56 |
| 16 | Luna | 119 | 030 | 80.02 |
| 17 | McKinley | 030 | 385 | 11.78 |
| 18 | Mora | 335 | 435 | 42.69 |
| 19 | Otero | 289 | 100 | 32.19 |
| 20 | Quay | 481 | 345 | 68.23 |
| 21 | Rio Arriba | 222 | 514 | 24.21 |
| 22 | Roosevelt | 518 | 239 | 38.90 |
| 23 | Sandoval | 229 | 365 | 48.29 |
| 24 | San Juan | 096 | 531 | 27.30 |
| 25 | San Miguel | 345 | 398 | 31.54 |
| 26 | Santa Fe | 279 | 402 | 34.77 |
| 27 | Sierra | 166 | 127 | 140.73 |
| 28 | Socorro | 199 | 222 | 37.79 |
| 29 | Taos | 315 | 484 | 26.60 |
| 30 | Torrance | 272 | 302 | 66.59 |
| 31 | Union | 521 | 488 | 52.70 |
| 32 | Valencia | 160 | 329 | 45.19 |
Figure 1: New Mexico County Seats 
Sierra County, with a rate of 140.73 cases per 10,000 population, has a much higher rate than any other county. The next highest cancer rate is 80.02 cases per 10,000 population, in Luna County. We will examine the county data using the Gi*(d) statistic to determine whether this disparity stands alone or is the center of a cluster. Figure 2 is a map of the distribution of cancer rates by county.
Figure 2: Cancer rates of New Mexico Counties

A summary table of the Gi*(d) values for Sierra County is shown in Table 2. Recall that the Gi*(d) statistic is based on accumulated clustering tendencies. The complete output file is included in Table 3.
Table 2: Gi* values for Sierra County (observation #27)
| Distance (d) | 50 km | 100 km | 150 km | 200 km | 250 km | 300 km |
|---|---|---|---|---|---|---|
| Gi*(d) | 3.95 | 3.95 | 1.81 | 1.40 | 1.49 | 1.54 |
Gi*(d) is high (3.95) at 50 km and 100 km, but the tendency for clustering decreases as distance increases. Because the next closest point to the Sierra County seat is over 100 km away, the values at 50 km and 100 km reflect Sierra County alone, which indicates that it stands alone rather than at the center of a cluster. Once the neighboring counties are considered, the Gi*(d) values drop, which implies that anti-clustering forces are in command.
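The Sierra County values can be checked by hand from Table 1. When a point has no neighbors within d, Gi*(d) reduces to (x_i - mean) / s, with s computed over all 32 counties using a divisor of n. The sketch below is an illustration, not the program's code. It does that calculation and then repeats it at 150 km, using a neighbor list worked out from the Table 1 coordinates. The variable names are assumptions.

```python
import math

# Cancer rates per 10,000 from Table 1, in ID order (1..32).
rates = [40.29, 33.12, 76.91, 47.37, 35.19, 73.98, 36.67, 68.35,
         37.90, 31.24, 27.23, 44.43, 41.38, 63.39, 25.56, 80.02,
         11.78, 42.69, 32.19, 68.23, 24.21, 38.90, 48.29, 27.30,
         31.54, 34.77, 140.73, 37.79, 26.60, 66.59, 52.70, 45.19]

n = len(rates)
mean = sum(rates) / n
s = math.sqrt(sum(x * x for x in rates) / n - mean ** 2)  # divisor n, not n - 1

# 50 km and 100 km: Sierra (ID 27) is the only point in its own neighborhood.
sierra = rates[27 - 1]
g_star_alone = (sierra - mean) / s

# 150 km: IDs of county seats within 150 km of Sierra's, Sierra included
# (derived from the Table 1 coordinates).
within_150 = [7, 9, 14, 16, 19, 27, 28]
w = len(within_150)
num = sum(rates[i - 1] for i in within_150) - w * mean
den = s * math.sqrt((n * w - w * w) / (n - 1))
g_star_150 = num / den

print(f"Gi* with Sierra alone: {g_star_alone:.5f}")
print(f"Gi* at 150 km:         {g_star_150:.5f}")
```

The drop at 150 km arises because six neighbors with mostly below-average rates are added to the sum alongside Sierra's own value.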
Table 3: Output File
The input data file: nmlung.txt
The total number of points: 32
Distance: 50.000000
# Gi*(d)
1 -0.14201
2 -0.56799
3 1.27146
4 0.03060
5 -0.57951
6 1.14838
7 -0.41887
8 0.91189
9 -0.36720
10 -0.64696
11 -0.81540
12 -0.09290
13 -0.22102
14 0.70354
15 -0.99498
16 1.40210
17 -1.46440
18 -0.57529
19 -0.60705
20 0.90685
21 -0.94226
22 -0.57951
23 -0.14201
24 -0.81246
25 -0.57529
26 -0.99498
27 3.95229
28 -0.37182
29 -0.84187
30 0.83796
31 0.25449
32 -0.06097
Distance: 100.000000
# Gi*(d)
1 -0.35877
2 -0.56799
3 1.27146
4 0.03060
5 0.65710
6 0.28850
7 0.22455
8 0.91189
9 0.56230
10 0.84063
11 0.20650
12 0.56230
13 -0.22102
14 0.06935
15 -1.48652
16 0.27522
17 -1.46440
18 -1.45026
19 -0.19244
20 0.05359
21 -1.59361
22 0.20424
23 -0.35877
24 -0.81246
25 -1.12633
26 -1.35693
27 3.95229
28 -0.37182
29 -1.76916
30 0.38227
31 -0.40318
32 -0.15433
Distance: 150.000000
# Gi*(d)
1 -0.72769
2 -0.61369
3 1.21299
4 -0.73709
5 0.28850
6 0.44533
7 1.89823
8 1.17137
9 1.74185
10 -0.40733
11 0.03118
12 0.19677
13 0.86130
14 2.57943
15 -1.27227
16 2.14461
17 -1.09643
18 -1.64603
19 2.34872
20 -0.23837
21 -1.71618
22 0.69550
23 -1.37782
24 -1.26129
25 -1.07759
26 -1.27227
27 1.80685
28 2.04694
29 -1.55458
30 -0.47802
31 0.19809
32 -1.06122
Distance: 200.000000
# Gi*(d)
1 -1.60199
2 0.97606
3 0.99883
4 -1.74106
5 0.33635
6 0.56721
7 1.68767
8 0.71302
9 1.48808
10 0.25014
11 -0.74365
12 1.74185
13 1.02729
14 2.07760
15 -1.49910
16 1.38907
17 -1.52031
18 -0.85436
19 2.74986
20 0.19228
21 -1.83500
22 0.33635
23 -1.09888
24 -2.15949
25 -0.87391
26 -1.82133
27 1.40557
28 0.86386
29 -1.71076
30 -0.57175
31 -0.45067
32 -1.75313
Distance: 250.000000
# Gi*(d)
1 -0.72829
2 0.65445
3 0.95940
4 -1.49865
5 0.50154
6 -0.24102
7 2.13038
8 0.76704
9 1.40557
10 -0.33502
11 -0.71390
12 1.48808
13 0.99883
14 2.43843
15 -1.89907
16 1.40557
17 -2.17783
18 -0.87391
19 2.82523
20 -0.12649
21 -1.99650
22 0.74490
23 -0.59005
24 -2.20728
25 -1.07311
26 -1.66895
27 1.49201
28 0.89900
29 -1.15890
30 0.76238
31 -0.57532
32 -0.67862
Distance: 300.000000
# Gi*(d)
1 -0.41366
2 0.96351
3 1.56281
4 -0.85436
5 0.57030
6 -0.26637
7 2.33549
8 1.93695
9 1.48872
10 -0.65520
11 -0.45403
12 1.40557
13 0.99883
14 1.68321
15 -1.60165
16 1.40557
17 -0.36352
18 -0.94853
19 2.10932
20 -0.19851
21 -2.59936
22 0.57030
23 -0.25629
24 -2.03640
25 -0.94853
26 -0.01566
27 1.54522
28 0.21334
29 -1.29619
30 -1.08038
31 -1.04677
32 -1.04106
References
Ord, J.K. and Getis, A. (1995). Local Spatial Autocorrelation Statistics: Distribution Issues and an Application. Geographical Analysis, 27(4): 286-306.
National Cancer Institute Biometry Research Group Datasets. http://dcp.nci.nih.gov/bb/datasets.html