Local Gi(d) and Gi*(d)

Gi(d) and Gi*(d) are described by Ord and Getis (1995). They indicate the extent to which a location is surrounded by a cluster of high or low values. The Gi(d) statistic excludes the value at i from the summation and is used for spread or diffusion studies, while the Gi*(d) includes the value at i in the summation and is most often used for studies of clustering.

Input

You’ll be asked to enter the name of the input data file. This file should contain N rows, each giving the coordinates of a point and the corresponding value of the test variable (x).

Analysis

The null hypothesis in both of these tests is that there is no association between the value found at a site and the values at its neighbors within the specified distance. Under the null hypothesis the expected value is 0 and the variance is 1, so the Gi(d) or Gi*(d) statistic may be examined as a standard normal variate. Positive values of Gi(d) or Gi*(d) indicate spatial association of high values, whereas negative values indicate spatial association of low values.
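Because the statistics are read as standard normal variates, a value can be turned into an approximate two-sided p-value. The following is a minimal sketch of that reading, not part of the program itself; the function names and the 0.05 threshold are assumptions chosen for illustration.

```python
import math

def two_sided_p(z):
    """Approximate two-sided p-value for a standard normal variate z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def classify(z, alpha=0.05):
    """Label a Gi(d) or Gi*(d) value; alpha is an illustrative threshold."""
    if two_sided_p(z) >= alpha:
        return "not significant"
    # Positive values point to high-value association, negative to low-value.
    return "high-value association" if z > 0 else "low-value association"

if __name__ == "__main__":
    for z in (3.95229, 1.80685, -2.59936):
        print(f"{z:9.5f}  p = {two_sided_p(z):.4f}  {classify(z)}")
```

Note that testing many sites at once raises a multiple-comparison issue, so a fixed threshold such as 0.05 should be treated as a rough guide only.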

Formula

[1]

where

 

[2]

where
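For reference, the standardized forms of the two statistics given by Ord and Getis (1995) can be written as below. This is a sketch under stated assumptions: w_ij(d) is taken to be a binary weight equal to 1 when point j lies within distance d of point i and 0 otherwise, n is the number of points, and the notation used in formulas [1] and [2] may differ from what is shown here.

```latex
% Gi(d): the value at i is excluded from every sum (j \neq i)
G_i(d) = \frac{\sum_{j \neq i} w_{ij}(d)\, x_j - W_i\, \bar{x}(i)}
              {s(i) \left\{ \left[ (n-1)\, S_{1i} - W_i^{2} \right] / (n-2) \right\}^{1/2}}

W_i = \sum_{j \neq i} w_{ij}(d), \qquad
S_{1i} = \sum_{j \neq i} w_{ij}(d)^{2}, \qquad
\bar{x}(i) = \frac{\sum_{j \neq i} x_j}{n-1}, \qquad
s^{2}(i) = \frac{\sum_{j \neq i} x_j^{2}}{n-1} - \bar{x}(i)^{2}

% Gi*(d): the value at i is included (sums run over all j, with w_{ii} \neq 0)
G_i^{*}(d) = \frac{\sum_{j} w_{ij}(d)\, x_j - W_i^{*}\, \bar{x}}
                  {s \left\{ \left[ n\, S_{1i}^{*} - W_i^{*2} \right] / (n-1) \right\}^{1/2}}

W_i^{*} = \sum_{j} w_{ij}(d), \qquad
S_{1i}^{*} = \sum_{j} w_{ij}(d)^{2}, \qquad
\bar{x} = \frac{\sum_{j} x_j}{n}, \qquad
s^{2} = \frac{\sum_{j} x_j^{2}}{n} - \bar{x}^{2}
```

With binary weights and no other point within distance d of site i, W_i* = S_1i* = 1 and Gi*(d) reduces to (x_i - x̄)/s, the ordinary standardized value of x_i.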

Output

The output file names the input data file, gives the total number of points in the analysis, and lists the Gi(d) or Gi*(d) value for each point at each distance specified.

Example Gi*(d)

For this example we will examine the clustering of high versus low cancer rates in New Mexico’s counties using the Gi*(d) statistic. The data are taken from the National Cancer Institute Biometry Research Group Datasets and are available on their website (http://dcp.nci.nih.gov/bb/datasets.html). The total number of cases diagnosed between 1980 and 1989 was divided by the 1985 population estimate to determine the cancer rate (cases per 10,000 population) of each county. These data are shown in Table 1. The points used for this analysis are the county seats, and the coordinate units are kilometers. Figure 1 is a map of these county seats.

Table 1: Cancer Rates of New Mexico Counties

ID Number   County        X (km)   Y (km)   Cancer Rate
    1       Bernalillo      219      338        40.29
    2       Catron          027      189        33.12
    3       Chaves          418      156        76.91
    4       Colfax          408      538        47.37
    5       Curry           534      262        35.19
    6       De Baca         438      272        73.98
    7       Dona Ana        212      037        36.67
    8       Eddy            451      037        68.35
    9       Grant           073      083        37.90
   10       Guadalupe       428      322        31.24
   11       Harding         458      418        27.23
   12       Hidalgo         033      040        44.43
   13       Lea             528      103        41.38
   14       Lincoln         295      179        63.39
   15       Los Alamos      246      425        25.56
   16       Luna            119      030        80.02
   17       McKinley        030      385        11.78
   18       Mora            335      435        42.69
   19       Otero           289      100        32.19
   20       Quay            481      345        68.23
   21       Rio Arriba      222      514        24.21
   22       Roosevelt       518      239        38.90
   23       Sandoval        229      365        48.29
   24       San Juan        096      531        27.30
   25       San Miguel      345      398        31.54
   26       Santa Fe        279      402        34.77
   27       Sierra          166      127       140.73
   28       Socorro         199      222        37.79
   29       Taos            315      484        26.60
   30       Torrance        272      302        66.59
   31       Union           521      488        52.70
   32       Valencia        160      329        45.19

Figure 1: New Mexico County Seats

Sierra County, with a rate of 140.73 cases per 10,000 population, has a much higher rate than any other county. The next highest cancer rate is 80.02 cases per 10,000 population, in Luna County. We will examine the county data using the Gi*(d) statistic to determine whether this disparity stands alone or is the center of a cluster. Figure 2 is a map of the distribution of cancer rates by county.

Figure 2: Cancer rates of New Mexico Counties

Table 2 summarizes the Gi*(d) values for Sierra County. Recall that the Gi*(d) statistic is cumulative: the value at distance d reflects every point within d of the site, including the site itself. The complete output file is reproduced in Table 3.

Table 2: Gi* values for Sierra County (observation #27)

Distance (d)   50 km   100 km   150 km   200 km   250 km   300 km
Gi*(d)          3.95     3.95     1.81     1.41     1.49     1.55

Gi*(d) is high (3.95) at 50 km and 100 km, but it drops as the distance increases. Because the nearest county seat to the Sierra County seat is just over 100 km away, the values at 50 km and 100 km reflect Sierra County alone. Once the neighboring counties enter the calculation at 150 km and beyond, Gi*(d) falls below 2. This implies that the high rate in Sierra County stands alone rather than marking the center of a cluster of high rates.
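The Sierra County values can be recomputed directly from the Table 1 data. The sketch below is not the program's own code: it assumes binary weights (1 if a county seat lies within d km of the site, the site itself included, 0 otherwise) and the standardized Gi*(d) form of Ord and Getis (1995). The names POINTS and gi_star are illustrative.

```python
import math

# (x_km, y_km, rate) for the 32 county seats, in Table 1 order (ID 1..32).
POINTS = [
    (219, 338, 40.29), (27, 189, 33.12), (418, 156, 76.91), (408, 538, 47.37),
    (534, 262, 35.19), (438, 272, 73.98), (212, 37, 36.67), (451, 37, 68.35),
    (73, 83, 37.90), (428, 322, 31.24), (458, 418, 27.23), (33, 40, 44.43),
    (528, 103, 41.38), (295, 179, 63.39), (246, 425, 25.56), (119, 30, 80.02),
    (30, 385, 11.78), (335, 435, 42.69), (289, 100, 32.19), (481, 345, 68.23),
    (222, 514, 24.21), (518, 239, 38.90), (229, 365, 48.29), (96, 531, 27.30),
    (345, 398, 31.54), (279, 402, 34.77), (166, 127, 140.73), (199, 222, 37.79),
    (315, 484, 26.60), (272, 302, 66.59), (521, 488, 52.70), (160, 329, 45.19),
]

def gi_star(points, i, d):
    """Standardized Gi*(d) for the point at index i, using binary weights."""
    n = len(points)
    values = [p[2] for p in points]
    mean = sum(values) / n
    # Standard deviation over all n points (divisor n, not n - 1).
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    xi, yi, _ = points[i]
    # Assumed weights: 1 within distance d (the site itself included), else 0.
    w = [1.0 if math.hypot(x - xi, y - yi) <= d else 0.0 for x, y, _ in points]
    w_sum = sum(w)
    s1 = sum(wj * wj for wj in w)
    numerator = sum(wj * v for wj, v in zip(w, values)) - w_sum * mean
    # The denominator is zero if every point falls within d; not handled here.
    denominator = s * math.sqrt((n * s1 - w_sum ** 2) / (n - 1))
    return numerator / denominator

if __name__ == "__main__":
    sierra = 26  # ID 27 in Table 1, zero-based index 26
    for d in (50, 100, 150, 200, 250, 300):
        print(f"{d:4d} km  Gi*(d) = {gi_star(POINTS, sierra, d):.5f}")
```

Under these assumptions the results for Sierra County at 50 km and 150 km come out at about 3.952 and 1.807, in agreement with the listing in Table 3.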

Table 3: Output File

The input data file: nmlung.txt
The total number of points:  32
Distance: 50.000000
    #      Gi*(d)  
       1     -0.14201
       2     -0.56799
       3      1.27146
       4      0.03060
       5     -0.57951
       6      1.14838
       7     -0.41887
       8      0.91189
       9     -0.36720
      10     -0.64696
      11     -0.81540
      12     -0.09290
      13     -0.22102
      14      0.70354
      15     -0.99498
      16      1.40210
      17     -1.46440
      18     -0.57529
      19     -0.60705
      20      0.90685
      21     -0.94226
      22     -0.57951
      23     -0.14201
      24     -0.81246
      25     -0.57529
      26     -0.99498
      27      3.95229
      28     -0.37182
      29     -0.84187
      30      0.83796
      31      0.25449
      32     -0.06097
Distance: 100.000000
    #      Gi*(d)  
       1     -0.35877
       2     -0.56799
       3      1.27146
       4      0.03060
       5      0.65710
       6      0.28850
       7      0.22455
       8      0.91189
       9      0.56230
      10      0.84063
      11      0.20650
      12      0.56230
      13     -0.22102
      14      0.06935
      15     -1.48652
      16      0.27522
      17     -1.46440
      18     -1.45026
      19     -0.19244
      20      0.05359
      21     -1.59361
      22      0.20424
      23     -0.35877
      24     -0.81246
      25     -1.12633
      26     -1.35693
      27      3.95229
      28     -0.37182
      29     -1.76916
      30      0.38227
      31     -0.40318
      32     -0.15433
Distance: 150.000000
    #      Gi*(d)  
       1     -0.72769
       2     -0.61369
       3      1.21299
       4     -0.73709
       5      0.28850
       6      0.44533
       7      1.89823
       8      1.17137
       9      1.74185
      10     -0.40733
      11      0.03118
      12      0.19677
      13      0.86130
      14      2.57943
      15     -1.27227
      16      2.14461
      17     -1.09643
      18     -1.64603
      19      2.34872
      20     -0.23837
      21     -1.71618
      22      0.69550
      23     -1.37782
      24     -1.26129
      25     -1.07759
      26     -1.27227
      27      1.80685
      28      2.04694
      29     -1.55458
      30     -0.47802
      31      0.19809
      32     -1.06122
Distance: 200.000000
    #      Gi*(d)  
       1     -1.60199
       2      0.97606
       3      0.99883
       4     -1.74106
       5      0.33635
       6      0.56721
       7      1.68767
       8      0.71302
       9      1.48808
      10      0.25014
      11     -0.74365
      12      1.74185
      13      1.02729
      14      2.07760
      15     -1.49910
      16      1.38907
      17     -1.52031
      18     -0.85436
      19      2.74986
      20      0.19228
      21     -1.83500
      22      0.33635
      23     -1.09888
      24     -2.15949
      25     -0.87391
      26     -1.82133
      27      1.40557
      28      0.86386
      29     -1.71076
      30     -0.57175
      31     -0.45067
      32     -1.75313
Distance: 250.000000
    #      Gi*(d)  
       1     -0.72829
       2      0.65445
       3      0.95940
       4     -1.49865
       5      0.50154
       6     -0.24102
       7      2.13038
       8      0.76704
       9      1.40557
      10     -0.33502
      11     -0.71390
      12      1.48808
      13      0.99883
      14      2.43843
      15     -1.89907
      16      1.40557
      17     -2.17783
      18     -0.87391
      19      2.82523
      20     -0.12649
      21     -1.99650
      22      0.74490
      23     -0.59005
      24     -2.20728
      25     -1.07311
      26     -1.66895
      27      1.49201
      28      0.89900
      29     -1.15890
      30      0.76238
      31     -0.57532
      32     -0.67862
Distance: 300.000000
    #      Gi*(d)  
       1     -0.41366
       2      0.96351
       3      1.56281
       4     -0.85436
       5      0.57030
       6     -0.26637
       7      2.33549
       8      1.93695
       9      1.48872
      10     -0.65520
      11     -0.45403
      12      1.40557
      13      0.99883
      14      1.68321
      15     -1.60165
      16      1.40557
      17     -0.36352
      18     -0.94853
      19      2.10932
      20     -0.19851
      21     -2.59936
      22      0.57030
      23     -0.25629
      24     -2.03640
      25     -0.94853
      26     -0.01566
      27      1.54522
      28      0.21334
      29     -1.29619
      30     -1.08038
      31     -1.04677
      32     -1.04106
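An output listing in the layout of Table 3 can be read back into a nested dictionary for further work. This is a minimal sketch that assumes the layout shown above (a "Distance:" line followed by rows of point number and value); the function name parse_gi_output and the short sample text are illustrative.

```python
def parse_gi_output(text):
    """Return {distance: {point_number: value}} from an output listing."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Distance:"):
            # Start a new block for this distance.
            current = float(line.split(":", 1)[1])
            results[current] = {}
        elif current is not None:
            parts = line.split()
            # Data rows have two fields: point number and statistic value.
            if len(parts) == 2 and parts[0].isdigit():
                results[current][int(parts[0])] = float(parts[1])
    return results

SAMPLE = """The input data file: nmlung.txt
The total number of points:  32
Distance: 50.000000
    #      Gi*(d)
      26     -0.99498
      27      3.95229
Distance: 100.000000
    #      Gi*(d)
      27      3.95229
"""

if __name__ == "__main__":
    table = parse_gi_output(SAMPLE)
    for distance in sorted(table):
        # Report the point with the largest value at each distance.
        top = max(table[distance], key=table[distance].get)
        print(distance, top, table[distance][top])
```

Header lines and the column-title row are skipped because they do not consist of a point number followed by a value.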

References

Ord, J.K. and Getis, A. (1995). Local Spatial Autocorrelation Statistics: Distributional Issues and an Application. Geographical Analysis, 27(4): 286-306.

National Cancer Institute Biometry Research Group Datasets http://dcp.nci.nih.gov/bb/datasets.html