Worksheet: Sampling of populations
When most people use the word "population" they are referring to all of the individual humans in a specified group, such as the population of the state of Kentucky. In science, the term population has a more general meaning; it is all the individual items (not just humans) in a specified group. Some examples of populations may help you understand this definition: faculty on the NKU campus, oxygen molecules in this room, bullfrog tadpoles in Lake Inferior, sugar maple trees in North America, etc. You know from experience that individual items in a population may vary; not all NKU professors are the same sex, age, or weight, not all oxygen molecules in this room are moving at the same velocity, etc.
It is common in science to attempt to describe populations with regard to some measurable characteristics. For example, a wildlife biologist may be interested in determining the age, length, and weight of all white-tailed deer in Ohio. This would obviously be a formidable, if not impossible, task. Instead, the biologist would have to rely on measuring a sample of the population of deer. A sample is simply a subset of the population that is used to represent the whole. In order for a sample to be truly representative of a population, it needs to be a random sample. That is, every individual item in the population must have an equal chance of being selected (in the case of the deer, captured) for measurement. If the deer captured and measured were only from a certain part of Ohio, or they were only the slowest, easiest ones to capture, the sample would be a biased sample, and would not be a true representation of the entire deer population.
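The difference between a random sample and a biased sample can be seen in a short Python sketch. The population below is made up for illustration (1,000 invented deer weights, not real data), and the function names are hypothetical. The "biased" sampler simply takes the lightest animals as a stand-in for "the easiest ones to capture."

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Simple random sample: every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


def draw_biased_sample(population, sample_size):
    """Biased sample: only the lightest individuals are selected."""
    return sorted(population)[:sample_size]


def mean(values):
    return sum(values) / len(values)


# Hypothetical population: 1,000 made-up deer weights in kg (assumed values).
population_rng = random.Random(1)
population = [population_rng.gauss(60, 10) for _ in range(1000)]

random_sample = draw_random_sample(population, 30, seed=2)
biased_sample = draw_biased_sample(population, 30)

print("population mean:   ", round(mean(population), 1))
print("random sample mean:", round(mean(random_sample), 1))
print("biased sample mean:", round(mean(biased_sample), 1))
```

The random sample mean should land close to the population mean, while the biased sample mean falls well below it.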
Descriptive statistics
After a scientist has sampled a population, they usually summarize their measurements by calculating and reporting the descriptive statistics of the sample. Among the descriptive statistics are measures of central tendency, measures of dispersion, and sample size.
The sample size, symbolized by the letter "n", is simply the number of items in the sample. We will discuss later how large this value should be.
The sample mean is the most commonly used measure of central tendency. It is the average measurement for the sample: the sum of all the measurements divided by the sample size.
Common measures of dispersion are the range, the standard deviation, and the standard error of the mean. The range of a sample is the simplest measure of dispersion; it indicates the highest and the lowest values in the data set. A better estimate of dispersion is the standard deviation (symbolized by "s"), which is roughly the typical distance between each measurement (each data point) and the sample mean. The standard error of the mean (designated SEM) is the dispersion we would see if we took many different samples from one population, calculated the mean of each sample, and then used the sample means as our data points. Below is an introductory problem that will help teach you how to calculate and report these descriptive statistics.
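The idea behind the SEM can be checked with a small simulation. This is only a sketch under stated assumptions: the population is invented (10,000 made-up measurements), and the sample size of 10 and the 2,000 repeated samples are arbitrary choices. It compares the spread of many sample means with the SEM estimated from a single sample as s divided by the square root of n.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 made-up measurements (assumed values).
population = [rng.gauss(20, 2) for _ in range(10000)]

n = 10  # size of each sample

# Take many samples, record the mean of each, and treat the means as data points.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]
sd_of_sample_means = statistics.stdev(sample_means)

# Estimate the same quantity from one single sample: s / sqrt(n).
one_sample = rng.sample(population, n)
sem_from_one_sample = statistics.stdev(one_sample) / n ** 0.5

print("spread of sample means: ", round(sd_of_sample_means, 2))
print("SEM from a single sample:", round(sem_from_one_sample, 2))
```

The two numbers will not match exactly, because the single-sample SEM is itself an estimate, but they should be of similar size.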
Sample problem:
Assume you capture and weigh a sample of Gila aardvarks in the Chiricahua Mountains of southeastern Arizona. The weights (in kg) of the ten aardvarks follow:
21 19 17 18 18 22 23 21 20 16
1. Calculate the mean weight of the sample of aardvarks.
2. What is the range?
3. Calculate the variance (s²) of the sample. The variance is the sum of all individual deviations squared, divided by (n-1). In other words, do the following:
a. Determine the deviation of each data point (subtract the sample mean from each data point). Some of your values will be negative numbers, but that's OK.
b. Square each individual deviation.
c. Add all of the squared deviations together.
d. Divide this sum of deviations squared by the sample size minus one (n-1). The result of this division is the sample variance.
4. Calculate the sample standard deviation (s) simply by taking the square root of the sample variance.
5. The standard error of the mean (SEM) can be calculated by dividing the standard deviation by the square root of the sample size. Calculate the SEM.
6. The reporting of descriptive statistics varies among scientific disciplines, but a common way of reporting them (and the way we will report them in this class) is as follows:
Mean ± SEM, n = ?
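Once you have worked the problem by hand, the same steps can be sketched in Python to check your arithmetic. The function names `describe` and `report` are hypothetical, and the example data are made up so that the worksheet answers are not given away; substitute the aardvark weights to check your own work.

```python
import math


def describe(sample):
    """Compute the descriptive statistics following steps 1 to 5."""
    n = len(sample)                                   # sample size
    mean = sum(sample) / n                            # step 1: sample mean
    low, high = min(sample), max(sample)              # step 2: range
    deviations = [x - mean for x in sample]           # step 3a: deviations
    squared_deviations = [d ** 2 for d in deviations] # step 3b: square each
    sum_of_squares = sum(squared_deviations)          # step 3c: add them up
    variance = sum_of_squares / (n - 1)               # step 3d: divide by n-1
    sd = math.sqrt(variance)                          # step 4: standard deviation
    sem = sd / math.sqrt(n)                           # step 5: standard error
    return {"n": n, "mean": mean, "range": (low, high),
            "variance": variance, "sd": sd, "sem": sem}


def report(stats):
    """Format the statistics as 'Mean ± SEM, n = ?' (step 6)."""
    return f"{stats['mean']:.2f} ± {stats['sem']:.2f}, n = {stats['n']}"


# Made-up example data (not the aardvark weights).
example = [4, 6, 5, 7, 8]
print(report(describe(example)))  # 6.00 ± 0.71, n = 5
```

Python's built-in `statistics.variance` and `statistics.stdev` also divide by (n-1), so they can serve as a second check on steps 3 and 4.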
What is the basic problem associated with a large sample size?
What is the basic problem associated with a small sample size?
How do you know if your sample size is large enough? Again, opinions vary among scientists. Many biologists use this rule of thumb: the sample is large enough if the SEM is less than 10% of the sample mean. Recall that the square root of the sample size is in the denominator of the SEM, so the larger "n" is, the smaller the SEM is. Based on this guideline, would you conclude that your sample of Gila aardvarks is large enough to provide you with a reasonable description of the entire population?
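The 10% rule of thumb is easy to express as a small check. This is a minimal sketch: the function name `sample_is_large_enough` and the two example data sets are made up for illustration, and the 10% threshold is the guideline stated above, not a universal standard.

```python
import math
import statistics


def sample_is_large_enough(sample, threshold=0.10):
    """Return True if the SEM is less than `threshold` times the sample mean."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    # abs() keeps the comparison meaningful if the mean happens to be negative.
    return sem < threshold * abs(mean)


# Made-up examples (not the aardvark weights).
print(sample_is_large_enough([4, 6, 5, 7, 8]))       # SEM is about 12% of the mean
print(sample_is_large_enough([50, 52, 49, 51, 48]))  # SEM is about 1.4% of the mean
```

Both examples have the same SEM, but only the second passes, because the rule compares the SEM with the size of the mean rather than judging it on its own.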