SIMPLE LINEAR REGRESSION 

A college bookstore must order books two months before each semester starts. They believe that the number of books that will ultimately be sold for any particular course is related to the number of students registered for the course when the books are ordered. They would like to develop a linear regression equation to help plan how many books to order. From past records, the bookstore obtains the number of students registered, X, and the number of books actually sold for a course, Y, for 12 different semesters. These data are below.

                

A. Obtain a scatter plot of the number of books sold versus the number of registered students.

B. At a .01 level of significance is there sufficient evidence to conclude that the number of books sold is related to the number of registered students in a straight-line manner?

C. Carefully explain what the p-value found in part B means.

D. Fully interpret the strength of the straight-line relationship.

E. Give the regression equation, and interpret the coefficients in terms of this problem.

F. If appropriate, predict the number of books that would be sold in a semester when 30 students have registered. Use 95% confidence.

G. If appropriate, estimate the average number of books that would be sold in a semester for all courses with 30 students registered. Use 95% confidence.

H. If appropriate, predict the number of books that would be sold in a semester when 5 students have registered. Use 95% confidence.

 

SOLUTION

A. The following scatterplot with the fitted line was obtained using StatCrunch. 

As the number of students registered for the course increases, the number of books sold by the bookstore appears to increase in a straight-line manner.

 B.     H0:  The number of students registered and the number of books sold are not correlated

          Ha: The number of students registered and the number of books sold are correlated

          Decision Rule: Accept Ha if the calculated p-value < .01.

          Test Statistic: r = the Pearson coefficient of correlation

          Calculations from StatCrunch: r = 0.8997, p-value < 0.0001

          Interpretation: At the .01 level of significance, I conclude that as the number of students registered increases, the number of books sold increases in a straight-line manner.
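The t statistic behind the p-value in part B can be recovered from r and the sample size alone. The following is a minimal Python sketch, assuming the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The variable names are illustrative and are not part of the StatCrunch output.

```python
import math

r = 0.8997   # Pearson correlation reported by StatCrunch
n = 12       # number of semesters sampled

# Convert r to a t statistic with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r**2)

print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

Up to rounding of r, the result agrees with the slope T-Stat of 6.5178633 in the regression output. In simple linear regression the test of zero correlation and the test of zero slope are the same test.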

C. The p-value is less than 0.0001. This means that if the number of students registered and the number of books sold were not correlated (that is, if the null hypothesis were true), there would be less than a 0.01% chance of obtaining a sample whose points show a straight-line pattern at least as strong as the one observed in the scatterplot.

D. r^2 = .809 (80.9%). 80.9% of the variability in the number of books sold is explained by the straight-line relationship with the number of registered students. 19.1% of this variability is unexplained, and due to error. This relationship is quite strong.
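The 80.9% and 19.1% figures follow directly from the sums of squares in the analysis of variance table. A short Python sketch of that arithmetic, using the SS values reported by StatCrunch (the variable names are illustrative):

```python
import math

ss_model = 99.56364    # variation explained by the regression line
ss_error = 23.436363   # unexplained (residual) variation
ss_total = 123.0       # total variation in books sold

r_squared = ss_model / ss_total
unexplained = ss_error / ss_total
r = math.sqrt(r_squared)  # positive root because the slope is positive

print(f"r^2 = {r_squared:.4f}, unexplained = {unexplained:.4f}, r = {r:.4f}")
```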

E. The regression equation is Books = 9.30 + 0.673 Students. Intercept: when no students have registered for a course, the predicted number of books sold is 9.30 (or about 9). This is the starting point of the straight line at x = 0. It is not particularly meaningful in this problem, since all the classes sampled had more than 25 students registered. Slope: for each additional student registered for a course, the predicted number of books sold increases by 0.673.
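To make the interpretation of the coefficients concrete, here is a small Python sketch that evaluates the fitted line. The coefficients come from the StatCrunch output; the function name predicted_books is a hypothetical helper introduced only for illustration.

```python
B0 = 9.3         # intercept from the regression output
B1 = 0.6727273   # slope: predicted extra books per extra student

def predicted_books(students):
    """Point prediction from the fitted line Books = B0 + B1 * Students."""
    return B0 + B1 * students

# The change in the prediction for one additional student is the slope,
# whatever the starting enrollment.
per_student = predicted_books(31) - predicted_books(30)

print(f"{predicted_books(30):.2f} books predicted at 30 students")
print(f"{per_student:.3f} additional books per additional student")
```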

F. Since 30 students is within the range of the sampled number of students, it is appropriate to make this prediction. From Minitab the calculated prediction interval is (25.865078, 33.09856).  I am 95% confident that for a course that has 30 students registered the bookstore will sell between 25.9 and 33.1 books.

G. Since 30 students is within the range of the sampled number of students, it is appropriate to make this estimation. From Minitab the calculated confidence interval is (28.279491, 30.684145).  I am 95% confident that for all courses that have 30 students registered the bookstore will sell an average of between 28.3 and 30.7 books per semester.
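The intervals in parts F and G can be rebuilt from the reported standard errors. The Python sketch below assumes the usual formulas: the confidence interval uses s.e.(Pred. y), and the prediction interval adds the error variance to it. The critical value 2.228139 is an assumed t-table value (97.5th percentile with 10 degrees of freedom) and does not appear in the output itself.

```python
import math

y_hat = 29.481817    # predicted books at 30 students
se_fit = 0.5396101   # s.e.(Pred. y): standard error of the estimated mean
s = 1.5308939        # estimate of error standard deviation
T_CRIT = 2.228139    # assumed t-table value: 97.5th percentile, 10 df

# Confidence interval for the mean number of books (part G).
conf_int = (y_hat - T_CRIT * se_fit, y_hat + T_CRIT * se_fit)

# Prediction interval for a single course (part F) adds the error variance.
se_pred = math.sqrt(s**2 + se_fit**2)
pred_int = (y_hat - T_CRIT * se_pred, y_hat + T_CRIT * se_pred)

print(f"95% C.I.: ({conf_int[0]:.3f}, {conf_int[1]:.3f})")
print(f"95% P.I.: ({pred_int[0]:.3f}, {pred_int[1]:.3f})")
```

The prediction interval is wider because a single course varies around the mean, while the confidence interval only reflects uncertainty about where the mean itself lies.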

H. Since 5 students is not within the range of the sampled number of students, it is not appropriate to use the regression equation to make this prediction. We do not know if the straight-line model would fit data at this point, and we should not extrapolate.
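One way to enforce this rule in practice is to refuse predictions outside the sampled range. The following Python sketch is only an illustration under a stated assumption: the solution says every sampled class had more than 25 students, so 26 is used as a lower bound. The true smallest and largest sampled values are not given here, and the function name is hypothetical.

```python
B0 = 9.3         # intercept from the regression output
B1 = 0.6727273   # slope from the regression output

# Assumption: the solution only says every sampled class had more than 25
# students, so 26 is a lower bound on the smallest sampled value, not the
# actual minimum. The largest sampled value is not given, so it is not checked.
ASSUMED_MIN_STUDENTS = 26

def predict_if_in_range(students):
    """Return the fitted value, or None when the request would extrapolate."""
    if students < ASSUMED_MIN_STUDENTS:
        return None
    return B0 + B1 * students

for x in (30, 5):
    result = predict_if_in_range(x)
    if result is None:
        print(f"{x} students: outside the sampled range, no prediction")
    else:
        print(f"{x} students: about {result:.1f} books")
```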

 

COMMENTS ABOUT THE SOLUTION

Simple linear regression results:
Dependent Variable: Books
Independent Variable: Students
Books = 9.3 + 0.6727273 Students
Sample size: 12
R (correlation coefficient) = 0.8997
R-sq = 0.80946046
Estimate of error standard deviation: 1.5308939

Parameter estimates:

Parameter   Estimate    Std. Err.    DF   T-Stat      P-Value
Intercept   9.3         3.4345746    10   2.707759    0.022
Slope       0.6727273   0.10321285   10   6.5178633   <0.0001


Analysis of variance table for regression model:

Source   DF   SS          MS          F-stat      P-value
Model    1    99.56364    99.56364    42.482544   <0.0001
Error    10   23.436363   2.3436363
Total    11   123

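The entries in the two tables above are linked by simple arithmetic, which can serve as a check on the output. A brief Python sketch, using only numbers reported by StatCrunch (variable names are illustrative):

```python
import math

ss_model, df_model = 99.56364, 1
ss_error, df_error = 23.436363, 10

ms_model = ss_model / df_model
ms_error = ss_error / df_error        # mean squared error
f_stat = ms_model / ms_error
s = math.sqrt(ms_error)               # estimate of error standard deviation

slope, slope_se = 0.6727273, 0.10321285
t_slope = slope / slope_se            # T-Stat for the slope

print(f"F = {f_stat:.4f}, s = {s:.4f}, t^2 = {t_slope**2:.4f}")
```

With a single predictor, the F statistic equals the square of the slope's t statistic, so both tests report the same p-value.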

Predicted values:

X value   Pred. Y     s.e.(Pred. y)   95% C.I.                 95% P.I.
30        29.481817   0.5396101       (28.279491, 30.684145)   (25.865078, 33.09856)
