Applied Linear Regression Models, Fourth Edition
By Michael H. Kutner, Christopher J. Nachtsheim, and John Neter
Contents:
PART ONE
SIMPLE LINEAR REGRESSION 1
Chapter 1
Linear Regression with One Predictor Variable 2
1.1 Relations between Variables 2
Functional Relation between Two Variables 2
Statistical Relation between Two Variables 3
1.2 Regression Models and Their Uses 5
Historical Origins 5
Basic Concepts 5
Construction of Regression Models 7
Uses of Regression Analysis 8
Regression and Causality 8
Use of Computers 9
1.3 Simple Linear Regression Model with Distribution of Error Terms Unspecified 9
Formal Statement of Model 9
Important Features of Model 9
Meaning of Regression Parameters 11
Alternative Versions of Regression Model 12
1.4 Data for Regression Analysis 12
Observational Data 12
Experimental Data 13
Completely Randomized Design 13
1.5 Overview of Steps in Regression Analysis 13
1.6 Estimation of Regression Function 15
Method of Least Squares 15
Point Estimation of Mean Response 21
Residuals 22
Properties of Fitted Regression Line 23
1.7 Estimation of Error Terms Variance σ² 24
Point Estimator of σ² 24
1.8 Normal Error Regression Model 26
Model 26
Estimation of Parameters by Method of Maximum Likelihood 27
Cited References 33
Problems 33
Exercises 37
Projects 38
Chapter 2
Inferences in Regression and Correlation Analysis 40
2.1 Inferences Concerning β1 40
Sampling Distribution of b1 41
Sampling Distribution of (b1 − β1)/s{b1} 44
Confidence Interval for β1 45
Tests Concerning β1 47
2.2 Inferences Concerning β0 48
Sampling Distribution of b0 48
Sampling Distribution of (b0 − β0)/s{b0} 49
Confidence Interval for β0 49
2.3 Some Considerations on Making Inferences Concerning β0 and β1 50
Effects of Departures from Normality 50
Interpretation of Confidence Coefficient and Risks of Errors 50
Spacing of the X Levels 50
Power of Tests 50
2.4 Interval Estimation of E{Yh} 52
Sampling Distribution of Ŷh 52
Sampling Distribution of (Ŷh − E{Yh})/s{Ŷh} 54
Confidence Interval for E{Yh} 54
2.5 Prediction of New Observation 55
Prediction Interval for Yh(new) when Parameters Known 56
Prediction Interval for Yh(new) when Parameters Unknown 57
Prediction of Mean of m New Observations for Given Xh 60
2.6 Confidence Band for Regression Line 61
2.7 Analysis of Variance Approach to Regression Analysis 63
Partitioning of Total Sum of Squares 63
Breakdown of Degrees of Freedom 66
Mean Squares 66
Analysis of Variance Table 67
Expected Mean Squares 68
F Test of β1 = 0 versus β1 ≠ 0 69
2.8 General Linear Test Approach 72
Full Model 72
Reduced Model 72
Test Statistic 73
Summary 73
2.9 Descriptive Measures of Linear Association between X and Y 74
Coefficient of Determination 74
Limitations of R² 75
Coefficient of Correlation 76
2.10 Considerations in Applying Regression Analysis 77
2.11 Normal Correlation Models 78
Distinction between Regression and Correlation Model 78
Bivariate Normal Distribution 78
Conditional Inferences 80
Inferences on Correlation Coefficients 83
Spearman Rank Correlation Coefficient 87
Cited References 89
Problems 89
Exercises 97
Projects 98
Chapter 3
Diagnostics and Remedial Measures 100
3.1 Diagnostics for Predictor Variable 100
3.2 Residuals 102
Properties of Residuals 102
Semistudentized Residuals 103
Departures from Model to Be Studied by Residuals 103
3.3 Diagnostics for Residuals 103
Nonlinearity of Regression Function 104
Nonconstancy of Error Variance 107
Presence of Outliers 108
Nonindependence of Error Terms 108
Nonnormality of Error Terms 110
Omission of Important Predictor Variables 112
Some Final Comments 114
3.4 Overview of Tests Involving Residuals 114
Tests for Randomness 114
Tests for Constancy of Variance 115
Tests for Outliers 115
Tests for Normality 115
3.5 Correlation Test for Normality 115
3.6 Tests for Constancy of Error Variance 116
Brown-Forsythe Test 116
Breusch-Pagan Test 118
3.7 F Test for Lack of Fit 119
Assumptions 119
Notation 121
Full Model 121
Reduced Model 123
Test Statistic 123
ANOVA Table 124
3.8 Overview of Remedial Measures 127
Nonlinearity of Regression Function 128
Nonconstancy of Error Variance 128
Nonindependence of Error Terms 128
Nonnormality of Error Terms 128
Omission of Important Predictor Variables 129
Outlying Observations 129
3.9 Transformations 129
Transformations for Nonlinear Relation Only 129
Transformations for Nonnormality and Unequal Error Variances 132
Box-Cox Transformations 134
3.10 Exploration of Shape of Regression Function 137
Lowess Method 138
Use of Smoothed Curves to Confirm Fitted Regression Function 139
3.11 Case Example—Plutonium Measurement 141
Cited References 146
Problems 146
Exercises 151
Projects 152
Case Studies 153
Chapter 4
Simultaneous Inferences and Other Topics in Regression Analysis 154
4.1 Joint Estimation of β0 and β1 154
Need for Joint Estimation 154
Bonferroni Joint Confidence Intervals 155
4.2 Simultaneous Estimation of Mean Responses 157
Working-Hotelling Procedure 158
Bonferroni Procedure 159
4.3 Simultaneous Prediction Intervals for New Observations 160
4.4 Regression through Origin 161
Model 161
Inferences 161
Important Cautions for Using Regression through Origin 164
4.5 Effects of Measurement Errors 165
Measurement Errors in Y 165
Measurement Errors in X 165
Berkson Model 167
4.6 Inverse Predictions 168
4.7 Choice of X Levels 170
Cited References 172
Problems 172
Exercises 175
Projects 175
Chapter 5
Matrix Approach to Simple Linear Regression Analysis 176
5.1 Matrices 176
Definition of Matrix 176
Square Matrix 178
Vector 178
Transpose 178
Equality of Matrices 179
5.2 Matrix Addition and Subtraction 180
5.3 Matrix Multiplication 182
Multiplication of a Matrix by a Scalar 182
Multiplication of a Matrix by a Matrix 182
5.4 Special Types of Matrices 185
Symmetric Matrix 185
Diagonal Matrix 185
Vector and Matrix with All Elements Unity 187
Zero Vector 187
5.5 Linear Dependence and Rank of Matrix 188
Linear Dependence 188
Rank of Matrix 188
5.6 Inverse of a Matrix 189
Finding the Inverse 190
Uses of Inverse Matrix 192
5.7 Some Basic Results for Matrices 193
5.8 Random Vectors and Matrices 193
Expectation of Random Vector or Matrix 193
Variance-Covariance Matrix of Random Vector 194
Some Basic Results 196
Multivariate Normal Distribution 196
5.9 Simple Linear Regression Model in Matrix Terms 197
5.10 Least Squares Estimation of Regression Parameters 199
Normal Equations 199
Estimated Regression Coefficients 200
5.11 Fitted Values and Residuals 202
Fitted Values 202
Residuals 203
5.12 Analysis of Variance Results 204
Sums of Squares 204
Sums of Squares as Quadratic Forms 205
5.13 Inferences in Regression Analysis 206
Regression Coefficients 207
Mean Response 208
Prediction of New Observation 209
Cited Reference 209
Problems 209
Exercises 212
PART TWO
MULTIPLE LINEAR REGRESSION 213
Chapter 6
Multiple Regression I 214
6.1 Multiple Regression Models 214
Need for Several Predictor Variables 214
First-Order Model with Two Predictor Variables 215
First-Order Model with More than Two Predictor Variables 217
General Linear Regression Model 217
6.2 General Linear Regression Model in Matrix Terms 222
6.3 Estimation of Regression Coefficients 223
6.4 Fitted Values and Residuals 224
6.5 Analysis of Variance Results 225
Sums of Squares and Mean Squares 225
F Test for Regression Relation 226
Coefficient of Multiple Determination 226
Coefficient of Multiple Correlation 227
6.6 Inferences about Regression Parameters 227
Interval Estimation of βk 228
Tests for βk 228
Joint Inferences 228
6.7 Estimation of Mean Response and Prediction of New Observation 229
Interval Estimation of E{Yh} 229
Confidence Region for Regression Surface 229
Simultaneous Confidence Intervals for Several Mean Responses 230
Prediction of New Observation Yh(new) 230
Prediction of Mean of m New Observations at Xh 230
Predictions of g New Observations 231
Caution about Hidden Extrapolations 231
6.8 Diagnostics and Remedial Measures 232
Scatter Plot Matrix 232
Three-Dimensional Scatter Plots 233
Residual Plots 233
Correlation Test for Normality 234
Brown-Forsythe Test for Constancy of Error Variance 234
Breusch-Pagan Test for Constancy of Error Variance 234
F Test for Lack of Fit 235
Remedial Measures 236
6.9 An Example—Multiple Regression with Two Predictor Variables 236
Setting 236
Basic Calculations 237
Estimated Regression Function 240
Fitted Values and Residuals 241
Analysis of Appropriateness of Model 241
Analysis of Variance 243
Estimation of Regression Parameters 245
Estimation of Mean Response 245
Prediction Limits for New Observations 247
Cited Reference 248
Problems 248
Exercises 253
Projects 254
Chapter 7
Multiple Regression II 256
7.1 Extra Sums of Squares 256
Basic Ideas 256
Definitions 259
Decomposition of SSR into Extra Sums of Squares 260
ANOVA Table Containing Decomposition of SSR 261
7.2 Uses of Extra Sums of Squares in Tests for Regression Coefficients 263
Test whether a Single βk = 0 263
Test whether Several βk = 0 264
7.3 Summary of Tests Concerning Regression Coefficients 266
Test whether All βk = 0 266
Test whether a Single βk = 0 267
Test whether Some βk = 0 267
Other Tests 268
7.4 Coefficients of Partial Determination 268
Two Predictor Variables 269
General Case 269
Coefficients of Partial Correlation 270
7.5 Standardized Multiple Regression Model 271
Roundoff Errors in Normal Equations Calculations 271
Lack of Comparability in Regression Coefficients 272
Correlation Transformation 272
Standardized Regression Model 273
X′X Matrix for Transformed Variables 274
Estimated Standardized Regression Coefficients 275
7.6 Multicollinearity and Its Effects 278
Uncorrelated Predictor Variables 279
Nature of Problem when Predictor Variables Are Perfectly Correlated 281
Effects of Multicollinearity 283
Need for More Powerful Diagnostics for Multicollinearity 289
Cited Reference 289
Problems 289
Exercise 292
Projects 293
Chapter 8
Regression Models for Quantitative and Qualitative Predictors 294
8.1 Polynomial Regression Models 294
Uses of Polynomial Models 294
One Predictor Variable—Second Order 295
One Predictor Variable—Third Order 296
One Predictor Variable—Higher Orders 296
Two Predictor Variables—Second Order 297
Three Predictor Variables—Second Order 298
Implementation of Polynomial Regression Models 298
Case Example 300
Some Further Comments on Polynomial Regression 305
8.2 Interaction Regression Models 306
Interaction Effects 306
Interpretation of Interaction Regression Models with Linear Effects 306
Interpretation of Interaction Regression Models with Curvilinear Effects 309
Implementation of Interaction Regression Models 311
8.3 Qualitative Predictors 313
Qualitative Predictor with Two Classes 314
Interpretation of Regression Coefficients 315
Qualitative Predictor with More than Two Classes 318
Time Series Applications 319
8.4 Some Considerations in Using Indicator Variables 321
Indicator Variables versus Allocated Codes 321
Indicator Variables versus Quantitative Variables 322
Other Codings for Indicator Variables 323
8.5 Modeling Interactions between Quantitative and Qualitative Predictors 324
Meaning of Regression Coefficients 324
8.6 More Complex Models 327
More than One Qualitative Predictor Variable 328
Qualitative Predictor Variables Only 329
8.7 Comparison of Two or More Regression Functions 329
Soap Production Lines Example 330
Instrument Calibration Study Example 334
Cited Reference 335
Problems 335
Exercises 340
Projects 341
Case Study 342
Chapter 9
Building the Regression Model I: Model Selection and Validation 343
9.1 Overview of Model-Building Process 343
Data Collection 343
Data Preparation 346
Preliminary Model Investigation 346
Reduction of Explanatory Variables 347
Model Refinement and Selection 349
Model Validation 350
9.2 Surgical Unit Example 350
9.3 Criteria for Model Selection 353
R²p or SSEp Criterion 354
R²a,p or MSEp Criterion 355
Mallows’ Cp Criterion 357
AICp and SBCp Criteria 359
PRESSp Criterion 360
9.4 Automatic Search Procedures for Model Selection 361
“Best” Subsets Algorithm 361
Stepwise Regression Methods 364
Forward Stepwise Regression 364
Other Stepwise Procedures 367
9.5 Some Final Comments on Automatic Model Selection Procedures 368
9.6 Model Validation 369
Collection of New Data to Check Model 370
Comparison with Theory, Empirical Evidence, or Simulation Results 371
Data Splitting 372
Cited References 375
Problems 376
Exercise 380
Projects 381
Case Studies 382
Chapter 10
Building the Regression Model II: Diagnostics 384
10.1 Model Adequacy for a Predictor Variable—Added-Variable Plots 384
10.2 Identifying Outlying Y Observations—Studentized Deleted Residuals 390
Outlying Cases 390
Residuals and Semistudentized Residuals 392
Hat Matrix 392
Studentized Residuals 394
Deleted Residuals 395
Studentized Deleted Residuals 396
10.3 Identifying Outlying X Observations—Hat Matrix Leverage Values 398
Use of Hat Matrix for Identifying Outlying X Observations 398
Use of Hat Matrix to Identify Hidden Extrapolation 400
10.4 Identifying Influential Cases—DFFITS, Cook’s Distance, and DFBETAS Measures 400
Influence on Single Fitted Value—DFFITS 401
Influence on All Fitted Values—Cook’s Distance 402
Influence on the Regression Coefficients—DFBETAS 404
Influence on Inferences 405
Some Final Comments 406
10.5 Multicollinearity Diagnostics—Variance Inflation Factor 406
Informal Diagnostics 407
Variance Inflation Factor 408
10.6 Surgical Unit Example—Continued 410
Cited References 414
Problems 414
Exercises 419
Projects 419
Case Studies 420
Chapter 11
Building the Regression Model III: Remedial Measures 421
11.1 Unequal Error Variances Remedial Measures—Weighted Least Squares 421
Error Variances Known 422
Error Variances Known up to Proportionality Constant 424
Error Variances Unknown 424
11.2 Multicollinearity Remedial Measures—Ridge Regression 431
Some Remedial Measures 431
Ridge Regression 432
11.3 Remedial Measures for Influential Cases—Robust Regression 437
Robust Regression 438
IRLS Robust Regression 439
11.4 Nonparametric Regression: Lowess Method and Regression Trees 449
Lowess Method 449
Regression Trees 453
11.5 Remedial Measures for Evaluating Precision in Nonstandard Situations—Bootstrapping 458
General Procedure 459
Bootstrap Sampling 459
Bootstrap Confidence Intervals 460
11.6 Case Example—MNDOT Traffic Estimation 464
The AADT Database 464
Model Development 465
Weighted Least Squares Estimation 468
Cited References 471
Problems 472
Exercises 476
Projects 476
Case Studies 480
Chapter 12
Autocorrelation in Time Series Data 481
12.1 Problems of Autocorrelation 481
12.2 First-Order Autoregressive Error Model 484
Simple Linear Regression 484
Multiple Regression 484
Properties of Error Terms 485
12.3 Durbin-Watson Test for Autocorrelation 487
12.4 Remedial Measures for Autocorrelation 490
Addition of Predictor Variables 490
Use of Transformed Variables 490
Cochrane-Orcutt Procedure 492
Hildreth-Lu Procedure 495
First Differences Procedure 496
Comparison of Three Methods 498
12.5 Forecasting with Autocorrelated Error Terms 499
Cited References 502
Problems 502
Exercises 507
Projects 508
Case Studies 508
PART THREE
NONLINEAR REGRESSION 509
Chapter 13
Introduction to Nonlinear Regression and Neural Networks 510
13.1 Linear and Nonlinear Regression Models 510
Linear Regression Models 510
Nonlinear Regression Models 511
Estimation of Regression Parameters 514
13.2 Least Squares Estimation in Nonlinear Regression 515
Solution of Normal Equations 517
Direct Numerical Search—Gauss-Newton Method 518
Other Direct Search Procedures 525
13.3 Model Building and Diagnostics 526
13.4 Inferences about Nonlinear Regression Parameters 527
Estimate of Error Term Variance 527
Large-Sample Theory 528
When Is Large-Sample Theory Applicable? 528
Interval Estimation of a Single γk 531
Simultaneous Interval Estimation of Several γk 532
Test Concerning a Single γk 532
Test Concerning Several γk 533
13.5 Learning Curve Example 533
13.6 Introduction to Neural Network Modeling 537
Neural Network Model 537
Network Representation 540
Neural Network as Generalization of Linear Regression 541
Parameter Estimation: Penalized Least Squares 542
Example: Ischemic Heart Disease 543
Model Interpretation and Prediction 546
Some Final Comments on Neural Network Modeling 547
Cited References 547
Problems 548
Exercises 552
Projects 552
Case Studies 554
Chapter 14
Logistic Regression, Poisson Regression, and Generalized Linear Models 555
14.1 Regression Models with Binary Response Variable 555
Meaning of Response Function when Outcome Variable Is Binary 556
Special Problems when Response Variable Is Binary 557
14.2 Sigmoidal Response Functions for Binary Responses 559
Probit Mean Response Function 559
Logistic Mean Response Function 560
Complementary Log-Log Response Function 562
14.3 Simple Logistic Regression 563
Simple Logistic Regression Model 563
Maximum Likelihood Estimation 564
Interpretation of b1 567
Use of Probit and Complementary Log-Log Response Functions 568
Repeat Observations—Binomial Outcomes 568
14.4 Multiple Logistic Regression 570
Multiple Logistic Regression Model 570
Fitting of Model 571
Polynomial Logistic Regression 575
14.5 Inferences about Regression Parameters 577
Test Concerning a Single βk: Wald Test 578
Interval Estimation of a Single βk 579
Test whether Several βk = 0: Likelihood Ratio Test 580
14.6 Automatic Model Selection Methods 582
Model Selection Criteria 582
Best Subsets Procedures 583
Stepwise Model Selection 583
14.7 Tests for Goodness of Fit 586
Pearson Chi-Square Goodness of Fit Test 586
Deviance Goodness of Fit Test 588
Hosmer-Lemeshow Goodness of Fit Test 589
14.8 Logistic Regression Diagnostics 591
Logistic Regression Residuals 591
Diagnostic Residual Plots 594
Detection of Influential Observations 598
14.9 Inferences about Mean Response 602
Point Estimator 602
Interval Estimation 602
Simultaneous Confidence Intervals for Several Mean Responses 603
14.10 Prediction of a New Observation 604
Choice of Prediction Rule 604
Validation of Prediction Error Rate 607
14.11 Polytomous Logistic Regression for Nominal Response 608
Pregnancy Duration Data with Polytomous Response 609
J − 1 Baseline-Category Logits for Nominal Response 610
Maximum Likelihood Estimation 612
14.12 Polytomous Logistic Regression for Ordinal Response 614
14.13 Poisson Regression 618
Poisson Distribution 618
Poisson Regression Model 619
Maximum Likelihood Estimation 620
Model Development 620
Inferences 621
14.14 Generalized Linear Models 623
Cited References 624
Problems 625
Exercises 634
Projects 635
Case Studies 640
Appendix A
Some Basic Results in Probability and Statistics 641
Appendix B
Tables 659
Appendix C
Data Sets 677
Appendix D
Selected Bibliography 687
Index 695