Business Statistics: Communicating with Numbers, Fourth Edition
Sanjiv Jaggia and Alison Kelly
CONTENTS
PART ONE
Introduction
CHAPTER 1
DATA AND DATA PREPARATION 2
1.1 Types of Data 4
Sample and Population Data 4
Cross-Sectional and Time Series Data 5
Structured and Unstructured Data 6
Big Data 7
Data on the Web 9
1.2 Variables and Scales of Measurement 10
The Measurement Scales 11
1.3 Data Preparation 15
Counting and Sorting 15
A Note on Handling Missing Values 20
Subsetting 20
A Note on Subsetting Based on Data Ranges 23
1.4 Writing with Data 26
Conceptual Review 28
Additional Exercises 29
PART TWO
Descriptive Statistics
CHAPTER 2
TABULAR AND GRAPHICAL METHODS 32
2.1 Methods to Visualize a Categorical Variable 34
A Frequency Distribution for a Categorical Variable 34
A Bar Chart 35
A Pie Chart 35
Cautionary Comments When Constructing or Interpreting
Charts or Graphs 39
2.2 Methods to Visualize the Relationship Between
Two Categorical Variables 42
A Contingency Table 42
A Stacked Column Chart 43
2.3 Methods to Visualize a Numerical Variable 48
A Frequency Distribution for a Numerical Variable 48
A Histogram 51
A Polygon 55
An Ogive 56
Using Excel and R to Construct a Polygon and an Ogive 57
2.4 More Data Visualization Methods 61
A Scatterplot 61
A Scatterplot with a Categorical Variable 63
A Line Chart 65
2.5 A Stem-and-Leaf Diagram 68
2.6 Writing with Data 70
Conceptual Review 72
Additional Exercises 73
Appendix 2.1: Guidelines for Other Software Packages 76
CHAPTER 3
NUMERICAL DESCRIPTIVE MEASURES 80
3.1 Measures of Central Location 82
The Mean 82
The Median 84
The Mode 84
Using Excel and R to Calculate Measures of
Central Location 85
Note on Symmetry 88
Subsetted Means 89
The Weighted Mean 90
3.2 Percentiles and Boxplots 93
A Percentile 93
A Boxplot 94
3.3 The Geometric Mean 97
The Geometric Mean Return 97
Arithmetic Mean versus Geometric Mean 98
The Average Growth Rate 99
3.4 Measures of Dispersion 101
The Range 101
The Mean Absolute Deviation 102
The Variance and the Standard Deviation 103
The Coefficient of Variation 105
3.5 Mean-Variance Analysis and the Sharpe Ratio 107
3.6 Analysis of Relative Location 109
Chebyshev’s Theorem 109
The Empirical Rule 110
z-Scores 111
3.7 Measures of Association 114
3.8 Writing with Data 117
Conceptual Review 119
Additional Exercises 121
Appendix 3.1: Guidelines for Other Software Packages 123
PART THREE
Probability and Probability Distributions
CHAPTER 4
INTRODUCTION TO PROBABILITY 124
4.1 Fundamental Probability Concepts 126
Events 126
Assigning Probabilities 129
4.2 Rules of Probability 133
4.3 Contingency Tables and Probabilities 139
A Note on Independence with Empirical Probabilities 141
4.4 The Total Probability Rule and Bayes’ Theorem 144
The Total Probability Rule and Bayes’ Theorem 144
Extensions of the Total Probability Rule
and Bayes’ Theorem 146
4.5 Counting Rules 149
4.6 Writing with Data 151
Conceptual Review 154
Additional Exercises 155
CHAPTER 5
DISCRETE PROBABILITY
DISTRIBUTIONS 160
5.1 Random Variables and Discrete Probability
Distributions 162
The Discrete Probability Distribution 162
5.2 Expected Value, Variance, and Standard
Deviation 166
Summary Measures 167
Risk Neutrality and Risk Aversion 168
5.3 Portfolio Returns 171
Properties of Random Variables 171
Summary Measures for a Portfolio 172
5.4 The Binomial Distribution 175
Using Excel and R to Obtain Binomial Probabilities 180
5.5 The Poisson Distribution 183
Using Excel and R to Obtain Poisson Probabilities 186
5.6 The Hypergeometric Distribution 189
Using Excel and R to Obtain Hypergeometric
Probabilities 191
5.7 Writing with Data 193
Case Study 193
Conceptual Review 195
Additional Exercises 196
Appendix 5.1: Guidelines for Other Software
Packages 198
CHAPTER 6
CONTINUOUS PROBABILITY
DISTRIBUTIONS 200
6.1 Continuous Random Variables and the Uniform
Distribution 202
The Continuous Uniform Distribution 202
6.2 The Normal Distribution 206
Characteristics of the Normal Distribution 206
The Standard Normal Distribution 207
Finding a Probability for a Given z Value 208
Finding a z Value for a Given Probability 210
The Transformation of Normal Random Variables 212
Using R for the Normal Distribution 216
A Note on the Normal Approximation
of the Binomial Distribution 217
6.3 Other Continuous Probability Distributions 221
The Exponential Distribution 221
Using R for the Exponential Distribution 224
The Lognormal Distribution 224
Using R for the Lognormal Distribution 226
6.4 Writing with Data 229
Conceptual Review 231
Additional Exercises 232
Appendix 6.1: Guidelines for Other Software
Packages 235
PART FOUR
Basic Inference
CHAPTER 7
SAMPLING AND SAMPLING
DISTRIBUTIONS 238
7.1 Sampling 240
Classic Case of a “Bad” Sample: The Literary Digest
Debacle of 1936 240
Trump’s Stunning Victory in 2016 241
Sampling Methods 242
Using Excel and R to Generate a Simple Random Sample 244
7.2 The Sampling Distribution of the Sample Mean 245
The Expected Value and the Standard Error
of the Sample Mean 246
Sampling from a Normal Population 247
The Central Limit Theorem 248
7.3 The Sampling Distribution of the Sample
Proportion 252
The Expected Value and the Standard Error
of the Sample Proportion 252
7.4 The Finite Population Correction Factor 257
7.5 Statistical Quality Control 259
Control Charts 260
Using Excel and R to Create a Control Chart 263
7.6 Writing with Data 267
Conceptual Review 269
Additional Exercises 271
Appendix 7.1: Derivation of the Mean and the
Variance for X̄ and P̄ 273
Appendix 7.2: Properties of Point Estimators 274
Appendix 7.3: Guidelines for Other Software
Packages 275
CHAPTER 8
INTERVAL ESTIMATION 278
8.1 Confidence Interval for the Population Mean
When σ Is Known 280
Constructing a Confidence Interval for μ When σ Is
Known 281
The Width of a Confidence Interval 283
Using Excel and R to Construct a Confidence
Interval for μ When σ Is Known 285
8.2 Confidence Interval for the Population Mean
When σ Is Unknown 288
The t Distribution 288
Summary of the t_df Distribution 289
Locating t_df Values and Probabilities 289
Constructing a Confidence Interval for μ
When σ Is Unknown 291
Using Excel and R to Construct a Confidence
Interval for μ When σ Is Unknown 292
8.3 Confidence Interval for the Population
Proportion 295
8.4 Selecting the Required Sample Size 298
Selecting n to Estimate μ 299
Selecting n to Estimate p 299
8.5 Writing with Data 302
Conceptual Review 303
Additional Exercises 304
Appendix 8.1: Guidelines for Other Software
Packages 307
CHAPTER 9
HYPOTHESIS TESTING 310
9.1 Introduction to Hypothesis Testing 312
The Decision to “Reject” or “Not Reject”
the Null Hypothesis 312
Defining the Null and the Alternative Hypotheses 313
Type I and Type II Errors 315
9.2 Hypothesis Test for the Population Mean
When σ Is Known 318
The p-Value Approach 318
Confidence Intervals and Two-Tailed Hypothesis Tests 322
One Last Remark 323
9.3 Hypothesis Test for the Population Mean
When σ Is Unknown 325
Using Excel and R to Test μ When σ Is Unknown 326
9.4 Hypothesis Test for the Population
Proportion 330
9.5 Writing with Data 334
Conceptual Review 336
Additional Exercises 337
Appendix 9.1: The Critical Value Approach 339
Appendix 9.2: Guidelines for Other Software
Packages 342
CHAPTER 10
STATISTICAL INFERENCE CONCERNING
TWO POPULATIONS 344
10.1 Inference Concerning the Difference Between Two
Means 346
Confidence Interval for μ1 − μ2 346
Hypothesis Test for μ1 − μ2 348
Using Excel and R for Testing Hypotheses about μ1 − μ2 350
A Note on the Assumption of Normality 353
10.2 Inference Concerning Mean Differences 357
Recognizing a Matched-Pairs Experiment 357
Confidence Interval for μD 358
Hypothesis Test for μD 358
Using Excel and R for Testing Hypotheses about μD 361
One Last Note on the Matched-Pairs Experiment 362
10.3 Inference Concerning the Difference Between Two
Proportions 366
Confidence Interval for p1 − p2 366
Hypothesis Test for p1 − p2 367
10.4 Writing with Data 372
Conceptual Review 374
Additional Exercises 375
Appendix 10.1: Guidelines for Other Software
Packages 377
CHAPTER 11
STATISTICAL INFERENCE
CONCERNING VARIANCE 380
11.1 Inference Concerning
the Population Variance 382
Sampling Distribution of S² 382
Finding χ²_df Values and Probabilities 383
Confidence Interval for the Population Variance 385
Hypothesis Test for the Population Variance 386
Note on Calculating the p-Value for a Two-Tailed Test
Concerning σ² 387
Using Excel and R to Test σ² 387
11.2 Inference Concerning the Ratio of Two Population
Variances 391
Sampling Distribution of S₁² ∕ S₂² 391
Finding F(df₁, df₂) Values and Probabilities 392
Confidence Interval for the Ratio of Two Population
Variances 394
Hypothesis Test for the Ratio of Two Population
Variances 395
Using Excel and R to Test σ₁² ∕ σ₂² 397
11.3 Writing with Data 401
Conceptual Review 403
Additional Exercises 403
Appendix 11.1: Guidelines for Other Software
Packages 405
CHAPTER 12
CHI-SQUARE TESTS 408
12.1 Goodness-of-Fit Test for
a Multinomial Experiment 410
Using R to Conduct a Goodness-of-Fit Test 414
12.2 Chi-Square Test for Independence 416
Calculating Expected Frequencies 417
Using R to Conduct a Test for Independence 421
12.3 Chi-Square Tests for Normality 423
The Goodness-of-Fit Test for Normality 423
The Jarque-Bera Test 426
12.4 Writing with Data 429
Conceptual Review 431
Additional Exercises 432
Appendix 12.1: Guidelines for Other Software
Packages 435
PART FIVE
Advanced Inference
CHAPTER 13
ANALYSIS OF VARIANCE 438
13.1 One-Way ANOVA Test 440
Between-Treatments Estimate of σ²: MSTR 441
Within-Treatments Estimate of σ²: MSE 442
The One-Way ANOVA Table 444
Using Excel and R to Construct a One-Way ANOVA
Table 444
13.2 Multiple Comparison Methods 449
Fisher’s Least Significant Difference (LSD) Method 449
Tukey’s Honestly Significant Difference (HSD) Method 450
Using R to Construct Tukey Confidence Intervals
for μ1 − μ2 452
13.3 Two-Way ANOVA Test: No Interaction 456
The Sum of Squares for Factor A, SSA 458
The Sum of Squares for Factor B, SSB 459
The Error Sum of Squares, SSE 459
Using Excel and R for a Two-Way ANOVA Test—No
Interaction 460
13.4 Two-Way ANOVA Test: With Interaction 465
The Total Sum of Squares, SST 466
The Sum of Squares for Factor A, SSA, and the Sum of
Squares for Factor B, SSB 466
The Sum of Squares for the Interaction of Factor A and
Factor B, SSAB 467
The Error Sum of Squares, SSE 468
Using Excel and R for a Two-Way ANOVA Test—With
Interaction 468
13.5 Writing with Data 472
Conceptual Review 474
Additional Exercises 475
Appendix 13.1: Guidelines for Other Software Packages 479
CHAPTER 14
REGRESSION ANALYSIS 482
14.1 Hypothesis Test for the Correlation Coefficient 484
Testing the Correlation Coefficient ρxy 485
Using Excel and R to Conduct a Hypothesis Test for ρxy 485
14.2 The Linear Regression Model 488
The Simple Linear Regression Model 489
The Multiple Linear Regression Model 493
Using Excel and R to Estimate a Linear Regression
Model 494
14.3 Goodness-of-Fit Measures 500
The Standard Error of the Estimate 501
The Coefficient of Determination, R² 502
The Adjusted R² 504
A Cautionary Note Concerning Goodness-of-Fit Measures 505
14.4 Writing with Data 507
Conceptual Review 509
Additional Exercises 510
Appendix 14.1: Guidelines for Other Software Packages 512
CHAPTER 15
INFERENCE WITH REGRESSION MODELS 514
15.1 Tests of Significance 516
Test of Joint Significance 516
Test of Individual Significance 518
Using a Confidence Interval to Determine Individual
Significance 520
A Test for a Nonzero Slope Coefficient 521
Reporting Regression Results 523
15.2 A General Test of Linear Restrictions 527
Using R to Conduct Partial F Tests 530
15.3 Interval Estimates for the Response Variable 532
Using R to Find Interval Estimates for the Response
Variable 535
15.4 Model Assumptions and Common Violations 537
Residual Plots 537
Assumption 1. 538
Detecting Nonlinearities 538
Remedy 539
Assumption 2. 539
Detecting Multicollinearity 540
Remedy 541
Assumption 3. 541
Detecting Changing Variability 541
Remedy 542
Assumption 4. 543
Detecting Correlated Observations 543
Remedy 544
Assumption 5. 544
Remedy 544
Assumption 6. 545
Summary of Regression Modeling 545
Using Excel and R for Residual Plots, and R for Robust
Standard Errors 545
15.5 Writing with Data 548
Conceptual Review 550
Additional Exercises 551
Appendix 15.1: Guidelines for Other Software Packages 553
CHAPTER 16
REGRESSION MODELS FOR
NONLINEAR RELATIONSHIPS 556
16.1 Polynomial Regression Models 558
The Quadratic Regression Model 558
Using R to Estimate a Quadratic Regression Model 563
The Cubic Regression Model 564
16.2 Regression Models with Logarithms 567
A Log-Log Model 568
The Logarithmic Model 570
The Exponential Model 571
Using R to Estimate Log-Transformed Models 575
Comparing Linear and Log-Transformed Models 575
Using Excel and R to Compare Linear and
Log-Transformed Models 576
A Cautionary Note Concerning Goodness-of-Fit
Measures 577
16.3 Writing with Data 581
Conceptual Review 583
Additional Exercises 583
Appendix 16.1: Guidelines for Other Software Packages 585
CHAPTER 17
REGRESSION MODELS WITH
DUMMY VARIABLES 588
17.1 Dummy Variables 590
A Categorical Explanatory Variable with Two Categories 590
Using Excel and R to Make Dummy Variables 592
Assessing Dummy Variable Models 592
A Categorical Explanatory Variable with Multiple
Categories 593
17.2 Interactions with Dummy Variables 599
Using R to Estimate a Regression Model with a
Dummy Variable and an Interaction Variable 602
17.3 The Linear Probability Model and the Logistic
Regression Models 605
The Linear Probability Model 605
The Logistic Regression Model 606
Using R to Estimate a Logistic Regression Model 609
Accuracy of Binary Choice Models 609
Using R to Find the Accuracy Rate 611
17.4 Writing with Data 613
Conceptual Review 616
Additional Exercises 617
Appendix 17.1: Guidelines for Other Software Packages 620
PART SIX
Supplementary Topics
CHAPTER 18
FORECASTING WITH TIME SERIES DATA 622
18.1 The Forecasting Process for Time Series 624
Forecasting Methods 625
Model Selection Criteria 625
18.2 Simple Smoothing Techniques 626
The Moving Average Technique 627
The Simple Exponential Smoothing Technique 629
Using R for Exponential Smoothing 631
18.3 Linear Regression Models for Trend and
Seasonality 633
The Linear Trend Model 633
The Linear Trend Model with Seasonality 635
Estimating a Linear Trend Model with Seasonality
with R 637
A Note on Causal Models for Forecasting 637
18.4 Nonlinear Regression Models for Trend and
Seasonality 639
The Exponential Trend Model 639
Using R to Forecast with an Exponential Trend Model 641
The Polynomial Trend Model 642
Nonlinear Trend Models with Seasonality 643
Using R to Forecast a Quadratic Trend Model with
Seasons 645
18.5 Causal Forecasting Methods 647
Lagged Regression Models 648
Using R to Estimate Lagged Regression Models 650
18.6 Writing with Data 651
Conceptual Review 653
Additional Exercises 655
Appendix 18.1: Guidelines for Other Software Packages 656
CHAPTER 19
RETURNS, INDEX NUMBERS,
AND INFLATION 658
19.1 Investment Return 660
The Adjusted Closing Price 661
Nominal versus Real Rates of Return 662
19.2 Index Numbers 664
A Simple Price Index 664
An Unweighted Aggregate Price Index 666
A Weighted Aggregate Price Index 667
19.3 Using Price Indices to Deflate a Time Series 672
Inflation Rate 674
19.4 Writing with Data 676
Conceptual Review 678
Additional Exercises 679
CHAPTER 20
NONPARAMETRIC TESTS 682
20.1 Testing a Population Median 684
The Wilcoxon Signed-Rank Test for a Population
Median 684
Using a Normal Distribution Approximation for T 687
Using R to Test a Population Median 688
20.2 Testing Two Population Medians 690
The Wilcoxon Signed-Rank Test for a Matched-Pairs
Sample 690
Using R to Test for Median Differences from a Matched-
Pairs Sample 691
The Wilcoxon Rank-Sum Test for Independent Samples 691
Using R to Test for Median Differences from
Independent Samples 694
Using a Normal Distribution Approximation for W 694
20.3 Testing Three or More Population Medians 697
The Kruskal-Wallis Test for Population Medians 697
Using R to Conduct a Kruskal-Wallis Test 699
20.4 The Spearman Rank Correlation Test 700
Using R to Conduct the Spearman Rank Correlation
Test 702
Summary of Parametric and Nonparametric Tests 703
20.5 The Sign Test 705
20.6 Tests Based on Runs 709
The Method of Runs Above and Below the Median 710
Using R to Conduct the Runs Test 711
20.7 Writing with Data 713
Conceptual Review 715
Additional Exercises 717
Appendix 20.1: Guidelines for Other Software
Packages 718
APPENDIXES
APPENDIX A Getting Started with R 721
APPENDIX B Tables 727
APPENDIX C Answers to Selected Even-
Numbered Exercises 739
Glossary 755
Index 763