Business Analytics, Fifth Edition
Jeffrey D. Camm, Michael J. Fry, James J. Cochran & Jeffrey W. Ohlmann
Contents
About the Authors xix
Preface xxi
Chapter 1 Introduction to Business Analytics 1
1.1 Decision Making 3
1.2 Business Analytics Defined 3
1.3 A Categorization of Analytical Methods and Models 4
Descriptive Analytics 4
Predictive Analytics 5
Prescriptive Analytics 5
1.4 Big Data, the Cloud, and Artificial Intelligence 6
Volume 6
Velocity 6
Variety 7
Veracity 7
1.5 Business Analytics in Practice 9
Accounting Analytics 9
Financial Analytics 10
Human Resource (HR) Analytics 10
Marketing Analytics 10
Health Care Analytics 11
Supply Chain Analytics 11
Analytics for Government and Nonprofits 11
Sports Analytics 12
Web Analytics 12
1.6 Legal and Ethical Issues in the Use of Data and Analytics 12
Summary 15
Glossary 15
Problems 16
Available in the Cengage eBook:
Appendix: Getting Started with R and RStudio
Appendix: Basic Data Manipulation with R
Chapter 2 Descriptive Statistics 21
2.1 Overview of Using Data: Definitions and Goals 23
2.2 Types of Data 24
Population and Sample Data 24
Quantitative and Categorical Data 24
Cross-Sectional and Time Series Data 24
Sources of Data 25
2.3 Exploring Data in Excel 27
Sorting and Filtering Data in Excel 27
Conditional Formatting of Data in Excel 30
2.4 Creating Distributions from Data 32
Frequency Distributions for Categorical Data 32
Relative Frequency and Percent Frequency Distributions 33
Frequency Distributions for Quantitative Data 35
Histograms 37
Frequency Polygons 42
Cumulative Distributions 43
2.5 Measures of Location 46
Mean (Arithmetic Mean) 46
Median 47
Mode 48
Geometric Mean 48
2.6 Measures of Variability 51
Range 51
Variance 52
Standard Deviation 53
Coefficient of Variation 54
2.7 Analyzing Distributions 54
Percentiles 55
Quartiles 56
z-Scores 56
Empirical Rule 57
Identifying Outliers 59
Boxplots 59
2.8 Measures of Association Between Two Variables 62
Scatter Charts 62
Covariance 64
Correlation Coefficient 67
Summary 68
Glossary 69
Problems 70
Case Problem 1: Heavenly Chocolates Web Site Transactions 80
Case Problem 2: African Elephant Populations 81
Available in the Cengage eBook:
Appendix: Descriptive Statistics with R
Chapter 3 Data Visualization 83
3.1 Overview of Data Visualization 86
Preattentive Attributes 86
Data-Ink Ratio 89
3.2 Tables 92
Table Design Principles 93
Crosstabulation 94
PivotTables in Excel 97
3.3 Charts 101
Scatter Charts 101
Recommended Charts in Excel 103
Line Charts 104
Bar Charts and Column Charts 108
A Note on Pie Charts and Three-Dimensional Charts 112
Additional Visualizations for Multiple Variables: Bubble Chart,
Scatter Chart Matrix, and Table Lens 112
PivotCharts in Excel 117
3.4 Specialized Data Visualizations 120
Heat Maps 120
Treemaps 121
Waterfall Charts 122
Stock Charts 124
Parallel-Coordinates Chart 126
3.5 Visualizing Geospatial Data 126
Choropleth Maps 127
Cartograms 129
3.6 Data Dashboards 131
Principles of Effective Data Dashboards 131
Applications of Data Dashboards 132
Summary 134
Glossary 134
Problems 136
Case Problem 1: Pelican Stores 149
Case Problem 2: Movie Theater Releases 150
Available in the Cengage eBook:
Appendix: Creating Tabular and Graphical Presentations with R
Appendix: Data Visualization with Tableau
Chapter 4 Data Wrangling: Data Management and Data
Cleaning Strategies 151
4.1 Discovery 153
Accessing Data 153
The Format of the Raw Data 157
4.2 Structuring 158
Data Formatting 159
Arrangement of Data 159
Splitting a Single Field into Multiple Fields 161
Combining Multiple Fields into a Single Field 165
4.3 Cleaning 167
Missing Data 167
Identification of Erroneous Outliers, Other Erroneous Values,
and Duplicate Records 170
4.4 Enriching 176
Subsetting Data 177
Supplementing Data 179
Enhancing Data 182
4.5 Validating and Publishing 186
Validating 186
Publishing 188
Summary 188
Glossary 189
Problems 190
Case Problem 1: Usman Solutions 197
Available in the Cengage eBook:
Appendix: Importing Delimited Files into R
Appendix: Working with Records in R
Appendix: Working with Fields in R
Appendix: Unstacking and Stacking Data in R
Chapter 5 Probability: An Introduction to Modeling
Uncertainty 199
5.1 Events and Probabilities 201
5.2 Some Basic Relationships of Probability 202
Complement of an Event 202
Addition Law 203
5.3 Conditional Probability 205
Independent Events 210
Multiplication Law 210
Bayes’ Theorem 211
5.4 Random Variables 213
Discrete Random Variables 213
Continuous Random Variables 214
5.5 Discrete Probability Distributions 215
Custom Discrete Probability Distribution 215
Expected Value and Variance 217
Discrete Uniform Probability Distribution 220
Binomial Probability Distribution 221
Poisson Probability Distribution 224
5.6 Continuous Probability Distributions 227
Uniform Probability Distribution 227
Triangular Probability Distribution 229
Normal Probability Distribution 231
Exponential Probability Distribution 236
Summary 240
Glossary 240
Problems 242
Case Problem 1: Hamilton County Judges 254
Case Problem 2: McNeil’s Auto Mall 255
Case Problem 3: Gebhardt Electronics 256
Available in the Cengage eBook:
Appendix: Discrete Probability Distributions with R
Appendix: Continuous Probability Distributions with R
Chapter 6 Descriptive Data Mining 257
6.1 Dimension Reduction 259
Geometric Interpretation of Principal Component
Analysis 259
Summarizing Protein Consumption for Maillard
Riposte 262
6.2 Cluster Analysis 266
Measuring Distance Between Observations Consisting
of Quantitative Variables 267
Measuring Distance Between Observations Consisting
of Categorical Variables 269
k-Means Clustering 271
Hierarchical Clustering and Measuring Dissimilarity
Between Clusters 275
Hierarchical Clustering versus k-Means Clustering 283
6.3 Association Rules 284
Evaluating Association Rules 286
6.4 Text Mining 287
Voice of the Customer at Triad Airlines 288
Preprocessing Text Data for Analysis 289
Movie Reviews 290
Computing Dissimilarity Between Documents 293
Word Clouds 294
Summary 295
Glossary 296
Problems 298
Case Problem 1: Big Ten Expansion 315
Case Problem 2: Know Thy Customer 316
Available in the Cengage eBook:
Appendix: Principal Component Analysis with R
Appendix: k-Means Clustering with R
Appendix: Hierarchical Clustering with R
Appendix: Association Rules with R
Appendix: Text Mining with R
Appendix: Principal Component Analysis with Orange
Appendix: k-Means Clustering with Orange
Appendix: Hierarchical Clustering with Orange
Appendix: Association Rules with Orange
Appendix: Text Mining with Orange
Chapter 7 Statistical Inference 319
7.1 Selecting a Sample 322
Sampling from a Finite Population 322
Sampling from an Infinite Population 323
7.2 Point Estimation 326
Practical Advice 328
7.3 Sampling Distributions 328
Sampling Distribution of x̄ 331
Sampling Distribution of p̄ 336
7.4 Interval Estimation 339
Interval Estimation of the Population Mean 339
Interval Estimation of the Population
Proportion 346
7.5 Hypothesis Tests 349
Developing Null and Alternative Hypotheses 349
Type I and Type II Errors 352
Hypothesis Test of the Population Mean 353
Hypothesis Test of the Population Proportion 364
7.6 Big Data, Statistical Inference, and Practical
Significance 367
Sampling Error 367
Nonsampling Error 368
Big Data 369
Understanding What Big Data Is 370
Big Data and Sampling Error 371
Big Data and the Precision of Confidence Intervals 372
Implications of Big Data for Confidence Intervals 373
Big Data, Hypothesis Testing, and p Values 374
Implications of Big Data in Hypothesis Testing 376
Summary 376
Glossary 377
Problems 380
Case Problem 1: Young Professional Magazine 390
Case Problem 2: Quality Associates, Inc. 391
Available in the Cengage eBook:
Appendix: Random Sampling with R
Appendix: Interval Estimation with R
Appendix: Hypothesis Testing with R
Chapter 8 Linear Regression 393
8.1 Simple Linear Regression Model 395
Estimated Simple Linear Regression Equation 395
8.2 Least Squares Method 397
Least Squares Estimates of the Simple Linear
Regression Parameters 399
Using Excel’s Chart Tools to Compute the Estimated
Simple Linear Regression Equation 401
8.3 Assessing the Fit of the Simple Linear Regression
Model 403
The Sums of Squares 403
The Coefficient of Determination 405
Using Excel’s Chart Tools to Compute the Coefficient
of Determination 406
8.4 The Multiple Linear Regression Model 407
Estimated Multiple Linear Regression Equation 407
Least Squares Method and Multiple Linear Regression 408
Butler Trucking Company and Multiple Linear Regression 408
Using Excel’s Regression Tool to Develop the Estimated
Multiple Linear Regression Equation 409
8.5 Inference and Linear Regression 412
Conditions Necessary for Valid Inference in the Least
Squares Linear Regression Model 413
Testing Individual Linear Regression Parameters 417
Addressing Nonsignificant Independent Variables 420
Multicollinearity 421
8.6 Categorical Independent Variables 424
Butler Trucking Company and Rush Hour 424
Interpreting the Parameters 426
More Complex Categorical Variables 427
8.7 Modeling Nonlinear Relationships 429
Quadratic Regression Models 430
Piecewise Linear Regression Models 434
Interaction Between Independent Variables 436
8.8 Model Fitting 441
Variable Selection Procedures 441
Overfitting 442
8.9 Big Data and Linear Regression 443
Inference and Very Large Samples 443
Model Selection 446
8.10 Prediction with Linear Regression 447
Summary 450
Glossary 450
Problems 452
Case Problem 1: Alumni Giving 466
Case Problem 2: Consumer Research, Inc. 468
Case Problem 3: Predicting Winnings for NASCAR Drivers 469
Available in the Cengage eBook:
Appendix: Simple Linear Regression with R
Appendix: Multiple Linear Regression with R
Appendix: Linear Regression Variable Selection Procedures with R
Chapter 9 Time Series Analysis and Forecasting 471
9.1 Time Series Patterns 474
Horizontal Pattern 474
Trend Pattern 476
Seasonal Pattern 477
Trend and Seasonal Pattern 478
Cyclical Pattern 481
Identifying Time Series Patterns 481
9.2 Forecast Accuracy 481
9.3 Moving Averages and Exponential Smoothing 485
Moving Averages 486
Exponential Smoothing 490
9.4 Using Linear Regression Analysis for Forecasting 494
Linear Trend Projection 494
Seasonality Without Trend 496
Seasonality with Trend 497
Using Linear Regression Analysis as a Causal Forecasting
Method 500
Combining Causal Variables with Trend and
Seasonality Effects 503
Considerations in Using Linear Regression in
Forecasting 504
9.5 Determining the Best Forecasting Model to Use 504
Summary 505
Glossary 505
Problems 506
Case Problem 1: Forecasting Food and Beverage Sales 515
Case Problem 2: Forecasting Lost Sales 515
Appendix 9.1: Using the Excel Forecast Sheet 517
Available in the Cengage eBook:
Appendix: Forecasting with R
Chapter 10 Predictive Data Mining: Regression Tasks 523
10.1 Regression Performance Measures 524
10.2 Data Sampling, Preparation, and Partitioning 526
Static Holdout Method 526
k-Fold Cross-Validation 530
10.3 k-Nearest Neighbors Regression 535
10.4 Regression Trees 538
Constructing a Regression Tree 538
Generating Predictions with a Regression Tree 541
Ensemble Methods 543
10.5 Neural Network Regression 548
Structure of a Neural Network 548
How a Neural Network Learns 552
10.6 Feature Selection 555
Wrapper Methods 556
Filter Methods 556
Embedded Methods 557
Summary 558
Glossary 558
Problems 560
Case Problem: Housing Bubble 568
Available in the Cengage eBook:
Appendix: k-Nearest Neighbors Regression with R
Appendix: Individual Regression Trees with R
Appendix: Random Forests of Regression Trees with R
Appendix: Neural Network Regression with R
Appendix: Regularized Linear Regression with R
Appendix: k-Nearest Neighbors Regression with Orange
Appendix: Individual Regression Trees with Orange
Appendix: Random Forests of Regression Trees with Orange
Appendix: Neural Network Regression with Orange
Appendix: Regularized Linear Regression with Orange
Chapter 11 Predictive Data Mining: Classification Tasks 571
11.1 Data Sampling, Preparation, and Partitioning 573
Static Holdout Method 573
k-Fold Cross-Validation 574
Class Imbalanced Data 574
11.2 Performance Measures for Binary Classification 576
11.3 Classification with Logistic Regression 582
11.4 k-Nearest Neighbors Classification 587
11.5 Classification Trees 591
Constructing a Classification Tree 591
Generating Predictions with a Classification Tree 593
Ensemble Methods 594
11.6 Neural Network Classification 600
Structure of a Neural Network 601
How a Neural Network Learns 605
11.7 Feature Selection 609
Wrapper Methods 609
Filter Methods 610
Embedded Methods 610
Summary 612
Glossary 612
Problems 615
Case Problem: Grey Code Corporation 630
Available in the Cengage eBook:
Appendix: Classification via Logistic Regression with R
Appendix: k-Nearest Neighbors Classification with R
Appendix: Individual Classification Trees with R
Appendix: Random Forests of Classification Trees with R
Appendix: Neural Network Classification with R
Appendix: Classification via Logistic Regression with Orange
Appendix: k-Nearest Neighbors Classification with Orange
Appendix: Individual Classification Trees with Orange
Appendix: Random Forests of Classification Trees with Orange
Appendix: Neural Network Classification with Orange
Chapter 12 Spreadsheet Models 633
12.1 Building Good Spreadsheet Models 635
Influence Diagrams 635
Building a Mathematical Model 635
Spreadsheet Design and Implementing the Model in a
Spreadsheet 637
12.2 What-If Analysis 640
Data Tables 640
Goal Seek 642
Scenario Manager 644
12.3 Some Useful Excel Functions for Modeling 649
SUM and SUMPRODUCT 650
IF and COUNTIF 651
XLOOKUP 654
12.4 Auditing Spreadsheet Models 656
Trace Precedents and Dependents 656
Show Formulas 656
Evaluate Formulas 658
Error Checking 658
Watch Window 659
12.5 Predictive and Prescriptive Spreadsheet Models 660
Summary 661
Glossary 661
Problems 662
Case Problem: Retirement Plan 670
Chapter 13 Monte Carlo Simulation 671
13.1 Risk Analysis for Sanotronics LLC 673
Base-Case Scenario 673
Worst-Case Scenario 674
Best-Case Scenario 674
Sanotronics Spreadsheet Model 674
Use of Probability Distributions to Represent Random Variables 676
Generating Values for Random Variables with Excel 677
Executing Simulation Trials with Excel 681
Measuring and Analyzing Simulation Output 682
13.2 Inventory Policy Analysis for Promus Corp 686
Spreadsheet Model for Promus 687
Generating Values for Promus Corp’s Demand 688
Executing Simulation Trials and Analyzing Output 691
13.3 Simulation Modeling for Land Shark Inc. 693
Spreadsheet Model for Land Shark 694
Generating Values for Land Shark’s Random Variables 696
Executing Simulation Trials and Analyzing Output 698
Generating Bid Amounts with Fitted Distributions 700
13.4 Simulation with Dependent Random Variables 709
Spreadsheet Model for Press Teag Worldwide 709
13.5 Simulation Considerations 714
Verification and Validation 714
Advantages and Disadvantages of Using Simulation 714
Summary 715
Summary of Steps for Conducting a Simulation Analysis 715
Glossary 716
Problems 717
Case Problem 1: Four Corners 731
Case Problem 2: Ginsberg’s Jewelry Snowfall Promotion 732
Appendix 13.1: Common Probability Distributions
for Simulation 734
Chapter 14 Linear Optimization Models 743
14.1 A Simple Maximization Problem 745
Problem Formulation 746
Mathematical Model for the Par, Inc. Problem 748
14.2 Solving the Par, Inc. Problem 749
The Geometry of the Par, Inc. Problem 749
Solving Linear Programs with Excel Solver 751
14.3 A Simple Minimization Problem 755
Problem Formulation 755
Solution for the M&D Chemicals Problem 755
14.4 Special Cases of Linear Program Outcomes 757
Alternative Optimal Solutions 758
Infeasibility 759
Unbounded 760
14.5 Sensitivity Analysis 762
Interpreting Excel Solver Sensitivity Report 762
14.6 General Linear Programming Notation and More
Examples 764
Investment Portfolio Selection 765
Transportation Planning 768
Maximizing Banner Ad Revenue 772
Assigning Project Leaders to Clients 776
Diet Planning 779
14.7 Generating an Alternative Optimal Solution
for a Linear Program 782
Summary 783
Glossary 784
Problems 785
Case Problem 1: Investment Strategy 801
Case Problem 2: Solutions Plus 802
Available in the Cengage eBook:
Appendix: Linear Programming with R
Chapter 15 Integer Linear Optimization Models 805
15.1 Types of Integer Linear Optimization Models 806
15.2 Eastborne Realty, an Example of Integer Optimization 807
The Geometry of Linear All-Integer Optimization 808
15.3 Solving Integer Optimization Problems with Excel Solver 810
A Cautionary Note About Sensitivity Analysis 813
15.4 Applications Involving Binary Variables 815
Capital Budgeting 815
Fixed Cost 816
Bank Location 820
Product Design and Market Share Optimization 822
15.5 Modeling Flexibility Provided by Binary Variables 825
Multiple-Choice and Mutually Exclusive Constraints 825
k Out of n Alternatives Constraint 826
Conditional and Corequisite Constraints 826
15.6 Generating Alternatives in Binary Optimization 827
Summary 829
Glossary 830
Problems 830
Case Problem 1: Applecore Children’s Clothing 845
Case Problem 2: Yeager National Bank 847
Available in the Cengage eBook:
Appendix: Integer Programming with R
Chapter 16 Nonlinear Optimization Models 849
16.1 A Production Application: Par, Inc. Revisited 850
An Unconstrained Problem 850
A Constrained Problem 851
Solving Nonlinear Optimization Models Using Excel
Solver 853
Sensitivity Analysis and Shadow Prices in Nonlinear
Models 855
16.2 Local and Global Optima 856
Overcoming Local Optima with Excel Solver 858
16.3 A Location Problem 860
16.4 Markowitz Portfolio Model 861
16.5 Adoption of a New Product: The Bass Forecasting
Model 866
16.6 Heuristic Optimization Using Excel’s Evolutionary
Method 869
Summary 877
Glossary 877
Problems 878
Case Problem: Portfolio Optimization with Transaction
Costs 889
Available in the Cengage eBook:
Appendix: Nonlinear Programming with R
Chapter 17 Decision Analysis 893
17.1 Problem Formulation 895
Payoff Tables 896
Decision Trees 896
17.2 Decision Analysis Without Probabilities 897
Optimistic Approach 897
Conservative Approach 898
Minimax Regret Approach 898
17.3 Decision Analysis with Probabilities 900
Expected Value Approach 900
Risk Analysis 902
Sensitivity Analysis 903
17.4 Decision Analysis with Sample Information 904
Expected Value of Sample Information 909
Expected Value of Perfect Information 909
17.5 Computing Branch Probabilities with Bayes’ Theorem 910
17.6 Utility Theory 913
Utility and Decision Analysis 914
Utility Functions 918
Exponential Utility Function 921
Summary 923
Glossary 923
Problems 925
Case Problem 1: Property Purchase Strategy 939
Case Problem 2: Semiconductor Fabrication at Axeon Labs 941
Multi-Chapter Case Problems
Capital State University Game-Day Magazines 943
Hanover Inc. 945
Appendix A Basics of Excel 947
Appendix B Database Basics with Microsoft Access 959
Appendix C Solutions to Even-Numbered Problems
(Cengage eBook)
Appendix D Microsoft Excel Online and Tools for Statistical Analysis
(Cengage eBook)
References Available in the Cengage eBook
Index 997