Univariate, Bivariate and Multivariate Statistics Using R by Daniel J. Denis

Univariate, Bivariate and Multivariate Statistics Using R: Quantitative Tools for Data Analysis and Data Science

By Daniel J. Denis

Contents:

Preface xiii

1 Introduction to Applied Statistics 1

1.1 The Nature of Statistics and Inference 2

1.2 A Motivating Example 3

1.3 What About “Big Data”? 4

1.4 Approach to Learning R 7

1.5 Statistical Modeling in a Nutshell 7

1.6 Statistical Significance Testing and Error Rates 10

1.7 Simple Example of Inference Using a Coin 11

1.8 Statistics Is for Messy Situations 13

1.9 Type I versus Type II Errors 14

1.10 Point Estimates and Confidence Intervals 15

1.11 So What Can We Conclude from One Confidence Interval? 18

1.12 Variable Types 19

1.13 Sample Size, Statistical Power, and Statistical Significance 22

1.14 How “p < 0.05” Happens 23

1.15 Effect Size 25

1.16 The Verdict on Significance Testing 26

1.17 Training versus Test Data 27

1.18 How to Get the Most Out of This Book 28

Exercises 29

2 Introduction to R and Computational Statistics 31

2.1 How to Install R on Your Computer 34

2.2 How to Do Basic Mathematics with R 35

2.2.1 Combinations and Permutations 38

2.2.2 Plotting Curves Using curve() 39

2.3 Vectors and Matrices in R 41

2.4 Matrices in R 44

2.4.1 The Inverse of a Matrix 47

2.4.2 Eigenvalues and Eigenvectors 49

2.5 How to Get Data into R 52

2.6 Merging Data Frames 55

2.7 How to Install a Package in R, and How to Use It 55

2.8 How to View the Top, Bottom, and “Some” of a Data File 58

2.9 How to Select Subsets from a Dataframe 60

2.10 How R Deals with Missing Data 62

2.11 Using ls() to See Objects in the Workspace 63

2.12 Writing Your Own Functions 65

2.13 Writing Scripts 65

2.14 How to Create Factors in R 66

2.15 Using the table() Function 67

2.16 Requesting a Demonstration Using the example() Function 68

2.17 Citing R in Publications 69

Exercises 69

3 Exploring Data with R: Essential Graphics and Visualization 71

3.1 Statistics, R, and Visualization 71

3.2 R’s plot() Function 73

3.3 Scatterplots and Depicting Data in Two or More Dimensions 77

3.4 Communicating Density in a Plot 79

3.5 Stem-and-Leaf Plots 85

3.6 Assessing Normality 87

3.7 Box-and-Whisker Plots 89

3.8 Violin Plots 95

3.9 Pie Graphs and Charts 97

3.10 Plotting Tables 98

Exercises 99

4 Means, Correlations, Counts: Drawing Inferences Using Easy-to-Implement Statistical Tests 101

4.1 Computing z and Related Scores in R 101

4.2 Plotting Normal Distributions 105

4.3 Correlation Coefficients in R 106

4.4 Evaluating Pearson’s r for Statistical Significance 110

4.5 Spearman’s Rho: A Nonparametric Alternative to Pearson 111

4.6 Alternative Correlation Coefficients in R 113

4.7 Tests of Mean Differences 114

4.7.1 t-Tests for One Sample 114

4.7.2 Two-Sample t-Test 115

4.7.3 Was the Welch Test Necessary? 117

4.7.4 t-Test via Linear Model Set-up 118

4.7.5 Paired-Samples t-Test 118

4.8 Categorical Data 120

4.8.1 Binomial Test 120

4.8.2 Categorical Data Having More Than Two Possibilities 123

4.9 Radar Charts 126

4.10 Cohen’s Kappa 127

Exercises 129

5 Power Analysis and Sample Size Estimation Using R 131

5.1 What Is Statistical Power? 131

5.2 Does That Mean Power and Huge Sample Sizes Are “Bad?” 133

5.3 Should I Be Estimating Power or Sample Size? 134

5.4 How Do I Know What the Effect Size Should Be? 135

5.4.1 Ways of Setting Effect Size in Power Analyses 135

5.5 Power for t-Tests 136

5.5.1 Example: Treatment versus Control Experiment 137

5.5.2 Extremely Small Effect Size 138

5.6 Estimating Power for a Given Sample Size 140

5.7 Power for Other Designs – The Principles Are the Same 140

5.7.1 Power for One-Way ANOVA 141

5.7.2 Converting R² to f 143

5.8 Power for Correlations 143

5.9 Concluding Thoughts on Power 145

Exercises 146

6 Analysis of Variance: Fixed Effects, Random Effects, Mixed Models, and Repeated Measures 147

6.1 Revisiting t-Tests 147

6.2 Introducing the Analysis of Variance (ANOVA) 149

6.2.1 Achievement as a Function of Teacher 149

6.3 Evaluating Assumptions 152

6.3.1 Inferential Tests for Normality 153

6.3.2 Evaluating Homogeneity of Variances 154

6.4 Performing the ANOVA Using aov() 156

6.4.1 The Analysis of Variance Summary Table 157

6.4.2 Obtaining Treatment Effects 158

6.4.3 Plotting Results of the ANOVA 159

6.4.4 Post Hoc Tests on the Teacher Factor 159

6.5 Alternative Way of Getting ANOVA Results via lm() 161

6.5.1 Contrasts in lm() versus Tukey’s HSD 163

6.6 Factorial Analysis of Variance 163

6.6.1 Why Not Do Two One-Way ANOVAs? 163

6.7 Example of Factorial ANOVA 166

6.7.1 Graphing Main Effects and Interaction in the Same Plot 171

6.8 Should Main Effects Be Interpreted in the Presence of Interaction? 172

6.9 Simple Main Effects 173

6.10 Random Effects ANOVA and Mixed Models 175

6.10.1 A Rationale for Random Factors 176

6.10.2 One-Way Random Effects ANOVA in R 177

6.11 Mixed Models 180

6.12 Repeated-Measures Models 181

Exercises 186

7 Simple and Multiple Linear Regression 189

7.1 Simple Linear Regression 190

7.2 Ordinary Least-Squares Regression 192

7.3 Adjusted R² 198

7.4 Multiple Regression Analysis 199

7.5 Verifying Model Assumptions 202

7.6 Collinearity Among Predictors and the Variance Inflation Factor 206

7.7 Model-Building and Selection Algorithms 209

7.7.1 Simultaneous Inference 209

7.7.2 Hierarchical Regression 210

7.7.2.1 Example of Hierarchical Regression 211

7.8 Statistical Mediation 214

7.9 Best Subset and Forward Regression 217

7.9.1 How Forward Regression Works 218

7.10 Stepwise Selection 219

7.11 The Controversy Surrounding Selection Methods 221

Exercises 223

8 Logistic Regression and the Generalized Linear Model 225

8.1 The “Why” Behind Logistic Regression 225

8.2 Example of Logistic Regression in R 229

8.3 Introducing the Logit: The Log of the Odds 232

8.4 The Natural Log of the Odds 233

8.5 From Logits Back to Odds 235

8.6 Full Example of Logistic Regression 236

8.6.1 Challenger O-ring Data 236

8.7 Logistic Regression on Challenger Data 240

8.8 Analysis of Deviance Table 241

8.9 Predicting Probabilities 242

8.10 Assumptions of Logistic Regression 243

8.11 Multiple Logistic Regression 244

8.12 Training Error Rate Versus Test Error Rate 247

Exercises 248

9 Multivariate Analysis of Variance (MANOVA) and Discriminant Analysis 251

9.1 Why Conduct MANOVA? 252

9.2 Multivariate Tests of Significance 254

9.3 Example of MANOVA in R 257

9.4 Effect Size for MANOVA 259

9.5 Evaluating Assumptions in MANOVA 261

9.6 Outliers 262

9.7 Homogeneity of Covariance Matrices 263

9.7.1 What if the Box-M Test Had Suggested a Violation? 264

9.8 Linear Discriminant Function Analysis 265

9.9 Theory of Discriminant Analysis 266

9.10 Discriminant Analysis in R 267

9.11 Computing Discriminant Scores Manually 270

9.12 Predicting Group Membership 271

9.13 How Well Did the Discriminant Function Analysis Do? 272

9.14 Visualizing Separation 275

9.15 Quadratic Discriminant Analysis 276

9.16 Regularized Discriminant Analysis 278

Exercises 278

10 Principal Component Analysis 281

10.1 Principal Component Analysis Versus Factor Analysis 282

10.2 A Very Simple Example of PCA 283

10.2.1 Pearson’s 1901 Data 284

10.2.2 Assumptions of PCA 286

10.2.3 Running the PCA 288

10.2.4 Loadings in PCA 290

10.3 What Are the Loadings in PCA? 292

10.4 Properties of Principal Components 293

10.5 Component Scores 294

10.6 How Many Components to Keep? 295

10.6.1 The Scree Plot as an Aid to Component Retention 295

10.7 Principal Components of USA Arrests Data 297

10.8 Unstandardized Versus Standardized Solutions 301

Exercises 304

11 Exploratory Factor Analysis 307

11.1 Common Factor Analysis Model 308

11.2 A Technical and Philosophical Pitfall of EFA 310

11.3 Factor Analysis Versus Principal Component Analysis on the Same Data 311

11.3.1 Demonstrating the Non-Uniqueness Issue 311

11.4 The Issue of Factor Retention 314

11.5 Initial Eigenvalues in Factor Analysis 315

11.6 Rotation in Exploratory Factor Analysis 316

11.7 Estimation in Factor Analysis 318

11.8 Example of Factor Analysis on the Holzinger and Swineford Data 318

11.8.1 Obtaining Initial Eigenvalues 323

11.8.2 Making Sense of the Factor Solution 324

Exercises 325

12 Cluster Analysis 327

12.1 A Simple Example of Cluster Analysis 329

12.2 The Concepts of Proximity and Distance in Cluster Analysis 332

12.3 k-Means Cluster Analysis 332

12.4 Minimizing Criteria 333

12.5 Example of k-Means Clustering in R 334

12.5.1 Plotting the Data 335

12.6 Hierarchical Cluster Analysis 339

12.7 Why Clustering Is Inherently Subjective 343

Exercises 344

13 Nonparametric Tests 347

13.1 Mann–Whitney U Test 348

13.2 Kruskal–Wallis Test 349

13.3 Nonparametric Test for Paired Comparisons and Repeated Measures 351

13.3.1 Wilcoxon Signed-Rank Test and Friedman Test 351

13.4 Sign Test 354

Exercises 356

References 359

Index 363
