Multivariate Statistics: Exercises and Solutions, Second Edition
By Wolfgang Karl Härdle and Zdeněk Hlávka
Contents:
Part I Descriptive Techniques
1 Comparison of Batches
Part II Multivariate Random Variables
2 A Short Excursion into Matrix Algebra
3 Moving to Higher Dimensions
4 Multivariate Distributions
5 Theory of the Multinormal
6 Theory of Estimation
7 Hypothesis Testing
Part III Multivariate Techniques
8 Regression Models
9 Variable Selection
10 Decomposition of Data Matrices by Factors
11 Principal Component Analysis
12 Factor Analysis
13 Cluster Analysis
14 Discriminant Analysis
15 Correspondence Analysis
16 Canonical Correlation Analysis
17 Multidimensional Scaling
18 Conjoint Measurement Analysis
19 Applications in Finance
20 Highly Interactive, Computationally Intensive Techniques
A Data Sets
A.1 Athletic Records Data
A.2 Bank Notes Data
A.3 Bankruptcy Data
A.4 Car Data
A.5 Car Marks
A.6 Classic Blue Pullover Data
A.7 Fertilizer Data
A.8 French Baccalauréat Frequencies
A.9 French Food Data
A.10 Geopol Data
A.11 German Annual Population Data
A.12 Journals Data
A.13 NYSE Returns Data
A.14 Plasma Data
A.15 Time Budget Data
A.16 Unemployment Data
A.17 U.S. Companies Data
A.18 U.S. Crime Data
A.19 U.S. Health Data
A.20 Vocabulary Data
A.21 WAIS Data
References
Index
Preface to the Second Edition
I have always had an idea that I would have made a highly efficient criminal. This is the
chance of my lifetime in that direction. See here! This is a first-class, up-to-date burgling kit,
with nickel-plated Jimmy, diamond-tipped glass-cutter, adaptable keys, and every modern
improvement which the march of civilization demands.
Sherlock Holmes in “The Adventure of Charles Augustus Milverton”
Statistical science has seen new paradigms and more complex and richer
data sets. These include data on human genomics, social networks, huge climate
and weather data, and, of course, high frequency financial and economic data.
The statistical community has reacted to these challenges by developing modern
mathematical tools and by advancing computational techniques, e.g., through
fresher quantlets and better hardware and software platforms. As a consequence, the
book Härdle, W. and Simar, L. (2015) Applied Multivariate Statistical Analysis, 4th ed.,
Springer Verlag had to be adjusted and partly beefed up with more easy-access
tools and figures. An extra chapter on regression models with variable selection was
introduced and dimension reduction methods were discussed.
These new elements had to be reflected in the exercises and solutions book
as well. All figures have now been completely redesigned in the freely available
software R (R Core Team, 2013) that implements the classical statistical interactive
language S (Becker, Chambers, & Wilks, 1988; Chambers & Hastie, 1992). The R
codes for the classical multivariate analysis in Chaps. 11–17 are mostly based on
library MASS (Venables & Ripley, 2002). Throughout the book, some examples
are implemented directly in the R programming language but we have also used
functions from R libraries aplpack (Wolf, 2012), ca (Nenadic & Greenacre, 2007),
car (Fox & Weisberg, 2011), depth (Genest, Masse, & Plante, 2012), dr (Weisberg,
2002), glmnet (Friedman, Hastie, & Tibshirani, 2010), hexbin (Carr, Lewin-Koh,
& Maechler, 2011), kernlab (Karatzoglou, Smola, Hornik, & Zeileis, 2004), KernSmooth
(Wand, 2012), lasso2 (Lokhorst, Venables, Turlach, & Maechler, 2013),
locpol (Cabrera, 2012), MASS (Venables & Ripley, 2002), mvpart (Therneau,
Atkinson, Ripley, Oksanen, & Deáth, 2012), quadprog (Turlach & Weingessel,
2011), scatterplot3d (Ligges & Mächler, 2003), stats (R Core Team, 2013), tseries
(Trapletti & Hornik, 2012), and zoo (Zeileis & Grothendieck, 2005). All data sets
and computer codes (quantlets) in R and MATLAB may be downloaded via the
quantlet download center: www.quantlet.org or the Springer web page. For
interactive display of low-dimensional projections of a multivariate data set, we
recommend GGobi (Swayne, Lang, Buja, & Cook, 2003; Lang, Swayne, Wickham,
& Lawrence, 2012).
As the number of available R libraries and functions steadily increases, one
should always consult the multivariate task view at http://www.r-project.org before
starting any new analysis. As before, analogues of all quantlets in the MATLAB
language are also available at the quantlet download center.
The set of exercises was extended and all quantlets have been revised and optimized.
Such a project would not be possible without the generous help of colleagues
and students. We also gratefully acknowledge the support of our cooperation via
the Erasmus program and through the Faculty of Mathematics and Physics at
Charles University in Prague and C.A.S.E.—the Centre for Applied Statistics and
Economics at Humboldt-Universität zu Berlin.
We thank the following students who contributed some of the R codes used in the
second edition: Alena Babiaková, Dana Chromíková, Petra Černayová, Tomáš Hovorka,
Kristýna Ivanková, Monika Jakubcová, Lucia Jarešová, Barbora Lebdušková,
Tomáš Marada, Michaela Maršálková, Jaroslav Pazdera, Jakub Pečánka, Jakub
Petrásek, Radka Picková, Kristýna Sionová, Ondřej Šedivý, and Ivana Žohová. We
thank Awdesch Melzer who carefully reviewed all R codes and pointed out several
errors that escaped our attention in the first edition of this book.
We also acknowledge the support of the Deutsche Forschungsgemeinschaft through
CRC 649 “Economic Risk” and IRTG 1792 “High Dimensional Non Stationary
Time Series Analysis”.
Berlin, Germany Wolfgang K. Härdle
Prague, Czech Republic Zdeněk Hlávka
May 2015