Methods Overview


History

Multivariate Data Analysis

Given the intensive calculations involved, it is not surprising that interest in multivariate data analysis has grown rapidly since the advent of modern computers. However, the first steps were taken back in the 1870s, with the paper on singular value decomposition by Eugenio Beltrami¹ and the work on linear regression by Sir Francis Galton (pictured).

Francis Galton (1822–1911)
Galton also attempted multiple regression but the work was never completed.
As his collaborator Karl Pearson wrote after Galton's death: “The somewhat complicated mathematics of multiple correlation, with its repeated appeals to the geometrical hyperspace, remained a closed chamber to him.”²
Karl Pearson, himself known for considerable contributions to statistical analysis, is sometimes considered the originator of Principal Component Analysis, with his paper from 1901³ (see drawings below).


Illustration of PCA concept from 1901 by Pearson. Karl Pearson (1857–1936).

Many important findings followed over the years, among them the Hotelling Transform, published by Harold Hotelling⁴ in 1933. In 1966, economist and statistician Herman Wold presented the Nonlinear Iterative Partial Least Squares algorithm, or NIPALS⁵. The concept wasn't entirely new, since similar findings had already been published in 1923 by Fisher et al.⁶ What was original in H. Wold’s work was the partial least squares interpretation and the ability to handle missing values. During the years that followed, H. Wold developed the NIPALS algorithm into the Partial Least Squares regression method (PLS) that remains a core part of the Umetrics software today.

Herman Wold
Herman Wold's son and one of the co-founders of Umetrics (present R&D Manager), Svante Wold, simplified the PLS algorithm and added to the diagnostic interpretation with various co-workers in the 1980s and since. It is this later work that has turned the methods into the general scientific data analysis tools that they are today.

Svante Wold

Design of Experiments


One of the very first scientific papers to relate to the topic of design of experiments was published in the statistical journal Biometrika in 1917⁷. The author, who used the famous pseudonym "Student", was also responsible for developing “Student's t-test” in 1908. His true identity was William Sealy Gosset (pictured).



William Gosset (1876–1937).



In the 1920s, Sir Ronald Aylmer Fisher developed the methods into a coherent philosophy for experimentation, and he is now widely regarded as the originator of the approach. He made a huge contribution to statistics in general, and to design of experiments in particular, from his post at the Rothamsted Experimental Station in Harpenden, UK. His book "The Design of Experiments" from 1935 is still widely referenced today⁸.


Ronald Fisher (1890–1962),
by kind permission from JOC/EFR©

Classical full and fractional factorial designs were already in use at the beginning of the 20th century, while more complex designs, such as D-optimal, came with the arrival of modern computers in the 1970s. It was also around that time that more advanced regression analysis and optimization techniques were developed.

1 Beltrami, E., Sulle funzioni bilineari, Giornale di Matematiche, 11:98–106 (1873).
2 Pearson, K., The Life, Letters and Labours of Francis Galton, Cambridge University Press, p. 21 (1930).
3 Pearson, K., On lines and planes of closest fit to systems of points in space, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, Series 6, Volume 2, p. 566 (1901).
4 Hotelling, H., Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, 24:498–520 (1933).
5 Wold, H., Estimation of principal components and related models by iterative least squares, Multivariate Analysis (Ed., Krishnaiah, P. R.), Academic Press, NY, pp. 391-420 (1966).
6 Fisher, R., and Mackenzie, W., Studies in crop variation. II. The manurial response of different potato varieties, Journal of Agricultural Science, 13:311–320 (1923).
7 "Student", Tables for estimating the probability that the mean of a unique sample of observations lies between −∞ and any given distance of the mean of the population from which the sample is drawn, Biometrika, 11:414–417 (1917).
8 Fisher, R. A., The design of experiments, Oliver and Boyd, Edinburgh (1935).
All images have been released into the public domain since their copyright has expired. Any exceptions are stated in the text. This applies worldwide.