Parametric vs. Nonparametric methods
Leo Breiman (Breiman 2001a), *Statistical Modeling: The Two Cultures*:
For instance, in the Journal of the American Statistical Association (JASA), virtually every article contains a statement of the form: Assume that the data are generated by the following model:… I am deeply troubled by the current and past use of data models in applications, where quantitative conclusions are drawn and perhaps policy decisions made.
… assume the data are generated by independent draws from the model
\[ y = b_0 + \sum_{m=1}^{M} b_m x_m + \varepsilon \tag{R} \]
where the coefficients are to be estimated. The error term is N(0, \(\sigma^2\)) and \(\sigma^2\) is to be estimated. Given that the data is generated this way, elegant tests of hypotheses, confidence intervals, distributions of the residual sum-of-squares and asymptotics can be derived. This made the model attractive in terms of the mathematics involved. This theory was used both by academic statisticians and others to derive significance levels for coefficients on the basis of model (R), with little consideration as to whether the data on hand could have been generated by a linear model. Hundreds, perhaps thousands of articles were published claiming proof of something or other because the coefficient was significant at the 5% level…
…With the insistence on data models, multivariate analysis tools in statistics are frozen at discriminant analysis and logistic regression in classification and multiple linear regression in regression. Nobody really believes that multivariate data is multivariate normal, but that data model occupies a large number of pages in every graduate textbook on multivariate statistical analysis…
According to Breiman, there are two “cultures”:
- The Data Modeling Culture: one assumes that the data are generated by a given stochastic data model (econometrics) …
- The Algorithmic Modeling Culture: one uses algorithmic models and treats the data mechanism as unknown (machine learning) …
He argues that the focus in the statistical community on data models has:
- Led to irrelevant theory and questionable scientific conclusions;
- Kept statisticians from using more suitable algorithmic models;
- Prevented statisticians from working on exciting new problems.
In parametric econometrics, we assume that the data come from a generating process of the following form:
\[ y=X \beta+\varepsilon \]
The model (the \(X\)'s) is determined by the researcher, and probability theory is a foundation of econometrics.
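A minimal sketch of this workflow, assuming simulated data and the statsmodels library (the variable names and true coefficient values are illustrative, not from the text):

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from a known linear process: y = 1 + 2*x1 - 0.5*x2 + eps,
# with eps ~ N(0, 1), matching the assumed data-generating model.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# The researcher specifies y = X beta + eps and estimates beta by OLS.
fit = sm.OLS(y, sm.add_constant(X)).fit()

# The N(0, sigma^2) error assumption delivers standard errors, t-tests,
# and confidence intervals: the "elegant" machinery Breiman describes.
print(fit.params)      # estimated coefficients
print(fit.pvalues)     # significance levels for each coefficient
print(fit.conf_int())  # 95% confidence intervals
```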
In machine learning, we do not make any assumptions about how the data have been generated:
\[ y \approx m(X) \]
The model (the \(X\)'s) is not selected by the researcher, and probability theory is not required.
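By contrast, here is a sketch of the algorithmic approach, assuming scikit-learn and using Breiman's own random forest (the data-generating function and hyperparameters are illustrative): no functional form is specified, and the fit is judged by predictive accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Simulate data whose true mechanism is treated as unknown by the analyst.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-2.0, 2.0, size=(n, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

# No functional form, no distributional assumption: the algorithm learns
# an approximation m_hat with y ~ m_hat(X).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
m_hat = RandomForestRegressor(n_estimators=200, random_state=0)
m_hat.fit(X_tr, y_tr)

# The model is judged by out-of-sample predictive accuracy,
# not by significance tests on coefficients.
print("test R^2:", m_hat.score(X_te, y_te))
```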
Nonparametric econometrics provides the link between the two: machine learning can be seen as an extension of nonparametric econometrics.
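To make that link concrete, here is a sketch of the Nadaraya-Watson kernel regression estimator, a classic nonparametric econometrics tool, in plain NumPy (the Gaussian kernel and the bandwidth h are illustrative choices): like a machine-learning method, it lets the data determine the shape of \(m(X)\).

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h=0.3):
    """Nadaraya-Watson estimator with a Gaussian kernel K:
    m_hat(x0) = sum_i K((x0 - x_i)/h) * y_i / sum_i K((x0 - x_i)/h)."""
    u = (x_eval[:, None] - x[None, :]) / h   # pairwise scaled distances
    K = np.exp(-0.5 * u ** 2)                # Gaussian kernel weights
    return (K * y[None, :]).sum(axis=1) / K.sum(axis=1)

# Data from a nonlinear mechanism the estimator knows nothing about.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, size=300)
y = np.sin(x) + rng.normal(scale=0.2, size=300)

# Estimate m(x) on a grid without assuming any functional form.
x_grid = np.linspace(0.0, 2.0 * np.pi, 50)
m_hat = nadaraya_watson(x_grid, x, y)
```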
To see the difference between the two “cultures”, we start with parametric modeling in classification problems.