Finite Mixture Models

 

Finite mixture models assume that the outcome of interest is drawn from a mixture of two or more distributions. A key assumption is that we cannot observe a priori which distribution an observation belongs to. For example, suppose that you sample men and women and measure their height. You will probably observe a bimodal distribution, or something close to a normal distribution with an extra hump, because height is approximately normally distributed within each sex and men are, on average, taller than women.

Now, suppose that you forgot to record the sex variable. You could still estimate a mixture of two normals that tells you how a covariate affects height in each group, along with the probability that an observation is male or female. You could also calculate the posterior probability (given the estimated parameters and the observed outcome) that an observation belongs to a particular sex (a class or component, in mixture model terminology). Thus, finite mixture models can also be used to classify observations; when used this way, they are often called "latent class" models.

A popular example of a finite mixture model is the zero-inflated Poisson (ZIP) model. The ZIP model assumes that the observed data come from a mixture of two distributions: a degenerate distribution with all of its mass at zero and a Poisson distribution.
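As a rough illustration of the ZIP idea, the sketch below writes out the mixture probability mass function in Python. The function names (zip_pmf, posterior_structural_zero) and the parameter values are made up for this example; pi_zero stands for the probability of the degenerate component and lam for the Poisson mean.

```python
import math

def zip_pmf(y, pi_zero, lam):
    """P(Y = y) under a zero-inflated Poisson.

    pi_zero: probability of the degenerate (always-zero) component
    lam:     mean of the Poisson component
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from either component.
        return pi_zero + (1.0 - pi_zero) * poisson
    # Positive counts can only come from the Poisson component.
    return (1.0 - pi_zero) * poisson

def posterior_structural_zero(pi_zero, lam):
    """P(observation came from the degenerate component | Y = 0)."""
    return pi_zero / zip_pmf(0, pi_zero, lam)

# Illustrative (assumed) parameter values, not estimates from any data.
pi_zero, lam = 0.3, 2.0
probs = [zip_pmf(y, pi_zero, lam) for y in range(50)]
post_zero = posterior_structural_zero(pi_zero, lam)
```

The same posterior logic applies here as in the height example: an observed zero is assigned a probability of having come from the degenerate component rather than from the Poisson.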

See these slides for an example using NHANES 2009-2010 data. With a simple mixture of two normals and a single predictor, 90 percent of observations are correctly classified by sex.
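The classification idea can be sketched in a few lines of Python. This is not the model from the slides: it uses simulated heights with made-up means and standard deviations, no covariate, and a plain EM algorithm written from scratch. All function and variable names are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_comp1(v, p, mu1, s1, mu2, s2):
    """Posterior probability that value v belongs to component 1."""
    a = p * normal_pdf(v, mu1, s1)
    b = (1.0 - p) * normal_pdf(v, mu2, s2)
    return a / (a + b)

def fit_two_normals(x, n_iter=200):
    """EM for a two-component normal mixture; returns (p, mu1, s1, mu2, s2)."""
    xs = sorted(x)
    n = len(xs)
    # Crude starting values: means at the lower and upper quartiles,
    # common standard deviation, equal mixing weights.
    mu1, mu2 = xs[n // 4], xs[3 * n // 4]
    mean = sum(xs) / n
    s1 = s2 = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for component 1.
        w = [posterior_comp1(v, p, mu1, s1, mu2, s2) for v in x]
        # M-step: weighted means, standard deviations, and mixing weight.
        n1 = sum(w)
        n2 = n - n1
        mu1 = sum(wi * v for wi, v in zip(w, x)) / n1
        mu2 = sum((1.0 - wi) * v for wi, v in zip(w, x)) / n2
        s1 = math.sqrt(sum(wi * (v - mu1) ** 2 for wi, v in zip(w, x)) / n1)
        s2 = math.sqrt(sum((1.0 - wi) * (v - mu2) ** 2 for wi, v in zip(w, x)) / n2)
        p = n1 / n
    return p, mu1, s1, mu2, s2

# Simulated heights in cm; the parameters are assumptions for illustration.
random.seed(0)
women = [random.gauss(162.0, 6.0) for _ in range(1000)]
men = [random.gauss(176.0, 7.0) for _ in range(1000)]
heights = women + men
is_man = [False] * 1000 + [True] * 1000  # "forgotten" labels, used only to check

p, mu1, s1, mu2, s2 = fit_two_normals(heights)

# Component 1 starts at the lower quartile, so it is treated as the shorter group.
predicted_man = [posterior_comp1(v, p, mu1, s1, mu2, s2) < 0.5 for v in heights]
accuracy = sum(a == b for a, b in zip(predicted_man, is_man)) / len(heights)
print(f"share correctly classified: {accuracy:.2f}")
```

The model never sees the sex labels; they are used only at the end to check how often the component with the higher posterior probability matches the true group.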

Zero-inflated Censored Normal(s) Mixtures: I used a mixture of two Tobit normals and a degenerate distribution with mass at zero to predict the EQ-5D preference score from the SF-12. See the paper and appendix for details:

Perraillon M, Shih YT, Thisted RA. "Predicting the EQ-5D Preference Index from the SF-12 Health Survey: A Finite Mixture Approach." Medical Decision Making, October 2015, vol. 35, no. 7, pp. 888-901.

To install the package in Stata, type: ssc install zicen.

  • You can download simulated Stata data to try zicen, or get the sample dataset by typing: net get zicen after installing the command.