Using Nonlinear Models to Enhance Prediction Performance with Incomplete Data

Faraj A. A. Bashir, Hua-Liang Wei

Abstract

A great deal of recent methodological research on missing data analysis has focused on model parameter estimation using modern statistical methods such as maximum likelihood and multiple imputation. These approaches outperform traditional methods such as listwise deletion and mean imputation, and in many applications they yield unbiased parameter estimates. However, they do not work well in some cases, especially for systems with highly nonlinear behaviour. This paper explains linear parameter estimation in the presence of missing data, including an overview of biased and unbiased linear parameter estimation, and provides accessible descriptions of the expectation-maximization (EM) algorithm and the Gauss-Newton method. In particular, it proposes a Gauss-Newton iteration method for nonlinear parameter estimation with missing data. Because the Gauss-Newton method needs initial values, which are hard to obtain when data are missing, the EM algorithm is used to estimate them. In addition, we present two analysis examples to illustrate the performance of the proposed methods.
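The two-stage idea in the abstract (EM supplies starting values, Gauss-Newton refines a nonlinear fit) can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the EM step assumes a bivariate normal model for one predictor x and a response y, with at most one value missing per row; the nonlinear model y = b0 + b1 * x**a is a Box-Tidwell-type power model chosen here purely for illustration; the Gauss-Newton fit uses only the complete rows; and the helper names `em_bivariate_normal`, `gauss_newton`, `residual` and `jacobian`, as well as the noiseless example data, are hypothetical.

```python
import numpy as np


def em_bivariate_normal(data, n_iter=500, tol=1e-10):
    """EM estimates of the mean vector and covariance matrix of a bivariate
    normal sample. `data` is an (n, 2) array where np.nan marks a missing
    entry; each row is assumed to have at most one missing value."""
    data = np.asarray(data, dtype=float)
    obs = ~np.isnan(data)
    # Start from available-case means and variances with zero covariance.
    mu = np.nanmean(data, axis=0)
    sigma = np.diag(np.nanvar(data, axis=0))
    for _ in range(n_iter):
        filled = np.where(obs, data, 0.0)
        extra = np.zeros((2, 2))  # summed conditional variances of missing entries
        for i in range(len(data)):
            for m in (0, 1):  # m indexes the missing coordinate, o the observed one
                if obs[i, m]:
                    continue
                o = 1 - m
                beta = sigma[m, o] / sigma[o, o]
                # E-step: conditional mean and variance given the observed value
                filled[i, m] = mu[m] + beta * (data[i, o] - mu[o])
                extra[m, m] += sigma[m, m] - beta * sigma[m, o]
        # M-step: re-estimate mu and sigma from the expected sufficient statistics
        mu_new = filled.mean(axis=0)
        centred = filled - mu_new
        sigma_new = (centred.T @ centred + extra) / len(data)
        done = (np.allclose(mu_new, mu, rtol=0.0, atol=tol)
                and np.allclose(sigma_new, sigma, rtol=0.0, atol=tol))
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma


def gauss_newton(residual, jacobian, theta0, n_iter=100, tol=1e-12):
    """Minimise sum(residual(theta)**2) by Gauss-Newton with step halving."""
    theta = np.asarray(theta0, dtype=float)
    sse = np.sum(residual(theta) ** 2)
    for _ in range(n_iter):
        # Gauss-Newton direction: least-squares solution of J @ step = -r
        step, *_ = np.linalg.lstsq(jacobian(theta), -residual(theta), rcond=None)
        t = 1.0
        while True:  # halve the step until the sum of squares does not increase
            trial = theta + t * step
            trial_sse = np.sum(residual(trial) ** 2)
            if trial_sse <= sse or t < 1e-8:
                break
            t /= 2.0
        if trial_sse > sse:
            break  # no further decrease along the Gauss-Newton direction
        theta, sse = trial, trial_sse
        if np.linalg.norm(t * step) <= tol * (1.0 + np.linalg.norm(theta)):
            break
    return theta


# Hypothetical noiseless data from y = 2 + 3 * x**0.8, with entries deleted.
x = np.linspace(1.0, 3.0, 30)
data = np.column_stack([x, 2.0 + 3.0 * x**0.8])
data[3::7, 0] = np.nan  # some x values missing
data[5::7, 1] = np.nan  # some y values missing

# Step 1: EM estimates give linear-regression starting values (power a = 1).
mu, sigma = em_bivariate_normal(data)
b1_0 = sigma[0, 1] / sigma[0, 0]
theta0 = np.array([mu[1] - b1_0 * mu[0], b1_0, 1.0])

# Step 2: Gauss-Newton for the assumed model y = b0 + b1 * x**a on complete rows.
cc = ~np.isnan(data).any(axis=1)
xc, yc = data[cc, 0], data[cc, 1]


def residual(theta):
    b0, b1, a = theta
    return yc - (b0 + b1 * xc**a)


def jacobian(theta):
    # Derivatives of the residual with respect to (b0, b1, a)
    _, b1, a = theta
    return -np.column_stack([np.ones_like(xc), xc**a, b1 * xc**a * np.log(xc)])


theta = gauss_newton(residual, jacobian, theta0)
print("start:", theta0, "estimate:", theta)
```

Starting from a = 1 makes the EM-based linear regression the first guess, so Gauss-Newton only has to move the exponent and refit the two coefficients. The step halving is a common safeguard added in this sketch; plain Gauss-Newton always takes the full step.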



Paper Citation


in Harvard Style

Bashir, F. A. A. and Wei, H. (2015). Using Nonlinear Models to Enhance Prediction Performance with Incomplete Data. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 141-148. DOI: 10.5220/0005157201410148


in Bibtex Style

@conference{icpram15,
author={Faraj A. A. Bashir and Hua-Liang Wei},
title={Using Nonlinear Models to Enhance Prediction Performance with Incomplete Data},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={141-148},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005157201410148},
isbn={978-989-758-076-5},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI  - Using Nonlinear Models to Enhance Prediction Performance with Incomplete Data
SN  - 978-989-758-076-5
AU  - Bashir, F. A. A.
AU  - Wei, H.
PY  - 2015
SP  - 141
EP  - 148
DO  - 10.5220/0005157201410148
ER  - 