A TWO-STAGED APPROACH FOR ASSESSING FOR THE QUALITY OF INTERNET SURVEY DATA

Chun-Hung Cheng, Chon-Huat Goh, Anita Lee-Post

Abstract

In this work, we propose a procedure to detect errors in data collected through Internet surveys. Although several approaches have been developed, they suffer from many limitations. For instance, many approaches require prior knowledge of the data and hence need different procedures for different applications. Others have to test a large number of parameter values and hence are not very efficient. To overcome the limitations of existing approaches, we try to understand the nature of this data quality problem and establish its linkage to the travelling salesman problem (TSP). Based on the TSP problem structure, we propose a two-staged approach based on a genetic algorithm to help ensure the quality of Internet survey data.
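
The abstract does not spell out the two stages, so the following Python sketch is only one plausible reading and not the authors' procedure. It assumes that survey records are numeric tuples, that dissimilarity is the Manhattan distance, that a first stage orders the records so similar ones are adjacent (a TSP-style shortest path found by a small permutation genetic algorithm with order crossover and swap mutation), and that a second stage flags records that are far from every neighbour in that ordering. All function names, parameters, and the threshold are hypothetical.

```python
import random


def distance(a, b):
    # Assumed dissimilarity: Manhattan distance between two numeric records.
    return sum(abs(x - y) for x, y in zip(a, b))


def path_length(order, records):
    # Total distance along the ordering (an open path, not a closed tour).
    return sum(distance(records[order[i]], records[order[i + 1]])
               for i in range(len(order) - 1))


def order_crossover(p1, p2, rng):
    # Order crossover: copy a slice from p1, fill the rest in p2's order.
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = [g for g in p2 if g not in kept]
    k = 0
    for pos in range(n):
        if child[pos] is None:
            child[pos] = fill[k]
            k += 1
    return child


def swap_mutation(order, rng, rate=0.2):
    # With probability `rate`, swap two positions in a copy of the ordering.
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def ga_order(records, pop_size=30, generations=200, seed=0):
    # Assumed stage 1: search for a short path through all records.
    rng = random.Random(seed)
    n = len(records)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: path_length(o, records))
        survivors = pop[:pop_size // 2]  # keep the better half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        pop = survivors + children
    return min(pop, key=lambda o: path_length(o, records))


def flag_suspects(order, records, threshold):
    # Assumed stage 2: flag records whose nearest neighbour in the
    # ordering is still farther away than `threshold`.
    suspects = []
    for pos, idx in enumerate(order):
        gaps = []
        if pos > 0:
            gaps.append(distance(records[idx], records[order[pos - 1]]))
        if pos < len(order) - 1:
            gaps.append(distance(records[idx], records[order[pos + 1]]))
        if gaps and min(gaps) > threshold:
            suspects.append(idx)
    return suspects


if __name__ == "__main__":
    # Four similar records and one implausible record (index 4).
    records = [(1, 1), (1, 2), (2, 1), (2, 2), (50, 50)]
    order = ga_order(records)
    print(order, flag_suspects(order, records, threshold=10))
```

In this sketch the threshold is still a hand-picked parameter, which is exactly the kind of tuning the abstract criticises in earlier approaches; it is used here only to keep the illustration short.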



Paper Citation


in Harvard Style

Cheng C., Goh C. and Lee-Post A. (2007). A TWO-STAGED APPROACH FOR ASSESSING FOR THE QUALITY OF INTERNET SURVEY DATA. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 3: WEBIST, ISBN 978-972-8865-79-5, pages 77-83. DOI: 10.5220/0001264200770083


in Bibtex Style

@conference{webist07,
author={Chun-Hung Cheng and Chon-Huat Goh and Anita Lee-Post},
title={A TWO-STAGED APPROACH FOR ASSESSING FOR THE QUALITY OF INTERNET SURVEY DATA},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 3: WEBIST},
year={2007},
pages={77-83},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001264200770083},
isbn={978-972-8865-79-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 3: WEBIST
TI - A TWO-STAGED APPROACH FOR ASSESSING FOR THE QUALITY OF INTERNET SURVEY DATA
SN - 978-972-8865-79-5
AU - Cheng C.
AU - Goh C.
AU - Lee-Post A.
PY - 2007
SP - 77
EP - 83
DO - 10.5220/0001264200770083