5 CONCLUSION
In this paper, we presented a method for building predictive
model trees from data streams with incomplete observations.
The approach aims to adjust the estimation of missing values
in order to support the construction of the model tree.
The method has been implemented in a Java prototype, and
its effectiveness was demonstrated and discussed on various
data streams.
In future work, we will apply our method to large real-world
data streams related to e-commerce and live sensor management.
Moreover, we plan to improve the estimation method by using
other heuristics such as genetic algorithms.
ACKNOWLEDGEMENTS
The project is supported by a grant from the Ministry of
Economy and External Trade, Grand-Duchy of Luxembourg,
under the RDI Law.
Moreover, this work was carried out in partnership with the
infinAIt Solutions S.A. company (see footnote 3), and we would
like to thank Gero Vierke and Helmut Rieder for their help.
3 http://infinait.eu
DATA 2015 - 4th International Conference on Data Management Technologies and Applications