Data Mining for Automatic Linguistic Description of Data - Textual Weather Prediction as a Classification Problem

J. Janeiro; I. Rodriguez-Fdez; A. Ramos-Soto; A. Bugarín

doi:10.5220/0005282905560562

2015

Abstract

In this paper we present the results and performance of five different classifiers applied to the task of automatically generating textual weather forecasts from raw meteorological data. The methodology applies to template-based forecasts, which can be transformed into an intermediate language that can be directly mapped to classes (or values of variables). Experimental validation and tests of statistical significance were conducted using nine datasets from three real, publicly accessible meteorological websites, showing that RandomForest, IBk and PART are statistically the best classifiers for this task in terms of F-Score, with RandomForest providing slightly better results.
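
As a rough illustration of the idea in the abstract, the sketch below treats a template-based forecast as a set of class values and scores predictions with the F-Score. The template, the slot names (`sky`, `wind`), the class values, and the helper functions are all hypothetical and chosen only for this example; they are not taken from the paper or from any of the websites it uses.

```python
# Hypothetical template: each slot is a variable whose value is a class label.
TEMPLATE = "{sky} skies with {wind} winds."


def render(labels):
    """Fill the template with predicted class values to obtain forecast text."""
    return TEMPLATE.format(**labels).capitalize()


def f_score(true, pred, positive):
    """F-Score (harmonic mean of precision and recall) for one class value."""
    pairs = list(zip(true, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for the hypothetical "sky" variable (not real data).
true_sky = ["clear", "cloudy", "cloudy", "rainy"]
pred_sky = ["clear", "cloudy", "rainy", "rainy"]

print(render({"sky": pred_sky[1], "wind": "light"}))
print(f_score(true_sky, pred_sky, "cloudy"))
```

Under this framing, any classifier that predicts one class value per template slot produces a forecast text once the slots are filled in, and each slot can be evaluated separately with a per-class measure such as the one above.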

Paper Citation

in Harvard Style

Janeiro J., Rodriguez-Fdez I., Ramos-Soto A. and Bugarín A. (2015). Data Mining for Automatic Linguistic Description of Data - Textual Weather Prediction as a Classification Problem. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-074-1, pages 556-562. DOI: 10.5220/0005282905560562

in Bibtex Style

@conference{icaart15,
author={J. Janeiro and I. Rodriguez-Fdez and A. Ramos-Soto and A. Bugarín},
title={Data Mining for Automatic Linguistic Description of Data - Textual Weather Prediction as a Classification Problem},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2015},
pages={556-562},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005282905560562},
isbn={978-989-758-074-1},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Data Mining for Automatic Linguistic Description of Data - Textual Weather Prediction as a Classification Problem
SN - 978-989-758-074-1
AU - Janeiro J.
AU - Rodriguez-Fdez I.
AU - Ramos-Soto A.
AU - Bugarín A.
PY - 2015
SP - 556
EP - 562
DO - 10.5220/0005282905560562