MINING TIMED SEQUENCES WITH TOM4L FRAMEWORK

Nabil Benayadi, Marc Le Goc

2010

Abstract

We introduce the problem of mining sequential patterns in large databases of sequences using a stochastic approach. An example of the patterns we are interested in is: 50% of engine stops in a car occur between 0 and 2 minutes after a lack of gas in the engine is observed, which itself occurs between 0 and 1 minute after the fuel tank becomes empty. We call these patterns “signatures”. Previous research has considered equivalent patterns, but such work has three main problems: (1) the sensitivity of the algorithms to the values of their parameters, (2) the excessively large number of discovered patterns, and (3) the discovered patterns consider only the “after” relation (succession in time) and omit temporal constraints between the elements of a pattern. To address these issues, we present the TOM4L process (Timed Observations Mining for Learning), which uses a stochastic representation of a given set of sequences, to which inductive reasoning coupled with abductive reasoning is applied to reduce the search space. Results obtained from an application to a very complex real-world system are also presented to show the operational character of the TOM4L process.
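To make the engine-stop example concrete, the following minimal sketch shows how the ratio attached to such a signature could be counted on a single timed sequence. It is not the TOM4L algorithm, which relies on a stochastic representation and inductive/abductive reasoning; it only illustrates what a statement such as "50% of engine stops follow this timed chain" means. The function names, the event class labels, and the sample data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Observation = Tuple[float, str]  # (timestamp in minutes, event class); hypothetical format


def _has_timed_prefix(seq: Sequence[Observation], t: float,
                      chain: Sequence[str],
                      windows: Sequence[Tuple[float, float]]) -> bool:
    """True if the classes in `chain` occur before time `t`, each within
    the delay window that separates it from its successor."""
    if not chain:
        return True
    lo, hi = windows[-1]
    target = chain[-1]
    return any(
        name == target
        and lo <= t - ts <= hi
        and _has_timed_prefix(seq, ts, chain[:-1], windows[:-1])
        for ts, name in seq
    )


def signature_ratio(seq: Sequence[Observation],
                    chain: Sequence[str],
                    windows: Sequence[Tuple[float, float]]) -> float:
    """Fraction of occurrences of chain[-1] that end a timed occurrence of `chain`."""
    if len(windows) != len(chain) - 1:
        raise ValueError("need one delay window per consecutive pair of classes")
    ends = [ts for ts, name in seq if name == chain[-1]]
    if not ends:
        return 0.0
    hits = sum(_has_timed_prefix(seq, ts, chain[:-1], windows) for ts in ends)
    return hits / len(ends)


# Invented sample data, for illustration only.
example_seq: List[Observation] = [
    (0.0, "tank_empty"), (0.5, "lack_of_gas"), (2.0, "engine_stop"),
    (10.0, "engine_stop"),
    (20.0, "tank_empty"), (23.0, "lack_of_gas"), (24.0, "engine_stop"),
    (30.0, "tank_empty"), (30.8, "lack_of_gas"), (31.5, "engine_stop"),
]
example_chain = ["tank_empty", "lack_of_gas", "engine_stop"]
example_windows = [(0.0, 1.0), (0.0, 2.0)]  # delays in minutes between consecutive classes

ratio = signature_ratio(example_seq, example_chain, example_windows)
```

On this invented sequence, two of the four engine stops end a chain that satisfies both delay windows, so `ratio` is 0.5. This brute-force count rescans the sequence for every candidate and is meant only to fix the meaning of the pattern, not to scale to large databases.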



Paper Citation


in Harvard Style

Benayadi N. and Le Goc M. (2010). MINING TIMED SEQUENCES WITH TOM4L FRAMEWORK. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8425-05-8, pages 111-120. DOI: 10.5220/0002958401110120


in Bibtex Style

@conference{iceis10,
author={Nabil Benayadi and Marc Le Goc},
title={MINING TIMED SEQUENCES WITH TOM4L FRAMEWORK},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2010},
pages={111-120},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002958401110120},
isbn={978-989-8425-05-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - MINING TIMED SEQUENCES WITH TOM4L FRAMEWORK
SN - 978-989-8425-05-8
AU - Benayadi N.
AU - Le Goc M.
PY - 2010
SP - 111
EP - 120
DO - 10.5220/0002958401110120