MINING THE RELATIONSHIPS IN THE FORM OF THE PREDISPOSING FACTORS AND CO-INCIDENT FACTORS AMONG NUMERICAL DYNAMIC ATTRIBUTES IN TIME SERIES DATA SET BY USING THE COMBINATION OF SOME EXISTING TECHNIQUES

Suwimon Kooptiwoot, M. Abdus Salam

2004

Abstract

Temporal mining is a natural extension of data mining that adds the ability to discover interesting patterns, infer relationships of contextual and temporal proximity, and possibly suggest cause-effect associations. Temporal mining covers a wide range of paradigms for knowledge modeling and discovery. A common practice is to discover frequent sequences and patterns of a single variable. In this paper we present a new algorithm that combines several existing ideas: the reference event proposed in (Bettini, Wang et al. 1998), the event detection technique proposed in (Guralnik and Srivastava 1999), the large fraction proposed in (Mannila, Toivonen et al. 1997), and the causal inference proposed in (Blum 1982). We use these ideas to build an algorithm for discovering multi-variable sequences in the form of the predisposing factors and co-incident factors of a reference event of interest. We define an event as a change in the data, in either the positive or the negative direction, that exceeds a threshold value. From these patterns we infer predisposing and co-incident factors with respect to a reference variable. For this purpose we study Open Source Software data collected from the SourceForge website. Out of more than 240 attributes we consider only thirteen time-dependent attributes: Page-views, Download, Bugs0, Bugs1, Support0, Support1, Patches0, Patches1, Tracker0, Tracker1, Tasks0, Tasks1 and CVS. These attributes indicate the degree and patterns of project activity over the course of a project's progress. The number of downloads is a good indication of a project's progress, so we use Download as the reference attribute. We also test our algorithm on four synthetic data sets with up to 50% noise. The results show that the algorithm works well and tolerates noisy data.
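The abstract gives only an outline of the procedure, so the following is a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes that an event is a one-step change larger than a threshold, that a predisposing factor is the dominant direction of change one step before each reference event, that a co-incident factor is the dominant direction at the same step, and that the "large fraction" is a simple share cutoff. The function names, the `min_fraction` parameter, the "steady" label and the sample numbers are all hypothetical.

```python
from collections import Counter


def detect_events(series, threshold):
    """Label each step 'up', 'down' or 'steady' by comparing the change
    from the previous value against the threshold (assumed event rule)."""
    events = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > threshold:
            events.append("up")
        elif delta < -threshold:
            events.append("down")
        else:
            events.append("steady")
    return events


def dominant_direction(labels, min_fraction):
    """Return 'up' or 'down' if that direction makes up at least
    min_fraction of the labels, otherwise None."""
    if not labels:
        return None
    counts = Counter(labels)
    for direction in ("up", "down"):
        if counts[direction] / len(labels) >= min_fraction:
            return direction
    return None


def mine_factors(data, reference, ref_direction, threshold, min_fraction=0.6):
    """For every non-reference attribute, report its dominant direction one
    step before (predisposing) and at the same step as (co-incident) each
    reference event."""
    events = {name: detect_events(values, threshold) for name, values in data.items()}
    ref_steps = [t for t, e in enumerate(events[reference]) if e == ref_direction]
    factors = {}
    for name, labels in events.items():
        if name == reference:
            continue
        before = [labels[t - 1] for t in ref_steps if t >= 1]
        during = [labels[t] for t in ref_steps]
        factors[name] = {
            "predisposing": dominant_direction(before, min_fraction),
            "co_incident": dominant_direction(during, min_fraction),
        }
    return factors


# Made-up illustrative values, not SourceForge data.
data = {
    "Download": [10, 10, 20, 20, 30, 30, 40],
    "Page-views": [5, 15, 15, 25, 25, 35, 35],
    "CVS": [1, 1, 5, 5, 9, 9, 13],
}
result = mine_factors(data, reference="Download", ref_direction="up", threshold=3)
print(result)
```

In this toy data, Page-views rises one step before every rise in Download, and CVS rises at the same step as Download, so the sketch reports the first as a predisposing factor and the second as a co-incident factor. With `min_fraction` at or below 0.5 both directions could qualify at once; the sketch then returns "up" first, which is an arbitrary choice.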

References

  1. Agrawal, R. and Srikant R., 1995. Mining Sequential Patterns. In Proceedings of the IEEE International Conference on Data Engineering, Taipei, Taiwan.
  2. Bettini, C., Wang S., et al. 1998. Discovering Frequent Event Patterns with Multiple Granularities in Time Sequences. In IEEE Transactions on Knowledge and Data Engineering 10(2).
  3. Blum, R. L., 1982. Discovery, Confirmation and Interpretation of Causal Relationships from a Large Time-Oriented Clinical Database: The Rx Project. Computers and Biomedical Research 15(2): 164-187.
  4. Dasgupta, D. and Forrest S., 1995. Novelty Detection in Time Series Data using Ideas from Immunology. In Proceedings of the 5th International Conference on Intelligent Systems, Reno, Nevada.
  5. Guralnik, V. and Srivastava J., 1999. Event Detection from Time Series Data. In KDD-99, San Diego, CA USA.
  6. Hirano, S., Sun X., et al., 2001. Analysis of Time-series Medical Databases Using Multiscale Structure Matching and Rough Sets-Based Clustering Technique. In IEEE International Fuzzy Systems Conference.
  7. Hirano, S. and Tsumoto S., 2001. A Knowledge-Oriented Clustering Technique Based on Rough Sets. In 25th Annual International Computer Software and Applications Conference (COMPSAC'01), Chicago, Illinois.
  8. Hirano, S. and Tsumoto S., 2002. Mining Similar Temporal Patterns in Long Time-Series Data and Its Application to Medicine. In IEEE: 219-226.
  9. Kantardzic, M., 2003. Data Mining Concepts, Models, Methods, and Algorithms. USA, IEEE Press.
  10. Keogh, E., Chu S., et al., 2001. An Online Algorithm for Segmenting Time Series. In Proceedings of IEEE International Conference on Data Mining, 2001.
  11. Keogh, E., Lonardi S., et al., 2002. Finding Surprising Patterns in a Time Series Database in Linear Time and Space. In Proceedings of The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), Edmonton, Alberta, Canada.
  12. Last, M., Klein Y., et al., 2001. Knowledge Discovery in Time Series Databases. In IEEE Transactions on Systems, Man, and Cybernetics 31(1): 160-169.
  13. Lu, H., Han J., et al., 1998. Stock Movement Prediction and N-Dimensional Inter-Transaction Association Rules. In Proc. of 1998 SIGMOD'98 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'98), Seattle, Washington.
  14. Mannila, H., Toivonen H., et al., 1997. Discovery of frequent episodes in event sequences. In Data Mining and Knowledge Discovery 1(3): 258-289.
  15. Roddick, J. F. and Spiliopoulou M., 2002. A Survey of Temporal Knowledge Discovery Paradigms and Methods. In IEEE Transactions on Knowledge and Data Engineering 14(4): 750-767.
  16. Salam, M. A., 2001. Quasi Fuzzy Paths in Semantic Networks. In Proceedings 10th IEEE International Conference on Fuzzy Systems, Melbourne, Australia.
  17. Tung, A., Lu H., et al., 1999. Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining [KDD 99], San Diego, CA.
  18. Ueda, N. and Suzuki S., 1990. A Matching Algorithm of Deformed Planar Curves Using Multiscale Convex/Concave Structures. In IEICE Transactions on Information and Systems J73-D-II(7): 992-1000.
  19. Weiss, S. M. and Indurkhya N., 1998. Predictive Data Mining. San Francisco, California, Morgan Kaufmann Publishers, Inc.


Paper Citation


in Harvard Style

Kooptiwoot S. and Abdus Salam M. (2004). MINING THE RELATIONSHIPS IN THE FORM OF THE PREDISPOSING FACTORS AND CO-INCIDENT FACTORS AMONG NUMERICAL DYNAMIC ATTRIBUTES IN TIME SERIES DATA SET BY USING THE COMBINATION OF SOME EXISTING TECHNIQUES. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-00-7, pages 327-334. DOI: 10.5220/0002625903270334


in Bibtex Style

@conference{iceis04,
author={Suwimon Kooptiwoot and M. Abdus Salam},
title={MINING THE RELATIONSHIPS IN THE FORM OF THE PREDISPOSING FACTORS AND CO-INCIDENT FACTORS AMONG NUMERICAL DYNAMIC ATTRIBUTES IN TIME SERIES DATA SET BY USING THE COMBINATION OF SOME EXISTING TECHNIQUES},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2004},
pages={327-334},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002625903270334},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - MINING THE RELATIONSHIPS IN THE FORM OF THE PREDISPOSING FACTORS AND CO-INCIDENT FACTORS AMONG NUMERICAL DYNAMIC ATTRIBUTES IN TIME SERIES DATA SET BY USING THE COMBINATION OF SOME EXISTING TECHNIQUES
SN - 972-8865-00-7
AU - Kooptiwoot S.
AU - Abdus Salam M.
PY - 2004
SP - 327
EP - 334
DO - 10.5220/0002625903270334