FUZZY CONCEPT LATTICE-BASED APPROACH FOR REACTIVE MOTIFS DISCOVERY
Thanapat Kangkachit, Kitsana Waiyamai
2012
Abstract
Reactive motifs are short conserved regions discovered from the binding and catalytic sites of enzyme sequences. Because they are extracted directly from the sites where the chemical reaction mechanism occurs, reactive motifs carry more biological meaning than statistics-based motifs. The main problem in discovering reactive motifs is that only 4.94% of enzyme sequences contain site information. To overcome this problem, we present a fuzzy concept lattice-based (FCL-based) method that discovers more general reactive motifs by incorporating biochemical knowledge. Fuzzy concept lattices are used to represent both binary and multi-valued biochemical knowledge. The fuzzy concept lattice Join operator is applied to determine complete substitution groups, which yield more general reactive motifs. Experiments compare three methods of determining complete substitution groups: FCL-based, concept lattice-based (CL-based), and similarity-based. Experimental results show that the FCL-based method significantly outperforms the other methods in terms of coverage and F-measure with the SVM learning algorithm. Therefore, the fuzzy concept lattice provides more efficient computational support for the complete substitution group operation than existing methods.
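The idea of deriving a complete substitution group with a lattice Join can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified one-sided fuzzy context (crisp sets of amino acids, fuzzy property degrees), and the property names, the membership degrees in `FUZZY_CONTEXT`, and the function names are all hypothetical placeholders. Under these assumptions, the join of the concepts generated by the amino acids observed at a motif position is the smallest concept containing all of them, and its extent serves as the generalized substitution group.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical fuzzy context: amino acid -> {property: membership degree in [0, 1]}.
# These degrees are illustrative placeholders, not values taken from the paper.
FUZZY_CONTEXT: Dict[str, Dict[str, float]] = {
    "D": {"charged": 1.0, "polar": 1.0, "small": 0.7, "hydrophobic": 0.0},
    "E": {"charged": 1.0, "polar": 1.0, "small": 0.2, "hydrophobic": 0.0},
    "K": {"charged": 1.0, "polar": 0.8, "small": 0.0, "hydrophobic": 0.3},
    "N": {"charged": 0.0, "polar": 1.0, "small": 0.7, "hydrophobic": 0.0},
    "L": {"charged": 0.0, "polar": 0.0, "small": 0.1, "hydrophobic": 1.0},
    "V": {"charged": 0.0, "polar": 0.0, "small": 0.6, "hydrophobic": 0.9},
}


def shared_intent(observed: Iterable[str],
                  context: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Fuzzy intent shared by all observed amino acids (per-property minimum)."""
    observed = list(observed)
    if not observed:
        raise ValueError("at least one amino acid is required")
    properties = context[observed[0]].keys()
    return {p: min(context[a][p] for a in observed) for p in properties}


def extent_of(intent: Dict[str, float],
              context: Dict[str, Dict[str, float]]) -> FrozenSet[str]:
    """All amino acids whose degrees meet or exceed the intent on every property."""
    return frozenset(
        a for a, degrees in context.items()
        if all(degrees[p] >= d for p, d in intent.items())
    )


def complete_substitution_group(
    observed: Iterable[str],
    context: Dict[str, Dict[str, float]] = FUZZY_CONTEXT,
) -> FrozenSet[str]:
    """Extent of the join (least common super-concept) of the observed amino acids."""
    return extent_of(shared_intent(observed, context), context)


if __name__ == "__main__":
    # D and K share "charged" fully and "polar" to degree 0.8; E also satisfies both.
    print(sorted(complete_substitution_group({"D", "K"})))  # ['D', 'E', 'K']
```

In this toy context, observing only D and K at a position generalizes the group to include E, which is the kind of generalization the abstract attributes to the Join operator. A binary concept lattice corresponds to the special case in which every degree is 0 or 1.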
Paper Citation
in Harvard Style
Kangkachit T. and Waiyamai K. (2012). FUZZY CONCEPT LATTICE-BASED APPROACH FOR REACTIVE MOTIFS DISCOVERY. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012), ISBN 978-989-8425-90-4, pages 326-330. DOI: 10.5220/0003787503260330
in Bibtex Style
@conference{bioinformatics12,
author={Thanapat Kangkachit and Kitsana Waiyamai},
title={FUZZY CONCEPT LATTICE-BASED APPROACH FOR REACTIVE MOTIFS DISCOVERY},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)},
year={2012},
pages={326-330},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003787503260330},
isbn={978-989-8425-90-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2012)
TI - FUZZY CONCEPT LATTICE-BASED APPROACH FOR REACTIVE MOTIFS DISCOVERY
SN - 978-989-8425-90-4
AU - Kangkachit T.
AU - Waiyamai K.
PY - 2012
SP - 326
EP - 330
DO - 10.5220/0003787503260330