A Novel Technique of Feature Extraction Based on Local and Global Similarity Measure for Protein Classification

Neha Bharill, Aruna Tiwari

2015

Abstract

This paper proposes a novel approach for extracting features from protein sequences. The approach extracts only six features per protein sequence. The features are computed globally, from the probabilities of occurrence of the amino acids at different positions of the sequences within a superfamily, and locally, from the membership of those amino acids in the six exchange groups. These features are then used as input to a neural network learning algorithm, the Boolean-Like Training Algorithm (BLTA). The BLTA classifier is used to classify protein sequences obtained from the Protein Information Resource (PIR). To investigate the efficacy of the proposed feature extraction approach, experiments are performed on two superfamilies, Ras and Globin. Across tenfold cross-validation, the highest classification accuracy achieved by the proposed approach is 94.32±3.52, with a computational time of 6.54±0.10 s, which is better than the classification accuracies achieved by other approaches. The experimental results demonstrate that the proposed approach extracts a minimal number of features for each protein sequence and, as a result, improves classification accuracy and reduces computational time for protein sequence classification in comparison with other well-known feature extraction approaches.
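The idea of combining a global, position-wise probability with a local exchange-group membership can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the abstract does not give the formula, so the aggregation rule (sum of positional group probabilities, normalized by sequence length) and the helper names `positional_group_probs` and `six_features` are assumptions. The six exchange groups are the grouping commonly used in the protein classification literature.

```python
from collections import Counter

# Six amino-acid exchange groups as commonly used in the literature.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}
GROUP_NAMES = list(EXCHANGE_GROUPS)
GROUP_OF = {aa: g for g, members in EXCHANGE_GROUPS.items() for aa in members}


def positional_group_probs(family):
    """Global step (assumed form): for each position p, the fraction of
    sequences in the superfamily whose residue at p falls in each group."""
    max_len = max(len(seq) for seq in family)
    probs = []
    for p in range(max_len):
        counts = Counter(
            GROUP_OF[seq[p]]
            for seq in family
            if p < len(seq) and seq[p] in GROUP_OF
        )
        total = sum(counts.values())
        probs.append(
            {g: (counts[g] / total if total else 0.0) for g in GROUP_NAMES}
        )
    return probs


def six_features(seq, probs):
    """Local step (assumed form): for each exchange group, sum the positional
    probabilities at the positions where this sequence has a residue of that
    group, then normalize by sequence length. Returns a 6-element list."""
    feats = dict.fromkeys(GROUP_NAMES, 0.0)
    for p, aa in enumerate(seq):
        g = GROUP_OF.get(aa)  # residues outside the 20 standard ones are skipped
        if g is not None and p < len(probs):
            feats[g] += probs[p][g]
    n = len(seq) or 1
    return [feats[g] / n for g in GROUP_NAMES]


# Toy usage with made-up sequences (not PIR data).
toy_family = ["HKC", "HRC"]
toy_probs = positional_group_probs(toy_family)
toy_features = six_features("HKC", toy_probs)
```

Whatever the sequence length, the result is a fixed six-element vector, which is what lets a small classifier such as BLTA take it directly as input.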



Paper Citation


in Harvard Style

Bharill N. and Tiwari A. (2015). A Novel Technique of Feature Extraction Based on Local and Global Similarity Measure for Protein Classification. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015) ISBN 978-989-758-070-3, pages 219-224. DOI: 10.5220/0005283702190224


in Bibtex Style

@conference{bioinformatics15,
author={Neha Bharill and Aruna Tiwari},
title={A Novel Technique of Feature Extraction Based on Local and Global Similarity Measure for Protein Classification},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)},
year={2015},
pages={219-224},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005283702190224},
isbn={978-989-758-070-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)
TI - A Novel Technique of Feature Extraction Based on Local and Global Similarity Measure for Protein Classification
SN - 978-989-758-070-3
AU - Bharill N.
AU - Tiwari A.
PY - 2015
SP - 219
EP - 224
DO - 10.5220/0005283702190224