Authors:
Neha Bharill and Aruna Tiwari
Affiliation:
Indian Institute of Technology, India
Keyword(s):
Bioinformatics, Probability-based Features, Position-specific Information, Binary Feed Forward Neural Network, Protein Classification.
Related Ontology Subjects/Areas/Topics:
Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Databases and Data Management; Genomics and Proteomics; Pattern Recognition, Clustering and Classification; Sequence Analysis; Structural Bioinformatics
Abstract:
This paper proposes a novel approach for extracting features from protein sequences. The approach extracts only six features per protein sequence. The features are computed by globally considering the probabilities of occurrence of amino acids at different positions of the sequences within a superfamily, where each amino acid locally belongs to one of six exchange groups. These features are then used as input to a neural network learning algorithm named the Boolean-Like Training Algorithm (BLTA). The BLTA classifier is used to classify protein sequences obtained from the Protein Information Resource (PIR). To investigate the efficacy of the proposed feature extraction approach, experiments are performed on two superfamilies, Ras and Globin. Across tenfold cross-validation, the highest classification accuracy achieved by the proposed approach is 94.32±3.52, with a computational time of 6.54±0.10 s, which is remarkably better than the classification accuracies achieved by other approaches. The experimental results demonstrate that the proposed approach extracts a minimal number of features for each protein sequence. It therefore yields a considerable improvement in classification accuracy and requires less computational time for protein sequence classification than other well-known feature extraction approaches.
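The abstract does not give the exact feature formulas, so the following Python sketch only illustrates the general idea of deriving six position-aware, exchange-group features per sequence. It is not the authors' method. The exchange-group membership below follows the grouping commonly used in the protein classification literature and is an assumption here, as are the function names `position_group_probabilities` and `six_features` and the choice to normalise by sequence length.

```python
from collections import Counter

# Assumed exchange groups (the abstract does not list them explicitly).
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}
GROUP_NAMES = sorted(EXCHANGE_GROUPS)
RESIDUE_TO_GROUP = {aa: g for g, aas in EXCHANGE_GROUPS.items() for aa in aas}


def position_group_probabilities(family):
    """Estimate P(group | position) from all sequences of one superfamily."""
    max_len = max(len(seq) for seq in family)
    probs = []
    for i in range(max_len):
        # Count exchange groups observed at position i (unknown residues skipped).
        counts = Counter(
            RESIDUE_TO_GROUP[seq[i]]
            for seq in family
            if i < len(seq) and seq[i] in RESIDUE_TO_GROUP
        )
        total = sum(counts.values())
        probs.append({g: (counts[g] / total if total else 0.0) for g in GROUP_NAMES})
    return probs


def six_features(sequence, probs):
    """Hypothetical feature rule: for each group, sum the position-specific
    probabilities at positions where the sequence carries that group,
    then divide by the sequence length. Returns a list of six floats."""
    feats = dict.fromkeys(GROUP_NAMES, 0.0)
    for i, aa in enumerate(sequence):
        g = RESIDUE_TO_GROUP.get(aa)
        if g is not None and i < len(probs):
            feats[g] += probs[i][g]
    n = len(sequence) or 1
    return [feats[g] / n for g in GROUP_NAMES]


# Toy usage with made-up two-residue "sequences" (not real PIR data).
toy_family = ["HK", "HD"]
toy_probs = position_group_probabilities(toy_family)
toy_features = six_features("HK", toy_probs)
```

Whatever the exact formula, the resulting six-element vector is what a classifier such as BLTA would take as input. For a binary network, the real-valued features would presumably need to be binarised first, a step the abstract does not describe.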