Authors: Neha Bharill and Aruna Tiwari
        
        
Affiliation: Indian Institute of Technology, India
        
        
        
        
        
Keyword(s): Bioinformatics, Probability-based Features, Position-specific Information, Binary Feed Forward Neural Network, Protein Classification.
        
        
            
Related Ontology Subjects/Areas/Topics: Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Databases and Data Management; Genomics and Proteomics; Pattern Recognition, Clustering and Classification; Sequence Analysis; Structural Bioinformatics
            
        
        
            
Abstract: This paper proposes a novel approach for extracting features from protein sequences. The approach extracts only six features per protein sequence. These features are computed by globally considering the probabilities of occurrence of amino acids, each of which locally belongs to one of the six exchange groups, at different positions of the sequences within the superfamily. The features are then used as input to a neural network learning algorithm called the Boolean-Like Training Algorithm (BLTA). The BLTA classifier is used to classify protein sequences obtained from the Protein Information Resource (PIR). To investigate the efficacy of the proposed feature extraction approach, experiments are performed on two superfamilies, Ras and Globin. Across tenfold cross-validation, the highest classification accuracy achieved by the proposed approach is 94.32±3.52, with a computational time of 6.54±0.10 s, which is markedly better than the classification accuracies achieved by other approaches. The experimental results demonstrate that the proposed approach extracts the minimum number of features for each protein sequence. As a result, it yields a considerable improvement in classification accuracy and takes less computational time for protein sequence classification than other well-known feature extraction approaches.