Authors:
Vikash Kumar (1); Ashish Ranjan (2); Deng Cao (3); Gopalakrishnan Krishnasamy (3) and Akshay Deepak (1)
Affiliations:
(1) National Institute of Technology Patna, Patna, India; (2) ITER, Siksha ’O’ Anusandhan Deemed to be University, Bhubaneswar, India; (3) Associate Professor, Department of Mathematics & Computer Science, Central State University, Wilberforce, Ohio, U.S.A.
Keyword(s):
Protein Sequence, Convolutional Neural Network, Protein Sub-Sequence, Consistency Factor.
Abstract:
Determining protein function from the study of protein sub-sequences is a complex problem. Little literature addresses it, and a broad survey of the literature shows that existing approaches are biased toward full-length protein sequences. In this paper, a CNN-based architecture is introduced to detect motif information in a sub-sequence and predict its function. The functional inferences for sub-sequences are then used to facilitate the functional annotation of the full-length protein sequence. The results suggest that sub-sequence-based protein studies merit further exploration. Compared with ProtVecGen-Plus, a multi-segment + LSTM approach, the proposed method shows an improvement of +1.24% and +4.66% for the biological process (BP) and molecular function (MF) sub-ontologies, respectively. The proposed method also outperformed the hybrid ProtVecGen-Plus + MLDA by a margin of +3.45% for the MF dataset, while it ranked second for the BP dataset. Overall, the proposed method produced better results for significantly long protein sequences (sequence length > 500 amino acids).
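The two-stage idea described in the abstract (predict per sub-sequence, then annotate the full-length sequence) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the window length, the stride, or how sub-sequence predictions are combined, so the function names `split_into_subsequences` and `aggregate_scores`, the overlapping fixed-length windows, and the plain averaging rule are hypothetical stand-ins and not the authors' method. The CNN itself is omitted; its per-sub-sequence scores are taken as given input.

```python
from typing import Dict, List


def split_into_subsequences(sequence: str, window: int, stride: int) -> List[str]:
    """Cut a protein sequence into overlapping fixed-length sub-sequences.

    Window and stride are illustrative assumptions, not values from the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(sequence) <= window:
        return [sequence]
    starts = list(range(0, len(sequence) - window + 1, stride))
    # Add a final window so the tail of the sequence is always covered.
    if starts[-1] + window < len(sequence):
        starts.append(len(sequence) - window)
    return [sequence[s:s + window] for s in starts]


def aggregate_scores(per_segment: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-term scores over all sub-sequences.

    Averaging is an assumed aggregation rule; a term missing from a
    sub-sequence's prediction counts as a score of 0 for that sub-sequence.
    """
    if not per_segment:
        return {}
    terms = sorted({term for scores in per_segment for term in scores})
    n = len(per_segment)
    return {t: sum(s.get(t, 0.0) for s in per_segment) / n for t in terms}
```

In use, each sub-sequence returned by `split_into_subsequences` would be scored by a trained classifier, and the resulting list of score dictionaries passed to `aggregate_scores` to obtain one set of function scores for the full-length protein.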