4 CONCLUSIONS AND FUTURE WORK
This work addresses the problem of finding splice sites by developing two transfer learning algorithms that use a new feature representation based on both n-gram graphs and biological information (see footnote 7).
Our results indicate that this work makes a meaningful contribution to splice site recognition. Using the proposed representation, we achieved higher prediction accuracy than current state-of-the-art approaches. In addition, the proposed representation uses a small number of features, which helps us reach high performance quickly and at low computational cost.
As future work, we plan a deeper investigation of the biological knowledge that can be used, since it appears to be the key factor in our method. In addition, we will investigate different transfer learning approaches that exploit the proposed representation more efficiently.
7 The source code is available at: https://github.com/SimosKaza/splice_site_recognition_transfer_learning
Splice Site Prediction: Transferring Knowledge Across Organisms