we confirmed that combining our deep learning
framework with PSSM representations of protein
features mostly outperformed combinations of other
machine learning methods and pre-trained feature
embeddings. However, a model trained on a given
source human-virus interaction data set performed
poorly when predicting interactions of proteins in a
different target human-virus domain. To address this,
we introduced two transfer learning methods (i.e., a
frozen type and a fine-tuning type). Notably, both
methods substantially increased cross-viral prediction
performance compared to the naïve baseline model. In
particular, for small target data sets, fine-tuning
pre-trained parameters that were obtained from larger
source sets increased prediction performance.
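
The difference between the frozen and fine-tuning types can be illustrated with a minimal sketch. It is not the network used in this work: the two-parameter "feature" unit, the logistic head, the toy data, and all names (`train`, `transfer`, `FEATURE_PARAMS`, `HEAD_PARAMS`) are assumptions chosen only to show which parameters are updated on the target set in each mode.

```python
import math

# Hypothetical parameter groups: a "feature" layer and a prediction head.
FEATURE_PARAMS = ("feat_w", "feat_b")
HEAD_PARAMS = ("head_w", "head_b")


def forward(params, x):
    """Tiny stand-in network: one tanh feature unit and a logistic head."""
    h = math.tanh(params["feat_w"] * x + params["feat_b"])
    z = params["head_w"] * h + params["head_b"]
    return h, 1.0 / (1.0 + math.exp(-z))


def train(params, data, trainable, lr=0.1, epochs=200):
    """Gradient descent on log loss; only names in `trainable` are updated."""
    params = dict(params)  # copy, so the caller's parameters stay untouched
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(params, x)
            dz = p - y  # gradient of log loss w.r.t. the head's pre-activation
            dh = dz * params["head_w"] * (1.0 - h * h)  # back through tanh
            grads = {
                "head_w": dz * h,
                "head_b": dz,
                "feat_w": dh * x,
                "feat_b": dh,
            }
            for name in trainable:
                params[name] -= lr * grads[name]
    return params


def transfer(source_params, target_data, mode):
    """Reuse source parameters on a target set in 'frozen' or 'fine_tune' mode."""
    if mode == "frozen":
        trainable = HEAD_PARAMS  # feature parameters are kept fixed
    elif mode == "fine_tune":
        trainable = FEATURE_PARAMS + HEAD_PARAMS  # everything keeps learning
    else:
        raise ValueError(f"unknown mode: {mode}")
    return train(source_params, target_data, trainable)


# Toy stand-ins for a larger source set and a small target set.
source_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
target_data = [(-0.5, 0), (1.5, 1)]

init = {"feat_w": 0.5, "feat_b": 0.0, "head_w": 0.5, "head_b": 0.0}
source_model = train(init, source_data, FEATURE_PARAMS + HEAD_PARAMS)
frozen_model = transfer(source_model, target_data, "frozen")
tuned_model = transfer(source_model, target_data, "fine_tune")
```

In the frozen mode the feature parameters learned on the source set are reused unchanged and only the head is refit, whereas in the fine-tuning mode the source parameters serve as the starting point and all of them are adjusted on the target set.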