Application of RotaSVM for HLA Class II Protein-Peptide Interaction Prediction

Shib Sankar Bhowmick, Indrajit Saha, Giovanni Mazzocco, Ujjwal Maulik, Luis Rato, Debotosh Bhattacharjee, Dariusz Plewczynski


In this article, the recently developed RotaSVM classifier is used for accurate prediction of peptides that bind to Human Leukocyte Antigen class II (HLA class II) proteins. HLA class II-peptide complexes are generated in antigen-presenting cells (APCs) and transported to the cell membrane, where they elicit an immune response via T-cell activation. Understanding HLA class II protein-peptide binding facilitates the design of peptide-based vaccines, although the high rate of polymorphism in HLA class II molecules poses a major challenge. In the present study, a set of 27 HLA class II proteins is considered in order to determine the binding activity of 636 non-redundant peptides. HLA class II-peptide binding is predicted with an ensemble classifier called RotaSVM. In RotaSVM, the feature selection scheme generates bootstrap samples, which are then used to create a diverse set of features by means of Principal Component Analysis. Support Vector Machines are subsequently trained on these bootstrap samples together with their original feature values. The effectiveness of RotaSVM for HLA class II protein-peptide binding prediction is demonstrated by comparison with other traditional classifiers, using several validity measures and ROC curve plots. Finally, the Friedman test is conducted to assess the statistical significance of RotaSVM's results in predicting peptides that bind to HLA class II proteins.
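
As an illustration of the final step, the following minimal sketch computes the standard Friedman chi-square statistic from a table of per-dataset classifier scores. It is not the authors' code. The function names and the score matrix are hypothetical, the values are toy numbers rather than results from the paper, and the statistic is computed without a tie correction.

```python
def average_ranks(row):
    """Rank the scores in one row (1 = highest score); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the score at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Return (chi-square statistic, mean rank per classifier).

    scores[i][j] is the score of classifier j on dataset i.
    No tie correction is applied to the statistic.
    """
    n = len(scores)      # number of datasets (blocks)
    k = len(scores[0])   # number of classifiers (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Hypothetical toy scores: 4 datasets (rows) x 3 classifiers (columns).
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.75, 0.80],
    [0.92, 0.88, 0.81],
    [0.78, 0.70, 0.74],
]
chi2, mean_ranks = friedman_statistic(toy_scores)
print(mean_ranks, chi2)
```

Under the null hypothesis that all classifiers perform equally, the statistic is approximately chi-square distributed with k - 1 degrees of freedom, so a large value suggests that at least one classifier ranks consistently differently from the others.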



Paper Citation

in Harvard Style

Bhowmick S., Saha I., Mazzocco G., Maulik U., Rato L., Bhattacharjee D. and Plewczynski D. (2014). Application of RotaSVM for HLA Class II Protein-Peptide Interaction Prediction. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 178-185. DOI: 10.5220/0004804801780185

in BibTeX Style

author={Shib Sankar Bhowmick and Indrajit Saha and Giovanni Mazzocco and Ujjwal Maulik and Luis Rato and Debotosh Bhattacharjee and Dariusz Plewczynski},
title={Application of RotaSVM for HLA Class II Protein-Peptide Interaction Prediction},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},

in EndNote Style

JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - Application of RotaSVM for HLA Class II Protein-Peptide Interaction Prediction
SN - 978-989-758-012-3
AU - Bhowmick S.
AU - Saha I.
AU - Mazzocco G.
AU - Maulik U.
AU - Rato L.
AU - Bhattacharjee D.
AU - Plewczynski D.
PY - 2014
SP - 178
EP - 185
DO - 10.5220/0004804801780185