The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space

Hong-Yu Chen, Chang-Biau Yang, Chiou-Yi Hor, Kuo-Tsung Tseng

Abstract

A disulfide bond, formed by two oxidized cysteines, plays an important role in protein folding and structural stability, and it may regulate protein function. The disulfide connectivity prediction problem is to determine which cysteines in a target protein are bonded to each other. The problem is difficult because the number of possible connectivity patterns grows rapidly with the number of cysteines. In this paper, we discover several rules that discriminate among patterns with high accuracy under various methods. We then propose pattern-wise and pair-wise BKS (behavior knowledge space) methods to fuse multiple classifiers built with SVMs (support vector machines). Furthermore, we combine these with the CSP (cysteine separation profile) method to form our hybrid method. On the SP39 dataset with 4-fold cross-validation, our hybrid method reaches a prediction accuracy of 69.1%, higher than the best previously reported result of 65.9%.
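To make the growth in the number of patterns concrete: if all 2n cysteines in a chain are assumed to be bonded (a common setting for connectivity prediction, assumed here for illustration), each candidate pattern is a perfect matching of the cysteines, and there are (2n-1)!! = (2n)!/(2^n n!) of them. The Python sketch below counts and enumerates these patterns; the function names are hypothetical and not taken from the paper.

```python
from math import factorial
from typing import Iterator, List, Sequence, Tuple


def num_patterns(num_bonds: int) -> int:
    """Count the ways to pair 2*num_bonds cysteines into num_bonds disulfide bonds.

    This is the double factorial (2n - 1)!! = (2n)! / (2^n * n!).
    """
    if num_bonds < 0:
        raise ValueError("num_bonds must be non-negative")
    n = num_bonds
    return factorial(2 * n) // (2 ** n * factorial(n))


def enumerate_patterns(cysteines: Sequence[int]) -> Iterator[List[Tuple[int, int]]]:
    """Yield every pairing (perfect matching) of an even-length list of cysteines."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], list(cysteines[1:])
    for i, partner in enumerate(rest):
        # Bond the first cysteine to `partner`, then pair up the remaining ones.
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_patterns(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{2 * n} cysteines: {num_patterns(n)} patterns")
```

For example, `num_patterns(5)` gives 945 candidate patterns for ten bonded cysteines, while `enumerate_patterns([0, 1, 2, 3])` yields only the three patterns of a four-cysteine chain.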
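The fusion step can be illustrated with a minimal sketch of generic BKS fusion. It assumes that each base classifier (for example, one SVM) outputs a discrete label such as a connectivity-pattern id. The tuple of outputs indexes a cell of the knowledge space, and the cell stores counts of the true labels seen with that tuple during training. The class name `BKSFusion`, the toy labels, and the majority-vote fallback for unseen cells are illustrative assumptions. This is not the paper's pattern-wise or pair-wise variant, whose details the abstract does not give.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple


class BKSFusion:
    """Generic behavior knowledge space (BKS) fusion of several base classifiers.

    A cell of the knowledge space is indexed by the tuple of labels that the
    base classifiers output for one sample; the cell counts how often each
    true label occurred with that tuple in the training data.
    """

    def __init__(self) -> None:
        self.table: DefaultDict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)

    def fit(self, outputs: Sequence[Sequence[Hashable]],
            truths: Sequence[Hashable]) -> "BKSFusion":
        # outputs[i] holds the label predicted by every base classifier for sample i.
        for votes, truth in zip(outputs, truths):
            self.table[tuple(votes)][truth] += 1
        return self

    def predict_one(self, votes: Sequence[Hashable]) -> Hashable:
        cell = self.table.get(tuple(votes))
        if cell:
            # Most frequent true label recorded for this combination of outputs.
            return cell.most_common(1)[0][0]
        # Unseen combination: fall back to a plain majority vote (an assumption).
        return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical data: three classifiers, labels are connectivity-pattern ids.
    train_outputs = [("A", "A", "B"), ("A", "A", "B"), ("A", "A", "B"), ("C", "B", "B")]
    train_truths = ["B", "B", "A", "C"]
    bks = BKSFusion().fit(train_outputs, train_truths)
    print(bks.predict_one(("A", "A", "B")))  # B (seen twice with B, once with A)
    print(bks.predict_one(("C", "C", "A")))  # C (unseen cell, majority vote)
```

In this toy example, a plain majority vote on ("A", "A", "B") would return "A". BKS returns "B" because that combination of outputs co-occurred more often with "B" in training. Learning such systematic classifier errors is the main appeal of BKS, at the cost of needing enough training samples to populate the cells.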
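The CSP component can be sketched as a nearest-template lookup. This assumes the common formulation in which a protein's profile is the list of sequence distances between consecutive bonded cysteines. The query inherits the pattern of the training protein with the same number of cysteines whose profile has the smallest divergence, assumed here to be the sum of absolute differences. Patterns are written as pairs of cysteine ordinal indices so that they transfer between proteins. All names and data are hypothetical, and the paper's exact CSP settings and the way it is combined into the hybrid method may differ.

```python
from typing import List, Optional, Sequence, Tuple

# A pattern lists bonded pairs by cysteine ordinal (0 = first cysteine in sequence).
Pattern = Tuple[Tuple[int, int], ...]


def separation_profile(cys_positions: Sequence[int]) -> List[int]:
    """Sequence distances between consecutive cysteines."""
    pos = sorted(cys_positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def divergence(p: Sequence[int], q: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length profiles (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def csp_predict(query: Sequence[int],
                templates: Sequence[Tuple[Sequence[int], Pattern]]) -> Optional[Pattern]:
    """Return the pattern of the closest template with the same number of cysteines."""
    qp = separation_profile(query)
    scored = [(divergence(qp, separation_profile(pos)), pattern)
              for pos, pattern in templates if len(pos) == len(query)]
    if not scored:
        return None
    return min(scored, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Hypothetical templates: (cysteine positions, known pattern).
    templates = [
        ([10, 20, 50, 60], ((0, 1), (2, 3))),  # profile [10, 30, 10]
        ([10, 40, 45, 70], ((0, 3), (1, 2))),  # profile [30, 5, 25]
    ]
    query = [5, 16, 44, 55]                     # profile [11, 28, 11]
    print(csp_predict(query, templates))        # ((0, 1), (2, 3)), divergence 4 vs 56
```

Ties are broken by template order, because `min` returns the first minimal element. If no template has the same number of cysteines, the sketch returns `None`, leaving the decision to another predictor.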



Paper Citation


in Harvard Style

Chen H., Tseng K., Yang C. and Hor C. (2013). The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013), ISBN 978-989-8565-75-4, pages 112-118. DOI: 10.5220/0004541501120118


in Bibtex Style

@conference{kdir13,
author={Hong-Yu Chen and Kuo-Tsung Tseng and Chang-Biau Yang and Chiou-Yi Hor},
title={The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={112-118},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004541501120118},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space
SN - 978-989-8565-75-4
AU - Chen H.
AU - Tseng K.
AU - Yang C.
AU - Hor C.
PY - 2013
SP - 112
EP - 118
DO - 10.5220/0004541501120118