DISULFIDE CONNECTIVITY PREDICTION WITH EXTREME LEARNING MACHINES

Monther Alhamdoosh, Castrense Savojardo, Piero Fariselli, Rita Casadio

Abstract

This paper demonstrates the relevance of the Extreme Learning Machine (ELM) to bioinformatics applications by addressing the problem of predicting disulfide connectivity from protein sequences. We test different activation functions for the hidden neurons and show that, for this task, Radial Basis Functions perform best. We also show that the ELM approach outperforms the Back Propagation learning algorithm in both generalization accuracy and running time. Moreover, we find that prediction performance for disulfide connectivity can be increased by initializing the Radial Basis Function kernels with a k-means clustering algorithm. Finally, the ELM procedure is not only very fast: the resulting networks achieve accuracies of 0.51 per bond and 0.45 per pattern. Our ELM results are in line with those of state-of-the-art predictors addressing the same problem.
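The training scheme summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (kmeans_centers, rbf_hidden, elm_rbf_fit, elm_rbf_predict), the Gaussian kernel with a single shared width, the plain Lloyd's k-means, and the toy data are all assumptions made for illustration. The sketch only shows the general ELM idea, where the hidden layer is fixed (here, RBF centers taken from k-means) and the output weights are obtained in one step by a least-squares solve instead of iterative Back Propagation.

```python
import numpy as np

def kmeans_centers(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (illustrative); returns k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance of every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def rbf_hidden(X, centers, width):
    """Gaussian RBF activations, shape (n_samples, n_centers)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def elm_rbf_fit(X, y, k=10, width=1.0, seed=0):
    """Fix the hidden layer, then solve for output weights in one step."""
    centers = kmeans_centers(X, k, seed=seed)
    H = rbf_hidden(X, centers, width)
    beta = np.linalg.pinv(H) @ y  # least-squares output weights
    return centers, beta

def elm_rbf_predict(X, centers, beta, width=1.0):
    return rbf_hidden(X, centers, width) @ beta

# Toy regression data (hypothetical, unrelated to the paper's protein data).
rng = np.random.default_rng(1)
X_toy = rng.uniform(-2.0, 2.0, size=(200, 2))
y_toy = np.exp(-(X_toy ** 2).sum(axis=1))

centers, beta = elm_rbf_fit(X_toy, y_toy, k=10, width=1.0)
pred = elm_rbf_predict(X_toy, centers, beta, width=1.0)
print("training MSE:", float(np.mean((pred - y_toy) ** 2)))
```

Replacing the k-means step with randomly drawn centers gives the randomly assigned RBF kernel variant of ELM; the rest of the sketch stays the same, which makes the two initializations easy to compare.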

Paper Citation


in Harvard Style

Alhamdoosh M., Savojardo C., Fariselli P. and Casadio R. (2011). DISULFIDE CONNECTIVITY PREDICTION WITH EXTREME LEARNING MACHINES. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011) ISBN 978-989-8425-36-2, pages 5-14. DOI: 10.5220/0003125600050014


in Bibtex Style

@conference{bioinformatics11,
author={Monther Alhamdoosh and Castrense Savojardo and Piero Fariselli and Rita Casadio},
title={DISULFIDE CONNECTIVITY PREDICTION WITH EXTREME LEARNING MACHINES},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)},
year={2011},
pages={5-14},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003125600050014},
isbn={978-989-8425-36-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)
TI - DISULFIDE CONNECTIVITY PREDICTION WITH EXTREME LEARNING MACHINES
SN - 978-989-8425-36-2
AU - Alhamdoosh M.
AU - Savojardo C.
AU - Fariselli P.
AU - Casadio R.
PY - 2011
SP - 5
EP - 14
DO - 10.5220/0003125600050014