Pertinent Parameters Selection for Processing of Short Amino Acid Sequences

Zbigniew Szymanski, Stanisław Jankowski, Marek Dwulit, Joanna Chodzyńska, Lucjan S. Wyrwicz

2010

Abstract

The paper describes a Least Squares Support Vector Machine (LS-SVM) classifier of short amino acid sequences for the recognition of kinase-specific phosphorylation sites. Each sequence is represented by a string of 17 characters, where each character denotes one amino acid. The data contain sequences that react with six enzymes: PKA, PKB, PKC, CDK, CK2 and MAPK. To classify such data with the LS-SVM, the symbolic data must be mapped into the domain of real numbers and the pertinent features must be selected. The presented method uses AAindex (amino acid index), a set of values representing various physicochemical and biological properties of amino acids. Each symbol of the sequence is replaced by 193 values, which gives 17 × 193 = 3281 features per sequence. A feature selection procedure based on a correlation ranking formula and Gram-Schmidt orthogonalization is then applied. Selecting the 3 to 17 most pertinent features out of 3281 enabled successful classification by the LS-SVM.
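
The encoding and ranking steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The index table is a random placeholder rather than real AAindex data, and the function names (`make_placeholder_index_table`, `encode`, `rank_features`) are hypothetical. Centering the data, so that the squared cosine equals a squared correlation coefficient, is also an assumption about the "correlation ranking formula".

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def make_placeholder_index_table(n_indices=193, seed=0):
    """Stand-in for the AAindex table: random numbers, NOT real index values."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(AMINO_ACIDS), n_indices))


def encode(sequences, table):
    """Replace every symbol by its row of index values and concatenate the rows.

    A 17-character sequence with 193 values per symbol yields 3281 features.
    """
    row = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return np.array(
        [np.concatenate([table[row[aa]] for aa in seq]) for seq in sequences]
    )


def rank_features(X, y, n_select, eps=1e-12):
    """Gram-Schmidt ranking: repeatedly pick the feature most correlated with
    the target, then project that feature out of the target and of all
    remaining features before the next pick."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)  # assumption: centered data (correlation ranking)
    y = y - y.mean()
    remaining = list(range(X.shape[1]))
    selected = []
    for _ in range(min(n_select, X.shape[1])):
        Xr = X[:, remaining]
        norms = np.einsum("ij,ij->j", Xr, Xr)  # squared norm of each column
        num = (Xr.T @ y) ** 2
        den = norms * (y @ y)
        # squared cosine of the angle between each feature and the target
        cos2 = np.divide(num, den, out=np.zeros_like(num), where=den > eps)
        best = int(np.argmax(cos2))
        if den[best] <= eps:  # nothing left to explain
            break
        q = Xr[:, best] / np.sqrt(norms[best])  # unit vector of chosen feature
        selected.append(remaining.pop(best))
        y = y - (q @ y) * q
        X[:, remaining] -= np.outer(q, q @ X[:, remaining])
    return selected
```

In such a sketch, `encode` would be applied to the 17-character sequences, and `rank_features(X, y, n_select)` would be called with class labels coded as +1 and -1. The selected columns would then be passed to a classifier such as the LS-SVM.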

Paper Citation


in Harvard Style

Szymanski Z., Jankowski S., Dwulit M., Chodzyńska J. and Wyrwicz L. S. (2010). Pertinent Parameters Selection for Processing of Short Amino Acid Sequences. In Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010) ISBN 978-989-8425-14-0, pages 25-32. DOI: 10.5220/0003040600250032


in Bibtex Style

@conference{pris10,
author={Zbigniew Szymanski and Stanisław Jankowski and Marek Dwulit and Joanna Chodzyńska and Lucjan S. Wyrwicz},
title={Pertinent Parameters Selection for Processing of Short Amino Acid Sequences},
booktitle={Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010)},
year={2010},
pages={25--32},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003040600250032},
isbn={978-989-8425-14-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010)
TI - Pertinent Parameters Selection for Processing of Short Amino Acid Sequences
SN - 978-989-8425-14-0
AU - Szymanski Z.
AU - Jankowski S.
AU - Dwulit M.
AU - Chodzyńska J.
AU - Wyrwicz L. S.
PY - 2010
SP - 25
EP - 32
DO - 10.5220/0003040600250032