UNREVEALING BIOLOGICAL PROCESS WITH LINEAR ALGEBRA - Extracting Patterns from Noisy Data
Bráulio Roberto Gonçalves Marinho Couto, Marcelo Matos Santoro, Marcos Augusto dos Santos
2011
Abstract
Extracting patterns from protein sequence data is one of the challenges of computational biology. Here we use linear algebra to analyze sequences without requiring multiple alignments. In this study, the singular value decomposition (SVD) of a sparse p-peptide frequency matrix M, written M = USV^T, is used to detect and extract signals from noisy protein data. The central matrix S is diagonal and contains the singular values of M in decreasing order. Here we interpret the biological significance of the SVD: the singular value spectrum, visualized as a scree plot, reveals the main components, that is, the processes hidden in the database. This information can be used in many applications, such as clustering, gene expression analysis, immune response pattern identification, characterization of protein molecular dynamics, and phylogenetic inference. Visualizing the singular value spectrum shows how many processes may be hidden in a database and can help biologists detect and extract small signals from noisy data.
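The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sequences, the choice p = 2, the use of overlapping windows, and the helper names `peptide_frequency_matrix` and `singular_value_spectrum` are all assumptions made for illustration, and a dense NumPy array stands in for the sparse matrix a realistic database would require.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def peptide_frequency_matrix(sequences, p=2):
    """Count p-peptides per sequence (hypothetical helper).

    Rows index the 20**p possible p-peptides, columns index sequences.
    Overlapping windows are assumed; the abstract does not specify this.
    """
    index = {"".join(k): i for i, k in enumerate(product(AMINO_ACIDS, repeat=p))}
    M = np.zeros((len(index), len(sequences)))
    for j, seq in enumerate(sequences):
        for start in range(len(seq) - p + 1):
            row = index.get(seq[start:start + p])
            if row is not None:  # skip windows containing non-standard residues
                M[row, j] += 1
    return M


def singular_value_spectrum(M):
    """Return U, s, Vt with M = U @ diag(s) @ Vt and s in decreasing order."""
    return np.linalg.svd(M, full_matrices=False)


# Toy data (invented for this sketch): two pairs of near-identical sequences.
sequences = [
    "MKVLAAGIVGLLLA",
    "MKVLAAGIVALLLA",
    "GSHMDEEKRRQWDE",
    "GSHMDEEKRKQWDE",
]

M = peptide_frequency_matrix(sequences, p=2)
U, s, Vt = singular_value_spectrum(M)

# Share of the total squared singular values carried by each component;
# plotting these values against their rank gives a scree plot.
explained = s**2 / np.sum(s**2)
for rank, (value, share) in enumerate(zip(s, explained), start=1):
    print(f"component {rank}: singular value {value:.3f}, share {share:.3f}")
```

Plotting `s` (or `explained`) against component rank gives the scree plot mentioned above. With real data the matrix has 20^p rows and is mostly zeros, so a sparse representation and a truncated SVD routine would likely be more appropriate than the dense call used here.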
Paper Citation
in Harvard Style
Roberto Gonçalves Marinho Couto B., Matos Santoro M. and Augusto dos Santos M. (2011). UNREVEALING BIOLOGICAL PROCESS WITH LINEAR ALGEBRA - Extracting Patterns from Noisy Data. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011), ISBN 978-989-8425-36-2, pages 313-317. DOI: 10.5220/0003164103130317
in Bibtex Style
@conference{bioinformatics11,
author={Bráulio Roberto Gonçalves Marinho Couto and Marcelo Matos Santoro and Marcos Augusto dos Santos},
title={UNREVEALING BIOLOGICAL PROCESS WITH LINEAR ALGEBRA - Extracting Patterns from Noisy Data},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)},
year={2011},
pages={313-317},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003164103130317},
isbn={978-989-8425-36-2},
}