Authors:
Bráulio Roberto Gonçalves Marinho Couto (1); Marcelo Matos Santoro (2) and Marcos Augusto dos Santos (2)
Affiliations:
(1) Centro Universitário de Belo Horizonte / UNI-BH, Brazil; (2) UFMG, Brazil
Keyword(s):
Linear algebra, Data mining, Information retrieval, SVD.
Related Ontology Subjects/Areas/Topics:
Bioinformatics; Biomedical Engineering; Pattern Recognition, Clustering and Classification; Sequence Analysis
Abstract:
Extracting patterns from protein sequence data is one of the challenges of computational biology. Here we use linear algebra to analyze sequences without requiring multiple alignments. In this study, the singular value decomposition (SVD) of a sparse p-peptide frequency matrix (M) is used to detect and extract signals from noisy protein data (M = USV^T). The central matrix S is diagonal and contains the singular values of M in decreasing order. Here we interpret the biological significance of the SVD: the singular value spectrum, visualized as scree plots, reveals the main components, that is, the processes hidden in the database. This information can be used in many applications, such as clustering, gene expression analysis, immune response pattern identification, characterization of protein molecular dynamics and phylogenetic inference. Visualizing the singular value spectrum from SVD analysis shows how many processes may be hidden in a database and can help biologists detect and extract small signals from noisy data.
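
The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy sequences, the function name `peptide_frequency_matrix`, the choice p = 2, and the layout (rows are p-peptides, columns are proteins) are all assumptions made for illustration. A dense NumPy array is used for brevity, whereas the abstract describes a sparse matrix.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def peptide_frequency_matrix(sequences, p=2):
    """Count overlapping p-peptides per sequence (hypothetical helper).

    Rows index all 20**p possible p-peptides, columns index sequences.
    """
    index = {"".join(k): i
             for i, k in enumerate(product(AMINO_ACIDS, repeat=p))}
    M = np.zeros((len(index), len(sequences)))
    for j, seq in enumerate(sequences):
        for start in range(len(seq) - p + 1):
            row = index.get(seq[start:start + p])
            if row is not None:  # skip windows with non-standard residues
                M[row, j] += 1
    return M


# Toy data, invented for illustration only: two pairs of similar sequences.
sequences = [
    "MKVLAAGIVGLLLA",
    "MKVLAAGIVALLLA",
    "GSHMPEPTIDEWWY",
    "GSHMPEPTIDEWWF",
]

M = peptide_frequency_matrix(sequences, p=2)

# M = U S V^T; NumPy returns the diagonal of S as a vector,
# sorted in decreasing order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Scree-plot data: component rank versus singular value.
for rank, value in enumerate(s, start=1):
    print(rank, round(float(value), 3))
```

Plotting the printed values against their rank gives the scree plot. For databases of realistic size, a sparse matrix combined with a truncated solver such as `scipy.sparse.linalg.svds` would likely be more practical than the dense call used here.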