Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen?

D.-T. Phan; P. Leray; C. Sinoquet

Topics: Biostatistics and Stochastic Models; Data Mining and Machine Learning

In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2015) - BIOINFORMATICS, pages 5-16, 2015, Lisbon, Portugal

Authors: D.-T. Phan ¹; P. Leray ¹ and C. Sinoquet ²

Affiliations: ¹ Polytech/ University of Nantes, France; ² Faculty of Sciences and University of Nantes, France

Keyword(s): Linkage Disequilibrium, Genome-wide Association Study, Multilocus Association Study, Data Dimension Reduction, Probabilistic Graphical Model, Bayesian Network.

Related Ontology Subjects/Areas/Topics: Bioinformatics; Biomedical Engineering; Biostatistics and Stochastic Models; Data Mining and Machine Learning

Abstract: Association genetics, and in particular genome-wide association studies (GWASs), aim at elucidating the etiology of complex genetic diseases. In the domain of association genetics, machine learning provides an appealing alternative framework to standard statistical approaches. Pioneering works (Mourad et al., 2011) have proposed the forest of latent trees (FLTM) to model genetical data at the genome scale. The FLTM is a hierarchical Bayesian network with latent variables. A key to FLTM construction is the recursive clustering of variables, in a bottom-up subsuming process. In this paper, we study the impact of the choice of the clustering method to be plugged into the FLTM learning algorithm, in a GWAS context. Using a real GWAS data set describing 41400 variables for each of 3004 controls and 2005 individuals affected by Crohn’s disease, we compare the influence of three clustering methods. We analyze data dimension reduction and the ability to split or group putative causal SNPs in agreement with the underlying biological reality. To assess the risk of missing significant association results through subsumption, we also compare the methods through the corresponding FLTM-driven GWASs. In the GWAS context and in this framework, the choice of the clustering method does not impact the satisfying performance of the downstream application, both in power and in detection of false positive associations.
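
The recursive, bottom-up subsumption mentioned in the abstract can be pictured with a small toy sketch. This is not the authors' FLTM learning algorithm: the clustering rule (`cluster_adjacent`), the similarity measure (`agreement`), the per-individual majority vote that stands in for latent-variable learning (`summarize`), the choice to carry unclustered variables up to the next layer, and all names and data below are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple


def agreement(a: List[int], b: List[int]) -> float:
    """Fraction of individuals on which two variables take the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_adjacent(names: List[str], data: Dict[str, List[int]],
                     threshold: float) -> List[List[str]]:
    """Stand-in clustering (assumption): extend a cluster while the next
    variable agrees with the cluster's last member at least `threshold`."""
    clusters, current = [], [names[0]]
    for name in names[1:]:
        if agreement(data[current[-1]], data[name]) >= threshold:
            current.append(name)
        else:
            clusters.append(current)
            current = [name]
    clusters.append(current)
    return clusters


def summarize(cluster: List[str], data: Dict[str, List[int]]) -> List[int]:
    """Placeholder for latent-variable learning (assumption): per-individual
    majority value over the cluster, ties broken by the smallest value."""
    columns = zip(*(data[v] for v in cluster))
    return [max(sorted(set(col)), key=col.count) for col in columns]


def build_forest(observed: Dict[str, List[int]],
                 threshold: float = 0.8) -> Tuple[Dict[str, str], List[str]]:
    """Cluster the current layer, subsume each multi-variable cluster under a
    new latent variable, and repeat on the resulting layer until nothing merges."""
    data = dict(observed)
    layer = list(data)
    parent: Dict[str, str] = {}
    level = 0
    while len(layer) > 1:
        level += 1
        next_layer, merged = [], False
        for i, cluster in enumerate(cluster_adjacent(layer, data, threshold)):
            if len(cluster) == 1:
                next_layer.append(cluster[0])  # unclustered: carried upward
                continue
            latent = f"H{level}_{i}"
            data[latent] = summarize(cluster, data)
            for child in cluster:
                parent[child] = latent
            next_layer.append(latent)
            merged = True
        if not merged:
            break
        layer = next_layer
    return parent, layer


# Hypothetical genotypes (0/1/2) for four SNPs over six individuals.
snps = {
    "s1": [0, 1, 2, 0, 1, 2],
    "s2": [0, 1, 2, 0, 1, 1],
    "s3": [2, 0, 0, 1, 2, 0],
    "s4": [2, 0, 0, 1, 2, 0],
}
parent, roots = build_forest(snps)
```

Here `parent` maps each subsumed variable to the latent variable above it, and `roots` lists the top variable of each tree in the resulting forest. Swapping `cluster_adjacent` for another routine is the kind of plug-in choice the paper evaluates, although the three clustering methods it compares are not specified on this page.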

CC BY-NC-ND 4.0

Paper citation in several formats:

Phan, D.-T., Leray, P. and Sinoquet, C. (2015). Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen? In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2015) - BIOINFORMATICS; ISBN 978-989-758-070-3; ISSN 2184-4305, SciTePress, pages 5-16. DOI: 10.5220/0005179800050016

@conference{bioinformatics15,
author={D.{-}T. Phan and P. Leray and C. Sinoquet},
title={Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen?},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2015) - BIOINFORMATICS},
year={2015},
pages={5-16},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005179800050016},
isbn={978-989-758-070-3},
issn={2184-4305},
}

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2015) - BIOINFORMATICS
TI - Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen?
SN - 978-989-758-070-3
IS - 2184-4305
AU - Phan, D.-T.
AU - Leray, P.
AU - Sinoquet, C.
PY - 2015
SP - 5
EP - 16
DO - 10.5220/0005179800050016
PB - SciTePress