Effect of Database Size in the Genetic Variants Calling

Sunhee Kim, Young-Suk Lee, Chang-Yong Lee

2019

Abstract

Base quality score recalibration (BQSR) is an important step in variant calling from high-throughput sequencing data. Because BQSR requires a database of known variants such as dbSNP, we present an extensive analysis of BQSR results for the human and rice genomes. We show that the recalibration results depend on the size of the database: the more variants the database contains, the larger the average recalibrated base quality score. This implies that the recalibrated quality score is lower than it should be when the database does not contain enough variants. Because the size of the database plays a crucial role in BQSR, we propose a method for constructing a database when the available one is too small for BQSR results to be reliable. We demonstrate that, for human, the database constructed by the proposed method yields almost the same results as the human dbSNP. For rice, however, we show that the proposed database is more reasonable than the rice dbSNP.
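The dependence on database size can be illustrated with a toy calculation. The sketch below is not the authors' method or the GATK implementation; it only assumes the standard Phred definition Q = -10 log10(p) and the usual BQSR convention that mismatches at known-variant sites are skipped while all other mismatches are counted as sequencing errors. The function names, the parameter `masked_fraction`, and the example counts are hypothetical.

```python
import math


def phred(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(p_error)."""
    return -10.0 * math.log10(error_rate)


def empirical_quality(n_bases: int,
                      n_seq_errors: int,
                      n_variant_mismatches: int,
                      masked_fraction: float) -> float:
    """Toy empirical quality for one group of bases.

    masked_fraction is the (hypothetical) share of true variant sites
    that the known-variant database covers. Mismatches at covered sites
    are skipped; uncovered true variants are miscounted as errors.
    """
    masked = n_variant_mismatches * masked_fraction
    unmasked = n_variant_mismatches - masked
    observed_errors = n_seq_errors + unmasked
    # Simplification: only the masked mismatching bases leave the denominator.
    counted_bases = n_bases - masked
    return phred(observed_errors / counted_bases)


# Hypothetical counts: same reads, three databases of increasing coverage.
for fraction in (0.0, 0.5, 1.0):
    q = empirical_quality(1_000_000, 1_000, 1_000, fraction)
    print(f"database covers {fraction:.0%} of true variants: Q = {q:.2f}")
```

Under these assumed counts the estimated quality rises as the database covers more of the true variants, which is the qualitative trend the abstract reports.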

Paper Citation


in Harvard Style

Kim S., Lee Y. and Lee C. (2019). Effect of Database Size in the Genetic Variants Calling. In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-353-7, SciTePress, pages 209-215. DOI: 10.5220/0007413402090215


in Bibtex Style

@conference{bioinformatics19,
author={Sunhee Kim and Young-Suk Lee and Chang-Yong Lee},
title={Effect of Database Size in the Genetic Variants Calling},
booktitle={Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS},
year={2019},
pages={209-215},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007413402090215},
isbn={978-989-758-353-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS
TI - Effect of Database Size in the Genetic Variants Calling
SN - 978-989-758-353-7
AU - Kim S.
AU - Lee Y.
AU - Lee C.
PY - 2019
SP - 209
EP - 215
DO - 10.5220/0007413402090215
PB - SciTePress