Effect of Database Size in the Genetic Variants Calling
Sunhee Kim, Young-Suk Lee, Chang-Yong Lee
2019
Abstract
Base quality score recalibration (BQSR) is an important step in variant calling from high-throughput sequencing data. Motivated by the fact that BQSR requires a database of known variants such as dbSNP, we present an extensive analysis of BQSR results for the human and rice genomes. We show that the recalibration results depend on the size of the database: the more variants the database contains, the larger the average recalibrated base quality score. This implies that the recalibrated quality score is lower than it should be when the database does not contain enough variants. Because the size of the database plays a crucial role in BQSR, we propose a method for constructing a database when the existing one is too small for BQSR results to be reliable. We demonstrate that, for human, the database constructed by the proposed method yields almost the same results as the human dbSNP. For rice, however, we show that the proposed database is more reasonable than the rice dbSNP.
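The dependence on database size can be illustrated with a toy calculation. The sketch below is not the authors' pipeline or the GATK implementation. It assumes a simplified model in which every mismatch outside the known-variant database is counted as a sequencing error, and it uses add-one smoothing as a stand-in for the real estimator. The function names, the pileup layout, and the numbers are all hypothetical.

```python
import math


def empirical_quality(mismatches, observations):
    """Phred-scaled quality from counted mismatches.

    Add-one smoothing is an assumption made for this sketch; it only
    keeps the error rate away from zero.
    """
    error_rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(error_rate)


def recalibrated_quality(pileup, known_sites):
    """Estimate quality, skipping positions listed in the database.

    pileup: iterable of (position, is_mismatch) pairs (hypothetical layout).
    known_sites: set of positions present in the known-variant database.
    """
    mismatches = 0
    observations = 0
    for position, is_mismatch in pileup:
        if position in known_sites:
            continue  # known variants are not counted as errors
        observations += 1
        mismatches += int(is_mismatch)
    return empirical_quality(mismatches, observations)


# Toy data: 1000 bases, 10 true variants, 1 genuine sequencing error.
true_variants = set(range(0, 1000, 100))
sequencing_errors = {5}
pileup = [(p, p in true_variants or p in sequencing_errors) for p in range(1000)]

q_full = recalibrated_quality(pileup, true_variants)  # complete database
q_small = recalibrated_quality(pileup, {0, 100})      # only 2 of 10 variants known
q_none = recalibrated_quality(pileup, set())          # empty database

print(f"full: {q_full:.2f}  small: {q_small:.2f}  none: {q_none:.2f}")
```

In this toy setting the estimated quality falls as variants are removed from the database, because unlisted true variants are misread as sequencing errors. This matches the direction of the effect described in the abstract.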
Paper Citation
in Harvard Style
Kim S., Lee Y. and Lee C. (2019). Effect of Database Size in the Genetic Variants Calling. In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-353-7, SciTePress, pages 209-215. DOI: 10.5220/0007413402090215
in Bibtex Style
@conference{bioinformatics19,
author={Sunhee Kim and Young-Suk Lee and Chang-Yong Lee},
title={Effect of Database Size in the Genetic Variants Calling},
booktitle={Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS},
year={2019},
pages={209-215},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007413402090215},
isbn={978-989-758-353-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS
TI - Effect of Database Size in the Genetic Variants Calling
SN - 978-989-758-353-7
AU - Kim S.
AU - Lee Y.
AU - Lee C.
PY - 2019
SP - 209
EP - 215
DO - 10.5220/0007413402090215
PB - SciTePress