Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis
Farhad Maleki, Anthony J. Kusalik
2019
Abstract
Gene set analysis methods are widely used to analyze data from high-throughput “omics” technologies. One drawback of these methods is their low specificity, that is, their high false positive rate. Over-representation analysis is one of the most commonly used gene set analysis methods. In this paper, we propose a systematic approach to investigate the hypothesis that gene set overlap is an underlying cause of low specificity in over-representation analysis. We quantify gene set overlap and show that it is a ubiquitous phenomenon across gene set databases. Statistical analysis indicates a strong negative correlation between gene set overlap and the specificity of over-representation analysis. We conclude that gene set overlap is an underlying cause of the low specificity. This result highlights the importance of considering gene set overlap in gene set analysis and explains the lack of specificity of methods that ignore gene set overlap. This research also establishes the direction for developing new gene set analysis methods.
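For readers unfamiliar with the two quantities the abstract relates, the following is a minimal sketch, not the authors' implementation. It assumes over-representation analysis is carried out with the usual upper-tail hypergeometric test, and it uses the Jaccard index as one possible overlap measure; the abstract does not state which overlap measure the paper uses. The function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, gene_set_size, de_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of differentially expressed (DE) genes that fall in a
    gene set when de_size genes are drawn without replacement from a
    universe of universe_size genes. Assumed test, not taken from the paper.
    """
    max_k = min(gene_set_size, de_size)
    total = comb(universe_size, de_size)
    tail = sum(
        comb(gene_set_size, k) * comb(universe_size - gene_set_size, de_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def jaccard(a, b):
    """Jaccard index |A n B| / |A u B| (assumed overlap measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data (hypothetical): a universe of 10 genes and two overlapping sets.
universe = {f"g{i}" for i in range(1, 11)}
set_a = {"g1", "g2", "g3", "g4"}
set_b = {"g3", "g4", "g5", "g6"}
de_genes = {"g1", "g2", "g3"}

for name, gene_set in (("A", set_a), ("B", set_b)):
    hits = len(gene_set & de_genes)
    p = ora_p_value(len(universe), len(gene_set), len(de_genes), hits)
    print(f"set {name}: {hits} DE genes, p = {p:.4f}")

print(f"Jaccard(A, B) = {jaccard(set_a, set_b):.4f}")
```

In this toy setup the two sets share genes, so a DE gene that belongs to both contributes to the test of each set. That shared membership is the kind of dependence the abstract identifies as a threat to specificity.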
Paper Citation
in Harvard Style
Maleki F. and Kusalik A. (2019). Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis. In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-353-7, SciTePress, pages 182-193. DOI: 10.5220/0007376901820193
in Bibtex Style
@conference{bioinformatics19,
author={Farhad Maleki and Anthony J. Kusalik},
title={Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis},
booktitle={Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS},
year={2019},
pages={182-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007376901820193},
isbn={978-989-758-353-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS
TI - Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis
SN - 978-989-758-353-7
AU - Maleki F.
AU - Kusalik A.
PY - 2019
SP - 182
EP - 193
DO - 10.5220/0007376901820193
PB - SciTePress