Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis

Farhad Maleki, Anthony J. Kusalik

2019

Abstract

Gene set analysis methods are widely used to analyze data from high-throughput “omics” technologies. One drawback of these methods is their low specificity, that is, their high false positive rate. Over-representation analysis is one of the most commonly used gene set analysis methods. In this paper, we propose a systematic approach to investigate the hypothesis that gene set overlap is an underlying cause of low specificity in over-representation analysis. We quantify gene set overlap and show that it is ubiquitous across gene set databases. Statistical analysis indicates a strong negative correlation between gene set overlap and the specificity of over-representation analysis. We conclude that gene set overlap is an underlying cause of the low specificity. This result highlights the importance of accounting for gene set overlap in gene set analysis and explains the lack of specificity of methods that ignore it. This research also establishes a direction for developing new gene set analysis methods.


Paper Citation


in Harvard Style

Maleki F. and Kusalik A. (2019). Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis. In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-353-7, SciTePress, pages 182-193. DOI: 10.5220/0007376901820193


in Bibtex Style

@conference{bioinformatics19,
author={Farhad Maleki and Anthony J. Kusalik},
title={Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis},
booktitle={Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS},
year={2019},
pages={182-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007376901820193},
isbn={978-989-758-353-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 3: BIOINFORMATICS
TI - Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis
SN - 978-989-758-353-7
AU - Maleki F.
AU - Kusalik A.
PY - 2019
SP - 182
EP - 193
DO - 10.5220/0007376901820193
PB - SciTePress