Mining Association Rules that Incorporate Transcription Factor Binding Sites and Gene Expression Patterns in C. elegans

Hao Wan, Gregory Barrett, Carolina Ruiz, Elizabeth F. Ryder

2013

Abstract

Gene expression in different cells is regulated by different sets of transcription factors. How regulatory regions of DNA encode the combinations of transcription factors required to achieve specificity of expression is a long-standing problem in biology. In the model system C. elegans, gene regulatory regions are relatively compact, and much work has been done to describe gene expression patterns in a number of cell types. In this work, we collected the promoter regions of genes with known expression patterns in a limited number of neuronal cell types and, using position weight matrices, annotated any DNA motifs in the promoters that corresponded to putative binding sites of known C. elegans transcription factors. We used association rule mining to identify rules relating the presence of particular motifs to the expression of particular genes. We used metrics including confidence, support, lift, and p-value to mine and assess rules. We examined how the rules were affected by using multiple vs. single transcription factors, and by the distance from transcription factor binding sites to the start of transcription. The mined association rules were filtered using Benjamini and Hochberg's false discovery rate procedure, and the most interesting rules were selected. We also validated our approach by generating association rules corresponding to gene expression patterns that have already been established in biological research. We conclude that our system allows the identification of interesting putative gene expression rules involving known transcription factors. These rules can be further validated using biological techniques.
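
The rule metrics and the filtering step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy genes, the motif names (motifA, motifB), the cell-type label and the function names are all hypothetical. Each gene is treated as a set of items (motifs found in its promoter plus the cell types where it is expressed), a rule is scored by the standard definitions of support, confidence and lift, and a list of rule p-values is filtered with the standard Benjamini-Hochberg step-up procedure.

```python
from typing import FrozenSet, List, Sequence, Tuple

Item = str
Transaction = FrozenSet[Item]


def rule_metrics(
    transactions: Sequence[Transaction],
    antecedent: FrozenSet[Item],
    consequent: FrozenSet[Item],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n                                # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0           # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0     # P(C | A) / P(C)
    return support, confidence, lift


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up procedure: keep the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    keep = [False] * m
    for idx in order[:cutoff]:
        keep[idx] = True
    return keep


# Hypothetical toy data: four genes, two motifs, one neuronal cell type.
genes = [
    frozenset({"motifA", "motifB", "expressed_in_ASE"}),
    frozenset({"motifA", "expressed_in_ASE"}),
    frozenset({"motifA"}),
    frozenset({"motifB"}),
]

support, confidence, lift = rule_metrics(
    genes, frozenset({"motifA"}), frozenset({"expressed_in_ASE"})
)
kept = benjamini_hochberg([0.02, 0.024, 0.5, 0.9], alpha=0.05)
```

A lift above 1 indicates that the motif and the expression pattern co-occur more often than expected if they were independent. The step-up rule can retain a p-value that misses its own rank threshold when a larger p-value passes a later one, which is why the cutoff is the largest passing rank rather than the first failure.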

References

  1. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proc. 20th Int. Conference on Very Large Data Bases (VLDB), 1215, 487-499.
  2. Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. SIGMOD Rec., 22(2), 207-216. doi: 10.1145/170036.170072
  3. Altun, Z. F., & Hall, D. H. (2011). Nervous system, general description. WormAtlas. doi: 10.3908/wormatlas.1.18
  4. Alvarez, S. A. (2003). Chi-squared computation for association rules: Preliminary results. (Technical Report No. BC-CS-2003-01). Computer Science Department, Boston College.
  5. Arda, H. E., & Walhout, A. J. M. (2010). Gene-centered regulatory networks. Briefings in Functional Genomics, 9(1), 4-12.
  6. Bailey, T. L., & Gribskov, M. (1998). Combining evidence using p-values: Application to sequence homology searches. Bioinformatics (Oxford, England), 14(1), 48-54.
  7. Bamps, S., & Hope, I. A. (2008). Large-scale gene expression pattern analysis, in situ, in Caenorhabditis elegans. Briefings in Functional Genomics & Proteomics, 7(3), 175-183.
  8. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289-300.
  9. Bigelow, H., Wenick, A., Wong, A., & Hobert, O. (2004). CisOrtho: A program pipeline for genome-wide identification of transcription factor target genes using phylogenetic footprinting. BMC Bioinformatics, 5(1), 27.
  10. Conrad, R., Lea, K., & Blumenthal, T. (1995). SL1 trans-splicing specified by AU-rich synthetic RNA inserted at the 5' end of Caenorhabditis elegans pre-mRNA. RNA, 1(2), 164-170.
  11. Hobert, O., Carrera, I., & Stefanakis, N. (2010). The molecular and gene regulatory signature of a neuron. Trends in Neurosciences, 33(10), 435-445.
  12. Hope Laboratory Expression Pattern Database. Retrieved from http://bgypc059.leeds.ac.uk/web/databaseintro.htm
  13. Hunt-Newbury, R., Viveiros, R., Johnsen, R., Mah, A., Anastas, D., Fang, L., Lorch, A. (2007). High-throughput in vivo analysis of gene expression in Caenorhabditis elegans. PLoS Biology, 5(9), e237.
  14. A. Icev*, C. Ruiz, and E. Ryder. (2003). Distance-Enhanced Association Rules for Gene Expression. In Proc. of the Third ACM SIGKDD Workshop on Data Mining in Bioinformatics (BIOKDD2003). Held in conjunction with the Ninth International Conference on Knowledge Discovery and Data Mining (KDD2003). pp. 34-40. Washington DC, USA. August 2003.
  15. Ihuegbu, N. E., Stormo, G. D., & Buhler, J. (2012). Fast, sensitive discovery of conserved genome-wide motifs. Journal of Computational Biology, 19(2), 139-147.
  16. MacIsaac, K. D., Lo, K. A., Gordon, W., Motola, S., Mazor, T., & Fraenkel, E. (2010). A quantitative model of transcriptional regulation reveals the influence of binding location on expression. PLoS Computational Biology, 6(4), e1000773.
  17. Newburger, D. E., & Bulyk, M. L. (2009). UniPROBE: An online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Research, 37(suppl 1), D77-D82.
  18. K. A. Pray*, C. Ruiz. (2005). Mining Expressive Temporal Associations From Complex Data. International Conference on Machine Learning and Data Mining MLDM'2005. Springer Verlag. Leipzig, Germany. July 9-11, 2005
  19. Reece-Hoyes, J. S., Deplancke, B., Shingles, J., Grove, C. A., Hope, I. A., & Walhout, A. J. M. (2005). A compendium of Caenorhabditis elegans regulatory transcription factors: A resource for mapping transcription regulatory networks. Genome Biology, 6(13), R110.
  20. Reece-Hoyes, J. S., Shingles, J., Dupuy, D., Grove, C. A., Walhout, A. J. M., Vidal, M., & Hope, I. A. (2007). Insight into transcription factor gene duplication from Caenorhabditis elegans promoterome-driven expression patterns. BMC Genomics, 8(1), 27.
  21. D. Thakkar*, C. Ruiz, E. F. Ryder. (2007). Hypothesis Driven Specialization of Gene Expression Association Rules. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM2007). pp. 48-55. Fremont, CA. USA. Nov. 2007.
  22. The C. elegans Sequencing Consortium. (1998). Genome sequence of the nematode C. elegans: A platform for investigating biology. Science, 282(5396), 2012-2018. doi: 10.1126/science.282.5396.2012
  23. WormBase, http://www.wormbase.org/, release WS230, date 1 April 2012.


Paper Citation


in Harvard Style

Wan H., Barrett G., Ruiz C. and Ryder E. F. (2013). Mining Association Rules that Incorporate Transcription Factor Binding Sites and Gene Expression Patterns in C. elegans. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 81-89. DOI: 10.5220/0004252300810089


in Bibtex Style

@conference{bioinformatics13,
author={Hao Wan and Gregory Barrett and Carolina Ruiz and Elizabeth F. Ryder},
title={Mining Association Rules that Incorporate Transcription Factor Binding Sites and Gene Expression Patterns in C. elegans},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={81-89},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004252300810089},
isbn={978-989-8565-35-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - Mining Association Rules that Incorporate Transcription Factor Binding Sites and Gene Expression Patterns in C. elegans
SN - 978-989-8565-35-8
AU - Wan H.
AU - Barrett G.
AU - Ruiz C.
AU - Ryder E. F.
PY - 2013
SP - 81
EP - 89
DO - 10.5220/0004252300810089