AN EXTENSIBLE BIOMARKER CURATION APPROACH AND SOFTWARE INFRASTRUCTURE FOR THE EARLY DETECTION OF CANCER
Andrew F. Hart, John J. Tran, Daniel J. Crichton, Kristen Anton, Heather Kincaid, Sean Kelly, J. S. Hughes, Chris A. Mattmann
2009
Abstract
Modern research requires collaboration among geographically distributed scientists. This collaborative model is transforming scientific discovery by enabling the sharing and validation of data across institutions. Informatics infrastructures are being developed to support cancer research, giving scientists the ability to capture data and share it with remote colleagues. A critical challenge that such infrastructures present is the development of a curation model for the science data. While considerable emphasis has been placed on developing grid infrastructures, few efforts address the curation aspects crucial to creating a useful scientific knowledge base. The United States National Cancer Institute’s (NCI) Early Detection Research Network (EDRN) is a distributed network of research institutions focused on the discovery of cancer biomarkers. In this paper, we describe our work building a data collection and curation infrastructure on top of the existing EDRN bioinformatics data grid. The approach normalizes curated data through a common information model for cancer biomarker research. We argue that such a model is critical to ensuring that data can be combined into an integrated knowledge system. Furthermore, we argue that human curators with backgrounds in both informatics and science play a critical role in the overall value of the EDRN knowledge base.
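The idea of normalizing curated data through a common information model can be illustrated with a minimal sketch. The paper's actual EDRN information model is not reproduced here. The `Biomarker` fields, the site names, the per-site field mappings, and the organ vocabulary below are all hypothetical and serve only for illustration. The sketch maps records that two institutions describe with different labels onto one shared structure. Any term it cannot map is handed back for review by a human curator, which echoes the role the abstract assigns to curators.

```python
from dataclasses import dataclass

# Hypothetical common information model for one curated biomarker entry.
# These field names are illustrative assumptions, not the EDRN schema.
@dataclass(frozen=True)
class Biomarker:
    name: str
    organ: str
    phase: str
    source_site: str

# Hypothetical per-site field mappings (local label -> common field):
# each contributing institution labels the same concepts differently.
SITE_FIELD_MAPS = {
    "site_a": {"marker": "name", "organ_site": "organ", "study_phase": "phase"},
    "site_b": {"biomarker_name": "name", "tissue": "organ", "phase": "phase"},
}

# Hypothetical controlled vocabulary so curated organ values agree across sites.
ORGAN_VOCAB = {"prostate": "Prostate", "prostate gland": "Prostate", "lung": "Lung"}


def normalize(site: str, record: dict) -> Biomarker:
    """Map a site-specific record onto the common model."""
    mapping = SITE_FIELD_MAPS[site]
    # Rename each local field to its common-model name.
    fields = {common: record[local] for local, common in mapping.items()}
    organ_key = str(fields["organ"]).strip().lower()
    if organ_key not in ORGAN_VOCAB:
        # Terms outside the vocabulary are left for a human curator to resolve.
        raise ValueError(f"unmapped organ term {fields['organ']!r}; needs curator review")
    return Biomarker(
        name=str(fields["name"]).strip().upper(),
        organ=ORGAN_VOCAB[organ_key],
        phase=str(fields["phase"]).strip(),
        source_site=site,
    )
```

In the following usage, `normalize("site_a", {"marker": "psa", "organ_site": "Prostate", "study_phase": "2"})` and `normalize("site_b", {"biomarker_name": " PSA ", "tissue": "prostate gland", "phase": 2})` yield entries whose `name`, `organ`, and `phase` fields are identical, so the two records can be combined. A record with an unrecognized organ term raises an error instead of being silently stored.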
Paper Citation
in Harvard Style
Hart, A. F., Tran, J. J., Crichton, D. J., Anton, K., Kincaid, H., Kelly, S., Hughes, J. S. and Mattmann, C. A. (2009). AN EXTENSIBLE BIOMARKER CURATION APPROACH AND SOFTWARE INFRASTRUCTURE FOR THE EARLY DETECTION OF CANCER. In Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2009) ISBN 978-989-8111-63-0, pages 387-392. DOI: 10.5220/0001781103870392
in Bibtex Style
@conference{healthinf09,
author={Andrew F. Hart and John J. Tran and Daniel J. Crichton and Kristen Anton and Heather Kincaid and Sean Kelly and J. S. Hughes and Chris A. Mattmann},
title={AN EXTENSIBLE BIOMARKER CURATION APPROACH AND SOFTWARE INFRASTRUCTURE FOR THE EARLY DETECTION OF CANCER},
booktitle={Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2009)},
year={2009},
pages={387-392},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001781103870392},
isbn={978-989-8111-63-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2009)
TI - AN EXTENSIBLE BIOMARKER CURATION APPROACH AND SOFTWARE INFRASTRUCTURE FOR THE EARLY DETECTION OF CANCER
SN - 978-989-8111-63-0
AU - Hart, A. F.
AU - Tran, J. J.
AU - Crichton, D. J.
AU - Anton, K.
AU - Kincaid, H.
AU - Kelly, S.
AU - Hughes, J. S.
AU - Mattmann, C. A.
PY - 2009
SP - 387
EP - 392
DO - 10.5220/0001781103870392