valuable (as it has been curated) as an authoritative
information resource.
The key factor distinguishing our work from other
bioinformatics grid efforts is that many of those efforts
pursue technology research and have, we believe, not given
curation sufficient prominence, even as the data management
problems in science have continued to grow.
Our key lesson learned is that scientists need to be involved
in the planning and curation process, and that the
bioinformatics grid software needs to be able to grow
and evolve in unison with the data model during curation
activities.
5 CONCLUSIONS AND FUTURE WORK
In this paper we discussed our efforts to define a curation
process for biomarker information collected within
the National Cancer Institute’s (NCI) Early Detection
Research Network (EDRN) project. Biomarker research
data is curated and stored within two applications
running on top of EDRN’s data grid infrastructure:
(1) the Biomarker Database (BMDB), and
(2) the EDRN Catalog and Archive Service (eCAS).
We described the data model and curation process for
each of these applications, along with real EDRN
use cases for each. We have experienced
firsthand some of the difficulties in transforming
raw research data from geographically diverse
sources into a comprehensive, query-driven knowledge
base. These difficulties reinforce the notion that (1)
data models must be developed with evolvability as
a cornerstone; (2) scientists need to be actively involved
in the model development process to the greatest
extent possible; and (3) data management (curation)
policy development is at least as important to
address as decisions about the underlying technology
infrastructure.
ACKNOWLEDGEMENTS
This effort was supported by the Jet Propulsion Laboratory,
managed by the California Institute of Technology
under a contract with the National Aeronautics
and Space Administration. The authors would like to
thank Donald Johnsey, Christos Patriotis, Sudhir
Srivastava, and the NCI leadership as a whole for their
collaborative guidance and support.
HEALTHINF 2009 - International Conference on Health Informatics