represent a genetic variation known to be associated
with disease. This rather strict definition is undermined
in cases where the listed variations 'might' be associated
with disease, as indicated in the HGMD by a question mark.
Indeed, a variation that is not associated with disease
should not be considered a mutation, and thus should not
enter the dataset as one. The CSHG handles these cases
by providing the neutral polymorphism dimension for
the Variation concept.
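To make the distinction concrete, the following minimal sketch separates entries whose disease association is flagged as uncertain from those that are not. It assumes, purely for illustration, that the question mark appears as the last character of a phenotype string; the function names and sample values are hypothetical and do not reflect the actual HGMD export format.

```python
from typing import Iterable, List, Tuple


def is_uncertain(phenotype: str) -> bool:
    """Return True if the phenotype annotation carries a trailing question mark.

    Assumption: uncertainty is encoded as a trailing '?' in the phenotype text.
    """
    return phenotype.strip().endswith("?")


def split_by_certainty(phenotypes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split annotations into (confirmed, uncertain) lists, preserving order."""
    confirmed: List[str] = []
    uncertain: List[str] = []
    for phenotype in phenotypes:
        if is_uncertain(phenotype):
            uncertain.append(phenotype)
        else:
            confirmed.append(phenotype)
    return confirmed, uncertain


# Hypothetical sample annotations, not taken from the HGMD.
SAMPLE = ["Breast cancer", "Breast cancer ?", "Ovarian cancer?"]
confirmed, uncertain = split_by_certainty(SAMPLE)
print(confirmed)
print(uncertain)
```

Entries in the uncertain list would still require a decision by a domain expert before being loaded as mutations or classified otherwise.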
Another point for improvement is the lack of a proper
way to accommodate the various reference sequences in
common use in research papers. For illustration, a
certain mutation might be located at position 131 in
reference sequence X, but correspond to position 125
in reference sequence Y. The HGMD provides its
own cDNA sequence, relative to which it locates the
majority of its mutations. However, this cDNA sequence
is 'based' on an NCBI sequence, and can thus differ
from it.
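The position example above can be sketched as a lookup over aligned blocks. This is only an illustration under stated assumptions: the `AlignedBlock` structure, the `map_position` function and the alignment itself (sequence X carrying six extra leading bases compared with Y) are hypothetical, and a real mapping would be derived from an actual alignment of the two sequences.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AlignedBlock:
    """A gap-free stretch shared by reference sequences X and Y (1-based starts)."""
    x_start: int
    y_start: int
    length: int


def map_position(blocks: Sequence[AlignedBlock], x_pos: int) -> Optional[int]:
    """Map a 1-based position in X to the matching position in Y.

    Returns None when the position falls outside every aligned block,
    i.e. when it has no counterpart in Y.
    """
    for block in blocks:
        offset = x_pos - block.x_start
        if 0 <= offset < block.length:
            return block.y_start + offset
    return None


# Hypothetical alignment: X has 6 extra leading bases compared with Y.
BLOCKS = [AlignedBlock(x_start=7, y_start=1, length=500)]

print(map_position(BLOCKS, 131))  # 125
print(map_position(BLOCKS, 3))    # None: no counterpart in Y
```

Representing the alignment as a list of blocks keeps insertions and deletions explicit: a position inside a gap maps to nothing rather than to a silently wrong coordinate.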
For optimal use of the data provided by the HGMD,
the above means that in many cases an expert still
needs to evaluate and interpret the data. This is
expensive in both time and money. Aligning the HGMD set
of mutations to the NCBI reference sequence, which is
considered the 'gold standard', thus seems a
logical step. Concretely, we suggest two major changes
to the HGMD: (i) facilitate a more elaborate way of
handling the associated phenotype, perhaps by linking
directly to the Online Mendelian Inheritance in Man
(OMIM) database; and (ii) add a new column in which the
reference sequence indicated by the source paper is
also stored. This will allow for a much easier and
more efficient use of the HGMD data set. Considering
that the data is acquired manually from the papers,
adding this element of extracted data seems to be
relatively low cost.
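As a rough sketch of the two suggested changes, a mutation record could carry the source paper's reference sequence and an optional OMIM link as additional fields. All names below (`MutationRecord`, `alignment_status`, the `REF_X` and `REF_Y` identifiers) are hypothetical placeholders and do not correspond to the HGMD schema or to real accession numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MutationRecord:
    gene: str
    position: int                           # position as reported in the source paper
    phenotype: str
    source_reference: Optional[str] = None  # suggested change (ii); None if not recorded
    omim_id: Optional[str] = None           # suggested change (i); None if not linked


def alignment_status(record: MutationRecord, target_reference: str) -> str:
    """Decide how a record's position relates to the target reference sequence."""
    if record.source_reference is None:
        # Without the source reference, the position cannot be interpreted safely.
        return "needs expert review"
    if record.source_reference == target_reference:
        return "already aligned"
    return "needs realignment"


# Hypothetical records; REF_X and REF_Y are placeholder identifiers.
records = [
    MutationRecord("BRCA1", 125, "Breast cancer", source_reference="REF_Y"),
    MutationRecord("BRCA1", 131, "Breast cancer", source_reference="REF_X"),
    MutationRecord("BRCA1", 140, "Breast cancer"),
]
for record in records:
    print(record.position, alignment_status(record, target_reference="REF_Y"))
```

With the source reference stored, only the records lacking it would still need manual interpretation; the others can be realigned automatically.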
When we look at the HGMD, we cannot help but
notice that, although it is very useful, much is still
to be wished for from an information systems point of view.
It is our strong belief that accurately representing
any data, and perhaps genetic data in particular,
can only be done by means of careful analysis
of the domain. The CSHG aims to do exactly this, by
applying a conceptual modeling approach.
MUTATIONAL DATA LOADING ROUTINES FOR HUMAN GENOME DATABASES - The BRCA1 Case