3.3 Minimum Information about a
Microarray Experiment (MIAME)
Although increasingly used for gene expression data
analysis at a genome-wide level, microarray
technology still has the limitation of insufficient
standardization for presentation and exchange of
such data. The MIAME standard (Brazma, 2006)
aims at establishing a common way for recording
and reporting microarray-based gene expression
data, and proposes the minimum information
required to ensure that microarray data can be
interpreted and that the results derived from the
analysis of the data can be independently verified.
The standard only defines the content and the
structure of the information and does not address the
actual technical format of storing and
communicating the data.
MIAME has also identified the need for
controlled vocabularies and ontologies for data
representation in order to enable interoperability.
Because few suitable controlled vocabularies are
available, MIAME proposes a
representation in lists of ‘qualifier, value, source’
triplets, which authors can use to define their own
attributes (i.e. qualifiers) and provide the appropriate
values and the source from which the terms were
extracted.
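
The triplet representation described above can be sketched as a small data structure. This is only an illustration, not a format prescribed by MIAME: the class name `Triplet`, the helper `values_for`, and all example entries are assumptions introduced here.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One 'qualifier, value, source' annotation entry (hypothetical type)."""
    qualifier: str  # author-defined attribute name
    value: str      # value assigned to that attribute
    source: str     # where the term was taken from


# Illustrative annotation of a single sample; the entries are made up.
sample_annotation = [
    Triplet("organism", "Homo sapiens", "NCBI Taxonomy"),
    Triplet("tissue", "liver", "author-defined"),
]


def values_for(triplets, qualifier):
    """Return every value recorded under the given qualifier."""
    return [t.value for t in triplets if t.qualifier == qualifier]


organisms = values_for(sample_annotation, "organism")
```

Keeping the source next to each value lets a reader trace where a term came from, even when no shared controlled vocabulary is available.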
A significant amount of context data is necessary
to describe a microarray experiment because the
results of such an experiment (gene expression) are
only meaningful in the context of the conditions in
which the experiment was run. Most microarray
experiments report only changes in gene expression
relative to a non-standardized reference, the data
is normalized in different ways and represented in
non-standardized formats, and the annotation
describing the data is often insufficient.
All these factors make comparing data from distinct
experiments very difficult. MIAME attempts to
alleviate these issues by specifying the annotation
necessary to properly interpret the data and the
detailed description of the experiment, including the
way in which the gene expression level
measurements were obtained.
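
A small numeric sketch may help show why relative measurements against different references are hard to compare. The intensities below are invented for illustration, and the use of a log2 ratio is an assumption about a common way of expressing relative expression, not something the text above prescribes.

```python
import math


def log2_ratio(sample_intensity, reference_intensity):
    """Relative expression of a sample against a chosen reference."""
    return math.log2(sample_intensity / reference_intensity)


# The same hypothetical sample measured against two different references.
sample = 800.0
ratio_vs_reference_a = log2_ratio(sample, 200.0)
ratio_vs_reference_b = log2_ratio(sample, 400.0)

# The reported values differ although the sample is identical, so the
# reference must be documented for the numbers to be interpretable.
values_differ = ratio_vs_reference_a != ratio_vs_reference_b
```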
In addition to the gene expression matrix, which
contains the measured expression for each gene and
sample in the array, MIAME advises providing
information about the genes whose expression was
measured and about the experimental conditions
under which the samples were taken. The
information required can be divided at a conceptual
level into three logical parts: gene annotation,
sample annotation and a gene expression matrix.
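
The three logical parts named above could be held together roughly as follows. This is a minimal sketch under our own assumptions: the class `ExpressionDataset`, its field names, and the example identifiers are hypothetical and are not defined by MIAME.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionDataset:
    """Hypothetical container for the three conceptual parts."""
    genes: dict    # gene id -> gene annotation
    samples: dict  # sample id -> sample annotation
    matrix: dict = field(default_factory=dict)  # (gene id, sample id) -> value

    def set_expression(self, gene_id, sample_id, value):
        """Store a measurement only if both annotations exist."""
        if gene_id not in self.genes:
            raise KeyError(f"unannotated gene: {gene_id}")
        if sample_id not in self.samples:
            raise KeyError(f"unannotated sample: {sample_id}")
        self.matrix[(gene_id, sample_id)] = value


# Illustrative content; identifiers and values are made up.
dataset = ExpressionDataset(
    genes={"gene_1": {"name": "example gene"}},
    samples={"sample_1": {"condition": "control"}},
)
dataset.set_expression("gene_1", "sample_1", 1.5)
```

Rejecting measurements for unannotated genes or samples reflects the point made above: an expression value is only meaningful together with its annotation.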
3.4 MAGE
While MIAME focuses on the conceptual content of
the data, specifying what information is needed in
order to be able to interpret and reproduce a
microarray experiment, MAGE (MAGE, 2006)
delivers data exchange standards to facilitate the
exchange of gene expression data. The core of
MAGE is MAGE-OM, which provides an object
model for the exchange of gene expression data.
MAGE also proposes two data exchange formats:
MAGE-ML, which provides a mark-up language, and
MAGE-TAB, which provides a tabular format and is
the current recommendation. MAGE-OM
(OMG, 2003) defines the object model for gene
expression data and it is modelled using UML.
The model can express microarray designs,
microarray manufacturing information, microarray
experiment setup and execution information, gene
expression data, and data analysis results, satisfying
the MIAME requirements. MAGE-OM tries to be
generic and as complete as possible. Users typically
use only the subset of the provided classes and
relations that fulfils their needs.
MAGE-ML captures MAGE-OM in an XML
notation; explicit mapping rules map the MAGE-OM
model to XML. Although MAGE-ML is supported in
various tools as an import and export format, it is
a cumbersome format to use in a
laboratory when no appropriate tooling or software
development expertise is available.
MAGE-TAB (Rayner, 2006) fills this gap by
providing a simple format that still captures the
requirements of the MIAME standard. MAGE-TAB
is a tabular format and can be easily manipulated
with various tools (even spreadsheet programs).
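
To illustrate how little tooling a tabular format needs, the sketch below reads a tab-delimited sheet with Python's standard `csv` module. The column names and rows are hypothetical placeholders chosen for this example; they are not taken from the MAGE-TAB specification.

```python
import csv
import io

# Hypothetical tab-delimited annotation sheet (not actual MAGE-TAB columns).
SHEET = (
    "Sample Name\tOrganism\tData File\n"
    "sample_1\tHomo sapiens\tsample_1.txt\n"
    "sample_2\tHomo sapiens\tsample_2.txt\n"
)


def read_rows(text):
    """Parse tab-delimited text into one dict per row, keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_rows(SHEET)
data_files = [row["Data File"] for row in rows]
```

The same sheet can be opened unchanged in a spreadsheet program, which is the practical advantage the text attributes to a tabular format.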
4 HIGH-LEVEL DATA MODEL
FOR MICROARRAY DATA
Although there are several existing HL7 standards
addressing the communication of clinico-genomic
data, we do not consider them applicable to
the actual storage of the data. On the other hand, the
MIAME and MAGE standards provide models for
the storage and exchange of microarray-based gene
expression data, but are mainly tailored towards
research purposes: the underlying data models are
too complex and too elaborate to be directly used in
an EHR. For that reason, we define an initial
simplified model for the storage of genomic
information in a medical record, which combines