formats exist, with well over 100 documented for-
mats (Brooks et al., 2011), such as the Extensible
Biosignal File Format (EBS) (Hellmann et al., 1996),
the European Data Format (EDF+) (Kemp and Oli-
van, 2003), the Medical Waveform Format Encod-
ing Rules (MFER) (MFER, 2003), and the WaveForm
DataBase (WFDB) (Goldberger et al., 2000). Many
formats are proprietary, limiting the access to the in-
formation contained in the files. Most are designed
for a specific purpose, such as the storage of electro-
cardiogram (ECG) signals, and there is a general lack
of structured metadata, that is, all the semantic infor-
mation that describes how the signals were acquired
and what was done with them.
It should be noted that peer-reviewed literature on
this topic is fairly limited. A quick search for "biosignal
database" in IEEE’s Xplore® Digital Library produces
28 results. Of these, only four directly address
the issues of specifying and implementing a system
for the storage of biosignals. For instance, Penzel et
al. (Penzel et al., 2001) describe the approach used
to store polysomnography (PSG) data, produced in
the scope of SIESTA, a European project carried out
in 2001. The European Data Format was used, with
the need to specify strict filename conventions (e.g.
encoding the subject identification, or the recording
site), and the internal structure of the files (i.e. the or-
der of the signals in the file and their sampling rates).
In the same year, Lovell et al. (Lovell et al., 2001)
detailed a framework for web-enabled storage of
biosignals, in which the absence of a common file
format was already noted. The development of browser-based
applications is discussed and access-load issues are
addressed.
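The filename and file-layout conventions mentioned above for EDF recordings can be illustrated with a minimal sketch. The pattern `<site>_<subject>_n<night>.edf`, the names `RecordingName`, `parse_recording_name` and `layout_matches`, and the expected signal layout are all hypothetical, chosen only for illustration; they are not the conventions actually used in SIESTA.

```python
import re
from typing import NamedTuple


class RecordingName(NamedTuple):
    """Metadata recovered from a filename (hypothetical fields)."""
    site: str
    subject: str
    night: int


# Hypothetical convention: two-letter site code, four-digit subject
# identifier, one-digit night number, e.g. "VI_0012_n1.edf".
PATTERN = re.compile(
    r"^(?P<site>[A-Z]{2})_(?P<subject>\d{4})_n(?P<night>\d)\.edf$"
)

# Hypothetical fixed internal structure: signal order and sampling
# rates (Hz) that every file is expected to follow.
EXPECTED_LAYOUT = [("EEG", 100), ("EOG", 100), ("ECG", 200)]


def parse_recording_name(filename: str) -> RecordingName:
    """Extract site, subject and night, rejecting non-conforming names."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename!r}")
    return RecordingName(match["site"], match["subject"], int(match["night"]))


def layout_matches(signals) -> bool:
    """Check that (label, rate) pairs appear in the agreed order."""
    return list(signals) == EXPECTED_LAYOUT
```

A sketch like this also shows the weakness of the approach: the metadata lives in the filename and in an external agreement about signal order, so renaming a file or reordering its signals silently breaks the link between the data and its context.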
More recently, the focus has shifted to semantic
approaches (see, for example, (Brooks et al.,
2011), (Kokkinaki et al., 2008), and (Brooks, 2009)),
where, in addition to the core biosignal data, contex-
tual information is included, such as patient and ac-
quisition procedure information, enabling the integra-
tion of disparate and heterogeneous sources of med-
ical information and facilitating their query and re-
trieval.
Current methods exhibit several clear limitations.
In particular, it is common
to have the signal data separated from metadata char-
acterizing the experimental setup. Therefore, possess-
ing the data files alone is of limited interest, because
the acquisition context necessary to analyze them can-
not be accessed. Furthermore, current file formats
usually have a fixed number of metadata fields, of
limited size and with weak semantics: while certain
metadata fields are self-explanatory (sampling
frequency, resolution, etc.), others are not. For example,
"channel" may refer to the actual samples of a signal,
its label, or the identification of the physical input
of the acquisition system. Additionally, current
approaches make it hard to append new information
to an already existing file (e.g. a filtered version of
the biosignals, or a comment about the data). Finally,
these approaches provide poor user interfaces, with
very limited or nonexistent query support. These
limitations provide our motivation to
build upon the current state-of-the-art and develop an
extensible, semantic and hierarchical infrastructure.
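To make the limitations above concrete, the following is a minimal sketch of a hierarchical record in which metadata travels with the signal it describes and new content can be appended without touching what is already stored. It uses only plain Python dictionaries and JSON; the functions `new_record`, `add_signal` and `add_event`, and every field name, are assumptions made for illustration and do not represent the data model or file format proposed in this paper.

```python
import json


def new_record(subject_id, device):
    """Create an empty hierarchical record (all field names are illustrative)."""
    return {
        "metadata": {"subject_id": subject_id, "device": device},
        "signals": {},
        "events": [],
    }


def add_signal(record, name, samples, **metadata):
    """Store samples together with an open-ended set of metadata fields."""
    record["signals"][name] = {
        "metadata": dict(metadata),
        "samples": list(samples),
    }


def add_event(record, signal, index, comment):
    """Attach an annotation to a sample index of a given signal."""
    record["events"].append(
        {"signal": signal, "index": index, "comment": comment}
    )


record = new_record(subject_id="S01", device="hypothetical-acquisition-unit")

# "channel" is split into unambiguous fields: a label and a physical input.
add_signal(
    record, "ecg/raw", [512, 515, 530, 611, 540],
    sampling_rate_hz=1000, resolution_bits=12,
    channel_label="ECG", physical_input=1,
)

# Later, a processed version is appended next to the raw signal.
raw = record["signals"]["ecg/raw"]["samples"]
smoothed = [(a + b) / 2 for a, b in zip(raw, raw[1:])]
add_signal(
    record, "ecg/smoothed", smoothed,
    sampling_rate_hz=1000, derived_from="ecg/raw",
    processing="2-point moving average",
)
add_event(record, "ecg/raw", 3, "R wave")

# Signals, metadata and annotations are serialized as a single unit.
restored = json.loads(json.dumps(record))
```

Because the metadata fields are free-form keyword arguments, this sketch is extensible but still has weak semantics; constraining the keys to a controlled vocabulary would be a separate step.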
3 STORING BIOSIGNALS
In response to the limitations found in previous work,
a list was compiled of the most important aspects
and properties that a system storing biosignals should
exhibit. Based on these requirements, a data model
was specified and various implementations of the data
model were investigated.
3.1 Requirements
The following properties were used to evaluate and
compare the various file formats under study: 1) Ac-
cess Performance: Read and write speed, non-
sequential access, data compression, etc.; 2) Cross-
Platform Support: The availability of tools for
different operating systems and programming lan-
guages; 3) Events Support: Events and annotations
are textual comments or values related to a particular
signal, or to the acquisition session as a whole (e.g.
the location of R waves in the ECG). This type of data
is very important, given that only through the evaluation
of annotations can a human user or a computer algorithm
learn the meaning of specific signal patterns
(Penzel et al., 2001); 4) Extensibility: The ability to
easily add more data to a file; 5) Metadata: Defined
as data about data, pertains to all the additional infor-
mation that characterizes the acquired signals. It in-
cludes general and particular attributes of the biosig-
nals, when, how and by whom the acquisitions were
made, their purpose, and what processing has been
applied. Fields for metadata should be extensible
(allowing more information to be added along the way) and,
more importantly, should carry meaning, that is, they
should use a controlled vocabulary to specify content, e.g.
using ontologies (McGuinness, 2003). Some common
metadata fields can be seen in Figure 1. The
use of metadata allows for knowledge to be processed
computationally in a comparable way to numeric data
(Brooks, 2009); 6) Multi-modality: The capability to
store various signal types in a single container struc-
HEALTHINF 2013 - International Conference on Health Informatics