minimum of data, and two forms of temporal
constraints. They are duration criteria, which can
select data based on the length of time between two
time points, and timing criteria, which support a
range of temporal criteria using standard Allen
operators (Allen, 1983). Both support operations at
multiple granularities. In addition, instances of these
four criteria can be composed in a single filter
operation (conjunctively or disjunctively) to build
complex selections of data. The four criteria can also
be used when defining the other three operations.
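As an illustration of how duration and timing criteria might be composed, the following is a minimal sketch rather than SWEETInfo's actual implementation. The names Interval, duration, all_of, and any_of are hypothetical, only three of Allen's thirteen relations are shown, and granularity is limited to days and weeks for simplicity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Interval:
    start: date
    end: date


# Three of Allen's thirteen interval relations, using strict date comparisons.
def before(a: Interval, b: Interval) -> bool:
    return a.end < b.start


def during(a: Interval, b: Interval) -> bool:
    return b.start < a.start and a.end < b.end


def overlaps(a: Interval, b: Interval) -> bool:
    return a.start < b.start < a.end < b.end


def duration(iv: Interval, granularity: str = "days") -> float:
    """Length of an interval at an assumed granularity (days or weeks)."""
    days = (iv.end - iv.start).days
    return {"days": float(days), "weeks": days / 7}[granularity]


Criterion = Callable[[Interval], bool]


def all_of(*criteria: Criterion) -> Criterion:
    """Conjunctive composition: every criterion must hold."""
    return lambda iv: all(c(iv) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    """Disjunctive composition: at least one criterion must hold."""
    return lambda iv: any(c(iv) for c in criteria)


# Hypothetical study period and treatment intervals.
study = Interval(date(2010, 1, 1), date(2010, 12, 31))
treatments = [
    Interval(date(2010, 2, 1), date(2010, 3, 15)),   # long, inside the study
    Interval(date(2010, 6, 1), date(2010, 6, 10)),   # short, inside the study
    Interval(date(2009, 11, 1), date(2010, 1, 20)),  # long, overlaps study start
]

long_enough = lambda iv: duration(iv, "weeks") > 4   # duration criterion
in_study = lambda iv: during(iv, study)              # timing criterion

selected_and = [iv for iv in treatments if all_of(long_enough, in_study)(iv)]
selected_or = [iv for iv in treatments if any_of(long_enough, in_study)(iv)]
```

Here the conjunctive filter keeps only the first treatment, while the disjunctive filter keeps all three.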
Grouping Operations. Dividing patients into
groups is common when analyzing data. Patients can
be grouped into related categories (e.g., short,
medium, tall), which are then analyzed in different
ways. A grouping operation allows users to define a
set of named groups meeting different sets of
criteria. Each group is specified using a criteria set,
and the data for each group are generated from the subset
of the input to the grouping operation that meets its
associated criteria.
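A grouping operation of this kind can be sketched as a mapping from group names to criteria. This is an illustrative sketch only: the function group_patients, the height_cm field, and the height thresholds are assumptions made for the example, not part of SWEETInfo.

```python
from typing import Callable, Dict, List

Patient = Dict[str, float]
Criteria = Callable[[Patient], bool]


def group_patients(
    patients: List[Patient], groups: Dict[str, Criteria]
) -> Dict[str, List[Patient]]:
    """Assign to each named group the subset of the input meeting its criteria."""
    return {
        name: [p for p in patients if meets(p)]
        for name, meets in groups.items()
    }


# Hypothetical input data and thresholds.
patients = [
    {"id": 1, "height_cm": 152},
    {"id": 2, "height_cm": 171},
    {"id": 3, "height_cm": 188},
    {"id": 4, "height_cm": 165},
]
height_groups = {
    "short": lambda p: p["height_cm"] < 160,
    "medium": lambda p: 160 <= p["height_cm"] < 180,
    "tall": lambda p: p["height_cm"] >= 180,
}

grouped = group_patients(patients, height_groups)
group_ids = {name: [p["id"] for p in members] for name, members in grouped.items()}
```

Because each group has its own criteria set, groups defined this way need not be mutually exclusive or exhaustive.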
New Variable Operations. SWEETInfo uses
variables to define data elements associated with
patients (e.g., blood pressure, viral load; see Section
4.2). Variables are the basic units used when
transforming data in SWEETInfo.
This operation defines a new variable by creating a
restriction on existing data. For example, a High
RNA variable is defined by selecting RNA values
that are greater than some number, using a value
criterion. Alternatively, a temporal duration criterion
could be used to find patients who had been treated
with a particular drug for longer than one month.
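The High RNA example can be sketched as a restriction over existing observations. The Observation type, the new_variable function, and the threshold of 100,000 are hypothetical choices made for illustration; the paper does not specify a threshold.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class Observation:
    patient_id: int
    variable: str
    value: float
    when: date


def new_variable(
    name: str,
    source: str,
    criterion: Callable[[Observation], bool],
    observations: List[Observation],
) -> List[Observation]:
    """Derive a new variable from the source observations that meet a criterion."""
    return [
        Observation(o.patient_id, name, o.value, o.when)
        for o in observations
        if o.variable == source and criterion(o)
    ]


# Hypothetical observations.
observations = [
    Observation(1, "RNA", 250000.0, date(2010, 1, 5)),
    Observation(1, "RNA", 400.0, date(2010, 4, 2)),
    Observation(2, "RNA", 120000.0, date(2010, 2, 9)),
    Observation(2, "CD4", 350.0, date(2010, 2, 9)),
]

HIGH_RNA_THRESHOLD = 100000.0  # assumed value, for illustration only
high_rna = new_variable(
    "High RNA", "RNA", lambda o: o.value > HIGH_RNA_THRESHOLD, observations
)
```

The derived observations carry the new variable name, so later operations can refer to High RNA like any other variable.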
Temporal Context Operations. Defining temporal
patterns is a central requirement when working with
temporal data. Temporal context operations are basic
building blocks in this regard because they allow
users to specify periods that meet a certain pattern.
For example, if a user is interested in post-surgical
patient outcomes, they could create a new temporal
context to represent the period from the beginning of
a hospitalization to 30 days after release. This
context could be used more specifically in
subsequent operations, such as a review of
orthopedic surgeries only. This operation allows
complex temporal criteria to be built iteratively from
a smaller set of simple criteria.
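The post-surgical example can be sketched as a function that derives a new period from a hospitalization. The Period type and the functions post_surgical_context and within are hypothetical names, and the dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Period:
    patient_id: int
    start: date
    end: date


def post_surgical_context(stay: Period, follow_up_days: int = 30) -> Period:
    """Context from the start of a hospitalization to N days after release."""
    return Period(
        stay.patient_id, stay.start, stay.end + timedelta(days=follow_up_days)
    )


def within(when: date, context: Period) -> bool:
    """True if a time point falls inside the context (inclusive bounds)."""
    return context.start <= when <= context.end


# Hypothetical hospitalization and follow-up events.
stay = Period(patient_id=1, start=date(2010, 3, 1), end=date(2010, 3, 5))
context = post_surgical_context(stay)

early_event = date(2010, 3, 20)  # falls inside the 30-day follow-up window
late_event = date(2010, 4, 10)   # falls after the window has closed
```

A context built this way is itself a period, so it can be restricted further or used as input to later criteria.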
Operations in SWEETInfo ultimately specify the
action taken when certain conditions are met. The
conditions are specified using a criteria set. Each
operation has an associated creation dialog, which
has four subtabs, one for each criteria type. Users
can specify criteria interactively, and, on creation,
they are displayed in summary form (see Figure 1).
3.2 Visualizations
Users can immediately execute an operation that
they have defined and examine the results in a set of
graphical displays, which show population-level
data. They can also drill down to examine individual
patient data. The population-level view provides a
simultaneous display of data for multiple patients.
The view node can also provide summary statistics
for an operation. For example, the number of
patients who have satisfied certain analysis criteria
can be seen in the view node summary statistics.
SWEETInfo provides customized displays for
different types of temporal data, and allows users to
customize display options. After viewing the data, a
user can modify operations in a pipeline immediately
or define additional operations. Users can also define
branching points using filter and group operations to
define parallel analysis paths. The immediate visual
feedback and the ability to quickly modify or extend
pipeline operations allow rapid analyses.
3.3 Pipeline
Biomedical studies are often presented as step-by-
step processes where data are iteratively refined. To
replicate this workflow, we used a pipeline-based
representation of analyses (Figure 2). A pipeline is
composed of chained operation-view pairs. This
approach allows users to see intermediate results of
the step-by-step operations that generate the overall
analysis. Multiple parallel paths can be defined with
filter and grouping operations. A visualization node
is automatically produced by each operation and can
be used to explore data. It can also be used to
summarize the number of patients meeting criteria at
each stage of the pipeline. Each view node can also
be opened to view detailed displays of the data.
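The chaining of operation-view pairs can be sketched as follows, with each view reduced to a patient count. This is an assumption-laden sketch: run_pipeline, the patient fields, and the thresholds are hypothetical, and branching paths are omitted.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
Operation = Callable[[List[Patient]], List[Patient]]


def run_pipeline(
    data: List[Patient], steps: List[Tuple[str, Operation]]
) -> List[Tuple[str, int]]:
    """Apply each operation in turn and record a summary view after it."""
    views = []
    for name, operation in steps:
        data = operation(data)           # output of one step feeds the next
        views.append((name, len(data)))  # the view here is only a patient count
    return views


# Hypothetical input data.
patients = [
    {"id": 1, "age": 34, "rna": 250000},
    {"id": 2, "age": 15, "rna": 300000},
    {"id": 3, "age": 52, "rna": 400},
    {"id": 4, "age": 41, "rna": 180000},
]

steps = [
    ("adults", lambda ps: [p for p in ps if p["age"] >= 18]),
    ("high RNA", lambda ps: [p for p in ps if p["rna"] > 100000]),
]

summary = run_pipeline(patients, steps)
```

Recording a view after every step is what makes the intermediate patient counts visible, rather than only the final result.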
3.4 Projects
SWEETInfo includes a project-based mechanism
that allows users to save their analyses and pipeline
definitions. It supports multiple projects per user,
with each project holding a data set and a pipeline.
This mechanism allows users to store, reload, and
execute the analyses defined in a pipeline.
SWEETInfo also allows users to share pipelines
with other users, which allows execution of pre-
defined analyses. Additional fine-grained sharing is
also provided. Each operation node, which can
define a set of complex constraints, can be shared. A
WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies