tify what variations play a key role in both phenotype
expression and gene functionality).
3.1.2 Conceptual View Generation
The next step consisted of creating a conceptual view (i.e., a subset of elements) of the CSG tailored to the context of use specified earlier, by means of the ISGE method (García S. et al., 2021a)². Since the CSG provides a broad, holistic perspective of the genome, it might contain too much information when applied to real-world use cases that focus on a specific domain dimension. Thus, adopting the CSG in real-world use cases can be more efficient and straightforward if we only consider those concepts and relationships that are relevant to that particular use case.
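A conceptual view can be thought of as a projection of the full schema. The sketch below assumes the simplest such projection: it keeps a chosen set of concepts and only the relationships whose two ends are both kept. It does not reproduce the ISGE method itself. The names `Schema`, `Relationship`, and `conceptual_view`, as well as the toy concepts, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    name: str


@dataclass
class Schema:
    concepts: set[str]
    relationships: list[Relationship] = field(default_factory=list)


def conceptual_view(schema: Schema, selected: set[str]) -> Schema:
    """Keep the selected concepts and only relationships whose ends are both kept."""
    unknown = selected - schema.concepts
    if unknown:
        raise ValueError(f"concepts not in schema: {sorted(unknown)}")
    kept = [r for r in schema.relationships
            if r.source in selected and r.target in selected]
    return Schema(concepts=set(selected), relationships=kept)


# Hypothetical toy fragment of a genome schema (not the real CSG).
csg = Schema(
    concepts={"Variation", "Phenotype", "Significance", "Pathway", "Gene"},
    relationships=[
        Relationship("Variation", "Significance", "has"),
        Relationship("Significance", "Phenotype", "refers_to"),
        Relationship("Gene", "Pathway", "participates_in"),
        Relationship("Variation", "Gene", "alters"),
    ],
)

view = conceptual_view(csg, {"Variation", "Significance", "Phenotype", "Gene"})
print(sorted(view.concepts))                 # ['Gene', 'Phenotype', 'Significance', 'Variation']
print([r.name for r in view.relationships])  # ['has', 'refers_to', 'alters']
```

Dropping `Pathway` in this toy example loosely mirrors the fact that no pathway information was selected for the real view. The actual reduction, however, applies to the full CSG.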
We started by presenting the CSG to the domain users. Then, together with them, we generated a conceptual view of the CSG containing the concepts most relevant to Sibila’s functionality, which we called CSG-Sibila. This view offered multiple advantages: i) it improved communication between the final users and us, ii) it provided a common framework of knowledge for discussion, and iii) it eased data integration.
Fig. 1 shows the resulting conceptual view, which reduced the number of considered concepts from 60 to 11. Such a reduction indicates that the working context is significantly narrower. For instance, no information regarding biological pathways was selected.
In the CSG-Sibila, the central and most important concept is the VARIATION, identified by an id and a name. A variation is a change in the DNA sequence. It is characterized by its reference and alternative alleles (i.e., the value in the human DNA sequence considered “correct” and the altered one, respectively), its type, and the last time it was studied. Each variation can have a set of HGVS EXPRESSIONS (i.e., representations of DNA variations that follow a standard provided by the Human Genome Organisation (HUGO)) and a set of EXTERNAL ITEMS (i.e., the appearances of the variation in external data sources). An EXTERNAL ITEM contains the data source name, the URL, and the variation’s specific identifier in that data source.
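The following minimal Python sketch uses dataclasses to make the VARIATION concept and its two dependent concepts concrete. The attribute names, the `external_ids` helper, and the example values (including the placeholder transcript accession and the source `ExampleDB`) are assumptions for illustration. They are not CSG-Sibila’s published definition.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class HgvsExpression:
    """A variation written in HGVS nomenclature."""
    expression: str


@dataclass(frozen=True)
class ExternalItem:
    """An appearance of a variation in an external data source."""
    source_name: str
    url: str
    source_id: str  # the variation's identifier within that source


@dataclass
class Variation:
    id: str
    name: str
    reference_allele: str    # value considered "correct" in the reference sequence
    alternative_allele: str  # the altered value
    variation_type: str
    last_studied: Optional[date] = None
    hgvs_expressions: list[HgvsExpression] = field(default_factory=list)
    external_items: list[ExternalItem] = field(default_factory=list)

    def external_ids(self) -> dict[str, str]:
        """Map each external source name to the variation's id in that source."""
        return {item.source_name: item.source_id for item in self.external_items}


# Illustrative values only; the accession and the data source are made up.
v = Variation(
    id="var-1",
    name="example variation",
    reference_allele="A",
    alternative_allele="G",
    variation_type="SNV",
    last_studied=date(2022, 5, 1),
    hgvs_expressions=[HgvsExpression("NM_000000.0:c.100A>G")],
    external_items=[ExternalItem("ExampleDB", "https://example.org/variants/123", "123")],
)
print(v.external_ids())  # {'ExampleDB': '123'}
```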
DNA VARIATIONS are located in a specific region of the DNA sequence. The CSG-Sibila represents the location of VARIATIONS by means of three approaches: first, the CHROMOSOME where the variation is located; second, the set of GENES altered by the variation; and third, the specific POSITION in the full genome sequence where the variation is located. In some cases, the GENES and the specific POSITION of a VARIATION can be unknown.

² The ISGE method allows for generating narrower conceptual schemas, called conceptual views, from a more general one. These conceptual views are tailored to a specific use context and ease the adoption of conceptual model-based techniques.
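We read the text as saying that the chromosome is always recorded, whereas GENES and POSITION may be missing. Under that reading, a variation’s location can be sketched as a record with two optional parts. `VariationLocation`, `describe`, the gene name `GENE_A`, and the coordinates are hypothetical. Real positions would also depend on a reference genome build, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariationLocation:
    """Where a variation lies: chromosome, affected genes, genomic position."""
    chromosome: str                                 # assumed to always be known
    genes: list[str] = field(default_factory=list)  # empty when unknown
    position: Optional[int] = None                  # None when unknown

    def describe(self) -> str:
        genes = ", ".join(self.genes) if self.genes else "unknown genes"
        pos = str(self.position) if self.position is not None else "unknown position"
        return f"chromosome {self.chromosome}; {genes}; {pos}"


full = VariationLocation("7", ["GENE_A"], 12345)
partial = VariationLocation("X")
print(full.describe())     # chromosome 7; GENE_A; 12345
print(partial.describe())  # chromosome X; unknown genes; unknown position
```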
The study of VARIATIONS is relevant because they can cause genetic diseases. The extent to which a VARIATION is responsible for a genetic disease is known as its clinical impact. VARIATIONS are associated with PHENOTYPES (i.e., genetic diseases) through a set of clinical SIGNIFICANCES. A significance indicates the pathogenicity established between a VARIATION and a PHENOTYPE (e.g., pathogenic, benign, or risk factor) and the evidence that supports such an assertion (i.e., the method and the criteria). The ACTIONABILITY concept is defined as an aggregate calculated from the different SIGNIFICANCES of a VARIATION for a PHENOTYPE. An ACTIONABILITY is characterized by the specific clinical actionability (i.e., disease-causing or not disease-causing) and the level of evidence used for such a classification (i.e., strong evidence, moderate evidence, limited evidence, or to follow up). Finally, each VARIATION can be linked to PHENOTYPE-related BIBLIOGRAPHY.
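The text defines ACTIONABILITY as an aggregate of SIGNIFICANCES but does not state the aggregation rule. Purely as an illustration, the sketch below assumes a simple majority vote between pathogenic-like and benign-like labels for one phenotype. It derives the evidence level from the number of agreeing, non-conflicting significances. The label groupings, the thresholds, and the names `Significance`, `Actionability`, and `aggregate` are all hypothetical. Labels such as “risk factor” are simply ignored by this rule.

```python
from dataclasses import dataclass

# Hypothetical grouping of significance labels; not CSG-Sibila's actual rule.
PATHOGENIC = {"pathogenic", "likely pathogenic"}
BENIGN = {"benign", "likely benign"}


@dataclass(frozen=True)
class Significance:
    phenotype: str
    pathogenicity: str  # e.g. "pathogenic", "benign", "risk factor"
    method: str = ""
    criteria: str = ""


@dataclass(frozen=True)
class Actionability:
    clinical_actionability: str  # "disease-causing" or "not disease-causing"
    evidence_level: str


def aggregate(significances: list[Significance], phenotype: str) -> Actionability:
    """Aggregate a variation's significances for one phenotype (illustrative rule)."""
    labels = [s.pathogenicity for s in significances if s.phenotype == phenotype]
    n_path = sum(label in PATHOGENIC for label in labels)
    n_benign = sum(label in BENIGN for label in labels)

    if n_path == n_benign:
        # No decisive evidence (including no evidence at all): keep under observation.
        return Actionability("not disease-causing", "to follow up")

    verdict = "disease-causing" if n_path > n_benign else "not disease-causing"
    support, conflict = max(n_path, n_benign), min(n_path, n_benign)
    if conflict > 0 or support == 1:
        level = "limited evidence"
    elif support == 2:
        level = "moderate evidence"
    else:
        level = "strong evidence"
    return Actionability(verdict, level)


sigs = [
    Significance("Disease A", "pathogenic"),
    Significance("Disease A", "likely pathogenic"),
    Significance("Disease A", "pathogenic"),
    Significance("Disease B", "benign"),
    Significance("Disease B", "benign"),
    Significance("Disease C", "risk factor"),
]
print(aggregate(sigs, "Disease A"))
```

A real system would probably weight significances by the method and criteria behind them. This sketch keeps those attributes only so that the record shape matches the concept described above.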
3.2 Specification of User Requirements
3.2.1 Concur Task Tree Generation
After collecting the initial information, we complemented it by proposing a task workflow to be implemented in Sibila, based on the previously identified needs of the domain users. We reviewed this workflow with domain experts and refined it over multiple iterations until they approved it. As a result, we improved our understanding of their mental model and generated Sibila’s task model.
We consolidated the task model of the envisioned system using the CTT notation (see Fig. 2). The CTT notation offers several advantages: i) it focuses on the activities that users aim to perform; ii) it provides a hierarchical structure with a wide range of granularity; iii) it offers a graphical syntax that is easy to interpret; and iv) it defines temporal relationships between tasks.
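One way to read temporal relationships is as constraints on the order in which a user completes subtasks. The sketch below checks a completion trace against three CTT operators: enabling (`>>`), choice (`[]`), and concurrency (`|||`). It relies on strong simplifications. Subtasks are atomic, and a single operator governs all children of a parent, whereas CTT places an operator between each pair of siblings. The `accepts` function and the task names are hypothetical and are not taken from Sibila’s task model.

```python
# Simplified semantics of three CTT temporal operators over atomic sibling tasks.
# Simplification: one operator applies to all children of a parent task.

def accepts(operator: str, children: list[str], trace: list[str]) -> bool:
    """Return True if `trace` (completed subtasks, in order) satisfies `operator`."""
    if operator == ">>":    # enabling: each task starts only after the previous one ends
        return trace == children
    if operator == "[]":    # choice: exactly one of the alternatives is performed
        return len(trace) == 1 and trace[0] in children
    if operator == "|||":   # concurrency: every task performed once, in any order
        return sorted(trace) == sorted(children)
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical subtasks.
steps = ["upload file", "filter variations", "view results"]
print(accepts(">>", steps, steps))                                  # True
print(accepts(">>", steps, ["filter variations", "upload file", "view results"]))  # False
print(accepts("[]", ["export CSV", "export PDF"], ["export PDF"]))  # True
```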
There are three types of tasks in the CTT notation. System tasks do not require user interaction (e.g., displaying data) and are depicted with a monitor. Interaction tasks require user interaction (e.g., filling out a form) and are depicted with a hand. Finally, abstract tasks are higher-level tasks that are decomposed into other tasks, whether abstract, system, or interaction tasks; they are depicted with a cloud.
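The three task types, together with the rule that abstract tasks decompose into subtasks, map naturally onto a small tree structure. The sketch below is a hypothetical Python encoding with invented names (`TaskType`, `Task`, `validate`, `render`). It tags each task with the icon the notation uses and rejects abstract tasks that have no subtasks. It does not model any CTT tool’s file format, and the example tree is not Sibila’s real model.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskType(Enum):
    SYSTEM = "monitor"    # performed by the system, no user interaction
    INTERACTION = "hand"  # requires user interaction
    ABSTRACT = "cloud"    # higher-level task decomposed into subtasks


@dataclass
class Task:
    name: str
    kind: TaskType
    children: list["Task"] = field(default_factory=list)

    def validate(self) -> None:
        """Abstract tasks must be decomposed; check the whole subtree recursively."""
        if self.kind is TaskType.ABSTRACT and not self.children:
            raise ValueError(f"abstract task '{self.name}' has no subtasks")
        for child in self.children:
            child.validate()

    def render(self, depth: int = 0) -> list[str]:
        """One line per task, indented by depth and tagged with its icon name."""
        lines = [f"{'  ' * depth}[{self.kind.value}] {self.name}"]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Hypothetical fragment of a task tree.
root = Task("Analyse variations", TaskType.ABSTRACT, [
    Task("Fill out search form", TaskType.INTERACTION),
    Task("Display matching variations", TaskType.SYSTEM),
])
root.validate()
print("\n".join(root.render()))
```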
ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering