Different types of models, such as directed graphs,
Boolean networks (Akutsu et al., 1999), Bayesian
Graphical Models (Zou and Conzen, 2005), and
various differential models have been used to describe
gene regulation at various levels of detail and
complexity. The choice of the model is often determined
by how much information it tries to capture, taking
into account that the more information a model at-
tempts to infer, the more parameters are needed to
learn it, and the more complex the overall approach
becomes. In particular, researchers have paid great
attention to Bayesian Networks, which can compactly
model dependency relationships between variables
using probabilistic measures. Since gene expression
experiments are subject to many measurement errors,
statistical methods are expected to be effective for
extracting useful information from such noisy data.
Friedman et al. (2000) proposed both discrete and
continuous Bayesian network models relying on linear
regression for inferring gene networks. Imoto et al.
(2001) successfully employed non-parametric regression
to capture even non-linear relationships between
genes.
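To make the trade-off between expressiveness and parameter count concrete, the following minimal sketch counts the free parameters of a single node in a discrete Bayesian network as its number of parents grows. It assumes that every variable takes the same number of states (three here, for instance under-expressed, baseline, and over-expressed); this discretization and the function name `free_parameters` are illustrative assumptions, not taken from the cited works.

```python
def free_parameters(num_states: int, num_parents: int) -> int:
    """Free parameters in the conditional probability table of one discrete node.

    Assumes the node and all of its parents share `num_states` states. Each of
    the num_states ** num_parents parent configurations needs num_states - 1
    independent probabilities; the last one follows from normalization.
    """
    return (num_states - 1) * num_states ** num_parents


if __name__ == "__main__":
    # Three-state variables (an assumption made for this illustration only).
    for k in range(5):
        print(f"parents={k}: {free_parameters(3, k)} free parameters")
```

The count grows exponentially with the number of parents, which is one reason why richer models require more data to be learned reliably.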
In this work, we present a comparative study of
state-of-the-art heuristics for inferring the
structure of a Bayesian network from breast cancer
data. The paper is structured as follows: Section 2
provides background on the biological problem under
examination; Section 3 gives a formal definition of
the problem addressed in this study, along with a
description of the computational and statistical
machinery we adopt and of the input data. The results
of the described methods on real and simulated data
are then presented and discussed in Section 4.
Section 5 concludes the paper and suggests avenues
for future research.
2 BIOLOGICAL BACKGROUND
Many biological processes are carried out by inter-
actions between proteins, RNA, and DNA. Cells re-
spond to their environment by activating signalling
networks that trigger processes such as growth, sur-
vival, apoptosis (programmed cell death), and migra-
tion. Post-translational modifications, notably phos-
phorylation, play a key role in these signalling events.
In cancer cells, signalling networks frequently
become compromised, leading to abnormal behaviours
and responses to external stimuli. Endogenous signal
transduction in cancer cells is systematically
disturbed to redirect cellular decisions from
differentiation and apoptosis to proliferation and,
later, invasion. Cancer cells acquire their malignancy
through the accumulation of advantageous gene
mutations, through which the necessary steps towards
malignancy are achieved. These selfish adaptations to
independence can be described as the result of an
evolutionary process of diversity and selection
(Schramm et al., 2010).
Many current and emerging cancer treatments
are designed to block nodes in signalling networks,
thereby altering signalling cascades. Although there
is a wealth of literature describing canonical cell sig-
nalling networks, little is known about exactly how
these networks operate in different cancer cells. Ad-
vancing our understanding of how these networks are
deregulated across cancer cells will ultimately lead to
more effective treatment strategies for patients.
Recently, high-throughput analysis has made it
possible to obtain genome-wide information, such as
mRNA expression levels, protein-protein interactions,
and protein localizations. Considerable attention has
been devoted to developing computational methods for
extracting valuable information about molecular
networks from these various types of genomic data.
Currently, statistical models for estimating gene
regulatory networks from genomic data are mainly
based on expression data from DNA microarrays or
RNA-seq experiments. However, since the information
obtained from these approaches is limited by data
quality, noise, and experimental errors, sophisticated
mathematical approaches are necessary for estimating
gene regulatory networks accurately.
On the other hand, protein-protein interaction
networks are mainly constructed from observed
protein-protein interaction data, using approaches
such as two-hybrid assays, tandem affinity
purification experiments and, more recently, protein
arrays. However, protein-protein interaction data
often contain errors, making it even more difficult to
construct comprehensive protein-protein interaction
networks from these interaction data alone.
3 METHODS
A Bayesian Network (BN) is a statistical graphical
model that represents a joint distribution over n
random variables and encodes it by means of a directed
acyclic graph (DAG) whose n nodes correspond to the
variables. More formally, we define a BN as a
directed acyclic graph G = (V, E), where V is the set
containing the n random variables and E is the set
of directed arcs over them, representing the
conditional dependencies among the variables (Parsons,
2011).
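As an illustration of this definition, the sketch below stores a small graph G = (V, E) as a node list and an arc list, checks that it is acyclic, and evaluates the joint distribution through the standard BN factorization, in which the joint probability is the product over all nodes of P(X_i | parents of X_i). The three-node network, its probability tables, the (parent, child) arc convention, and all function names are hypothetical and chosen only for illustration; they are not the models or data used in this study.

```python
import itertools
from graphlib import CycleError, TopologicalSorter

# Toy network invented for illustration: A -> B and A -> C, all binary.
NODES = ["A", "B", "C"]
ARCS = [("A", "B"), ("A", "C")]  # each arc is (parent, child)

# Hypothetical conditional probability tables:
# CPTS[node][tuple of parent values][node value] = probability.
CPTS = {
    "A": {(): {0: 0.7, 1: 0.3}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    "C": {(0,): {0: 0.6, 1: 0.4}, (1,): {0: 0.5, 1: 0.5}},
}


def parents_of(nodes, arcs):
    """Map every node in V to the list of its parents according to E."""
    parents = {v: [] for v in nodes}
    for parent, child in arcs:
        parents[child].append(parent)
    return parents


def is_dag(nodes, arcs):
    """Return True if the directed graph admits a topological order."""
    try:
        tuple(TopologicalSorter(parents_of(nodes, arcs)).static_order())
    except CycleError:
        return False
    return True


def joint_probability(assignment, nodes, arcs, cpts):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    probability = 1.0
    for node, parents in parents_of(nodes, arcs).items():
        parent_values = tuple(assignment[p] for p in parents)
        probability *= cpts[node][parent_values][assignment[node]]
    return probability


if __name__ == "__main__":
    print("acyclic:", is_dag(NODES, ARCS))
    print("acyclic with extra arc B -> A:", is_dag(NODES, ARCS + [("B", "A")]))
    # The factorized joint should sum to one over all 2**3 assignments.
    total = sum(
        joint_probability(dict(zip(NODES, values)), NODES, ARCS, CPTS)
        for values in itertools.product([0, 1], repeat=len(NODES))
    )
    print("total probability mass:", round(total, 10))
```

Storing the structure as a parent map keeps the acyclicity check and the factorization in step with each other, since both read the same arcs.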
ECTA 2016 - 8th International Conference on Evolutionary Computation Theory and Applications