where λ ∈ [0,1] weights the R and the C contributions:
the R term quantifies the relevance of the subset
{X_S, x_k} and the C term quantifies the causal role of
an input x_k with respect to the set of selected variables
x_i ∈ X_S.
The proposed RC algorithm is then a forward se-
lection algorithm which sequentially adds variables
according to the update rule (14). Note that for λ = 0
the algorithm boils down to a conventional forward
selection wrapper which assesses the subsets accord-
ing to the measure R. The RC algorithm is initialized
by selecting the pair of variables {x_i, x_j} maximizing
the quantity

(1 − λ) R({x_i, x_j}; y) + (λ/d) ∑_{x_i ∈ X_S} C(x_i; x_j; y)
In the implementation used in the experimental
section, we adopt a linear leave-one-out measure to
quantify the relevance of a subset, i.e. R(X, y) is set
equal to the negative of the linear leave-one-out mean
squared error of the regression with input X and target y.
In addition, to keep the R and C terms on a comparable
scale, at each step both quantities are normalized to the
interval [0,1] before computing their weighted sum.
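To make the procedure concrete, the following sketch
reproduces the main loop under two stated assumptions: the
per-step criterion is taken to have the same form as the
initialization formula above (the actual update rule (14) is
defined earlier in the paper), and the causal measure C is
supplied by the caller as a callable. The linear leave-one-out
relevance uses the standard PRESS identity, by which the
leave-one-out residual of least squares equals the in-sample
residual divided by 1 − h_ii.

import itertools
import numpy as np

def linear_loo_relevance(X, y, subset):
    """R(X_S; y): negative linear leave-one-out MSE (PRESS shortcut).

    For least squares the leave-one-out residual equals
    e_i / (1 - h_ii), where h_ii is the i-th diagonal entry of the
    hat matrix H = Z (Z'Z)^+ Z'.
    """
    Z = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    H = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))
    return -np.mean(loo_resid ** 2)            # higher = more relevant

def _unit_scale(v):
    """Min-max normalization of a score vector to [0, 1]."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def rc_forward_selection(X, y, C, lam=0.5, d=1.0, n_select=10):
    """Greedy RC selection (illustrative sketch, not the reference code).

    C(i, k) is assumed to return the causal score C(x_i; x_k; y);
    d is the normalizing constant appearing in the paper's criterion.
    """
    n_vars = X.shape[1]
    R = lambda s: linear_loo_relevance(X, y, s)
    # Initialization: the pair {x_i, x_j} maximizing the combined score.
    # (Exhaustive pair enumeration is for clarity, not efficiency.)
    pairs = list(itertools.combinations(range(n_vars), 2))
    r = _unit_scale(np.array([R(p) for p in pairs]))
    c = _unit_scale(np.array([C(i, j) / d for i, j in pairs]))
    selected = list(pairs[int(np.argmax((1 - lam) * r + lam * c))])
    # Forward steps: R and C are rescaled to [0, 1] across the candidates
    # at each step before taking their weighted sum.
    while len(selected) < n_select:
        cand = [k for k in range(n_vars) if k not in selected]
        r = _unit_scale(np.array([R(selected + [k]) for k in cand]))
        c = _unit_scale(np.array([sum(C(i, k) for i in selected) / d
                                  for k in cand]))
        selected.append(cand[int(np.argmax((1 - lam) * r + lam * c))])
    return selected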
6 EXPERIMENTS
In this section we assess the efficacy of the RC algorithm
by performing a set of causal network inference
experiments. The aim of these experiments is to reverse
engineer both linear and nonlinear scale-free causal
networks, i.e. networks where the degree distribution
follows a power law, from a limited amount
of observational data. We consider a set of networks
with a large number of nodes (n = 5000) and where
the exponent α of the power law ranges between 2.1
and 3. The inference is done on the basis of a small
number of observations (N = 200). The structural co-
efficients of the linear dependencies have an absolute
value distributed uniformly between 0.5 and 0.8, and
the measurement error follows a standard Normal dis-
tribution. Nonlinear networks are obtained by trans-
forming the linear dependencies between nodes with
a sigmoid function.
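Since the generation procedure is only summarized here, the
sketch below shows one possible simulator consistent with the
stated setup; the degree sampling and edge-wiring scheme
(inverse-CDF sampling of a power law, edges oriented toward
higher node indices to guarantee acyclicity) are our own
assumptions, not the paper's exact construction.

import numpy as np

def simulate_network_data(n_nodes=5000, alpha=2.5, N=200,
                          nonlinear=False, seed=0):
    """Sample N observations from a random scale-free causal network.

    Hypothetical generator: power-law degrees with exponent alpha,
    structural coefficients with |b| ~ U(0.5, 0.8), standard normal
    noise, and an optional sigmoid applied to the parent contribution.
    """
    rng = np.random.default_rng(seed)
    # Inverse-CDF sampling of a power law P(k) ∝ k^(-alpha), k >= 1.
    degrees = np.floor(
        rng.random(n_nodes) ** (-1.0 / (alpha - 1))).astype(int)
    parents = [[] for _ in range(n_nodes)]
    for j in range(n_nodes - 1):
        # Orient edges toward higher indices so the network is acyclic.
        k = min(int(degrees[j]), n_nodes - j - 1)
        if k > 0:
            for t in rng.choice(np.arange(j + 1, n_nodes), size=k,
                                replace=False):
                parents[t].append(j)
    data = np.zeros((N, n_nodes))
    for j in range(n_nodes):  # nodes are already in topological order
        contrib = np.zeros(N)
        for p in parents[j]:
            b = rng.uniform(0.5, 0.8) * rng.choice([-1.0, 1.0])
            contrib += b * data[:, p]
        if nonlinear and parents[j]:
            contrib = 1.0 / (1.0 + np.exp(-contrib))   # sigmoid transform
        data[:, j] = contrib + rng.standard_normal(N)  # std normal noise
    return data, parents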
We compare the accuracy of several algorithms in
terms of the mean F-measure (the higher, the better)
averaged over 10 runs and over all the nodes with a
number of parents and children larger than or equal to two.
The F-measure, also known as the balanced F-score, is
the harmonic mean of precision and recall
and is conventionally used to provide a compact mea-
sure of the quality of a network inference algorithm.
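Concretely, with precision P = |inferred ∩ true| / |inferred|
and recall R = |inferred ∩ true| / |true| computed on a node's
inferred versus true neighbour set, the balanced F-score is
F = 2PR / (P + R); a minimal helper, assuming sets of node
indices:

def f_measure(inferred, true):
    """Balanced F-score between an inferred and a true set of nodes."""
    inferred, true = set(inferred), set(true)
    tp = len(inferred & true)       # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(inferred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)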
We considered the following algorithms for comparison:
the IAMB algorithm (Tsamardinos et al., 2003), as
implemented in the Causal Explorer software (Aliferis
et al., 2003), which estimates for a given variable
the set of variables belonging to its Markov blanket;
the mIMR algorithm (Bontempi and Meyer, 2010);
the mRMR algorithm (Peng et al., 2005); and three
versions of the RC algorithm with the values
λ = 0, 0.5, 1. Note that the RC algorithm with
λ = 0 boils down to a conventional wrapper algo-
rithm based on the leave-one-out assessment of the
variables’ subsets.
We also remark that the RC algorithm aims to
return, for a given node, a prioritization of the other
nodes according to their causal role, while the Causal
Explorer implementation of IAMB returns a specific
subset (for a given p-value). For the sake of compari-
son, we decided to compute the F-measure by setting
the number of putative causes to the number of vari-
ables returned by IAMB.
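Under this protocol, if IAMB returns m variables for a node,
the top m entries of each ranking are scored against the true
causes; a sketch of this step (names are ours, reusing the
f_measure helper above):

def score_ranking(ranking, iamb_set, true_causes):
    """F-measure of a ranking truncated at the size of IAMB's output."""
    m = len(iamb_set)
    return f_measure(ranking[:m], true_causes)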
Tables 1 and 2 report the average F-measures for
different values of α in the linear and nonlinear case,
respectively.
The results show the potential of the criterion C
and of the RC algorithm in network inference tasks
where dependencies between parents are frequent be-
cause of direct links or common ancestors. According
to the F-measures reported in the tables, the accuracy
of RC with λ = 0.5 and λ = 1 is consistently better than
that of the mIMR, mRMR and IAMB algorithms for
all the considered degree distributions. However the
most striking result is the clear improvement, with
respect to a conventional wrapper approach targeting
only prediction accuracy (λ = 0), obtained when a causal
criterion C is taken into account together with a
predictive one (λ = 0.5). These results confirm previous
findings (Bontempi and Meyer, 2010; Bontempi et al.,
2011) showing that an effective causal inference
procedure should combine a relevance criterion
targeting prediction accuracy with a causal term able
to prioritize direct causes and penalize effects.
7 CONCLUSIONS
Causal inference from complex, high-dimensional
data is of growing importance in machine learning
and knowledge discovery. Currently, most existing
algorithms are limited by the fact that the discovery
of causal directionality depends on the detection of a
limited set of distinguishable patterns, such as
unshielded colliders. However, the scarcity of data and
the intricacy of dependencies in networks could make
the detection of such patterns so rare that the resulting