A New Physarum Learner for Network Structure Learning from Biomedical Data

T. Schön¹,², M. Stetter², A. M. Tomé³,¹ and E. W. Lang¹

¹ CIML Group, Biophysics, University of Regensburg, Regensburg, Germany
² University of Applied Science Weihenstephan-Triesdorf, Freising, Germany
³ IEETA, DEETI, University of Aveiro, Aveiro, Portugal

Keywords: Bayesian Network, Structure Learning, Physarum Solver, LAGD Hill Climber.

Abstract: A novel structure learning algorithm for Bayesian Networks based on a Physarum Learner is presented. The length of the connections within an initially fully connected Physarum-Maze is taken as the inverse Pearson correlation coefficient between the connected nodes. The Physarum Learner then estimates the shortest indirect paths between each pair of nodes. In each iteration, a score of the surviving edges is incremented. Finally, the highest scored connections are combined to form a Bayesian Network. The novel Physarum Learner method is evaluated with different configurations and compared to the LAGD Hill Climber, showing comparable performance with respect to quality of training results and increased time efficiency for large data sets.

1 INTRODUCTION

Nakagaki et al. (2000) showed that the slime mould Physarum polycephalum can find the shortest path between two points in a maze. In the following years, research efforts focused on examining the detailed strategy used by Physarum to understand the adaptive dynamics of its transport network (Nakagaki et al., 2001; Nakagaki, 2001; Nakagaki et al., 2004). Tero et al. (2006, 2007) proposed a physical model for the underlying transport network based on hydrodynamics and showed that their model (Physarum Solver) can solve the shortest path problem of the maze introduced by Nakagaki et al. (2000) in a manner similar to the biological slime mould. A detailed mathematical analysis and studies of the convergence properties of the Physarum Solver are given in (Tomoyuki and Isamu, 2008; Ito et al., 2011; Brummitt et al., 2010). Apart from path finding problems, new applications for the Physarum Solver are now being investigated. In this paper, we present a novel approach where the strategy used by the Physarum Solver is adapted to a well-known NP-hard problem, namely learning Bayesian Network structure from data.

2 DATA SETS

Three real binary data sets are used for evaluation: Asia (8 nodes, 8 arcs, average degree of 2) (Lauritzen and Spiegelhalter, 1988), and Cancer and Earthquake (5 nodes, 4 arcs, average degree of 1.6 each) (Korb and Nicholson, 2010). Furthermore, a set of 15 binary networks with different characteristics has been created randomly using the Bayesian Network sampler implemented in WEKA. The networks are denoted as nXpYaZ, where X is replaced by the number of nodes, Y by the number of allowed parents and Z by the number of arcs. The network configurations can be seen in Table 1. For each network, WEKA was used to randomly sample a data set with 1000 instances.

3 METHODS

Tero et al. (2007) introduced a graphical model for the maze created by Nakagaki et al. (2000). The maze is mapped to a graph in which a node $N$ is inserted at each branch of the maze. If the maze contains a direct way from node $N_i$ to $N_j$, a connection between the two nodes is added. We refer to this kind of graph as a Physarum-Maze.

Schön T., Stetter M., M. Tomé A. and W. Lang E.
A New Physarum Learner for Network Structure Learning from Biomedical Data.
DOI: 10.5220/0004227401510156
In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS-2013), pages 151-156
ISBN: 978-989-8565-36-5
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)

For a wide-sense stationary system, the mass flux $Q_{ij}$ through the tube segment $M_{ij}$ between two nodes $N_i$ and $N_j$ follows Fick's first law

\[ Q_{ij} = -D_{ij} \nabla p_{ij} = \frac{D_{ij}}{L_{ij}} (p_j - p_i) = P_{ij} (p_j - p_i) \tag{1} \]

where $D_{ij}$ is the conductivity, $L_{ij}$ is the length of the edge $M_{ij}$ and $P_{ij}$ its permeability. Finally, the pressures at nodes $N_i$ and $N_j$, representing free energy densities, are denoted by $p_i$ and $p_j$. By considering mass conservation following Kirchhoff's first law, the Poisson equation for the pressures results. The flux through each edge is controlled by a time-dependent conductivity $D_{ij}(t)$ following a Master equation which models a balance between a positive feedback term, causing conductivity to increase with increasing flux, and a relaxation term which occasionally causes edges to vanish from the graph.
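The interplay of Eq. (1), Kirchhoff's law and the conductivity adaptation can be sketched numerically. The following is a minimal sketch and not the implementation used in this work. It assumes the adaptation rule $dD_{ij}/dt = |Q_{ij}|^{\mu} - D_{ij}$, a common form of the Tero model that the text above describes only qualitatively, together with a unit source flux, explicit Euler steps and a small conductivity floor for numerical stability. The function name `physarum_solver` and all default values are illustrative.

```python
import numpy as np

def physarum_solver(lengths, source, sink, steps=500, dt=0.1, mu=1.0,
                    d_floor=1e-6, seed=0):
    """Adapt conductivities on an undirected graph (illustrative sketch).

    lengths: symmetric (n, n) float array with np.inf where no edge exists.
    Returns the symmetric conductivity matrix D after `steps` Euler updates.
    """
    rng = np.random.default_rng(seed)
    n = lengths.shape[0]
    edge = np.isfinite(lengths) & ~np.eye(n, dtype=bool)
    safe_len = np.where(edge, lengths, 1.0)
    # Random symmetric start values in [0.5, 1.0] on existing edges only.
    D = np.triu(rng.uniform(0.5, 1.0, size=(n, n)), k=1)
    D = (D + D.T) * edge
    free = np.flatnonzero(np.arange(n) != sink)   # sink pressure is pinned to 0
    rhs = np.zeros(n)
    rhs[source] = 1.0                             # unit flux injected at the source
    for _ in range(steps):
        P = D / safe_len                          # permeability P_ij = D_ij / L_ij
        lap = np.diag(P.sum(axis=1)) - P          # Kirchhoff (graph Laplacian) system
        p = np.zeros(n)
        p[free] = np.linalg.solve(lap[np.ix_(free, free)], rhs[free])
        Q = P * (p[None, :] - p[:, None])         # Eq. (1): Q_ij = P_ij (p_j - p_i)
        D = D + dt * (np.abs(Q) ** mu - D)        # assumed feedback minus relaxation
        D = np.where(edge, np.maximum(D, d_floor), 0.0)
    return D
```

On a toy graph with two routes from source to sink, only the magnitude of the flux enters the update, so the sign convention of Eq. (1) does not influence which route survives.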

3.1 Building a Physarum-Maze from Sampling Data

A Physarum-Maze is generated by adding, for each attribute in the data set, a node to the network, and adding an undirected connection between each pair of nodes, so that the maze is fully connected. The conductivity $D_{ij}$ of each connection $M_{ij}$ is initialized randomly in the range $D_{\min} \le D_{ij} \le D_{\max}$. Estimating the length $L_{ij}$ of the connection between two nodes is the key task of learning a Bayesian Network structure with the Physarum Solver. Here we propose that the length of the connection represents the level of independence of two adjacent nodes in the network. The nodes are taken to be nominal or ordinal, and the connections in the Physarum Solver are considered undirected. Node independence is approximated by a correlation coefficient $\rho_{ij}$ between the connected nodes. Therefore, a symmetric correlation coefficient for nominal data is needed. As the main goal of this study is to examine whether it is generally possible to learn Bayesian Network structure with the Physarum Solver, for the sake of simplicity we restricted the benchmark data sets to be binary. This restriction allows the use of the simple normalized correlation coefficient $\phi$ given by (Davenport and El-Sanhurry, 1991)

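\[ \phi = \frac{BC - AD}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} \tag{2} \]

where the normalized value $-1 \le \phi_{\mathrm{norm}} = \phi / \phi_{\max} \le 1$ is calculated with

\[ \phi_{\max} = \begin{cases} \dfrac{\sqrt{(A+C)(C+D)}}{\sqrt{(B+D)(A+B)}}, & \text{if } B > C \\[2ex] \dfrac{\sqrt{(B+D)(A+B)}}{\sqrt{(A+C)(C+D)}}, & \text{else.} \end{cases} \tag{3} \]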

The coefficient $\phi$ uses a two-dimensional contingency table of the binary variables $X$ and $Y$, see Table 1.

Table 1: Contingency table.

X \ Y   y1    y0    Total
x1      A     B     A+B
x0      C     D     C+D
Total   A+C   B+D   N
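Eqs. (2) and (3) can be evaluated directly from the four cell counts of Table 1. The sketch below follows the formulas exactly as printed, including the sign of the numerator. The function name `phi_norm` and the handling of a constant variable (returning 0) are assumptions made for illustration.

```python
import math

def phi_norm(a, b, c, d):
    """Normalized phi for a 2x2 table laid out as in Table 1.

    a = #(x1, y1), b = #(x1, y0), c = #(x0, y1), d = #(x0, y0).
    """
    row1, row0 = a + b, c + d          # marginals of X
    col1, col0 = a + c, b + d          # marginals of Y
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        # Assumption: a constant variable carries no measurable association.
        return 0.0
    phi = (b * c - a * d) / denom      # Eq. (2) as printed
    if b > c:                          # Eq. (3), first case
        phi_max = math.sqrt(col1 * row0) / math.sqrt(col0 * row1)
    else:                              # Eq. (3), second case
        phi_max = math.sqrt(col0 * row1) / math.sqrt(col1 * row0)
    return phi / phi_max
```

For example, `phi_norm(40, 10, 0, 50)` has magnitude 1 although the unnormalized coefficient has magnitude of about 0.82, because the unequal marginals prevent the unnormalized value from reaching its bound.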

The normalized correlation coefficient is interpreted in a similar way as the Pearson correlation coefficient. For each connection, the length $L_{ij}$ between nodes $N_i$ and $N_j$ is given by

\[ L_{ij} = \left( 10 \left( 1 - |\phi_{\mathrm{norm},(i,j)}| + l \right) \right)^{\gamma} \tag{4} \]

The control parameter $\gamma$ ensures that the length $L_{ij}$ of a segment grows if $\gamma > 1.0$. The magnitude of the correlation coefficient is subtracted from 1 to yield short segment lengths between highly correlated nodes. A constant $l \le 0.1$ is added to avoid connections with vanishing length $L_{ij} \to 0$ as $|\phi_{\mathrm{norm}}| \to 1.0$.
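Eq. (4) is a one-line computation. The helper name `edge_length` and its defaults are illustrative only.

```python
def edge_length(phi_norm_ij, gamma=2.0, l=0.1):
    """Segment length of Eq. (4) for a normalized correlation phi_norm_ij."""
    return (10.0 * (1.0 - abs(phi_norm_ij) + l)) ** gamma
```

With $l = 0.1$, perfectly correlated nodes obtain $L_{ij} = 1$ for every $\gamma$, whereas uncorrelated nodes obtain $L_{ij} = 11$, $121$ and $1331$ for $\gamma = 1, 2, 3$. A larger exponent therefore widens the gap between short and long segments.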

3.2 The Physarum Learner

Two additional nodes, denoted as source and sink node, are added to the Physarum-Maze; initially, they are not connected to any node in the maze. Furthermore, each Physarum connection $M_{ij}$ is assigned a score $s_{ij}$ that is initialized with $s_{ij} = 0 \;\forall\, M_{ij} \in \mathcal{M}$. The Physarum Learner iterates over all possible connections between two nodes in the maze. For each pair of nodes $N_i \leftrightarrow N_j$, the direct connection $M_{ij}$ between $N_i$ and $N_j$ is removed. The source node $N_{+1}$ is connected to $N_i$ while $N_j$ is connected to the sink node $N_{-1}$ with length $L_{+1,i} = L_{j,-1} = L_0$. The value of the constant $L_0$ can be chosen arbitrarily, because there is only one connection to the source and one connection to the sink node which, therefore, survive in any case, independent of their length. Then the Physarum Solver (Tero et al., 2008) is applied to the maze. The score of each connection that remains in the network after one epoch of the Physarum Solver is increased by 1. Next, the previously removed connection between $N_i$ and $N_j$ is reinserted. The source and sink nodes are disconnected from $N_i$ and $N_j$ and re-assigned to other, randomly chosen nodes. Finally, the conductivities $D_{ij}$ are reset before the next epoch is started. After all iterations have finished, each node has been connected to the source and to the sink node exactly $n - 1$ times, where $n$ is the number of nodes in the maze. The result of the Physarum Learner is an undirected and fully connected graph, where each connection has a score $0 \le s_{ij} \le \frac{n(n-1)}{2}$. A score $s_{ij} = 0$ indicates that the connection $M_{ij}$ has never survived in any of the Physarum Learner epochs. As the length of a connection decreases with increasing correlation between the connected nodes, a connection between highly correlated nodes is expected to survive more often than a connection between uncorrelated nodes. In other words, the Physarum Learner forces the Physarum Solver to find indirect paths that explain the correlation between two nodes, by blocking the direct path between these two nodes.
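The scoring loop can be illustrated with a compact sketch. To keep it self-contained, the Physarum Solver is replaced here by Dijkstra's algorithm as a stand-in, under the assumption that a converged solver run leaves exactly the shortest indirect path. The source and sink attachments are omitted because $L_0$ does not affect which path is shortest. All function names are hypothetical.

```python
import heapq
from itertools import combinations

def shortest_path_edges(lengths, src, dst, blocked):
    """Dijkstra on a complete graph; returns the edges of the shortest src-dst path."""
    n = len(lengths)
    dist = [float("inf")] * n
    prev = [None] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v in range(n):
            if v == u or frozenset((u, v)) == blocked:
                continue                  # skip self loops and the removed direct edge
            nd = d + lengths[u][v]
            if nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, v = [], dst
    while prev[v] is not None:            # walk back from dst to src
        path.append(frozenset((prev[v], v)))
        v = prev[v]
    return path

def physarum_learner_scores(lengths):
    """Score every connection by how often it lies on a surviving indirect path."""
    n = len(lengths)
    scores = {frozenset(e): 0 for e in combinations(range(n), 2)}
    for i, j in combinations(range(n), 2):
        direct = frozenset((i, j))        # the direct connection M_ij is removed
        for e in shortest_path_edges(lengths, i, j, direct):
            scores[e] += 1                # surviving connections gain one point
    return scores
```

For a four-node maze with short segments along the chain 0-1-2-3 and long segments elsewhere, the middle segment 1-2 collects the highest score, since most indirect paths have to pass through it.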

3.3 Generate Bayesian Network from Physarum-Maze

The basic problem in transforming the scored graph into a Bayesian Network is that the connections established by the Physarum Learner are undirected whereas a Bayesian Network requires directed edges. Hence, learning a Bayesian Network from data by the Physarum Learner is only possible if a correct ordering of the nodes is known. Otherwise, the Physarum Learner generates an undirected graphical model of the data set, also called a Markov network. Transforming a Markov network into a Bayesian Network can be done by triangulation which, however, often results in a loss of independence information (Koller and Friedman, 2009).

The graphical network is built from the Physarum Learner output in the following way: A list of connections is created and sorted by score so that the connection with the highest score is the first entry. An empty graphical model is initialized. The connections are added to the graph in the sequence of the list: the highest scored connection is added first, followed by the second highest scored connection and so on, until the score falls below a given threshold $\theta$. In this work, $\theta$ is set, for each simulated network, to the arithmetic average of the scores of all connections with a score $\neq 0$. Unconnected nodes, i.e. nodes whose connections all score below the threshold, are added afterwards according to their highest scored connection.

In the case where the ordering is known and a Bayesian Network is to be learned, the connections are already directed from $N_i$ to $N_j$. Therefore, the Physarum Learner output only contains correctly directed connections. In the case of unknown ordering, however, a connection is only added to the Bayesian Network if it does not cause a directed cycle in the graph. After the structure of the network has been established from the data, the WEKA software package is employed to learn the Bayesian Network parameters.
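The construction described in this section can be sketched as follows. The sketch assumes that every connection is oriented from the earlier to the later node of a given node order (either the known ordering or simply the attribute order of the data set). Under this assumed orientation rule no directed cycle can arise, so the cycle check is omitted. The name `build_network` and the tie-breaking by list position are illustrative choices.

```python
def build_network(scores, order):
    """Turn connection scores into directed arcs.

    scores: dict mapping frozenset({i, j}) to an integer score.
    order:  list of all nodes; arcs point from earlier to later nodes.
    """
    rank = {node: k for k, node in enumerate(order)}
    nonzero = [s for s in scores.values() if s != 0]
    # Threshold theta: arithmetic mean of all non-zero scores.
    theta = sum(nonzero) / len(nonzero) if nonzero else 0.0
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    def orient(edge):
        u, v = sorted(edge, key=rank.__getitem__)
        return (u, v)

    arcs, connected = [], set()
    for edge, s in ranked:
        if s < theta:
            break                          # remaining scores are below theta
        arcs.append(orient(edge))
        connected |= edge
    for node in order:
        if node in connected:
            continue
        # Attach an unconnected node via its highest scored connection.
        best = next(e for e, _ in ranked if node in e)
        arcs.append(orient(best))
        connected |= best
    return arcs
```

With scores 5, 3, 2, 2, 1 and 0, the threshold is 13/5 = 2.6, so only the two highest scored connections pass and any node left without a connection is attached in the second loop.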

4 EXPERIMENTS

4.1 Configurations of the Physarum Learner

Structure learning results for 18 different test networks are quantified with the Bayesian score BDeu and the information theoretic score MDL, which is equivalent to the Bayesian information criterion (BIC) score. Further, we compare the learnt structures with the known original structures (the ones we sampled the test data from) by measuring the number of learnt (A), extra (E), missing (M) and reversed (R) arcs. The conductivities $D_{ij}$ of the connections have been initialized randomly in the range $D_{\min} = 0.5 \le D_{ij} \le D_{\max} = 1.0$. The constant $l$ in Eq. (4) was fixed at $l = 0.1$.

The Physarum Learner has been applied to all 18 networks employing different values for $\gamma$. The following simulations have been performed: Each of the 18 networks has been learnt three times, setting $\gamma = 1.0, 2.0, 3.0$. In these simulations, the best scored network has been learnt 3 times with $\gamma = 1.0$, 8 times with $\gamma = 2.0$ and 8 times with $\gamma = 3.0$. The network structure closest to the original one has been learnt 3 times with $\gamma = 1.0$, 9 times with $\gamma = 2.0$ and 8 times with $\gamma = 3.0$. The results corroborate that the value of $\gamma$ clearly influences the quality of the results. It turned out that a larger exponent $\gamma$ increases the deviations of the length values. This results in the selection of paths over a larger number of nodes in the Physarum Solver. Additional results, not presented in this paper, show that for small networks the performance is better for $\gamma = 2.0$, but for larger networks (in terms of the number of arcs) $\gamma = 3.0$ leads to better results. Preliminary tests also showed a clear advantage in learning performance if the positive feedback term of the master equation was modelled as $f(Q) = |Q|^{\mu}$.

Next, the parameter $\mu$ was set to $\mu = 0.8, 1.0, 1.2, 1.4, 1.8$ while again learning each of the 18 networks. As the results for $\mu \neq 1.0$ depend on the randomly initialized values of the conductivities $D_{ij}$, the results presented have been averaged over 10 runs. A comparison indicating which value of $\mu$ performed best is given in Table 2. This table displays the number of cases out of $5 \times 18$ trials where a learned network was best in terms of structure similarity to the true network (col. 2), the BDeu score (col. 3) and the MDL score (col. 4). The most similar structure is identified as the one with the lowest sum of extra and missing arcs.
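The arc counts used for the structure comparison can be computed as in the following sketch. It reads A as the total number of learned arcs and evaluates extra and missing arcs on the undirected skeleton, which is one interpretation of the counts reported in this section. The function name `compare_arcs` is hypothetical.

```python
def compare_arcs(learned, true):
    """Count learned (A), extra (E), missing (M) and reversed (R) arcs.

    learned, true: iterables of directed arcs given as (parent, child) tuples.
    """
    learned, true = set(learned), set(true)
    learned_skel = {frozenset(a) for a in learned}   # directions dropped
    true_skel = {frozenset(a) for a in true}
    reversed_arcs = sum(1 for (u, v) in learned
                        if (v, u) in true and (u, v) not in true)
    return {
        "A": len(learned),
        "E": len(learned_skel - true_skel),
        "M": len(true_skel - learned_skel),
        "R": reversed_arcs,
    }
```

The similarity criterion above then corresponds to choosing the structure that minimizes `result["E"] + result["M"]`.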

Table 2: Counts where values of µ performed best.

µ     Structure   BDeu   MDL
0.8   4           5      6
1.0   13          11     10
1.2   9           7      7
1.4   9           5      5
1.8   6           9      9

The results clearly favour $\mu = 1.0$ in all three metrics. Note that the learnt structures for $\mu \neq 1.0$ differ in their performance, and that only average results have been compared.

4.2 Comparison of Physarum Learner and LAGD

Referring to the results presented, the Physarum Learner is compared to the LAGD Hill Climbing learning algorithm, setting $l = 0.1$, $\gamma = 2.0$, $f(Q) = |Q|$. Both learning algorithms are evaluated on the 18 benchmark data sets. Again, the numbers of learnt, extra, missing and reversed arcs are measured relative to the original reference network from which the data were sampled. Note that the Physarum Learner cannot learn reversed arcs as attributes in the data sets are already given in the correct order. Furthermore, the BDeu and the MDL scores are computed. The results indicate that LAGD Hill Climbing outperforms the Physarum Learner in all test cases with respect to the BDeu and MDL scores. This is not surprising, as LAGD optimizes BDeu and MDL during learning, while the Physarum Learner does not consider any score during learning at all. However, especially for the three real networks, the Physarum Learner was able to learn network structures with a quality comparable to LAGD. In particular, the Physarum Learner learned the exact true network for the Cancer data set, where LAGD achieves the better scores but reverses two of the four arcs in the network.

Results also indicate that the Physarum Learner tends to learn fewer arcs for networks with a higher number of arcs. This is due to the fact that the threshold $\theta$ is simply the average of all non-vanishing connection scores. The algorithm has no ability to fit the threshold to the requirements of the dataset. A detailed comparison of thresholds is needed but is subject to future work.

Removing the correct ordering from the datasets and learning the networks again under the same configurations yields the results for the three real datasets shown in Table 3.

Table 3: Physarum Learner results for reordered real datasets.

Dataset      A   E   M   R   BDeu    MDL
Asia         9   3   2   2   -2475   -2498
Cancer       4   0   0   3   -2244   -2251
Earthquake   5   1   0   2   -542    -558

The results show a slight decrease of the scores compared to the ordered measurements due to some reversed edges. As expected, the Physarum Learner selected exactly the same arcs as in the ordered version, because the direction of the arcs is not considered within the learner. Therefore, the number of extra and missing arcs is the same as for the ordered datasets, but some arcs are reversed. When the directions are neglected and no ordering is provided, the result of the Physarum Learner can be seen as a Markov Network instead of a Bayesian Network. To validate the learning performance of the Physarum Learner for Markov Networks, we transformed the three real Bayesian Networks to Markov Networks using moralization (Koller and Friedman, 2009). The results are given in Table 4.

Table 4: Physarum Learner networks compared to moralized true networks.

Dataset      Mor. Arcs   A   E   M
Asia         10          9   4   3
Cancer       5           4   0   1
Earthquake   5           5   1   1

The column Mor. Arcs displays the number of arcs in the moralized network. For Asia, the learned network is closer to the Bayesian Network version (3 extra and 2 missing arcs, see Table 3) than to the Markov Network (4 extra and 3 missing arcs). The same behaviour can be seen for Cancer, where the Physarum Learner reconstructed the true network skeleton in Table 3 but misses the extra arc created during moralization in Table 4. For Earthquake, one additional arc is learned, but not the correct one.
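Moralization connects all parents of a common child and then drops the arc directions. A minimal sketch is given below. The function name `moralize` is illustrative, and the Cancer structure used in the usage note (Pollution and Smoker as parents of Cancer, which in turn is the parent of XRay and Dyspnoea) is the commonly cited form of that benchmark and is stated here as an assumption.

```python
from itertools import combinations

def moralize(arcs):
    """Return the undirected edge set of the moral graph of a DAG.

    arcs: iterable of (parent, child) tuples.
    """
    arcs = list(arcs)
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, set()).add(u)
    edges = {frozenset(a) for a in arcs}               # drop directions
    for pa in parents.values():
        # Marry every pair of parents that share a child.
        edges |= {frozenset(pair) for pair in combinations(sorted(pa), 2)}
    return edges
```

Applied to the assumed Cancer structure with its four arcs, `moralize` adds the single edge between Pollution and Smoker and yields five undirected edges, matching the Mor. Arcs entry for Cancer in Table 4.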

Finally, the computational load for LAGD and the Physarum Learner has been tested for sample sizes of 1,000, 10,000, 100,000, 1,000,000 and 10,000,000. It turned out that, although the complexity of the Physarum Learner grows with the number of nodes as $n^2$, it is three times more efficient than LAGD. This is because the Physarum Learner needs to deal with the data set only once, when calculating the $\frac{n(n-1)}{2}$ phi correlation coefficients.
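The single pass over the data can be made explicit: for binary data, the four cell counts of Table 1 for all pairs of attributes follow from four matrix products. This is a sketch of one possible realization, not necessarily how the counts were obtained in this work. The function name `pairwise_counts` is hypothetical.

```python
import numpy as np

def pairwise_counts(X):
    """Cell counts of Table 1 for every attribute pair of a 0/1 data matrix.

    X: array-like of shape (m, n) with m instances and n binary attributes.
    Entry [i, j] of each result refers to X = attribute i, Y = attribute j.
    """
    X = np.asarray(X, dtype=np.int64)
    Xc = 1 - X                      # complement: 1 where the attribute is 0
    A = X.T @ X                     # both attributes equal 1
    B = X.T @ Xc                    # attribute i is 1, attribute j is 0
    C = Xc.T @ X                    # attribute i is 0, attribute j is 1
    D = Xc.T @ Xc                   # both attributes equal 0
    return A, B, C, D
```

After these counts have been computed, the size of the data set no longer enters the learning procedure. Only the $n \times n$ count matrices are needed to evaluate Eqs. (2) to (4).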


Figure 1: NAFLD Net learned by the Physarum Learner.

4.3 Applying the Physarum Learner to an NAFLD Dataset

Finally, the Physarum Learner performance is evaluated on a medical dataset retrieved at the Medical University of Graz (MUG) from 29 patients with liver-biopsy confirmed non-alcoholic fatty liver disease (NAFLD). Patient data safety is followed throughout the entire process according to the standard operating procedures at the MUG. From the dataset, the parameters present in less than 50% of the patients and parameters with a constant value over all patients have been removed. The missing values of the remaining 32 attributes have been replaced by modes and means using the built-in ReplaceMissingValues filter of WEKA (http://www.cs.waikato.ac.nz/ml/weka/). Further, the dataset has been extended to 2900 instances by bootstrapping. As the dataset contains a mixture of real, nominal and ordinal attributes, and the Physarum Learner requires binary attributes, all attributes have been transformed to binary attributes.

The Physarum Learner is applied with configurations identical to the ones described in Section 4.2. The learned network shown in Fig. 1 has been evaluated by medical experts of the MUG. As the Physarum Learner can only learn undirected graphs, and hence is insensitive to the directions of edges, all interactions are considered undirected. All 42 connections displayed in Fig. 1 have been classified as comprehensible in a medical context. However, 35 possible connections, indicated by general medical knowledge, have been identified as missing, for example a connection between Magnesium (Mg) and HDL Cholesterol. From a medical point of view, the NAFLD graph generated by the Physarum Learner correctly reflects current medical knowledge and shows great potential for future applications and further refinement. The behaviour of the Physarum Learner is in line with previous experience showing that it tends to learn the stronger, hence more important, connections of the network.

5 CONCLUSIONS

A novel structure learning algorithm for Bayesian Networks has been introduced that initializes a fully connected network and uses the Physarum Solver to delete edges between nodes which are below a given correlation threshold. The Physarum Learner was compared to the LAGD Hill Climber: it learned competitive network structures for the three real networks, but performed worse than LAGD with respect to the BDeu and MDL scores. It was shown that it is generally possible to learn the structure of a Bayesian Network with the Physarum Solver. The Physarum Learner shows strongly growing execution time for networks with many nodes. This is in part because the algorithm has not yet been optimized for speed and memory usage. However, for networks with a small number of nodes and a large number of data points, the Physarum Learner is clearly more time efficient than LAGD and probably other score-based heuristic methods.

ACKNOWLEDGEMENTS

This work was supported by BioPersMed (COMET K-project 825329), which is funded by the Federal Ministry of Transport, Innovation and Technology (BMVIT) and the Federal Ministry of Economics and Labour/the Federal Ministry of Economy, Family and Youth (BMWA/BMWFJ) and the Styrian Business Promotion Agency (SFG). Valuable discussions with E. Wichro, N. Lanner, R. Charchoghlyan and K. Sargsyan are much appreciated.

REFERENCES

Brummitt, C., Laureyns, I., Lin, T., Martin, D., Parry, D., Timmers, D., Volfson, A., Yang, T., Yaple, H., and Rossi, M. L. (2010). A Mathematical Study of Physarum polycephalum. The Tero Model.

Davenport, E. C. and El-Sanhurry, N. A. (1991). Phi/Phimax: Review and Synthesis. Educational and Psychological Measurement, 51(4):821–828.

Ito, K., Johansson, A., Nakagaki, T., and Tero, A. (2011). Convergence Properties for the Physarum Solver. arXiv:1101.5249v1.

Koller, D. and Friedman, N. (2009). Probabilistic graphical models: principles and techniques. The MIT Press.

Korb, K. and Nicholson, A. (2010). Bayesian Artificial Intelligence. Chapman and Hall, 2nd edition.

Lauritzen, S. L. and Spiegelhalter, D. J. (1988). Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 50(2):157–224.

Nakagaki, T. (2001). Smart behavior of true slime mold in a labyrinth. Research in Microbiology, 152(9):767–770.

Nakagaki, T., Yamada, H., and Hara, M. (2004). Smart network solutions in an amoeboid organism. Biophysical Chemistry, 107(1):1–5.

Nakagaki, T., Yamada, H., and Toth, A. (2000). Intelligence: Maze-solving by an amoeboid organism. Nature, 407(6803):470.

Nakagaki, T., Yamada, H., and Tóth, Á. (2001). Path finding by tube morphogenesis in an amoeboid organism. Biophysical Chemistry, 92(1–2):47–52.

Tero, A., Kobayashi, R., and Nakagaki, T. (2006). Physarum solver: A biologically inspired method of road-network navigation. Physica A, 363(1):115–119.

Tero, A., Kobayashi, R., and Nakagaki, T. (2007). A mathematical model for adaptive transport network in path finding by true slime mold. J. Theo. Biol., 244(4):553–564.

Tero, A., Yumiki, K., Kobayashi, R., Saigusa, T., and Nakagaki, T. (2008). Flow-network adaptation in Physarum amoebae. Theory in Biosciences, 127(2):89–94.

Tomoyuki, M. and Isamu, O. (2008). Physarum can solve the shortest path problem on riemannian surface mathematically rigorously. Int. J. Pure and Appl. Math., 47(3):353–369.
