STRUCTURAL MOTIF ENUMERATION IN TRANSCRIPTIONAL REGULATION NETWORKS

Claire Luciano

Department of Computer Science, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, U.S.A.

Chun-Hsi Huang

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, U.S.A.

Keywords: Network motif, Transcriptional regulation network, Sampling.

Abstract: Network motifs are small connected subnetworks within a larger network that occur in statistically significant quantities and may indicate functional regions of the network. Network motif software tools employ algorithms that compare a network to randomly generated networks in order to identify subnetworks that occur at frequencies higher than would be expected by random chance. The transcriptional regulation network of E. coli has been represented as a graph and evaluated using both full enumeration and an edge sampling algorithm. Several significant network motifs were identified, including feedforward loops and bipartite graphs. This paper applies both full enumeration and a different sampling algorithm, randomized enumeration, to the E. coli network using the newer software tool FANMOD. Evaluating the E. coli transcriptional regulation network with FANMOD also identified feedforward loops and bipartite graphs as significant network motifs. Sampling identified fewer and less significant motifs than full enumeration; however, sampling enables the evaluation of larger subgraph sizes.

1 INTRODUCTION

Graph theory provides a useful mathematical model for representing many systems as graphs composed of vertices, or nodes, and edges that connect pairs of vertices. Network motifs are small connected subnetworks within a larger network that occur at higher frequencies than would be expected in random networks (Kashtan et al., 2004; Schreiber and Schwobbermeyer, 2005; Wernicke and Rasche, 2006). Network motifs are the building blocks of networks, providing information about the behavior or design of the network; they may identify functional regions within biological systems. A range of network motif detection software tools has been developed to analyze network systems and to identify network motifs. Systems ranging from social networks to biological systems have been represented as graphs and analyzed for network motifs using these tools. However, there is still room for improvement in network motif detection software. Since the most common network motifs in biological systems differ from those found in other systems, it may be useful to optimize software tools specifically for network motif detection in biological systems. This could improve both the quality of the results and performance.

1.1 Network Motifs and E. coli

The transcriptional regulation networks of E. coli and Saccharomyces cerevisiae have been evaluated with network motif detection software tools by the Alon and Lee groups, respectively (Shen-Orr et al., 2002; Lee et al., 2002). There are three significant motif patterns in the transcriptional regulation network of E. coli. Each of these network motifs becomes apparent when subnetworks of a particular size in the E. coli network are compared with subnetworks of the same size in randomly generated networks.

Significant motif patterns in E. coli depend on the number of nodes in the subgraphs being evaluated. For subgraphs of only three nodes, the only significant network motif found by the Alon team is the feedforward loop. The feedforward loop is characterized by three nodes, X, Y and Z: a transcription factor X regulates a second transcription factor Y, and X and Y jointly regulate the operon Z. Alon terms X the ‘general transcription factor’, Y the ‘specific transcription factor’ and Z the ‘effector operon’.

Luciano C. and Huang C. (2010).
STRUCTURAL MOTIF ENUMERATION IN TRANSCRIPTIONAL REGULATION NETWORKS.
In Proceedings of the First International Conference on Bioinformatics, pages 187-192
DOI: 10.5220/0002760001870192
SciTePress

Feedforward loops are also significant network motifs when subgraphs of more than three nodes are evaluated; in this case, transcription factors X and Y jointly regulate one or more operons Z(1)…Z(n). The feedforward loop has other significant characteristics, the most important of which is coherence. Shen-Orr describes coherence as follows: “A feedforward loop motif is ‘coherent’ if the direct effect of the general transcription factor on the effector operons has the same sign (negative or positive) as its net indirect effect through the specific transcription factor. For example, if X and Y both positively regulate Z, and X positively regulates Y, the feedforward loop is coherent. If, on the other hand, X represses Y, then the motif is incoherent” (Shen-Orr, Milo, Mangan and Alon, 2002). Most feedforward loops in E. coli (85%) are coherent. The feedforward loop occurs much more often within the E. coli transcriptional regulation network than would be expected by random chance. Shen-Orr suggests that the coherent feedforward loop has a significant functional structure: it can act as a circuit that rejects transient activation signals from the general transcription factor and responds only to persistent signals, while at the same time allowing rapid system shutdown through the control of the general transcription factor. This structure is a useful way to coordinate a rapid response to an external signal. Also, the abundance of coherent feedforward loops over incoherent loops suggests a functional design. Lee's research on the transcriptional regulatory network of Saccharomyces cerevisiae suggests that the feedforward loop is also a significant network motif in that system (Lee et al., 2002). This suggests that the feedforward loop may also be significant in other biological networks.
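The feedforward loop and the coherence rule quoted above can be checked mechanically. The following Python sketch is a minimal illustration, not the method used by Shen-Orr or by FANMOD. It assumes a signed directed graph stored as a dictionary that maps (regulator, target) pairs to +1 (activation) or -1 (repression); the node names in the example are hypothetical.

```python
def feedforward_loops(edges):
    """Return every feedforward loop (x, y, z) in a signed directed graph.

    Assumed representation: `edges` maps (regulator, target) -> +1
    (activation) or -1 (repression). Self-loops (autoregulation) are
    ignored when searching for loops.
    """
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # X -> Y and Y -> Z exist; an FFL also needs the direct edge X -> Z
                if z != x and z in x_targets:
                    loops.append((x, y, z))
    return loops


def is_coherent(edges, x, y, z):
    """Coherent if the direct sign X -> Z equals the net sign of X -> Y -> Z."""
    return edges[(x, z)] == edges[(x, y)] * edges[(y, z)]


# Hypothetical toy network: X -> Y -> Z is coherent, X -> Y -| W is incoherent.
toy = {("X", "Y"): +1, ("X", "Z"): +1, ("Y", "Z"): +1,
       ("X", "W"): +1, ("Y", "W"): -1, ("X", "X"): -1}
for x, y, z in sorted(feedforward_loops(toy)):
    print(x, y, z, "coherent" if is_coherent(toy, x, y, z) else "incoherent")
```

The sign test is a direct transcription of the quoted definition: the indirect effect through Y is the product of the two edge signs.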

Additional network motifs emerge as subgraphs with increasing numbers of nodes are evaluated. When subgraphs of four nodes are evaluated, the overlapping regulation motif becomes apparent. In the overlapping regulation motif, two operons are regulated by the same two transcription factors. This motif is a small, specific form of the dense overlapping regulons (DORs) discussed below.
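A rough way to count the overlapping regulation pattern is to tally how many targets each pair of transcription factors shares. This is also the DOR statistic of gene pairs regulated by the same two transcription factors. The sketch below assumes an unsigned list of (transcription factor, target) edges and treats every regulated node as an operon. It counts non-induced occurrences, so extra edges among the four nodes are not excluded. This differs from the induced-subgraph census performed by ESU-based tools.

```python
from itertools import combinations
from math import comb


def overlapping_regulation_count(edges):
    """Count patterns in which two regulators both regulate the same two targets.

    Assumption: `edges` is an iterable of (regulator, target) pairs; self-loops
    are skipped. Occurrences are non-induced: extra edges among the four nodes
    are allowed.
    """
    regulators = {}
    for tf, target in edges:
        if tf != target:
            regulators.setdefault(target, set()).add(tf)
    shared = {}  # (tf1, tf2) -> number of targets regulated by both
    for tfs in regulators.values():
        for pair in combinations(sorted(tfs), 2):
            shared[pair] = shared.get(pair, 0) + 1
    # every pair of shared targets forms one 2x2 overlapping-regulation pattern
    return sum(comb(n, 2) for n in shared.values())


# Hypothetical example: A and B co-regulate p, q and r; C only regulates p.
example = [("A", "p"), ("A", "q"), ("A", "r"),
           ("B", "p"), ("B", "q"), ("B", "r"), ("C", "p")]
print(overlapping_regulation_count(example))
```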

Other significant motif patterns within the transcriptional regulation network of E. coli can be seen when subgraphs with more nodes are evaluated. When subgraphs of more than three nodes are evaluated, the single input module (SIM) network motif becomes significant. A SIM is defined by a set of operons that are controlled by a single transcription factor. All of the operons are under control of the same sign (positive or negative), and there is no additional transcriptional regulation of the operons. The transcription factors involved in SIMs are mostly autoregulatory, and most of these autoregulatory transcription factors are autorepressive. Autoregulation is more common within SIMs than in the overall system: in the E. coli transcriptional regulation network, 70% of the transcription factors involved in SIMs are autoregulatory, compared to 50% in the overall dataset. SIMs are found in systems of genes that function stoichiometrically to form a protein assembly (e.g. flagella) or a metabolic pathway (e.g. amino acid biosynthesis). SIM systems may involve temporal ordering, where the first gene activated is the last to be deactivated.
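The SIM definition above has three conditions: a single regulator, a single sign, and no other transcriptional input to the operons. These conditions translate into a simple filter. This sketch assumes a signed directed graph stored as a dictionary that maps (regulator, target) pairs to +1 or -1, and treats every regulated node as an operon. It uses a hypothetical minimum module size of two. Autoregulation of an operon is not counted as an additional input.

```python
def single_input_modules(edges, min_size=2):
    """Return {regulator: sorted targets} for candidate single input modules.

    Assumption: `edges` maps (regulator, target) -> +1 or -1. A target
    qualifies only if it has exactly one regulator other than itself; a
    regulator qualifies if all of its exclusive targets share one sign.
    `min_size=2` is an illustrative threshold, not taken from the paper.
    """
    inputs = {}
    for (tf, target), sign in edges.items():
        if tf != target:  # autoregulation is not an additional input
            inputs.setdefault(target, []).append((tf, sign))
    modules = {}
    for target, regs in inputs.items():
        if len(regs) == 1:  # no additional transcriptional regulation
            tf, sign = regs[0]
            modules.setdefault(tf, []).append((target, sign))
    result = {}
    for tf, members in modules.items():
        signs = {sign for _, sign in members}
        if len(members) >= min_size and len(signs) == 1:
            result[tf] = sorted(target for target, _ in members)
    return result


# Hypothetical network: S is an autorepressor controlling a and b;
# c is also regulated by U, and T's targets have mixed signs.
net = {("S", "S"): -1, ("S", "a"): -1, ("S", "b"): -1, ("S", "c"): -1,
       ("U", "c"): +1, ("T", "d"): +1, ("T", "e"): -1}
print(single_input_modules(net))
```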

Dense overlapping regulons (DORs) are a type of network motif found within E. coli when larger subnetworks are evaluated. DORs are composed of layers of overlapping interactions between operons and a group of input transcription factors. These interactions are organized in a bipartite graph that is much denser than corresponding structures in randomized networks. DORs are not a homogeneous mesh of interconnections; rather, they contain several loosely connected, internally dense regions of combinatorial interactions. The regions overlap somewhat, and different criteria can yield slightly different groupings. One way to quantify DORs is by the frequency of pairs of genes regulated by the same two transcription factors. Shen-Orr uses a clustering approach to define DORs: an algorithm detects locally dense regions in the network with a high ratio of connections to transcription factors. Within the E. coli network there are six DORs, and the operons in each DOR share common biological functions. Usually, every output operon is controlled by a different combination of input transcription factors. In rare cases, however, several operons in a DOR are regulated by precisely the same combination of transcription factors with identical regulation signs (termed ‘multi-input modules’).
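The rare multi-input modules described above can be found by grouping operons on their signed set of inputs. In these modules, several operons share exactly the same regulators with identical signs. This sketch assumes a dictionary that maps (regulator, target) pairs to +1 or -1 and treats every regulated node as an operon. Requiring at least two regulators and two operons per group is our own illustrative choice.

```python
from collections import defaultdict


def multi_input_modules(edges, min_size=2):
    """Group targets regulated by exactly the same signed set of regulators.

    Assumption: `edges` maps (regulator, target) -> +1 or -1. Only groups with
    at least two regulators and at least `min_size` targets are returned.
    """
    signed_inputs = defaultdict(set)
    for (tf, target), sign in edges.items():
        if tf != target:
            signed_inputs[target].add((tf, sign))
    groups = defaultdict(list)
    for target, inputs in signed_inputs.items():
        if len(inputs) >= 2:  # targets with a single regulator are not multi-input
            groups[frozenset(inputs)].append(target)
    return [sorted(targets) for targets in groups.values()
            if len(targets) >= min_size]


# Hypothetical example: p and q share identical signed inputs; r does not.
dor = {("A", "p"): +1, ("B", "p"): +1, ("A", "q"): +1, ("B", "q"): +1,
       ("A", "r"): +1, ("B", "r"): -1}
print(multi_input_modules(dor))
```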

DORs are significant in the larger structure of biological networks; they seem to partition the operons into biologically meaningful combinatorial regulation clusters. DORs also govern how several different network motifs connect together within the larger network. Shen-Orr describes patterns in the overall structure of the E. coli network: “A single layer of DORs connects most of the transcription factors to their effector operons. Feedforward loops and SIMs often occur at the outputs of these DORs. The DORs are interconnected by the global transcription factors, which typically control many genes in one DOR and a few genes in several DORs” (Shen-Orr et al., 2002). Over 70% of the operons are connected to DORs; the remaining operons are in small disjoint systems, most of which have only one to three operons.

1.2 Motif Detection Tools

There are a number of software tools dedicated to network motif detection, and most of them employ different algorithms for this task. To find network motifs, a software tool must determine which subgraphs occur in the input network and how often. It must then determine which subgraphs are isomorphic (equivalent), and which classes of isomorphic subgraphs occur at higher rates than in random graphs. This means that random graphs must also be generated. FANMOD is a newer motif detection tool that uses a randomized enumeration sampling algorithm. FANMOD uses the NAUTY algorithm (McKay, 1981) to group isomorphic subgraphs into subgraph classes. It also supports colored graphs, a useful feature that other software tools do not support.
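The first two of the steps described above can be illustrated with a brute-force Python sketch: enumerating subgraphs and grouping isomorphic ones into classes. It is not how FANMOD works, since FANMOD enumerates with ESU and canonicalizes with NAUTY. This sketch checks every k-vertex subset and computes a canonical label by trying all vertex orderings, which is only feasible for toy graphs and small k. Connectivity is taken as weak connectivity (edge directions ignored), which is our assumption.

```python
from itertools import combinations, permutations


def canonical_label(nodes, edge_set):
    """Smallest adjacency encoding over all orderings of `nodes` (brute force).

    Two induced subgraphs get the same label exactly when they are isomorphic.
    This stands in for NAUTY and is exponential in the subgraph size.
    """
    best = None
    for order in permutations(nodes):
        code = tuple(int((u, v) in edge_set)
                     for u in order for v in order if u != v)
        if best is None or code < best:
            best = code
    return best


def weakly_connected(nodes, edge_set):
    """True if `nodes` form one component when edge directions are ignored."""
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for v in nodes:
            if v not in seen and ((u, v) in edge_set or (v, u) in edge_set):
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def subgraph_census(edges, k=3):
    """Count connected induced k-vertex subgraphs by isomorphism class."""
    edge_set = {(u, v) for u, v in edges if u != v}
    nodes = sorted({n for edge in edge_set for n in edge})
    census = {}
    for subset in combinations(nodes, k):
        if weakly_connected(subset, edge_set):
            label = canonical_label(subset, edge_set)
            census[label] = census.get(label, 0) + 1
    return census


# Hypothetical toy graph: one feedforward loop and one fan-out.
toy = [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("A", "B"), ("A", "C")]
for label, count in subgraph_census(toy).items():
    print(label, count)
```

Repeating the same census on randomized networks and comparing the counts per class gives the third step.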

Support for colored graphs is highly useful for motif detection in biological networks. Elements that should not be connected to one another, such as the vertices on one side of a bipartite system, can be assigned the same color. This is a computationally effective way to avoid generating unnecessary random graphs for comparison. FANMOD employs a randomized enumeration algorithm called RAND-ESU. It works by taking an algorithm for full enumeration (ESU) and modifying it to skip over some subgraphs randomly as the algorithm is executed.
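The idea of modifying a full enumeration to skip subgraphs at random can be made concrete. The sketch below follows the ESU enumeration and its RAND-ESU variant as described by Wernicke, to the best of our understanding. Each vertex v roots a search that only adds vertices with larger labels. Candidates are drawn from the exclusive neighborhood of the growing subgraph, so every connected k-vertex subgraph is produced exactly once. RAND-ESU explores each child at tree level d only with probability p_d. The sketch assumes an undirected adjacency view of the network (edge directions ignored) with integer vertex labels. Setting all probabilities to 1.0 gives full enumeration. It is an illustration, not FANMOD's code.

```python
import random


def rand_esu(adj, k, probs, rng=random):
    """Enumerate (all probs == 1.0) or sample connected k-vertex subgraphs.

    Sketch of ESU / RAND-ESU after Wernicke, not FANMOD's implementation.
    adj: dict vertex -> set of neighbours (undirected view, integer labels).
    probs: per-level probabilities p_1..p_k. Yields frozensets of vertices.
    """
    def extend(sub, ext, v):
        if len(sub) == k:
            yield frozenset(sub)
            return
        ext = set(ext)
        # the current subgraph plus its neighbourhood; used for exclusivity
        covered = set(sub).union(*(adj[s] for s in sub))
        while ext:
            w = ext.pop()
            # exclusive neighbours of w: label above root v, not in/next to sub
            new_ext = ext | {u for u in adj[w] if u > v and u not in covered}
            if rng.random() < probs[len(sub)]:  # RAND-ESU: keep child w.p. p_d
                yield from extend(sub | {w}, new_ext, v)

    for v in adj:
        if rng.random() < probs[0]:
            yield from extend({v}, {u for u in adj[v] if u > v}, v)


# Hypothetical star graph: centre 0 with leaves 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
full = list(rand_esu(star, 3, [1.0, 1.0, 1.0]))
sampled = list(rand_esu(star, 3, [1.0, 0.5, 0.5], random.Random(7)))
print(len(full), len(sampled))
```

Skipped children are still removed from the extension set, so the shape of the enumeration tree is unchanged and only the traversal is thinned.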

FANMOD also has the advantage of running much faster than similar programs. Other software tools include MAVISTO, which visualizes occurrences of a motif in a network using a force-directed graph layout algorithm, and MFINDER, which uses a different algorithm called edge sampling (Kashtan et al., 2004; Wernicke and Rasche, 2006). Edge sampling works by first selecting a random edge in the input graph and then randomly extending it until a connected subgraph with the desired number of vertices is obtained. However, edge sampling has distinct disadvantages. Wernicke has shown that the edge sampling algorithm introduces a sampling bias, and that this bias cannot be estimated from the number of edges neighboring the oversampled subgraph alone.
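Edge sampling as described above can be sketched in a few lines. This version is an illustration, not MFINDER's implementation. It picks a uniformly random edge and then repeatedly adds a uniformly random edge with exactly one endpoint in the current vertex set, ignoring edge directions; these choices are our assumptions. Running it on a hypothetical four-vertex graph, a triangle 0-1-2 with a pendant edge 2-3, shows the bias. The triangle is drawn with probability 7/12 (1/4 + 1/6 + 1/6 over the four possible starting edges) rather than the 1/3 that a uniform sampler would give to each of the three connected 3-vertex subgraphs.

```python
import random
from collections import Counter


def edge_sample(edges, k, rng=random):
    """Draw one connected k-vertex subgraph by random edge extension.

    Illustrative sketch, not MFINDER's code. Returns a frozenset of vertices,
    or None if the connected component is too small.
    """
    edges = [(u, v) for u, v in edges if u != v]
    u, v = rng.choice(edges)
    chosen = {u, v}
    while len(chosen) < k:
        # candidate edges leave the current vertex set (directions ignored)
        frontier = [e for e in edges if (e[0] in chosen) != (e[1] in chosen)]
        if not frontier:
            return None
        a, b = rng.choice(frontier)
        chosen.update((a, b))
    return frozenset(chosen)


# Triangle 0-1-2 plus pendant edge 2-3; tally 20,000 draws of size 3.
graph = [(0, 1), (1, 2), (0, 2), (2, 3)]
rng = random.Random(0)
tally = Counter(edge_sample(graph, 3, rng) for _ in range(20000))
for subgraph, count in tally.items():
    print(sorted(subgraph), count / 20000)
```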

1.3 This Study

Enumerating the subgraphs of a particular size within a larger graph is a computationally expensive and time-consuming task. As the size of the subgraphs increases, the process becomes unwieldy and current algorithms take far too long to execute. Two major ways to improve network motif detection tools are to develop faster full enumeration algorithms and to improve subgraph sampling, so that the algorithm identifies the motifs most likely to be functionally relevant. Shen-Orr analyzed the transcriptional regulation network of E. coli using a Markov-chain algorithm to generate random networks for comparison. FANMOD instead samples subgraphs with the RAND-ESU randomized enumeration algorithm discussed above, and generates its random networks by exchanging edges (see Section 4). We chose to evaluate Shen-Orr's E. coli transcriptional regulation network data with FANMOD to see whether other algorithms would also identify the network motifs that the Shen-Orr team found significant.

The study consisted of analyzing Shen-Orr's E. coli transcriptional regulation network using the FANMOD motif detection software tool. We downloaded the E. coli transcriptional regulation network data from the Uri Alon lab website to use as the input file for FANMOD. Next, we ran both the full enumeration and sampling algorithms in FANMOD for subgraphs of increasing size, from three to five nodes. The sampling algorithm takes less time to execute, but provides less information and identifies fewer network motifs than full enumeration. Sampling improves runtime, but at the cost of missing functional network motifs. The fact that the FANMOD sampling algorithm returns far fewer significant network motifs than full enumeration shows that there is room to improve the sampling algorithm. However, the more important quality in a sampling algorithm is not so much how many motifs it identifies compared to full enumeration, but whether it is able to identify the motifs that are most functionally relevant.

There are a few features that could improve the user experience of FANMOD. FANMOD generates diagrams of the most significant network motifs. It would be useful to be able to highlight substructures within these diagrams; for example, to highlight all feedforward loops within subgraphs of a particular size larger than three. Also, network motif detection tools could allow users to specify the types of network motifs they are most interested in seeing. For example, the ability to show graphs that are bipartite or nearly so (80-90% of edges connecting different colors) would be useful for those studying such systems.
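The proposed filter for graphs that are bipartite or nearly so needs only the fraction of edges whose endpoints have different colors. The helper below is hypothetical and illustrates the feature proposed above; it is not part of FANMOD. It assumes each motif is given as a list of edges and each vertex has a color label. It uses the lower end of the 80-90% range as its default threshold.

```python
def cross_color_fraction(edges, color):
    """Fraction of edges whose two endpoints carry different colors.

    Hypothetical helper: `edges` is a list of (u, v) pairs and `color`
    maps each vertex to a color label.
    """
    edges = list(edges)
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if color[u] != color[v])
    return crossing / len(edges)


def nearly_bipartite(motifs, color, threshold=0.8):
    """Keep motifs in which at least `threshold` of the edges cross colors."""
    return [m for m in motifs if cross_color_fraction(m, color) >= threshold]


# Hypothetical coloring: transcription factors vs. operons.
color = {"tf1": "TF", "tf2": "TF", "op1": "operon", "op2": "operon"}
m1 = [("tf1", "op1"), ("tf2", "op1"), ("tf2", "op2")]  # all edges cross colors
m2 = [("tf1", "tf2"), ("tf2", "op1")]                  # half the edges cross
print(nearly_bipartite([m1, m2], color))
```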

2 RESULTS

FANMOD provides several statistical values alongside significant network motifs. The Z-score is one way of determining how significant a network motif is. It is the frequency in the original network minus the mean frequency in the random networks, divided by the standard deviation of the random frequencies. Motifs with the highest Z-scores are the most significant, so the following tables of motifs are organized in order of decreasing Z-score. P-values range from zero to one. Smaller p-values indicate more significant motifs, because a smaller p-value means that fewer random networks contain the motif more often than the original network does. The p-value is calculated in the following way: “The p-Value of a motif is the number of random networks in which it occurred more often than in the original network, divided by the total number of random networks.”
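The Z-score and p-value definitions above can be written out directly; the function names in this sketch are ours, not FANMOD's. Plugging in the feedforward-loop row of Table 1 reproduces its Z-score to within rounding, about 11.26 against the reported three-trial average of 11.264. This works only if the frequencies are first converted from percentages to fractions, which is the scale on which the reported standard deviations make the arithmetic agree.

```python
def z_score(freq_original, mean_random, sd_random):
    """(original frequency - mean random frequency) / SD of random frequencies."""
    return (freq_original - mean_random) / sd_random


def p_value(freq_original, random_freqs):
    """Share of random networks in which the subgraph occurs more often
    than in the original network (the definition quoted above)."""
    return sum(f > freq_original for f in random_freqs) / len(random_freqs)


# Feedforward loop, Table 1 (percentages converted to fractions).
z_ffl = z_score(0.80676 / 100, 0.14468 / 100, 0.00058784)
print(round(z_ffl, 2))

# Hypothetical frequencies from four random networks.
print(p_value(0.5, [0.1, 0.6, 0.7, 0.2]))
```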

Tables 1 and 2 show the results for full enumeration and sampling of the E. coli network, enumerating subgraphs of size three. Both tables are ordered by descending Z-score, so that the most significant network motifs are listed first. Full enumeration of the network at subgraph size three identified two significant network motifs, whereas the sampling algorithm identified only one. The full enumeration data show the average values from three trials. Two of the five trials of the sampling algorithm at subgraph size three did not identify any significant network motifs; the sampling data show the average values from the remaining three trials.

Table 3 shows the three most significant network motifs of subgraph size four identified using full enumeration. All seven significant network motifs for subgraph size four found using full enumeration are bipartite graphs or contain at least one feedforward loop. The two motifs with the highest Z-scores contain two feedforward loops, and the third most significant is a bipartite graph. The remaining four motifs each contain one feedforward loop. For sampling of subgraph size four, only one of three trials identified any significant network motifs at all (Table 4). The one network motif identified was the sixth most significant according to the full enumeration data.

Table 1: Significant network motifs identified using FANMOD, full enumeration and subgraph size 3 (n=3). (Three trials)

                                       Feedforward Loop    Bipartite Graph
Average Frequency [Original]           0.80676%            91.76%
Average Mean-Frequency [Random]        0.14468%            91.216%
Average Standard-Deviation [Random]    0.00058784          0.00048442
Average Z-Score                        11.264              11.221
Average p-Value                        0                   0

Table 2: Significant network motifs identified using FANMOD, sampling and subgraph size 3 (n=3). (Three trials)

                                       Feedforward Loop
Average Frequency [Original]           0.99259%
Average Mean-Frequency [Random]        0.10969%
Average Standard-Deviation [Random]    0.0015703
Average Z-Score                        5.6127
Average p-Value                        0.003667

For subgraphs of size five, the full enumeration algorithm identified the 20 most significant network motifs. Each of the three trials identified 20 network motifs, for 22 distinct motifs in total. All 22 motifs are either bipartite graphs or contain at least one feedforward loop. Two of the 22 network motifs contain three feedforward loops, and three of the motifs are bipartite graphs. Ten of the motifs contain two feedforward loops, and the remaining seven contain only one feedforward loop each. Table 5 shows the two network motifs found with full enumeration of subgraph size five that contain three feedforward loops. The first network motif listed had the highest Z-score in all three trials, and the second was in the top ten motifs in every trial. Table 6 shows the three bipartite graphs.

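Table 3: Significant network motifs identified using FANMOD, full enumeration and subgraph size 4 (n=4). Motif diagrams are not reproduced.

Network Motif                      Average Z-score    Motif Rank (Trial 1 / Trial 2 / Trial 3)
Two feedforward loops              24.927             1 / 1 / 1
Two feedforward loops              15.641             2 / 2 / 2
Bipartite graph                    10.533             3 / 3 / 3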

Table 4: Significant network motifs identified using FANMOD, sampling and subgraph size 4 (n=4). The motif diagram is not reproduced.

Network Motif                  Feedforward loop with one additional edge
Frequency [Original]           0.62167%
Mean-Frequency [Random]        0.050375%
Standard-Deviation [Random]    0.00074301
Z-Score                        7.6889
p-Value                        0.001

For subgraphs of size five, the sampling algorithm identified five significant motifs in one trial and ten in another. The third trial identified no significant motifs. The two trials together identified 12 distinct motifs, 9 of which are included in the 22 motifs identified using full enumeration. Of the 12 motifs identified using sampling, one motif contains three feedforward loops, one motif contains two feedforward loops, and seven contain one feedforward loop. The remaining three motifs contain no feedforward loops. The second motif containing three feedforward loops in Table 5 was also identified in one of the sampling trials, where it ranked first with a Z-score of 10.528. As Table 5 shows, the average Z-score for the same motif identified using full enumeration is 16.047.

Table 5: Significant network motifs containing three feedforward loops, identified using full enumeration and subgraph size 5 (n=5). Motif diagrams are not reproduced.

Network Motif                      Average Z-score    Motif Rank (Trial 1 / Trial 2 / Trial 3)
Three feedforward loops            206.22             1 / 1 / 1
Three feedforward loops            16.047             6 / 8 / 9

Table 6: Significant network motifs that are bipartite graphs, identified using full enumeration and subgraph size 5 (n=5). Motif diagrams are not reproduced.

Network Motif                      Average Z-score    Motif Rank (Trial 1 / Trial 2 / Trial 3)
Bipartite graph                    17.474             7 / 7 / 6
Bipartite graph                    11.31933           10 / 11 / 10
Bipartite graph                    5.6591             18 / 19 / 17

3 CONCLUSIONS

Every significant network motif identified using full enumeration for subgraph sizes of three, four and five is either a bipartite graph or contains one or more feedforward loops. Most of the network motifs identified using sampling also contain one or more feedforward loops. The data support the notion that feedforward loops and bipartite graphs are statistically significant structures in the transcriptional regulation network of E. coli.

The data support Shen-Orr's finding that the feedforward loop is the most significant network motif at subgraph size three. Additionally, the second network motif detected using full enumeration at subgraph size three is a bipartite graph, suggesting that bipartite graphs are significant in the E. coli transcriptional regulation network. Many of the network motifs found to be significant from the enumeration and sampling of larger subnetworks contain feedforward loops or are bipartite graphs. When the subgraph size is increased to four, the full enumeration data show that the two most significant network motifs each contain two feedforward loops. The third most significant motif is the previously described overlapping regulation motif, a bipartite graph. All of the remaining motifs identified by full enumeration of size-four subgraphs contain one feedforward loop with one additional edge. Overall, six of the seven significant motifs contain at least one feedforward loop, with the two most significant motifs containing two feedforward loops; the remaining motif is a bipartite graph. This further supports the conclusion that feedforward loops and bipartite systems are significant within the E. coli transcriptional regulation network. The sampling data for subgraphs of size four identified only one of the significant motifs, the sixth most significant according to the full enumeration data. It consists of a feedforward loop with one additional edge. The most significant motifs at subgraph size five also contain feedforward loops, with those ranked highest containing two or three feedforward loops.

The sampling data showed fewer network motifs than full enumeration in all cases. For subgraphs of sizes four and five, the most significant motif identified by sampling was not among the five most significant motifs identified using full enumeration. Sampling has major shortcomings compared to full enumeration: it identifies both fewer and less significant network motifs. Overall, the FANMOD data show that feedforward loops and bipartite graphs are significant network motifs in the transcriptional regulation network of E. coli.

4 FUTURE WORK

There are several ways to improve network motif detection tools. One way to improve sampling results in FANMOD could be to alter the default probability settings so that the network is sampled more evenly. This is discussed in detail in Section 5 of the FANMOD manual. The default settings are 0.5 for all probability fields. Placing high probabilities in the left fields and lower probabilities in the right fields increases the chance that the network will be sampled evenly. However, preliminary results for sampling with the probabilities 0.8, 0.5 and 0.3 for subgraph size three identified the feedforward loop with a lower average Z-score over three trials (4.8304) than sampling with the probabilities 0.5, 0.5 and 0.5 (Z-score 5.6127). This indicates that the feedforward loop was actually found to be less significant using the alternative descending probabilities. More trials could be conducted using subgraphs of different sizes to determine whether an altered probability pattern brings the results of sampling closer to those of full enumeration.
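A back-of-the-envelope check helps interpret this comparison. Assume that FANMOD's probability fields are RAND-ESU's per-level probabilities p_1 to p_k. Under that assumption, each size-k subgraph is reached along exactly one path of the enumeration tree and is therefore sampled with probability p_1 · p_2 · ... · p_k. The two settings tried above then sample nearly the same expected share of subgraphs (12.5% versus 12%). A smaller expected sample alone is therefore unlikely to explain the lower Z-score.

```python
from math import prod


def expected_sampling_fraction(probs):
    """Probability that RAND-ESU samples any particular k-vertex subgraph.

    Assumes the probability fields are the per-level probabilities p_1..p_k,
    so the result is their product.
    """
    return prod(probs)


for setting in ([0.5, 0.5, 0.5], [0.8, 0.5, 0.3]):
    print(setting, round(expected_sampling_fraction(setting), 4))
```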

FANMOD also supports several models for randomized network generation. Our study used the local constant model, in which directed edges are exchanged with one another so that the number of edges connected to each vertex remains constant. FANMOD also supports a global constant model that preserves the total number of edges, although the number of edges connected to a specific vertex may change. The transcriptional regulation network of E. coli could also be evaluated using the global constant model with full enumeration and sampling to see whether the motifs identified as significant change.
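The local constant model described above is a form of degree-preserving edge switching. The sketch below is a generic version of that idea, not FANMOD's implementation. It repeatedly picks two directed edges a->b and c->d and rewires them to a->d and c->b, rejecting swaps that would create self-loops or duplicate edges. This keeps every vertex's in-degree and out-degree fixed. Mutual edges and vertex colors, which FANMOD can take into account, are not modeled, and the number of swaps is an arbitrary choice.

```python
import random


def switch_edges(edges, n_swaps, rng=random):
    """Randomize a simple directed graph while keeping all in/out-degrees.

    Generic degree-preserving switching; not FANMOD's implementation.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # rewiring a->b, c->d into a->d, c->b keeps every degree unchanged
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # would create a self-loop or a duplicate edge
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


# Hypothetical small network and one randomized counterpart.
original = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (1, 3)]
randomized = switch_edges(original, 1000, random.Random(0))
print(sorted(randomized))
```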

Additionally, it would be useful to test other transcriptional regulation networks to see whether structures such as the feedforward loop are significant in other biological networks.

REFERENCES

Kashtan, N., Itzkovitz, S., Milo, R. and Alon, U. (2004) “Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs”, Bioinformatics, 20(11):1746-1758.

Lee, T. I. et al. (2002) “Transcriptional Regulatory Networks in Saccharomyces cerevisiae”, Science, 298(5594):799-804.

McKay, B. (1981) “Practical Graph Isomorphism”, Congressus Numerantium, 30:45-87.

Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. and Alon, U. (2002) “Network Motifs: Simple Building Blocks of Complex Networks”, Science, 298(5594):824-827.

Schreiber, F. and Schwobbermeyer, H. (2005) “MAVisto: a tool for the exploration of network motifs”, Bioinformatics, 21(17):3572-3574.

Shen-Orr, S. S., Milo, R., Mangan, S. and Alon, U. (2002) “Network motifs in the transcriptional regulation network of Escherichia coli”, Nature Genetics, 31:64-68.

Wernicke, S. and Rasche, F. (2006) “FANMOD: a tool for fast network motif detection”, Bioinformatics, 22(9):1152-1153.
