Massive Data Flows

Self-organization of Energy, Material, and Information Flows

Takashi Ikegami

and Mizuki Oka

The University of Tokyo, Tokyo, Japan

University of Tsukuba, Tsukuba, Japan

Keywords:

Massive Data Flows, Self-organization, Artiﬁcial Life, Complex Systems, Web.

Abstract:

As opposed to “Big Data” as a buzz word, we attempt to ﬁnd a new pattern or structure generated by self-

organization in the ﬂow of the massive data. We call this approach Massive Data Flows (MDF). Rather than

making use of “Big Data”, we are interested in the new phenomena and theory that allows us to deal with

the data without losing the autonomy, complexity, dynamics and structure that the data itself has. MDF is a

generic term used to identify a new kind of system dynamics: self-organization in complex open environments.

Composed of many interacting heterogeneous elements, MDF systems exhibit self-referential, self-modifying,

and self-sustaining dynamics, that can enable door-opening innovation. While the web may be the best exam-

ple of an MDF system, the concept is generic to natural/artiﬁcial systems such as brains, cells, markets and

ecosystems. In this paper, we exemplify ﬁve systems; the default mode network and the excitability of the

web, the autonomous sensor network, chemical oil droplets, and court and cave computation with a many-core

system as potential MDF systems.

1 INTRODUCTION

Analyses of “Big Data” from the web and sensory

data have recently become the focus of attention.

However, the development of data mining techniques

is still in progress for the analysis of large data sets,

so conventional techniques are being applied. It is

yet difﬁcult to effectively deal with complex data with

possibly a very large degree of freedom using conven-

tional approaches that execute the analysis in a top-

down manner. Thus, a new kind of bottom-up mining

method, which can be referred to as data driven, is

necessary to deal with the “Big Data.”

As opposed to “Big Data” as a buzz word, we at-

tempt to ﬁnd a new pattern or structure generated by

self-organization in the ﬂow of the massive data

. We

call this approach Massive Data Flows (MDF). MDF

is a generic term used to identify a new kind of sys-

tem dynamics: self-organization in complex open en-

vironments. Composed of many interacting heteroge-

neous elements, MDF systems exhibit self-referential,

As part of this effort, we have organized workshops

called Massive Data Flows at Japanese artiﬁcial intelli-

gence conferences since 2011 as well as at an international

workshop of the European Conference on Artiﬁcial Life in

2013.

self-modifying, and self-sustaining dynamics, that

can enable door-opening innovation. While the web

may be the best example of an MDF system, the con-

cept is generic to natural/artiﬁcial systems such as

brains, cells, markets and ecosystems.

Unlike systems studied in isolation or at equi-

librium, MDF systems are open and driven systems

existing within a rich context, constantly changing,

growing, evolving, and thereby autonomously chang-

ing the way in which they interact with the environ-

ment around them. The patterns that they exhibit are

neither imposed from outside, nor arising internally,

but are a consequence of the interface between the

endogenous and exogenous data ﬂows. If “Big Data”

systems exhibit volume, velocity and variety, MDF

systems exhibit vitality.

A series of methods for data analyses and vi-

sualization are being developed, such as a self-

organization map, ant colony optimization, particle

swarm optimization and evolutionary computation.

However, these methods are not created to target large

data, and we need to establish a bottom-up method to

target these data. One of such methods that has re-

cently attracted attention and uses multilayered neu-

ral networks is called deep learning (Hinton et al.,

2006). For example, researchers at Google experi-

237

Ikegami T. and Oka M..

Massive Data Flows - Self-organization of Energy, Material, and Information Flows.

DOI: 10.5220/0004907102370242

In Proceedings of the 6th International Conference on Agents and Artiﬁcial Intelligence (ICAART-2014), pages 237-242

ISBN: 978-989-758-016-1

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: Examples of self-organization. (Left) Progogine’s hexagonal lattice; (Middle) Karman vortex street;

(Right) Belousov-Zhabotinsky chemical reaction. Image of B´enard Cell is taken from http://www.dichotomistic.com/

hierarchies

fractals.html. The image of the Karman vortex street is taken from http://en.wikipedia.org/wiki/

Karman

vortex street. The screenshot of the Belousov-Zhabotinsky chemical reaction has been generated by a simulator

at http:// dencity.jp/simulator/bz.html.

mented with the images of YouTube, using an artiﬁ-

cial neural network of 16,000 nodes, and found that

there are speciﬁc neurons that react to videos of a

cat and speciﬁc ones that respond only to a person’s

body (Le et al., 2012). The deep-learning method

takes the approach of extracting the structure that the

system self-organizes when a large amount of data are

involved and shares some conceptual interests with

the MDF approach.

Another example can be found in a project called

SpeecHome by Deb Roy (Vosoughi et al., 2012). Deb

Roy and his colleagues put up video cameras and au-

dio sensors around his house and recorded the growth

of his own child for over three years. On the ba-

sis of this life-log data, Deb Roy captured the entire

process of language acquisition of the child. There

have been previous studies based on anecdotal theory

about children’s developmental processes, but none

involved a longitudinal study with systematic record-

ing of a child in daily life. In addition, the same data

can be different when the point of reference changes

or has a different context, which was clearly shown

by the SpeecHome project. This kind of study sug-

gests that enormous datasets, including non-typical

and those used anecdotally, are needed for unraveling

complex phenomena.

The emphasis of this paper is that we should create

new methods and language in order to synthesize and

describe the self-organizing aspect of massive data

ﬂows. Here, we extend the meaning of data to include

material, energetics and information ﬂows in order to

capture the kind of complexity that we are exploring.

2 SELF-ORGANIZATION AND

MDF

From the long-term studies on non-linear and non-

equilibrium systems, there are ample examples of

self-organization in various systems ranging from

simple physical systems to complex biological ones.

For example, the B´enard Cell is observed in hori-

zontally layered ﬂuid heated from below; this is also

known as Progogine’s hexagonal lattice (Prigogine,

1980). The Karman vortex street is a successive for-

mation of vortices behind a cylinder in ﬂuid ﬂow from

the front. The Belousov Zhabotinsky chemical reac-

tion on a petri dish shows spatial and temporal oscil-

latory patterns (see Figure 1).

For all these examples, patterns emerge by in-

creasing energy or material ﬂows from outside. Be-

yond a certain critical ﬂow value, the patterns are self-

organized. This can be illustrated with the bifurcating

process from a (thermal) equilibrium state to dynamic

non-steady states with different periodicities and even

chaotic phases. It has also been said that the stripes on

ﬁsh and shells are biological examples of these self-

organized patterns (Kondo and Miura, 2010; Mein-

hardt, 2003).

A typical research area that deals with self-

organization is artiﬁcial life as a part of complex sys-

tem sciences. The aim of the study of artiﬁcial life is

to construct life-like phenomena based on programs

or non-organic components. What we call life-like

phenomena are those that have autonomy, evolvabil-

ity, enaction, and adaptability, which we synthesize

by using autonomous robots or algorithmic chemistry.

Recent studies have also explored these ideas as living

technology in real life (Ikegami, 2013).

What will happen if we further increase the energy

ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence

238

or material ﬂows beyond the critical values? When

a system is exposed to something beyond the crit-

ical value, and further to excessive ﬂows, patterns

will decay and the system may no longer be able

to sustain itself, i.e., a cylinder in the ﬂow will be

destroyed by the pressure; but it may also generate

second order self-organization, i.e., a higher order

self-organization to cope with the excess input ﬂows.

Examples of second-order self-organization could be

the evolution of new species, technological innova-

tions (Bedau, 2012), and new web services, most of

which are strongly related to biological adaptive sys-

tems. It is not the pattern self-organized on the surface

of the bodies but the system itself that will adapt to

the excess ﬂows. In other words, a self-organization

mechanism is not only attributed to the system’s in-

herent dynamics but also to the excess ﬂows from out-

side.

In the following sections, we will see such second-

order self-organization in examples from our recent

studies.

2.1 Web Default Mode Network

The web is a candidate for life-like phenomena in

which services that run on the web must deal with

massive data ﬂows where the underlying structure and

the overlying information ﬂow changes constantly.

Such spatially and temporally extended web space

can be used as a metaphor for living states and/or con-

scious states. Indeed, the web picks up the uncon-

scious state of collective human behaviors (e.g., rec-

ommendations of products or advertisements based

on the user’s collective behaviors are a classic exam-

ple). Analyzing the web data could open up a new

direction of science.

Social networking services (e.g., Twitter, Face-

book, Google Plus) are now major sources of the web

dynamics, together with web search services (e.g.,

Google, Yahoo, and Bing). These two types of Web

services mutually inﬂuence each other but generate

different dynamics. We distinguish two modes of web

dynamics: the reactive mode and the default mode.

It is assumed that Twitter messages (called “tweets”)

and Google search queries react to signiﬁcant so-

cial movements and events, but they also demonstrate

signs of becoming self-activated, thereby forming a

baseline web activity. We deﬁne the former as the

reactive mode and the latter as the default mode of

the web. We investigated these reactive and default

modes of the web’s dynamics using transfer entropy

(TE) (Oka and Ikegami, 2013).

We collected tweets (in Japanese) over a two-

year period by applying morphological analysis to ex-

Figure 2: The role of each keyword. (Top) The ratio of key-

words becoming sources and sinks, shown as a function of

keyword frequency over time. Red shows the source ratio

and blue shows the sink ratio, as a function of keyword fre-

quency over time. The frequent keywords tend to become

source nodes, and infrequent keywords tend to become sink

nodes. (Bottom) Strong mediators are deﬁned as having

ample incoming and outgoing transfer entropy (TE) ﬂow,

and weak mediators are deﬁned as those with both weak

incoming and outgoing TE ﬂow.

tract the 1,000 most frequently used Japanese nouns

in the tweets and used these as keywords. Analysis

of the time series with information transfer measure-

ment shows that the more-frequent keywords become

the upper stream of information ﬂow (in the sense of

transfer entropy), and the less-frequent keywords be-

come the down stream (see Figure 2). The informa-

tion is therefore transferred from the more to less fre-

quent keywords for the minimum time mesh around 1

hour. However, interestingly, the tendency are some-

times reversed for the time mesh of a few minutes.

We interpret this as different causal relationships can

be organized in different time scales, corresponding

to the time scales of local tweeting (less frequent key-

words) and the global atmosphere of Twitter (more

frequent keywords).

Analogous to the default mode network (DMN) in

the brain, we name this information transfer pattern in

Twitter as the web DMN since without a signiﬁcant

event from the outside, the Twitter system can main-

tain and organize its ﬂow pattern. The web DMN also

transfers information to the less frequent keywords,

which often have a bursting behavior reacting to the

external inputs, so that the upper stream of the trans-

fer information ﬂow of the longer time scale can serve

as a default mode. We argue that DMN is an example

of self-organization of the MDF since internal and ex-

ternal information transfer across the web is the cause

of this DMN. The web network topology is constantly

changing, and the constituent elements are very het-

erogeneous, which is an aspect of second-order self-

organization.

MassiveDataFlows-Self-organizationofEnergy,Material,andInformationFlows

239

12/06/30 12/07/28 12/08/25 12/09/22 12/10/20 12/11/17 12/12/15 13/01/12 13/02/09 13/03/09

Date

12/06/30 12/07/28 12/08/25 12/09/22 12/10/20 12/11/17 12/12/15 13/01/12 13/02/09 13/03/09

Date

12/06/30 12/07/28 12/08/25 12/09/22 12/10/20 12/11/17 12/12/15 13/01/12 13/02/09 13/03/09

Frequency FrequencyFrequency Frequency

Date

Type 1: constant - “joy”

Type 2: periodic (small time scale) - “Monday”

Type 2’: periodic (large time scale) - “flu”

Type 3: intermittent - “earthquke”

Figure 3: Examples of time series (red lines) and detected

bursts (gray bars; the height indicates the burst level) with

different dynamics: type 1) the noisy type (joy); type 2) the

periodic type with a small time scale (Monday); type 2’) the

periodic type with a large time scale (ﬂu); and type 3) the

intermittent type (earthquake).

2.2 Self-organization of Bursting

Behaviors on Social Media

Twitter can be taken as an extended sensor of people’s

collective interests. The output pattern of the sen-

sor for each fact/event appears in the time series that

contains the keywords in their tweets. An increase

in the popularity of events, which are reﬂected in the

time series as a burst, cause an increase in frequency

(see Figure 3). We studied bursting behavior in rela-

tion to the structure of ﬂuctuation to reveal the origin

of bursts. More speciﬁcally, we studied the tempo-

ral relationship between a preceding baseline ﬂuctu-

ation and the successive burst response, using noun

frequency from Twitter data as described above.

As a result, we found a speciﬁc ﬂuctuation thresh-

old beyond which a strong burst occurs (Oka et al.,

2014). The bursts below this threshold are caused by

interactions among the social network, and the thresh-

old is self-organized as a result of such interactions.

Above this threshold, the response size becomes un-

predictable, and a wide range of burst sizes appear.

The threshold is different for a time series of each

noun. Including a power-law behavior of burst sizes,

there are a variety of ﬂuctuation dynamics that self-

organize this threshold for each noun. This excitable

property of Twitter can also be taken as a sign of self-

organization driven by the MDF because the variety

of information ﬂow behind the web and real-world

events mutually affect each other to determine its na-

ture as an excitable media.

Figure 4: (Top) Implemented sensor unit and (Bottom) in-

stallation of the autonomous sensor network (ASN) system

as a sound installation in a gallery in Tokyo.

2.3 Autonomous Sensor Networks

We previously proposed and studied an autonomous

sensor network (ASN) as a new challenge for study-

ing self-organization in a long-term and open-ended

environment (Maruyama et al., 2013). We proposed

an ASN that is spatially distributed in the real world

(see Figure 4.) One node has two sensors, light and

humidity, that sense the corresponding environmen-

tal information with an adaptive sensing periodicity

(or cycle). The sensor information obtained by each

node, which is controlled by two XBees and two Ar-

duinos, is sent to other nodes via wireless connec-

tions. The uniqueness of the sensor network is that

we employ artiﬁcial chemistry to control the sampling

rate of each sensor autonomously (i.e., sensors are not

simply reacting to environmental changes but some-

times resisting them).

In each sensor unit, we let the sensory inputs cause

the reaction, and the reaction speed determines the

sampling rates of each sensor. A minimal nonlinear-

ity introduced by the artiﬁcial chemistry can foster

some unexpected spontaneous temporal oscillations

in the sampling rates, which we call the resonating

state as opposed to the resting state of the network.

The resonating state can vary drastically depending

on the light intensity and the coupling with the hu-

midity sensor. The resting state is similar to the de-

fault mode of the network, which organizes the base-

line activity of the network. We studied an eight-

node autonomous sensor network to see the dynamic

changes of network states in a week in a half-open

ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence

240

space. A most interesting behavior of ASN is the

spontaneous transition between a resting state and the

resonating state. We argue that ASN provides a prin-

ciple to make a second-order self-organization driven

by the MDF. Again, the condition for MDF-driven

self-organization is a reaction between internal dy-

namics and huge input ﬂows from outside. In the

case of ASN, light intensity and humidity ﬂow cou-

pled with the sensor network with adaptive sampling

rate dynamicsdetermine the self-organization. We are

still investigating its complex long-term behavior in

open space.

Figure 5: A photo image of self-moving oil droplets emerg-

ing autonomously. The convection ﬂow inside the droplet

is observed and the product of reaction (mostly oleic acid

molecules) being secreted from the tail.

2.4 Self-moving Oil Droplets

Another example of MDF can be found in self-

moving oil droplets (Hanczyc et al., 2007; Hanczyc

and Ikegami, 2010). We experimented and discov-

ered the emergence of self-moving oil droplets about

several hundred micrometers in size by pouring oleic

anhydrous acid into a high pH aqueous solution (see

Figure 5). An oil droplet is covered with oleic acid

as a reaction between the oil and water, it senses the

chemical gradient by generating an internal pH gradi-

ent; it avoids low pH regions (< 10), preferring high

pH (> 11) regions.

Its movement comes from the chemical reaction

on the surface of the droplet, inside convection ﬂows

and the droplet shape. Such a self-moving droplet can

be viewed as the origin of a soft-bodied robot. We say

this is the MDF example, since it is self-sustaining

self-organizing system copying with the environmen-

tal ﬂows. If this droplet could sense and adapt to more

diverse environmental patterns and ﬂows, it would

show more complex functionalities. This also pro-

vides a new design principle for MDF for producing

a self-organizing robot.

Figure 6: Overview of the cave and court computation

scheme on a many-core machine.

2.5 Concurrent Computation

Architecture on a Many-core

Machine

The web is made accessible through search engines,

such as Google, that construct the architecture so that

the system can handle huge amounts of data by opti-

mizing the throughput of the system. In particular,

this can maintain the consistency of the data when

running on many machines with many processors. We

are interested in understanding how concurrently pro-

cessing computational threads can compete indepen-

dently but cooperatively to resolve the inconsistency

produced by the concurrent process.

To examine this question, we investigated a many-

core machine that performs concurrent operations and

found that non-cooperativecomputational threads can

successfully organize a whole computational task.

More speciﬁcally, we proposed a concurrent architec-

ture, which enables effective concurrent computation

on a many-core machine by separating two phases;

court and cave (Oka et al., 2013). A unique point of

the court and cave computation is that it performs op-

erations simultaneously on shared resources without

excluding access for each thread (see Figure 6). We

conducted data management experiments by varying

the different number of cores on a multi-core machine

and investigated the characteristic dynamics for when

the highest performance is observed. We discovered

that the temporal dynamics of the number of opera-

tions changes from a noisy to bursty pattern at an op-

timal point.

The cave and court computational architecture is

another type of MDF self-organization since it is self-

modifying system coupling with a large data set. The

input data stream is distributed among many threads

in the cave phase but those threads are interacting in

the court phase in order to resolve inconsistency and

MassiveDataFlows-Self-organizationofEnergy,Material,andInformationFlows

241

re-organizing the CPU resource distributions. Syn-

chronization and desynchronization of the temporal

dynamics of each thread lead to the emergenceof self-

organization in this concurrent computation schema.

3 CONCLUSIONS

The concept of MDF provides a new methodology for

understanding data ﬂows, including material, energy

and information ﬂows. Analogous to the Darwinian

evolution and the organization of an ecological sys-

tem, MDF patterns grow, and this growth determines

the organization of system’s own state autonomously,

i.e. organization of data by the data for the data.

The self-organization we see here is related to

what we call open-ended evolution, i.e., formation of

innovative properties due to evolutionary dynamics.

In the ﬁeld of artiﬁcial life, ﬁnding the prerequisite

conditions for having open-ended evolution has been

an obsession. For example, the emergence of popu-

lations of patents issued in the U.S. has been studied

by Bedau et al. (Bedau, 2012) to show which patent

leads the subsequent evolution of patents; they exam-

ined the complexity of the evolution of patents and

compared this to biological evolution.

MDF is the generic term that explains the co-

evolution of excess ﬂows and the adaptive system

in which self-organizational patterns successively oc-

cur. The default mode network and the excitability

of the web, the autonomous sensor network, chemi-

cal oil droplets, and court and cave computation with

a many-core system are examples of potential MDF

systems.

ACKNOWLEDGEMENTS

We would like to express our sincere gratitude to

our collaborators, Dr. Yasuhiro Hashimoto, Profes-

sor Kazuhiko Kato and Norihiro Maruyama for the

studies mentioned in this paper. We would also

like to express the deepest appreciation to Profes-

sor Seth Bullock for stimulating and insightful com-

ments and discussions. This work was supported

by the Japan Society for the Promotion of Science

Grant-in-Aid for Young Scientists (B) (#25730184),

Grant-in-Aid for Scientiﬁc Research on Innovative

Areas (#24120704), and Grand-in-Aid for Scientiﬁc

Research (B) (#24300080).

REFERENCES

Bedau, M. A. (2012). Minimal memetics and the evolution

of patented technology. Foundations of Science, pages

1–17.

Hanczyc, M. M. and Ikegami, T. (2010). Chemical basis for

minimal cognition. Artiﬁcial Life, 16(3):233–243.

Hanczyc, M. M., Toyota, T., Ikegami, T., Packard, N., and

Sugawara, T. (2007). Chemistry at the oil-water inter-

face: Self-propelled oil droplets. J. Am. Chem. Soc.,

129(30):9386–9391.

Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast

learning algorithm for deep belief nets. Neural Com-

putation, 18(7):15271554.

Ikegami, T. (2013). A design for living technology: Ex-

periments with the mind time machine. Artiﬁcial Life,

19(3-4):387–400.

Kondo, S. and Miura, T. (2010). Reaction-diffusion model

as a framework for understanding biological pattern

formation. Science, 329(5999):1616–1620.

Le, Q., Ranzato, M., Monga, R., Devin, M., Corrado, G.,

Chen, K., Dean, J., and Ng, A. (2012). Building high-

level features using large scale unsupervised learning.

In Proc. of the 29th International Conference in Ma-

chine Learning, pages 81–88.

Maruyama, N., Oka, M., and Ikegami, T. (2013). Creat-

ing space-time affordances via an autonomous sensor

network. In Proc. of the 2013 IEEE Symposium on

Artiﬁcial Life, pages 67–73.

Meinhardt, H. (2003). The Algorithmic Beauty of Sea

Shells. Springer.

Oka, M., Hashimoto, Y., and Ikegami, T. (2014). Self-

organization on social media: endo-exo bursts and

baseline ﬂuctuations. In submitted, pages –.

Oka, M. and Ikegami, T. (2013). Exploring default mode

and information ﬂow on the web. PLoS ONE,

8(4):e60398.

Oka, M., Ikegami, T., Woodward, A., Zhu, Y., and Kato,

K. (2013). Cooperation, congestion and chaos in con-

current computation. In Proc. of the 12th European

Conference on Artiﬁcial Life, pages 498–504.

Prigogine, I. (1980). From Being to Becoming: Time and

Complexity in the Physical Sciences. W.H.Freeman

and Co Ltd.

Vosoughi, S., Goodwin, M. S., Washabaugh, B., and Roy,

D. (2012). A portable audio/video recorder for longi-

tudinal study of child development. In Proc. of the

14th ACM International Conference on Multimodal

Interaction, pages 193–200.

ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence

242