cessing, feature extraction, and training/classification.
The classical MARF was extended (Mokhov, 2006) to allow the stages of the pipeline to run as distributed nodes, as approximately illustrated in Figure 1. At that point, the basic stages and the front-end were implemented without backup-recovery or hot-swappable capabilities; they merely communicated over Java RMI (Wollrath and Waldo, 2005), CORBA (Sun Microsystems, 2004), and XML-RPC WebServices (Sun Microsystems, 2006).
Figure 1: The Distributed MARF Pipeline.
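To make the distributed pipeline more concrete, the following is a minimal sketch of what a remote pipeline-stage interface over Java RMI could look like; the PipelineStage name and its process() signature are hypothetical illustrations and not DMARF's actual service API.

    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Hypothetical remote interface for one pipeline stage (e.g., preprocessing,
    // feature extraction, or classification). Each stage is exposed as a remote
    // object that the preceding stage invokes over RMI; equivalent bindings
    // could likewise be offered over CORBA or XML-RPC WebServices.
    public interface PipelineStage extends Remote {
        // Consumes the previous stage's output and returns this stage's result,
        // e.g., a feature vector represented as an array of doubles.
        double[] process(double[] input) throws RemoteException;
    }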
There are a number of applications that test
MARF’s functionality and serve as examples of
how to use MARF’s modules. One of the most
prominent applications is SpeakerIdentApp – Text-
Independent Speaker Identification (who is the
speaker, their gender, accent, spoken language, etc.).
Its distributed extension is designed to support high-volume processing of recorded audio, textual, or imagery data among the possible pattern-recognition and biometric applications of DMARF. Most of the emphasis in MARF has been on audio, such as conference recordings (Mokhov, 2007), with the purpose of attributing uttered material to speakers' identities. Similarly, bulk recorded phone conversations can be processed by collaborating police departments for forensic analysis and biometric subject identification. In this scenario, an investigator can upload voice samples collected on, e.g., a laptop, PDA, or cellphone to the servers constituting a DMARF-implementing network, where they are processed by remote instances of MARF's pipeline.
DMARF Self-optimization Requirements
Capturing DMARF as an autonomic system primarily covers the autonomic functioning of the distributed pattern-recognition pipeline and its optimization, specifically of its most computationally and I/O-intensive Classification stage. The two major self-optimization-related functional requirements applicable to large DMARF installations are discussed below:
• Training set classification data replication. A DMARF-based system may do a lot of multimedia data processing and number crunching throughout the pipeline. The bulk of the I/O-bound data processing falls on the sample-loading and classification stages. The preprocessing, feature-extraction, and classification stages also perform a lot of CPU-bound number crunching, matrix operations, and other potentially heavy computations. The stand-alone local MARF instance employs dynamic programming to cache intermediate results, usually in the form of feature vectors, inverse covariance matrices, and other array-like data. A large portion of these data is absorbed by the classification stage. In DMARF, such data may end up stored on different hosts running the classification service, potentially causing data already computed on one classification host to be recomputed on another. Thus, the classification-stage nodes need to communicate to exchange the data they have lazily acquired among all the classification members. Such data mirroring/replication would save considerable computational effort on the end nodes (a sketch of this idea follows the list).
• Dynamic communication protocol selection. Another aspect of self-optimization is the automatic selection of the most efficient communication protocol available in the current run-time environment. For example, if DMARF initially uses XML-RPC WebServices and later discovers that all of its nodes can also communicate over, say, Java RMI, the nodes can switch to it as their default protocol in order to avoid marshaling and demarshaling heavy SOAP XML messages, which incur a large overhead even in compressed form (a sketch of such a selection also follows the list).
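The following Java sketch conveys the replication idea only; the names ClassificationNode, getTrainingData(), and recompute() are hypothetical and do not correspond to DMARF's actual implementation, and peer caches are shown as directly accessible object fields for brevity rather than behind a remote interface.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch of lazy mirroring of cached training data among
    // classification nodes; not DMARF's real replication mechanism.
    public class ClassificationNode {
        private final Map<String, double[]> cache = new ConcurrentHashMap<>();
        private final List<ClassificationNode> peers;

        public ClassificationNode(List<ClassificationNode> peers) {
            this.peers = peers;
        }

        // Returns cached training data (e.g., a feature vector) for the given
        // subject, consulting peers before recomputing, and mirroring the
        // result back to them afterwards.
        public double[] getTrainingData(String subjectId) {
            double[] data = cache.get(subjectId);
            if (data != null) {
                return data;
            }
            // Ask peer classification nodes whether they already computed it.
            for (ClassificationNode peer : peers) {
                data = peer.cache.get(subjectId);
                if (data != null) {
                    cache.put(subjectId, data);   // mirror locally
                    return data;
                }
            }
            data = recompute(subjectId);          // expensive path
            cache.put(subjectId, data);
            for (ClassificationNode peer : peers) {
                peer.cache.put(subjectId, data);  // replicate to peers
            }
            return data;
        }

        private double[] recompute(String subjectId) {
            // Placeholder for the heavy feature/covariance computation.
            return new double[] { subjectId.hashCode() };
        }
    }

In a real deployment the peer lookup and replication would, of course, go over one of the communication protocols mentioned above (RMI, CORBA, or XML-RPC WebServices) rather than direct object references.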
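Similarly, a minimal sketch of the dynamic protocol selection is given below; CommProtocol and select() are invented names used purely for illustration.

    import java.util.Set;

    // Illustrative sketch of the protocol-switching idea; not DMARF's API.
    public final class ProtocolSelector {

        public enum CommProtocol { XML_RPC, CORBA, JAVA_RMI }

        // Keeps the protocol chosen at deployment time unless every node is
        // discovered to also support Java RMI, in which case it switches to
        // avoid the cost of marshaling and demarshaling SOAP XML messages.
        public static CommProtocol select(CommProtocol current,
                                          Iterable<Set<CommProtocol>> nodeCapabilities) {
            for (Set<CommProtocol> supported : nodeCapabilities) {
                if (!supported.contains(CommProtocol.JAVA_RMI)) {
                    return current; // at least one node cannot speak RMI yet
                }
            }
            return CommProtocol.JAVA_RMI; // all nodes can, so switch
        }
    }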
2.2 ASSL
The Autonomic System Specification Language
(ASSL) (Vassev, 2008) approaches the problem of
formal specification and code generation of auto-
nomic systems (ASs) within a framework. The core of this framework is a special formal notation and a toolset that allows ASSL specifications to be edited and validated. In general, ASSL consid-
ers ASs as composed of autonomic elements (AEs)
communicating over interaction protocols. To specify
those, ASSL is defined through the formalization of
tiers. The ASSL tiers (cf. Figure 2) are abstractions of
different aspects of any given AS. There are three ma-
jor tiers (three major abstraction perspectives), each