A Multiagent-based Framework for Solving Computationally

Intensive Problems on Heterogeneous Architectures

Bioinformatics Algorithms as a Case Study

H. M. Faheem

and B. König-Ries

Computer Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Faculty of Mathematics and Computer Science, Jena University, Jena, Germany

Keywords: Bioinformatics, Heterogeneous Architectures, Motif Finding Problem and Multiagent Systems.

Abstract: The exponential increase in the amount of data available in several domains, together with the need to process such data, makes many problems computationally intensive. Sequential analysis consequently becomes infeasible, hence the need for parallel processing. Over the last few years, the widespread deployment of multicore architectures, accelerators, grids, clusters, and other powerful architectures such as FPGAs and ASICs has encouraged researchers to write parallel algorithms using the available parallel computing paradigms to solve such problems. The major challenge now is to take advantage of these architectures irrespective of their heterogeneity, because designing an execution model that can unify all computing resources is still very difficult. Moreover, scheduling tasks to run efficiently on heterogeneous architectures still needs a lot of research. Existing solutions tend to focus on individual architectures or deal with heterogeneity among CPUs and GPUs only, whereas real systems are often more broadly heterogeneous. Up to now, cumbersome manual adaptation has been required to take advantage of these heterogeneous architectures. The aim of this paper is to propose a functional-level design of a multiagent-based framework that deals with the heterogeneity of hardware architectures and of the parallel computing paradigms deployed to solve those problems. Bioinformatics is selected as a case study.

1 INTRODUCTION

Heterogeneous architectures in modern data centers include different subsystems that may comprise CPUs, GPUs, grids, clusters, FPGAs, and ASICs. The performance of each subsystem in handling computationally intensive problems depends mainly on its computing power. Unified access to all of these heterogeneous systems is still in its initial phases. Several attempts to address the heterogeneity among CPUs and GPUs are currently available (Augonnet, C., et al., 2009), (Arabnejad, H. and Barbosa, J., 2013), and (Rauber and Rünger, 2010). These attempts offer run-time systems that allow the programmer to select, or even provide, a user-defined scheduling strategy, but they do not support FPGAs, ASICs, or any other special-purpose parallel architectures. Other attempts focus on CPUs, GPUs, and FPGAs (Inta, R., Bowman, D., and Scott, M., 2012). They propose designing algorithms that use all the existing resources in a machine or a cluster, so that a single algorithm drives the CPU, GPU, and FPGA. Such algorithm design depends entirely on the programmer's ability to allocate the hardware resources efficiently. None of these attempts, however, provides a mechanism to schedule tasks automatically according to the existing hardware. Currently, neither standards nor functional-level descriptions are available that define the rules or functions needed to schedule tasks efficiently on heterogeneous architectures. A framework that can integrate different heterogeneous architectures and treat them as a unified computing resource would be of great value to programmers. The intended framework should be able to: 1) interactively analyze the task dependency of the algorithm used to solve a given problem, 2) dynamically allocate computing resources to tasks, 3) autonomously respond to hardware topology changes, and 4) intelligently generate relevant parallel code that best fits the existing computing resources. The software paradigm best


Faheem, H. M. and König-Ries, B.
A Multiagent-based Framework for Solving Computationally Intensive Problems on Heterogeneous Architectures - Bioinformatics Algorithms as a Case Study.
DOI: 10.5220/0004967105260533
In Proceedings of the 16th International Conference on Enterprise Information Systems (ICEIS-2014), pages 526-533
ISBN: 978-989-758-027-7
© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

fitting these requirements is the multiagent-based system (Russell and Norvig, 1995). In this paper we propose a functional-level design of an automatic, intelligent task scheduling mechanism that distributes tasks after exploring the existing hardware and then allows automatic generation of parallel code (using available parallel computing paradigms) for each hardware resource. This is carried out using a multiagent-based framework. The remainder of this paper is organized as follows: Section 2 explores the suggested multiagent-based framework. Section 3 describes how the four layers can be customized to suit the bioinformatics domain. Section 4 illustrates the structure of the multiagent-based framework and lists attributes of the agents such as name, percept sequence, actions, goals, and type. Section 5 shows how the framework can be used to solve the motif finding problem as an example. Section 6 provides conclusions and directions for future work.

2 MULTIAGENT-BASED FRAMEWORK FOR SOLVING COMPUTATIONALLY INTENSIVE PROBLEMS

The suggested multiagent framework for solving computationally intensive problems, depicted in Fig. 1, consists of four layers: the problem description layer, the computer algorithms layer, the abstraction layer, and the architectures layer.

The problem description layer describes the problem. The description can be provided in several ways: as a checklist, in a special-purpose language, or simply as a set of descriptive statements. Checklists allow the agent program to generate the relevant actions automatically. A problem written in a special-purpose language requires a compiler to translate the language statements into the necessary executable code. Descriptive statements need much more effort, since a domain expert must intervene to pass the parameters, attributes, and formulations to the system.

The computer algorithms layer is responsible for mapping the problem from its specific domain (bioinformatics, climatology, etc.) to the computer domain. It uses computer and information sciences and mathematics to model and analyze the problem.

The abstraction layer contains intelligent agents responsible for several tasks, among them: gathering the computer algorithms for a specific problem, exploring architecture features, checking for updates to problems, checking for updates to hardware architecture features, ideal scheduling of tasks, setting the scheduling strategy, actual scheduling of tasks, visual presentation of task dependency diagrams, and managing the configuration of hardware architectures through their drivers. Agents can traverse the different layers to perform their intended tasks. This layer also provides the mapping between the algorithm intended to solve the problem and the hardware architectures intended to execute it. It thereby shields problem-domain researchers from the configuration details of the hardware architectures.

The architectures layer contains the different hardware architectures, which can communicate and cooperate with agents. The software drivers of the architectures can negotiate with the system agents so that data can be sent and received by those agents. The drivers should support the requirements of each architecture, such as program and data handling mechanisms. ASICs, FPGAs, and DNA-based self-assembled architectures should have their own drivers, developed by the hardware manufacturer, and should be able to accept configuration profiles or scripts. Clusters and other

Figure 1: Layers of the Multiagent-based Framework.

AMultiagent-basedFrameworkforSolvingComputationallyIntensiveProblemsonHeterogeneousArchitectures-

BioinformaticsAlgorithmsasaCaseStudy

527

Figure 2: The Four Layers of Multiagent-based Framework for Bioinformatics.

standard systems can communicate directly with the

multiagent system using standard multiagent

schemes.
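The driver role described for the architectures layer can be pictured as a small contract that every architecture exposes to the agents. The sketch below is only an illustration under stated assumptions: the class names `ArchitectureDriver` and `LoopbackDriver`, the method names, and the profile keys are hypothetical and are not part of the proposed framework.

```python
from abc import ABC, abstractmethod
from typing import Any


class ArchitectureDriver(ABC):
    """Hypothetical contract between a system agent and one architecture."""

    @abstractmethod
    def configure(self, profile: dict[str, Any]) -> None:
        """Accept a configuration profile or script for the device."""

    @abstractmethod
    def send(self, data: bytes) -> None:
        """Pass program input or data to the device."""

    @abstractmethod
    def receive(self) -> bytes:
        """Return the results produced by the device."""


class LoopbackDriver(ArchitectureDriver):
    """Stand-in device used for illustration: echoes data in upper case."""

    def __init__(self) -> None:
        self.profile: dict[str, Any] = {}
        self._buffer = b""

    def configure(self, profile: dict[str, Any]) -> None:
        self.profile = dict(profile)

    def send(self, data: bytes) -> None:
        self._buffer = data.upper()

    def receive(self) -> bytes:
        result, self._buffer = self._buffer, b""
        return result


def run_on(driver: ArchitectureDriver, profile: dict[str, Any], data: bytes) -> bytes:
    """What an agent might do with any driver: configure, send, collect."""
    driver.configure(profile)
    driver.send(data)
    return driver.receive()
```

Because the agent only calls `configure`, `send`, and `receive`, a vendor-supplied FPGA or ASIC driver could in principle be swapped in without changing the agent code, which is the shielding effect the abstraction layer aims for.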

3 MAPPING THE FOUR LAYERS TO THE BIOINFORMATICS DOMAIN

Huge amounts of biological data deposited in Web databases are currently available. Access to this data is very important to biological researchers. Access to such databases has encountered many difficulties because biological databases differ in their data formats, data representations, and data source schemas (Miled et al., 2003).

Bioinformatics tools have achieved remarkable success in different areas of bioinformatics such as gene finding and sequence alignment. Many approaches have been proposed, and many published papers describe novel algorithms that address such computationally intensive bioinformatics problems. Parallelism seems to be the trend: many bioinformatics algorithms are now developed in parallel form, and significant speedups have been achieved. Parallel processing researchers approach bioinformatics problems by studying task dependency, so that concurrent execution of parallel tasks can dramatically reduce the overall execution time required to solve a given problem. Biologists approach the problem differently: they need to find relations and correlations among different sequences, irrespective of the time constraint, and they spend a lot of time seeking solutions to their computationally intensive problems. In fact, all they need is a simple way to issue an order to find a certain motif, align a sequence, and so on, so that they can proceed with their work and draw conclusions. Bioinformatics researchers try to understand the problem definition from the biologists in order to devise new algorithms that solve the given problems. In doing so, they convert the problem from the biology domain to the information processing domain. A set of questions may arise,

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

528

among them:

• Can we develop a simple bioinformatics language that can be used by researchers of biology to simply perform a certain task or solve a specific problem?
• Can we design a system that is able to identify the problem and extract the implicit parallelism, study task dependency, and provide an ideal task scheduling mechanism?
• Can we query the existing and available hardware infrastructure in a way that can guide us to map the tasks efficiently to be implemented on such hardware platforms irrespective of their heterogeneity?
• Can we perform these tasks in an intelligent way?

Several efforts have been concerned with deploying agent-oriented technology in the bioinformatics domain. Most of them focused on integrating biological data from different data sources, as in (Shunmaganathan, Deepika, and Deeba, 2008). Agent-oriented technology has also been deployed in task scheduling, as in (Konwinski, 2012). In our proposal we show how to deal with the heterogeneity of data sources, hardware architectures, and parallel computing paradigms deployed to solve bioinformatics problems.

The use of agent technology in bioinformatics yields remarkable advantages. Since a multiagent system deploys a concurrent, cooperative working technique, it fits easily with the distributed programming approach used for parallel bioinformatics algorithms. A multiagent system consists of several agents that can run on a distributed system. Agents can cooperate with each other to perform a specific task or a set of tasks. During parallel processing of a bioinformatics algorithm, a set of processing units work in a distributed system and communicate via data streams to perform tasks. This conforms well with the decomposition of processing tasks among dedicated agents, which in turn coordinate and perform the tasks. Agents are appropriate for efficient, distributed planning, which encourages their use for planning the parallel processing of bioinformatics tasks. Agents can also be used for distributed resource management; consequently, they will be used for distributing data and collecting results. Agents are thus able to plan and perform parallel bioinformatics algorithms. Rational agents have the ability to learn. This enables them to implement dynamic load balancing strategies that can be used to distribute data and tasks optimally. Moreover, the ability of intelligent agents to learn can help improve their scheduling strategies.

The four layers of the proposed multiagent-based system can be related to the bioinformatics domain as shown in Fig. 2. A set of bioinformatics problems is presented in the problem description layer. Bioinformatics algorithms are included in the computer algorithms layer. The abstraction layer has agents that perform the functions described earlier. Some agents from this layer can traverse to other layers: they can visit the problem description layer to collect the problem, visit the architectures layer to collect hardware features, and visit the computer algorithms layer to select the appropriate algorithm for solving a given bio problem. The fourth layer, the architectures layer, contains the same set of heterogeneous hardware architectures.
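One simple way to picture the learning-driven load balancing mentioned above is an agent that keeps a throughput estimate per device, splits work in proportion to those estimates, and refines them after each run. This is a minimal sketch under assumptions: the function names, the device names, and the exponential-moving-average update are illustrative choices, not mechanisms specified by the framework.

```python
def split_work(total_items: int, throughput: dict[str, float]) -> dict[str, int]:
    """Split items across devices in proportion to their estimated throughput."""
    total_rate = sum(throughput.values())
    shares = {name: int(total_items * rate / total_rate)
              for name, rate in throughput.items()}
    # Hand out the items lost to rounding down, fastest devices first.
    leftover = total_items - sum(shares.values())
    for name in sorted(throughput, key=throughput.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


def update_estimate(old_rate: float, items_done: int, seconds: float,
                    alpha: float = 0.5) -> float:
    """Blend the previous estimate with the rate just observed (assumed EMA rule)."""
    observed = items_done / seconds
    return (1 - alpha) * old_rate + alpha * observed
```

For example, with estimates of 1.0 items/s for a CPU and 3.0 items/s for a GPU, `split_work(100, {"cpu": 1.0, "gpu": 3.0})` gives the CPU 25 items and the GPU 75. If a device turns out faster or slower than expected, `update_estimate` moves its rate toward the observed value, so the next split shifts accordingly.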

4 MULTIAGENT-BASED FRAMEWORK STRUCTURE

A bioinformatics problem is picked up by the Bio Problem Collector Agent from a text file written by the biologist. This agent is responsible for translating the biology problem into a specific bioinformatics language. The agent can also provide a GUI that allows the developer of bioinformatics programs to simply select a suitable set of statements. It is assumed that the problem writer has enough experience to select among the different bioinformatics language statements. The Bio Problem Collector Agent periodically collects bio problems and forwards them to the Mapping Agent. The Bio Problem Updater Agent tracks changes, searches for any updates to the problem, and provides these updates to the Mapping Agent. The Mapping Agent specifies the class of computer algorithms relevant to the bio problem and then forwards this classification to the Ideal Task Scheduling Agent. The Ideal Task Scheduling Agent is responsible for analyzing the problem and extracting the task dependency diagram irrespective of the hardware architecture it will run on. The task dependency diagram generated by the Ideal Task Scheduling Agent is forwarded to the Scheduling Strategy Agent. The Feature Collector Agent provides the Topology Explorer Agent with specific details of the hardware architecture it belongs to. The Topology Explorer Agent collects the features and status of the different architectures and forwards them to the Scheduling Strategy Agent. Now the Scheduling Strategy Agent has both the task dependency diagram (tree) and the available hardware topology and features. The Scheduling

AMultiagent-basedFrameworkforSolvingComputationallyIntensiveProblemsonHeterogeneousArchitectures-

BioinformaticsAlgorithmsasaCaseStudy

529

Strategy Agent is now able to decide on the scheduling strategy that best assigns tasks to hardware architectures. The Actual Task Scheduling Agent enforces the scheduling strategy, assigns specific tasks to specific hardware architectures, and sends this assignment to the Parallel Paradigm Agent(s), which in turn generate the code that will run on the relevant architectures. The code generated by the Parallel Paradigm Agent(s) then moves to the Architecture Specific Agent, which is responsible for dealing with its specific architecture. The Architecture Update Agent keeps track of any changes related to the hardware architecture and forwards these changes to the Topology Explorer Agent. Integration with a multiagent-based system for integrating biological data should also be addressed: data coming from different data sources, to be fed to the hardware architectures executing the intended code, should pass through a standard, common agent interface as in (Maghrabi, F., et al., 2008). Other agents may perform task monitoring functions, such as the Visual Task Scheduling Agent. A brief description of each agent, including its name, percept sequence, actions, goals, and type, is given in Table 1, while the suggested multiagent-based framework structure is shown in Fig. 3.
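The hardware-independent analysis attributed to the Ideal Task Scheduling Agent can be illustrated by grouping tasks into levels, where all tasks in a level are mutually independent and may run concurrently. The sketch below uses Python's standard `graphlib` module; the function name `dependency_levels` and the task names in the usage note are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter


def dependency_levels(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into levels; tasks within one level do not depend on each other.

    `depends_on` maps each task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        levels.append(ready)
        sorter.done(*ready)                 # unlock the tasks that waited on them
    return levels
```

Given `{"expand": set(), "compare_1": {"expand"}, "compare_2": {"expand"}, "merge": {"compare_1", "compare_2"}}`, the function returns three levels: `["expand"]`, then the two comparisons together, then `["merge"]`. A scheduling strategy can then decide how each level is spread over the available architectures.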

5 SOLVING MOTIF FINDING PROBLEM

A motif is generally defined as a recurring pattern in a sequence of nucleotides or amino acids. In a DNA sequence, it is usually a short segment that occurs frequently but is not required to be an exact copy at each occurrence (a typical pattern matching problem). The Motif Finding Problem (MFP) has

Figure 3: Multiagent-based System Structure.

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

530

Table 1: Agents Description.

| Agent Name | Percepts | Actions | Goals | Type |
|---|---|---|---|---|
| Bio Problem Collector | Written Bio Problem | Mapping problem into standard common bio-problem (written in bioinformatics language) | Keep track of the bio problem | Simple Reflex |
| Bio Problem Updater | Written Bio Problem | Matches gathered problem with previously stored problem | Notification of Mapping Agent with new problem updates | Agent that keeps track of the world |
| Mapping | Bio Problems and updates (written in Bioinformatics language) | Classifies received bio problem and then maps it to an equivalent computer algorithm | Determination of the matched class of the computer algorithm | Goal-based |
| Ideal Task Scheduling | Computer Algorithm describing the bio problem | Analyze the algorithm to extract task dependency diagram | Extracting task dependencies of the algorithm | Utility-based |
| Task Scheduling Strategy | Ideal task scheduling and available architectures | Analysis of the properties of different architectures to allocate a certain set of tasks to each | Deciding the appropriate scheduling strategy that best fit the available architectures | Utility-based |
| Actual Task Scheduling | Strategy of scheduling | Applying scheduling strategy | Task distribution to each architecture | Utility-based |
| Topology Explorer | Receives different features of the available architectures or features updates | List available architectures, its status, its attributes, and suitable computing paradigm | Clarify the available topology structure | Utility-based |
| Architecture Specific | Instructions from Parallel Paradigm Agent | Converts parallel instructions to scripts and negotiates drivers of architectures to perform necessary operations | Setting up, configuring, and running appropriate code on a specific architecture | Utility-based |
| Feature Collector | Specific architecture features | Mapping the gathered architecture features into a readable format suitable for the topology explorer agent | Keep track of the architecture features | Simple Reflex |
| Architecture Updates | Architecture Features | Forwarding architecture features or updates to Topology Explorer Agent | Notification of the Topology Explorer Agent with new updates of the architecture | Agent that keeps track of the world |
| Parallel Paradigm | Actual task scheduling scheme | Converting actual task scheduling scheme into a well-defined code that can run on a specific architecture | Submitting a specific code written in a specific parallel paradigm to an Architecture Specific Agent | Utility-based |
| Visual Task Scheduling | Actual task scheduling scheme | Convert the actual task scheduling into a graphical diagram showing task dependencies | Providing a visual monitoring for task dependencies and their tree | Goal-based |

AMultiagent-basedFrameworkforSolvingComputationallyIntensiveProblemsonHeterogeneousArchitectures-

BioinformaticsAlgorithmsasaCaseStudy

531

Figure 4: The operation of the multiagent-based framework to solve motif finding problem.

been tackled several times, as in (Rajasekaran, Balla, and Huang, 2005). Different architectures and algorithms have been designed to solve this computationally intensive problem. The MFP can be stated as follows. Planted (l, d)-Motif Problem: find the motif consensus M, which is a fixed but unknown nucleotide sequence of length l. Suppose that M occurs once in each of t background sequences of common length n. Each occurrence of M is corrupted by exactly d point substitutions in positions chosen independently at random. Given the t sequences, recover the motif occurrences and the consensus M. We also assume a typical heterogeneous environment having CPUs, GPUs, an FPGA architecture as in (Farouk, El-Deeb, and Faheem, 2011), and an ASIC as in (Faheem, 2010). The skip brute force (SBF) algorithm was selected to solve the MFP. The operation of the multiagent-based framework is illustrated in Fig. 4. There is no task dependency after expanding the input sequences, since the comparisons are carried out between a specific l-mer and all windows of the same length in the input sequences. It is assumed that the Saccharomyces Cerevisiae Promoter Database (SCPD), a public database, is used as the data source. This is apparent in the statement generated by the Bio Problem Collector Agent.
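To make the planted (l, d)-motif statement concrete, the following sketch performs a plain exhaustive search over all 4^l candidate l-mers. It is not the skip brute force (SBF) algorithm, whose details are not given here, and it accepts a candidate when every sequence contains a window with at most d mismatches, which is an assumption slightly looser than the "exactly d" corruption in the problem statement. All function names are illustrative.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(candidate: str, sequence: str) -> tuple[int, int]:
    """Return (distance, position) of the window closest to the candidate."""
    l = len(candidate)
    return min((hamming(candidate, sequence[i:i + l]), i)
               for i in range(len(sequence) - l + 1))


def planted_motifs(sequences: list[str], l: int, d: int) -> list[tuple[str, list[int]]]:
    """List every l-mer that occurs in each sequence with at most d mismatches.

    Each result pairs a candidate consensus with its best position per sequence.
    """
    found = []
    for letters in product("ACGT", repeat=l):
        candidate = "".join(letters)
        hits = [best_window(candidate, s) for s in sequences]
        if all(dist <= d for dist, _ in hits):
            found.append((candidate, [pos for _, pos in hits]))
    return found
```

The comparisons of one candidate against the windows of each sequence are independent of those for any other candidate, which is the property noted above: once the input is expanded into l-mers and windows, the work can be distributed across architectures without task dependencies.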

The skip brute force (SBF) algorithm has been selected by the Mapping Agent to solve the MFP. Each agent in the proposed framework performs a specific operation that contributes to solving the MFP. The Task Scheduling Strategy Agent, however, deserves special attention. This agent decides the scheduling strategy based on the topology of the heterogeneous architectures and the set of tasks to be performed. Predefined scheduling strategies should be supported, such as a greedy policy, priority queues, and Heterogeneous Earliest Finish Time. User-defined policies should also be supported, so that users can define their own scheduling policy.
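A pluggable strategy table is one way to support both predefined and user-defined policies. The sketch below registers a single greedy earliest-finish policy for independent tasks; it is a simplification in the spirit of Heterogeneous Earliest Finish Time, not the full algorithm, since it ignores task dependencies and communication costs. The names `STRATEGIES`, `schedule`, and `greedy_earliest_finish`, and the cost figures in the usage note, are assumptions made for illustration.

```python
from typing import Callable

Costs = dict[str, dict[str, float]]          # task -> device -> estimated run time
Strategy = Callable[[Costs], dict[str, str]]  # returns task -> chosen device


def greedy_earliest_finish(costs: Costs) -> dict[str, str]:
    """Assign each independent task to the device that would finish it first."""
    ready_at: dict[str, float] = {}  # time at which each device becomes free
    assignment: dict[str, str] = {}
    # Place the most expensive tasks first (by mean cost across devices).
    order = sorted(costs, key=lambda t: -sum(costs[t].values()) / len(costs[t]))
    for task in order:
        device = min(costs[task],
                     key=lambda dev: ready_at.get(dev, 0.0) + costs[task][dev])
        ready_at[device] = ready_at.get(device, 0.0) + costs[task][device]
        assignment[task] = device
    return assignment


# Predefined strategies; a user-defined policy is added by registering a function.
STRATEGIES: dict[str, Strategy] = {"greedy_eft": greedy_earliest_finish}


def schedule(costs: Costs, strategy: str = "greedy_eft") -> dict[str, str]:
    """Look up a predefined or user-registered strategy and apply it."""
    return STRATEGIES[strategy](costs)
```

With two tasks costing 4.0 on the CPU and 1.0 on the GPU, and a third costing 2.0 on either, `schedule` places the first two on the GPU and the third on the CPU. A user-defined policy is registered the same way, for example `STRATEGIES["all_cpu"] = lambda costs: {t: "cpu" for t in costs}`.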

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

532

The Actual Task Scheduling Agent in our case provides the Parallel Paradigm Agents with the actual task list to be executed. The CUDA agent generates the appropriate code for the GPU, and the OpenMP agent generates the appropriate code for the multicore CPUs. Specific input/output code is generated to allow data exchange between the FPGA agent and its hardware, and the same concept is applied to the associative memory array processor.
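The hand-off from the actual task list to the paradigm agents can be sketched as a grouping step followed by a lookup of the paradigm for each architecture. Only the CUDA/GPU and OpenMP/CPU pairings come from the text; the `Job` record, the `PARADIGM_FOR` table, the architecture keys, and the "device I/O" label are hypothetical, and no real code generation is attempted.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical hand-off from a paradigm agent to an architecture agent."""
    architecture: str
    paradigm: str
    tasks: list[str]


# Assumed mapping; only the CUDA and OpenMP pairings are named in the text.
PARADIGM_FOR = {
    "gpu": "CUDA",
    "cpu": "OpenMP",
    "fpga": "device I/O",
    "asic": "device I/O",
}


def dispatch(assignment: dict[str, str]) -> list[Job]:
    """Group an actual task list by architecture and tag each group with a paradigm."""
    grouped: dict[str, list[str]] = {}
    for task, architecture in assignment.items():
        grouped.setdefault(architecture, []).append(task)
    return [Job(arch, PARADIGM_FOR[arch], tasks) for arch, tasks in grouped.items()]
```

Each resulting `Job` stands for the unit of work that one paradigm agent would turn into code or I/O instructions for its architecture.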

6 CONCLUSION

In our attempt to standardize the functions related to solving computationally intensive problems on heterogeneous architectures, a functional-level description of a multiagent-based framework has been proposed. The bioinformatics domain was selected as a case study. The function of each agent in the system was clarified, and the operation of the system was described through an example of solving the MFP. The framework is in its initial phase. As a next step, the proposed system will be implemented using available multiagent frameworks such as JADE, EtherYatri, and AgentBuilder. The initial rules to be used by each agent will be specified, and learning mechanisms for the agents will also be considered. We regard this as an initial draft of a multiagent-based system that can evolve into an efficient system for solving computationally intensive problems on heterogeneous architectures.

REFERENCES

Miled, Z. et al., 2003. An Ontology for Semantic Integration of Life Science Web Databases. International Journal of Cooperative Information Systems. 12(02).

Rauber, T., Rünger, G., 2010. Parallel Programming: for Multicore and Cluster Systems. Springer.

Farouk, Y., El-Deeb, T., Faheem, H., 2011. Massively Parallelized DNA Motif Search on FPGA. Bioinformatics – Trends and Methodologies. INTECH.

Faheem, H. M., 2010. Associative Memory Array Processor for Solving Motif Finding Problem. The International Conference on Artificial Intelligence and Applications (AIA). Austria.

Rajasekaran, S., Balla, S., Huang, C. H., 2005. Exact algorithm for planted motif challenge problems. Proceedings of Asia-Pacific Bioinformatics Conference, 249–259.

Shunmaganathan, K., Deepika, K., Deeba, K., 2008. Agent Based Bioinformatics Integration using RESTINA. The International Arab Journal of Information Technology. 5(3):258-264.

Konwinski, A., 2012. Multi-agent Cluster Scheduling for Scalability and Flexibility. Technical Report No. UCB/EECS-2012-273.

Russell, S., Norvig, P., 1995. Artificial Intelligence: A Modern Approach. Prentice-Hall.

Inta, R., Bowman, D., Scott, M., 2012. The “Chimera”: An Off-The-Shelf CPU / GPGPU / FPGA Hybrid Computing Platform. International Journal of Reconfigurable Computing. Vol. 2012.

AMultiagent-basedFrameworkforSolvingComputationallyIntensiveProblemsonHeterogeneousArchitectures-

BioinformaticsAlgorithmsasaCaseStudy

533