BIOLOGICAL CONCEPT FORMATION GRAMMARS

A Flexible, Multiagent Linguistic Tool for Biological Processes

Veronica Dahl (1,2), Pedro Barahona, Gemma Bel-Enguix and Ludwig Krippahl

(1) Research Group on Mathematical Linguistics, Rovira i Virgili University, Avda. Catalunya 35, Tarragona, Spain

(2) Logic and Functional Programming Group, Simon Fraser University, 8888 University Dr., Burnaby, Canada

(3) Centria - Centro de Inteligência Artificial, Departamento de Informática, Universidade Nova de Lisboa, Portugal

Keywords:

Biology, Cognitive sciences, Concept formation, Multi-agent systems, Molecular biology, Nucleic acid string

analysis, Lung cancer detection, Logic programming, Constraint handling rules, Logic grammars, Constraint

handling rule grammars, Language processing, RNA design.

Abstract:

Constraint based models that are useful for processing biological information have been successfully put

forward recently, e.g. for representing multi-disciplinary biological knowledge in view of cancer diagnosis,

and for reconstructing RNA sequences from secondary structure. Here we generalize such results into a

model for biological concept formation which can interact with heterogeneous agents through constraint-based

reasoning. Our model includes linguistic agents, probabilistic agents for mining nucleic acid, and illness

diagnosis agents. Information is selected automatically as a side effect of (the system) searching through

applicable CHR rules, and automatically transformed when a rule triggers; decisions follow from the normal

operation of the rules, and cognitive structure is given by properties that the concepts a given rule is trying to

relate must satisfy. Moreover the user can declare under what circumstances a given property or properties can

be relaxed. Concepts formed under relaxed properties result in output which signals not only what concepts

were formed, but which of the properties associated with the construction of those concepts were satisﬁed and

which were not. This allows us human-like ﬂexibility while maintaining direct executability.

1 INTRODUCTION

Computational linguistics traditionally breaks up the

tasks of processing language into separate compo-

nents which are often applied in sequence: ﬁrst a lex-

ical analysis annotates the input with word category

information, then a syntactic parser produces a parse

tree or graph, next a semantic component translates

the parse tree into a meaning representation form, and

so on.

Sometimes some of these components intermin-

gle, e.g. syntactico-semantic parsers build structure in

parallel with meaning representations, in particular to

exploit the fact that semantics can inform syntax, and

vice versa. In such cases, the semantic information is

either embedded into the grammar rules themselves,

or forms part of a separate component, for instance a

taxonomy, which the grammar rules consult.

Even in cases in which the components are sepa-

rate, the master-servant modality of traditional com-

puting science reigns, in that for instance grammar

rules might invoke a taxonomy but never allow the

taxonomy itself to decide when to intervene.

Modern language models and systems, in contrast,

stress interactions between modularized components

of a grammar which collaborate in a more democratic

way. In this respect they approach modern software

agents, which must act in a relationship of agency,

that is, they must perform an action on behalf of an-

other program module, unsolicited. In fact there have

been several recent proposals to explicitly incorporate

agents into grammars, in view of speciﬁc applications

such as generating architectural designs (Grabska

et al., 2009), or designing 3-D structures (Jacob and

Mammen, 2009).

An interesting recent development in logic grammars, Constraint Handling Rule Grammars or CHRG (Christiansen, 2005), facilitates the introduction of agents, in that it allows multi-headed rules to drive the parsing process in a daemon-like fashion: if all heads in a rule have matching counterparts within the working store (also called constraint store), then the rule applies, making no hierarchical distinction among the different heads of a rule, which all contribute equally. Thus, a CHRG rule can deal, for instance, with both syntactic and semantic information in one stroke, by having syntactic and semantic agents coexist as heads in the same rule.

Dahl V., Barahona P., Bel-Enguix G. and Krippahl L. (2010).

BIOLOGICAL CONCEPT FORMATION GRAMMARS - A Flexible, Multiagent Linguistic Tool for Biological Processes.

In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence, pages 388-394

DOI: 10.5220/0002786203880394

SciTePress

This capability can be taken even further, since

CHRG rules can pick up working store elements com-

ing from a variety of disparate sources, and thus

they lend themselves ideally to incorporating multiple

agents that collaborate in tasks that require intelligent

interactions with non-grammatical kinds of agents.

In recent years, CHRG models that are useful

for processing biological information have been suc-

cessfully put forward, e.g. for representing multi-

disciplinary biological knowledge in view of cancer

diagnosis (Barranco-Mendoza et al., 2004), and for

reconstructing RNA sequences from secondary struc-

ture (Bavarian and Dahl, 2005).

In this article we generalize such results into a

model for biological concept formation which can in-

teract with heterogeneous agents through constraint-

based reasoning, in view of ﬁne-tuning the process

to obtain richer and more accurate results. These

results include traditional parsing results such as a

sentence's structure and meaning representation, but

also less (for a parser) conventional results such as

mined nucleic acid information, or medical diagno-

sis. In this sense, our model can be viewed as an ex-

ecutable, multi-agent based cross between a grammar

and a biological expert system. It consists basically

of a Constraint Handling Rule grammar that incor-

porates agents for syntax and semantics, but also for

domain dependent biological information, including

probabilities, and allows these agents to interact with

grammatical information in a natural and expressive,

while effective, way.

2 MOTIVATION

Intelligent systems such as those we aim ultimately at

with our new paradigm must exhibit distributed intel-

ligence, since they must represent and consult knowl-

edge from different disciplines (e.g. linguistics, biol-

ogy, parsing). Such knowledge is naturally, and more

manageably, expressed in different modules, one or

several for each discipline. However these modules

must communicate with one another in effective ways.

This is no easy task, as anyone who has interacted

with human experts in view of interdisciplinary col-

laboration will attest to. Where computers are in-

volved, it becomes even more challenging.

Even within the one discipline of computational

linguistics, the variety of subareas involved presents

an already formidable challenge. Speech analysis re-

quires different specialized knowledge, for instance,

than text processing.

We typically divide in order to conquer, and in many

cases it is straightforward to do so because the pro-

cesses involved can be done independently. For in-

stance, we can ﬁrst glean text from speech through

one of the many currently available speech processors

out there, and then utilize a text processing program

to mine the text appropriately for our needs. In this

article we will assume speech has already been trans-

lated into text, and focus on text processing.

Even by thus restricting our problem, however, we

cannot always divide to conquer in the sense of identifying

independent processes that can be run in sequence.

As mentioned in the Introduction, it may

be desirable to intermingle some of the components.

Thus, syntactico-semantic parsers build structure in

parallel with meaning representations, so that semantics

can inform syntax, and vice versa.

As an example of syntax informing semantics,

when a sentence contains several natural language

quantiﬁers, it is often easier to determine what their

respective scopes (and hence the sentence's final

meaning) should be once an entire syntactic structure

is initially produced and can be looked at as a whole.

As an example of semantics informing syntax, se-

mantic types associated to lexical items may help dis-

ambiguate what would otherwise be a structurally am-

biguous sentence. For instance “ﬁnd the price of a

computer having Java” has a sensible reading, namely

“ﬁnd a computer that has Java and then calculate its

price”, as opposed to the nonsensical (to us, but not

to machines) interpretation “find a computer's price

such that this price has Java”. The nonsensical read-

ing can be disallowed if we only incorporate the se-

mantic type information that computers can have Java

whereas prices cannot. Of course, if we allow the

nonsensical reading it will fail at knowledge-based

consultation time, but disallowing it at the parsing

stage is more efﬁcient than allowing the parser to gen-

erate a “meaning” representation which we know is

doomed to failure.

Semantic type information, however, belongs

epistemologically (and also from a practical point

of view, since it can be consulted per se) in the

knowledge base component of a traditional question-

answering system, and therefore needs to be con-

sulted from its grammar component for the purpose of

blocking non-sensical readings- unless the type infor-

mation were to be redundantly included in the gram-

mar as well, as some systems dealing with semantic

type checking do.

However in our multi-agent model, a single agent

containing all the semantic type information of a par-


ticular application is enough, as it can be consulted

either per se (e.g. for replying to questions such as:

“Is a bank a financial institution?”) or as part of an input

sentence's semantic correctness analysis process.

In fact, it can be consulted, if desired, from the same

grammar rule for both purposes.

More importantly, it can pop up in daemon-like

fashion whenever needed, as opposed to depending

on e.g. the parser for consulting it explicitly. For

instance, the lexical entry for “price” could throw

in the workspace not only the grammatical informa-

tion that “price” is a noun with meaning representa-

tion “price(X-T,P)” (the price of an individual X of

type T is P), but also the pragmatic constraint: “consistent(T,price)”

(i.e., X's type T must be consistent

with X having a price). Likewise, the lexical entry

for “computer” would attach the appropriate value to

T (namely, “computer”), and once both are in, a bi-

headed rule would provoke a failure if T were not the

type of an object allowed to have a price.
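The daemon-like consultation described above can be imitated outside CHR. The following Python sketch is only an illustration under stated assumptions: the working store is a plain list of tuples, a single shared type variable named "T" stands in for real unification, and the table PRICEABLE_TYPES and all function names are hypothetical. In the model itself the check would be a two-headed CHR rule.

```python
# Hypothetical type knowledge: semantic types that are allowed to have a price.
PRICEABLE_TYPES = {"computer", "book"}

def lexical_entry(word):
    """Return the constraints a word throws into the working store."""
    if word == "price":
        # Grammatical information plus the pragmatic constraint consistent(T, price).
        return [("noun", "price"), ("consistent", "T", "price")]
    # Any other noun binds the (single, shared) type variable T to its own type.
    return [("noun", word), ("type", "T", word)]

def check_store(store):
    """Two-headed check: it can only reject once both constraints are present."""
    needs = [c for c in store if c[0] == "consistent"]
    types = [c for c in store if c[0] == "type"]
    for (_, var, prop) in needs:
        for (_, tvar, value) in types:
            if var == tvar and prop == "price" and value not in PRICEABLE_TYPES:
                return False  # the reading is rejected at parse time
    return True

def parse_ok(words):
    """Add each word's constraints and re-check the store after every addition."""
    store = []
    for w in words:
        store.extend(lexical_entry(w))
        if not check_store(store):
            return False
    return True
```

Because the check runs whenever the store changes, it does not matter which of the two words arrives first, which is the point of the daemon-like behaviour.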

Such issues, which are mostly (computationally)

linguistic in nature, ﬁrst motivated our quest for a

cognitive science formalism that could abstract the

problems into a general paradigm, concept formation,

that could be directly executable and yet ﬂexible. Our

incursions into computational molecular biology and

other biological applications led, as we next recount,

into interesting extensions of this framework.

3 BACKGROUND: FROM LINGUISTIC TO BIOLOGICAL CONCEPT FORMATION

Biological Concept Formation Grammars have

evolved from parsing methods we ﬁrst developed

speciﬁcally for natural language (Dahl and Blache,

2004), then generalized into an executable cognitive

model of knowledge construction inspired as well

by constructivist theory (Dahl and Voll, 2004), and

ﬁnally adapted to very different biological process

modeling tasks: early cancer diagnosis (Barranco-

Mendoza et al., 2004), RNA reconstruction from

secondary structure (Bavarian and Dahl, 2005) and

molecular biology text mining (Bel-Enguix et al.,

2009). From each of these applications we have

distilled both common threads and speciﬁc idiosyn-

cracies which have served to ﬁne-tune our initial,

general-purpose tool of Concept Formation Rules

into the flexible yet specialized, biologically oriented

model we present in this article.

Succinctly put, Concept Formation exploits the

natural connections (discussed in (Dahl and Voll,

2004)) between constructivism, cognitive logic and

logic programming's recent paradigm, Constraint

Handling Rules (Fruhwirth, 2002), to develop a cog-

nitive model of knowledge construction which can be

directly executed through (a specialized system im-

plemented in) CHR. In this model, information is se-

lected automatically as a side effect of (the system)

searching through applicable CHR rules, and auto-

matically transformed (or simply augmented) when

a rule triggers; hypotheses can be made in the form

of assumptions; decisions follow from the normal op-

eration of the rules, and cognitive structure is given

by properties that the concepts a given rule is trying

to relate must satisfy. Moreover, some latitude is provided: rather than rigidly requiring that all properties defined as necessary for a concept to form be satisfied, the user can declare under what circumstances a given property or properties can be relaxed. Concepts formed under relaxed properties result in output which signals not only what concepts were formed, but also which of the properties associated with each concept's construction were satisfied and which were not. This gives us human-like flexibility while maintaining direct executability.

3.1 Concept Formation Rules: The Basic Formalism

The human mind can be seen as a dynamically evolving store of knowledge which constantly updates itself with new information built from previous information through some kind of reasoning (Dahl and Voll, 2004). Concept formation can be defined as the process of constructing new pieces of knowledge from previously known ones, a process that might roughly look as follows: $c_1, c_2, \ldots, c_n \rightarrow \mathit{newc}$.

From the perspective of CHR, new pieces of

knowledge are the Body of the rule, and the pre-

viously known ones the Head. Concept Formation

Rules have the same general form as CHR rules, ex-

cept that the guard may include any number of prop-

erty calls for properties which have been deﬁned by

the user:

Head ==> Guard | Body

Head and Body are conjunctions of atoms and

Guard a test constructed from (Prolog) built-in or

system-deﬁned predicates, including the reserved bi-

nary predicate “prop” (for “property”); the variables

in Guard and Body occur also in Head; if the Guard

is the constant true then it is omitted together with the

vertical bar. Its logical meaning is the formula

$\forall\,(\mathit{Guard} \rightarrow (\mathit{Head} \rightarrow \mathit{Body}))$

and the meaning of a program is given by conjunction.

ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence


Vagueness is expressed by relaxing properties be-

tween concepts in accordance with a user’s criteria.

The criteria can be ﬂexibly and modularly adjusted for

experimental purposes while maintaining direct exe-

cutability.

Thus, rather than inﬂexibly allowing for a concept

to be formed if a test succeeds and disallowing its for-

mation if that test fails, we single out those tests for

which we want to allow ﬂexibility as properties. Prop-

erties are like any other test, except that their failure

does not result in the rule itself necessarily failing: the

concept will still be formed, and two lists will be asso-

ciated with it: a list of the properties that the concept

satisﬁes (S) and a list of those which it violates (V).

This allows us to construct possibly incorrect

or incomplete concepts, together with information on

how they fall short of being fully warranted. The user

then has all the information pertaining to the construc-

tion of a particular concept and can therefore interpret

these results in a much more informed, holistic way

than if the degree of randomness or vagueness had

been blindly computed from those assigned a priori

to each individual piece of a reasoning puzzle.

Although the lists of satisfied (S) and violated (V) properties do not appear explicitly in the rules, they are managed by the system. The notion of vagueness lies in how the properties are distributed between the two lists.

For instance, if we want to accept incorrect sen-

tences in a parsing system that checks for number

agreement, we might designate as properties all tests

on rule applicability that would correspond to correct

parses, and then relax some of them (e.g. the num-

ber agreement property). This would be useful for in-

stance in a second language tutoring system which al-

lows the user to make certain types of mistakes, while

pointing out the reason why those are mistakes (i.e.,

which properties are not being satisﬁed).

The property calls are automatically handled by

the system provided that the user deﬁnes the proper-

ties as follows:

a) a property must be named and deﬁned through

the binary predicate prop, whose ﬁrst argument is the

property’s name and whose second argument is the

list of arguments involved in checking, and in sig-

nalling the results of checking, the property. For in-

stance, in a grammar that needs to check for number

agreement between a determiner and a noun, say, and

to produce either the (agreeing) number of both, or

an indication of mismatch, we can choose the name

agreement for the property, and deﬁne it as follows:

% N is the shared number if determiner and noun agree, and mismatch otherwise.
prop(agreement,[Ndet,Nn,N]) :- Ndet = Nn, !, N = Nn.
prop(agreement,[_Ndet,_Nn,mismatch]).

b) Acceptability of a property that has thus

been deﬁned must be checked in the concerned rule

through the binary system predicate “acceptable”,

whose ﬁrst argument is the prop atom with all its ar-

guments and whose second argument will evaluate to

either true, false, or a degree of acceptability, accord-

ing to whether (or how much of) the property is satis-

ﬁed. For our example, we can write:

determiner(Ndet), noun(Nn) ==>
    acceptable(prop(agreement,[Ndet,Nn,N]),B) |
    noun_phrase(N).

c) In order to relax a property named Name (i.e.

to allow the derivation of concepts that require it but

for which it is not satisﬁed), we simply write the fol-

lowing:

relax(Name).

Degrees of acceptability can be deﬁned through a

binary version of the relaxing primitive, where L is

the prop atom with all its arguments and D is a mea-

sure of acceptability:

relax(L,D).

A list of satisﬁed and violated properties, together

with the degree of violation if appropriate, will be out-

put for each property deﬁned in a given CF program.
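To make the interplay of prop, acceptable and relax concrete, here is a minimal Python sketch of the number agreement example. It is not the authors' CHR implementation: the names RELAXED, prop_agreement and noun_phrase are hypothetical, degrees of acceptability are left out, and the S and V lists are returned explicitly rather than managed behind the scenes.

```python
RELAXED = set()  # names of the properties the user has chosen to relax

def relax(name):
    """Declare that a failing property no longer blocks concept formation."""
    RELAXED.add(name)

def prop_agreement(n_det, n_noun):
    """Return (holds, result): the shared number, or 'mismatch'."""
    if n_det == n_noun:
        return True, n_noun
    return False, "mismatch"

def noun_phrase(n_det, n_noun):
    """Form a noun_phrase concept, or return None if formation is blocked."""
    satisfied, violated = [], []
    holds, number = prop_agreement(n_det, n_noun)
    (satisfied if holds else violated).append("agreement")
    if any(v not in RELAXED for v in violated):
        return None  # a property that was not relaxed has failed
    return {"concept": ("noun_phrase", number), "S": satisfied, "V": violated}
```

Before relax("agreement") is called, a singular determiner with a plural noun yields no concept at all. Afterwards the same input yields a noun_phrase marked mismatch, with agreement reported in the V list.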

3.2 Biological Concept Formation Rules

Although it was not emphasized in the ﬁrst stages of

our work, Concept Formation Rules involve agents,

in the sense of cooperating processes that trigger au-

tonomously upon need, i.e. when the constraint store

acquires (either through a user's query or through the

normal working of the rules) enough data to trigger a

given rule, it will trigger on its own and produce, if all

involved properties hold with an acceptable tolerance

level, the appropriate new concepts to be added to the

working store. The process continues until no more

new concepts can be added.
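The triggering cycle just described amounts to running the rules to a fixpoint. The Python sketch below illustrates this under simplifying assumptions: concepts are plain strings without arguments, properties are ignored, and the rule table RULES and the function run_to_fixpoint are hypothetical names introduced for illustration.

```python
def run_to_fixpoint(store, rules):
    """Apply rules until no rule can add a new concept to the store."""
    store = set(store)
    changed = True
    while changed:
        changed = False
        for heads, body in rules:
            # A rule triggers on its own once all of its heads are in the store.
            if all(h in store for h in heads) and body not in store:
                store.add(body)
                changed = True
    return store

# Illustrative rules: each pairs a tuple of head concepts with one new concept.
RULES = [
    (("determiner", "noun"), "noun_phrase"),
    (("noun_phrase", "verb"), "sentence"),
]
```

Note that the second rule can only fire after the first has added noun_phrase, yet nothing in the code calls one rule from the other: the store alone decides when each rule becomes applicable.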

Since Concept Formation is a Cognitive Sciences

model, note that although our examples so far have

been grammatical in nature, it can also be used to

model any other problem domain.

Having the main mechanism for Concept Formation, we need to introduce the agents that can help to design applications for BioMedicine efficiently. In particular, we can identify four main types of agents:

• A Property Agent: which manages the user-defined properties.

• A Concept Formation Agent: which invokes the Property Agent in order to enforce or relax those properties according to the user's specifications as explained above.

• A Probabilistic Agent: which is needed for applying the formalism to fuzzy domains, such as a multidisciplinary approach to cancer diagnosis, and which also derives from our work on RNA sequence discovery from secondary structure.

• A BioMining Agent: which can identify substrings of interest within given strings of nucleic acid. The definition of this agent derives from our work on mining molecular biology texts.

In the next section we describe the resulting new

model of concept formation in its rightfully earned

conceptualization as a multiagent system for biologi-

cal concept formation.

4 OUR MODEL'S LINGUISTIC AGENTS

Our model comprises two types of linguistic agent

systems: those that can process human language input

and those which can analyse nucleic acid sequences.

We next explain the general idea of Human Language

processing Agent Systems and describe Nucleic Acid

Language Agent Systems, stressing their probabilistic

agents and suggesting an application to illness diagnosis.

4.1 Human Language Processing Agent Systems

A linguistic agent that can process human language input allows non-computer specialists to pose questions directly in natural language, thus largely removing the need for them to either learn a specialized computing language, or depend on computer specialists who may not be too conversant with the biological side of things. The subset of language admitted by this agent will vary according to the application, but a core, extensible parser constitutes a basic linguistic agent which, because it is expressed in Constraint Handling Rule format, allows for smooth interaction between the grammar and the biological knowledge base agents (by allowing one of them to inform the other through integrity constraints in CHR). The main features of this agent, described in (Dahl, 2009), are that it allows for eager discarding of wrong lines of reasoning, and for paraphrases of a given question without ill effects on execution, thus accepting fairly rich input.

4.2 Nucleic Acid Language Agent Systems

In (Bel-Enguix et al., 2009), we proposed a CHR-based mining technique, the Parallel Matching methodology, for problems that are of interest both in molecular biology and in linguistics, such as identifying subsequences (of English words, for instance, or of nucleotides) that are common to a group of several sequences, matching ambiguous subsequences, finding a substring's frequency, and finding gapped patterns. The resulting set of primitives can be viewed as constituting a nucleic acid decoding agent, which has moreover been expanded to include further nucleic acid decoding primitives, and recast into a grammatical formulation (Dahl, 2009) allowing interaction with human language processing agents, so that the nucleic acid agents can be consulted from human language commands.

Our CHR plus CHRG formulation more readily

allows us to exploit eagerly any constraints of the problem

which could serve to prune the search space early.

For instance, if the presence of phenylalanine pre-

cluded that of leucine and we had detected phenylala-

nine in our input string, there would be no point in

searching for leucine or for the sequence that encodes

it. An integrity constraint that provokes failure if both

are found in the working store (a single line of code)

is all that is needed. In a straight Prolog formulation,

in contrast, adding something globally would not be

possible: we would have to enter the deﬁnition of the

substring ﬁnding predicate to include a test in some

appropriate place within it.
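The effect of such a global integrity constraint can be sketched in Python as follows. This is an illustration only, not the CHR formulation: the exclusion between phenylalanine and leucine is the hypothetical one from the example above, the codon table is a small fragment of the standard genetic code, and the names EXCLUSIVE, add and decode are invented here. What matters is that the check lives in one place, where facts enter the store, and not inside the decoding routine.

```python
class Inconsistent(Exception):
    """Raised when the working store violates an integrity constraint."""

# Hypothetical exclusion taken from the running example.
EXCLUSIVE = [("phenylalanine", "leucine")]

# Fragment of the standard genetic code (RNA alphabet), for illustration.
CODONS = {"UUU": "phenylalanine", "UUC": "phenylalanine",
          "CUU": "leucine", "CUC": "leucine"}

def add(store, fact):
    """Add a fact, then enforce every integrity constraint globally."""
    store.add(fact)
    for a, b in EXCLUSIVE:
        if a in store and b in store:
            raise Inconsistent(f"{a} excludes {b}")

def decode(rna):
    """Read the string codon by codon; the routine itself contains no exclusion test."""
    store = set()
    for i in range(0, len(rna) - 2, 3):
        amino = CODONS.get(rna[i:i + 3])
        if amino:
            add(store, amino)
    return store
```

Adding or removing an exclusion means editing the EXCLUSIVE list only, which mirrors the single line of code mentioned in the text.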

4.2.1 Probabilistic Agents

To mine more complex biological structures, such as

RNA secondary structure, in order for instance to re-

construct the RNA sequence that folds into a given

secondary structure, we also use CHR but add a prob-

abilistic agent that follows the methodology devel-

oped by (Bavarian and Dahl, 2005). This agent op-

erates in the guard of a CHR rule, and uses the prob-

abilities that are believed to govern the proportion of

base pairs within RNA sequences. Bavarian and Dahl

calculated these probabilities by comparing several

RNAs from the Gutell lab's comparative RNA

website (Cannone et al., 2002), a database of known

RNA secondary structures. After comparing 100 test cases of various lengths, from 100 to 1500 bases, they found three base-pair probabilities, the largest of which is that of a GC pair: $P_{GC} = 0.53$, with $0.35$ and $0.12$ for the other two pair types.

The other probabilities which are of interest are the probabilities for an unpaired base to be one of A, C, G, or U. The four values found are 0.18, 0.34, 0.27 and 0.21.

Inserting the probabilities into the grammar rules

is done by generating a random variable in the guard

section of the rules, which is the only part that accepts

Prolog predicates. This random variable then is tested

according to the probabilities: for instance if the ran-

dom variable in the guard of a rule that assigns nu-

cleotides to positions known to be paired is less than

0.53, it will assign a GC pair. The average error is es-

timated to be about 18%, meaning that 18% of the nu-

cleotides might be paired with a nucleotide in a wrong

position (in the original structure they might be either

unpaired or be paired with another nucleotide).
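The thresholding step can be sketched in Python as follows. Several points are assumptions made for this illustration only: the text ties only the value 0.53 to a GC pair, so the labels AU and GU for the 0.35 and 0.12 entries are guesses; the dot-bracket input notation and the names pick_pair and design are hypothetical; and unpaired positions are left as dots and not sampled.

```python
import random

# Cumulative thresholds built from 0.53, 0.35 and 0.12.
# Only the GC entry is confirmed by the text; AU and GU are assumed labels.
PAIR_THRESHOLDS = [(0.53, ("G", "C")), (0.88, ("A", "U")), (1.00, ("G", "U"))]

def pick_pair(r):
    """Map a random number r in [0, 1) to a base pair."""
    for limit, pair in PAIR_THRESHOLDS:
        if r < limit:
            return pair
    return PAIR_THRESHOLDS[-1][1]

def design(structure, rng=random.random):
    """Fill the paired positions of a dot-bracket structure; unpaired stay '.'."""
    seq = ["."] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember where the pair opens
        elif ch == ")":
            j = stack.pop()  # matching opening position
            seq[j], seq[i] = pick_pair(rng())
    return "".join(seq)
```

Passing a fixed function as rng, for example lambda: 0.1, makes the output reproducible, which is convenient when checking the thresholds.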

4.2.2 Illness Diagnosis Agents

In previous work of one of the authors with Alma

Barranco-Mendoza, specialized concept formation

rules were used for representing knowledge in view

of diagnosing diseases such as lung cancer (Barranco-Mendoza et al., 2004). Following this work, yet another kind of probabilistic agent materializes as an additional parameter of each constraint in the specialized concept formation rules of our multi-agent system. The application introduced in this paper aims to

aid in early stage detection of some types of cancer,

like lung and oral, which have poor prognosis because

they are very difﬁcult to diagnose at the early stages.

Our concept formation methodology assists in the

integration and analysis of multidisciplinary agents

containing genetic and molecular information along

with the radiological, serum and sputum data. In par-

ticular, it provides some kind of diagnosis even if

given incomplete patient information, as not all tests

can or will be done on a given patient at a given time.

This is achieved by relaxing certain properties, where-

upon the analysis will be completed even if the infor-

mation is not complete. The list of violated properties

can provide a list of suggested follow-up tests to im-

prove the accuracy of the diagnosis.

As part of the input concepts it accepts the patient's

age, smoking history, malignancy history, and radiological,

serum and sputum data. The knowledge

store includes the properties that should be evaluated

for each input data element as well as the relations

amongst them. The diagnosis is given as a probability

of cancer that is calculated as a function of the con-

cepts used in the analysis. As well, the diagnosis will

list those diagnostic properties that were satisﬁed and

those that were not. For example:

% Diagnosis rule for patient x; relaxing the marker property lets it apply to incomplete records.
const(Prob), age(x,A), history(x,smoker,T),
serum_data(x,marker_type,in_range) <=>
    marker(x,marker_type,in_range,P,B),
    acceptable(marker(x,marker_type,in_range,P),B),
    probability(P,Prob,x,B),
    acceptable(probability(P,Prob,x),B) |
    possible_lung_cancer(yes,Prob,x).

relax(marker(x,marker_type,in_range,P,B)).

This rule evaluates for a patient x if a speciﬁc

biomarker, marker_type, found in serum data is within

a certain value range for a patient with an age of A

who is a type T smoker (T depends on the number of

cigarettes or cigars smoked daily). If true, then the

diagnosis of possible lung cancer is going to be true

with a probability increase of P (where P is a func-

tion of the patient’s age, health history, and this par-

ticular biomarker presence). But if we relax the re-

quirement of the presence of the biomarker, then the

system can evaluate patient records that do not have

this particular information and report in the diagnosis

listing that this information was not included in the

record, which could be valuable information as rec-

ommended follow-up tests for that particular patient.
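The behaviour of the relaxed rule on an incomplete record can be imitated with the following Python sketch. It is an illustration under stated assumptions and not the authors' rule: the record layout, the field names base_probability, serum_marker, in_range and increase, and the function diagnose are all hypothetical, as are any numbers passed to it.

```python
def diagnose(record, relaxed=()):
    """Return a diagnosis dict, or None if a non-relaxed property is violated."""
    satisfied, violated = [], []
    prob = record.get("base_probability", 0.0)
    marker = record.get("serum_marker")  # None when the test was not done
    if marker is not None and marker["in_range"]:
        satisfied.append("marker")
        prob = min(1.0, prob + marker["increase"])
    else:
        violated.append("marker")
    if any(v not in relaxed for v in violated):
        return None  # the rule does not apply to this record
    # Violated properties double as suggested follow-up tests.
    return {"possible_lung_cancer": prob, "S": satisfied, "follow_up": violated}
```

With relaxed=("marker",), a record lacking serum data still receives a diagnosis, and the missing marker is listed under follow_up.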

5 CONCLUSIONS

We have abstracted, from recent different realiza-

tions of the linguistically inspired Concept Formation

paradigm, a multi-agent model for Biological Con-

cept Formation which can be considered as a com-

putational metaphor for the (biological) mind, with

direct executability implications. Owing to the generalized use of Constraint Handling Rules or their grammatical counterpart, we are able to integrate into our approach human language processing techniques which are not only useful for all types of concept formation, but also allow a smooth integration of human language processing agents and of their interactions with the knowledge base agents. Another interesting feature of our proposal is its robustness: thanks to the capability of relaxing some of the properties involved in concept formation, useful results are provided even when not all the information “necessary” to form the concepts in question is available.

Concept formation rules are applicable to many

other AI and cognitive problems as well, most no-

tably, those involving the need to reason with incom-

plete or incorrect concepts.

The ﬂexibility allowed by relaxing properties was

argued in our initial paper (Dahl and Voll, 2004) to

provide a more appealing solution to the need for

ﬂexibility than the two main alternatives out there,

namely probabilities and fuzzy logic. The probabilis-

tic approach had been discounted as inappropriate for

measuring the meaning of information, although ad-


equate to measure the randomness of information.

However, after our work on reconstructing RNA se-

quences from the structure into which they fold, plus

our work on using Concept Formation as an aid for

early diagnosis of lung cancer, we must rectify that

statement. We are now convinced that probabilistic

agents, both of the randomness measure kind and in

the form of combining the probabilities of individual

biomarkers into an overall probability of a disease de-

veloping, are very useful agents that have a rightful

place in a biological rendition of the Concept Forma-

tion paradigm.

Admittedly, the range of disparate biological

problems we have attempted to cover under a sin-

gle paradigm is perhaps too ambitious to allow us

as homogeneous a model as we would have liked.

However we feel it is an important step towards

achieving an encompassing and robust multi-agent

model for these various tasks, in that it allows for au-

tonomous triggering of the agents needed at a given

time, for easy synchronization between the various

agents, mainly through integrity constraints, and for

ﬂexibility, through property relaxation, in the face of

either incomplete or erroneous data- a problem that

biological systems aspiring to deal with real life prob-

lems must absolutely face.

ACKNOWLEDGEMENTS

This paper is supported by the European Commission

in the form of V. Dahl's Marie Curie Chair

of Excellence, the Canadian National Sciences Re-

search Council, the project “Constraint - and Hypo-

thetical - based reasoning for BioInformatics”, refer-

ence HP2008-0029, and “Logic, Automata and For-

mal Languages”, MTM2007-63422.

REFERENCES

Barranco-Mendoza, A., Persaoud, D., and Dahl, V. (2004).

A property-based model for lung cancer diagnosis. In

RECOMB (2004) 27–31.

Bavarian, M. and Dahl, V. (2005). RNA secondary structure

design using constraint handling rules. In WCB’05,

Workshop on Constraints for Bioinformatics.

Bel-Enguix, G., Jiménez-López, D., and Dahl, V. (2009).

DNA and natural languages: Text mining. In Proc.

International Joint Conference on Knowledge Discov-

ery, Knowledge Engineering and Knowledge Manage-

ment, KDIR 2009, pages 140–145.

Cannone, J., Subramanian, S., Schnare, M., J.R. Collett,

L. D., Du, Y., Feng, B., Lin, B., Madabusi, L., Muller,

K., Pande, N., Shang, Z., Yu, N., and Gutell, R.

(2002). The comparative RNA web (CRW) site: An online

database of comparative sequence and structure

information for ribosomal, intron, and other RNAs. In

BioMed Central Bioinfo. 3:2.

Christiansen, H. (2005). CHR grammars. In Journal on

Theory and Practice of Logic Programming, vol. 5,

number 4-5.

Dahl, V. (2009). Decoding nucleic acid strings through hu-

man language. In Manuscript Submitted for Publica-

tion.

Dahl, V. and Blache, P. (2004). Directly executable con-

straint based grammars. In Proc. Journees Fran-

cophones de Programmation en Logique avec Con-

traintes.

Dahl, V. and Voll, K. (2004). Concept formation rules: An

executable cognitive model of knowledge construc-

tion. In NLUCS’04, International Workshop on Natu-

ral Language Understanding and Cognitive Sciences.

Fruhwirth, T. (2002). Theory and practice of constraint han-

dling rules. In Special Issue on Constraint Logic Pro-

gramming (P. Stuckey and K. Marriot, Eds.), Journal

of Logic Programming.

Grabska, E., Strug, B., and Slusarczyk, G. (2009). A multi-

agent distributed design system. In 7th International

Conference on PAAMS’09, AISC 55. Springer-Verlag

Berlin Heidelberg.

Jacob, C. and Mammen, S. V. (2009). Swarm grammars:

growing dynamic structures in 3d agent spaces. In

Digital Creativity, Volume 18, Issue 1, March 2007.

Routledge.
