A NICHED PARETO GENETIC ALGORITHM

For Multiple Sequence Alignment Optimization

Fernando José Mateus da Silva

Dept. of Informatics Engineering, School of Technology and Management, Polytechnic Institute of Leiria, Portugal

Juan Manuel Sánchez Pérez, Juan Antonio Gómez Pulido, Miguel A. Vega Rodríguez

Dept. Tecnologías Computadores y Comunicaciones, Escuela Politécnica, Universidad de Extremadura, Spain

Keywords: Multiple sequence alignments, Genetic algorithms, Multiobjective optimization, Niched Pareto, Equivalence

class sharing, Bioinformatics.

Abstract: The alignment of molecular sequences is a recurring task in bioinformatics, but it is not a trivial problem.

The size and complexity of the search space involved make it difficult to find the optimal alignment of a

set of sequences. Due to their adaptive capacity in large and complex spaces, Genetic Algorithms emerge as

good candidates for this problem. Although they are often used in single objective domains, their use in

multidimensional problems allows finding a set of solutions which provide the best possible optimization of

the objectives – the Pareto front. Niching methods, such as sharing, distribute these solutions in space,

maximizing their diversity along the front. We present a niched Pareto Genetic Algorithm for sequence

alignment which we have tested with six BAliBASE alignments, drawing conclusions regarding population

evolution and quality of the final results. Since methods for finding the best alignment are mathematical,

not biological, a set of solutions which facilitates the experts’ choice is a possibility worth considering.

1 INTRODUCTION

The alignment of protein, DNA and RNA sequences

is a very frequent task in bioinformatics. Multiple

sequence alignment is an optimization problem

which consists of finding the best alignment in

large complex search spaces (Horng et al., 2005). Its

main goal is to help in the comparison of sequence

structure relationship, by identifying sequences’

similarities and differences (Pal et al., 2006).

Genetic Algorithms (GAs) are search algorithms

based on the principles of natural evolution and

genetics (Goldberg, 1989). They are able to take

advantage of gathering information about an initially

unknown search space, in order to bias subsequent

search into useful subspaces. This quality makes

them suitable for problems with large, complex, and

poorly understood search spaces (De Jong, 1988),

such as multiple sequence alignment. Although GAs

are often used in single objective problems, they can

also be used in multiobjective problems, in which

the GA is used to find all possible tradeoffs among

the multiple conflicting objectives (Horn et al.,

1994). The resulting non-dominated solutions lie on

the Pareto optimal frontier, meaning that there are no

other solutions superior in all objectives.

Niching methods, such as sharing, help in

maintaining the diversity of certain properties within

the population, preventing the convergence to a

single point in the Pareto front and allowing parallel

convergence into multiple good solutions (Shir and

Back, 2006).

In our prior investigation we have developed

AlineaGA, a genetic algorithm which performs

multiple sequence alignment. In our first approach,

we tested AlineaGA with a single objective fitness

function – the sum-of-pairs (Silva et al., 2008).

Later, we tested the weighted sum of the

sum-of-pairs value with the number of fully identical

columns to perform alignment evaluation (Silva et

al., 2009). Now, we present a multiobjective strategy

which tries to maximize both the sum-of-pairs and

the number of fully identical columns by means of a

niching mechanism named equivalence class sharing

(Horn et al., 1994). Our objective is to evaluate the

quality of the found solutions using this approach.

For this purpose, we have tested AlineaGA with six

BAliBASE (Thompson et al., 1999) alignments.


José Mateus da Silva F., Manuel Sánchez Pérez J., Antonio Gómez Pulido J. and A. Vega Rodríguez M. (2010).

A NICHED PARETO GENETIC ALGORITHM - For Multiple Sequence Alignment Optimization.

In Proceedings of the 2nd International Conference on Agents and Artiﬁcial Intelligence - Artiﬁcial Intelligence, pages 323-329

DOI: 10.5220/0002729303230329

 SciTePress

This paper is organized as follows. In the next

Section we introduce concepts underlying our

research. In Section 3, we present a brief explanation

regarding AlineaGA methods. Section 4 presents

AlineaGA’s niched Pareto approach. The

experiments performed in order to observe the

impact of this strategy are discussed in Section 5.

Finally, the concluding Section presents final

considerations and topics for future work.

2 BACKGROUND

Although it may not be obvious, multiple sequence

alignments are present in most of the computational

methods used in molecular biology. They are used in

different areas such as functional genomics,

structure modelling, mutagenesis experiments,

evolutionary studies and drug design.

There are several approaches to sequence

alignment. The two most important ones are based

on progressive and iterative methods.

When progressive methods are used, the

alignment is gradually built up by aligning the two

most similar sequences first, and adding the less

similar ones one after another. This fast and simple

method has a critical problem: if a mistake is made

at an intermediate step, it cannot be corrected later

by adding the remaining sequences. Also, it does not

provide a metric which allows the comparison of

two different alignments of the same set of

sequences, or which can be used to say that the best

possible alignment, for a set of parameters, has

been found (Notredame and Higgins, 1996).

Iterative methods try to optimize a scoring

function which reflects the biological events which

took place in the evolution of the sequences.

Optimizing this score leads to a correct alignment

(Lassmann and Sonnhammer, 2002). One example

of iterative methods is the GA; other examples may

be found in our prior review (Silva et al., 2007).

2.1 Alignment

An alignment is an arrangement of two or more

sequences in a way which reveals where the

sequences are similar, and where they differ. An

optimal alignment exhibits the most

correspondences and the fewest differences, even if

it will not be biologically meaningful (Pal et al.,

2006). Figure 1 shows an example of an alignment

of four hypothetical protein sequences.

Figure 1: Example of a multiple sequence alignment.

Sequences may have different lengths and each

one is represented in a different line. Columns with

the same characters, presented in bold, show that in

that specific position, no mutation occurs among the

sequences. On the other hand, columns which

present different characters show that mutation

events have taken place. The characters used to

represent the elements of the molecular sequences

are often referred to as residues.

Gaps can be introduced in the sequences,

allowing the alignment to be extended into regions

where its sequences may have lost or gained

residues. These gaps are usually represented by the

symbol “–”.

2.2 Genetic Algorithms

GAs are a class of evolutionary algorithms

introduced by Holland (Holland, 1975). Their search

methods model some natural facts: genetic

inheritance and Darwinian strife for survival

(Michalewicz, 1996).

In GAs, the adaptation is done by keeping a

population of structures from which new structures

are produced through genetic operators, such as

crossover and mutation (De Jong, 1988).

In crossover, characteristics of two randomly

chosen individuals (parents), are combined to form

two similar offspring by swapping corresponding

segments of parents. Mutation randomly alters some

values within the individual by an arbitrary change

(Anbarasu et al., 2000). Each structure of the

population has a fitness score, which is used to

choose which structures will be used to form new

ones (De Jong, 1988).

The ability to gather information about a search

space, initially unknown, to direct the search for

useful subspaces, is a distinguishing characteristic of

GAs. This ability makes them suitable for solving

problems with large, complex and unknown search

spaces (De Jong, 1988).

2.3 Fitness Sharing

Fitness sharing (Goldberg and Richardson, 1987) is

a mechanism for maintaining population diversity. It

distributes the population over different peaks in the

search space by reducing the fitness of highly

similar solutions.

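[Rows recovered from Figure 1; part of the third row and the fourth sequence were lost in extraction:]

-TISCTGNIGAG-NHVKWYQQLPG

-RLSCSSIFSS--YAMYWVRQAPG

L-LTCTVSFDD--YYSTWVR…PPG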

ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence


Equation 1 presents the shared fitness of an individual i, where $f_i$ is the raw fitness of the individual and $m_i$ is the niche count, which represents how crowded the neighborhood of individual i is.

$$f'_i = \frac{f_i}{m_i} \qquad (1)$$

The niche count is computed by summing a sharing function over all N members of the population, as follows:

$$m_i = \sum_{j=1}^{N} Sh(d_{i,j}) \qquad (2)$$

where $Sh(d_{i,j})$ is the sharing function, presented in Equation 3, and $d_{i,j}$ is the distance between individuals i and j, which can be based on either phenotype or genotype similarity.

$$Sh(d_{i,j}) = \begin{cases} 1 - \dfrac{d_{i,j}}{\sigma_{share}} & \text{if } d_{i,j} \le \sigma_{share} \\ 0 & \text{if } d_{i,j} > \sigma_{share} \end{cases} \qquad (3)$$

The niche radius is given by $\sigma_{share}$. Solutions within this radius are in the same neighborhood, reducing each other’s fitness.
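As an illustration of Equations 1 to 3, the following Python sketch computes niche counts and shared fitness for a small, hypothetical population. It assumes the triangular sharing function of Horn et al. (1994) and a Euclidean distance between points; the function names and the sample data are illustrative and are not taken from AlineaGA.

```python
import math

def sharing(d, sigma_share):
    """Triangular sharing function (assumed form): 1 - d/sigma inside the niche, 0 outside."""
    return 1.0 - d / sigma_share if d <= sigma_share else 0.0

def niche_count(i, points, sigma_share):
    """Sum of the sharing function between individual i and every member of the population."""
    return sum(sharing(math.dist(points[i], p), sigma_share) for p in points)

def shared_fitness(raw, i, points, sigma_share):
    """Raw fitness divided by the niche count."""
    return raw[i] / niche_count(i, points, sigma_share)

# Hypothetical population: two crowded individuals and one isolated individual.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
raw = [10.0, 10.0, 10.0]
shared = [shared_fitness(raw, i, points, 1.0) for i in range(len(points))]
```

With equal raw fitness, the two individuals that share a niche end up with a lower shared fitness than the isolated one, which is the diversity pressure described above.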

3 AlineaGA METHODS

In AlineaGA, the initial population is randomly

generated, and then the individuals are selected,

combined and mutated in order to produce new

solutions through the course of a defined number of

generations. This section presents a brief explanation

regarding AlineaGA’s representation, evaluation,

crossover and mutation.

3.1 Representation

We use a non-codified representation of the

individuals. Real multiple sequence alignments, as

the one presented in Figure 1, are used as data

structures for each individual. Chromosomes are

represented by arrays of characters on which each

line corresponds to a sequence in the alignment, and

each column represents a residue at a specific

position.
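A minimal sketch of such a non-codified individual in Python is shown below. The rows are short prefixes of the sequences of Figure 1, and the helper names (`column`, `is_rectangular`) are hypothetical; AlineaGA's actual data structure may differ.

```python
GAP = "-"

# Hypothetical individual: one row per sequence, one column per alignment position.
individual = [
    list("-TISCTG"),
    list("-RLSCSS"),
    list("L-LTCTV"),
]

def column(alignment, j):
    """Return the residues (or gaps) found at position j of every sequence."""
    return [row[j] for row in alignment]

def is_rectangular(alignment):
    """All rows of an alignment must have the same length once gaps are included."""
    return len({len(row) for row in alignment}) == 1
```

Because the individual is the alignment itself, operators can read or edit rows and columns directly, with no decoding step.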

3.2 Evaluation

To perform the evaluation of each solution, two

attributes are used: the sum-of-pairs and the identity

of the alignment. The sum-of-pairs function,

presented in Equation 4, is assessed by scoring all of

the pairwise comparisons between each residue in

each column of an alignment and adding the scores

together (Wang and Lefkowitz, 2005).

$$\text{Sum-of-Pairs} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \text{ScoringMatrix}(l_i, l_j) \qquad (4)$$

For this purpose, a scoring matrix which

determines the cost of substituting a residue for

another is used, as well as a gap penalty value to

determine the cost of aligning a residue with a gap.

We use the PAM 350 (Dayhoff et al., 1978) scoring

matrix with a gap penalty of -10 (Silva et al., 2008).

The identity of the alignment is simply the

number of fully identical columns in the alignment.
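The two objectives can be sketched in Python as follows. This is only an illustration: `toy_score` is a stand-in for a substitution matrix lookup and is not PAM 350, scoring a gap against a gap as 0 is an assumption, and excluding gap-only columns from the identity count is also an assumption. Only the gap penalty of -10 comes from the text.

```python
from itertools import combinations

GAP = "-"
GAP_PENALTY = -10  # value quoted in the text

def toy_score(a, b):
    """Stand-in for a substitution matrix lookup (NOT PAM 350): +2 match, -1 mismatch."""
    return 2 if a == b else -1

def pair_score(a, b, score=toy_score):
    """Score one pair of aligned symbols."""
    if a == GAP and b == GAP:
        return 0  # assumption: a gap aligned with a gap costs nothing
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return score(a, b)

def sum_of_pairs(alignment, score=toy_score):
    """Add the pairwise scores of every pair of rows, column by column."""
    return sum(
        pair_score(a, b, score)
        for row_i, row_j in combinations(alignment, 2)
        for a, b in zip(row_i, row_j)
    )

def identity(alignment):
    """Number of columns in which every row holds the same residue."""
    return sum(
        1 for col in zip(*alignment)
        if col[0] != GAP and all(c == col[0] for c in col)
    )

alignment = ["AC-T", "ACGT", "A-GT"]
```

For this toy alignment only the first and last columns are fully identical, and every residue aligned with a gap contributes the penalty to the sum-of-pairs.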

3.3 Crossover

AlineaGA uses one of the two crossover operators,

randomly selected within each generation. The One

Point crossover derives from Goldberg’s standard

one point crossover operator (Goldberg, 1989) with

an extension that treats the existing gaps in each

sequence. In RecombineMatchedCol (Chellapilla

and Fogel, 1999), the fully identical columns of the

first parent which do not appear in the second one

are identified, and then, one of these fully aligned

columns is randomly selected and is generated in the

second alignment, originating the offspring.

3.4 Mutation

Each mutation operator is randomly selected from a

pool of six operators and it is applied to an

individual according to the defined mutation

probability. Whenever the mutated solution is worse

than the original one, a new mutation must be

applied to the mutated individual. This process is

repeated until the fitness improves or for a

specific number of attempts. We opted for a

maximum of 2 tries. This strategy allows a good

tradeoff between speed and robustness, without

transforming completely the solutions in a single

generation.
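The retry strategy can be sketched as below. The sketch assumes a single scalar `fitness` function and returns the last mutated candidate when no attempt succeeds; both choices are assumptions, since the text does not state them. The mutation probability check is omitted, and all names are hypothetical.

```python
import random

def mutate_with_retries(individual, operators, fitness, max_tries=2, rng=random):
    """Apply a randomly chosen operator; while the result is worse than the
    original, mutate the mutated individual again, up to max_tries attempts."""
    original_fitness = fitness(individual)
    candidate = individual
    for _ in range(max_tries):
        operator = rng.choice(operators)
        candidate = operator(candidate)
        if fitness(candidate) >= original_fitness:
            break  # the mutated solution is no longer worse than the original
    return candidate
```

Bounding the number of attempts keeps the cost per individual predictable, which matches the tradeoff between speed and robustness mentioned above.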

The Gap Insertion operator extends the

alignments by inserting gaps into the sequences in a

random fashion, as in the gap insertion operator of

GenAlignRefine (Wang and Lefkowitz, 2005).

Shifting gaps is another way to introduce new

alignment configurations. In the Gap Shifting

mutation operator, a gap is randomly chosen in an

alignment and it is moved to another position in the

same sequence (Notredame et al., 1997).

The Merge Space operator merges together two

or three spaces of a sequence (Horng et al., 2000). It

randomly selects two or three consecutive gaps of a

sequence, adjacent or not adjacent, and then merges

these gaps together. After that, they are shifted to a

randomly chosen position in the same sequence.


The Smart Merge Space is similar to the Merge

Space operator, but it only applies the mutation if

the fitness of the mutated solution is greater than the

fitness of the original one (Silva et al., 2009).

The Smart Gap Insertion is a variation of the Gap

Insertion operator which only produces the mutation

when the fitness of the mutated alignment is greater

than the fitness of the original one (Silva et al.,

2008). The insertion of additional gaps is determined

by a direction probability which reflects the success

of inserting gaps at the beginning or at the end of the

alignment. If the operator does not improve the

alignment at the first attempt, it chooses a new

random position of insertion and repeats the whole

process. The defined number of maximum attempts

is set to 3, but it can be customized according to

user’s needs.

The Smart Gap Shifting operator tries to move the gaps

of an alignment until its fitness improves (Silva et

al., 2008). As in the Smart Gap Insertion operator,

the shift direction is determined by a direction

probability which is updated when better alignments

are found. Likewise, the mutation occurs only if the

fitness of the generated alignment is greater than the

original one.

The use of crossover and mutation operators can

produce columns completely formed by gaps in the

alignment. To remove these gap columns we use the

Gap Column Remover (Silva et al., 2008), which is

not conditioned by the mutation probability and is

applied at the end of each generation.
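Removing gap-only columns is straightforward to illustrate. The following Python sketch assumes an alignment stored as a list of equal-length strings; the function name is hypothetical and the code is not AlineaGA's implementation.

```python
GAP = "-"

def remove_gap_columns(alignment):
    """Drop every column made only of gaps; rows are returned as strings."""
    kept = [col for col in zip(*alignment) if any(c != GAP for c in col)]
    return ["".join(row) for row in zip(*kept)]
```

For example, `remove_gap_columns(["A--C", "A-GC"])` drops the second column only, because the third column still contains a residue.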

4 NICHED PARETO GA

The Niched Pareto GA is characterized by its

selection mechanism. In previous works (Silva et al.,

2008, Silva et al., 2009), we used tournament

selection to choose the solutions of the current

generation that will prevail for the next one.

However, throughout the generations, this technique

tends to lead the population to a single point in the

search space. To maintain multiple Pareto optimal

solutions and avoid convergence, we use Pareto

domination tournaments and equivalence class

sharing (Horn et al., 1994), which we now present.

4.1 Pareto Domination Tournaments

In a normal binary tournament, two randomly

selected individuals compete for domination. If one

dominates the other, it wins. However, this condition

does not produce a sufficient domination pressure.

Pareto domination tournaments (Horn et al., 1994)

use a sampling scheme which offers control over the

domination pressure. In this method, two candidate

solutions are randomly chosen from the population

for selection purposes. Also, a comparison set is

formed by randomly choosing individuals from the

population. Then, each candidate solution is

compared with every individual in the comparison

set. The candidate which dominates all the

individuals in the comparison set is selected for

reproduction. If both candidates dominate or are

dominated by the comparison set, then sharing is

used to select the winner, as Section 4.2 explains.

Adjusting the size of the comparison set allows

the control of the domination pressure. High values

for this parameter tend to increase the pressure

towards a small portion of the front. On the other

hand, small comparison sets result in many

dominated solutions. Typically, a comparison set

with a size of 10% of the population yields a tight and

complete distribution over the front (Horn et al.,

1994).
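The tournament can be sketched in Python as follows, for maximization and with individuals reduced to their objective vectors. The sketch follows the rule as formulated by Horn et al. (1994): a candidate wins when no member of the comparison set dominates it while the other candidate is dominated. The names `dominates`, `is_dominated`, `pareto_tournament` and `t_dom` are hypothetical.

```python
import random

def dominates(a, b):
    """a dominates b (maximization): no worse in every objective, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def is_dominated(candidate, comparison_set):
    """True when at least one member of the comparison set dominates the candidate."""
    return any(dominates(other, candidate) for other in comparison_set)

def pareto_tournament(population, t_dom, rng=random):
    """Return the winning objective vector, or None when sharing must break the tie."""
    first, second = rng.sample(population, 2)
    comparison_set = rng.sample(population, t_dom)
    first_dominated = is_dominated(first, comparison_set)
    second_dominated = is_dominated(second, comparison_set)
    if first_dominated and not second_dominated:
        return second
    if second_dominated and not first_dominated:
        return first
    return None  # both dominated or both non-dominated
```

A larger `t_dom` makes it more likely that a candidate meets a dominating individual, which is how the size of the comparison set controls the domination pressure.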

4.2 Equivalence Class Sharing

To avoid genetic drift, whenever the candidate

solutions are both dominated or both non-dominated

by the comparison set, the winner is selected by

equivalence class sharing (Horn et al., 1994).

This particular method of sharing does not

degrade the fitness of the individuals. Instead, it

assumes that candidates, mutually dominated or

non-dominated, are equally fit. Therefore, in order to

maintain diversity along the Pareto front, this

method computes the niche count of both candidates

and selects the one which has the smallest number of

individuals in its neighbourhood.

4.2.1 Distance Metric

The distance metric may be based on either

phenotype or genotype similarity. In our particular

case, the genotype and phenotype representation are

the same. As we are trying to maximize two

different objectives represented in a 2 dimensional

space, we opt for using the Euclidean distance as a

similarity measure.
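A possible Python sketch of the tie-break is given below. It assumes the triangular sharing function of Horn et al. (1994), measures Euclidean distance between (sum-of-pairs, identity) pairs, and uses hypothetical names throughout. Whether the two objectives are rescaled before the distance is computed is not stated in the text, so no rescaling is applied here.

```python
import math

def niche_count(candidate, population, sigma_share):
    """Triangular sharing summed over the population, with Euclidean distance."""
    total = 0.0
    for other in population:
        d = math.dist(candidate, other)
        if d <= sigma_share:
            total += 1.0 - d / sigma_share
    return total

def equivalence_class_winner(first, second, population, sigma_share):
    """Between two equally fit candidates, prefer the one in the less crowded niche."""
    count_first = niche_count(first, population, sigma_share)
    count_second = niche_count(second, population, sigma_share)
    return first if count_first <= count_second else second

# Hypothetical objective vectors: three crowded points and one isolated point.
population = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0)]
winner = equivalence_class_winner((0.0, 0.0), (5.0, 5.0), population, 1.0)
```

Here the isolated candidate wins, since its niche count is lower than that of the candidate surrounded by close neighbours.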

4.2.2 Niche Radius $\sigma_{share}$

Defining the radius which determines the range of each niche is not a trivial matter. As in (Shir and Back, 2006), we determine the $\sigma_{share}$ value according to Equation 5.

$$\sigma_{share} = \frac{r}{\sqrt[n]{q}} \qquad (5)$$


Table 1: Results for the AlineaGA Niched Pareto test configurations.

          BAliBASE      Number of   AlineaGA
Dataset   SOP    ID     Peaks       Avg. Best SOP   Avg. Best ID   Best SOP   Best ID
1aho      2015   12     81          1974.83         10.90          2155       13
1aho      2015   12     49          1974.60         11.03          2141       13
1aho      2015   12     4           1960.03         10.93          2112       13
1fmb      1706   25     36          1817.03         24.97          1864       27
1fmb      1706   25     100         1811.07         24.93          1860       27
1fmb      1706   25     4           1807            25.40          1864       27
1plc      2403   18     4           2356            17.33          2590       20
1plc      2403   18     25          2353.87         17.60          2589       20
1plc      2403   18     100         2340.60         17.10          2576       20
1hpi      1208   10     4           1135.43         12.17          1198       14
1hpi      1208   10     81          1128.30         12.37          1198       14
1hpi      1208   10     36          1120.17         12.64          1201       15
1pfc      2216   13     16          2442.97         14.23          2519       15
1pfc      2216   13     4           2435.90         14.33          2536       17
1pfc      2216   13     49          2425.17         14.17          2533       16
1ycc      963    11     36          883.93          6.9            1091       10
1ycc      963    11     9           864.03          7.2            1093       10
1ycc      963    11     64          859.47          6.7            1045       11

SOP, sum-of-pairs; ID, identity; Avg., Average. Avg. Best SOP and Avg. Best ID were obtained by averaging the results of 30 runs.

The existing theory for setting this value assumes that the solution set has a previously known finite number of peaks q (Shir and Back, 2006). Knowing the upper and lower bounds of each objective, r is defined as follows:

$$r = \frac{1}{2} \sqrt{\sum_{k=1}^{n} \left(x_{k,max} - x_{k,min}\right)^2} \qquad (6)$$

where n is the number of objectives, which in our particular case is 2.

The lower and upper bounds of each dimension are computed on every generation, presenting different values as the population evolves. However, in multiple sequence alignment, there is no practical way of knowing the maximum number of peaks beforehand. Therefore, we opt to test several values for this parameter, as the next section describes.
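As a worked illustration, the Python sketch below recomputes the niche radius from the bounds of the current population. It assumes the formulation attributed to Shir and Bäck (2006): r is half the diagonal of the bounding box of the objective values, and the radius is r divided by the n-th root of the number of peaks q. The function name and the sample values are hypothetical.

```python
import math

def niche_radius(points, q):
    """Assumed form: sigma_share = r / q**(1/n), r = half the bounding-box diagonal."""
    n = len(points[0])
    lows = [min(p[k] for p in points) for k in range(n)]
    highs = [max(p[k] for p in points) for k in range(n)]
    r = 0.5 * math.sqrt(sum((hi - lo) ** 2 for lo, hi in zip(lows, highs)))
    return r / q ** (1.0 / n)

# Hypothetical (sum-of-pairs, identity) values for one generation.
population = [(1600.0, 9.0), (1900.0, 13.0), (2000.0, 12.0)]
sigma = niche_radius(population, q=4)
```

With two objectives the divisor is the square root of q, so testing perfect squares for the number of peaks gives integer divisors of r.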

5 TESTING AND RESULTS

Our goal is to find the best possible solutions which

maximize the sum-of-pairs and the identity of each

alignment. We test the sharing function with different $\sigma_{share}$ values, which are obtained by computing the niche radius for various numbers of peaks.

In our tests, we use six datasets from the

Reference 1 alignments of BAliBASE (Thompson et

al., 1999). Three of these datasets (1aho, 1fmb,

1plc) have more than 35% identity among their

sequences, and the rest (1hpi, 1pfc, 1ycc) present

20% to 40% identity. We have measured the

sum-of-pairs score and the identity of each one of

these datasets. Later we use these reference results to

evaluate the different test configurations.

5.1 Test Configurations

Although we have tested all our datasets for 4, 9, 16,

25, 36, 49, 64, 81 and 100 peaks, we only present

the results for the 3 configurations which obtained

the best results on each dataset. Also, we have

started by executing the algorithm for 10000

generations with a mutation probability of 0.05, but

we have realized that an equivalent final solution set

could be achieved in 2000 generations in less time,

by increasing the mutation probability to 0.4.

Therefore, we have opted for this latter setting. The

remaining parameters are the same in all

configurations: the population size is 100, the

crossover probability is 0.8 and the number of

inserted gaps by the Gap Insertion and Smart Gap

Insertion operators is 10. Finally, the size of the

comparison set for the Pareto domination

tournaments is set to 10.

5.2 Results

Next, we present the results of the tests performed. All

the results were obtained by averaging the

sum-of-pairs and the identity scores, from 30 runs of

AlineaGA, for each configuration/dataset.

5.2.1 Performance

Table 1 summarizes the performance of the top 3 configurations for each test dataset. The “SOP” column under BAliBASE presents the sum-of-pairs score of the reference alignment of each dataset. This value was computed using the PAM 350 scoring matrix and a gap penalty of -10. The “ID” column under BAliBASE shows the number of fully aligned columns in each BAliBASE alignment. Columns “Avg. Best SOP” and “Avg. Best ID” show the average sum-of-pairs and the average identity scores obtained in 30 runs of AlineaGA. The best values found for the sum-of-pairs and identity scores are presented in columns “Best SOP” and “Best ID”.

As the results show, it is not possible to establish a direct relation between the number of peaks and the percentage of identity of the alignments. This parameter depends on each particular alignment and cannot be determined in a generic way. Compared with the BAliBASE alignments, and with the exception of the 1hpi dataset, it is possible to find equal or higher values for both objectives simultaneously in our results. However, the average sum-of-pairs and average identity of the 30 executions of each test are superior only in the 1fmb and 1pfc datasets.

5.2.2 Population’s Evolution

Figures 2 to 7 present the evolution of the population’s fitness for the best configuration on each dataset. These values were obtained by averaging each solution’s sum-of-pairs and identity scores over the 30 runs of the program. Each figure shows the population at 4 particular moments: generations 500, 1000, 1500 and 2000, the last being the final solution set.

We can observe that high values for one of the objectives necessarily lower the score of the other objective. Also, after 2000 generations, the majority of the population is tightly distributed along the front. Nevertheless, there are a few dominated solutions. These solutions result from crossover and mutation but, generally, they are not retained. Dataset 1pfc, shown in Figure 6, presented the most atypical evolution: the solutions of the resulting front are distributed in a small region, which could have included some individuals with higher identity values that were present in generation 1500.

6 CONCLUSIONS

By using a multiobjective approach in this domain,

we try to offer a solution to a very significant

limitation of multiple sequence alignment: its

mathematical approach. As stated before, the best

alignment is the one which presents the most

correspondences and the fewest differences, but

which may or may not be biologically meaningful.

Biological knowledge is needed to validate the results of an

alignment tool. By presenting a set of solutions

instead of a single one, it is possible for a biologist

to observe several hypotheses and so choose the one

which is closer to the biological reality.

Figure 2: Population average fitness for 1aho, 81 peaks.

Figure 3: Population average fitness for 1fmb, 36 peaks.

Figure 4: Population average fitness for 1plc, 4 peaks.

Figure 5: Population average fitness for 1hpi, 4 peaks.

Figure 6: Population average fitness for 1pfc, 16 peaks.

Figure 7: Population average fitness for 1ycc, 36 peaks.

[Figures 2 to 7: plots not reproduced. Each has SOP on the horizontal axis and one series per generation: 500, 1000, 1500 and 2000.]

The main drawback of this method, as it is implemented, is its dependence on knowing in advance the expected number of peaks in the search space. This problem may be overcome by trying to identify the number of peaks in the population dynamically, or by using a different approach when computing the niche radius, $\sigma_{share}$.

Alternative objectives, such as minimizing the

number of gaps, may be used instead of maximizing

the identity. However, this kind of approach may

have poor results when several gaps are needed to

maximize the similarity among the sequences. A

possible solution is to increase the complexity of the

problem by optimizing three objectives: maximize

identity and sum-of-pairs scores, and minimize the

number of gaps in the alignment.

REFERENCES

Anbarasu, L. A., Narayanasamy, P. & Sundararajan, V.

(2000) Multiple molecular sequence alignment by

island parallel genetic algorithm. Current Science, 78,

858-863.

Chellapilla, K. & Fogel, G. B. (1999) Multiple sequence

alignment using evolutionary programming. IN

Angeline, P. J., Michalewicz, Z., Schoenauer, M.,

Yao, X. & Zalzala, A. (Eds.) Proceedings of the 1999

Congress on Evolutionary Computation. Washington

DC, USA, IEEE Press.

Dayhoff, M. O., Schwartz, R. M. & Orcutt, B. C. (1978) A

Model of Evolutionary Change in Proteins. Atlas of

Protein Sequence and Structure. National Biomedical

Research Foundation.

De Jong, K. (1988) Learning with genetic algorithms: An

overview. Mach Learning, 3, 121-138.

Goldberg, D. E. (1989) Genetic Algorithms in Search,

Optimization, and Machine Learning. Reading, MA,

Addison-Wesley Publishing Company.

Goldberg, D. E. & Richardson, J. (1987) Genetic

algorithms with sharing for multimodal function

optimization. Proceedings of the Second International

Conference on Genetic Algorithms on Genetic

algorithms and their application. Cambridge,

Massachusetts, United States, L. Erlbaum Associates

Inc.

Holland, J. H. (1975) Adaptation in natural and artificial

systems, Univ Mich Press. Ann Arbor.

Horn, J., Nafpliotis, N. & Goldberg, D. E. (1994) A niched

Pareto genetic algorithm for multiobjective

optimization. Proceedings of the First IEEE

Conference on Evolutionary Computation, IEEE

World Congress on Computational Intelligence 1, 82-

87.

Horng, J.-T., Lin, C.-M., Liu, B.-J. & Kao, C.-Y. (2000)

Using Genetic Algorithms to Solve Multiple Sequence

Alignments. IN Whitley, L. D., Goldberg, D. E.,

Cantu-Paz, E., Spector, L., Parmee, I. C. & Beyer, H.-

G. (Eds.) Proceedings of the Genetic and Evolutionary

Computation Conference (GECCO-2000). Las Vegas,

Nevada, USA, Morgan Kaufmann.

Horng, J., Wu, L., Lin, C. & Yang, B. (2005) A genetic

algorithm for multiple sequence alignment. Soft

Computing, 9, 407-420.

Lassmann, T. & Sonnhammer, E. L. L. (2002) Quality

assessment of multiple alignment programs. FEBS

Letters, 529, 126-130.

Michalewicz, Z. (1996) Genetic algorithms + data

structures = evolution programs - Third, Revised and

Extended Edition, Springer.

Notredame, C. & Higgins, D. G. (1996) SAGA: sequence

alignment by genetic algorithm. Nucleic Acids

Research, 24, 1515-1524.

Notredame, C., O'Brien, E. A. & Higgins, D. G. (1997)

RAGA: RNA sequence alignment by genetic

algorithm. Nucleic Acids Research, 25, 4570-4580.

Pal, S. K., Bandyopadhyay, S. & Ray, S. S. (2006)

Evolutionary computation in bioinformatics: A

review. IEEE Transactions on Systems Man and

Cybernetics Part C-Appl and Rev, 36, 601-615.

Shir, O. M. & Back, T. (2006) Niche radius adaptation in

the CMA-ES niching algorithm. Lecture Notes in

Computer Science, 4193, 142.

Silva, F. J. M., Sánchez Pérez, J. M., Gómez Pulido, J. A.

& Vega Rodríguez, M. Á. (2007) Alineamiento

Múltiple de Secuencias utilizando Algoritmos

Genéticos: Revisión. Segundo Congreso Español de

Informática. Zaragoza, Spain, CEDI.

Silva, F. J. M., Sánchez Pérez, J. M., Gómez Pulido, J. A.

& Vega Rodríguez, M. Á. (2008) AlineaGA: A

Genetic Algorithm for Multiple Sequence Alignment.

IN Nguyen, N. T. & Katarzyniak, R. (Eds.) New

Challenges in Applied Intelligence Technologies.

Springer-Verlag.

Silva, F. J. M., Sánchez Pérez, J. M., Gómez Pulido, J. A.

& Vega Rodríguez, M. Á. (2009) AlineaGA - A

Genetic Algorithm with Local Search Optimization for

Multiple Sequence Alignment. Applied Intelligence, 1-

Thompson, J. D., Plewniak, F. & Poch, O. (1999)

BAliBASE: a benchmark alignment database for the

evaluation of multiple alignment programs.

Bioinformatics, 15, 87-88.

Wang, C. & Lefkowitz, E. J. (2005) Genomic multiple

sequence alignments: refinement using a genetic

algorithm. BMC Bioinformatics, 6.
