architecture is also very demanding in terms of Block
RAM (BRAM) usage: for “large” datasets the IP core
uses all of the available BRAMs, whereas for “small”
datasets it uses about 13% of the memory. The
designed system can process up to 10,000 sequences
of 200 residues each. As shown in Table 3, the design
was tested with six different kinds of input
sequences. The clock rate of this architecture is
135 MHz for a single IP core. Table 3 shows the run
times for all the measured input sequences, both for
a general purpose processor running the original
software and for the designed IP core, together with
the respective speedup. For the “large” datasets the
IP core is 4 to 5 times faster. For the “small”
datasets the single-core results are less favorable.
However, following the considerations made for the
T-Coffee IP core, we can assume that a large modern
FPGA device can hold up to 15 parallel MAFFT IP
cores for the “small” datasets, thus achieving a
speedup of 10 to 55 times vs. a high end general
purpose processor.
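The multi-core estimate above can be checked with simple arithmetic. The sketch below assumes ideal linear scaling across independent IP cores, which is an assumption of this example; the text does not state the scaling model, and the function names are hypothetical. Under that assumption, a 10 to 55 times aggregate speedup with 15 cores corresponds to a single-core speedup of roughly 0.67 to 3.67 times.

```python
def implied_single_core_speedup(aggregate: float, cores: int) -> float:
    """Single-core speedup implied by an aggregate speedup.

    Assumes ideal linear scaling (no shared memory or I/O bottleneck),
    which is an assumption of this sketch, not a measured property.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return aggregate / cores


def aggregate_speedup(single_core_speedup: float, cores: int) -> float:
    """Aggregate speedup of `cores` independent cores under the same assumption."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return single_core_speedup * cores


if __name__ == "__main__":
    CORES = 15  # parallel MAFFT IP cores assumed in the text
    for total in (10, 55):  # aggregate speedup range quoted in the text
        per_core = implied_single_core_speedup(total, CORES)
        print(f"aggregate {total}x with {CORES} cores -> {per_core:.2f}x per core")
```

In practice, shared resources such as external memory bandwidth would likely reduce the aggregate figure below this ideal bound.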
These IP cores can be set up for different sizes of
datasets, which makes reconfigurable computing
preferable to VLSI due to the resulting flexibility to
“tune” the design to the dataset type.
5 CONCLUSIONS
Two of the five best known algorithms for multiple
sequence alignment implemented and used by the
European Bioinformatics Institute (EBI) are the
MAFFT and T-Coffee algorithms. This work
presents FPGA-based IP cores for the T-Coffee and
MAFFT algorithms. To the authors’ knowledge, this
is the first work in the literature that attempts to
model these two algorithms in reconfigurable
hardware. Experimental results show that
reconfigurable technology can offer a significant
performance boost, especially in cases in which the
input data allows for high parallelism. Future
research will focus on improving the performance of
the designed IP cores by increasing the number of
parallel machines. As internal memory (BRAMs) is
the critical resource, storing the input sequences in
external memory (DDR) can free the internal memory
for more parallel machines. Integrating the designed
IP cores in hardware with the rest of the algorithm
running in software can lead to systems that can be
used by biologists.
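To give a rough sense of why input storage competes with parallelism for BRAM, the following sketch estimates the raw storage needed for the largest supported input (10,000 sequences of 200 residues each). The 5-bit residue encoding and the 18 Kbit block size are assumptions of this example, not values taken from the design, and the function names are hypothetical. The actual IP cores also hold intermediate data, so this is only a lower bound on what moving the sequences to external memory would free.

```python
def sequence_storage_bits(num_sequences: int,
                          residues_per_sequence: int,
                          bits_per_residue: int = 5) -> int:
    """Raw bits needed to store the input sequences.

    bits_per_residue=5 is an assumed encoding (enough for 20 amino acids
    plus a few extra symbols); the design's real encoding is not stated.
    """
    return num_sequences * residues_per_sequence * bits_per_residue


def bram_blocks_needed(total_bits: int, bits_per_block: int = 18 * 1024) -> int:
    """Number of memory blocks needed, rounding up.

    bits_per_block defaults to an assumed 18 Kbit block; the target
    device and its block size are not given in the text.
    """
    return -(-total_bits // bits_per_block)  # integer ceiling division


if __name__ == "__main__":
    bits = sequence_storage_bits(10_000, 200)
    print(f"raw input storage: {bits} bits")
    print(f"assumed 18 Kbit blocks needed: {bram_blocks_needed(bits)}")
```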
ACKNOWLEDGEMENTS
This publication is based on work performed in the
framework of the FP7 Project OSMOSIS, which is
funded from the European Community’s Seventh
Framework Programme (FP7/2007-2013) under grant
agreement FP7-SME-222077. The authors would like
to acknowledge the contributions to the OSMOSIS
Project of their colleagues in Algosystems SA,
Electronic Design Ltd, Dunvegan, Politecnico di
Torino, CEA, and TSI.
RECONFIGURABLE COMPUTING IP CORES FOR MULTIPLE SEQUENCE ALIGNMENT