be concluded that although Xeon Phi has 60 cores that
can each execute four threads, each core should run
only one or two threads to get the best performance
for BWA and Bowtie2 on Xeon Phi.
For comparison, a quad-core x86 CPU (Core i7 920,
2.67 GHz) completed the same task with eight threads
in 145 seconds for Bowtie2 and 190 seconds for BWA.
Accordingly, this study is only a first step towards
accelerating genome mapping with Xeon Phi.
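The thread counts discussed above follow from simple arithmetic on the
core count: one or two threads per core on 60 cores corresponds to 60
or 120 threads in total. The Python sketch below only illustrates this
arithmetic. The helper name `threads_per_core` is hypothetical, and an
even distribution of threads over cores is assumed.

```python
# Illustrative arithmetic only; an even spread of threads over cores is assumed.
CORES = 60                 # Xeon Phi cores, as stated in the text
MAX_THREADS_PER_CORE = 4   # hardware threads per core, as stated in the text


def threads_per_core(total_threads: int, cores: int = CORES) -> float:
    """Average number of threads each core runs (hypothetical helper)."""
    capacity = cores * MAX_THREADS_PER_CORE
    if not 1 <= total_threads <= capacity:
        raise ValueError(f"thread count must be between 1 and {capacity}")
    return total_threads / cores


if __name__ == "__main__":
    for total in (60, 120, 240):
        print(total, "threads ->", threads_per_core(total), "per core")
```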
5 CONCLUDING REMARKS
Two well-known mapping tools, BWA and Bowtie2,
were ported to the many-core processor Xeon Phi. The
primary obstacles in porting BWA and Bowtie2 were
incompatibilities in the vector operations used in
these programs. These incompatibilities were
circumvented by emulating the vector operations of
x86 processors with those of Xeon Phi. A computational
experiment confirmed that the performance of the
ported programs increased with the number of threads
up to 60 threads. Peak performance for BWA and
Bowtie2 was observed when 120 and 60 threads were
used, respectively. These results imply that using
tens of threads on the many-core processor Xeon Phi
is very promising for accelerating mapping. In
addition, the ported programs generated exactly the
same mapping results as the original BWA and Bowtie2.
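As a rough illustration of what emulating an x86 vector operation
involves, the Python sketch below models two operations on sixteen
unsigned 8-bit lanes (one 128-bit register): a saturating add and a
saturating subtract. It assumes, for illustration only, that the target
offers wider integer lanes, so saturation has to be restored with an
explicit min or max. This is not the ported code: all function names
are hypothetical, and plain Python lists stand in for vector registers.

```python
# Model of emulating x86-style saturating 8-bit vector operations with wider lanes.
# All names are hypothetical; Python lists stand in for vector registers.
LANES = 16     # sixteen 8-bit lanes fill one 128-bit x86 vector register
U8_MAX = 255   # largest value an unsigned 8-bit lane can hold


def _check(a, b):
    if len(a) != LANES or len(b) != LANES:
        raise ValueError(f"each operand must have {LANES} lanes")
    if any(not 0 <= v <= U8_MAX for v in list(a) + list(b)):
        raise ValueError("lane values must fit in unsigned 8 bits")


def adds_epu8_reference(a, b):
    """Saturating add as computed in true 8-bit lanes (behaviour to reproduce)."""
    _check(a, b)
    out = []
    for x, y in zip(a, b):
        s = (x + y) & 0xFF                   # 8-bit sum wraps around on overflow
        out.append(U8_MAX if s < x else s)   # a wrapped sum means saturation
    return out


def subs_epu8_reference(a, b):
    """Saturating subtract as computed in true 8-bit lanes."""
    _check(a, b)
    return [0 if y > x else x - y for x, y in zip(a, b)]


def adds_epu8_emulated(a, b):
    """Same result with wide lanes: plain add, then clamp with min."""
    _check(a, b)
    return [min(x + y, U8_MAX) for x, y in zip(a, b)]


def subs_epu8_emulated(a, b):
    """Same result with wide signed lanes: plain subtract, then clamp with max."""
    _check(a, b)
    return [max(x - y, 0) for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [0, 1, 100, 200, 255, 128, 127, 254] * 2
    b = [0, 255, 100, 100, 255, 128, 129, 1] * 2
    print(adds_epu8_emulated(a, b) == adds_epu8_reference(a, b))
    print(subs_epu8_emulated(a, b) == subs_epu8_reference(a, b))
```

In this model the extra min and max steps are the price of matching the
8-bit behaviour exactly, which is what keeps the emulated results
identical to the reference.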
In the future, the performance of BWA and Bowtie2
on Xeon Phi is expected to be further improved in
three ways. The first is to fully exploit the
computational power of Xeon Phi, for example by using
all 32 vector registers at once. In this study, only
the vector operations of x86, which has eight 128-bit
vector registers, were emulated. The second is to use
Xeon Phi and x86 processors in a coordinated manner,
so that each executes the steps that fit its
architecture. Because the latest x86 processors are
faster than Xeon Phi for single-threaded processes,
steps that cannot be executed concurrently should be
done on x86 processors. The third is to improve the
rewritten code, for example by removing max and min
operations when their removal does not affect the
mapping results.
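The reasoning behind coordinated execution can be made concrete with a
simple Amdahl-style runtime model. The Python sketch below is
hypothetical throughout: the function name, the split between serial
and parallel work, and the relative per-thread speeds are made-up
illustrative values, not measurements from this study. Data transfer
between the processors is ignored.

```python
# Amdahl-style runtime model; every number below is a made-up illustration.
def runtime(serial_work, parallel_work, serial_speed, parallel_speed, threads):
    """Serial part runs on one thread; parallel part is split evenly over threads."""
    if threads < 1 or serial_speed <= 0 or parallel_speed <= 0:
        raise ValueError("threads and speeds must be positive")
    return serial_work / serial_speed + parallel_work / (parallel_speed * threads)


# Hypothetical workload: 10 units of serial work, 90 units of parallel work.
SERIAL, PARALLEL = 10.0, 90.0
# Hypothetical per-thread speeds: an x86 thread is taken as 4x a Xeon Phi thread.
X86_SPEED, PHI_SPEED = 1.0, 0.25

x86_only = runtime(SERIAL, PARALLEL, X86_SPEED, X86_SPEED, threads=8)
phi_only = runtime(SERIAL, PARALLEL, PHI_SPEED, PHI_SPEED, threads=60)
# Coordinated: serial steps on x86, parallel steps on Xeon Phi.
coordinated = runtime(SERIAL, PARALLEL, X86_SPEED, PHI_SPEED, threads=60)

if __name__ == "__main__":
    print("x86 only:   ", x86_only)
    print("Xeon Phi:   ", phi_only)
    print("coordinated:", coordinated)
```

Under these assumed values the coordinated configuration has the
shortest modelled runtime, because the serial work no longer runs on
the slower single thread. With other values the ordering can differ.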
The hardware of Xeon Phi will also be updated.
The current release of Xeon Phi, codenamed Knights
Corner, is only the first product in a lineup of
many-core processors. It adopts a ring bus, which
becomes a bottleneck when a large amount of data is
moved between the cores and memory. As new designs
come out, the architecture of Xeon Phi will evolve to
provide low-latency, high-bandwidth communication
between cores.
BIOINFORMATICS 2014 - International Conference on Bioinformatics Models, Methods and Algorithms