4 DISCUSSION AND CONCLUSIONS
Software transactional memory is most attractive when the program can be structured as a set of mostly independent operations, each of which involves only a small set of variables. If the operations are completely independent, the problem can most likely be partitioned trivially, and transactional memory offers little benefit. If each operation involves a large number of variables, performance will deteriorate as the transaction log grows.
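To make these criteria concrete, here is a minimal Haskell sketch using the standard `stm` package. It is not taken from the implementation discussed here: the names `newCells`, `moveUnit` and `runBatches`, and the "move one unit between two cells" operation, are hypothetical. Each cell is its own `TVar`, so a transaction logs only the two cells it touches, and concurrent operations conflict only when they share a cell.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (STM, TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import Control.Monad (forM, forM_, replicateM)

-- Hypothetical shared state: many independent cells, each its own TVar.
-- A list is used only to keep the sketch short; an array would be the
-- natural choice for real code, since (!!) is linear in the index.
newCells :: Int -> Int -> IO [TVar Int]
newCells n start = replicateM n (newTVarIO start)

-- One small operation: move one unit from cell i to cell j.
-- Only two TVars are read and written, so the transaction log stays
-- short and the operation can only conflict with others using i or j.
moveUnit :: [TVar Int] -> (Int, Int) -> STM ()
moveUnit cells (i, j) = do
  modifyTVar' (cells !! i) (subtract 1)
  modifyTVar' (cells !! j) (+ 1)

-- Run each batch of moves in its own thread; every move is a separate
-- atomic transaction. Waits for all threads, then returns the cells.
runBatches :: [TVar Int] -> [[(Int, Int)]] -> IO [Int]
runBatches cells batches = do
  dones <- forM batches $ \batch -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      forM_ batch (atomically . moveUnit cells)
      putMVar done ()
    return done
  mapM_ takeMVar dones  -- block until every worker has finished
  mapM readTVarIO cells
```

Because additions commute, the final values do not depend on how the threads interleave: starting from four cells holding 10, the batches `[[(0,1),(2,3)], [(1,2),(3,0)], [(0,2)]]` leave the cells at `[9,10,11,10]`.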
The overlap-layout-consensus approach to the sequence assembly problem fits these criteria well, and is therefore well suited to an STM approach. In the implementation presented, we observe a small overhead from using software transactional memory compared to regular arrays, and an additional overhead from using a multi-threaded implementation compared to a single-threaded one. However, the STM implementation scales well with the number of threads, and with only two threads it is already substantially faster. Although these results are very promising, it remains to be seen how far they generalize, both as the number of CPUs increases and to variations of the algorithm.
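As a rough illustration of why a scaffolding step of this kind maps well onto STM, the sketch below lets several threads propose contig joins at the same time. It is a hypothetical toy, not the merging code from the appendix: `Contig`, `tryJoin` and `runWorkers` are invented names, and a "join" here simply means placing contig `b` directly to the right of contig `a`. The check that both ends are free and the update happen in one transaction, so two conflicting proposals cannot both succeed, while proposals that touch different contigs proceed independently.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)
import Control.Monad (forM, replicateM)

-- Hypothetical contig record: which contig, if any, sits at each end.
data Contig = Contig
  { leftNb  :: TVar (Maybe Int)
  , rightNb :: TVar (Maybe Int)
  }

newContigs :: Int -> IO [Contig]
newContigs n = replicateM n (Contig <$> newTVarIO Nothing <*> newTVarIO Nothing)

-- Try to place contig b immediately to the right of contig a.
-- Reading both ends and writing the links is a single transaction,
-- so no other thread can claim either end in between.
tryJoin :: [Contig] -> (Int, Int) -> STM Bool
tryJoin cs (a, b) = do
  let ra = rightNb (cs !! a)
      lb = leftNb (cs !! b)
  r <- readTVar ra
  l <- readTVar lb
  case (r, l) of
    (Nothing, Nothing) -> do
      writeTVar ra (Just b)
      writeTVar lb (Just a)
      return True
    _ -> return False  -- one of the ends is already taken

-- Each worker thread processes its own list of candidate joins and
-- reports, in order, which of them were accepted.
runWorkers :: [Contig] -> [[(Int, Int)]] -> IO [[Bool]]
runWorkers cs work = do
  outs <- forM work $ \cands -> do
    out <- newEmptyMVar
    _ <- forkIO (mapM (atomically . tryJoin cs) cands >>= putMVar out)
    return out
  mapM takeMVar outs
```

Which of two conflicting joins wins depends on thread scheduling, but the links always stay mutually consistent: if the right neighbor of `a` is `b`, the left neighbor of `b` is `a`. Note that GHC only runs `forkIO` threads on several cores when the program is built with `-threaded` and run with `+RTS -N`.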
This analysis has concentrated on improving the run-time performance of the scaffolding process. This is an important goal in itself, but improving the quality of the resulting genome assembly is even more important.
The composability of STM lets the programmer easily refactor the program or otherwise modify the algorithm without introducing deadlocks or other synchronization problems. For instance, the current implementation considers only the potential nearest neighbors of each contig. Extending it to take a larger subgraph into account is one possible way to improve the result. With a traditional locking scheme, this would likely increase the complexity substantially. With STM, it would at worst increase the chance of collisions between transactions, leading to more retries and consequently a slightly slower program.
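The composability argument can be sketched in a few lines of Haskell. Assuming a hypothetical per-contig `TVar Bool` flag (not part of the described implementation), `claim` reserves a single contig and `claimAll` reuses it over a neighborhood of any size. Because the whole list is claimed inside one transaction, and `orElse` discards the writes of a branch that calls `retry`, the claim is all-or-nothing and needs no global lock ordering; a larger subgraph only means more `TVar`s per transaction and, under contention, more conflicts.

```haskell
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, orElse, readTVar, readTVarIO, retry, writeTVar)
import Control.Monad (replicateM)

-- Hypothetical per-contig flag: True while some operation is using it.
type Flag = TVar Bool

newFlags :: Int -> IO [Flag]
newFlags n = replicateM n (newTVarIO False)

-- Building block: claim one contig, or retry if it is already taken.
claim :: Flag -> STM ()
claim v = do
  taken <- readTVar v
  if taken then retry else writeTVar v True

-- Composed operation: claim a whole neighborhood, all-or-nothing.
-- If any single claim retries, `orElse` rolls back the claims already
-- made in the first branch and returns False instead of blocking.
claimAll :: [Flag] -> STM Bool
claimAll vs = (mapM_ claim vs >> return True) `orElse` return False

-- Release a previously claimed neighborhood.
releaseAll :: [Flag] -> STM ()
releaseAll = mapM_ (\v -> writeTVar v False)
```

With locks, the same extension would require acquiring every lock in the neighborhood in a consistent global order to avoid deadlock; here `claimAll` works unchanged whether the list holds two contigs or twenty.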
The source code for the implementation is available⁴ under the General Public License.
⁴ http://malde.org/~ketil/biohaskell/stmasm
APPENDIX
The code for the merging operation (as illustrated in Figure 2) is given below. Note that the type signature is
Can Software Transactional Memory Make Concurrent Programs Simple and Safe?