performed a barrier after the packing. By contrast,
when the number of processes is 64, the latency of
put operations increases if a barrier is performed
after the packing, even in the inter-node-only
simulation. In this case, the inter-node put operations
may cause communication contention.
5 CONCLUSIONS
In this paper, we introduced NSIM-ACE, a new
interconnection network simulator for RDMA
evaluation. We implemented it by extending NSIM,
a simulator for large-scale interconnection networks.
NSIM-ACE has a user-friendly interface in which
the communication pattern is specified in a way similar
to RDMA-based parallel programs. We performed
three experiments to evaluate the simulation
accuracy and to predict performance scalability.
The experiment on random ring bandwidth shows
that the simulator reproduces the bandwidth degradation
caused by communication contention. The experiment
on the synchronization barrier indicates that the
simulation accuracy is sufficient to compare the
performance of RDMA-based algorithms and to identify
their characteristics. In addition, NSIM-ACE
can predict which algorithm performs better for a
communication pattern that appears in a particle simulation.
ACKNOWLEDGEMENTS
This work is supported by Core Research for
Evolutional Science and Technology (CREST)
Program of Japan Science and Technology Agency
(JST), Research Area “Development of System
Software Technologies for post-Peta Scale High
Performance Computing”, Research Theme
“Development of Scalable Communication Library
with Technologies for Memory Saving and Runtime
Optimization”. Part of the computation was carried
out using the computer facilities at the Research
Institute for Information Technology, Kyushu
University.
REFERENCES
Adiga, N. R. et al., 2005. Blue Gene/L Torus
Interconnection Network. IBM Journal of Research
and Development, Vol.49, pp.265-276.
Bonachea, D., 2002. GASNet Specification, v1.1. U.C.
Berkeley Tech Report (UCB/CSD-02-1207).
Casanova, H., Giersch, A., Legrand, A., Quinson, M. and
Suter, F., 2014. Versatile, Scalable, and Accurate
Simulation of Distributed Applications and Platforms.
Journal of Parallel and Distributed Computing,
Vol.74, No.10, pp. 2899-2917.
Choudhury, N., Mehta, Y., Wilmarth, T. L., Bohm, E. J.
and Kale, L.V., 2005. Scaling an Optimistic Parallel
Simulation of Large-scale Interconnection Networks.
Proc. 37th Conference on Winter Simulation,
WSC ’05, pp.591-600.
Luszczek, P. R., Bailey, D. H., Dongarra, J. J., Kepner, J.,
Lucas, R. F., Rabenseifner, R. and Takahashi, D.,
2006. The HPC Challenge (HPCC) Benchmark Suite.
Proc. 2006 ACM/IEEE Conference on
Supercomputing, SC ’06.
Miwa, H. et al., 2011. NSIM: An Interconnection Network
Simulator for Extreme-Scale Parallel Computers.
IEICE Transactions on Information and Systems,
Vol.94, No.12, pp.2298-2308.
Nieplocha, J. and Carpenter, B., 1999. ARMCI: A
Portable Remote Memory Copy Library for
Distributed Array Libraries and Compiler Run-time
Systems. Proc. RTSPP of IPPS/SPDP’99.
Ridruejo, F. J. and Alonso, J. M., 2005. INSEE: An
Interconnection Network Simulation and Evaluation
Environment. Proc. 11th Euro-Par Parallel
Processing Conference 2005, Euro-Par’05, pp.1014-
1023.
Sumimoto, S., Ajima, Y., Saga, K., Nose, T., Shida, N.
and Nanri, T., 2016. The Design of Advanced
Communication to Reduce Memory Usage for Exa-
scale Systems. Proc. 12th International Meeting on
High Performance Computing for Computational
Science (accepted).
Susukita, R., Morie, Y., Nanri, T. and Shibamura, H.,
2015. Performance Evaluation of RDMA
Communication Patterns by Means of Simulations.
Proc. 2015 Joint International Mechanical, Electronic
and Information Technology Conference (JIMET
2015), pp.141-147.
Zheng, G., Kakulapati, G. and Kalé, L.V., 2004. BigSim:
A Parallel Simulator for Performance Prediction of
Extremely Large Parallel Machines. Parallel and
Distributed Processing Symposium, International,
Vol.1, p.78b.