performed a barrier after the packing. By contrast, 
when the number of processes is 64, the latency of 
put operations increases if a barrier is performed 
after the packing, even in the inter-node-only simulation. 
In this case, the inter-node put operations may cause 
communication contention. 
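The contention effect described above can be illustrated with a toy model. The sketch below is not NSIM-ACE and does not use its interface. It assumes a single link shared equally among all active puts, and it assumes that a barrier after the packing makes every put start at the same time. The function name put_latencies and all parameter values are hypothetical.

```python
def put_latencies(start_times, size, bandwidth):
    """Latency of each put on one link shared equally among active puts.

    Toy fair-sharing model (an assumption, not the NSIM-ACE network model).
    """
    n = len(start_times)
    remaining = [float(size)] * n
    finish = [None] * n
    order = sorted(range(n), key=lambda i: start_times[i])
    now = float(start_times[order[0]])
    active = []
    nxt = 0  # position in `order` of the next put that has not started yet
    while nxt < n or active:
        # Admit every put whose start time has been reached.
        while nxt < n and start_times[order[nxt]] <= now:
            active.append(order[nxt])
            nxt += 1
        if not active:
            # Link is idle: jump to the next start time.
            now = float(start_times[order[nxt]])
            continue
        rate = bandwidth / len(active)  # equal share per active put
        t_finish = now + min(remaining[i] for i in active) / rate
        t_start = start_times[order[nxt]] if nxt < n else float("inf")
        new_now = min(t_finish, t_start)  # next event: a finish or a start
        dt = new_now - now
        now = new_now
        still_active = []
        for i in active:
            remaining[i] -= rate * dt
            if remaining[i] <= 1e-9:
                finish[i] = now
            else:
                still_active.append(i)
        active = still_active
    return [finish[i] - start_times[i] for i in range(n)]


# Hypothetical packing times of four processes (arbitrary time units).
pack_done = [0, 1, 2, 3]

# Without a barrier, each put starts as soon as its own packing is done.
no_barrier = put_latencies(pack_done, size=1.0, bandwidth=1.0)

# With a barrier, every put starts when the slowest packing is done.
with_barrier = put_latencies([max(pack_done)] * 4, size=1.0, bandwidth=1.0)

print(no_barrier)    # [1.0, 1.0, 1.0, 1.0]
print(with_barrier)  # [4.0, 4.0, 4.0, 4.0]
```

In this toy setting the staggered puts never overlap, while the synchronized puts share the link four ways, so each put takes four times as long. The simulated results in the paper come from a detailed network model, so this sketch only shows the qualitative mechanism.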
5 CONCLUSIONS 
In this paper, we introduced NSIM-ACE, a new 
interconnection network simulator for RDMA 
evaluation. We implemented it by extending NSIM, a 
simulator for large-scale interconnection networks. 
NSIM-ACE has a user-friendly interface in which 
the communication pattern is given in a way similar 
to RDMA-based parallel programs. We performed 
three experiments for evaluating the simulation 
accuracy and predicting performance scalability. 
The experiment on random ring bandwidth shows 
that the simulator reproduces bandwidth degradation 
due to communication contention. The experiment 
on the synchronization barrier indicates that the 
simulation accuracy is sufficient to compare the 
performance of RDMA-based algorithms and to identify 
their characteristics. In addition, NSIM-ACE 
can predict which algorithm performs better for a 
communication pattern appearing in a particle simulation. 
ACKNOWLEDGEMENTS 
This work is supported by the Core Research for 
Evolutional Science and Technology (CREST) 
Program of the Japan Science and Technology Agency 
(JST), Research Area “Development of System 
Software Technologies for post-Peta Scale High 
Performance Computing”, Research Theme 
“Development of Scalable Communication Library 
with Technologies for Memory Saving and Runtime 
Optimization”. Part of the computation was carried 
out using the computer facilities at the Research 
Institute for Information Technology, Kyushu 
University. 
REFERENCES 
Adiga, N. R. et al., 2005. Blue Gene/L Torus 
Interconnection Network. IBM Journal of Research 
and Development, Vol.49, pp.265-276. 
Bonachea, D., 2002. GASNet Specification, v1.1. U.C. 
Berkeley Tech Report (UCB/CSD-02-1207). 
Casanova, H., Giersch, A., Legrand, A., Quinson, M. and 
Suter, F., 2014. Versatile, Scalable, and Accurate 
Simulation of Distributed Applications and Platforms. 
Journal of Parallel and Distributed Computing, 
Vol.74, No.10, pp. 2899-2917. 
Choudhury, N., Mehta, Y., Wilmarth, T. L., Bohm, E. J. 
and Kale, L.V., 2005. Scaling an Optimistic Parallel 
Simulation of Large-scale Interconnection Networks. 
Proc. 37th Conference on Winter Simulation, 
WSC ’05, pp.591-600. 
Luszczek, P. R., Bailey, D. H., Dongarra, J. J., Kepner, J., 
Lucas, R. F., Rabenseifner, R. and Takahashi, D., 
2006. The HPC Challenge (HPCC) Benchmark Suite. 
Proc. 2006 ACM/IEEE Conference on 
Supercomputing, SC ’06. 
Miwa, H. et al., 2011. NSIM: An Interconnection Network 
Simulator for Extreme-Scale Parallel Computers, 
IEICE Transactions on Information and Systems, 
Vol.94, No.12, pp.2298-2308. 
Nieplocha, J. and Carpenter, B., 1999. ARMCI: A 
Portable Remote Memory Copy Library for 
Distributed Array Libraries and Compiler Run-time 
Systems. Proc. RTSPP of IPPS/SPDP’99. 
Ridruejo, F. J. and Alonso, J. M., 2005. INSEE: An 
Interconnection Network Simulation and Evaluation 
Environment. Proc. 11th Euro-Par Parallel 
Processing Conference 2005, Euro-Par’05, 
pp.1014-1023. 
Sumimoto, S., Ajima, Y., Saga, K., Nose, T., Shida, N. 
and Nanri, T., 2016. The Design of Advanced 
Communication to Reduce Memory Usage for Exa-
scale Systems. Proc. 12th International Meeting on 
High Performance Computing for Computational 
Science (accepted). 
Susukita, R., Morie, Y., Nanri, T. and Shibamura, H., 
2015. Performance Evaluation of RDMA 
Communication Patterns by Means of Simulations, 
Proc. 2015 Joint International Mechanical, Electronic 
and Information Technology Conference (JIMET 
2015), pp.141-147. 
Zheng, G., Kakulapati, G. and Kalé, L.V., 2004. BigSim: 
A Parallel Simulator for Performance Prediction of 
Extremely Large Parallel Machines. Parallel and 
Distributed Processing Symposium, International, 
Vol.1, p.78b.