packet per cycle metric and simulated cycles. For
category 2 applications, light bursty traffic, i.e.
traffic with a peak rate less than three times the
average, is added to nodes at random.
5 FUTURE WORK
We plan to try alternative distributions for the data
sets, especially the category 2 data sets, because the
power-law distribution fitted them less well than the
category 1 data sets. One possibility is to use piecewise
functions, fitting different power-law distributions to
different regions of the values of X. This corresponds
to fitting a piecewise linear function in the log-log
domain of the data sets. We also note that most of the
data points in the log-log domain in our tests gathered
around higher values of the ln X axis, with some
outliers at the low end of the axis. The high-end points
therefore dominate the least-squares fitting process.
We could give more importance to the low-end points by
weighting them more heavily than the high-end points.
This approach will also be tried in future work.
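The two ideas above, weighting the low-end points and fitting separate power laws per region, can be sketched as follows. This is a minimal illustration and not the fitting code used in this work: it assumes the data are available as pairs (x, y) with y = c * x^k to be fitted, and the function names, the weighting rule w = x^(-strength) and the single breakpoint are all hypothetical choices made for the example.

```python
import math


def fit_power_law(xs, ys, weights=None):
    """Fit y = c * x**k by least squares on (ln x, ln y); returns (c, k).

    With weights=None all points count equally (ordinary least squares).
    Otherwise the weighted sum of squared log-domain residuals is minimized.
    """
    if len(xs) < 2:
        raise ValueError("need at least two points to fit a line")
    if weights is None:
        weights = [1.0] * len(xs)
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    sw = sum(weights)
    mean_x = sum(w * u for w, u in zip(weights, lx)) / sw
    mean_y = sum(w * v for w, v in zip(weights, ly)) / sw
    sxx = sum(w * (u - mean_x) ** 2 for w, u in zip(weights, lx))
    sxy = sum(w * (u - mean_x) * (v - mean_y)
              for w, u, v in zip(weights, lx, ly))
    k = sxy / sxx                       # slope of the line in the log-log domain
    c = math.exp(mean_y - k * mean_x)   # intercept mapped back from ln c
    return c, k


def low_end_weights(xs, strength=1.0):
    """Hypothetical weighting rule: smaller x gets a larger weight."""
    return [x ** (-strength) for x in xs]


def fit_piecewise_power_law(xs, ys, breakpoint):
    """Fit one power law below the breakpoint and another at or above it."""
    low = [(x, y) for x, y in zip(xs, ys) if x < breakpoint]
    high = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint]
    low_fit = fit_power_law([p[0] for p in low], [p[1] for p in low])
    high_fit = fit_power_law([p[0] for p in high], [p[1] for p in high])
    return low_fit, high_fit
```

A weighted fit is then obtained with fit_power_law(xs, ys, low_end_weights(xs)). Each region of the piecewise fit needs at least two distinct x values, and how to choose the breakpoint is left open here.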
We also intend to analyse the distances between source
and destination nodes of packets. Preliminary results
show that the data follow a Gamma or log-normal
distribution, and that polynomial fitting can be a
viable solution. Moreover, for real applications the
average distance over all source-destination pairs in
packets seems to be higher than for uniform random
traffic. The interval between packets is another
possible topic; however, more applications need to be
analysed in the future. To show the effectiveness of the
proposed model, we aim to compare the generated traffic
with real application traffic using different metrics.
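The comparison between the average source-destination distance of a trace and that of uniform random traffic can be sketched as follows. The sketch rests on assumptions that are not taken from this work: a 2D mesh with row-major node numbering, distance measured as the Manhattan hop count, and a trace given as a list of (source, destination) node identifiers. All function names are hypothetical.

```python
from itertools import product


def hop_distance(src, dst, width):
    """Manhattan hop count between two node ids on a row-major 2D mesh."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def uniform_average_distance(width, height):
    """Average distance when every ordered pair of distinct nodes is equally likely."""
    n = width * height
    pairs = [(s, d) for s, d in product(range(n), repeat=2) if s != d]
    return sum(hop_distance(s, d, width) for s, d in pairs) / len(pairs)


def trace_average_distance(packets, width):
    """Average distance over the (source, destination) pairs of a packet trace."""
    return sum(hop_distance(s, d, width) for s, d in packets) / len(packets)
```

On an assumed 4x4 mesh the uniform random baseline is 8/3 hops, so a trace whose trace_average_distance exceeds that value sends its packets farther on average than uniform random traffic would.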
6 CONCLUSION
In this paper we investigated the detailed traffic
profiles of different parallel and high-performance
computing applications. We proposed a generic traffic
model based on the mathematical analysis of the traffic
traces. We found that parallel applications show
different traffic patterns; however, the patterns can be
categorized into groups, each associated with specific
parallel programming paradigms. Simulation results show
that both hot-spot and bursty traffic can be observed.
Several metrics concerning the applications were
studied. In addition, we found that the number of
packets injected by the nodes followed a power-law
distribution. The least-squares fitting method was
applied to obtain the parameters of the distribution of
packets injected by different nodes.
ICSOFT-EA 2015 - 10th International Conference on Software Engineering and Applications