Davies, A. and Orsaria, A. (2013). Scale out with
GlusterFS. Linux Journal, 2013(235).
Dean, J. and Ghemawat, S. (2004). MapReduce: Simplified
data processing on large clusters. In Proceedings of
the 6th Conference on Symposium on Operating Systems
Design & Implementation, OSDI’04, pages 10–10.
USENIX Association.
Delimitrou, C. and Kozyrakis, C. (2013). Paragon:
QoS-aware Scheduling for Heterogeneous Datacenters. In
Proceedings of the Eighteenth International Conference
on Architectural Support for Programming Languages
and Operating Systems, ASPLOS ’13, pages 77–88. ACM.
Delimitrou, C. and Kozyrakis, C. (2014). Quasar:
Resource-efficient and QoS-aware Cluster Management.
In Proceedings of the 19th International Conference
on Architectural Support for Programming Languages
and Operating Systems, ASPLOS ’14, pages 127–144.
ACM.
Ferguson, A. D., Bodik, P., Kandula, S., Boutin, E., and
Fonseca, R. (2012). Jockey: Guaranteed Job Latency
in Data Parallel Clusters. In Proceedings of the
7th ACM European Conference on Computer Systems,
EuroSys ’12, pages 99–112. ACM.
Gulisano, V., Jimenez-Peris, R., Patino-Martinez, M.,
Soriente, C., and Valduriez, P. (2012). StreamCloud: An
Elastic and Scalable Data Streaming System. IEEE
Trans. Parallel Distrib. Syst., 23(12):2351–2365.
Herodotou, H., Dong, F., and Babu, S. (2011a). No One
(Cluster) Size Fits All: Automatic Cluster Sizing for
Data-intensive Analytics. In Proceedings of the 2nd
ACM Symposium on Cloud Computing, SOCC ’11,
pages 18:1–18:14. ACM.
Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L.,
Cetin, F. B., and Babu, S. (2011b). Starfish: A
Self-tuning System for Big Data Analytics. In Proceedings
of the 5th Conference on Innovative Data Systems
Research, CIDR ’11. CIDR 2011.
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A.,
Joseph, A. D., Katz, R., Shenker, S., and Stoica, I.
(2011). Mesos: A Platform for Fine-grained Resource
Sharing in the Data Center. In Proceedings of the
8th USENIX Conference on Networked Systems Design
and Implementation, NSDI’11, pages 295–308.
USENIX Association.
Hupfeld, F., Cortes, T., Kolbeck, B., Stender, J., Focht,
E., Hess, M., Malo, J., Marti, J., and Cesario, E.
(2008). The XtreemFS architecture – a case for
object-based file systems in grids. Concurrency and
Computation: Practice and Experience, 20(17):2049–2060.
Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D.
(2007). Dryad: Distributed Data-parallel Programs
from Sequential Building Blocks. In Proceedings of
the 2nd ACM SIGOPS/EuroSys European Conference
on Computer Systems 2007, EuroSys ’07, pages 59–72.
ACM.
Jia, Z., Zhan, J., Wang, L., Han, R., McKee, S. A., Yang,
Q., Luo, C., and Li, J. (2014). Characterizing and
subsetting big data workloads. In Workload
Characterization (IISWC), 2014 IEEE International Symposium
on, pages 191–201. IEEE.
Li, H., Ghodsi, A., Zaharia, M., Shenker, S., and Stoica, I.
(2014). Tachyon: Reliable, memory speed storage for
cluster computing frameworks. In Proceedings of the
ACM Symposium on Cloud Computing, pages 1–15.
ACM.
Lohrmann, B., Janacik, P., and Kao, O. (2015). Elastic
Stream Processing with Latency Guarantees. In
Proceedings of the 35th IEEE International Conference
on Distributed Computing Systems, ICDCS’15, pages
399–410.
Meng, X., Bradley, J., Yavuz, B., Sparks, E.,
Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M.,
Owen, S., et al. (2016). MLlib: Machine learning in
Apache Spark. Journal of Machine Learning Research,
17(34):1–7.
Renner, T., Thamsen, L., and Kao, O. (2016). CoLoc:
Distributed data and container colocation for
data-intensive applications. In Big Data (Big Data), 2016
IEEE International Conference on, pages 1–6. IEEE.
Schwarzkopf, M., Konwinski, A., Abd-El-Malek, M., and
Wilkes, J. (2013). Omega: Flexible, Scalable
Schedulers for Large Compute Clusters. In Proceedings of
the 8th ACM European Conference on Computer
Systems, EuroSys ’13, pages 351–364. ACM.
Shvachko, K., Kuang, H., Radia, S., and Chansler, R.
(2010). The Hadoop distributed file system. In
Mass Storage Systems and Technologies (MSST),
2010 IEEE 26th Symposium on, pages 1–10.
Thamsen, L., Rabier, B., Schmidt, F., Renner, T., and
Kao, O. (2017). Scheduling Recurring Distributed
Dataflow Jobs Based on Resource Utilization and
Interference. In 2017 IEEE International Congress on
Big Data (BigData Congress), page to appear. IEEE.
Thamsen, L., Renner, T., and Kao, O. (2016a).
Continuously improving the resource utilization of iterative
parallel dataflows. In Distributed Computing Systems
Workshops (ICDCSW), 2016 IEEE 36th International
Conference on, pages 1–6. IEEE.
Thamsen, L., Verbitskiy, I., Schmidt, F., Renner, T., and
Kao, O. (2016b). Selecting resources for distributed
dataflow systems according to runtime targets. In
International Performance Computing and Communications
Conference (IPCCC), 2016 IEEE 35th International
Conference on, pages 1–6. IEEE.
Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S.,
Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H.,
Seth, S., Saha, B., Curino, C., O’Malley, O., Radia,
S., Reed, B., and Baldeschwieler, E. (2013). Apache
Hadoop YARN: Yet Another Resource Negotiator. In
Proceedings of the 4th Annual Symposium on Cloud
Computing, SOCC ’13, pages 5:1–5:16. ACM.
Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D.,
Tune, E., and Wilkes, J. (2015). Large-scale Cluster
Management at Google with Borg. In Proceedings of
the Tenth European Conference on Computer Systems,
EuroSys ’15, pages 18:1–18:17. ACM.
Warneke, D. and Kao, O. (2009). Nephele: Efficient
Parallel Data Processing in the Cloud. In Proceedings
of the 2nd Workshop on Many-Task Computing on
Grids and Supercomputers, MTAGS ’09, pages 8:1–8:10.
ACM.
DATA 2017 - 6th International Conference on Data Science, Technology and Applications