Non-functional Requirements for Real World Big Data Systems
An Investigation of Big Data Architectures at Facebook, Twitter and Netflix
Thalita Vergilio and Muthu Ramachandran
School of Computing, Creative Technologies and Engineering, Leeds Beckett University, Leeds, U.K.
Keywords: Big Data, Facebook, Twitter, Netflix, Scalability, Distribution, Fault Tolerance, Processing Guarantees.
Abstract: This research represents a unique contribution to the field of Software Engineering for Big Data in the form
of an investigation of the big data architectures of three well-known real-world companies: Facebook, Twitter
and Netflix. The purpose of this investigation is to gather significant non-functional requirements for real-
world big data systems, with the aim of addressing these requirements in the design of our own unique
architecture for big data processing in the cloud: MC-BDP (Multi-Cloud Big Data Processing). MC-BDP
represents an evolution of the PaaS-BDP architectural pattern, previously developed by the authors. However,
its presentation is not within the scope of this paper. The scope of this comparative study is limited to the
examination of academic papers, technical blogs, presentations, source code and documentation officially
published by the companies under investigation. Ten non-functional requirements are identified and discussed
in the context of these companies’ architectures: batch data, stream data, late and out-of-order data, processing
guarantees, integration and extensibility, distribution and scalability, cloud support and elasticity, fault-
tolerance, flow control, and flexibility and technology agnosticism. They are followed by the conclusion and
considerations for future work.
1 INTRODUCTION
Big data is defined as data that challenges existing technology by being too large in volume, arriving too fast, or being too varied in structure. Big data is also characterised
by its complexity, with associated issues and
problems that challenge current data science
processes and methods (Cao, 2017). Large internet-
based companies have the biggest and most complex
data, which explains their leading role in the
development of state-of-the-art big data technology.
This paper contributes to the existing knowledge in the area of Software Engineering for Big Data by searching the literature published by three major companies and outlining the strategies they devised to cope with the technological challenges posed by big data in their production systems. Non-functional requirements
are important quality attributes which influence the
architectural design of a system (Chung and Prado
Leite, 2009). Ten non-functional requirements for big
data systems are identified and discussed in the
context of these real-world implementations. These
requirements are used to guide the design and
development of a new architecture for big data
processing in the cloud: MC-BDP. The presentation,
evaluation and discussion of MC-BDP shall be
addressed in a future paper.
The companies targeted for this study are
Facebook, Twitter and Netflix. The methodology
used in this comparative study is explained in Section
2, which covers the scope of this research, as well as
the selection criteria used. Section 3 presents the non-functional requirements and discusses how they are implemented by the three companies in their production systems. Finally, Section 4 presents the conclusion and considerations for future work.
2 METHODOLOGY
This section starts by defining the scope of this
research. This is followed by an explanation of the
criteria used to select the companies under
examination.
2.1 Scope
The scope of this paper is limited to academic papers,
technical documentation, and presentations or blog
posts officially published by the companies under
evaluation. Table 1 shows a summary of the materials
used as source for this research, classified by type.
Table 1: Classification and summary of source materials.

Company    Academic Paper   Technical Blog   Presentation   Code/Documentation
Facebook         2                2                1                 1
Twitter          2                2                1                 3
Netflix          1                8                2                 0
2.2 Selection Criteria
An initial survey was conducted, limited to peer-
reviewed academic papers. Three search engines
were primarily used to perform the searches: Google
Scholar, IEEE Xplore Digital Library and ACM
Digital Library. The initial survey searched for terms
such as “big data”, “big data processing”, “big data
software” and “big data architecture”. For the sake of
thoroughness, synonyms were used to replace key
terms where appropriate, e.g. “system” for
“software”.
The first classification which became apparent
was in terms of who developed the solutions
presented. The results found comprised technologies
developed 1) by academia, 2) by real-world big data
companies, 3) by industry experts as open-source
projects, or 4) by a combination of the above. This
research focuses on category number 2.
A further classification can be drawn from the
academic papers reviewed, this time in terms of how
the contributions presented were evaluated. Three
cases were encountered:
A) cases where there is no empirical evaluation of the
proposed solution.
B) cases where the empirical evaluation of the
proposed solution is purely experimental.
C) cases where peer-reviewed published material
was found describing the results of implementing
the proposed solution in large-scale commercial
big data settings.
In order to select suitable companies to include in this
study, the focus of this research was limited to
category C.
Three companies were selected according to the criteria characterised above: Facebook, Twitter and
Netflix. These were selected from a wider pool of
qualifying companies which included Microsoft
(Eliot, 2010), (Bernstein et al., 2014), Google
(Akidau et al., 2013), (Akidau et al., 2015), and
Santander (Cheng et al., 2015). The rationale for
choosing the three aforementioned companies is
based on the quantity, quality and clarity of the
information encountered, as well as availability of
technical material online such as project
documentation and architectural diagrams.
3 NON-FUNCTIONAL
REQUIREMENTS
This section presents ten non-functional requirements
discussed in the literature published by the three
companies selected in section 2.2. It then examines
how they implemented these requirements and
compares the different solutions.
3.1 Batch Data
This requirement refers to the capability to process
data which is finite and usually large in volume, e.g.
data archived in distributed file systems or databases.
Both Facebook and Twitter estimate that the finite
data they hold on disk reaches hundreds of petabytes,
with a daily processing volume of tens of petabytes
(Krishnan, 2016). Netflix’s big data is one order of
magnitude smaller, with tens of petabytes in store and
daily reads of approximately 3 petabytes (Gianos and
Weeks, 2016).
Facebook uses a combination of four independent but communicating systems to manage its stored data: an Operational Data Store (ODS), Scuba, Hive and Laser (Chen et al., 2016).
Twitter’s batch data is stored in Hadoop clusters
and traditional databases, and is processed using
Scalding and Presto (Krishnan, 2016). Scalding is a
Scala library developed in-house to facilitate the
specification of map-reduce jobs (Twitter, Inc.,
2018). Presto, on the other hand, was originally
developed by Facebook. It was open-sourced in 2013
(Pearce, 2013), and has since been adopted not only
by Twitter, but also by Netflix (Tse et al., 2014).
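To illustrate the style of batch job these tools specify, the sketch below shows a minimal word count written directly against the Hadoop MapReduce API in Java. It is purely illustrative and is not taken from Twitter's codebase; Scalding expresses equivalent logic far more concisely in Scala.

    // Illustrative only: a minimal word-count mapper and reducer for Hadoop
    // MapReduce. The equivalent Scalding job would be a few lines of Scala.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                // Emit (word, 1) for every token in the input line.
                for (String token : line.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        context.write(new Text(token), ONE);
                    }
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                    throws IOException, InterruptedException {
                // Sum the partial counts produced for each word.
                int sum = 0;
                for (IntWritable count : counts) {
                    sum += count.get();
                }
                context.write(word, new IntWritable(sum));
            }
        }
    }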
Unlike the previous two companies, Netflix's Hadoop installation is cloud-based, and it
uses an in-house developed system called Genie to
manage query jobs submitted via Hadoop, Hive or
Pig. Data is also persisted in Amazon S3 databases
(Krishnan and Tse, 2013).
3.2 Stream Data
This requirement refers to the capability to process
data which is potentially infinite and usually flowing
at high velocity, e.g. monitoring data captured and
processed in real-time, or close to real-time.
Stream processing at Facebook is done by a suite
of in-house developed applications: Puma, Swift and
Stylus. Puma is a stream processing application with
a SQL-like query language optimised for compiled
queries. Swift is a much simpler application, used for
checkpointing. Finally, Stylus is a stream processing
framework which combines stateful or stateless units
of processing into more complex DAGs (Chen et al.,
2016).
Storm, one of the most popular stream processing
frameworks in use today, was developed by Twitter
(Toshniwal et al., 2014). Less than five years after the
initial release of Storm, however, Twitter announced
that it had replaced it with a better performing system,
Heron, and that Storm had been officially
decommissioned (Fu et al., 2017). Heron uses Mesos,
an open-source cluster management tool designed for
large clusters. It also uses Aurora, a Mesos
framework developed by Twitter to schedule jobs on
a distributed cluster.
Netflix also uses Mesos to manage its large cluster
of cloud resources. Scheduling is done by a custom
library called Fenzo, whereas stream processing is
done by Mantis, which is also custom-developed.
3.3 Late and Out-of-Order Data
This requirement relates to stream processing and
refers to the capability to process data which arrives
late or in a different order from that in which it was
emitted. Streaming data from mobile users, for
example, could be delayed if the user loses reception
for a moment. In order to handle late and out of order
data, a system must have been designed with this
requirement in mind.
All three streaming architectures utilise the
concept of windows of data to transform infinite
streaming data into finite windows that can be
processed individually.
For handling late and out of order data,
Facebook’s Stylus utilises low watermarks. No
mention was found in Twitter Heron’s academic
paper of whether it provides a mechanism for dealing
with late or out of order data. However, looking at the
source code for the Heron API, the
BaseWindowedBolt class, merged into the master
project in 2017, has a method called withLag(), which
allows the developer to specify the maximum amount
of time by which a record can be out of order (Peng,
2017).
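A minimal sketch of how this might be used is shown below. It assumes an interface analogous to Apache Storm's windowed bolt API, on which Heron's implementation is modelled; the spout, bolt and field names are hypothetical.

    // Illustrative sketch: declaring a windowed bolt with a maximum permitted lag.
    // Assumes an API analogous to Apache Storm's BaseWindowedBolt, on which
    // Heron's windowing support is modelled; exact package names may differ.
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseWindowedBolt.Duration;

    public class LateDataTopology {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("events", new EventSpout(), 1);     // hypothetical spout
            builder.setBolt("windowedCounts",
                    new CountingWindowedBolt()                   // hypothetical bolt extending BaseWindowedBolt
                            // Tumbling 30-second windows based on an event-time field.
                            .withTumblingWindow(Duration.seconds(30))
                            .withTimestampField("eventTime")
                            // Records arriving up to 10 seconds out of order are
                            // still assigned to the correct window.
                            .withLag(Duration.seconds(10)),
                    2).shuffleGrouping("events");
        }
    }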
No mention was found in documentation
published by Netflix of Mantis’s strategy for dealing
with late and out of order data. Because the source
code for Mantis is proprietary, further investigation
was limited.
3.4 Processing Guarantees
This requirement refers to a stream system’s
capability to offer processing guarantees, i.e. exactly
once, at least once and at most once. While exactly
once processing is ideal, it comes at a cost which
could translate into increased latency.
Exactly once semantics involves some level of
checkpointing to persist state. There is therefore an
inherent latency cost associated with it, which is why
not all use-cases are implemented this way. Scuba at
Facebook, for example, is a system where data is
intended to be sampled, so completeness of the data
is not a requirement. Duplication, however, would not
be acceptable. In this case, at most once is a more
fitting processing guarantee than exactly once (Chen
et al., 2016). Stylus is the only real-time system at Facebook designed with optimisations to provide at least once processing semantics. This is enabled by the use of Scribe as the messaging system, with Swift providing checkpointing (Chen et al., 2016).
At Twitter, both Storm and its successor, Heron,
offered at least once and at most once guarantees.
Identified as a shortcoming by Kulkarni et al. (2015),
the lack of exactly once semantics in Heron was
recently addressed and implemented as “effectively
once semantics”. This means that data may be
processed more than once (the topology would
undergo a rewind in case of failure), but it is only
delivered once (Heron Documentation, 2018).
Netflix uses Kafka as its stream platform and
messaging system (Wu et al., 2016), which means it
provides inherent support for exactly once processing
through idempotency and atomic transactions
(Woodie, 2017).
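As a brief sketch (not Netflix's code), the producer-side configuration that enables these guarantees in Kafka typically looks as follows; the broker address, topic name and transactional id are illustrative values:

    // Minimal sketch of a Kafka producer configured for exactly-once semantics
    // via idempotence and transactions (available since Kafka 0.11).
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ExactlyOnceProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("enable.idempotence", "true");       // broker deduplicates retried sends
            props.put("transactional.id", "events-tx-1");  // enables atomic multi-record writes

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                try {
                    producer.beginTransaction();
                    producer.send(new ProducerRecord<>("events", "view-1", "played title 42"));
                    producer.commitTransaction();          // records become visible atomically
                } catch (Exception e) {
                    producer.abortTransaction();           // nothing is exposed on failure
                }
            }
        }
    }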
3.5 Integration and Extensibility
This requirement refers to the capability to integrate
with existing services and components. It also refers
to provisions made to facilitate the extension of the
existing architecture to incorporate different
components in the future.
Although Facebook’s real-time architecture is
composed of many systems, they are integrated
thanks to Scribe. Scribe works as a messaging system:
all of Facebook’s streaming systems write to Scribe,
and they also read from Scribe. This allows for the
creation of complex pipelines to cater for a multitude
of use-cases (Chen et al., 2016). In terms of
extensibility, any service developed to use Scribe as
data source and output could integrate seamlessly
with Facebook’s architecture.
As part of a process to make Heron open-source,
Twitter introduced a number of improvements to
make it more flexible and adaptable to different
infrastructures and use-cases. By adopting a general-
purpose modular architecture, Heron achieved
significant decoupling between its internal
components, and increased its potential for adoption
and extension by other companies (Fu et al., 2017).
Netflix's high-level architecture is somewhat rigid
in that there is no alternative to using Mesos as an
orchestration and cluster management tool.
Additionally, Titus, Netflix's container management platform, must run as a single framework on top of Mesos. This limitation, however, was introduced
by design. With Titus running as a single framework
on Mesos, it can allocate tasks more efficiently, with
visibility of resources across the entire cluster (Leung
et al., 2017). At the time of writing, Titus is not yet
open-source, so decoupling its components from
Netflix-specific infrastructure and use-cases is not a
requirement.
3.6 Distribution and Scalability
This requirement refers to the capability to distribute
data processing amongst different machines, located
in different data centres, in a multi-clustered
architecture. Dynamic scaling, i.e. the ability to add nodes to or remove nodes from a running system without any downtime, is also considered part of this requirement.
Scalability was one of the driving factors behind
the development of Scribe as a messaging system at
Facebook. Similarly to Kafka, Scribe can be scaled
up by increasing the number of buckets (brokers)
running, thus increasing the level of parallelism
(Chen et al., 2016). There is no mechanism in place
for dynamic scaling of Puma and Stylus systems
(Chen et al., 2016).
Heron was developed as a more efficient and
scalable alternative to Storm. Heron uses an in-house
developed proprietary framework called Dhalion to
help determine whether the cluster needs to be scaled
up or down (Graham, 2017).
As Netflix’s architecture is cloud-based, it is
inherently elastic and scalable. Fenzo is responsible
for dynamically scaling resources by adding or
removing EC2 nodes to the Mesos infrastructure as
needed (Schmaus et al., 2016).
3.7 Cloud Support and Elasticity
This requirement refers to the capability to move the
architecture (or part of it) into the cloud to take
advantage of the many benefits associated with its
economies of scale. Elasticity in particular is a cloud
property which allows a system to scale up and down
according to demand. Since the user only pays for
resources actually used, there is less wastage and it is
theoretically cheaper than running the entire
infrastructure locally with enough idle capacity to
cover occasional spikes in demand. While scalability can be
gained by increasing the number of nodes in any
traditional architecture, it becomes much more
powerful when combined with the elasticity of the
cloud.
Based on the material examined, Netflix's architecture is the only one which is predominantly cloud-based. Having started with services running on AWS
virtual machines, they are now undergoing a shift
towards a container-based approach, with a few
services now running in containers on AWS
infrastructure (Leung et al., 2017). Twitter has also
undergone a shift towards a containerised
architecture, albeit not cloud-based, with the
development and implementation of Heron. As
containers become more widespread, the risk of
vendor lock-in is lowered, since containers enable the
decoupling of the processing framework from the
infrastructure they run in. Future migration to a safer
multi-cloud setup is not only possible, but desirable
(Vergilio and Ramachandran, 2018).
3.8 Fault Tolerance
This requirement refers to the capability of a system
to continue to operate should one or more nodes fail.
Ideally, the system should recover gracefully, with
minimal repercussions on the user experience.
Fault-tolerance is a requirement of Facebook’s
real-time systems, currently implemented through
node independence, and by using Scribe for all
communication between systems. Scribe persists data
to disk and is backed by Swift, a stream platform
designed to provide checkpointing for Scribe (Chen
et al., 2016).
At Twitter, fault tolerance is addressed at different
levels. At architectural level, a modular distributed
architecture provides better fault tolerance than a
monolithic design. At container level, resource
provisioning and job scheduling are decoupled, with
the scheduler responsible for monitoring the status of
running containers and for trying to restart any failed
ones, along with the processes they were running. At
JVM level, Heron limits task processing to one per
JVM. This way, should failure occur, it is much easier
to isolate the failed task and the JVM where it was
running (Fu et al., 2017). At topology level, the
management of running topologies is decentralised,
with one Topology Master per topology, which
means failure of one topology does not affect others
(Kulkarni et al., 2015).
As Netflix’s production systems are cloud-based,
fault tolerance is addressed from the perspective of a
cloud consumer. The Active-Active project was
launched by Netflix with the aim of achieving fault
tolerance through isolation and redundancy by
deploying services across two AWS regions in the US: US-East-1 and US-West-2 (Meshenberg et
al., 2013). This project was later expanded to
incorporate the EU-West-1 region, as European
locations were still subject to single points of
failure (Stout, 2016). With this latest development,
traffic could be routed between any of the three
regions across the globe, increasing the resilience of
Netflix’s architecture.
3.9 Flow Control
This requirement refers to the capability to handle
scenarios where the data source is emitting records
faster than the system can consume them. Architectures
typically provide strategies for dealing with
backpressure, e.g. dropping records, sampling,
applying source backpressure, etc.
All real-time systems at Facebook read and write
to Scribe. As described by Chen et al., this central use
of a persistent messaging system makes Facebook’s
real-time architecture resilient to backpressure. Since
nodes are independent, if one node slows down, the
job is simply allocated to a different node, instead of
slowing down the whole pipeline (Chen et al., 2016). The
exact strategy used by Scribe to implement flow
control is not made explicit in the paper.
Heron was designed with a flow control
mechanism as an improvement over Storm, where
producers dropped data if consumers were too busy
to receive it. When Heron is in backpressure mode,
the Stream Manager throttles the furthest upstream
component (the spout) to slow down the flow of data
through the topology. The data processing speed is
thus reduced to the speed of the slowest component.
Once backpressure is relieved and Heron exits
backpressure mode, the spout is set back to emit
records at its normal rate (Kulkarni et al., 2015).
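The sketch below conveys the general idea of this mechanism in simplified form; it is purely conceptual and does not reflect Heron's actual implementation.

    // Purely conceptual: a spout that stops emitting while a backpressure
    // signal from the (hypothetical) stream manager is active, so that the
    // topology's throughput falls to the rate of its slowest component.
    import java.util.concurrent.atomic.AtomicBoolean;

    public class ThrottledSpout {
        private final AtomicBoolean backpressure = new AtomicBoolean(false);

        // Invoked by the stream manager when downstream buffers fill up or drain.
        public void onBackpressureSignal(boolean active) {
            backpressure.set(active);
        }

        // Invoked repeatedly by the framework to request the next record.
        public void nextTuple(Source source, Collector collector) {
            if (backpressure.get()) {
                return;                    // skip emission until backpressure is relieved
            }
            collector.emit(source.poll());
        }

        // Minimal placeholder types to keep the sketch self-contained.
        public interface Source { Object poll(); }
        public interface Collector { void emit(Object record); }
    }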
Mantis jobs at Netflix are written using
ReactiveX, a collection of powerful open-source
reactive libraries for the JVM (Christiansen and
Husain, 2013). RxJava, one of the libraries in
ReactiveX originally developed by Netflix, offers a
variety of strategies for dealing with backpressure, such as the concept of a cold
observable, which only starts emitting data if it is
being observed, and at a rate controlled by the
observer. For hot observables which emit data
regardless of whether or not they are being observed,
RxJava provides the options to buffer, sample,
debounce or window the incoming data (Gross and
Karnok, 2016).
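As a brief illustration (again not Netflix's code), two of these strategies can be expressed in RxJava 1.x as follows:

    // Illustrative sketch of two RxJava 1.x backpressure strategies:
    // buffering excess items, and sampling a fast source at a fixed interval.
    import java.util.concurrent.TimeUnit;
    import rx.Observable;

    public class BackpressureExamples {
        public static void main(String[] args) throws InterruptedException {
            // A source emitting one item per millisecond, faster than the consumer.
            Observable<Long> fast = Observable.interval(1, TimeUnit.MILLISECONDS);

            // Strategy 1: buffer items until the slow consumer catches up.
            fast.onBackpressureBuffer()
                .subscribe(BackpressureExamples::slowConsumer);

            // Strategy 2: sample the stream, keeping only the latest item every 100 ms.
            fast.sample(100, TimeUnit.MILLISECONDS)
                .subscribe(BackpressureExamples::slowConsumer);

            Thread.sleep(1000);            // let the asynchronous streams run briefly
        }

        private static void slowConsumer(Long value) {
            try {
                Thread.sleep(10);          // simulate a slow downstream operation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println(value);
        }
    }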
3.10 Flexibility and Technology
Agnosticism
This criterion refers to the capability of an
architecture to use different technology in place of
existing components.
Out of the three architectures investigated,
Facebook’s setup is the least flexible and the least
technologically agnostic. With the exception of Hive
and its ODS, built on HBase (Tang, 2012),
Facebook’s data systems were developed in-house to
cater for very specific use-cases. This is perhaps the
reason why, at the time of writing, only Scribe has
been made open-source (Johnson, 2008), although it
was not developed further, and the source-code is
archived (Facebook Archive, 2014).
Heron’s modular architecture is flexible by
design, and the technologies chosen for Twitter’s
particular implementation, Aurora and Mesos, are not
compulsory for other implementations. Heron’s
flexibility is evidenced by its adoption by large scale
companies such as Microsoft (Ramasamy, 2016), and
its technology agnosticism is evidenced by its
successful implementation on a Kubernetes cluster
(Kellogg, 2017).
At programming level, Netflix is an active participant in the Reactive Streams initiative, which aims to standardise reactive libraries and render them interoperable. Considering that JDK
9, released in September 2017, is also compatible
with Reactive Streams, there is potential for Mantis’s
jobs to be defined in standard Java.
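A brief sketch of the standard Flow API introduced in JDK 9 (illustrative only, and unrelated to Mantis's internals) shows the request-based backpressure model that Reactive Streams standardises:

    // Illustrative sketch of the java.util.concurrent.Flow API (JDK 9), which
    // adopts the Reactive Streams interfaces: the subscriber controls the pace
    // by requesting items explicitly, one at a time.
    import java.util.List;
    import java.util.concurrent.Flow;
    import java.util.concurrent.SubmissionPublisher;

    public class FlowDemo {
        public static void main(String[] args) throws InterruptedException {
            try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
                publisher.subscribe(new Flow.Subscriber<String>() {
                    private Flow.Subscription subscription;

                    @Override public void onSubscribe(Flow.Subscription s) {
                        subscription = s;
                        s.request(1);                 // demand one item: backpressure by request
                    }
                    @Override public void onNext(String item) {
                        System.out.println(item);
                        subscription.request(1);      // ask for the next item only when ready
                    }
                    @Override public void onError(Throwable t) { t.printStackTrace(); }
                    @Override public void onComplete() { System.out.println("done"); }
                });

                List.of("a", "b", "c").forEach(publisher::submit);
            }
            Thread.sleep(100);                         // allow asynchronous delivery to finish
        }
    }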
At cloud infrastructure level, the use of containers
as a deployment abstraction reduces the tight
coupling between Netflix’s artifacts and specific
virtual machine offerings provided by AWS. This is
defined by Leung et al. (2017) as a shift to a more
application-centric deployment.
At architecture level, because Titus is not open-
source, it is difficult to evaluate whether essential parts of its architecture, such as Mantis, Fenzo or the Mesos cluster, could be replaced with equivalents. Work, however, is under way to make the
project open-source (Netflix TechBlog, 2017), which
could attract important contributions from the
community and enhance its flexibility and technology
agnosticism.
4 CONCLUSION AND FUTURE
WORK
This paper presented the results of a literature search
for non-functional requirements relevant to real-
world big-data implementations. Three companies
were selected for this comparative study: Facebook,
Twitter and Netflix. Their specific implementations
of the non-functional requirements selected were
compared and discussed in detail, and are
summarised in this section.
Facebook and Twitter process the largest volume
of data, with Twitter having the lowest requirement
for latency. These two architectures were also
explicitly designed to handle late and out of order
data. In terms of processing guarantees, all three
architectures support exactly-once semantics.
Although the existing systems at Facebook and
Netflix are integrated, they were not designed as a
unified modular framework. Heron, on the other
hand, was developed by Twitter as an improvement
over Storm, which suffered from bottlenecks and
single points of failure. Heron’s modular architecture
makes it more flexible and technologically agnostic,
as well as a stronger candidate for adoption by other
companies.
Unlike Facebook and Twitter, which provide mechanisms for scalability and fault tolerance within their own infrastructures, Netflix approaches these concerns from a cloud consumer's perspective,
since its architecture is cloud-based. Netflix’s
deployments are distributed over multiple regions,
although support for multi-cloud is still lacking.
All three architectures provide mechanisms for
flow control. Facebook and Twitter control
backpressure at infrastructure level, whereas
Netflix provides methods and constructs to achieve
this programmatically.
Our next step in this research is to use the non-
functional requirements discussed in this study to
guide the design and implementation of a new
architecture for big data processing in the cloud: MC-
BDP. MC-BDP is an evolution of the PaaS-BDP
architectural pattern originally proposed by the
authors. While PaaS-BDP introduced a framework-
agnostic programming model and enabled different
frameworks to share a pool of location and provider-
independent resources (Vergilio and Ramachandran,
2018), MC-BDP expands this model by explicitly
prescribing a pooled environment where nodes are
deployed to multiple clouds.
ACKNOWLEDGEMENTS
This work made use of the Open Science Data Cloud
(OSDC) which is an Open Commons Consortium
(OCC)-sponsored project.
Cloud computing resources were provided by a
Microsoft Azure for Research award.
Container and cloud native technologies were
provided by Weaveworks.
REFERENCES
Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S.,
Fernández-Moctezuma, R. J., Lax, R., McVeety, S.,
Mills, D., Perry, F., Schmidt, E. and Whittle, S. (2015)
The Dataflow Model: A Practical Approach to
Balancing Correctness, Latency, and Cost in Massive-
Scale, Unbounded, Out-of-Order Data Processing.
Proceedings of the VLDB Endowment, 8, pp. 1792–
1803.
Akidau, T., Whittle, S., Balikov, A., Bekiroğlu, K.,
Chernyak, S., Haberman, J., Lax, R., McVeety, S.,
Mills, D. and Nordstrom, P. (2013) MillWheel: Fault-
Tolerant Stream Processing at Internet Scale.
Proceedings of the VLDB Endowment, 6 (11) August,
pp. 1033–1044.
Bernstein, P., Bykov, S., Geller, A., Kliot, G. and Thelin, J.
(2014) Orleans: Distributed Virtual Actors for
Programmability and Scalability [Online]. Available
from: <https://www.microsoft.com/en-us/research/pu
blication/orleans-distributed-virtual-actors-for-progra
mmability-and-scalability/>.
Cao, L. (2017) Data Science: Challenges and Directions.
Communications of the ACM, 60 (8) July, pp. 59–68.
Chen, G. J., Wiener, J. L., Iyer, S., Jaiswal, A., Lei, R.,
Simha, N., Wang, W., Wilfong, K., Williamson, T. and
Yilmaz, S. (2016) Realtime Data Processing at
Facebook. In: Proceedings of the 2016 International
Conference on Management of Data, 2016. New York,
NY, USA: ACM, pp. 1087–1098.
Cheng, B., Longo, S., Cirillo, F., Bauer, M. and Kovacs, E.
(2015) Building a Big Data Platform for Smart Cities:
Experience and Lessons from Santander. In: 2015 IEEE
International Congress on Big Data, June 2015. pp.
592–599.
Christiansen, B. and Husain, J. (2013) Reactive
Programming in the Netflix API with RxJava. Netflix
TechBlog, 4 December [Online blog]. Available from:
<https://medium.com/netflix-techblog/reactive-
programming-in-the-netflix-api-with-rxjava-
7811c3a1496a> [Accessed 16 February 2018].
Chung, L. and Prado Leite, J. C. (2009) Conceptual
Modeling: Foundations and Applications. Berlin,
Heidelberg: Springer-Verlag, pp. 363–379.
Eliot, S. (2010) Microsoft Cosmos: Petabytes Perfectly
Processed Perfunctorily [Online blog]. Available from:
<https://blogs.msdn.
microsoft.com/seliot/2010/11/05/microsoft-cosmos-
petabytes-perfectly-processed-perfunctorily/>
[Accessed 24 January 2018].
Facebook Archive (2014) Scribe [Online]. Facebook
Archive. Available from: <https://github. com/face
bookarchive/scribe> [Accessed 15 February 2018].
Fu, M., Agrawal, A., Floratou, A., Graham, B., Jorgensen,
A., Li, M., Lu, N., Ramasamy, K., Rao, S. and Wang,
C. (2017) Twitter Heron: Towards Extensible
Streaming Engines. In: 2017 IEEE 33rd International
Conference on Data Engineering (ICDE), April 2017.
pp. 1165–1172.
Gianos, T. and Weeks, D. (2016) Petabytes Scale Analytics
Infrastructure @Netflix [Online]. Presented at: QCon,
August 11, 2016, San Francisco. Available from:
<https://www.infoq.com/presentations/netflix-big-
data-infrastructure> [Accessed 12 February 2018].
Graham, B. (2017) From Rivulets to Rivers: Elastic Stream
Processing in Heron [Online]. Available from:
<https://www.slideshare.net/billonahill/from-rivulets-
to-rivers-elastic-stream-processing-in-heron>
[Accessed 14 February 2018].
Gross, D. and Karnok, D. (2016) Backpressure [Online].
ReactiveX/RxJava Wiki. Available from:
<https://github.com/ReactiveX/RxJava/wiki/Backpress
ure> [Accessed 15 February 2018].
Heron Documentation - Heron Delivery Semantics (2018)
[Online]. Available from: <https://twitter.github.io/
heron/docs/concepts/delivery-semantics/> [Accessed
14 February 2018].
Johnson, R. (2008) Facebook’s Scribe Technology Now
Open Source. Facebook Code, 24 October [Online
blog]. Available from: <https://code.facebook.com/
posts/214389698718537/facebook-s-scribe-technology
-now-open-source/> [Accessed 15 February 2018].
Kellogg, C. (2017) The Heron Stream Processing Engine
on Google Kubernetes Engine. Streamlio, 28
November [Online blog]. Available from:
<https://streaml.io/blog/heron-on-gke-power-by-
kubernetes/> [Accessed 15 February 2018].
Krishnan, S. (2016) Discovery and Consumption of
Analytics Data at Twitter. 29 June [Online blog].
Available from: <https://blog.twitter.
com/engineering/en_us/topics/insights/2016/discovery
-and-consumption-of-analytics-data-at-twitter.html>
[Accessed 9 February 2018].
Krishnan, S. and Tse, E. (2013) Hadoop Platform as a
Service in the Cloud. The Netflix Tech Blog, 10 January
[Online blog]. Available from: <http://techblog.netflix.
com/2013/01/hadoop-platform-as-service-in-cloud.ht
ml> [Accessed 30 October 2016].
Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg,
C., Mittal, S., Patel, J. M., Ramasamy, K. and Taneja,
S. (2015) Twitter Heron: Stream Processing at Scale.
In: Proceedings of the 2015 ACM SIGMOD
International Conference on Management of Data,
2015. New York, NY, USA: ACM, pp. 239–250.
Leung, A., Spyker, A. and Bozarth, T. (2017) Titus:
Introducing Containers to the Netflix Cloud. Queue, 15
(5) October, pp. 30:53–30:77.
Meshenberg, R., Gopalani, N. and Kosewski, L. (2013)
Active-Active for Multi-Regional Resiliency. Netflix
TechBlog, 2 December [Online blog]. Available from:
<https://medium.com/netflix-techblog/active-active-
for-multi-regional-resiliency-c47719f6685b>
[Accessed 15 February 2018].
Pearce, J. (2013) 2013: A Year of Open Source at Facebook
[Online]. Facebook Code. Available from:
<https://code.facebook.com
/posts/604847252884576/2013-a-year-of-open-source-
at-facebook/> [Accessed 12 February 2018].
Peng, B. (2017) [ISSUE-1124] - Windows Bolt Support
#2241 [Online] [Heron]. Twitter, Inc. Available from:
<https://github.com/twitter/heron/pull/2241>
[Accessed 12 February 2018].
Ramasamy, K. (2016) Open Sourcing Twitter Heron.
Twitter Engineering Blog, 25 May [Online blog].
Available from: <https://blog.twitter.com/engineering/
en_us/topics/open-source/2016/open-sourcing-twitter-
heron.html> [Accessed 15 February 2018].
Schmaus, B., Carey, C., Joshi, N., Mahilani, N. and Podila,
S. (2016) Stream-Processing with Mantis. Netflix
TechBlog, 14 March [Online blog]. Available from:
<https://medium.com/netflix-techblog/stream-
processing-with-mantis-78af913f51a6> [Accessed 31
January 2018].
Stout, P. (2016) Global Cloud - Active-Active and
Beyond. Netflix TechBlog, 30 March [Online blog].
Available from: <https://medium.com/netflix-
techblog/global-cloud-active-active-and-beyond-
a0fdfa2c3a45> [Accessed 15 February 2018].
Tang, L. (2012) Facebook’s Large Scale Monitoring
System Built on HBase [Online]. Presented at: Strata
Conference + Hadoop World, October 24, 2012, New
York, NY, USA.
Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K.,
Patel, J. M., Kulkarni, S., Jackson, J., Gade, K., Fu, M.,
Donham, J., Bhagat, N., Mittal, S. and Ryaboy, D.
(2014) Storm@Twitter. In: Proceedings of the 2014
ACM SIGMOD International Conference on
Management of Data, 2014. New York, NY, USA:
ACM, pp. 147–156.
Tse, E., Luo, Z. and Yigitbasi, N. (2014) Using Presto in
Our Big Data Platform on AWS. The Netflix Tech Blog,
10 July [Online blog]. Available from:
<https://medium.com/netflix-techblog/using-presto-in-
our-big-data-platform-on-aws-938035909fd4>
[Accessed 12 February 2018].
Twitter, Inc. (2018) Scalding: A Scala API for Cascading
[Online]. Twitter, Inc. Available from:
<https://github.com/twitter/scalding> [Accessed 12
February 2018].
Netflix TechBlog (2017) Updates on Netflix's Container Management Platform. 14 November [Online blog].
Available from: <https://medium.com/netflix-
techblog/updates-on-netflixs-container-management-
platform-a91738360bd8> [Accessed 16 February
2018].
Vergilio, T. and Ramachandran, M. (2018) PaaS-BDP - A
Multi-Cloud Architectural Pattern for Big Data
Processing on a Platform-as-a-Service Model. In:
Proceedings of the 3rd International Conference on
Complexity, Future Information Systems and Risk -
Volume 1: COMPLEXIS, ISBN 978-989-758-297-4,
pp. 45-52.
Woodie, A. (2017) A Peek Inside Kafka’s New ‘Exactly
Once’ Feature. Datanami, 7 March [Online blog].
Available from: <https://www.datanami.com/
2017/07/03/peek-inside-kafkas-new-exactly-feature/>
[Accessed 14 February 2018].
Wu, S., Wang, A., Daxini, M., Alexar, M., Xu, Z., Patel, J.,
Guraja, N., Bond, J., Zimmer, M. and Bakas, P. (2016)
The Netflix Tech Blog: Evolution of the Netflix Data
Pipeline [Online]. Available from:
<http://techblog.netflix.com/2016/02/evolution-of-
netflix-data-pipeline.html> [Accessed 30 October
2016].