We can briefly summarize this model by its
characteristics:
- Workload (or task) creation is offloaded from
centralized servers to the peers; each workload starts
as a whole and is split in response to other peers’
requests for computation, while also accounting for
their performance. The application itself is
responsible for splitting the workload, which allows
for multiple types of workloads and offers better
load balancing.
- The completed child workload result is
transferred to the parent task. The dissemination
process is also controlled by the application itself.
- Gateway(s) are used for the overall workload
injection and result collection.
- Computation of a workload can be suspended,
transferred to a different node and resumed.
- Workload locations can be queried by the
application for initiating communication between
them. The connection is opened directly to the target
node or via a super-peer if the target is behind NAT.
- The use of periodic remote checkpoints allows
the recovery of a partially computed workload, which
can then be resumed and even split further. The
application can choose whether to wait for a
workload or to recover it.
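The splitting step in the first characteristic can be illustrated with a
minimal sketch. It assumes a workload that covers a contiguous index range
and a proportional-share policy based on per-node performance scores; the
names RangeWorkload and split_for_requester, the scores, and the policy
itself are hypothetical and are not prescribed by the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeWorkload:
    """Hypothetical workload covering the half-open index range [start, end)."""
    start: int
    end: int

    @property
    def remaining(self) -> int:
        return self.end - self.start


def split_for_requester(work: RangeWorkload, owner_perf: float,
                        requester_perf: float,
                        min_chunk: int = 1) -> Optional[RangeWorkload]:
    """Split `work` in place and return the child handed to the requester.

    The requester's share is proportional to its performance score relative
    to the combined score of both nodes (an assumed policy). Returns None
    when either part would be smaller than `min_chunk`.
    """
    if owner_perf <= 0 or requester_perf <= 0:
        raise ValueError("performance scores must be positive")
    share = requester_perf / (owner_perf + requester_perf)
    child_size = int(work.remaining * share)
    if child_size < min_chunk or work.remaining - child_size < min_chunk:
        return None  # too small to be worth splitting
    child = RangeWorkload(work.end - child_size, work.end)
    work.end = child.start  # the parent keeps the front of the range
    return child


# Usage: a requester three times as fast as the owner receives 75% of the range.
parent = RangeWorkload(0, 1000)
child = split_for_requester(parent, owner_perf=1.0, requester_perf=3.0)
```

Because the application owns this function, a different workload type could
replace the range arithmetic with its own splitting rule without changing
the request handling around it.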
To achieve the above objectives, the previously
presented model extends the notion of a workload unit
with additional data fields: unique identifier (ID),
application identifier, parent and child identifiers
(for result merging), checkpoint data (allows the
transfer and continuation of computation), boundary
information (identifies data-set boundaries, if
required), estimated total and remaining
computational effort, state of computation, result data
(contains the partial result of the workload), and
metadata (for any application-specific use).
For ease of processing, storage, and transfer with
minimal bandwidth usage, the above fields are
incorporated into a single data object represented as
a JSON structure, named Workload Object, or WOB
for short. Furthermore, WOB size is to be kept to a
minimum; large data-sets to be analyzed are
therefore acquired separately.
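As an illustration of the fields listed above, a WOB could be modelled as
follows. This is a minimal sketch: the class name, field names, default
values, and state strings are assumptions chosen for readability and do not
reflect the actual schema used by the system.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class WorkloadObject:
    """Hypothetical WOB layout; every field name here is an assumption."""
    app_id: str
    wob_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_id: Optional[str] = None                   # used for result merging
    children_ids: list = field(default_factory=list)
    checkpoint: Optional[dict] = None                 # state needed to resume elsewhere
    boundaries: Optional[dict] = None                 # data-set boundaries, if required
    estimated_total_effort: float = 0.0
    remaining_effort: float = 0.0
    state: str = "pending"                            # e.g. pending / running / done
    result: Any = None                                # partial result of the workload
    metadata: dict = field(default_factory=dict)      # application-specific use

    def to_json(self) -> str:
        # Compact separators keep the serialized object small on the wire.
        return json.dumps(asdict(self), separators=(",", ":"))

    @classmethod
    def from_json(cls, text: str) -> "WorkloadObject":
        return cls(**json.loads(text))


# Usage: the WOB carries only boundaries; the data-set itself is fetched separately.
wob = WorkloadObject(app_id="demo-app",
                     boundaries={"start": 0, "end": 100},
                     estimated_total_effort=100.0,
                     remaining_effort=100.0)
restored = WorkloadObject.from_json(wob.to_json())
```

Storing only boundary information rather than the data itself is what keeps
the object small enough to migrate and checkpoint frequently.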
The concept of WOB-based computing allows the
system to combine in-house computer networks
with volunteer-provided resources and Cloud-based
VMs. Furthermore, the ability to track WOBs
opens the possibility of initiating communication
between the parallel branches of an application. Due
to arbitrary latency between nodes, such an
application design must be latency tolerant.
3.1 Network Topology Design
As mentioned before, P2P networks create a virtual
topology over the physical one. Starting from the base
model of the system, we need a search protocol to
query each WOB and a distributed storage for the
WOB backups. There are a variety of
implementations for both distributed search and
storage; however, in the author’s opinion, having two
more virtual overlays on top of the one created by the
middleware would significantly increase the
system’s complexity.
DHT-based systems have proven to be the fastest
when it comes to search, especially when dealing with
rare resources. In the presented system, each WOB
can be considered rare; however, because WOBs are
constantly migrated, created, and disseminated,
especially when there are many of them, the number
of update messages may significantly surpass the
number of query messages. This can lead to significant
bandwidth and resource consumption to keep the
DHT up to date. It must be noted that the above
statement is an assumption and was not tested
experimentally; however, to the best of the author’s
knowledge, how DHTs behave under extremely
frequent key and value changes has not been
examined in the literature.
WOB creation is driven by workload request
messages, which must therefore reach all nodes. The
number of these messages can become problematic as
a WOB nears completion: as more and more nodes
become starved, a huge number of messages can
overload and cripple the network.
Clustering of the nodes offers a straightforward
solution: a cluster that contains only starved nodes has
no reason to accept any workload request messages.
Furthermore, if a super-peer is aware that no more
workload is available, it can stop all outbound
request messages to other clusters. However, if more
workload becomes available (e.g. a new WOB is
injected via the gateway), workload notification
messages can unlock clusters and trigger the idle
nodes to request workload.
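The cutoff behaviour described above can be sketched as a small piece of
super-peer state. This is an illustration under stated assumptions: the
class and method names are hypothetical, and how a super-peer learns that
workload is exhausted is left open here, as it is in the text.

```python
class SuperPeer:
    """Hypothetical super-peer gating workload request traffic for one cluster."""

    def __init__(self) -> None:
        self.busy_nodes: set = set()   # nodes currently holding splittable work
        self.idle_nodes: set = set()   # starved nodes waiting for work
        self.network_has_work = True   # cleared once workload is known to be exhausted

    def accepts_inbound_request(self) -> bool:
        # A cluster with no busy node cannot serve a request, so it drops it.
        return bool(self.busy_nodes)

    def forwards_outbound_request(self) -> bool:
        # Outbound requests are suppressed while no workload is known to exist.
        return self.network_has_work

    def mark_exhausted(self) -> None:
        # Assumption: called by whatever mechanism detects that no work remains.
        self.network_has_work = False

    def on_workload_notification(self) -> list:
        # New workload was announced: unlock the cluster and return the idle
        # nodes that should now issue workload requests.
        self.network_has_work = True
        return sorted(self.idle_nodes)


# Usage: a fully starved cluster stays silent until a notification arrives.
sp = SuperPeer()
sp.idle_nodes.update({"n1", "n2"})
sp.mark_exhausted()
to_wake = sp.on_workload_notification()
```

The two boolean checks are what bound the message volume: a request is
neither accepted by nor sent from a cluster that cannot benefit from it.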
Several topologies, other than the previously
mentioned ones, have been proposed and
demonstrated to be resilient and efficient in resource
location, such as the AFT (Poenaru, 2016). To the best
of the author’s knowledge, besides clustering with
super-peers, none can offer a cutoff solution to the
above-mentioned message flooding problem.
Furthermore, super-peers can act as trackers of the
WOBs located within their clusters, so search
queries can be limited to them. A similar
proposal was made by Chmaj and Walkowiak (2013),