cessor), HEFT (Heterogeneous Earliest Finish Time)
(Topcuoglu H., 2012), HCPT (Heterogeneous Critical
Parent Trees) (Hagras and J., 2003).
Meta-heuristic algorithms are based on evolutionary
and swarm approaches. Such algorithms include the
Genetic Algorithm (Nagar et al., 2018), the
Gravitational Search Algorithm (Choudhary et al.,
2018), Simulated Annealing and Particle Swarm
Optimization (Masdari et al., 2017). Evolutionary and
swarm schemes can be developed directly for the
scheduling problem. The main advantages of
meta-heuristic algorithms are the better quality of
the resulting schedules and a much wider range of
applicability; their main drawback is that they
require significantly more time to find a good
solution. Hybrid schemes are often built to
compensate for this drawback by using heuristic
algorithms at the initialization stage of a
meta-heuristic algorithm.
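The hybrid idea (a heuristic schedule used to initialize a meta-heuristic search) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy run-time matrix EXEC_TIME, the independent-task model, the greedy earliest-finish heuristic (a simplification, not HEFT itself), and the small elitist evolutionary loop standing in for a full genetic algorithm.

```python
import random

# Hypothetical toy instance: EXEC_TIME[t][n] is the run time of task t on node n.
EXEC_TIME = [
    [4, 8, 6],
    [7, 3, 9],
    [5, 5, 2],
    [9, 4, 7],
    [3, 6, 8],
    [6, 7, 3],
]
N_NODES = 3


def makespan(assignment):
    """Finish time of the busiest node for a task -> node assignment."""
    load = [0] * N_NODES
    for task, node in enumerate(assignment):
        load[node] += EXEC_TIME[task][node]
    return max(load)


def greedy_schedule():
    """Simple heuristic: place each task on the node where it finishes earliest."""
    load = [0] * N_NODES
    assignment = []
    for times in EXEC_TIME:
        node = min(range(N_NODES), key=lambda n: load[n] + times[n])
        load[node] += times[node]
        assignment.append(node)
    return assignment


def evolve(seed_schedule, generations=200, pop_size=20, rng=None):
    """Tiny elitist evolutionary search seeded with a heuristic schedule."""
    rng = rng or random.Random(0)
    n_tasks = len(EXEC_TIME)
    # Hybrid initialization: one heuristic individual plus random ones.
    population = [list(seed_schedule)]
    population += [[rng.randrange(N_NODES) for _ in range(n_tasks)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=makespan)
        survivors = population[: pop_size // 2]  # elitism: best are kept
        children = []
        for parent in survivors:
            child = list(parent)
            # Mutation: move one randomly chosen task to a random node.
            child[rng.randrange(n_tasks)] = rng.randrange(N_NODES)
            children.append(child)
        population = survivors + children
    return min(population, key=makespan)


heuristic = greedy_schedule()
best = evolve(heuristic)
print(makespan(heuristic), makespan(best))
```

Because the survivors are carried over unchanged, the search in this sketch can never return a schedule worse than the heuristic seed; the seed only shortens the time needed to reach good solutions.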
Although hybrid schemes combine the strengths of
heuristic and meta-heuristic algorithms, scheduling
problems still have to be solved quickly and
efficiently for a wide range of computations on a
computing system with highly heterogeneous
computational models and resources. Machine learning
methods, in particular reinforcement learning, are
promising for the scheduling problem, because they
can assimilate system monitoring data, predict the
density of the computational load, and tune
performance models for the most accurate assessment
of the characteristics of data, computational models
and resources, and of the quality of the resulting
solutions in general.
A number of papers (Hussain A., 2016), (Ismayilov G.,
2020), (Vukmirović S., 2012), (Xiao Z., 2017) are
devoted to the use of machine learning approaches for
solving secondary tasks, such as refining the
estimates for individual tasks based on historical
data, predicting changes in workload, and refining
the estimate of the arrival time of new tasks.
There are also papers where a machine learning
approach is used to solve the scheduling problem
directly. In (Yao J., 2006), the Reinforcement
Learning (RL) method is applied to a specific
scheduling problem: scheduling in Grid systems. In
(Rashmi S., 2017), an RL approach is used for
scheduling tasks in MapReduce. The authors of the
most promising algorithm, DQTS (Tong Z., 2019),
consider a more general formulation of the problem
and use one of the RL techniques, the Deep Q-learning
method, to obtain better results in terms of makespan
and node load than the baseline solutions (the
MAXMIN, MINMIN and FCFS algorithms). However, most of
these papers use only basic information about the
computing system (the number of tasks and the
quantity of resources), which leads to a less
flexible scheduling system. Nevertheless, compared
with heuristic and meta-heuristic algorithms, the
advantages of RL-based methods include: the ability
to use the accumulated experience of running various
WF, including under changing conditions, based on
launch statistics; the ability to learn patterns in
WF structures, resources and their combinations,
which provides a much faster search for solutions
than meta-heuristic algorithms and better solutions
than heuristics; and the ability to adapt
automatically to changing conditions, the nature of
the load and its arrival modes by developing new
scheduling strategies, which is difficult or even
impossible for heuristic and meta-heuristic methods.
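To make the idea of learning a scheduling policy from experience concrete, the following is a minimal sketch of tabular Q-learning on a toy task-to-node assignment problem. It is not the DQTS algorithm, which uses Deep Q-learning; the run-time matrix EXEC_TIME, the state encoding (task index plus current node loads), the reward (negative growth of the makespan) and all hyperparameters are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy instance: EXEC_TIME[t][n] is the run time of task t on node n.
EXEC_TIME = [[4, 8], [7, 3], [5, 5], [2, 6]]
N_NODES = 2


def step(loads, task, node):
    """Assign task to node; the reward is the negative growth of the makespan."""
    new_loads = list(loads)
    new_loads[node] += EXEC_TIME[task][node]
    reward = max(loads) - max(new_loads)
    return tuple(new_loads), reward


def train(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning over states (task index, node loads)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(task, loads, node)] -> estimated return
    for _ in range(episodes):
        loads = (0,) * N_NODES
        for task in range(len(EXEC_TIME)):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                node = rng.randrange(N_NODES)
            else:
                node = max(range(N_NODES), key=lambda n: q[(task, loads, n)])
            new_loads, reward = step(loads, task, node)
            if task + 1 < len(EXEC_TIME):
                future = max(q[(task + 1, new_loads, n)] for n in range(N_NODES))
            else:
                future = 0.0  # terminal state: no tasks left
            key = (task, loads, node)
            q[key] += alpha * (reward + gamma * future - q[key])
            loads = new_loads
    return q


def greedy_policy(q):
    """Build a schedule by always taking the action with the highest Q-value."""
    loads = (0,) * N_NODES
    plan = []
    for task in range(len(EXEC_TIME)):
        node = max(range(N_NODES), key=lambda n: q[(task, loads, n)])
        loads, _ = step(loads, task, node)
        plan.append(node)
    return plan, max(loads)


plan, span = greedy_policy(train())
print(plan, span)
```

With a discount factor of 1, the rewards of an episode sum to the negative final makespan, so maximizing the return in this sketch is the same as minimizing the makespan. A table indexed by exact loads does not scale beyond toy instances, which is one reason a function approximator such as a deep network is used in practice.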
Stream. Compared to batch processing, stream
processing is characterized by a continuous flow of
new data that require immediate processing. This
creates the need for all application operators to
run simultaneously. Because the flow is continuous,
the final amount of data to process cannot be
determined. However, the density of the incoming
computing load can be predicted, and the resulting
system load estimates can be taken into account when
planning a streaming application. High-quality
forecasting plays an important role, because it makes
it possible to respond adequately to changes by
scaling computing resources up and down and to
achieve elasticity of calculations (the minimum
difference between the need for resources and the
amount of allocated resources).
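The link between load forecasting and elasticity can be sketched as follows. The moving-average forecast, the per-node capacity of 100 events per second and the function names are all hypothetical choices for illustration; a real system would use a proper forecasting model and measured capacities.

```python
import math


def forecast_load(history, window=3):
    """Naive forecast: mean of the last `window` observed loads (events/s)."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def nodes_needed(load, capacity_per_node):
    """Smallest number of nodes (at least one) able to absorb the load."""
    return max(1, math.ceil(load / capacity_per_node))


def plan_scaling(history, allocated, capacity_per_node=100.0):
    """Return the required node count and the gap to the current allocation.

    gap > 0 means scale up, gap < 0 means scale down, gap == 0 means the
    allocation already matches the predicted need.
    """
    predicted = forecast_load(history)
    needed = nodes_needed(predicted, capacity_per_node)
    gap = needed - allocated
    return needed, gap


# Rising load: the forecast of 260 events/s needs 3 nodes, one more than allocated.
needed_up, gap_up = plan_scaling([220.0, 260.0, 300.0], allocated=2)
# Falling load: the forecast of 80 events/s needs 1 node, two fewer than allocated.
needed_down, gap_down = plan_scaling([90.0, 80.0, 70.0], allocated=3)
print(needed_up, gap_up, needed_down, gap_down)
```

In this sketch the absolute value of the gap plays the role of the difference between the need for resources and the amount of allocated resources; an elastic scheduler tries to keep it close to zero over time.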
Optimal configuration of the platform and application
can increase data throughput and reduce latency and
power consumption. Choosing the optimal number of
nodes, the correct platform parameters, and the
optimal distribution of application operators among
computing nodes can ensure maximum system
performance. Effective planning of streaming data
processing can increase the productivity, resource
utilization or reliability of the system as a whole,
depending on the requirements of both users and
providers.
Platforms such as Apache Storm, Spark Streaming,
Flink and S4 are widely used for stream data
processing. However, most of the algorithms are aimed
at Storm (Peng B., 2015), (Agarwalla B., 2006),
(Xu J., 2014).
An analysis of the methods for scheduling streaming
computations shows that this direction is currently
at the development stage. The existing methods use
explicitly available information and are not able to
fully take into account the dynamics of changes and
the incompleteness of
ECTA 2020 - 12th International Conference on Evolutionary Computation Theory and Applications