DeepMind (Mnih et al., 2015) researchers created a method
that combines RL and DL, called DQL. This
technique estimates the Q-value of all possible actions
for a given state using deep neural networks. The
combination of RL and DL gives rise to strong correlations between subsequent iterations. To overcome this issue, DQL provides a solution called Experience Replay, which stores the agent's past experiences in a dataset, where each experience is a four-tuple $(s_t, a_t, r_t, s_{t+1})$, and extracts a random mini-batch of these experiences to update the Q-values.
To break this strong correlation, the DeepMind researchers (Mnih et al., 2015) also propose a solution based on duplicating the Q-network, creating a copy called the target network. The two copies differ in their parameters (weights) and in how they are updated: while the Q-network's parameters are trained, the target network's parameters are only periodically synchronized with the Q-network's. The idea is that using the target network's values to train the Q-network decorrelates the action-values from the target values and improves the stability of the network.
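As a minimal illustration of these two mechanisms, the sketch below combines a replay buffer with a periodically synchronized target network; the network architecture, the hyperparameters and the omission of terminal-state handling are illustrative assumptions, not details of the cited work.

```python
# A minimal sketch of experience replay and a target network, assuming a
# small fully connected Q-network; the architecture, hyperparameters and
# the omission of terminal-state handling are illustrative simplifications.
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(q_net.state_dict())  # target network starts as a copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay_buffer = deque(maxlen=10_000)             # stores (s_t, a_t, r_t, s_{t+1}) tuples
gamma = 0.99

def train_step(batch_size=32):
    # Experience Replay: sample a random mini-batch of past experiences.
    batch = random.sample(replay_buffer, batch_size)
    s, a, r, s_next = zip(*batch)
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    s_next = torch.tensor(s_next, dtype=torch.float32)

    q_values = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Targets are computed with the frozen copy to decorrelate them
        # from the action-values being trained.
        target = r + gamma * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    # Periodically synchronize the target network with the Q-network.
    target_net.load_state_dict(q_net.state_dict())
```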
The agent selects an action according to an ε-greedy policy defined over the Q-values estimated by the DNN. The agent then interacts with the environment in the same way as in the Q-learning method.
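The following sketch illustrates such an ε-greedy selection over the Q-values estimated by a network like the one above; the value of ε and the state encoding are illustrative assumptions.

```python
# A minimal sketch of epsilon-greedy action selection over the Q-values
# estimated by a network such as the one above; epsilon is illustrative.
import random

import torch

def select_action(q_net, state, epsilon=0.1, n_actions=2):
    if random.random() < epsilon:
        # Explore: with probability epsilon, pick a random action.
        return random.randrange(n_actions)
    # Exploit: otherwise, pick the action with the highest estimated Q-value.
    with torch.no_grad():
        q_values = q_net(torch.tensor(state, dtype=torch.float32))
    return int(q_values.argmax().item())
```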
3 RELATED WORK
3.1 Collaborative Task Allocation Approaches
Research efforts on collaborative robots and tasks
have multiplied in the past few years. Zhang et al.
(Zhang et al., 2022) have designed a Human-Robot
Collaboration (HRC) framework to assemble a product. The authors implement a dual-agent algorithm
that extends the DDPG algorithm, where the actor-network of each agent processes the current state
and outputs its own action, and, subsequently, the
critic-network evaluates the value of the taken actions. The reward function is based on
the difficulty and the time each agent needs to complete
the collaborative assembly task, and a structure tree
is used to represent the assembly task.
Liu et al. (Liu et al., 2021) proposed a novel
training approach adapted from the DQL method,
called DADRL. This algorithm considers both the
robot and the human as agents, where the robot is
trained to learn how to make decisions, while the human agent represents the real human in HRC, capturing the dynamic and stochastic properties of the
task. The aim is to teach the robot to be capable of
decision-making and task-planning "under the uncertainties brought by the human partner using DRL", as
the authors explain (Liu et al., 2021).
Yu et al. (Yu et al., 2021) formulate a human-robot
collaborative assembly task as a chessboard game
with specific assembly constraints determined by the
game rules. The robot is trained with an algorithm
based on DRL, where a CNN is used to predict the
distribution of the priority of move selections and to
determine whether a working sequence is the one
that maximizes HRC efficiency.
Zhang et al. (Zhang et al., 2022), Liu et al. (Liu
et al., 2021) and Yu et al. (Yu et al., 2021) have shown
that DQL- and DDPG-based algorithms are efficient in
solving decision-making and sequence-planning problems and perform well in complex tasks.
Most of these works, however, do not explore graph
structures and the advantages they might bring to collaboration in manufacturing sectors. Yu et al. (Yu
et al., 2021) use a chessboard setting, but acknowledge that this structure has limitations in representing some task
constraints and relations.
3.2 Graph Structures for Task Allocation
In an assembly process, a task can be divided into
smaller assembly subtasks, some of which are suitable to be performed by a robot, while others are more
complex and consequently require an operator to perform them (Zhang et al., 2022). Given this, building
a graph that represents the possible sequences of operations needed to complete a process, the actor of each
operation and other relevant information about the process is essential for an operator to perceive the task
easily and for the robot to be efficient and effective in
its assistance.
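As a minimal illustration of this idea, the sketch below encodes a few assembly subtasks, their assigned actors and their precedence constraints in a directed graph; the subtask names, the node attributes and the use of the networkx library are illustrative assumptions rather than the representation used in any of the cited works.

```python
# A minimal sketch of an assembly task graph, assuming simple precedence
# constraints between subtasks; the subtask names, attributes and the use
# of networkx are illustrative, not the representation of the cited works.
import networkx as nx

task_graph = nx.DiGraph()

# Each node is a subtask annotated with its assigned actor and an
# estimated duration (in seconds).
task_graph.add_node("pick_base", actor="robot", duration=5)
task_graph.add_node("insert_screws", actor="human", duration=12)
task_graph.add_node("place_cover", actor="robot", duration=7)

# Edges encode precedence: a subtask can start once its predecessors are done.
task_graph.add_edge("pick_base", "insert_screws")
task_graph.add_edge("insert_screws", "place_cover")

# One feasible operation sequence that respects the assembly constraints.
print(list(nx.topological_sort(task_graph)))
```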
Mangin et al. (Mangin et al., 2022) developed a
system that uses HTM to share information between
the human and the robot and to enable transparent communication and mutual understanding. This high-level
hierarchical task representation is then transformed
into low-level policies to teach the robot to
successfully assist the human with supportive actions.
Murali et al. (Murali et al., 2020) proposed an
architecture that aims to teach a robot to adapt re-
actively to unpredictable human operators' actions,
while minimizing the overall process time. A collabo-
rative task is represented as an "and/or" graph, where
nodes correspond to states and hyper-arcs represent
the possible actions the agent can perform to reach a
particular state.
Yu et al. (Yu et al., 2020) map a human-robot col-
laborative assembly task into a chessboard game with