construction. In this situation, the life-span of an
agent should be relatively long, lasting at least until
any reconfiguration of the factory system is required
or a change occurs in its operating environment. This
study focuses on factory-based construction
manufacture, specifically the production of precast
reinforced concrete (PRC) components.
Optimization of customized PRC component
production has been considered by several
researchers (Leu & Hwang, 2001; Chan & Hu, 2002;
Benjaoran & Dawood, 2005), who used genetic
algorithms (GAs) to improve production
performance. Although the approach proved
successful, heuristic search methods such as GAs are
computationally expensive and therefore not well
suited to situations where decisions have to be made
quickly.
RL solutions based on a learned model, such as
that developed by Shitole et al. (2019), can generate
rapid solutions to a decision problem once trained. A
number of authors have applied this method to the
control of factory operations (Waschneck et al., 2018;
Zhou et al., 2020; Xia et al., 2021) and found the
results promising compared with more conventional
approaches such as rules of thumb. However, these
applications lie outside construction manufacturing
and therefore do not address many of the challenges
of that industry, although Waschneck et al. (2018) did
address the problem of customization within the
semiconductor industry.
The objective of this paper is to explore the
potential of RL-based modelling as a means of
controlling factory-based construction
manufacturing, given the unique demands of
construction projects.
2 DYNAMIC SYSTEM CONTROL
2.1 Decision Agents
The future path of a construction manufacturing
system is determined by both controllable and
uncontrollable events. The controllable events
provide an opportunity to steer this path along a line
that is favourable to the manufacturer, optimizing
performance in terms of, say, productivity and/or
profit. This is achieved through the selection of an
appropriate sequence of decisions wherever options
exist. Examples of such decisions include prioritizing
jobs in a queue, deciding when to take an item of
equipment offline for maintenance, and selecting the
number of machines to allocate to a process.
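The first kind of decision above can be illustrated with a short sketch. The job attributes and the earliest-due-date rule here are illustrative choices for exposition, not taken from any of the systems cited:

```python
from dataclasses import dataclass

@dataclass
class Job:
    """A queued PRC component job (attributes are illustrative)."""
    job_id: str
    due_date: int         # day the component is required on site
    processing_time: int  # days of production work remaining

def prioritize_queue(queue):
    """One controllable decision: order the queue by earliest due date,
    breaking ties in favour of the shorter job."""
    return sorted(queue, key=lambda j: (j.due_date, j.processing_time))

queue = [Job("beam-07", due_date=12, processing_time=3),
         Job("slab-02", due_date=9, processing_time=2),
         Job("col-11", due_date=9, processing_time=1)]
ordered = prioritize_queue(queue)
print([j.job_id for j in ordered])  # ['col-11', 'slab-02', 'beam-07']
```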
These decisions are made by one or more agents,
as illustrated in Figure 1, which operate dynamically
throughout the life of the system. An agent monitors
relevant variables defining the state of the system and
its environment (current and possibly past states, and
even predictions about future states), then uses these
insights to decide on appropriate actions to
implement. Typically, these actions concern events in
the immediate future (when the most relevant,
accurate, and valuable information is available at the
time of the decision), but they can also be applied to
events further ahead for decisions that have a long
lead time.
Figure 1: Decision agent control of dynamic system.
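The monitor-then-act cycle of Figure 1 can be sketched as a minimal control loop. The state variable, the maintenance threshold, and the action names are hypothetical placeholders:

```python
class DecisionAgent:
    """Minimal control-loop sketch: the agent observes the current system
    state, retains past states, and returns an action to implement."""

    def __init__(self, policy):
        self.policy = policy
        self.history = []  # past observed states, if the policy needs them

    def step(self, state):
        self.history.append(state)
        return self.policy(state, self.history)

# Example policy: take a machine offline for maintenance once its
# cumulative running hours (a hypothetical state variable) reach a limit.
def maintenance_policy(state, history):
    return "take_offline" if state["machine_hours"] >= 500 else "keep_running"

agent = DecisionAgent(maintenance_policy)
print(agent.step({"machine_hours": 480}))  # keep_running
print(agent.step({"machine_hours": 510}))  # take_offline
```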
An important dichotomy of decision agents is
search-based versus experience-based systems.
Search-based agents, which include blind and
heuristic methods, systematically explore the solution
space looking for the best attainable action. They
tailor a solution to the specific instance of the
problem at hand and so may find better-optimized
solutions than experience-based agents, although this
remains to be tested. Search-based agents are also
highly extensible, meaning they can be easily adapted
to new versions of the problem. On the downside,
they can be computationally expensive and thus
unsuited to situations requiring rapid decision
making.
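A toy search-based agent can be sketched for a single-machine sequencing problem with total tardiness as the performance measure (an illustrative choice of problem and objective). Blind exhaustive search tailors the answer to the specific instance, but at factorial cost in the number of jobs:

```python
from itertools import permutations

def total_tardiness(sequence):
    """Total tardiness of a job sequence on one machine.
    Jobs are (processing_time, due_date) pairs; units are illustrative."""
    t, tardiness = 0, 0
    for proc, due in sequence:
        t += proc
        tardiness += max(0, t - due)
    return tardiness

def search_agent(jobs):
    """Blind (exhaustive) search: evaluates every ordering, so it finds
    the best sequence for this instance at O(n!) cost."""
    return min(permutations(jobs), key=total_tardiness)

jobs = [(4, 5), (2, 3), (3, 11)]
best = search_agent(jobs)
print(best, total_tardiness(best))  # ((2, 3), (4, 5), (3, 11)) 1
```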
In contrast, experience-based agents, which
include rules of thumb and artificial neural networks
(ANNs), make decisions based on exposure to similar
situations in the past. Once developed, an
experience-based agent can output decisions rapidly.
However, because the solutions they offer are generic
rather than tailored to each situation, their decisions
may not be as well optimized as those of search-based
agents. Furthermore, experience-based agents tend to
lack extensibility: each new version of the problem
requires redevelopment of the agent, which in turn
requires the acquisition and assimilation of large
volumes of new information on system behaviour.
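For contrast, the simplest experience-based agent is a rule of thumb, such as the shortest-processing-time (SPT) dispatch rule sketched below on illustrative job data. It selects the next job in linear time without searching over orderings, but it ignores due dates, so its decisions may be less well optimized than a tailored search for a given instance:

```python
def spt_rule(jobs):
    """Rule-of-thumb agent: the shortest-processing-time (SPT) rule picks
    the next job in O(n), with no search over orderings. Generic and fast,
    but blind to due dates. Jobs are (processing_time, due_date) pairs."""
    return min(jobs, key=lambda job: job[0])

jobs = [(4, 5), (2, 3), (3, 11)]
print(spt_rule(jobs))  # (2, 3)
```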