maintenance; or performing corrective maintenance
(it is the only choice if the machine fails). To solve
the underlying model, we adopt a Q-learning
algorithm for SMDPs.
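As a rough illustration of what such an algorithm does (the version adopted in this paper is presented in Section 4 and may differ), one widely used average-cost Q-learning update for an SMDP takes the following form. The symbols are assumptions introduced here for illustration only: $Q(s,a)$ is the learned value of action $a$ in state $s$, $c$ is the cost incurred over the transition, $\tau$ is its random duration, $s'$ is the next state, $\alpha$ is a step size, and $\rho$ is a running estimate of the average cost per unit time.

```latex
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a)
  + \alpha\left[\, c - \rho\,\tau + \min_{a'} Q(s',a') \,\right],
\qquad
\rho \approx \frac{\text{cumulative cost}}{\text{cumulative elapsed time}}
```

The term $\rho\,\tau$ is what distinguishes the semi-Markov case: because transitions take different amounts of time, each observed cost is compared with the average cost that would accrue over the duration of that transition.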
The rest of the paper is organized as follows. In
Section 2, we review the relevant literature. The
SMDP formulation is presented in Section 3. Then, in
Section 4, we present the Q-learning algorithm
adopted for our problem. In Section 5, a numerical
example is given. Finally, concluding remarks are
provided in Section 6.
2 LITERATURE REVIEW
The integration of lot-sizing and time-based preventive
maintenance has been extensively studied by many
researchers (Aghezzaf et al., 2007; Ben Daya and
Makhdoum, 1998; Ben-Daya, 2002; El-Ferik, 2008;
Liao and Sheu, 2011; Suliman and Jawad, 2012;
Shamsaei and Vyve, 2017). In recent years, integrated
EPQ and CBM-based preventive maintenance models
have been proposed with the aim of optimizing the lot-
size and the degradation threshold, beyond which
preventive maintenance is conducted. Jafari and Makis
(2015) address the joint optimization of EPQ and
preventive maintenance policy. The deterioration of
the system is modelled by a proportional hazards
model that considers the condition monitoring
information and the age of the machine. Peng and van
Houtum (2016) develop a joint optimization model of
EPQ and CBM in which degradation is modelled as a
continuous-time, continuous-state stochastic
process. Khatab et al. (2019) investigate the problem
of integrating production quality and CBM for a
production system under periodic monitoring. Cheng
et al. (2018) develop a model to optimize production,
quality control and CBM policies for a system in which
product quality depends on the degradation level. Jafari
and Makis (2016) propose a model to jointly optimize
EPQ and preventive maintenance policy for a partially
observable two-unit system. Cheng et al. (2017)
consider joint optimization of production lot-sizing and
CBM for systems with multiple products. In their
model, preventive maintenance decisions depend on the
predictive reliability and the structural importance
measure of the components.
Fewer studies, however, develop integrated
production and maintenance policies for systems with
stochastic demand. To find the optimal policy for
such systems, MDP and SMDP models have been
proposed (Iravani and Duenyas, 2002;
Sloan, 2004; Jafari and Makis, 2019; Xiang et al.,
2014). These studies assume that the system produces
a single product type and the degradation is modelled
by a Markov chain with a limited number of states. In
this study, however, we propose a joint production
and CBM policy for a multi-product production
system with random product demands.
Darendeliler et al. (2022) have recently studied
joint optimal production/inventory and CBM control
for a multi-product manufacturing system under
stochastic product demands. It is assumed that the
system is reviewed at equidistant time points, so the
durations of producing a lot and maintenance are
assumed to be equal. The present paper relaxes this
assumption and extends the work by modelling the
problem as an SMDP, in which the system is reviewed
at the completion of each unit production, setup or
maintenance activity. In addition, the previous work
does not take production setup times into account,
whereas the present model incorporates them.
In the literature, the problem of planning the lot-size
and sequence of several products on a single machine
with random product demands is known as the
stochastic economic lot scheduling problem
(SELSP). In the SELSP, the objective is to find a
policy that specifies whether to continue producing
the current item, switch to another product or keep
the machine idle so as to minimize the total expected
average cost. Obtaining such a policy, which
dynamically allocates the finite production capacity
among the products in response to stochastic demands,
processing times and setup times, is a challenging
problem (Sox et al., 1999). Winands et al. (2011) categorize
SELSPs based on their sequencing and lot-sizing
strategies. Our production policy falls into the
category of dynamic sequence and global lot-sizing,
in which there is no predetermined production
sequence and the lot-size depends on the stock levels
of all products and the machine status rather than
only on the stock level of the product currently set
up. The majority of SELSP models do not consider the
effect of equipment deterioration and maintenance on
the production policy. In this study, however, we
incorporate a CBM policy into the SELSP.
There are few studies that consider dynamic
sequencing and global lot-sizing for the SELSP. Qiu
and Loulou (1995) model the SELSP as an SMDP and
solve limited-size problems by the successive
approximation method. Wang et al. (2012) apply two
reinforcement learning algorithms to the SELSP with
random demand and processing times. Löhndorf
and Minner (2013) propose an approximate value
iteration method and compare its performance with that
of a global search for the parameters of simple control policies.