Enhancing Circularity in Medical Device Supply Chains by Optimizing
EoL Decisions Through Reinforcement Learning: A Multi-Objective
Approach
Soufiane El Bechari^1, Oualid Jouini^{1,a}, Zied Jemai^{1,2}, Fourat Trabelsi^1 and Robert Heidsieck^3
^1 Industrial Engineering Laboratory (LGI), CentraleSupélec, Paris-Saclay University, Gif-sur-Yvette, France
^2 OASIS - ENIT, University of Tunis, Elmanar, BP37, Le Belvedere, Tunisia
^3 GE HealthCare, 283 Rue de la Minière, 78530 Buc, France
{soufiane.el-bechari, oualid.jouini, zied.jemai}@centralesupelec.fr, fourat.trabelsi@student-cs.fr,
Keywords:
Multi-Component Systems, Circular Supply Chain, EoL Management, Reinforcement Learning,
Multi-Objective Approach.
Abstract:
Circular supply chains are becoming essential in the pursuit of sustainability, as they promote the responsible
disposal, recycling, and reuse of products at the end of their life cycles. This research, developed in collabora-
tion with GE HealthCare, presents a multi-objective optimization framework that incorporates environmental,
economic, and circularity performance in end-of-life (EoL) decision-making. The proposed model leverages
historical data on reuse and recycling success rates to capture the operational realities of circular supply chains.
By employing Q-learning, this paper aims to develop a decision-support mechanism that optimizes EoL ac-
tions for components, thereby enhancing the circularity, reducing carbon footprint, and minimizing economic
costs within the circular supply chain.
a https://orcid.org/0000-0002-9498-165X
El Bechari, S., Jouini, O., Jemai, Z., Trabelsi, F. and Heidsieck, R.
Enhancing Circularity in Medical Device Supply Chains by Optimizing EoL Decisions Through Reinforcement Learning: A Multi-Objective Approach.
DOI: 10.5220/0013122400003893
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 14th International Conference on Operations Research and Enterprise Systems (ICORES 2025), pages 88-99
ISBN: 978-989-758-732-0; ISSN: 2184-4372
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
1 INTRODUCTION
The increasing emphasis on Sustainable and Circular Supply Chain Management (SCSCM) within the healthcare sector reflects a critical acknowledgment of the environmental and economic impacts of medical supply chain practices. This shift is driven by the need to reconcile healthcare operations with sustainability goals, as the sector is responsible for significant waste generation and carbon emissions, contributing to global environmental challenges (D’Alessandro et al., 2024). For instance, the healthcare sector is responsible for around 4.6% to 5% of global greenhouse gas (GHG) emissions (Pichler et al., 2019; Romanello et al., 2023), equivalent to 2 billion tonnes of carbon dioxide equivalent (CO2e). Given the significant impact of the healthcare sector on climate change, there have been a number of policy initiatives aimed at reducing its environmental footprint, most notably the NHS’s “Delivering a Net Zero National Health Service” strategy. The NHS, as one of the world’s largest healthcare systems, has set targets to achieve net-zero emissions by 2040 for emissions under its direct control, and by 2045 for those it can influence indirectly, such as those from the supply chain and patient travel (NHS, 2022). Additionally, major healthcare companies like GE HealthCare are aligning their sustainability goals with these broader initiatives. GE HealthCare has set goals to reduce operational GHG emissions (Scope 1 and 2) by 42% and Scope 3 emissions by 25% by 2030, as part of its commitment to reaching net zero by 2050 (GE Healthcare, 2023). The intersection of healthcare and environmental sustainability is becoming increasingly prominent as global efforts face the dual challenges of delivering quality healthcare while combating climate change and minimizing waste. In this context, the medical supply chain plays a pivotal role in addressing environmental concerns, especially in efforts to reduce carbon emissions and waste generation (Abaku and Odimarha, 2024). The medical supply chain, essential for delivering healthcare products like pharmaceuticals and medical devices, is a highly intricate system that significantly impacts the environment. Due to the specific nature of medical equipment, significant efforts have been made in design, operations, and supply chain management to maintain operating conditions and support circular economy principles to reduce environmental impact. However, there is still a need for further innovation and research, particularly in optimizing the supply chain of medical equipment through the integration of circular economy practices, to enhance both circularity and sustainability within the healthcare sector.
Unlike traditional linear supply chains, which follow a ‘take-make-dispose’ approach, circular supply chains aim to “integrate circular economy thinking into supply chain management and its surrounding industrial and natural ecosystems” (Farooque et al., 2019). A “circular economy is a regenerative economic system which necessitates a paradigm shift to replace the ‘end of life’ concept with reducing, alternatively reusing, recycling, and recovering materials throughout the supply chain, with the aim to promote value maintenance and sustainable development, creating environmental quality, economic development, and social equity, to the benefit of current and future generations” (Kirchherr et al., 2023). Circular supply chains thus aim to extend the life of products, components and materials through circular economy (CE) strategies such as reuse, recycling, and remanufacturing. These circular actions allow businesses to reduce their dependency on virgin materials, minimize waste, and lower their overall environmental footprint.
One of the primary challenges in implementing circular supply chains is ensuring effective returns and managing the EoL of products by determining whether to reuse, recycle, or dispose of returned products and components. Each of these decisions carries circularity, environmental, and economic implications. For example, reusing components reduces the need for new materials but may be constrained by technical or quality limitations. Recycling can recover valuable materials, but the associated energy and costs may be significant. Finally, disposal results in increased waste and environmental impact, but in some cases it may be the only viable option if reuse or recycling is not feasible or not the most appropriate solution.
Moreover, circular supply chains are often
complex and involve multiple components, each
with unique environmental and economic profiles.
Decision-making in this context requires careful con-
sideration of trade-offs between minimizing environ-
mental impact, reducing costs, and maximizing circu-
larity (i.e., the proportion of products that are success-
fully reused or recycled). To the best of our knowl-
edge, current models in circular supply chain manage-
ment often fail to fully integrate these multiple objec-
tives and do not incorporate real-world success rates
for reuse and recycling, leading to unrealistic expec-
tations about circularity potential.
This study addresses this challenge by developing a decision-support mechanism utilizing Q-learning, a reinforcement learning (RL) technique, to optimize the management of EoL components in a circular supply chain (CSC). The model operates at the component level, enabling real-time decision-making for reuse, recycling, and disposal actions, based on component-specific parameters and performance metrics.
The rest of the paper is organized as follows. Sec-
tion 2 provides an overview of the state-of-the-art. In
Section 3, the problem formulation and modeling are
presented, along with a detailed description of the
proposed algorithm to solve the problem. Section
4 introduces an industrial case study based on real-
world data, including results and analysis. Finally,
Section 5 concludes the paper and highlights future
perspectives.
2 LITERATURE REVIEW
As discussed in the introduction, the primary con-
tributions of this paper are in the domains of circu-
lar supply chain and EoL management, and multi-
objective decision-making. Specifically, we focus on
identifying the most effective strategies for optimiz-
ing reuse, recycling, and disposal actions within com-
plex multi-component circular supply chains. In the
following section, we review the key streams of liter-
ature in these areas, including circular supply chains,
and EoL management. We then position the contribu-
tions of this paper within the broader context of exist-
ing research, highlighting the novelty of our approach
in balancing economic, environmental, and circularity
objectives across multiple components.
2.1 Circular Supply Chains and EoL
Management
The integration of Circular Economy (CE) princi-
ples into Supply Chain Management (SCM) has been
widely referred to as Circular Supply Chain Manage-
ment (CSCM) in the literature (Genovese et al., 2017;
Nasir et al., 2017; Farooque et al., 2019). CSCM
encompasses various definitions, each emphasizing
the role of CE in reshaping supply chain activities
(Lahane et al., 2020). According to Farooque et al.
(2019), ”Circular supply chain management is the in-
tegration of circular thinking into the management of
the supply chain and its surrounding industrial and
natural ecosystems. It systematically restores tech-
nical materials and regenerates biological materials
toward a zero-waste vision through system-wide in-
novation in business models and supply chain func-
tions from product/service design to EoL and waste
management, involving all stakeholders in a prod-
uct/service lifecycle, including parts/product manu-
facturers, service providers, consumers, and users”.
The adoption of CE practices within supply chains
offers several key benefits (Lahane et al., 2020), in-
cluding enhanced resource availability (Goyal et al.,
2018), improved EoL strategies (de Sousa Jabbour
et al., 2019), enriched value propositions (Mishra
et al., 2018), reduced waste generation (Herczeg et al.,
2018), and improved sustainability (Winkler, 2011).
However, the transition from traditional, linear
supply chain models—characterized by the ”take-
make-dispose” framework—towards circular systems
presents several significant challenges. These chal-
lenges, as noted by Roy et al. (2022), include a per-
sistent industrial preference for linear models, com-
pounded by feasibility concerns surrounding CE im-
plementation (Tura et al., 2019; Agyemang et al.,
2019), and the absence of robust performance mea-
surement systems (Tura et al., 2019). Additionally,
higher upfront costs related to circular business mod-
els (Vermunt et al., 2019) and complex product de-
signs that hinder the ease and cost-effectiveness of
recycling, reuse, or remanufacturing (Khodier et al.,
2018; Rosa et al., 2019; Halse and Jæger, 2019) re-
main significant barriers to wide-scale adoption.
Moreover, the absence of standardized circular
economy processes and metrics across industries hin-
ders cross-industry implementation of circular mod-
els (Govindan and Hasanagic, 2018; Mangla et al.,
2018; Ranta et al., 2018; Bouzon et al., 2018). This
gap, combined with the lack of widely accepted met-
rics and indicators to measure circularity performance
(Kravchenko et al., 2019; Bressanelli et al., 2019),
limits the scalability and effectiveness of CSCM.
The healthcare sector, in particular, presents
unique challenges for implementing circular econ-
omy principles due to stringent regulatory standards
and the critical importance of hygiene. Research
highlights significant barriers to circularity in medi-
cal equipment, including perceived safety risks, reg-
ulatory complexities, and financial constraints related
to medical device design (Hoveling et al., 2024).
Beyond these general barriers, the complexity in-
tensifies when it comes to managing the EoL of multi-
component products, such as electronics and medical
devices in CSC. In these cases, each component may
have its own unique lifecycle and recovery potential
for reuse or recycling. Han et al. (2021) highlight
the critical need for component-level analysis to de-
termine the optimal EoL strategy, which should con-
sider factors such as regional greenhouse gas (GHG)
emissions and market prices for resale. This is par-
ticularly relevant in assembly-based products, where
entire units are often discarded as waste, even though
some components retain value that could be recovered
through reuse or recycling (Kinoshita et al., 2016).
The process of disassembling complex products
into individual components for reuse, recycling, or
disposal presents opportunities to prevent the unnec-
essary consumption of virgin materials and reduce
GHG emissions (Hasegawa et al., 2019). These
actions contribute significantly to the circularity of
supply chains by maximizing the lifecycle of each
component. However, the disassembly process can
be resource-intensive, requiring significant labor and
cost, making it essential to anticipate the potential
outcomes of reuse, recycling, and disposal before im-
plementation. As Hasegawa et al. (2019) point out,
simulating these decisions is crucial to optimize EoL
management in multi-component systems, allowing
for the efficient allocation of resources and the reduc-
tion of environmental impact.
Despite the growing body of research on CSCM,
there remains a significant gap in the literature con-
cerning EoL management and its broader impact on
the effectiveness of circular supply chains. Specif-
ically, limited attention has been paid to quanti-
tative models that simulate EoL decision-making
for multiple components, considering factors such
as component-specific reuse and recycling poten-
tial, cost, and environmental impact within a multi-
objective approach. Addressing this gap, the present
study proposes a quantitative model that simulates cir-
cular supply chain decisions at the component level,
optimizing the actions of reuse, recycling, and dis-
posal. The model developed here provides decision
support for EoL scenarios involving multiple compo-
nents within a single product, offering a more gran-
ular understanding of how different EoL strategies
affect circularity, cost, and environmental outcomes.
By adopting a detailed, component-by-component ap-
proach, this study enhances decision-making in circu-
lar supply chains, providing a decision-support mech-
anism for addressing the inherent complexities of
multi-component EoL management in CSC.
3 PROBLEM DESCRIPTION AND
MODELING
In this paper, the term “product” or “device” refers to spare parts that can either be used in the production of new large-scale equipment or for maintenance services to support an installed base, ensuring the operational continuity of the equipment. We focus on a circular supply chain (CSC) that integrates suppliers, manufacturers, customers (installed base), and reverse logistics for the take-back of defective products. These defective products are typically returned from the field and are not directly repairable or reusable in their entirety.
Figure 1: Circular supply chain model and its characteristics.
As shown in Fig. 1, the circular supply chain
model consists of various stages including disassem-
bly, reuse, recycling, and disposal. Upon return, de-
fective products are disassembled to extract reusable
components. These components are then cleaned, in-
spected, and tested to determine whether they meet
the required specifications for reuse. Components that
pass inspection are classified as ”qualified as new”
and are reintroduced into the production process for
manufacturing new products.
In this circular supply chain, the demand from the manufacturer must be satisfied either through the closed-loop system, by reusing or recycling components, or by procuring new materials or components from suppliers. The goal is to maximize the use of reused and recycled materials. However, when the reuse or recycling process cannot meet the required demand, procurement from suppliers is necessary to ensure production continuity. For components that do not meet reuse requirements, two options are considered. If the components are feasible for recycling and the recycling process is legally compliant, the materials are recycled and used in the production of new products, forming a closed-loop system. If recycling is not feasible, either due to design limitations or legal restrictions, the components are disposed of by a third-party company, incurring an additional disposal cost.
Moreover, for components that are non-reusable
by design, the same decision-making process applies.
If recycling is possible, the materials are processed
accordingly. If recycling is not an option, these com-
ponents are handled by a third-party disposal ser-
vice, which incurs a disposal cost. As depicted in
Fig. 2, a decision tree outlines the methodology for
managing each component’s EoL on a component-by-
component basis.
Figure 2: A decision tree for managing components’ EoL
in a component-by-component methodology.
Thus, for each component within the circular sup-
ply chain, a decision must be made between three EoL
options: reuse, recycling within a closed-loop system,
or disposal by a third-party.
The aim of this research is to develop a decision-
making framework for managing returned compo-
nents in a circular supply chain using a component-
by-component methodology, specifically focusing on
optimizing these EoL decisions for each component.
The complexity of the problem arises from the vari-
ability in return flows, the varying success rates for
reuse and recycling, and the economic and environ-
mental considerations for each component.
3.1 Agent-Based Modeling of Circular
Supply Chain
To apply the reinforcement learning (RL) mechanism
to the circular supply chain (CSC) described in this
work, it is necessary to formulate the problem as
an RL model. As previously discussed, RL mod-
els are implemented within an agent-based frame-
work, where each component acts as an independent
decision-making entity. The first step in this approach
is to model each component and process in the CSC as
a multi-agent system. Subsequently, the RL problem
is defined within this designed agent-based frame-
work.
A circular supply chain involves various opera-
tions—reuse, recycling, and disposal—each of which
must be managed efficiently to minimize environ-
mental and economic costs while maximizing cir-
cularity. In the real world, each component must
autonomously make decisions regarding whether to
reuse, recycle, or dispose of itself based on its state
and the current system dynamics. These autonomous
decisions are key to improving the overall perfor-
mance of the supply chain by considering carbon
footprint reduction, cost minimization, and resource
circularity. However, the decisions made by individ-
ual components must also be coordinated with the
overall system objectives to optimize the global per-
formance of the CSC.
Figure 3: Agent-based framework of circular supply chain
EoL management system.
As shown in Fig. 3, the agent-based model treats
each component in the CSC as an agent. Each
component-agent is responsible for making real-time
decisions regarding its EoL treatment—reuse, recy-
cle, or dispose—based on the observed state. These
agents interact with the CSC system through a Q-
learning-based RL mechanism, where each agent in-
dependently learns to optimize its decisions over time
based on feedback (rewards) received from the envi-
ronment.
3.2 RL Modeling of EoL Management
Problem in the Circular Supply
Chain
In this subsection, we define the characteristics of the
reinforcement learning (RL) model used to solve the
circular supply chain (CSC) decision-making prob-
lem. Key elements of the RL model include the state
variable, reward function, value function, and system
policy. These components work together within a Q-
learning framework to guide agents (components) in
deciding between reuse, recycling, and disposal, with
the ultimate goal of minimizing the overall carbon
footprint, reducing economic costs, and promoting re-
source circularity.
3.2.1 State Variables
The state of the system at any given time period $t$ is characterized by the state vector:
$$S_{i,t} = \{x_{i,t},\ \text{reuse\_sr}_i,\ \text{recycle\_sr}_i,\ \text{reuse\_cf}_i,\ \text{recycle\_cf}_i,\ \text{dispose\_cf}_i,\ \text{reuse\_cost}_i,\ \text{recycle\_cost}_i,\ \text{dispose\_cost}_i,\ \text{weight}_i\}. \tag{1}$$
The elements of the state variable are detailed as follows:
• Inventory Level ($x_{i,t}$). The current quantity of component $i$ in inventory at time $t$, updated dynamically based on reuse, recycling, or disposal decisions.
• Reuse Success Rate ($\text{reuse\_sr}_i$). The probability that component $i$ can be successfully reused after inspection and cleaning. This rate is based on historical performance.
• Recycle Success Rate ($\text{recycle\_sr}_i$). The likelihood that component $i$ can be recycled if reuse is not possible. This is also derived from past data.
• Carbon Footprint (CFP). Environmental impact elements associated with the different actions: $\text{reuse\_cf}_i$ (reuse), $\text{recycle\_cf}_i$ (recycling), and $\text{dispose\_cf}_i$ (disposal). These values are used to assess the environmental impact of each action. They are calculated using the Ecoinvent 3.10 database and the Brightway Life Cycle Assessment (LCA) software in a parametric approach, connected to a Python program for automatic calculations. Specifically, the model uses the avoided burden (0:100) method for EoL management to assign environmental credits to reuse and recycling actions, based on the research conducted by Nicholson et al. (2009).
• Costs. Financial costs of the various actions for component $i$: $\text{reuse\_cost}_i$, $\text{recycle\_cost}_i$, and $\text{dispose\_cost}_i$. These values are used to assess the financial impact of each action. For the reuse action, a saving equivalent to the component’s original price is applied, while recycling savings are based on values from relevant research. Disposal incurs a cost paid to third-party companies for waste management.
• Weight ($\text{weight}_i$). The physical weight of the component.
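As an illustration, the state vector of Eq. (1) can be held in a small record type. The following is a minimal sketch and not the authors' implementation: the class name `ComponentState`, the validation step, and the `key` method (which buckets the inventory so that the state can index a Q-table) are assumptions introduced here, as is the sign convention in which negative footprints or costs denote credits or savings.

```python
from dataclasses import dataclass


@dataclass
class ComponentState:
    """State S_{i,t} of one component; field names mirror Eq. (1)."""
    x: float             # inventory level x_{i,t}
    reuse_sr: float      # reuse success rate, in [0, 1]
    recycle_sr: float    # recycle success rate, in [0, 1]
    reuse_cf: float      # carbon footprint of reuse (negative = credit, assumed)
    recycle_cf: float    # carbon footprint of recycling
    dispose_cf: float    # carbon footprint of disposal
    reuse_cost: float    # cost of reuse (negative = saving, assumed)
    recycle_cost: float  # cost of recycling
    dispose_cost: float  # cost of disposal
    weight: float        # physical weight of the component

    def __post_init__(self):
        # Success rates are probabilities, so reject values outside [0, 1].
        for name in ("reuse_sr", "recycle_sr"):
            rate = getattr(self, name)
            if not 0.0 <= rate <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {rate}")

    def key(self, bucket: int = 10) -> int:
        """Hypothetical discretisation: only the inventory varies over time,
        so a bucketed inventory level is used as the Q-table index."""
        return int(self.x // bucket)
```

Only `x` changes between periods; the remaining fields are component parameters, which is why the sketch indexes the Q-table on the inventory alone.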
3.2.2 Action Set
The actions available for each component are:
$$A = \{a_{\text{reuse}},\ a_{\text{recycle}},\ a_{\text{dispose}}\}, \tag{2}$$
where $a_{\text{reuse}}$, $a_{\text{recycle}}$, and $a_{\text{dispose}}$ represent the actions to reuse, recycle, or dispose of a component, respectively.
3.2.3 Transition Dynamics
The transition from state $S_t$ to $S_{t+1}$ is determined by the amount of returned product and the action chosen for each component. The state transition for component $i$ can be expressed as:
$$x_{i,t+1} = x_{i,t} - q_i(a_t) + r_{i,t}, \tag{3}$$
where
• $q_i(a_t)$ is the quantity of component $i$ used in period $t$, based on the action $a_t$,
• $r_{i,t}$ is the return quantity of component $i$ in period $t$.
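The inventory balance of Eq. (3) amounts to one line of arithmetic. The sketch below is illustrative only: the function name `next_inventory` and the guard that forbids processing more units than are in stock are assumptions, not part of the model description.

```python
def next_inventory(x: float, used: float, returned: float) -> float:
    """Eq. (3): x_{i,t+1} = x_{i,t} - q_i(a_t) + r_{i,t}."""
    if used > x:
        # Assumed constraint: an action cannot consume more than the stock.
        raise ValueError("cannot process more units than are in inventory")
    return x - used + returned


# Example: 10 units in stock, 4 processed by the chosen action, 3 returned.
print(next_inventory(10, 4, 3))  # 9
```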
3.2.4 Reward Function
The reward function $R(S_t, a_t)$ incorporates three components: environmental reward, economic reward, and circularity reward. The total reward is a weighted sum of these three objectives:
$$R(S_t, a_t) = \omega_{\text{env}} \cdot R_{\text{env}}(S_t, a_t) + \omega_{\text{econ}} \cdot R_{\text{econ}}(S_t, a_t) + \omega_{\text{circ}} \cdot R_{\text{circ}}(S_t, a_t), \tag{4}$$
where
• $R_{\text{env}}(S_t, a_t)$ is the environmental reward, derived from the CFP of the chosen action,
• $R_{\text{econ}}(S_t, a_t)$ is the economic reward, derived from the cost of the chosen action,
• $R_{\text{circ}}(S_t, a_t)$ is the circularity reward, based on the contribution of the action to material circularity,
• $\omega_{\text{env}}$, $\omega_{\text{econ}}$, and $\omega_{\text{circ}}$ are the respective weights for the environmental, economic, and circularity objectives.
The environmental reward $R_{\text{env}}(S_t, a_t)$ minimizes the carbon footprint (CFP) for each action. The effective carbon footprint for reuse, recycle, and dispose actions is determined based on success rates and fallback options.
For reuse, the effective carbon footprint is given by:
$$\text{effective\_cf} = \text{reuse\_sr}_i \cdot \text{reuse\_cf}_i + (1 - \text{reuse\_sr}_i) \cdot \big(\text{recycle\_sr}_i \cdot \text{recycle\_cf}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cf}_i\big). \tag{5}$$
For recycle, the effective carbon footprint is:
$$\text{effective\_cf} = \text{recycle\_sr}_i \cdot \text{recycle\_cf}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cf}_i. \tag{6}$$
For dispose, the carbon footprint is simply $\text{dispose\_cf}_i$.
Thus, the environmental reward is the negative of the effective carbon footprint, multiplied by the component inventory $x_{i,t}$:
$$R_{\text{env}}(S_t, a_t) = -\,\text{effective\_cf} \cdot x_{i,t}. \tag{7}$$
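The fallback structure of Eqs. (5) to (7) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names `effective_value` and `env_reward`, the string action labels, and the numerical values in the example are assumptions. The helper is written generically because the effective cost of Eqs. (8) and (9) has the same form with costs in place of footprints.

```python
def effective_value(action, reuse_sr, recycle_sr, v_reuse, v_recycle, v_dispose):
    """Expected per-unit value (footprint or cost) with fallback options.

    A failed reuse falls back to recycling, and a failed recycling
    falls back to disposal, as in Eqs. (5) and (6).
    """
    recycle_branch = recycle_sr * v_recycle + (1 - recycle_sr) * v_dispose
    if action == "reuse":
        return reuse_sr * v_reuse + (1 - reuse_sr) * recycle_branch
    if action == "recycle":
        return recycle_branch
    if action == "dispose":
        return v_dispose
    raise ValueError(f"unknown action: {action}")


def env_reward(action, x, reuse_sr, recycle_sr, reuse_cf, recycle_cf, dispose_cf):
    """Eq. (7): negative effective carbon footprint times inventory."""
    cf = effective_value(action, reuse_sr, recycle_sr, reuse_cf, recycle_cf, dispose_cf)
    return -cf * x


# Illustrative values only (negative footprints stand for avoided-burden credits).
for a in ("reuse", "recycle", "dispose"):
    print(a, env_reward(a, 10, 0.8, 0.5, -10.0, -4.0, 2.0))
```

With these made-up numbers the recycling branch is worth -1 per unit, so reuse has an effective footprint of about -8.2 per unit and earns the largest environmental reward.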
Similarly, the economic reward $R_{\text{econ}}(S_t, a_t)$ minimizes financial costs. Like the environmental reward, it considers reuse, recycle, and dispose actions, each carrying specific costs.
For reuse, the effective cost is calculated as:
$$\text{effective\_cost} = \text{reuse\_sr}_i \cdot \text{reuse\_cost}_i + (1 - \text{reuse\_sr}_i) \cdot \big(\text{recycle\_sr}_i \cdot \text{recycle\_cost}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cost}_i\big). \tag{8}$$
For recycle, the effective cost is:
$$\text{effective\_cost} = \text{recycle\_sr}_i \cdot \text{recycle\_cost}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cost}_i. \tag{9}$$
For dispose, the cost is $\text{dispose\_cost}_i$.
Thus, the economic reward is the negative of the effective cost, multiplied by the component inventory $x_{i,t}$:
$$R_{\text{econ}}(S_t, a_t) = -\,\text{effective\_cost} \cdot x_{i,t}. \tag{10}$$
The circularity reward $R_{\text{circ}}(S_t, a_t)$ promotes reuse and recycling. The reward is proportional to the amount of inventory successfully reused or recycled.
For reuse, the circularity reward is:
$$R_{\text{circ}}(S_t, a_t) = \text{reuse\_sr}_i \cdot x_{i,t} + \text{recycle\_sr}_i \cdot (x_{i,t} - \text{reuse\_sr}_i \cdot x_{i,t}). \tag{11}$$
For recycle, the reward reflects the portion recycled:
$$R_{\text{circ}}(S_t, a_t) = \text{recycle\_sr}_i \cdot x_{i,t}. \tag{12}$$
For dispose, the circularity reward is zero:
$$R_{\text{circ}}(S_t, a_t) = 0. \tag{13}$$
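A short sketch of Eqs. (11) to (13), together with the weighted sum of Eq. (4), is given below. The function names, the equal default weights, and the example numbers are assumptions made for illustration; the paper does not prescribe them here.

```python
def circularity_reward(action, x, reuse_sr, recycle_sr):
    """Eqs. (11)-(13): expected number of units kept in the loop."""
    if action == "reuse":
        reused = reuse_sr * x
        # Units that fail reuse are sent to recycling (fallback).
        return reused + recycle_sr * (x - reused)
    if action == "recycle":
        return recycle_sr * x
    if action == "dispose":
        return 0.0
    raise ValueError(f"unknown action: {action}")


def total_reward(r_env, r_econ, r_circ, w_env=1 / 3, w_econ=1 / 3, w_circ=1 / 3):
    """Eq. (4): weighted sum of the three objectives (equal weights assumed)."""
    return w_env * r_env + w_econ * r_econ + w_circ * r_circ


# Illustrative values only.
r_circ = circularity_reward("reuse", 10, 0.8, 0.5)   # 8 reused + 1 recycled
print(r_circ, total_reward(82.0, 50.0, r_circ, 0.5, 0.3, 0.2))
```

Because the three terms are expressed in different units (carbon, currency, item counts), the weights in such a sketch also absorb the relative scaling of the objectives.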
3.2.5 Value Function and System Policy
The Q-learning algorithm is used to update the action-value function $Q(S_t, a_t)$, which estimates the expected cumulative reward for taking action $a_t$ in state $S_t$. The update rule is:
$$Q(S_t, a_t) \leftarrow (1 - \alpha)\,Q(S_t, a_t) + \alpha\left[R(S_t, a_t) + \gamma \max_{a'} Q(S_{t+1}, a')\right], \tag{14}$$
where
• $\alpha$ is the learning rate,
• $\gamma$ is the discount factor for future rewards,
• $\max_{a'} Q(S_{t+1}, a')$ is the maximum expected future reward for the next state $S_{t+1}$.
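The update of Eq. (14) can be sketched as a tabular rule over a dictionary. This is a minimal illustration under stated assumptions: the Q-table layout (a `defaultdict` keyed by `(state, action)` pairs), the function name `q_update`, and the default values of `alpha` and `gamma` are hypothetical choices, not values reported in the paper.

```python
from collections import defaultdict

ACTIONS = ("reuse", "recycle", "dispose")


def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Eq. (14): blend the old estimate with the bootstrapped target."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (reward + gamma * best_next)
    return Q[(s, a)]


Q = defaultdict(float)  # all Q-values start at zero
print(q_update(Q, 0, "reuse", 10.0, 1, alpha=0.5))  # 5.0
```

The form used here is algebraically the same as the temporal-difference form $Q \leftarrow Q + \alpha[r + \gamma \max_{a'} Q(s', a') - Q]$.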
3.3 CSC Performance Evaluation
The Q-learning model is simulated to learn the optimal EoL policy for each component. The performance of the circular supply chain is then evaluated using three key metrics: total carbon footprint, circularity contribution, and total economic cost. These metrics are used to compare the performance of different EoL strategies (reuse, recycle, dispose) and to assess the effectiveness of the Q-learning optimization.
3.3.1 Total Carbon Footprint
The total carbon footprint ($\text{CFP}_{\text{total}}$) of the circular supply chain is the sum of the carbon footprint generated from the reuse, recycling, and disposal of components and the carbon footprint from virgin material production. It is calculated as:
$$\text{CFP}_{\text{total}} = \text{CFP}_{\text{csc}} + \text{CFP}_{\text{reused}} + \text{CFP}_{\text{recycled}} + \text{CFP}_{\text{disposed}}, \tag{15}$$
where
$$\text{CFP}_{\text{csc}} = \text{CFP}_{\text{prod}} \cdot \sum_{t=1}^{T} D_t,$$
$$\text{CFP}_{\text{reused}} = \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{reused},i,t} \cdot \text{reuse\_cf}_i,$$
$$\text{CFP}_{\text{recycled}} = \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{recycled},i,t} \cdot \text{recycle\_cf}_i,$$
$$\text{CFP}_{\text{disposed}} = \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{disposed},i,t} \cdot \text{dispose\_cf}_i,$$
where $\text{CFP}_{\text{csc}}$ represents the total carbon footprint associated with production, $D_t$ is the product demand at time $t$, and $q_{\text{reused},i,t}$, $q_{\text{recycled},i,t}$, $q_{\text{disposed},i,t}$ represent the quantities reused, recycled, and disposed for component $i$ at time $t$.
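Eq. (15) is a production term plus three double sums, which the following sketch accumulates directly. The data layout (`quantities[action][t][i]` and `footprints[action][i]`), the function name, and all numbers are assumptions chosen for illustration. The total economic cost has the same structure with costs in place of footprints.

```python
def total_carbon_footprint(cfp_prod, demand, quantities, footprints):
    """Eq. (15): production footprint plus footprints of all EoL actions.

    quantities[action][t][i] -- units of component i handled in period t
    footprints[action][i]    -- per-unit footprint of that action for component i
    """
    total = cfp_prod * sum(demand)  # CFP_csc
    for action, per_period in quantities.items():
        for period in per_period:
            total += sum(q * cf for q, cf in zip(period, footprints[action]))
    return total


# Two periods, two components; values are made up.
quantities = {
    "reused":   [[1, 0], [2, 1]],
    "recycled": [[0, 1], [0, 0]],
    "disposed": [[0, 0], [1, 0]],
}
footprints = {"reused": [-10, -5], "recycled": [-4, -2], "disposed": [2, 1]}
print(total_carbon_footprint(100, [2, 3], quantities, footprints))  # 465
```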
3.3.2 Circularity Contribution
The circularity contribution ($\text{CC}_{\text{total}}$) reflects the proportion of materials successfully reused or recycled in the supply chain. It is calculated as:
$$\text{CC}_{\text{total}} = \frac{\sum_{t=1}^{T} \sum_{i=1}^{n} \left(q_{\text{reused},i,t} + q_{\text{recycled},i,t}\right)}{\sum_{t=1}^{T} R_t}, \tag{16}$$
where
• $q_{\text{reused},i,t}$ is the quantity of component $i$ reused at time $t$,
• $q_{\text{recycled},i,t}$ is the quantity of component $i$ recycled at time $t$,
• $R_t$ is the total product returns at time $t$.
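The ratio in Eq. (16) can be computed as below. This is a sketch under assumptions: the nested-list layout (`reused[t][i]`), the function name, and the convention of returning zero when nothing was returned are hypothetical, and the returns are taken to be counted in the same unit as the component quantities.

```python
def circularity_contribution(reused, recycled, returns):
    """Eq. (16): recovered quantity divided by total returns."""
    total_returns = sum(returns)
    if total_returns == 0:
        return 0.0  # assumed convention when there were no returns
    recovered = sum(sum(row) for row in reused) + sum(sum(row) for row in recycled)
    return recovered / total_returns


# Two periods, two components; 5 units recovered out of 8 returned.
print(circularity_contribution([[1, 0], [2, 1]], [[0, 1], [0, 0]], [3, 5]))  # 0.625
```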
3.3.3 Total Economic Cost
The total economic cost ($C_{\text{total}}$) is the sum of the production costs and the costs associated with reuse, recycling, and disposal actions. It is given by:
$$C_{\text{total}} = C_{\text{prod}} \cdot \sum_{t=1}^{T} D_t + C_{\text{reused}} + C_{\text{recycled}} + C_{\text{disposed}}, \tag{17}$$
where
$$C_{\text{reused}} = \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{reused},i,t} \cdot \text{reuse\_cost}_i,$$
$$C_{\text{recycled}} = \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{recycled},i,t} \cdot \text{recycle\_cost}_i,$$
$$C_{\text{disposed}} = \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{disposed},i,t} \cdot \text{dispose\_cost}_i,$$
where $C_{\text{prod}}$ represents the production cost applied to the product demand $D_t$, and $q_{\text{reused},i,t}$, $q_{\text{recycled},i,t}$, $q_{\text{disposed},i,t}$ represent the quantities reused, recycled, and disposed for component $i$ at time $t$.
3.4 Proposed Algorithm for Solving
RL-Based Circular Supply Chain
Optimization Model
In the previous sections, we described and mod-
eled the circular supply chain (CSC) problem in
the context of a reinforcement learning (RL) frame-
work. In this part, we propose an algorithm for solv-
ing the modeled problem using a Q-learning mecha-
nism, which is a temporal difference method widely
used for solving RL problems (Watkins, 1989). The
proposed algorithm aims to optimize EoL decision-
making by learning the value of Q-functions itera-
tively. After the learning process, the best action for
each component—whether to reuse, recycle, or dis-
pose—is selected as the optimal policy for future de-
cisions.
As shown in Algorithm 1, the system is simu-
lated over multiple episodes and periods, where the
Q-values Q(s, a) are learned and updated during the
iterations. During each episode, the system tracks the
product returns, and for each component in the sup-
ply chain, one of the three EoL decisions (Reuse, Re-
cycle, or Dispose) is selected based on the Q-table.
In each state, the system selects an action, observes
the reward (a function of environmental impact, eco-
nomic cost, and circularity contribution), and updates
Q(s, a) accordingly.
The probability of exploration is a function of the episode number and is reduced as the number of episodes increases, from 100% exploration (random actions) in the first episode down to a floor of 1% in the final episodes. As specified in Algorithm 1, the exploration rate ε is multiplied by a decay factor of 0.99 after each episode, which balances exploring new strategies against exploiting the learned Q-values for optimal actions. This ensures that the algorithm explores various EoL strategies in the early phases but converges to the most rewarding policies over time.
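The decay rule written in Algorithm 1, ε ← max(0.01, ε × 0.99), can be traced with a few lines of code. The sketch below is illustrative; the function name `epsilon_schedule` and the assumption that ε starts at 1.0 (100% exploration) are introduced here. Under that rule the decay is geometric, and the 1% floor is reached after 459 decay steps, since 0.99^458 is still slightly above 0.01 while 0.99^459 is below it.

```python
def epsilon_schedule(n_episodes, eps0=1.0, decay=0.99, floor=0.01):
    """Exploration rate used in each episode under eps <- max(floor, eps * decay)."""
    eps, schedule = eps0, []
    for _ in range(n_episodes):
        schedule.append(eps)          # rate applied during this episode
        eps = max(floor, eps * decay)  # decay applied at the end of the episode
    return schedule


schedule = epsilon_schedule(600)
print(schedule[0], schedule[100], schedule[-1])
```

A training run shorter than about 460 episodes would therefore never reach the 1% floor with these settings, which may be worth checking when choosing the number of episodes.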
Initialize:
    Q-values $Q(s, a) = 0$ for all states $s$ and actions $a$
    Set learning rate $\alpha$, discount factor $\gamma$, exploration rate $\varepsilon$, total production carbon footprint $c^{\mathrm{prod}}_{fp}$, number of episodes $n_{\mathrm{episodes}}$
    Define action space $A = \{\text{Reuse}, \text{Recycle}, \text{Dispose}\}$
    Initialize inventory levels $x_{i,t}$ for all components $i$
while episode $\le n_{\mathrm{episodes}}$ do
    for each period $t = 1, \ldots, T$ do
        for each component $i = 1, \ldots, n$ do
            if Random exploration ($\varepsilon$) then
                Select a random action $a \in A$;
            else
                Select action $a = \arg\max_{a} Q(s, a)$ based on Q-values;
            end
            Observe the next state $s'$ and calculate the rewards;
            Combine rewards using the weighted sum of environmental, economic, and circularity factors:
                $r(s, a) = w_{\mathrm{env}} \cdot r_{\mathrm{env}}(s, a) + w_{\mathrm{eco}} \cdot r_{\mathrm{eco}}(s, a) + w_{\mathrm{circ}} \cdot r_{\mathrm{circ}}(s, a)$    (18)
            Update the Q-value using the Q-learning update rule:
                $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$    (19)
            Update the inventory level $x_{i,t}$ by adjusting for reused, recycled, and disposed quantities for the component;
        end
    end
    Decrease exploration rate $\varepsilon \leftarrow \max(0.01, \varepsilon \times 0.99)$;
    Increment episode count;
end
Strategic Planning:
    After training, for each component $i$, retrieve the best action $a = \arg\max_{a} Q(s, a)$ and update inventory levels accordingly. Evaluate system-wide metrics;
return Optimal Q-values for each component
Algorithm 1: Q-learning for Circular Supply Chain Optimization.
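The training loop of Algorithm 1 can be sketched in a few lines of tabular Q-learning. The example below is a toy under stated assumptions, not the authors' model: the three abstract states, the uniformly random state transitions, the per-action reward components, and the weights are all hypothetical placeholders, and the loop over components is collapsed to a single component for brevity.

```python
import random

ACTIONS = ["Reuse", "Recycle", "Dispose"]

# Hypothetical (environmental, economic, circularity) reward components per action.
REWARD_COMPONENTS = {
    "Reuse": (1.0, 0.8, 1.0),
    "Recycle": (0.5, 0.3, 0.6),
    "Dispose": (-1.0, -0.2, 0.0),
}
WEIGHTS = (0.4, 0.3, 0.3)  # hypothetical w_env, w_eco, w_circ


def combined_reward(action):
    """Weighted sum of the three reward components, in the form of Eq. (18)."""
    return sum(w * r for w, r in zip(WEIGHTS, REWARD_COMPONENTS[action]))


def train(n_states=3, n_episodes=300, n_periods=20,
          alpha=0.1, gamma=0.9, eps=1.0, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection on a toy problem."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_states)]  # Q(s, a) = 0 initially
    for _ in range(n_episodes):
        s = rng.randrange(n_states)
        for _ in range(n_periods):
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                a = max(range(len(ACTIONS)), key=lambda k: q[s][k])  # exploit
            r = combined_reward(ACTIONS[a])
            s_next = rng.randrange(n_states)  # placeholder transition model
            # Q-learning update, in the form of Eq. (19).
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
        eps = max(0.01, eps * 0.99)  # decay exploration once per episode
    return q


if __name__ == "__main__":
    for state, row in enumerate(train()):
        print(state, [round(v, 2) for v in row])
```

Because the assumed rewards favor reuse in every state, the learned Q-values rank Reuse first; in the actual model the ranking depends on the state definition, the success rates, and the chosen weights.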
The reward function during the simulation is com-
puted using a weighted combination of environmental
impact, economic costs, and circularity contribution.
For each state-action pair, the cumulative reward is
calculated, and the Q-function is updated iteratively.
The Q-value update is based on the learning rule:
$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$    (20)
where α is the learning rate, r(s, a) is the combined
reward, and γ is the discount factor that balances fu-
ture and immediate rewards.
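To make the update rule concrete, consider a single step with purely illustrative values (none of them taken from the case study): $\alpha = 0.1$, $\gamma = 0.9$, a current estimate $Q(s, a) = 2.0$, a combined reward $r(s, a) = 1.0$, and a best next-state value $\max_{a'} Q(s', a') = 3.0$.

```latex
\begin{align*}
Q(s, a) &\leftarrow Q(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \\
        &= 2.0 + 0.1 \left[ 1.0 + 0.9 \times 3.0 - 2.0 \right] \\
        &= 2.0 + 0.1 \times 1.7 \\
        &= 2.17
\end{align*}
```

The estimate moves one tenth of the way toward the target $r + \gamma \max_{a'} Q(s', a') = 3.7$, which is the role of the learning rate.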
Initially, the agents (components) have no knowl-
edge of the value of each action in every state. Hence,
all Q-values are set to zero for all state-action pairs.
The learning rate α determines how much weight to
give the most recent reward compared to the existing
Q-estimate. A suitable learning rate must be selected
to ensure convergence of the algorithm without over-
fitting to specific samples.
The objective of this model is to maximize cir-
cularity and minimize both the total carbon footprint
and the economic cost of the circular supply chain.
Therefore, the reward function is structured to encour-
age reuse and recycling while penalizing disposal and
excessive carbon emissions. The Q-learning process
continues until the Q-values converge for all state-
action pairs. After the learning phase, the best action
in each state can be retrieved through a greedy search
on the Q-table, which then informs the EoL decision
for each component in the circular supply chain.
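The greedy retrieval step described above can be sketched as follows. The Q-table contents and the state labels `s0`, `s1`, `s2` are hypothetical values chosen for illustration only.

```python
ACTIONS = ("Reuse", "Recycle", "Dispose")


def greedy_policy(q_table):
    """Map each state to the action with the highest learned Q-value."""
    policy = {}
    for state, q_values in q_table.items():
        best = max(range(len(ACTIONS)), key=lambda k: q_values[k])
        policy[state] = ACTIONS[best]
    return policy


# Hypothetical learned Q-values, ordered as (Reuse, Recycle, Dispose).
q_table = {
    "s0": [4.2, 3.1, 0.5],
    "s1": [1.0, 2.7, 0.9],
    "s2": [-0.3, -0.8, 0.1],
}

policy = greedy_policy(q_table)

if __name__ == "__main__":
    print(policy)
```

No exploration is involved at this stage: the policy simply reads off the arg-max of each row of the Q-table.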
4 RESULTS & ANALYSIS
4.1 Industrial Case Study
In this section, we present an application of the de-
veloped model, based on a real industrial case from
our partner, GE HealthCare. GE HealthCare (GEHC)
is a global leader in the sales and services of medical
systems, particularly in the field of medical imaging,
with over 4 million systems installed across more than
160 countries. Due to the critical nature of its prod-
ucts (medical devices) and the technological com-
plexity of their components, GE HealthCare imple-
ments a circular supply chain strategy through asset
recovery and buy-back programs.
In 2023, these initiatives resulted in the recov-
ery of approximately 7,375 units, including imaging
systems, ultrasound devices, magnets, and surgical
machines, contributing to the reuse of approximately
7.3 million kilograms of materials (GE Healthcare,
2023). The integration of circular economy princi-
ples into GE HealthCare’s operations and product life
cycles is a key approach to managing climate impact,
reducing waste, promoting recycling and reuse, and
minimizing resource consumption.
In this study, we apply the developed model to
a device consisting of 53 components and investi-
gate the impact of EoL management on the efficiency
of the circular supply chain (CSC). To demonstrate
the benefits of this approach, we evaluate the per-
formance of the proposed Q-learning model across
several scenarios, assessing its effectiveness in opti-
mizing CSC outcomes. These scenarios are designed
to explore the influence of varying reuse and recy-
cle strategies on CSC circularity, cost reduction, and
carbon footprint reduction. Additionally, the perfor-
mance of these strategies is compared to a baseline
linear supply chain scenario, where no circular strate-
gies are implemented.
The goal of the analysis is to reveal the gap be-
tween design circularity—the ideal circularity based
on product design feasibility—and CSC circular-
ity—the actual circularity achieved when considering
operational constraints in the supply chain.
Figure 4: Performance metrics across different scenarios,
showing the CSC circularity (by quantity), cost reduction,
and carbon footprint reduction.
Table 1 and Figure 4 present the results for four
scenarios. In each case, we measure CSC circularity
by quantity, along with the total cost reduction and
carbon footprint reduction.
4.1.1 Scenario 1: Baseline - No Reuse, No Recycle
Scenario 1 represents the baseline case where no
reuse or recycling actions are taken and all returned
components are disposed of. This scenario mimics a
traditional linear supply chain where no effort is made
to recover or recycle products at the end of life. As
expected, it yields no circularity, no cost savings, and
no carbon footprint reduction.
4.1.2 Scenario 2: Partial Reuse Feasibility
In Scenario 2, 41% of the product’s components
are deemed reusable based on product design (design
feasibility). However, the actual reuse success rates,
drawn from historical data, indicate that not all
components designed for reuse can actually be reused
in the CSC. Despite this, the reuse operations succeed
in recovering 34.5% of the product’s components,
contributing to CSC circularity.

ICORES 2025 - 14th International Conference on Operations Research and Enterprise Systems

Table 1: Performance metrics for the four scenarios, showing the achieved CSC circularity, cost, and carbon footprint reductions.

Scenario   CSC Circularity (Quantity)   Cost Reduction   Carbon Footprint Reduction
1          0.0%                         0.0%             0.0%
2          34.5%                        18.4%            13.8%
3          80.6%                        45.0%            56.9%
4          81.1%                        45.1%            69.1%
This scenario highlights the gap between design
circularity, which indicates 41% reuse feasibility, and
the achieved CSC circularity of 34.5%. Operational
constraints and quality issues in the supply chain
account for this drop. The scenario also yields an
18.4% cost reduction and a 13.8% reduction in carbon
footprint, showing that even partial reuse delivers
measurable savings.
4.1.3 Scenario 3: Advanced Reuse and Recycling
In Scenario 3, 83% of the product’s components are
designed to be reusable or recyclable. Furthermore,
recycling is introduced for components where the pro-
cess is feasible, with most recyclable components
having a success rate of 100%. However, for one crit-
ical and rare component, ID 16, the recycling success
rate is only 70%, reflecting operational difficulties.
This scenario achieves 80.6% CSC circularity by
quantity, against a design circularity of 83%. This
again highlights the gap between design circularity
and CSC circularity, driven by the limitations of
recycling for certain components. The impact on cost
and environmental performance is notable, with a
45.0% cost reduction and a 56.9% reduction in carbon
footprint. This demonstrates that recycling, even with
operational constraints, provides substantial benefits
in circular supply chain management.
4.1.4 Scenario 4: Improved Recycling for Component ID 16
In Scenario 4, the only difference from Scenario 3 is
that the recycling success rate for Component ID 16
is increased to 100%. This single change raises CSC
circularity by quantity only slightly, from 80.6% to
81.1%, but improves the carbon footprint reduction
markedly, from 56.9% to 69.1%. This scenario
demonstrates how addressing an operational constraint,
even for a single component, can substantially reduce
environmental impact while also improving circularity.
4.1.5 The Gap Between Design Circularity and CSC Circularity
The scenarios reveal a consistent gap between
design circularity and CSC circularity.
While product design plays a crucial role in de-
termining circularity potential, the actual circularity
achieved in the CSC is constrained by operational and
quality issues. These constraints, represented by suc-
cess rates for reuse and recycling, prevent the full re-
alization of circularity potential in the supply chain.
In Scenario 3, for example, the design circularity
suggests that 83% of the product’s components can
be reused or recycled, but the CSC circularity is only
80.6%. Scenario 4 closes this gap slightly, but only by
addressing operational constraints on recycling suc-
cess rates. This illustrates that optimizing the circular
supply chain involves not only design improvements
but also a focus on addressing real-world operational
challenges that hinder circularity.
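The size of this gap can be computed directly from the figures reported for Scenarios 2 to 4. The short sketch below does so in percentage points; the dictionary layout and function name are illustrative choices, and Scenario 4 is assumed to share the 83% design circularity of Scenario 3, since only the recycling success rate of Component ID 16 differs between the two.

```python
# Design circularity and achieved CSC circularity (by quantity), in percent.
SCENARIOS = {
    2: {"design": 41.0, "csc": 34.5},
    3: {"design": 83.0, "csc": 80.6},
    4: {"design": 83.0, "csc": 81.1},  # design value assumed equal to Scenario 3
}


def circularity_gap(design, csc):
    """Gap between design and achieved circularity, in percentage points."""
    return round(design - csc, 1)


gaps = {sid: circularity_gap(v["design"], v["csc"]) for sid, v in SCENARIOS.items()}

if __name__ == "__main__":
    for sid, gap in gaps.items():
        print(f"Scenario {sid}: gap = {gap} percentage points")
```

The gap is 6.5 points in Scenario 2, 2.4 points in Scenario 3, and 1.9 points in Scenario 4.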
4.2 Limitation
One limitation of this study is the circularity metric
used to assess the performance of the circular supply
chain (CSC). In this analysis, the metric for CSC cir-
cularity is based on the quantity of components reused
and recycled compared to what has been returned.
This approach assumes that reuse and recycling con-
tribute equally to circularity, which is not always the
case. In practice, the reuse process typically does not
require virgin materials, while recycling may some-
times necessitate the addition of virgin materials to
meet product specifications and requirements.
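The quantity-based metric described above can be written as the share of returned components that are reused or recycled. The sketch below uses hypothetical counts to show the consequence of this definition: two very different EoL mixes receive the same score. The function name and the numbers are assumptions for illustration.

```python
def quantity_circularity(reused, recycled, returned):
    """Share of returned components that were reused or recycled."""
    if returned <= 0:
        raise ValueError("returned must be positive")
    return (reused + recycled) / returned


# Hypothetical counts for 100 returned components.
reuse_heavy = quantity_circularity(reused=80, recycled=0, returned=100)
recycle_heavy = quantity_circularity(reused=0, recycled=80, returned=100)

if __name__ == "__main__":
    print(reuse_heavy, recycle_heavy)
```

Both mixes score 0.8, even though the recycling-heavy mix may still require virgin material, which is exactly the effect the metric cannot distinguish.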
This simplification overlooks the varying
resource-saving potentials of reuse and recycling.
To provide a more accurate assessment of circu-
larity, future research should incorporate more
comprehensive circularity metrics that capture the
true resource-saving benefits of reuse and recycling
actions.
5 CONCLUSIONS & PERSPECTIVES
This research presents a decision-making mechanism
for circular supply chains focusing on EoL man-
agement that integrates environmental, economic,
and circularity performance through Q-learning. By
leveraging historical success rates for reuse and re-
cycling actions, the model reflects the operational re-
alities of EoL management, contrasting the idealized
design circularity with actual circular supply chain
(CSC) performance. Our findings demonstrate that
even when components are designed for reuse or recy-
cling, operational constraints, such as quality issues,
can significantly impact the realized circularity, car-
bon footprint reduction, and cost savings.
Through scenario analysis, we showed the trade-
offs between various EoL strategies and the sensi-
tivity of circularity and sustainability outcomes to
component-specific success rates. The model ad-
dresses the decision-making challenge of whether to
reuse, recycle, or dispose of returned components in
a circular supply chain, providing a robust framework
for managing EoL in a sustainable and cost-effective
manner.
However, the cost modeling in this study assumes
simplified scenarios. Future work should refine these
cost assumptions to better reflect real-world com-
plexities and explore other optimization techniques,
such as multi-objective evolutionary algorithms (e.g.,
NSGA-III or genetic algorithms), to better explore
the solution space and identify trade-offs between
different objectives. Additionally, while the model
currently evaluates environmental impact primarily
through carbon footprint reduction, future studies
should incorporate a broader range of impact cate-
gories to provide a more comprehensive environmen-
tal analysis.
Moreover, the potential for integrating more com-
prehensive circularity metrics to capture the resource-
saving benefits of circular supply chains should be ex-
plored to further enhance decision-making in circular
supply chain management.
REFERENCES
Abaku, E. A. and Odimarha, A. C. (2024). Sustainable sup-
ply chain management in the medical industry: a theo-
retical and practical examination. International Med-
ical Science Research Journal, 4(3):319–340.
Agyemang, M., Kusi-Sarpong, S., Khan, S. A., Mani, V.,
Rehman, S. T., and Kusi-Sarpong, H. (2019). Drivers
and barriers to circular economy implementation: An
explorative study in Pakistan’s automobile industry.
Management Decision, 57(4):971–994.
Bouzon, M., Govindan, K., and Rodriguez, C. M. T. (2018).
Evaluating barriers for reverse logistics implementa-
tion under a multiple stakeholders’ perspective analy-
sis using grey decision making approach. Resources,
Conservation and Recycling, 128:315–335.
Bressanelli, G., Perona, M., and Saccani, N. (2019).
Challenges in supply chain redesign for the circu-
lar economy: a literature review and a multiple case
study. International Journal of Production Research,
57(23):7395–7422.
de Sousa Jabbour, A. B. L., Luiz, J. V. R., Luiz, O. R., Jab-
bour, C. J. C., Ndubisi, N. O., de Oliveira, J. H. C., and
Junior, F. H. (2019). Circular economy business models
and operations management. Journal of Cleaner
Production, 235:1525–1539.
D’Alessandro, C., Szopik-Depczyńska, K., Tarczyńska-
Łuniewska, M., Silvestri, C., and Ioppolo, G. (2024).
Exploring circular economy practices in the health-
care sector: A systematic review and bibliometric
analysis. Sustainability, 16(1):401.
Farooque, M., Zhang, A., Thürer, M., Qu, T., and Huisingh,
D. (2019). Circular supply chain management:
A definition and structured literature review. Journal
of Cleaner Production, 228:882–900.
GE Healthcare (2023). Sustainability Report.
https://www.gehealthcare.com/-/jssmedia/
gehc/us/files/about-us/sustainability/reports/
ge-healthcare-sustainability-report-2023.pdf?rev=-1.
Accessed: 2024-09-24.
Genovese, A., Acquaye, A. A., Figueroa, A., and Koh, S. L.
(2017). Sustainable supply chain management and the
transition towards a circular economy: Evidence and
some applications. Omega, 66:344–357.
Govindan, K. and Hasanagic, M. (2018). A systematic re-
view on drivers, barriers, and practices towards cir-
cular economy: a supply chain perspective. Interna-
tional Journal of Production Research, 56(1-2):278–
311.
Goyal, S., Esposito, M., and Kapoor, A. (2018). Circular
economy business models in developing economies:
lessons from India on reduce, recycle, and reuse
paradigms. Thunderbird International Business Re-
view, 60(5):729–740.
Halse, L. L. and Jæger, B. (2019). Operationalizing industry
4.0: Understanding barriers of industry 4.0 and circu-
lar economy. In Advances in Production Management
Systems. Towards Smart Production Management Sys-
tems: IFIP WG 5.7 International Conference, APMS
2019, Austin, TX, USA, September 1–5, 2019, Pro-
ceedings, Part II. Springer.
Han, J., Ijuin, H., Kinoshita, Y., Yamada, T., Yamada, S.,
and Inoue, M. (2021). Sustainability assessment of
reuse and recycling management options for end-of-life
computers-Korean and Japanese case study analysis.
Recycling, 6(3):55.
Hasegawa, S., Kinoshita, Y., Yamada, T., and Bracke, S.
(2019). Life cycle option selection of disassembly
parts for material-based CO2 saving rate and recovery
cost: Analysis of different market value and labor cost
for reused parts in German and Japanese cases.
International Journal of Production Economics, 213:229–242.
Herczeg, G., Akkerman, R., and Hauschild, M. Z. (2018).
Supply chain collaboration in industrial symbiosis
networks. Journal of Cleaner Production, 171:1058–1067.
Hoveling, T., Nijdam, A. S., Monincx, M., Faludi, J., and
Bakker, C. (2024). Circular economy for medical de-
vices: Barriers, opportunities and best practices from
a design perspective. Resources, Conservation and
Recycling, 208:107719.
Khodier, A., Williams, K., and Dallison, N. (2018). Chal-
lenges around automotive shredder residue production
and disposal. Waste Management, 73:566–573.
Kinoshita, Y., Yamada, T., Gupta, S. M., Ishigaki, A.,
and Inoue, M. (2016). Disassembly parts selection
and analysis for recycling rate and cost by goal pro-
gramming. Journal of Advanced Mechanical Design,
Systems, and Manufacturing, 10(3):JAMDSM0052–
JAMDSM0052.
Kirchherr, J., Yang, N.-H. N., Schulze-Spüntrup, F.,
Heerink, M. J., and Hartley, K. (2023). Conceptual-
izing the circular economy (revisited): an analysis of
221 definitions. Resources, Conservation and Recy-
cling, 194:107001.
Kravchenko, M., Pigosso, D. C., and McAloone, T. C.
(2019). Towards the ex-ante sustainability screening
of circular economy initiatives in manufacturing com-
panies: Consolidation of leading sustainability-related
performance indicators. Journal of Cleaner Produc-
tion, 241:118318.
Lahane, S., Kant, R., and Shankar, R. (2020). Circular
supply chain management: A state-of-art review and
future opportunities. Journal of Cleaner Production,
258:120859.
Mangla, S. K., Luthra, S., Mishra, N., Singh, A., Rana,
N. P., Dora, M., and Dwivedi, Y. (2018). Barriers to
effective circular supply chain management in a devel-
oping country context. Production Planning & Con-
trol, 29(6):551–569.
Mishra, J. L., Hopkinson, P. G., and Tidridge, G. (2018).
Value creation from circular economy-led closed loop
supply chains: a case study of fast-moving consumer
goods. Production Planning & Control, 29(6):509–
521.
Nasir, M. H. A., Genovese, A., Acquaye, A. A., Koh, S.,
and Yamoah, F. (2017). Comparing linear and circu-
lar supply chains: A case study from the construction
industry. International Journal of Production Eco-
nomics, 183:443–457.
NHS (2022). Delivering a ‘net zero’ national
health service. https://www.england.nhs.uk/
greenernhs/wp-content/uploads/sites/51/2022/07/
B1728-delivering-a-net-zero-nhs-july-2022.pdf.
Accessed: 2024-09-24.
Nicholson, A. L., Olivetti, E. A., Gregory, J. R., Field, F. R.,
and Kirchain, R. E. (2009). End-of-life LCA allocation
methods: Open loop recycling impacts on robustness
of material selection decisions. In 2009 IEEE interna-
tional symposium on sustainable systems and technol-
ogy, pages 1–6. IEEE.
Pichler, P.-P., Jaccard, I. S., Weisz, U., and Weisz, H.
(2019). International comparison of health care
carbon footprints. Environmental Research Letters,
14(6):064004.
Ranta, V., Aarikka-Stenroos, L., Ritala, P., and Mäkinen,
S. J. (2018). Exploring institutional drivers and barriers
of the circular economy: A cross-regional comparison
of China, the US, and Europe. Resources, Conservation
and Recycling, 135:70–82.
Romanello, M., Di Napoli, C., Green, C., Kennard, H.,
Lampard, P., Scamman, D., Walawender, M., Ali, Z.,
Ameli, N., Ayeb-Karlsson, S., et al. (2023). The 2023
report of the Lancet Countdown on health and climate
change: the imperative for a health-centred response
in a world facing irreversible harms. The Lancet,
402(10419):2346–2394.
Rosa, P., Sassanelli, C., and Terzi, S. (2019). Circular busi-
ness models versus circular benefits: An assessment
in the waste from electrical and electronic equipments
sector. Journal of Cleaner Production, 231:940–952.
Roy, T., Garza-Reyes, J. A., Kumar, V., Kumar, A., and
Agrawal, R. (2022). Redesigning traditional linear
supply chains into circular supply chains–a study into
its challenges. Sustainable Production and Consump-
tion, 31:113–126.
Tura, N., Hanski, J., Ahola, T., Ståhle, M., Piiparinen, S.,
and Valkokari, P. (2019). Unlocking circular busi-
ness: A framework of barriers and drivers. Journal
of Cleaner Production, 212:90–98.
Vermunt, D. A., Negro, S. O., Verweij, P. A., Kuppens,
D. V., and Hekkert, M. P. (2019). Exploring barriers
to implementing different circular business models.
Journal of Cleaner Production, 222:891–902.
Watkins, C. J. C. H. (1989). Learning from delayed rewards.
Winkler, H. (2011). Closed-loop production systems—a
sustainable supply chain approach. CIRP Journal
of Manufacturing Science and Technology, 4(3):243–
246.