Enhancing Circularity in Medical Device Supply Chains by Optimizing EoL Decisions Through Reinforcement Learning: A Multi-Objective Approach

Soufiane El Bechari, Oualid Jouini (1, a), Zied Jemai (1, 2), Fourat Trabelsi and Robert Heidsieck

1 Industrial Engineering Laboratory (LGI), CentraleSupélec, Paris-Saclay University, Gif-sur-Yvette, France

2 OASIS - ENIT, University of Tunis, Elmanar, BP37, Le Belvedere, Tunisia

3 GE HealthCare, 283 Rue de la Minière, 78530 Buc, France

a https://orcid.org/0000-0002-9498-165X

Keywords:

Multi-Component Systems, Circular Supply Chain, EoL Management, Reinforcement Learning,

Multi-Objective Approach.

Abstract:

Circular supply chains are becoming essential in the pursuit of sustainability, as they promote the responsible

disposal, recycling, and reuse of products at the end of their life cycles. This research, developed in collabora-

tion with GE HealthCare, presents a multi-objective optimization framework that incorporates environmental,

economic, and circularity performance in end-of-life (EoL) decision-making. The proposed model leverages

historical data on reuse and recycling success rates to capture the operational realities of circular supply chains.

By employing Q-learning, this paper aims to develop a decision-support mechanism that optimizes EoL ac-

tions for components, thereby enhancing the circularity, reducing carbon footprint, and minimizing economic

costs within the circular supply chain.

1 INTRODUCTION

The increasing emphasis on Sustainable and Cir-

cular Supply Chain Management (SCSCM) within

the healthcare sector reﬂects a critical acknowledg-

ment of the environmental and economic impacts

of medical supply chain practices. This shift is

driven by the need to reconcile healthcare opera-

tions with sustainability goals, as the sector is re-

sponsible for signiﬁcant waste generation and carbon

emissions, contributing to global environmental chal-

lenges (D’Alessandro et al., 2024). For instance, the

healthcare sector is responsible for around 4.6% to 5% of global greenhouse gas (GHG) emissions (Pichler et al., 2019; Romanello et al., 2023), equivalent to

2 billion tonnes of carbon dioxide equivalent (CO2e). Given the

signiﬁcant impact of the healthcare sector on climate

change, there have been a number of policy initiatives

aimed at reducing the environmental footprint, most

notably through the NHS’s “Delivering a Net Zero

National Health Service” strategy. The NHS, as one

of the world’s largest healthcare systems, has set tar-

gets to achieve net-zero emissions by 2040 for emis-

sions under its direct control, and by 2045 for those it

can inﬂuence indirectly, such as those from the supply

chain and patient travel (NHS, 2022). Additionally,

major healthcare companies like GE HealthCare are

aligning their sustainability goals with these broader

initiatives. GE HealthCare has set goals to reduce op-

erational GHG emissions (Scope 1 and 2) by 42%

and Scope 3 emissions by 25% by 2030, as part of

their commitment to reaching net zero by 2050 (GE

Healthcare, 2023). The intersection of healthcare and

environmental sustainability is becoming increasingly

prominent as global efforts face the dual challenges of delivering quality healthcare while combating climate change and minimizing waste. In this context, the medical supply chain plays a pivotal role in addressing environmental concerns, especially in efforts to reduce carbon emissions and waste generation

(Abaku and Odimarha, 2024). The medical supply

chain, essential for delivering healthcare products like

pharmaceuticals and medical devices, is a highly intricate system that significantly impacts the environment. Due to the specific nature of medical equipment, significant efforts have been made in design, operations, and supply chain management to maintain operating conditions and support circular economy principles to reduce environmental impact. However, further innovation and research are still needed, particularly in optimizing the supply chain of medical equipment through the integration of circular economy practices to enhance both circularity and sustainability within the healthcare sector.

El Bechari, S., Jouini, O., Jemai, Z., Trabelsi, F. and Heidsieck, R.

Enhancing Circularity in Medical Device Supply Chains by Optimizing EoL Decisions Through Reinforcement Learning: A Multi-Objective Approach.

DOI: 10.5220/0013122400003893

In Proceedings of the 14th International Conference on Operations Research and Enterprise Systems (ICORES 2025), pages 88-99

ISBN: 978-989-758-732-0; ISSN: 2184-4372

Unlike traditional linear supply chains, which fol-

low a ’take-make-dispose’ approach, circular supply

chains aim to ”integrate circular economy thinking

into supply chain management and its surrounding

industrial and natural ecosystems” (Farooque et al.,

2019). A ”circular economy is a regenerative eco-

nomic system which necessitates a paradigm shift to

replace the ‘end of life’ concept with reducing, alter-

natively reusing, recycling, and recovering materials

throughout the supply chain, with the aim to promote

value maintenance and sustainable development, cre-

ating environmental quality, economic development,

and social equity, to the beneﬁt of current and future

generations” (Kirchherr et al., 2023). These circular supply chains aim to extend the life of products, components and materials through circular economy (CE) strategies such as

reuse, recycling, and remanufacturing. These circular

actions allow businesses to reduce their dependency

on virgin materials, minimize waste, and lower their

overall environmental footprint.

One of the primary challenges in implementing

circular supply chains is ensuring effective returns

and managing the EoL of products by determining

whether to reuse, recycle, or dispose of returned used

products and components. Each of these decisions

carries circular, environmental, and economic impli-

cations. For example, reusing components reduces

the need for new materials but may be constrained

by technical or quality limitations. Recycling can re-

cover valuable materials, but the associated energy

and costs may be signiﬁcant. Finally, disposal results

in increased waste and environmental impact, but in

some cases, it may be the only viable option if reuse

or recycling is not feasible or not the most pertinent

and appropriate solution.

Moreover, circular supply chains are often

complex and involve multiple components, each

with unique environmental and economic proﬁles.

Decision-making in this context requires careful con-

sideration of trade-offs between minimizing environ-

mental impact, reducing costs, and maximizing circu-

larity (i.e., the proportion of products that are success-

fully reused or recycled). To the best of our knowl-

edge, current models in circular supply chain manage-

ment often fail to fully integrate these multiple objec-

tives and do not incorporate real-world success rates

for reuse and recycling, leading to unrealistic expec-

tations about circularity potential.

This study addresses this challenge by developing

a decision-support mechanism utilizing Q-learning, a

reinforcement learning (RL) technique, to optimize

the management of EoL components in a circular supply chain (CSC). The

model operates at the component level, enabling real-

time decision-making for reuse, recycling, and dis-

posal actions, based on component-speciﬁc parame-

ters and performance metrics.

The rest of the paper is organized as follows. Sec-

tion 2 provides an overview of the state-of-the-art. In

Section 3, the problem formulation and modeling are

presented, along with a detailed description of the

proposed algorithm to solve the problem. Section

4 introduces an industrial case study based on real-

world data, including results and analysis. Finally,

Section 5 concludes the paper and highlights future

perspectives.

2 LITERATURE REVIEW

As discussed in the introduction, the primary con-

tributions of this paper are in the domains of circu-

lar supply chain and EoL management, and multi-

objective decision-making. Speciﬁcally, we focus on

identifying the most effective strategies for optimiz-

ing reuse, recycling, and disposal actions within com-

plex multi-component circular supply chains. In the

following section, we review the key streams of liter-

ature in these areas, including circular supply chains,

and EoL management. We then position the contribu-

tions of this paper within the broader context of exist-

ing research, highlighting the novelty of our approach

in balancing economic, environmental, and circularity

objectives across multiple components.

2.1 Circular Supply Chains and EoL Management

The integration of Circular Economy (CE) princi-

ples into Supply Chain Management (SCM) has been

widely referred to as Circular Supply Chain Manage-

ment (CSCM) in the literature (Genovese et al., 2017;

Nasir et al., 2017; Farooque et al., 2019). CSCM

encompasses various deﬁnitions, each emphasizing

the role of CE in reshaping supply chain activities

(Lahane et al., 2020). According to Farooque et al.

(2019), ”Circular supply chain management is the in-

tegration of circular thinking into the management of

the supply chain and its surrounding industrial and

natural ecosystems. It systematically restores tech-

nical materials and regenerates biological materials

toward a zero-waste vision through system-wide in-

novation in business models and supply chain func-

tions from product/service design to EoL and waste

management, involving all stakeholders in a prod-

uct/service lifecycle, including parts/product manu-

facturers, service providers, consumers, and users”.

The adoption of CE practices within supply chains

offers several key beneﬁts (Lahane et al., 2020), in-

cluding enhanced resource availability (Goyal et al.,

2018), improved EoL strategies (de Sousa Jabbour

et al., 2019), enriched value propositions (Mishra

et al., 2018), reduced waste generation (Herczeg et al.,

2018), and improved sustainability (Winkler, 2011).

However, the transition from traditional, linear

supply chain models—characterized by the ”take-

make-dispose” framework—towards circular systems

presents several signiﬁcant challenges. These chal-

lenges, as noted by Roy et al. (2022), include a per-

sistent industrial preference for linear models, com-

pounded by feasibility concerns surrounding CE im-

plementation (Tura et al., 2019; Agyemang et al.,

2019), and the absence of robust performance mea-

surement systems (Tura et al., 2019). Additionally,

higher upfront costs related to circular business mod-

els (Vermunt et al., 2019) and complex product de-

signs that hinder the ease and cost-effectiveness of

recycling, reuse, or remanufacturing (Khodier et al.,

2018; Rosa et al., 2019; Halse and Jæger, 2019) re-

main signiﬁcant barriers to wide-scale adoption.

Moreover, the absence of standardized circular

economy processes and metrics across industries hin-

ders cross-industry implementation of circular mod-

els (Govindan and Hasanagic, 2018; Mangla et al.,

2018; Ranta et al., 2018; Bouzon et al., 2018). This

gap, combined with the lack of widely accepted met-

rics and indicators to measure circularity performance

(Kravchenko et al., 2019; Bressanelli et al., 2019),

limits the scalability and effectiveness of CSCM.

The healthcare sector, in particular, presents

unique challenges for implementing circular econ-

omy principles due to stringent regulatory standards

and the critical importance of hygiene. Research

highlights signiﬁcant barriers to circularity in medi-

cal equipment, including perceived safety risks, reg-

ulatory complexities, and ﬁnancial constraints related

to medical device design (Hoveling et al., 2024).

Beyond these general barriers, the complexity in-

tensiﬁes when it comes to managing the EoL of multi-

component products, such as electronics and medical

devices in CSC. In these cases, each component may

have its own unique lifecycle and recovery potential

for reuse or recycling. Han et al. (2021) highlight

the critical need for component-level analysis to de-

termine the optimal EoL strategy, which should con-

sider factors such as regional greenhouse gas (GHG)

emissions and market prices for resale. This is par-

ticularly relevant in assembly-based products, where

entire units are often discarded as waste, even though

some components retain value that could be recovered

through reuse or recycling (Kinoshita et al., 2016).

The process of disassembling complex products

into individual components for reuse, recycling, or

disposal presents opportunities to prevent the unnec-

essary consumption of virgin materials and reduce

GHG emissions (Hasegawa et al., 2019). These

actions contribute signiﬁcantly to the circularity of

supply chains by maximizing the lifecycle of each

component. However, the disassembly process can

be resource-intensive, requiring signiﬁcant labor and

cost, making it essential to anticipate the potential

outcomes of reuse, recycling, and disposal before im-

plementation. As Hasegawa et al. (2019) point out,

simulating these decisions is crucial to optimize EoL

management in multi-component systems, allowing

for the efﬁcient allocation of resources and the reduc-

tion of environmental impact.

Despite the growing body of research on CSCM,

there remains a signiﬁcant gap in the literature con-

cerning EoL management and its broader impact on

the effectiveness of circular supply chains. Specif-

ically, limited attention has been paid to quanti-

tative models that simulate EoL decision-making

for multiple components, considering factors such

as component-speciﬁc reuse and recycling poten-

tial, cost, and environmental impact within a multi-

objective approach. Addressing this gap, the present

study proposes a quantitative model that simulates cir-

cular supply chain decisions at the component level,

optimizing the actions of reuse, recycling, and dis-

posal. The model developed here provides decision

support for EoL scenarios involving multiple compo-

nents within a single product, offering a more gran-

ular understanding of how different EoL strategies

affect circularity, cost, and environmental outcomes.

By adopting a detailed, component-by-component ap-

proach, this study enhances decision-making in circu-

lar supply chains, providing a decision-support mech-

anism for addressing the inherent complexities of

multi-component EoL management in CSC.

3 PROBLEM DESCRIPTION AND MODELING

In this paper, the term ”product” or ”device” refers

to spare parts that can either be used in the produc-

tion of new large-scale equipment or for maintenance

services to support an installed base, ensuring the op-

erational continuity of the equipment. We focus on

a circular supply chain (CSC) that integrates suppli-

ers, manufacturers, customers (installed base), and re-

verse logistics for the take-back of defective products.

These defective products are typically returned from

the ﬁeld and are not directly repairable or reusable in

their entirety.

Figure 1: Circular supply chain model and its characteristics.

As shown in Fig. 1, the circular supply chain

model consists of various stages including disassem-

bly, reuse, recycling, and disposal. Upon return, de-

fective products are disassembled to extract reusable

components. These components are then cleaned, in-

spected, and tested to determine whether they meet

the required speciﬁcations for reuse. Components that

pass inspection are classiﬁed as ”qualiﬁed as new”

and are reintroduced into the production process for

manufacturing new products.

In this circular supply chain, the demand from

the manufacturer must be satisfied either through the

closed-loop system—by reusing or recycling compo-

nents—or by procuring new materials or components

from suppliers. The goal is to maximize the use of

reused and recycled materials. However, when the

reuse or recycling process cannot meet the required

demand, procurement from suppliers is necessary to

ensure production continuity. For components that do

not meet reuse requirements, two options are consid-

ered. If the components are feasible for recycling and

the recycling process is legally compliant, the mate-

rials are recycled and used in the production of new

products, forming a closed-loop system. If recycling

is not feasible, either due to design limitations or le-

gal restrictions, the components are disposed of by a

third-party company, incurring an additional disposal

cost.

Moreover, for components that are non-reusable

by design, the same decision-making process applies.

If recycling is possible, the materials are processed

accordingly. If recycling is not an option, these com-

ponents are handled by a third-party disposal ser-

vice, which incurs a disposal cost. As depicted in

Fig. 2, a decision tree outlines the methodology for

managing each component’s EoL on a component-by-

component basis.

Figure 2: A decision tree for managing components’ EoL in a component-by-component methodology.

Thus, for each component within the circular sup-

ply chain, a decision must be made between three EoL

options: reuse, recycling within a closed-loop system,

or disposal by a third-party.
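The routing logic described above can be sketched as a small function. This is only an illustrative reading of the text, not the authors' implementation; the class `ReturnedComponent` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReturnedComponent:
    """Hypothetical description of one returned component."""
    reusable_by_design: bool   # can this component type ever be reused?
    passes_inspection: bool    # did it meet specifications after cleaning and testing?
    recyclable: bool           # does the design allow recycling?
    recycling_legal: bool      # is the recycling process legally compliant?


def eol_route(c: ReturnedComponent) -> str:
    """Return 'reuse', 'recycle', or 'dispose' following the decision tree in the text."""
    if c.reusable_by_design and c.passes_inspection:
        return "reuse"      # qualified as new, reintroduced into production
    if c.recyclable and c.recycling_legal:
        return "recycle"    # closed-loop material recovery
    return "dispose"        # handled by a third party, at a disposal cost


# A component that fails inspection but can legally be recycled.
print(eol_route(ReturnedComponent(True, False, True, True)))
```

This rule-based routing only captures feasibility; the learning approach in the following sections decides which feasible option is preferable.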

The aim of this research is to develop a decision-

making framework for managing returned compo-

nents in a circular supply chain using a component-

by-component methodology, speciﬁcally focusing on

optimizing these EoL decisions for each component.

The complexity of the problem arises from the vari-

ability in return ﬂows, the varying success rates for

reuse and recycling, and the economic and environ-

mental considerations for each component.

3.1 Agent-Based Modeling of Circular Supply Chain

To apply the reinforcement learning (RL) mechanism

to the circular supply chain (CSC) described in this

work, it is necessary to formulate the problem as

an RL model. As previously discussed, RL mod-

els are implemented within an agent-based frame-

work, where each component acts as an independent

decision-making entity. The ﬁrst step in this approach

is to model each component and process in the CSC as

a multi-agent system. Subsequently, the RL problem

is deﬁned within this designed agent-based frame-

work.

A circular supply chain involves various opera-

tions—reuse, recycling, and disposal—each of which

must be managed efﬁciently to minimize environ-

mental and economic costs while maximizing cir-

cularity. In the real world, each component must

autonomously make decisions regarding whether to

reuse, recycle, or dispose of itself based on its state

and the current system dynamics. These autonomous

decisions are key to improving the overall perfor-

mance of the supply chain by considering carbon

footprint reduction, cost minimization, and resource

circularity. However, the decisions made by individ-

ual components must also be coordinated with the

overall system objectives to optimize the global per-

formance of the CSC.

Figure 3: Agent-based framework of circular supply chain EoL management system.

As shown in Fig. 3, the agent-based model treats

each component in the CSC as an agent. Each

component-agent is responsible for making real-time

decisions regarding its EoL treatment—reuse, recy-

cle, or dispose—based on the observed state. These

agents interact with the CSC system through a Q-

learning-based RL mechanism, where each agent in-

dependently learns to optimize its decisions over time

based on feedback (rewards) received from the envi-

ronment.

3.2 RL Modeling of EoL Management Problem in the Circular Supply Chain

In this subsection, we deﬁne the characteristics of the

reinforcement learning (RL) model used to solve the

circular supply chain (CSC) decision-making prob-

lem. Key elements of the RL model include the state

variable, reward function, value function, and system

policy. These components work together within a Q-

learning framework to guide agents (components) in

deciding between reuse, recycling, and disposal, with

the ultimate goal of minimizing the overall carbon

footprint, reducing economic costs, and promoting re-

source circularity.

3.2.1 State Variables

The state of the system at any given time period t is

characterized by the state vector:

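$$S_{i,t} = \{x_{i,t},\ \text{reuse\_sr}_i,\ \text{recycle\_sr}_i,\ \text{reuse\_cf}_i,\ \text{recycle\_cf}_i,\ \text{dispose\_cf}_i,\ \text{reuse\_cost}_i,\ \text{recycle\_cost}_i,\ \text{dispose\_cost}_i,\ \text{weight}_i\} \tag{1}$$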

The elements of the state variable are detailed as

follows:

• Inventory Level ($x_{i,t}$). The current quantity of component i in inventory at time t, updated dynamically based on reuse, recycling, or disposal decisions.

• Reuse Success Rate ($\text{reuse\_sr}_i$). The probability that component i can be successfully reused after inspection and cleaning. This rate is based on historical performance.

• Recycle Success Rate ($\text{recycle\_sr}_i$). The likelihood that component i can be recycled if reuse is not possible. This is also derived from past data.

• Carbon Footprint (CFP). Environmental impact elements associated with different actions:

– $\text{reuse\_cf}_i$ (reuse).

– $\text{recycle\_cf}_i$ (recycling).

– $\text{dispose\_cf}_i$ (disposal).

These values are used to assess the environmental

impact of each action. They are calculated using

the Ecoinvent 3.10 database and Brightway Life

Cycle Assessment (LCA) software in a parametric

approach, connected to a Python program for au-

tomatic calculations. Speciﬁcally, the model uses

the avoided burden (0:100) method for EoL management to assign environmental credits to reuse

and recycling actions, based on the research con-

ducted by Nicholson et al. (2009).

• Costs. Financial costs of various actions for component i:

– $\text{reuse\_cost}_i$

– $\text{recycle\_cost}_i$

– $\text{dispose\_cost}_i$

These values are used to assess the financial impact of each action. For the reuse action, a saving equivalent to the component’s original price is applied, while recycling savings are based on values from relevant research. Disposal incurs a cost paid to third-party companies for waste management.

• Weight ($\text{weight}_i$). The physical weight of the component.
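As an illustration, the state vector of Eq. (1) can be held in a small record type. This is a minimal sketch, not the authors' data model: the class name `ComponentState`, the units in the comments, the sign convention for savings, and all numeric values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ComponentState:
    """State S_{i,t} of component i at period t (field names follow Eq. 1)."""
    inventory: int        # x_{i,t}, units in stock
    reuse_sr: float       # probability of successful reuse, in [0, 1]
    recycle_sr: float     # probability of successful recycling, in [0, 1]
    reuse_cf: float       # carbon footprint per unit (unit assumed: kg CO2e)
    recycle_cf: float
    dispose_cf: float
    reuse_cost: float     # cost per unit; negative values model savings (assumption)
    recycle_cost: float
    dispose_cost: float
    weight: float         # physical weight (unit assumed: kg)

    def __post_init__(self) -> None:
        # Basic sanity checks on the quantities that have a natural range.
        if self.inventory < 0:
            raise ValueError("inventory must be non-negative")
        for name in ("reuse_sr", "recycle_sr"):
            rate = getattr(self, name)
            if not 0.0 <= rate <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {rate}")


# Hypothetical values for a single component, for illustration only.
example = ComponentState(
    inventory=12, reuse_sr=0.8, recycle_sr=0.5,
    reuse_cf=1.0, recycle_cf=4.0, dispose_cf=10.0,
    reuse_cost=-50.0, recycle_cost=-5.0, dispose_cost=8.0,
    weight=2.5,
)
```

Only the inventory level changes from period to period; the remaining fields are component-specific parameters, which is why a tabular learner can index its Q-table by inventory level alone.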

3.2.2 Action Set

The actions available for each component are:

$$A = \{a_{\text{reuse}},\ a_{\text{recycle}},\ a_{\text{dispose}}\}, \tag{2}$$

where $a_{\text{reuse}}$, $a_{\text{recycle}}$, and $a_{\text{dispose}}$ represent the actions to reuse, recycle, or dispose of a component, respectively.

3.2.3 Transition Dynamics

The transition from state $S_t$ to $S_{t+1}$ is determined by the amount of returned product and the action chosen for each component. The state transition for component i can be expressed as:

$$x_{i,t+1} = x_{i,t} - q_{i,t}(a_t) + r_{i,t}, \tag{3}$$

where

• $q_{i,t}(a_t)$ is the quantity of component i used in period t, based on the action $a_t$,

• $r_{i,t}$ is the return quantity of component i in period t.
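The inventory balance of Eq. (3) is simple enough to check by hand. The sketch below assumes that the quantity used in a period cannot exceed the stock on hand, a constraint the text does not state explicitly; the function name and the numbers are illustrative.

```python
def next_inventory(x: int, used: int, returned: int) -> int:
    """Inventory balance x_{i,t+1} = x_{i,t} - q + r for one component."""
    if not 0 <= used <= x:
        # Assumption: no more units can be processed than are in stock.
        raise ValueError("used quantity must be between 0 and the current inventory")
    if returned < 0:
        raise ValueError("returned quantity must be non-negative")
    return x - used + returned


# Three periods with made-up (used, returned) pairs, starting from 10 units.
inventory = 10
for used, returned in [(4, 3), (9, 0), (0, 5)]:
    inventory = next_inventory(inventory, used, returned)
print(inventory)  # 10 -> 9 -> 0 -> 5
```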

3.2.4 Reward Function

The reward function $R(S_t, a_t)$ incorporates three components: environmental reward, economic reward, and circularity reward. The total reward is a weighted sum of these three objectives:

$$R(S_t, a_t) = \omega_{\text{env}} \cdot R_{\text{env}}(S_t, a_t) + \omega_{\text{econ}} \cdot R_{\text{econ}}(S_t, a_t) + \omega_{\text{circ}} \cdot R_{\text{circ}}(S_t, a_t) \tag{4}$$

where

• $R_{\text{env}}(S_t, a_t)$ is the environmental reward, derived from the CFP of the chosen action,

• $R_{\text{econ}}(S_t, a_t)$ is the economic reward, derived from the cost of the chosen action,

• $R_{\text{circ}}(S_t, a_t)$ is the circularity reward, based on the contribution of the action to material circularity,

• $\omega_{\text{env}}$, $\omega_{\text{econ}}$, and $\omega_{\text{circ}}$ are the respective weights for environmental, economic, and circularity objectives.

The environmental reward $R_{\text{env}}(S_t, a_t)$ minimizes the carbon footprint (CFP) for each action. The effective carbon footprint for reuse, recycle, and dispose actions is determined based on success rates and fallback options.

For reuse, the effective carbon footprint is given by:

$$\text{effective\_cf} = \text{reuse\_sr}_i \cdot \text{reuse\_cf}_i + (1 - \text{reuse\_sr}_i) \cdot \big(\text{recycle\_sr}_i \cdot \text{recycle\_cf}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cf}_i\big) \tag{5}$$

For recycle, the effective carbon footprint is:

$$\text{effective\_cf} = \text{recycle\_sr}_i \cdot \text{recycle\_cf}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cf}_i \tag{6}$$

For dispose, the carbon footprint is simply $\text{dispose\_cf}_i$.

Thus, the environmental reward is the negative of the effective carbon footprint, multiplied by the component inventory $x_{i,t}$:

$$R_{\text{env}}(S_t, a_t) = -\text{effective\_cf} \cdot x_{i,t}. \tag{7}$$
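The fallback structure of Eqs. (5) to (7) reads as an expected value over a chain: a failed reuse attempt falls back to recycling, and a failed recycling attempt falls back to disposal. The sketch below is a direct transcription under that reading; the function names and all numbers are made up for illustration. Because the economic reward has the same form, the helper takes generic per-unit values.

```python
def effective_value(action: str, reuse_sr: float, recycle_sr: float,
                    v_reuse: float, v_recycle: float, v_dispose: float) -> float:
    """Expected per-unit value of an action with the fallback chain
    reuse -> recycle -> dispose (same shape as Eqs. 5 and 6)."""
    recycle_branch = recycle_sr * v_recycle + (1.0 - recycle_sr) * v_dispose
    if action == "reuse":
        return reuse_sr * v_reuse + (1.0 - reuse_sr) * recycle_branch
    if action == "recycle":
        return recycle_branch
    if action == "dispose":
        return v_dispose
    raise ValueError(f"unknown action: {action}")


def environmental_reward(action: str, x: int, reuse_sr: float, recycle_sr: float,
                         reuse_cf: float, recycle_cf: float, dispose_cf: float) -> float:
    """R_env = -effective_cf * x (Eq. 7)."""
    cf = effective_value(action, reuse_sr, recycle_sr, reuse_cf, recycle_cf, dispose_cf)
    return -cf * x


# Made-up numbers: 80% reuse success, 50% recycling success,
# footprints of 1, 4 and 10 per unit for reuse, recycle and dispose.
params = dict(reuse_sr=0.8, recycle_sr=0.5, reuse_cf=1.0, recycle_cf=4.0, dispose_cf=10.0)
rewards = {a: environmental_reward(a, x=5, **params) for a in ("reuse", "recycle", "dispose")}
# With 5 units in stock: reuse is about -11, recycle is -35, dispose is -50.
```

With these numbers the effective footprint of a reuse attempt is 0.8 · 1 + 0.2 · (0.5 · 4 + 0.5 · 10) = 2.2 per unit, which shows how a high success rate pulls the expected footprint toward the reuse value.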

Similarly, the economic reward $R_{\text{econ}}(S_t, a_t)$ minimizes financial costs. Like the environmental reward, it considers reuse, recycle, and dispose actions, each carrying specific costs.

For reuse, the effective cost is calculated as:

$$\text{effective\_cost} = \text{reuse\_sr}_i \cdot \text{reuse\_cost}_i + (1 - \text{reuse\_sr}_i) \cdot \big(\text{recycle\_sr}_i \cdot \text{recycle\_cost}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cost}_i\big) \tag{8}$$

For recycle, the effective cost is:

$$\text{effective\_cost} = \text{recycle\_sr}_i \cdot \text{recycle\_cost}_i + (1 - \text{recycle\_sr}_i) \cdot \text{dispose\_cost}_i \tag{9}$$

For dispose, the cost is $\text{dispose\_cost}_i$.

Thus, the economic reward is the negative of the effective cost, multiplied by the component inventory $x_{i,t}$:

$$R_{\text{econ}}(S_t, a_t) = -\text{effective\_cost} \cdot x_{i,t}. \tag{10}$$

The circularity reward $R_{\text{circ}}(S_t, a_t)$ promotes reuse and recycling. The reward is proportional to the amount of inventory successfully reused or recycled.

For reuse, the circularity reward is:

$$R_{\text{circ}}(S_t, a_t) = \text{reuse\_sr}_i \cdot x_{i,t} + \text{recycle\_sr}_i \cdot (x_{i,t} - \text{reuse\_sr}_i \cdot x_{i,t}) \tag{11}$$

For recycle, the reward reflects the portion recycled:

$$R_{\text{circ}}(S_t, a_t) = \text{recycle\_sr}_i \cdot x_{i,t}. \tag{12}$$

For dispose, the circularity reward is zero:

$$R_{\text{circ}}(S_t, a_t) = 0. \tag{13}$$
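A short sketch of the circularity reward (Eqs. 11 to 13) and of the weighted sum in Eq. (4) follows. The function names, the default weights, and the example values are placeholders, not values from the case study. How the three terms are scaled relative to each other is not specified in this section, so the sketch simply combines whatever values it is given.

```python
def circularity_reward(action: str, x: float, reuse_sr: float, recycle_sr: float) -> float:
    """Expected number of units kept in the loop (Eqs. 11 to 13)."""
    if action == "reuse":
        reused = reuse_sr * x
        # Units that fail reuse get a second chance through recycling.
        return reused + recycle_sr * (x - reused)
    if action == "recycle":
        return recycle_sr * x
    if action == "dispose":
        return 0.0
    raise ValueError(f"unknown action: {action}")


def total_reward(r_env: float, r_econ: float, r_circ: float,
                 w_env: float = 1 / 3, w_econ: float = 1 / 3, w_circ: float = 1 / 3) -> float:
    """Weighted sum of the three objectives (Eq. 4); weights are placeholders."""
    return w_env * r_env + w_econ * r_econ + w_circ * r_circ


# 5 units, 80% reuse success, 50% recycling success (made-up values):
# 4 units reused, and half of the remaining unit recycled.
r_circ = circularity_reward("reuse", 5, reuse_sr=0.8, recycle_sr=0.5)
combined = total_reward(-10.0, 20.0, 5.0, w_env=0.5, w_econ=0.3, w_circ=0.2)
```

Since the footprint and cost terms enter as negative rewards while circularity enters as a positive count, the choice of weights determines which objective dominates when the terms have very different magnitudes.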

3.2.5 Value Function and System Policy

The Q-learning algorithm is used to update the action-value function $Q(S_t, a_t)$, which estimates the expected cumulative reward for taking action $a_t$ in state $S_t$. The update rule is:

$$Q(S_t, a_t) \leftarrow (1 - \alpha)\, Q(S_t, a_t) + \alpha \left[ R(S_t, a_t) + \gamma \max_{a'} Q(S_{t+1}, a') \right], \tag{14}$$

where

• $\alpha$ is the learning rate.

• $\gamma$ is the discount factor for future rewards.

• $\max_{a'} Q(S_{t+1}, a')$ is the maximum expected future reward for the next state $S_{t+1}$.
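The update in Eq. (14) can be written in a few lines for a tabular Q-function. The sketch below is illustrative only: the state labels, the values of α and γ, and the helper name `q_update` are assumptions. It also checks numerically that the convex-combination form of Eq. (14) equals the temporal-difference form Q + α(target − Q).

```python
from collections import defaultdict

ACTIONS = ("reuse", "recycle", "dispose")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step in the form of Eq. (14)."""
    best_next = max(q[next_state][a] for a in ACTIONS)
    target = reward + gamma * best_next
    q[state][action] = (1.0 - alpha) * q[state][action] + alpha * target
    return q[state][action]


# Q-table that lazily creates a zero row for every unseen state.
q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
q["s1"]["reuse"] = 2.0  # pretend the next state already has some value

old = q["s0"]["reuse"]
new = q_update(q, "s0", "reuse", reward=1.0, next_state="s1")

# Equivalent temporal-difference form: Q + alpha * (target - Q).
td_form = old + 0.1 * ((1.0 + 0.9 * 2.0) - old)
```

Here the target is 1.0 + 0.9 · 2.0 = 2.8, so a learning rate of 0.1 moves the estimate from 0 to about 0.28.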

3.3 CSC Performance Evaluation

The Q-learning model is simulated to learn the optimal EoL policy for each component. The performance of the circular supply chain is then evaluated using three key metrics: total carbon footprint, circularity contribution, and total economic cost. These metrics are used to compare the performance of different EoL strategies (reuse, recycle, dispose) and to assess the effectiveness of the Q-learning optimization.

3.3.1 Total Carbon Footprint

The total carbon footprint ($CFP_{\text{total}}$) of the circular supply chain is the sum of the carbon footprint generated from the reuse, recycling, and disposal of components, and the carbon footprint from virgin material production. It is calculated as:

$$CFP_{\text{total}} = CFP_{\text{csc}} + CFP_{\text{reused}} + CFP_{\text{recycled}} + CFP_{\text{disposed}} \tag{15}$$

where

$$\begin{aligned}
CFP_{\text{csc}} &= CFP_{\text{prod}} \sum_{t=1}^{T} D_t \\
CFP_{\text{reused}} &= \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{reused},i,t} \cdot \text{reuse\_cf}_i \\
CFP_{\text{recycled}} &= \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{recycled},i,t} \cdot \text{recycle\_cf}_i \\
CFP_{\text{disposed}} &= \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{disposed},i,t} \cdot \text{dispose\_cf}_i
\end{aligned}$$

where $CFP_{\text{csc}}$ represents the total carbon footprint associated with production, $D_t$ is the product demand at time t, and $q_{\text{reused},i,t}$, $q_{\text{recycled},i,t}$, $q_{\text{disposed},i,t}$ represent the quantities reused, recycled, and disposed for component i at time t.
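As a worked example of Eq. (15), the sketch below sums the four terms for two periods and two components. All quantities and footprints are invented for illustration, the helper names are hypothetical, and the quantity tables are indexed as `q[t][i]`.

```python
from typing import Sequence


def flow_total(quantities: Sequence[Sequence[float]], per_unit: Sequence[float]) -> float:
    """Double sum over periods t and components i of q[t][i] * per_unit[i]."""
    return sum(q * v for row in quantities for q, v in zip(row, per_unit))


def total_footprint(cfp_prod: float, demand: Sequence[float],
                    q_reused, q_recycled, q_disposed,
                    reuse_cf, recycle_cf, dispose_cf) -> float:
    """CFP_total = CFP_csc + CFP_reused + CFP_recycled + CFP_disposed (Eq. 15)."""
    cfp_csc = cfp_prod * sum(demand)
    return (cfp_csc
            + flow_total(q_reused, reuse_cf)
            + flow_total(q_recycled, recycle_cf)
            + flow_total(q_disposed, dispose_cf))


# Two periods, two components, made-up data.
cfp_total = total_footprint(
    cfp_prod=100.0, demand=[3, 2],
    q_reused=[[1, 0], [2, 1]], q_recycled=[[1, 1], [0, 0]], q_disposed=[[0, 1], [0, 1]],
    reuse_cf=[1.0, 2.0], recycle_cf=[3.0, 4.0], dispose_cf=[10.0, 20.0],
)
print(cfp_total)  # 500 + 5 + 7 + 40 = 552.0
```

The same `flow_total` shape applies to the cost terms of Section 3.3.3, with per-unit costs in place of per-unit footprints.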

3.3.2 Circularity Contribution

The circularity contribution ($CC_{\text{total}}$) reflects the proportion of materials successfully reused or recycled in the supply chain. It is calculated as:

$$CC_{\text{total}} = \frac{\sum_{t=1}^{T} \sum_{i=1}^{n} \left( q_{\text{reused},i,t} + q_{\text{recycled},i,t} \right)}{\sum_{t=1}^{T} r_t}, \tag{16}$$

where

• $q_{\text{reused},i,t}$ is the quantity of component i reused at time t,

• $q_{\text{recycled},i,t}$ is the quantity of component i recycled at time t,

• $r_t$ is the total product returns at time t.
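The ratio in Eq. (16) can be computed as follows. This is a sketch with invented data; returning 0.0 when nothing was returned is an assumption made here to avoid a division by zero, not something the text specifies.

```python
from typing import Sequence


def circularity_contribution(q_reused: Sequence[Sequence[float]],
                             q_recycled: Sequence[Sequence[float]],
                             returns: Sequence[float]) -> float:
    """Share of returns that was reused or recycled over the horizon (Eq. 16)."""
    recovered = sum(sum(row) for row in q_reused) + sum(sum(row) for row in q_recycled)
    total_returns = sum(returns)
    if total_returns == 0:
        return 0.0  # assumption: no returns means no circularity contribution
    return recovered / total_returns


# Two periods, two components: 4 units reused and 2 recycled out of 8 returned.
cc_total = circularity_contribution(
    q_reused=[[1, 0], [2, 1]], q_recycled=[[1, 1], [0, 0]], returns=[4, 4]
)
print(cc_total)  # 0.75
```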

3.3.3 Total Economic Cost

The total economic cost ($C_{\text{total}}$) is the sum of the production costs and the costs associated with reuse, recycling, and disposal actions. It is given by:

$$C_{\text{total}} = C_{\text{prod}} \sum_{t=1}^{T} D_t + C_{\text{reused}} + C_{\text{recycled}} + C_{\text{disposed}} \tag{17}$$

where

$$\begin{aligned}
C_{\text{reused}} &= \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{reused},i,t} \cdot \text{reuse\_cost}_i \\
C_{\text{recycled}} &= \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{recycled},i,t} \cdot \text{recycle\_cost}_i \\
C_{\text{disposed}} &= \sum_{t=1}^{T} \sum_{i=1}^{n} q_{\text{disposed},i,t} \cdot \text{dispose\_cost}_i
\end{aligned}$$

where $C_{\text{prod}}$ represents the total production cost, $D_t$ is the product demand at time t, and $q_{\text{reused},i,t}$, $q_{\text{recycled},i,t}$, $q_{\text{disposed},i,t}$ represent the quantities reused, recycled, and disposed for component i at time t.

3.4 Proposed Algorithm for Solving RL-Based Circular Supply Chain Optimization Model

In the previous sections, we described and mod-

eled the circular supply chain (CSC) problem in

the context of a reinforcement learning (RL) frame-

work. In this part, we propose an algorithm for solv-

ing the modeled problem using a Q-learning mecha-

nism, which is a temporal difference method widely

used for solving RL problems (Watkins, 1989). The

proposed algorithm aims to optimize EoL decision-

making by learning the value of Q-functions itera-

tively. After the learning process, the best action for

each component—whether to reuse, recycle, or dis-

pose—is selected as the optimal policy for future de-

cisions.

As shown in Algorithm 1, the system is simu-

lated over multiple episodes and periods, where the

Q-values Q(s, a) are learned and updated during the

iterations. During each episode, the system tracks the

product returns, and for each component in the sup-

ply chain, one of the three EoL decisions (Reuse, Re-

cycle, or Dispose) is selected based on the Q-table.

In each state, the system selects an action, observes

the reward (a function of environmental impact, eco-

nomic cost, and circularity contribution), and updates

Q(s, a) accordingly.

The probability of exploration is a function of the episode number and is reduced as the number of episodes increases, from 100% exploration (random actions) in the first episodes down to a floor of 1% in the final episodes. As shown in Algorithm 1, the exploration rate ε is multiplied by a decay factor of 0.99 after each episode, so the decrease is geometric rather than linear. This gradual decay balances exploring new strategies against exploiting the learned Q-values for optimal actions: the algorithm explores various EoL strategies in the early phases but converges to the most rewarding policies over time.
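The update rule used in Algorithm 1, ε ← max(0.01, ε × 0.99), is a geometric decay with a floor. The sketch below generates the resulting schedule; the function name and the starting value of 1.0 (taken from the "100% exploration" statement) are the only assumptions.

```python
def epsilon_schedule(n_episodes: int, start: float = 1.0,
                     decay: float = 0.99, floor: float = 0.01) -> list[float]:
    """Exploration rate used in each episode under eps <- max(floor, eps * decay)."""
    eps = start
    schedule = []
    for _ in range(n_episodes):
        schedule.append(eps)            # rate used during this episode
        eps = max(floor, eps * decay)   # decayed once per episode
    return schedule


schedule = epsilon_schedule(600)
first_floor_episode = schedule.index(0.01)  # first episode run at the 1% floor
```

Since 0.99^k drops below 0.01 once k exceeds ln(0.01)/ln(0.99) ≈ 458.2, the floor is reached after 459 decays. A training run therefore needs several hundred episodes before it becomes almost purely greedy.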

Initialize:
    Q-values $Q(s, a) = 0$ for all states $s$ and actions $a$
    Set learning rate $\alpha$, discount factor $\gamma$, exploration rate $\varepsilon$, total production carbon footprint $c^{prod}_{fp}$, number of episodes $n_{episodes}$
    Define action space $A = \{\text{Reuse}, \text{Recycle}, \text{Dispose}\}$
    Initialize inventory levels $x_{i,t}$ for all components $i$
while episode $\le n_{episodes}$ do
    for each period $t = 1, \dots, T$ do
        for each component $i = 1, \dots, n$ do
            if Random exploration ($\varepsilon$) then
                Select a random action $a \in A$;
            else
                Select action $a = \arg\max_{a} Q(s, a)$ based on Q-values;
            end
            Observe the next state $s'$ and calculate the rewards;
            Combine rewards using the weighted sum of environmental, economic, and circularity factors:
                $r(s, a) = w_{env} \cdot r_{env}(s, a) + w_{eco} \cdot r_{eco}(s, a) + w_{circ} \cdot r_{circ}(s, a)$    (18)
            Update the Q-value using the Q-learning update rule:
                $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$    (19)
            Update the inventory level $x_{i,t}$, adjusting for reused, recycled, and disposed quantities for the component;
        end
    end
    Decrease exploration rate $\varepsilon \leftarrow \max(0.01, \varepsilon \times 0.99)$;
    Increment episode count;
end
Strategic Planning:
    After training, for each component $i$, retrieve the best action $a = \arg\max_{a} Q(s, a)$ and update inventory levels accordingly. Evaluate system-wide metrics;
return Optimal Q-values for each component

Algorithm 1: Q-learning for Circular Supply Chain Optimization.
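The loop structure of Algorithm 1 can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the state is taken to be a small integer inventory level, the reward components, success probabilities, and equal weights are invented, and a failed reuse or recycling attempt is assumed to fall back to disposal.

```python
import random

ACTIONS = ["Reuse", "Recycle", "Dispose"]

# Hypothetical (r_env, r_eco, r_circ) per action and success probabilities;
# none of these numbers come from the paper.
REWARDS = {"Reuse": (1.0, 0.8, 1.0), "Recycle": (0.3, 0.1, 0.5), "Dispose": (-1.0, -0.2, 0.0)}
SUCCESS = {"Reuse": 0.8, "Recycle": 0.9, "Dispose": 1.0}
WEIGHTS = (1 / 3, 1 / 3, 1 / 3)  # w_env, w_eco, w_circ (assumed equal)
MAX_INV = 5                      # toy inventory cap; state = inventory level


def step(state, action, rng):
    """Toy transition: a failed Reuse/Recycle attempt is treated as Dispose."""
    effective = action if rng.random() < SUCCESS[action] else "Dispose"
    reward = sum(w * r for w, r in zip(WEIGHTS, REWARDS[effective]))
    if effective == "Reuse":
        next_state = min(MAX_INV, state + 1)  # a reused part restocks inventory
    else:
        next_state = max(0, state - 1)        # otherwise one unit is consumed
    return next_state, reward


def train(n_components=3, n_episodes=500, n_periods=20,
          alpha=0.1, gamma=0.9, eps=1.0, seed=0):
    rng = random.Random(seed)
    # One Q-table per component: Q[i][state][action], initialised to zero.
    Q = [[{a: 0.0 for a in ACTIONS} for _ in range(MAX_INV + 1)]
         for _ in range(n_components)]
    for _ in range(n_episodes):
        inventory = [0] * n_components
        for _ in range(n_periods):
            for i in range(n_components):
                s = inventory[i]
                if rng.random() < eps:            # explore
                    a = rng.choice(ACTIONS)
                else:                             # exploit
                    a = max(Q[i][s], key=Q[i][s].get)
                s_next, r = step(s, a, rng)
                best_next = max(Q[i][s_next].values())
                Q[i][s][a] += alpha * (r + gamma * best_next - Q[i][s][a])
                inventory[i] = s_next
        eps = max(0.01, eps * 0.99)               # per-episode decay
    return Q


Q = train()
policy = []
for q_component in Q:
    q_row = q_component[0]                        # greedy action at inventory level 0
    policy.append(max(q_row, key=q_row.get))
print(policy)
```

With these toy numbers Reuse has the highest expected reward, so the greedy policy will typically favor it, but the printed result depends on the seed and on the invented parameters.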

The reward function during the simulation is com-

puted using a weighted combination of environmental

impact, economic costs, and circularity contribution.

For each state-action pair, the cumulative reward is

calculated, and the Q-function is updated iteratively.
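To illustrate how the weights in the combined reward steer the preferred action, the sketch below evaluates the weighted sum for three candidate actions under two weightings. All reward components and weights are hypothetical values chosen for the example; the paper does not report them in this section.

```python
def combined_reward(components, weights):
    """Weighted sum of environmental, economic, and circularity rewards."""
    return sum(weights[k] * components[k] for k in ("env", "eco", "circ"))


# Hypothetical per-action reward components (not taken from the paper).
CANDIDATES = {
    "Reuse":   {"env": 0.9, "eco": -0.5, "circ": 1.0},
    "Recycle": {"env": 0.4, "eco": 0.2, "circ": 1.0},
    "Dispose": {"env": -1.0, "eco": 0.6, "circ": 0.0},
}


def preferred_action(weights):
    """Action with the highest combined immediate reward for given weights."""
    return max(CANDIDATES, key=lambda a: combined_reward(CANDIDATES[a], weights))


env_focused = preferred_action({"env": 0.6, "eco": 0.2, "circ": 0.2})
cost_focused = preferred_action({"env": 0.1, "eco": 0.8, "circ": 0.1})
print(env_focused, cost_focused)
```

In this toy setting an environment-heavy weighting selects Reuse while a cost-heavy weighting selects Dispose, which shows why the choice of weights is a modeling decision rather than a technical detail.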


The Q-value update is based on the learning rule:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \tag{20}$$

where α is the learning rate, r(s, a) is the combined

reward, and γ is the discount factor that balances fu-

ture and immediate rewards.
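A single application of this update rule can be worked through numerically. The values below (current estimate, reward, best next-state value, α, and γ) are hypothetical and serve only to show the arithmetic.

```python
def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """One Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * max_q_next
    return q_sa + alpha * (td_target - q_sa)


# Hypothetical numbers: target = 1.0 + 0.9 * 3.0 = 3.7, error = 3.7 - 2.0 = 1.7,
# so the estimate moves by alpha * 1.7 = 0.17.
new_q = q_update(q_sa=2.0, reward=1.0, max_q_next=3.0, alpha=0.1, gamma=0.9)
print(round(new_q, 2))
```

Setting α = 1 would replace the old estimate with the target entirely, while α = 0 would leave it unchanged.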

Initially, the agents (components) have no knowledge of the value of each action in every state. Hence, all Q-values are set to zero for all state-action pairs. The learning rate α determines how much weight is given to the most recent estimate (the observed reward plus the discounted value of the best next action) compared to the existing Q-estimate. A suitable learning rate must be selected to ensure convergence of the algorithm without overfitting to specific samples.

The objective of this model is to maximize cir-

cularity and minimize both the total carbon footprint

and the economic cost of the circular supply chain.

Therefore, the reward function is structured to encour-

age reuse and recycling while penalizing disposal and

excessive carbon emissions. The Q-learning process

continues until the Q-values converge for all state-

action pairs. After the learning phase, the best action

in each state can be retrieved through a greedy search

on the Q-table, which then informs the EoL decision

for each component in the circular supply chain.
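The convergence check and the greedy retrieval described above can be sketched as follows. The state labels, Q-values, and the tolerance are hypothetical; the paper does not specify a stopping threshold.

```python
def max_abs_change(old_q, new_q):
    """Largest absolute change in any Q-value between two sweeps."""
    return max(abs(new_q[s][a] - old_q[s][a]) for s in old_q for a in old_q[s])


def greedy_policy(q_table):
    """Map each state to the action with the highest Q-value."""
    return {state: max(actions, key=actions.get) for state, actions in q_table.items()}


# Hypothetical Q-tables for one component before and after a training sweep.
q_before = {
    "low_stock":  {"Reuse": 4.19, "Recycle": 3.10, "Dispose": -0.50},
    "high_stock": {"Reuse": 1.00, "Recycle": 2.39, "Dispose": 0.30},
}
q_after = {
    "low_stock":  {"Reuse": 4.20, "Recycle": 3.10, "Dispose": -0.50},
    "high_stock": {"Reuse": 1.00, "Recycle": 2.40, "Dispose": 0.30},
}

converged = max_abs_change(q_before, q_after) < 0.05  # assumed tolerance
policy = greedy_policy(q_after)
print(converged, policy)
```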

4 RESULTS & ANALYSIS

4.1 Industrial Case Study

In this section, we present an application of the developed model, based on a real industrial case from our partner, GE HealthCare. GE HealthCare (GEHC) is a global leader in the sales and services of medical systems, particularly in the field of medical imaging, with over 4 million systems installed across more than 160 countries. Due to the critical nature of its products (medical devices) and the technological complexity of their components, GE HealthCare implements a circular supply chain strategy through asset recovery and buy-back programs.

In 2023, these initiatives resulted in the recov-

ery of approximately 7,375 units, including imaging

systems, ultrasound devices, magnets, and surgical

machines, contributing to the reuse of approximately

7.3 million kilograms of materials (GE Healthcare,

2023). The integration of circular economy princi-

ples into GE HealthCare’s operations and product life

cycles is a key approach to managing climate impact,

reducing waste, promoting recycling and reuse, and

minimizing resource consumption.

In this study, we apply the developed model to a device consisting of 53 components and investigate the impact of EoL management on the efficiency of the circular supply chain (CSC). To demonstrate the benefits of this approach, we evaluate the performance of the proposed Q-learning model across several scenarios, assessing its effectiveness in optimizing CSC outcomes. These scenarios are designed to explore the influence of varying reuse and recycle strategies on CSC circularity, cost reduction, and carbon footprint reduction. Additionally, the performance of these strategies is compared to a baseline linear supply chain scenario, where no circular strategies are implemented.

The goal of the analysis is to reveal the gap be-

tween design circularity—the ideal circularity based

on product design feasibility—and CSC circular-

ity—the actual circularity achieved when considering

operational constraints in the supply chain.

Figure 4: Performance metrics across different scenarios,

showing the CSC circularity (by quantity), cost reduction,

and carbon footprint reduction.

Table 1 and Figure 4 present the results for four

scenarios. In each case, we measure CSC circularity

by quantity, along with the total cost reduction and

carbon footprint reduction.

4.1.1 Scenario 1: Baseline - No Reuse, No Recycle

Scenario 1 represents the baseline case where no reuse or recycling actions are taken and all returned components are disposed of. This scenario mimics a traditional linear supply chain where no efforts are made to recover or recycle products at the end of life. As expected, it yields zero circularity, no cost savings, and no carbon footprint reduction.

4.1.2 Scenario 2: Partial Reuse Feasibility

In Scenario 2, 41% of the product's components are deemed reusable based on product design (design feasibility). However, the actual reuse success rates, drawn from historical data, indicate that not all components designed for reuse can actually be reused in the CSC. Despite this, the reuse operations succeed in recovering 34.5% of the product's components, contributing to CSC circularity.

ICORES 2025 - 14th International Conference on Operations Research and Enterprise Systems

Table 1: Performance metrics for the four scenarios, showing the achieved CSC circularity, cost, and carbon footprint reductions.

Scenario   CSC Circularity (Quantity)   Cost Reduction   Carbon Footprint Reduction
1          0.0%                         0.0%             0.0%
2          34.5%                        18.4%            13.8%
3          80.6%                        45.0%            56.9%
4          81.1%                        45.1%            69.1%

This scenario highlights the gap between design circularity, which indicates 41% reuse feasibility, and the achieved CSC circularity of 34.5%, a shortfall of 6.5 percentage points. Operational constraints and quality issues in the supply chain cause this drop in the circularity actually achieved. The scenario also yields an 18.4% cost reduction and a 13.8% reduction in carbon footprint, showing that even partial reuse delivers measurable savings.

4.1.3 Scenario 3: Advanced Reuse and Recycling

In Scenario 3, 83% of the product's components are designed to be reusable or recyclable. Recycling is introduced for components where the process is feasible, with most recyclable components having a success rate of 100%. However, for one critical and rare component, ID 16, the recycling success rate is only 70%, reflecting operational difficulties.

This scenario achieves a substantial 80.6% CSC circularity by quantity, compared with a design circularity of 83%. This again highlights the gap between design circularity and CSC circularity, driven by the limitations of recycling for certain components. The impact on cost and environmental performance is notable, with a 45.0% cost reduction and a 56.9% reduction in carbon footprint. This demonstrates that recycling, even with operational constraints, provides substantial benefits in circular supply chain management.

4.1.4 Scenario 4: Improved Recycling for Component ID 16

In Scenario 4, the only difference from Scenario 3 is that the recycling success rate for Component ID 16 is increased to 100%. This single change raises CSC circularity by quantity slightly, from 80.6% to 81.1%, while the carbon footprint reduction improves markedly, from 56.9% to 69.1%; the cost reduction is nearly unchanged (45.0% to 45.1%). This scenario demonstrates how addressing operational constraints, even for a single component, can substantially reduce environmental impact even when the gain in quantity-based circularity is small.
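As a rough plausibility check on the circularity figures for Scenarios 3 and 4, the sketch below estimates how much quantity-based circularity would change if a single component's recycling success rate rose from 70% to 100%. It assumes, purely for illustration, that each of the 53 components counts equally in the metric; the paper does not state this weighting, so the result is indicative only.

```python
N_COMPONENTS = 53  # device size reported in the case study


def circularity_gain(success_before, success_after, n_components=N_COMPONENTS):
    """Change in circularity (percentage points) when one equally weighted
    component's success rate changes. Equal weighting is an assumption."""
    return 100.0 * (success_after - success_before) / n_components


estimated_gain = circularity_gain(0.70, 1.00)
reported_gain = 81.1 - 80.6  # Scenario 4 minus Scenario 3, from Table 1
print(round(estimated_gain, 2), round(reported_gain, 1))
```

The estimate of about 0.57 percentage points is of the same order as the reported 0.5-point increase, which suggests the small circularity gain is what one would expect from a single component, whereas the much larger carbon footprint effect reflects that component's environmental weight.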

4.1.5 The Gap Between Design Circularity and CSC Circularity

The scenarios reveal an important insight: the gap

between design circularity and CSC circularity.

While product design plays a crucial role in de-

termining circularity potential, the actual circularity

achieved in the CSC is constrained by operational and

quality issues. These constraints, represented by suc-

cess rates for reuse and recycling, prevent the full re-

alization of circularity potential in the supply chain.

In Scenario 3, for example, the design circularity

suggests that 83% of the product’s components can

be reused or recycled, but the CSC circularity is only

80.6%. Scenario 4 closes this gap slightly, but only by

addressing operational constraints on recycling suc-

cess rates. This illustrates that optimizing the circular

supply chain involves not only design improvements

but also a focus on addressing real-world operational

challenges that hinder circularity.
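The gap discussed above can be tabulated directly from the reported percentages. The design circularity of Scenario 4 is taken to be 83%, the same as Scenario 3, since the text states that only the recycling success rate of Component ID 16 differs; the helper name `circularity_gap` is illustrative.

```python
# Design and achieved (CSC) circularity in percent, from Sections 4.1.2 to 4.1.4.
SCENARIOS = {
    2: {"design": 41.0, "csc": 34.5},
    3: {"design": 83.0, "csc": 80.6},
    4: {"design": 83.0, "csc": 81.1},  # design value inferred from Scenario 3
}


def circularity_gap(scenario):
    """Gap in percentage points and share of design potential realized."""
    design, csc = scenario["design"], scenario["csc"]
    return round(design - csc, 1), round(100.0 * csc / design, 1)


gaps = {sid: circularity_gap(values) for sid, values in SCENARIOS.items()}
for sid, (gap, realized) in gaps.items():
    print(f"Scenario {sid}: gap = {gap} points, {realized}% of design potential realized")
```

Read this way, Scenario 2 realizes a noticeably smaller share of its design potential than Scenarios 3 and 4, even though its absolute design circularity is also lower.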

4.2 Limitation

One limitation of this study is the circularity metric

used to assess the performance of the circular supply

chain (CSC). In this analysis, the metric for CSC cir-

cularity is based on the quantity of components reused

and recycled compared to what has been returned.

This approach assumes that reuse and recycling contribute equally to circularity, which is not always the case. In practice, the reuse process typically does not require virgin materials, while recycling may sometimes necessitate the addition of virgin materials to meet product specifications and requirements.

This simplification overlooks the varying resource-saving potentials of reuse and recycling. To provide a more accurate assessment of circularity, future research should incorporate more comprehensive circularity metrics that capture the true resource-saving benefits of reuse and recycling actions.
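The difference between the quantity-based metric and a metric that distinguishes reuse from recycling can be sketched as follows. The weighted variant and its weights (1.0 for reuse, 0.7 for recycling) are hypothetical and are not proposed in the paper; they only illustrate how discounting recycled quantities would change the score.

```python
def quantity_circularity(reused, recycled, returned):
    """Share of returned components that were reused or recycled."""
    if returned <= 0:
        raise ValueError("returned must be positive")
    return (reused + recycled) / returned


def weighted_circularity(reused, recycled, returned, w_reuse=1.0, w_recycle=0.7):
    """Hypothetical variant that credits recycling less than reuse."""
    if returned <= 0:
        raise ValueError("returned must be positive")
    return (w_reuse * reused + w_recycle * recycled) / returned


# Illustrative quantities, not data from the case study.
q_metric = quantity_circularity(reused=30, recycled=50, returned=100)
w_metric = weighted_circularity(reused=30, recycled=50, returned=100)
print(q_metric, w_metric)
```

In this example the quantity-based score is 0.80 while the weighted score is 0.65, so a supply chain that relies mostly on recycling would look less circular under a metric that accounts for virgin material needs.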


5 CONCLUSIONS & PERSPECTIVES

This research presents a decision-making mechanism for circular supply chains focusing on EoL management that integrates environmental, economic, and circularity performance through Q-learning. By leveraging historical success rates for reuse and recycling actions, the model reflects the operational realities of EoL management, contrasting the idealized design circularity with actual circular supply chain (CSC) performance. Our findings demonstrate that even when components are designed for reuse or recycling, operational constraints, such as quality issues, can significantly impact the realized circularity, carbon footprint reduction, and cost savings.

Through scenario analysis, we showed the trade-offs between various EoL strategies and the sensitivity of circularity and sustainability outcomes to component-specific success rates. The model addresses the decision-making challenge of whether to reuse, recycle, or dispose of returned components in a circular supply chain, providing a robust framework for managing EoL in a sustainable and cost-effective manner.

However, the cost modeling in this study assumes simplified scenarios. Future work should refine these cost assumptions to better reflect real-world complexities and explore other optimization techniques, such as multi-objective evolutionary algorithms (e.g., NSGA-III or other genetic algorithms), to better explore the solution space and identify trade-offs between different objectives. Additionally, while the model currently evaluates environmental impact primarily through carbon footprint reduction, future studies should incorporate a broader range of impact categories to provide a more comprehensive environmental analysis.

Moreover, more comprehensive circularity metrics that capture the resource-saving benefits of circular supply chains should be explored to further enhance decision-making in circular supply chain management.

REFERENCES

Abaku, E. A. and Odimarha, A. C. (2024). Sustainable sup-

ply chain management in the medical industry: a theo-

retical and practical examination. International Med-

ical Science Research Journal, 4(3):319–340.

Agyemang, M., Kusi-Sarpong, S., Khan, S. A., Mani, V.,

Rehman, S. T., and Kusi-Sarpong, H. (2019). Drivers

and barriers to circular economy implementation: An

explorative study in pakistan’s automobile industry.

Management Decision, 57(4):971–994.

Bouzon, M., Govindan, K., and Rodriguez, C. M. T. (2018).

Evaluating barriers for reverse logistics implementa-

tion under a multiple stakeholders’ perspective analy-

sis using grey decision making approach. Resources,

conservation and recycling, 128:315–335.

Bressanelli, G., Perona, M., and Saccani, N. (2019).

Challenges in supply chain redesign for the circu-

lar economy: a literature review and a multiple case

study. International Journal of Production Research,

57(23):7395–7422.

de Sousa Jabbour, A. B. L., Luiz, J. V. R., Luiz, O. R., Jab-

bour, C. J. C., Ndubisi, N. O., de Oliveira, J. H. C., and

Junior, F. H. (2019). Circular economy business mod-

els and operations management. Journal of cleaner

production, 235:1525–1539.

D’Alessandro, C., Szopik-Depczyńska, K., Tarczyńska-Łuniewska, M., Silvestri, C., and Ioppolo, G. (2024). Exploring circular economy practices in the healthcare sector: A systematic review and bibliometric analysis. Sustainability, 16(1):401.

Farooque, M., Zhang, A., Thürer, M., Qu, T., and Huisingh, D. (2019). Circular supply chain management: A definition and structured literature review. Journal of cleaner production, 228:882–900.

GE Healthcare (2023). Sustainability Report. https://www.gehealthcare.com/-/jssmedia/gehc/us/files/about-us/sustainability/reports/ge-healthcare-sustainability-report-2023.pdf?rev=-1. Accessed: 2024-09-24.

Genovese, A., Acquaye, A. A., Figueroa, A., and Koh, S. L.

(2017). Sustainable supply chain management and the

transition towards a circular economy: Evidence and

some applications. Omega, 66:344–357.

Govindan, K. and Hasanagic, M. (2018). A systematic re-

view on drivers, barriers, and practices towards cir-

cular economy: a supply chain perspective. Interna-

tional Journal of Production Research, 56(1-2):278–

311.

Goyal, S., Esposito, M., and Kapoor, A. (2018). Circular

economy business models in developing economies:

lessons from india on reduce, recycle, and reuse

paradigms. Thunderbird International Business Re-

view, 60(5):729–740.

Halse, L. L. and Jæger, B. (2019). Operationalizing industry

4.0: Understanding barriers of industry 4.0 and circu-

lar economy. In Advances in Production Management

Systems. Towards Smart Production Management Sys-

tems: IFIP WG 5.7 International Conference, APMS

2019, Austin, TX, USA, September 1–5, 2019, Pro-

ceedings, Part II. Springer.

Han, J., Ijuin, H., Kinoshita, Y., Yamada, T., Yamada, S.,

and Inoue, M. (2021). Sustainability assessment of

reuse and recycling management options for end-of-

life computers-korean and japanese case study analy-

sis. Recycling, 6(3):55.

Hasegawa, S., Kinoshita, Y., Yamada, T., and Bracke, S.

(2019). Life cycle option selection of disassembly

parts for material-based co2 saving rate and recovery


cost: Analysis of different market value and labor cost

for reused parts in german and japanese cases. Inter-

national Journal of Production Economics, 213:229–

242.

Herczeg, G., Akkerman, R., and Hauschild, M. Z. (2018).

Supply chain collaboration in industrial symbiosis

networks. Journal of cleaner production, 171:1058–

1067.

Hoveling, T., Nijdam, A. S., Monincx, M., Faludi, J., and

Bakker, C. (2024). Circular economy for medical de-

vices: Barriers, opportunities and best practices from

a design perspective. Resources, Conservation and

Recycling, 208:107719.

Khodier, A., Williams, K., and Dallison, N. (2018). Chal-

lenges around automotive shredder residue production

and disposal. Waste Management, 73:566–573.

Kinoshita, Y., Yamada, T., Gupta, S. M., Ishigaki, A.,

and Inoue, M. (2016). Disassembly parts selection

and analysis for recycling rate and cost by goal pro-

gramming. Journal of Advanced Mechanical Design,

Systems, and Manufacturing, 10(3):JAMDSM0052–

JAMDSM0052.

Kirchherr, J., Yang, N.-H. N., Schulze-Spüntrup, F., Heerink, M. J., and Hartley, K. (2023). Conceptualizing the circular economy (revisited): an analysis of 221 definitions. Resources, Conservation and Recycling, 194:107001.

Kravchenko, M., Pigosso, D. C., and McAloone, T. C.

(2019). Towards the ex-ante sustainability screening

of circular economy initiatives in manufacturing com-

panies: Consolidation of leading sustainability-related

performance indicators. Journal of Cleaner Produc-

tion, 241:118318.

Lahane, S., Kant, R., and Shankar, R. (2020). Circular

supply chain management: A state-of-art review and

future opportunities. Journal of Cleaner Production,

258:120859.

Mangla, S. K., Luthra, S., Mishra, N., Singh, A., Rana,

N. P., Dora, M., and Dwivedi, Y. (2018). Barriers to

effective circular supply chain management in a devel-

oping country context. Production Planning & Con-

trol, 29(6):551–569.

Mishra, J. L., Hopkinson, P. G., and Tidridge, G. (2018).

Value creation from circular economy-led closed loop

supply chains: a case study of fast-moving consumer

goods. Production Planning & Control, 29(6):509–

521.

Nasir, M. H. A., Genovese, A., Acquaye, A. A., Koh, S.,

and Yamoah, F. (2017). Comparing linear and circu-

lar supply chains: A case study from the construction

industry. International Journal of Production Eco-

nomics, 183:443–457.

NHS (2022). Delivering a ‘net zero’ national

health service. https://www.england.nhs.uk/

greenernhs/wp-content/uploads/sites/51/2022/07/

B1728-delivering-a-net-zero-nhs-july-2022.pdf.

Accessed: 2024-09-24.

Nicholson, A. L., Olivetti, E. A., Gregory, J. R., Field, F. R.,

and Kirchain, R. E. (2009). End-of-life lca allocation

methods: Open loop recycling impacts on robustness

of material selection decisions. In 2009 IEEE interna-

tional symposium on sustainable systems and technol-

ogy, pages 1–6. IEEE.

Pichler, P.-P., Jaccard, I. S., Weisz, U., and Weisz, H.

(2019). International comparison of health care

carbon footprints. Environmental research letters,

14(6):064004.

Ranta, V., Aarikka-Stenroos, L., Ritala, P., and Mäkinen, S. J. (2018). Exploring institutional drivers and barriers of the circular economy: A cross-regional comparison of China, the US, and Europe. Resources, Conservation and Recycling, 135:70–82.

Romanello, M., Di Napoli, C., Green, C., Kennard, H.,

Lampard, P., Scamman, D., Walawender, M., Ali, Z.,

Ameli, N., Ayeb-Karlsson, S., et al. (2023). The 2023

report of the lancet countdown on health and climate

change: the imperative for a health-centred response

in a world facing irreversible harms. The Lancet,

402(10419):2346–2394.

Rosa, P., Sassanelli, C., and Terzi, S. (2019). Circular business models versus circular benefits: An assessment in the waste from electrical and electronic equipments sector. Journal of cleaner production, 231:940–952.

Roy, T., Garza-Reyes, J. A., Kumar, V., Kumar, A., and

Agrawal, R. (2022). Redesigning traditional linear

supply chains into circular supply chains–a study into

its challenges. Sustainable Production and Consump-

tion, 31:113–126.

Tura, N., Hanski, J., Ahola, T., Ståhle, M., Piiparinen, S., and Valkokari, P. (2019). Unlocking circular business: A framework of barriers and drivers. Journal of cleaner production, 212:90–98.

Vermunt, D. A., Negro, S. O., Verweij, P. A., Kuppens,

D. V., and Hekkert, M. P. (2019). Exploring barri-

ers to implementing different circular business mod-

els. Journal of cleaner production, 222:891–902.

Watkins, C. J. C. H. (1989). Learning from delayed rewards.

Winkler, H. (2011). Closed-loop production systems—a

sustainable supply chain approach. CIRP Journal

of Manufacturing Science and Technology, 4(3):243–

246.
