Machine Learning-Driven Classiﬁcation of Polyethylene (HDPE, LDPE)

via Raman Spectroscopy

Evangelos Stergiou

, Fotios K. Konstantinidis

, C. Stefani

, G. Arvanitakis

Georgios Tsimiklis

and Angelos Amditis

Institute of Communication and Computer Systems, National Technical University of Athens, 9 Iroon. Polytechniou Str.,

Zografou Athens GR-157 73, Greece

{vangelis.stergiou, fotios.konstantinidis, christina.stefani, giorgos.arvanitakis, georgios.tsimiklis, angelos.amditis}@iccs.gr

Keywords:

Raman Spectroscopy, Gradient Boosting, Polyethylene, Classiﬁcation, Machine Learning.

Abstract:

Polymer industries are currently focusing on developing new methods for identifying Polyethylene (PE) cate-

gories through rapid and non-destructive characterization techniques (NDT) to improve their production pro-

cesses or recycling process control. However, NDT for classiﬁcation is challenging for PE categories due to

their identical chemical structures. This work presents a data-driven method for classifying PE to its two main

categories, Low-Density Polyethylene (LDPE) and High-Density Polyethylene (HDPE). The method is using

Raman spectroscopy, with the spectrums being processed to select the features, which are decisive for the

classiﬁcation of the different types of PE. PE samples in the form of granules are subjected to spectroscopic

measurements, followed by data pre-processing in order for the signals to be enhanced. Using a Gradient

Boosting model, the selected spectral features were used to train and validate the model. The model achieved

an accuracy rate of 97 %, indicating the potential of the proposed method for rapid and accurate separation of

LDPE and HDPE. This performance is not limited to PE granules but also to different plastic types (e.g. ﬁlm,

bottles, etc.). This approach offers a rapid method to classify polyethylene types, making the method suitable

for industrial uses.

1 INTRODUCTION

PE is one of the most widely used polymers in the

industry and the applications of this polymer range

from packaging products to manufacturing materials

for the automotive industry. A critical factor that af-

fects the properties of polyethylene, such as strength,

ﬂexibility, breathability, water vapor transmission,

etc. is its density, which can be classiﬁed into low

(LDPE) and high (HDPE). However, the classiﬁcation

of these two types of polyethylene is a challenging

topic, as they present similar characteristics in their

macromolecular structure and the structure of their

functional groups, with the main difference being the

branches in the polymer chains (Bruns, 2022).

These types of polymers consist of repeating

(−CH

−) units (methylene groups) and an alkyl

https://orcid.org/0009-0009-5922-0757

https://orcid.org/0000-0002-1826-6582

https://orcid.org/0000-0002-8818-075X

https://orcid.org/0000-0003-2414-6891

https://orcid.org/0000-0002-2431-8529

https://orcid.org/0000-0002-4089-1990

polymer that usually does not contain side chains. In

the case of LDPE, however, there are more side chain

branches in its polymer chains, consisting of methyl

groups (−CH

−) (Vollmert, 2012). These branches

interfere with the regular arrangement of the polymer

chains, causing a decrease in crystallinity and thus a

lower density. In contrast, HDPE exhibits less branch-

ing, allowing the chains to organize more regularly

and form high-density crystalline regions Figure 1.

These structural differences make their classiﬁcation

in the manner of density difﬁcult with traditional char-

acterization methods (Baxter et al., 2020)

Accurate identiﬁcation of PE density is critical

in the production processes as it affects the physico-

chemical properties of the material, such as mechan-

ical strength, thermal stability, and chemical resis-

tance (Konstantinidis et al., 2023a). The ability to

quickly and accurately classify between LDPE and

HDPE is critical for quality assurance and select-

ing appropriate processing procedures. Nowadays,

the density calculation of polyethylene is mainly car-

ried out by destructive characterization methods or

a combination of methods that require repeatability

326

Stergiou, E., Konstantinidis, F. K., Stefani, C., Arvanitakis, G., Tsimiklis, G. and Amditis, A.

Machine Learning-Driven Classiﬁcation of Polyethylene (HDPE, LDPE) via Raman Spectroscopy.

DOI: 10.5220/0013254000003905

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2025), pages 326-334

ISBN: 978-989-758-730-6; ISSN: 2184-4313

and time which is a deterrent for a production line.

Thus, developing a quick and non-destructive tech-

nique with high performance in categorizing these

materials will signiﬁcantly contribute to the simpliﬁ-

cation of the quality control process and the efﬁcient

management of raw materials (Shebani et al., 2018).

Figure 1: Chemical Structures of LDPE and HDPE.

One of the most signiﬁcant and widely used an-

alyzing technique is Raman Spectroscopy (Katha-

rina Eberhardt and Popp, 2015). Raman spectroscopy

offers a non-destructive and reliable tool to analyze

the molecular structures of polymers, providing infor-

mation about the vibrational states of the molecules.

Although spectral signatures and especially Raman

Spectrums of LDPE and HDPE are similar due to

their common chemical composition, there are slight

differences in the spectra related to the presence of

methyl groups and the degree of branching.

This work proposes an innovative approach to

classify polyethylene based on its structure (macro-

molecules) density, using Raman spectroscopy com-

bined with a machine learning model, which an ad-

vanced machine learning model can detect. The main

objectives of this classiﬁcation approach can be dis-

tinguished into three key axes:

• The introduction of a rapid and non-destructive

classiﬁcation approach based on Raman spec-

troscopy, capable of classifying the PE materials

into two main categories (LDPE, HDPE).

• The exploitation of an advanced machine learn-

ing model which is trained on a limited volume

of pure PE granules analyzed data from Raman

spectroscopy.

• The sensitive analysis regarding different param-

eters of the problem (e.g. different measurement

times, forms of PE plastics), with high prediction

accuracies.

This approach provides an automated, fast, and

accurate way to separate the two types of PE, facilitat-

ing both the production process and the identiﬁcation

of the material in various applications.

2 RELATED WORKS

The classiﬁcation of different types of polyethylene is

a major challenge for research, as the differences in

their structure are often too subtle to be detected by

traditional methods. Many researchers work mainly

focus on the quantiﬁcation of polyethylene blends

(blends) rather than the categorization of pure materi-

als and with a limited amount of samples, which indi-

cates a gap in the existing literature.

Silva and Wiebeck suggest the use of Raman spec-

troscopy combined with the CARS-PLS (Competi-

tive Adaptive Reweighted Sampling - Partial Least

Squares) technique to quantify LDPE and HDPE as

blends but not for classiﬁcation of different PE mate-

rials based on their density. This method shows re-

liable results with prediction errors for LDPE con-

centration. Despite the difﬁculty in LDPE/HDPE

discrimination, the researchers were able to improve

the accuracy of the predictions through modiﬁca-

tions to the model parameters. This work demon-

strates the potential of Raman spectroscopy combined

with chemometric techniques to accurately categorize

polymer mixtures. Although this approach achieved

some good results, there is a limitation in the number

of samples that were been used and also this study

focused on blends of PE (Silva and Wiebeck, 2019).

In addition, the majority of related research

chooses to combine more than one spectroscopic

technique, such as Raman spectroscopy, ATR-FTIR,

and NIR, to improve the separation performance and

increase the accuracy of the prediction models. The

classiﬁcation of HDPE and LDPE is made more ex-

pensive and resource-intensive by this combination of

methods.

This is also reﬂected in a previous study of Silva

and Wiebeck (da Silva and Wiebeck, 2017) accord-

ing to which it’s used Raman and FTIR spectroscopic

data combined with variants of the PLS method, such

as iPLS and siPLS, to improve the prediction perfor-

mance of LDPE/HDPE blends. However, the need to

use multiple techniques demonstrates the challenge of

distinguishing polymers with a single technology.

Similarly, the study of (Sato et al., 2002), focuses

on the use of Raman spectroscopy combined with an-

alytical multivariate analysis techniques (such as Dif-

ferential Calomitery Spectrometry) to characterize the

physical properties of HDPE, LDPE, and linear low

density (LLDPE) polyethylene. The study aimed to

better understand the spectral behavior of the three

types of polyethylene and to develop models for their

density, crystallinity and melting point. The authors

used principal component analysis (PCA) and partial

least squares (PLS-1) regression on Raman spectro-

Machine Learning-Driven Classiﬁcation of Polyethylene (HDPE, LDPE) via Raman Spectroscopy

327

scopic data derived from LDPE, HDPE and LLDPE

samples and a combination of characterization tech-

niques and a limited amount of samples.

In other works (Konstantinidis et al., 2023c),

(Konstantinidis et al., 2023b), detailed solutions for

the classiﬁcation of waste materials in an industrial

case (sorting systems) are presented, the sorting of

the entire range of plastics is done using multispec-

tral imaging data in tandem use of AI-driven solutions

Moreover in (Sifnaios et al., 2024) publiction,

a light-weight model is introduced for pixel-level

classiﬁcation of Hyperspectral images of plastics.The

aforementioned studies introduce innovative methods

in material sorting however, fail to address the classi-

ﬁcation of individual categories of PE, i.e. LDPE and

HDPE.

Accordingly, the study by Workman Jr (1999) in-

corporated Raman, NIR, and IR spectroscopy data for

the quantitative analysis of LDPE, LLDPE and HDPE

blends, achieving high accuracy (1-5 absolute error).

The results showed that the combined approach of-

fers greater accuracy than using individual techniques

due to the different spectral regions analyzed by each

method (Workman Jr, 1999).

In contrast to the above studies, the present study

focuses on the classiﬁcation of pure types of PE

(LDPE and HDPE) without the need of combin-

ing different characterization techniques or complex

statistics methods to identify critical factors for the

classiﬁcation. Furthermore, in this method, a high

amount of samples were used for better accuracy and

validation metrics. The method is based solely on

Raman spectroscopy, while the use of the Gradient

Boosting machine learning model allows accurate cat-

egorization of pure materials based on the spectrum

peaks. This reduces the complexity of the method

and makes the identiﬁcation process more direct and

easier to use in quality control and production appli-

cations. In this way, our approach aims to develop a

fast, automated, and non-destructive system, capable

of separating pure polyethylene types with high pre-

cision, without the use of multiple techniques or an-

alytical methods, and ready to be adapted to various

processes of PE.

3 METHODOLOGY

The methodology followed in the present study is a

multidimensional approach combining Raman spec-

troscopy and machine learning to accurately and

quickly classify PE samples. Raman spectroscopy

was chosen because of its ability to probe the vibra-

tional states of molecules, allowing analysis of the

crystalline and amorphous regions of polymers in a

rapid and efﬁcient NDT way. First, Raman spec-

troscopy measurements were performed on LDPE

and HDPE samples to obtain spectral data. These

data were pre-processed to remove noise and improve

their quality. Then, during the feature selection pro-

cess, the most important peaks that differentiate the

two PE categories were selected. These selected fea-

tures were used to train a Gradient Boosting machine

learning model, which was evaluated through metrics

such as Accuracy. The proposed methodology is de-

scribed in detail below, focusing on the optimization

of each step and the efﬁciency of the ﬁnal sample clas-

siﬁcation Figure 2.

3.1 Data Generation

3.1.1 Samples

In the present study, PE samples with low and high

densities were used for Raman spectrum analyses.

The samples were in the form of pellets (granules)

and consisted of pure LDPE and HDPE.

3.1.2 Spectroscopic Technique

For the spectroscopic analysis of the samples, the

Handheld Raman Spectroscope C15471 by HAMA-

MATSU was used. Measurements were performed

on a total of 400 samples, of which 200 were LDPE

and 200 HDPE. Regarding the parameters of mea-

surements, the Measurement time (M t) was from 1

to 3 seconds and the Distance (D) was chosen based

on the sample holder of the device. The Power (P) of

monochromatic radiation was 50 mW with a wave-

length of 785 nm. In general, the measurements

were performed with different recording times and

constant settings in rest parameters to assess the effect

of the measurement time on the quality of the spectra.

About the (M t) where used:

• 50 LDPE and 50 HDPE samples—were analyzed

with a (M t) of 1 seconds.

• 100 LDPE and 100 HDPE samples—were ana-

lyzed with a (M t) of 2 second.

• 50 LDPE and 50 HDPE samples—were analyzed

with a (M t) of 3 seconds.

The output of each measurement was spectrums

in csv form. During each spectroscopic measurement,

all spectrometer parameters remained constant to en-

sure the uniformity of the spectra and the reliability

of the results.

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

328

Figure 2: The Data-Driven Method Process.

3.2 ML Pipeline

3.2.1 Data Pre-Processing

The spectroscopic data were pre-processed to im-

prove the quality of the spectra and remove unwanted

distortions. Speciﬁcally, the following techniques

were applied:

Background Removal (Baseline Correction):

Used to eliminate the unwanted background com-

ponent in the spectra, ensuring that the peak

characteristics of the samples remained unchanged

and the information from the spectrum was not

distorted. The mathematical relationship of baseline

correction can be described in Equation 1:

y c(x) = y(x)–Pol(x) , (1)

where y c(x) is the corrected spectrum after the sub-

station of the polynomial, y(x) is the raw spectrum,

and Pol(x) is the ﬁtting polynomial, which is sub-

tracted to eliminate the background component.

Noise Reduction: The Savitzky-Golay technique

(Gallagher, 2020), which applies a polynomial ﬁt to a

window of data, was used to reduce noise by 2m + 1

points, where m is the window size. The relationship

of smoothing to the Savitzky-Golay technique can be

described in Equation 2:

ˆy

∑

j=−m

· y

i+ j

, (2)

where ˆy

is the estimated value at the point y

i+ j

of the

smoothed spectrum, and the initial value of the spec-

trum at the point (i + j) and c

are the coefﬁcients of

the polynomial applied to each data window. The ﬁt-

ting polynomial used in the above methods was 3rd

degree, depending on the desired level of smoothing

and minimization of spectral peak distortion. The out-

put of these methods can be described in Figure 3.

3.2.2 Find Peaks

For the peaks identiﬁcation of each measurement

where used a function with speciﬁc parameters. This

Figure 3: Raman Spectrum Plots during data pre-

processing.

function takes a 1-D array and ﬁnds all local max-

ima by a simple comparison of neighboring values.

Optionally, a subset of these peaks can be selected

by specifying conditions for a peak’s properties. The

main parameter was the prominence as it was suitable

for this study, so the process of peak identiﬁcation for

each measurement be more reliable for the upcom-

ing data analysis. Prominence refers to how much a

peak stands out from others or from noise in the spec-

trum. The peaks selected must have speciﬁc intensity

according to the value of prominence to represent the

real characteristics of the polymer and not be due to

random ﬂuctuations.

Figure 4: The Selection of Peaks on the Sets of LDPE and

HDPE.

Machine Learning-Driven Classiﬁcation of Polyethylene (HDPE, LDPE) via Raman Spectroscopy

329

3.2.3 Peak Sets

After the Pre-Processing of spectrums for both cate-

gories, the data were summarized in the form of Table

2, and Table 3. These tables show the possibility of

appearance for each identiﬁed Peak

at each category

of PE. For a given observation i that belongs to the

category c of the PE, Peak

corresponds to the value

of wavenumber

(cm

−1

) where the peak is spotted in

the corresponding PE category (c), and N c is the to-

tal number of observation that belongs in the corre-

sponding category (c). Count

is the total appearance

of the peak

in the entire class c for each category and

Possibility

the possibility of the appearance of peak

as Equation 3:

Possibility

Count(Peak

)

N c

. (3)

Unique peaks were observed in each category of

PE, however there was of low possibility of appear-

ance. On the contrary, the common peaks that were

found were those with the highest possibility of ap-

pearing in each category, which proves the difﬁculty

of the classiﬁcation in these two types of PE. Finally,

the sets of peaks for each category could be present as

Ven’s Diagram on Figure 4.

3.2.4 Features Selection

The selection of features was based on the possibil-

ity of the appearance of the peaks in each type of PE.

More speciﬁcally after the analysis of the spectrums

for both categories, we summarized the data in the

form of Table 2 and Table 3, the peaks selected as

features (F) for the model met speciﬁc separation cri-

teria between the HDPE and LDPE classes, based on

Equation 4.

F = (A − B) ∪ (B − A) ∪

{

x : x ∈ A ∩ B & |P(x)| > T

}

(4)

where:

• A: The set of HDPE peaks.

• B: The set of LDPE peaks.

• A − B: The set of peaks that are unique to HDPE,

meaning peaks that exist in A but do not appear in

B. This subset captures peaks exclusive to HDPE

samples.

• B − A: The set of peaks that are unique to LDPE,

meaning peaks that exist in B but not in A. This

subset captures peaks exclusive to LDPE samples.

• A ∩ B: The intersection of sets A and B, represent-

ing the peaks that are common to both HDPE and

LDPE samples.

• P(x): The absolute difference in the probability

of appearance of a common peak x between the

two categories, HDPE and LDPE. Formally, if

HDPE

(x) and P

LDPE

(x) represent the probability

of appearance of peak x in HDPE and LDPE re-

spectively, then Equation 5:

P(x) =

HDPE

(x) − P

LDPE

(x)

. (5)

This value is used to determine if the difference in

occurrence between the two categories is signiﬁ-

cant based on the threshold T .

• T : The threshold for the probability difference.

Only peaks where |P(x)| > T are considered sig-

niﬁcant for inclusion in the set of common peaks.

• {x : x ∈ A ∩ B & |P(x)| > T }: The set of common

peaks between HDPE and LDPE that have an ab-

solute difference in the possibility of appearance

greater than the threshold T . This subset ﬁlters

the intersection of A and B to include only peaks

with a statistical difference in appearance.

As described before, the classiﬁcation of PE us-

ing Raman spectroscopy and based on the peaks de-

tection is a challenge, as the two materials have the

same chemical composition. Their main difference

lies in the structure of the polymer chains, such as

branching, which affects their physical information

but is not always apparent in spectral analyses. Due

to this structural similarity, many common peaks ap-

pear in the spectra of both types of PE, as both ma-

terials are composed of the same molecular groups,

such as methylene groups (-CH

-). The appearance

of these common peaks makes it difﬁcult to directly

classify the two types of polyethylene based solely on

their spectral signatures. For this reason, it becomes

necessary to identify peaks where there is a speciﬁc

difference (Threshold) in their probability of occur-

rence between LDPE and HDPE. About the Thresh-

old value, this was applied to the difference in the

probability of a peak occurring between LDPE and

HDPE as the relation (4). Speciﬁcally, the selection

of features was based on the relationship These peaks

can serve as the key features to accurately classify the

two materials, offering a more reliable basis for their

separation by Raman spectroscopy.

After the analysis, the resulting optimal values for

the feature selection were: Threshold = 30%: This

value ensures that only peaks that have a distinct dif-

ference in probability of occurrence between LDPE

and HDPE are selected. Prominence = 100: This

prominence ensures that only peaks with signiﬁcant

intensity, which stand out from the noise, are se-

lected as features. These values led to the selection

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

330

of the most distinctive peaks, which helped to op-

timize the performance of the model, ensuring high

accuracy and reliability in classiﬁcation. The selec-

tion of features based on the two parameters, thresh-

old and prominence, proved to be decisive for the per-

formance of the model. The threshold value at 30%

allowed the selection of peaks that effectively differ-

entiate LDPE from HDPE while the correct choice

of prominence ensured that only peaks representing

signiﬁcant features of the spectra were used and not

random noise or meaningless peaks.

Finally, the form of the selected features is de-

scribed in Table 1 where 0 is picked if the speciﬁc

peak does not exist in the ﬁle of the row and 1 if ex-

ists.

Table 1: Selected Features for the Training of ML Model.

F List: A B A ∩ B Category

sample: 0 or 1 HDPE or LDPE

3.3 ML Model

Gradient Boosting (GB) was selected for the classiﬁ-

cation of the PE samples (LDPE and HDPE). GB is a

robust ML algorithm that builds an ensemble of weak

models, typically decision trees and improves their

performance to create a strong predictive model with

high accuracy.This method is based on the progres-

sive strengthening (boosting) of the models, where

each new model tries to correct the errors of the pre-

vious one, thus reducing the overall prediction error

(Zhang and Yin, 2023).

To optimize the model’s performance, certain set-

tings were adjusted during the Gradient Boosting

Classiﬁer training process. The settings for this model

set as follows:

• n estimators=150: This parameter deﬁnes the

number of boosting stages or weak learners (de-

cision trees) the model should use.

• learning rate=0.1: This is the pace at which the

model gains knowledge by modifying each new

tree’s contribution.

• max depth=3: This parameter determines the

maximum depth of each decision tree. With a

setting of 3, the model restricts the complexity of

each tree, which can help prevent overﬁtting and

promote generalizability.

• criterion=’friedman mse’: This setting was made

to guarantee that the outcomes could be repli-

cated. Fixing the random state allows for consis-

tent results across runs by controlling the random-

ness of some model training phases.

3.3.1 Train

For the training of the model, the data used was

on M t (measurement time) of 2 seconds and in an

amount of 200 samples.

The dataset was subjected to 10-fold cross-

validation to prevent overﬁtting and provide a solid

performance evaluation. Cross-validation divides the

dataset into ten equal-sized subsets (folds) to produce

a more accurate estimate of the model’s performance.

Nine folds were utilized for training and one fold for

testing in each cycle. Each fold was utilized as a test-

ing set exactly once during the 10 iterations of this

procedure. The ﬁnal accuracy was calculated as the

average of the accuracies across all ten iterations, pro-

viding a trustworthy measure of the model’s general-

izability.

3.3.2 Test

Following training, the model’s performance was

evaluated using a distinct dataset, excluding the M t

2 seconds data on which it was initially trained. In or-

der to measure the robustness of the model to samples

of different acquisition times, than the data used in the

training, we test on 100 samples of M t 1 second and

100 samples of M t 3 seconds. This variance in mea-

surement periods provided insights into the model’s

performance with a range of sample parameters. It

allowed for the evaluation of the model’s resilience

and ﬂexibility to minor modiﬁcations in data collec-

tion conditions.

In order to assess the model’s classiﬁcation accu-

racy in real-world applications, it was lastly evaluated

on commercially accessible plastic shapes, like bot-

tles, ﬁlms, and cups. The amount of commercial PE

plastics tested was: 10 bottles, 10 cups, and 4 ﬁlms.

Also, for each sample, measurements were taken at

8 points to increase the reliability of the prediction

model .

4 RESULTS & DISCUSION

The results of the cross-validation process are pre-

sented in Table 4. The model was evaluated using

10 different folds, with scores ranging from 0.80 to

1.00. The mean score was 0.90, indicating accurate

and consistent model performance across all valida-

tion sets.

The high average accuracy suggests that the model

generalizes effectively to different subsets of the data.

Furthermore, the low variance in the scores indicates

stability in the model’s performance, further enhanc-

ing its reliability. Importantly, the absence of ex-

Machine Learning-Driven Classiﬁcation of Polyethylene (HDPE, LDPE) via Raman Spectroscopy

331

Table 2: HDPE Peak Information.

HDPE Peak

Count

Possibility

HDPE Peak

Count

PHDPE

HDPE Peak

Count

PHDPE

HDPE Peak

Count

PHDPE

HDPE Peak

Count

PHDPE

Table 3: LDPE Peak Information.

LDPE Peak

Count

Possibility

LDPE Peak

Count

PLDPE

LDPE Peak

Count

PLDPE

LDPE Peak

Count

PLDPE

LDPE Peak

Count

PLDPE

Table 4: Cross-Validation Scores and Mean.

Fold Score

1 0.95

2 0.95

3 0.80

4 0.80

5 0.85

6 0.95

7 0.85

8 1.00

9 0.95

10 0.90

Mean 0.90

tremely high scores coupled with consistent results

suggests that the model does not suffer from overﬁt-

ting, as it maintains balanced performance across all

folds.

The results, as analyzed below, demonstrated that

the trained model with data of M

t = 2 seconds, was

able to classify PE categories for each M t with high-

est accuracy 97% Table 5. Furthermore, the predic-

tion accuracy remained high for the random PE plas-

tic types Table 6.

Table 5: Model Performance on Different Time Measure-

ments.

M t (sec) Num. of Samples Acc. (%)

1 100 86

2 100 97

3 100 76

Table 6: Model Performance on Different Plastic Types.

Type Num. of Samples Acc. (%)

LDPE ﬁlm 2 100

HDPE ﬁlm 2 100

HDPE cup 5 80

LDPE cup 5 80

HDPE bottle 5 80

LDPE bottle 5 100

The test on the samples with M t of 1 second

showed an Accuracy of 86% . Although faster data

acquisition is desirable, a decrease in the accuracy and

quality of predictions is observed, which can be at-

tributed to the increased presence of noise due to the

short M t. The performance of the model with an M t

of 2 seconds was on 97%. For the data with an M t

of 3 seconds, the model scored the lower Accuracy

of 76%, The results show that for the fastest and most

reliable spectroscopic analysis, the measurement time

of 2 seconds is the most efﬁcient choice, offering high

accuracy in the classiﬁcation of LDPE and HDPE.

About the commercial PE plastics, an over 90% ac-

curacy was observed with an excellent model perfor-

mance on the classiﬁcation of PE ﬁlms Table 6.

The results of the present study conﬁrm the effec-

tiveness of the proposed method to categorize PE into

LDPE and HDPE using Raman spectroscopy and ma-

chine learning models. The use of Gradient Boost-

ing proved to be particularly efﬁcient, as it was able

to accurately classify the two classes, overcoming the

challenges arising from the similarity of spectral char-

acteristics between LDPE and HDPE.

4.1 Features Importance

On the model performance and accuracy and for the

understanding of the differences in chemical struc-

tures of both categories, we must take into consider-

ation the feature importance of the used model Table

7. As we see the peak on 927.27 cm

−1

was the most

important peak for the model to be able to classify

the category of PE. As shown in Figure 5, in the plot

of the Standard Deviation of spectrums, the speciﬁc

peak appeared in low intensity. But in Figure 6, it is

clear that the difference in this peak proves the impor-

tance of the model classiﬁcation. More speciﬁcally, it

is clearly shown that the peak appeared mostly for the

HDPE category. Quite so, this feature importance was

noticed also from the outputs of data pre - processing

where the speciﬁc peak appeared in 65% of the HDPE

samples, in contrast to the LDPE category where the

peak didn’t appear at whole sampling.

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

332

Figure 5: Comparison of Two Mean Raman Spectra with

Standard Deviation.

Figure 6: Comparison of Two Mean Raman Spectra with

Standard Deviation.

Table 7: Feature Importance for Raman Shift Values.

Wavenumber (cm

−1

) Feature Importance

927.27 0.43

Rest of the Peaks <0.1

4.2 Limitations

The comparatively low accuracy (76%) where been

found in the prediction of the dataset with M t of 3

seconds is a limitation of the current investigation.

The model was particularly trained on data with a

M t of 2 seconds, which is probably why the accu-

racy was lower. More noise is added to the spectrum

signals as M t rises, making it harder for the trained

model in different circumstances to manage. This in-

dicates susceptibility to changes in the measurement

interval and suggests that performance may decline

when acquisition times increase from the training cir-

cumstances.

5 CONCLUSION AND FUTURE

WORK

A data-driven method with data from Raman Spec-

troscopy and the evaluation of the gradient boosting

model algorithm were been used to predict the cate-

gories of the PE. The model showed different values

of accuracy for different M t with the highest being

for the data of M t 2 seconds and the lowest being at

M t 3 seconds. Furthermore, the model showed high

accuracy on random types of PE plastics.

So we conclude from the above that a machine

learning model with its training through a large

amount of Raman spectroscopy data, can predict with

a high success rate the category of an unknown PE

sample both in the form of granule and in the form of

commercial plastic.

Also, in the features’ importance values, it is noted

that the peak 927.27 cm

−1

had a crucial role in the

classiﬁcation decisions of the model. Also, the spe-

ciﬁc peak appeared mostly on HDPE samples. This

ﬁnding should be studied further in order to under-

stand the reason for these peak differences in the com-

plex study of PE classiﬁcation and in the literature

so far, regarding the characteristic Raman peaks of

PE (Jin et al., 2017).

The Raman method, as it is shown to yield ex-

cellent results after the training carried out in the

present project, offers interesting horizons for fur-

ther research. In particular, an additional study of

the method could lead to an improvement of its ac-

curacy and efﬁciency, to signiﬁcantly reduce analysis

times. This perspective would be a signiﬁcant advan-

tage, paving the way for its use in real-time (online)

production processes, making the method suitable for

integration into production lines where rapid and reli-

able analysis is required.

Ultimately, according to the presented study, more

PE categories could be used with the aim to classiﬁ-

cation of a wider range of plastic materials.

ACKNOWLEDGEMENTS

This research was ﬁnancially supported by the Euro-

pean Union’s Horizon Europe research and innova-

tion program under grant agreement No 101058540

(project PLASTICE).

REFERENCES

Baxter, L., Herrman, K., Panthi, R., Mishra, K., Singh,

R., Thibeault, S., Benton, E., and Vaidyanathan,

Machine Learning-Driven Classiﬁcation of Polyethylene (HDPE, LDPE) via Raman Spectroscopy

333

R. (2020). Chapter 3 - thermoplastic micro- and

nanocomposites for neutron shielding.

Bruns, B. (2022). 1. atr-ftir spectroscopy combined

with chemometric methods for the classiﬁcation of

polyethylene residues containing different contami-

nants. Journal of Polymers and The Environment.

da Silva, D. J. and Wiebeck, H. (2017). Using pls, ipls and

sipls linear regressions to determine the composition

of ldpe/hdpe blends: a comparison between confocal

raman and atr-ftir spectroscopies. Vibrational Spec-

troscopy, 92:259–266.

Gallagher, N. B. (2020). Savitzky-golay smoothing and dif-

ferentiation ﬁlter. Eigenvector Research Incorporated.

Jin, Y., Kotula, A. P., Snyder, C. R., Hight Walker, A. R.,

Migler, K. B., and Lee, Y. J. (2017). Raman iden-

tiﬁcation of multiple melting peaks of polyethylene.

Macromolecules, 50(16):6174–6183.

Katharina Eberhardt, Clara Stiebing, C. M. M. S. and Popp,

J. (2015). Advantages and limitations of raman spec-

troscopy for molecular diagnostics: an update. Ex-

pert Review of Molecular Diagnostics, 15(6):773–

787. PMID: 25872466.

Konstantinidis, F. K., Myrillas, N., Tsintotas, K. A.,

Mouroutsos, S. G., and Gasteratos, A. (2023a). A

technology maturity assessment framework for indus-

try 5.0 machine vision systems based on systematic

literature review in automotive manufacturing. Inter-

national Journal of Production Research, pages 1–37.

Konstantinidis, F. K., Sifnaios, S., Arvanitakis, G., Tsimik-

lis, G., Mouroutsos, S. G., Amditis, A., and Gaster-

atos, A. (2023b). Multi-modal sorting in plastic and

wood waste streams. Resources, Conservation and

Recycling, 199:107244.

Konstantinidis, F. K., Sifnaios, S., Tsimiklis, G., Mourout-

sos, S. G., Amditis, A., and Gasteratos, A. (2023c).

Multi-sensor cyber-physical sorting system (cpss)

based on industry 4.0 principles: A multi-functional

approach. Procedia Computer Science, 217:227–237.

Sato, H., Shimoyama, M., Kamiya, T., Amari, T.,

sic, S.,

Ninomiya, T., Siesler, H. W., and Ozaki, Y. (2002).

Raman spectra of high-density, low-density, and lin-

ear low-density polyethylene pellets and prediction of

their physical properties by multivariate data analysis.

Journal of applied polymer science, 86(2):443–448.

Shebani, A., Klash, A., Elhabishi, R. G., Abdsalam, S.,

Elbreki, H., and Elhrari, W. (2018). 1. the inﬂu-

ence of ldpe content on the mechanical properties of

hdpe/ldpe blends.

Sifnaios, S., Arvanitakis, G., Konstantinidis, F. K., Tsimik-

lis, G., Amditis, A., and Frangos, P. (2024). A deep

learning approach for pixel-level material classiﬁca-

tion via hyperspectral imaging.

Silva, D. J. d. and Wiebeck, H. (2019). Predicting ldpe/hdpe

blend composition by cars-pls regression and confocal

raman spectroscopy. Pol

ımeros, 29:e2019010.

Vollmert, B. (2012). Polymer chemistry. Springer Science

& Business Media.

Workman Jr, J. J. (1999). Quantiﬁcation of ldpe [low den-

sity poly (ethylene)], lldpe [linear low density poly

(ethylene)], and hdpe [high density poly (ethylene)] in

polymer ﬁlm mixtures “as received” using multivari-

ate modeling with data augmentation (data fusion) and

infrared, raman, and near-infrared spectroscopy. Spec-

troscopy Letters, 32(6):1057–1071.

Zhang, J. and Yin, K. (2023). Application of gradient boost-

ing model to forecast corporate green innovation per-

formance. Frontiers in Environmental Science.

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

334