Predicting Moisture Content on Wood Using Machine Learning Classification Methods

Vítor Mendes Magalhães, Giancarlo Lucca, Alessandro de Lima Bicho and Eduardo N. Borges

Centro de Ciências Computacionais, Universidade Federal do Rio Grande, FURG, Brazil

Keywords: Moisture Content, Wood, Intelligent Systems, Machine Learning, Artificial Intelligence.

Abstract: The growing demand for wood in several industry segments, together with its economic value, has increased illegal deforestation in several countries. As a direct consequence, climate change across the planet has been aggravated, which further raises the prominence of, and concern about, deforestation. To mitigate these potentially catastrophic effects, wood must be used more efficiently in production processes. A key point is the variation of the moisture content of wood as a function of storage time: as wood logs are stored outdoors, they gradually lose water. Dry wood usually cracks, which, depending on the purpose, makes most of its uses unfeasible and can even lead to the disposal of the log. Considering that there is a direct relationship between moisture content and wood weight, this work develops different possible solutions to this problem using explainable machine learning methods, contributing to more effective control of the variation in moisture content and, consequently, to a better use of wood in the production processes in which it is a raw material.

1 INTRODUCTION

Several countries, Brazil among them, have increasingly established their position as exporters of natural products. In particular, Brazilian trade with the Asian market has increased. With the growing demand from Chinese industry, exports have been increasingly required, and this has had a great impact both on the organization of Brazilian agriculture and on its technological configuration (Vieira et al., 2019).

Brazil has been studying and forecasting an increase in demand for wood since the 1960s, when tax incentives were created for farmers who produce wood by planting fast-growing species, such as Pinus and Eucalyptus (Kengen, 2001).

Although advances have been continuous, Brazil has historically had difficulty promoting the expansion of planted forests while controlling deforestation. For farmers who are not used to planting trees for this purpose, some specific characteristics of this market can greatly reduce its attractiveness. The main cause cited is that the financial return is only realized after many years.

https://orcid.org/0000-0003-3588-9930
https://orcid.org/0000-0002-3776-0260
https://orcid.org/0000-0002-6572-1496
https://orcid.org/0000-0003-1595-7676

Deforestation is a growing concern, especially for developing countries. It has global repercussions, as forest losses can directly imply changes in the water balance, in the carbon cycle and, obviously, in the supply of wood (Allen and Barnes, 1985). The United Nations (UN) even addresses this issue in one of its 17 Sustainable Development Goals.

Regardless of the final destination of the wood, many products that have wood as raw material go through the same stage: the storage of wood logs in piles. As storage time passes, the moisture content of the wood decreases (Rezende et al., 2010). Such changes directly alter all the mechanical properties of wood.

While the wood logs are stored in piles and exposed to weather conditions, their weight decreases due to moisture loss (Tomczak et al., 2018). For this reason, the storage time of the logs in the piles is decisive (Lima et al., 2017; Júnior and Alves, 2019). Moisture content is therefore considered a key factor in the storage of wood and should ideally be kept within the standards (Estuqui Filho, 2006), because prolonged storage can lead to losses in the production process, precisely because of the cracks caused by moisture loss.

For more information on the Sustainable Development Goals, see https://www.un.org/sustainabledevelopment/sustainable-development-goals/

Magalhães, V., Lucca, G., Bicho, A. and Borges, E.
Predicting Moisture Content on Wood Using Machine Learning Classification Methods.
DOI: 10.5220/0011988600003467
In Proceedings of the 25th International Conference on Enterprise Information Systems (ICEIS 2023) - Volume 1, pages 607-614
ISBN: 978-989-758-648-4; ISSN: 2184-4992
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

Companies in the sector use some devices to reduce or even avoid the cracking problem. One of the most widely used is the anti-split, which basically consists of two metal plates installed at the ends of the logs (cross sections).

We were not able to find machine learning-based methods that predict the variation of moisture content in wood logs as a function of the weight of the logs. A review of several other methods published in recent years, which use other variables to address this problem, is presented in (Magalhães et al., 2022).

The aim of this work is therefore to develop an Artificial Intelligence-based model (Russell, 2010), specifically using machine learning classification methods (Tan et al., 2005; Tavana et al., 2022), to predict the moisture content of wood. More precisely, we consider the application of different methods that are able to produce explainable models, which can later be used to better understand the problem.

This paper is organized as follows. First, the theoretical foundation of the present work is presented in Section 2. Then, the methodology, describing the considered dataset and its pre-processing, is shown in Section 3. After that, the results are discussed in Section 4. Finally, the main conclusions are drawn in Section 5.

2 THEORETICAL FOUNDATION

This section provides the concepts related to the paper. It presents the concepts of machine learning and the considered methods.

2.1 Machine Learning

The machine learning process can use different methods to solve problems. It is usually emphasized that no single approach best solves all problems. Therefore, it is important to incorporate specific knowledge of the problem into the behavior of the algorithm (Yazdi et al., 2018; Khorhidpoor et al., 2023), as well as to understand the limitations of the algorithms, preferably using methodologies that allow evaluating the concepts they induce when solving a given problem (Mahesh, 2020).

In supervised learning, when examining the dataset and its relationship with the problem to be solved, the target attribute must be analyzed from two different perspectives: classification and regression (Harrington, 2012). Since classes will be created to represent the weight loss intervals of wood logs as a function of moisture content variation (see subsection 3.1.3), the task, in addition to being a supervised machine learning problem, should be treated as a classification problem.

2.2 Considered Algorithms

There are different approaches to deal with classification problems. Decision Trees – DT (Freund and Mason, 1999), Support Vector Machines – SVM (Steinwart and Christmann, 2008), Fuzzy Rule-Based Classification Systems – FRBCS (Cordón et al., 1999) and Artificial Neural Networks – ANN (Yegnanarayana, 2009) are a few of them. It is necessary to point out that each one has a large set of related algorithms. In what follows, the algorithms considered in this study are introduced.

It is important to observe that each algorithm produces an interpretable model, which can be used to better understand the decisions made when predicting new examples.

FURIA: Fuzzy Unordered Rule Induction Algorithm – FURIA (Hühn and Hüllermeier, 2009) is an algorithm that relies on IREP (Incremental Reduced Error Pruning) to generate the rules and improves performance in comparison to the use of a default rule.

RIPPER: Repeated Incremental Pruning to Produce Error Reduction – RIPPER (Cohen, 1995) is one of the most used algorithms for rule induction. It orders the classes involved in the problem in ascending order of their frequency in the training set, which makes it suitable for developing models that deal with unbalanced datasets.

C4.5: Based on decision trees, C4.5 (Quinlan, 1993) is able to deal with continuous values and unavailable values, to prune the trees and to derive rules from them. It generates a classifier model whose nodes are of two kinds: leaves and decision nodes. Depending on the attribute under analysis, a decision node may result in a branch, or a subtree, for each value found in the dataset.

Random Forest: A classifier formed by a set of classification trees, each constructed from a random sample of the original training set, Random Forest – RF (Breiman, 2001) is an algorithm that obtains the forest through bootstrap aggregating, a method that generates multiple versions of a predictor, each built by re-sampling the original set. The classification of a feature vector is done by voting.
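The two ingredients named in the Random Forest description, bootstrap re-sampling and voting, can be illustrated with a minimal Python sketch. It is not the WEKA implementation used in this study: the function names `bootstrap_sample` and `majority_vote` are hypothetical, the pile identifiers are made up, and the induction of the trees themselves is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(1)  # fixed seed so the replicate can be reproduced
piles = ["pile-1", "pile-2", "pile-3", "pile-4"]  # made-up training instances
replicate = bootstrap_sample(piles, rng)

# Outputs of three hypothetical trees for the same feature vector.
vote = majority_vote(["A", "B", "A"])
```

Each tree of the forest would be trained on its own replicate, and the final class is the one returned by `majority_vote`.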

3 METHODOLOGY

The methodology adopted in this study is described in this section. We start with the issues related to the dataset, such as data integration and exploratory analysis. Later, the statistical tests are discussed and the parameters used by the algorithms are shown.

3.1 Dataset

The data used for the construction of the prediction models have different origins. For this reason, it is important to describe the process of integrating the different databases, which results in the final dataset considered in this research.

The data can be categorized into two distinct groups: operational data and meteorological data.

• The operational data are the specific storage data of the wood logs, such as information about the piles, dates and weights, and the biological-forestry data of the logs, such as the type of wood, species, length, diameter class (as required by the international market) and the presence or absence of bark.

• The meteorological data are the atmospheric or climatic data related to the variation of the moisture content in the wood. Such data are extremely important for the construction of prediction models because the variation in moisture content is known to depend on specific meteorological conditions as well.

These two groups of data were obtained from different sources, which makes this categorization even more important. Next, the ways of obtaining the data are detailed, together with their description.

3.1.1 Obtaining Data

The operational data were provided by a company whose purpose is to buy Brazilian wood and export it to the European and Asian markets, especially to China. The company's log storage yards are located in the city of Rio Grande, state of Rio Grande do Sul, in the south of Brazil.

As the exact location of the storage yards is known, it was possible to obtain all historical meteorological data available for the specific measuring station through the National Institute of Meteorology (INMET), officially linked to the Ministry of Agriculture, Livestock and Supply of Brazil (MAPA). In INMET's own online system, the automatic measurement station closest to the storage yards was searched for. Specifically, the data were obtained from the A802 measuring station, located at coordinates 32°04'43.7" S 52°10'03.8" W.

For each existing pile of logs in the database of the company's storage yards, we obtained the first date on which any input occurred and the last date on which any input or output occurred. With this period in hand, historical meteorological data for the same location were retrieved from the INMET database, which holds meteorological data recorded daily and automatically by the measurement stations.

As the equipment of the automatic meteorological stations can fail, some dates were found without any measurement. For these instances, the missing value was defined as the arithmetic mean of the measurements on the two closest dates, one before and one after, for which a measurement exists.
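The gap-filling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the daily series is a plain list in chronological order, missing days are represented by `None`, and gaps at the very start or end of the series (where one neighbor does not exist) are left untouched, since the text does not say how such cases were handled. The function name `fill_missing` is hypothetical.

```python
def fill_missing(values):
    """Replace each None by the mean of the closest measured values before and after it."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        # Closest earlier and later days that do have a measurement.
        before = next((values[j] for j in range(i - 1, -1, -1)
                       if values[j] is not None), None)
        after = next((values[j] for j in range(i + 1, len(values))
                      if values[j] is not None), None)
        if before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled


# Made-up daily maximum temperatures with a two-day gap.
temperatures = [21.0, None, None, 25.0, 24.0]
completed = fill_missing(temperatures)
```

Both missing days receive the same value here, because their closest measured neighbors are the same two dates.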

3.1.2 Description of the Variables

The resulting dataset has a total of 759 instances and 23 different attributes, divided into nominal, ordinal and categorical types. It contains all the data that could be extracted from the different sources, categorized into operational data and meteorological data, as explained above.

The list of all the attributes of the dataset is shown in Table 1. For each of them, the table presents the data type, the minimum, maximum and mean values (mode for categorical and median for ordinal attributes), and a description for a better understanding.

3.1.3 Pre-Processing Data

At the start of the data pre-processing step, a new attribute named PERCENTAGE was created in the dataset. It represents the percentage of wood weight loss (reversals) in relation to the sum of wood inputs in each pile, and can be expressed as $P = (E \times 100) \times T^{-1}$, where $P$ is the percentage of weight loss, $E$ is the reversed weight of each pile, and $T$ is the sum of the input weights of wood in each pile.

After creating the attribute, instances representing piles for which there was no record of output were removed. In theory, such piles either continued to receive loads of wood or were not yet finalized at the time the operational data were obtained. They may also represent data entry errors by the operational sector of the company that provided the data.

INMET: https://portal.inmet.gov.br/
MAPA: https://www.gov.br/agricultura/pt-br
INMET online system: https://bdmep.inmet.gov.br/
A802 station: https://tempo.inmet.gov.br/TabelaEstacoes/A802

Table 1: Details of the attributes present in the final dataset.

| Attribute | Type | Min | Max | Average/Mode | Description |
|---|---|---|---|---|---|
| pile | categorical | na | na | na | id of the wood log storage stack |
| initial date | date | na | na | 29/03/2022 | start date of depositing logs in the pile |
| final date | date | na | na | 24/05/2021 | end date of deposits and/or withdrawals of logs from the pile |
| product | categorical | na | na | eucalyptus | type of wood stored in the pile (pinus or eucalyptus) |
| species | categorical | na | na | grandis | species of wood stored in the pile |
| diameter | ordinal | 8 to 12 | 40 to 60 | 20 to 30 | cross-section diameter class, measured at the thinnest end of the wood logs in the pile |
| length | ordinal | 3.1 | 6 | 5.2 | length class (size, in meters) of wood logs |
| bark | categorical | na | na | yes | whether the wood logs in the pile have bark or not |
| temp min | continuous | 1.9 | 24 | 5.4 | minimum temperature (in °C) of the period |
| temp max | continuous | 12.7 | 39.4 | 28.5 | maximum temperature (in °C) of the period |
| temp mean | continuous | 7.73 | 37.7 | 17.2 | average temperature (in °C) of the period |
| temp po | continuous | 4.2 | 29.4 | 12.6 | average temperature (in °C) of the period at which the water present in the ambient air changes to a liquid state (dew point) |
| precipitation | continuous | 0 | 1,156 | 95.4 | total precipitation (in millimeters) for the period |
| atmp mean | continuous | 1,005 | 1,535.4 | 1,016.4 | average atmospheric pressure (in hPa) of the period |
| ru min | continuous | 21 | 91 | 32 | minimum relative humidity of the air (in %) of the period |
| ru med | continuous | 59.5 | 117.5 | 76.4 | average relative humidity of the air (in %) of the period |
| wind mean | continuous | 1.1 | 5.4 | 3 | average wind speed (in m/s) over the period |
| raj max mean | continuous | 3.7 | 19.3 | 10.3 | average maximum wind gust speed (in m/s) for the period |
| qtd days | discrete | 0 | 425 | 47 | number of days in the period |
| tot in | continuous | 940 | 8,128,820 | 216,740 | total entries (in kg) of logs in the pile |
| tot out | continuous | 0 | 6,628,260 | 168,480 | total outputs (in kg) of logs from the pile |
| difference | continuous | -169,280 | 1,500,560 | 18,990 | difference (in kg) between total inputs and total outputs |
| reversals | continuous | 0 | 1,500,560 | 9,220 | amount reversed (in kg) of the pile; in theory, this attribute should be directly related to water loss |

Instances that had no reversed amount were also removed, that is, instances representing piles that, in theory, had no weight loss. Their removal is based on the specialist's knowledge: in natural drying, under the conditions of the captured data, there will always be a loss of weight.

After removing these instances, the feature selection process was started. In real-world situations, in which data are not available in the ideal format to start the knowledge discovery process and are often obtained from different sources, it is necessary to use tools that make data mining more effective.

For the creation of classification models, it is necessary to create a new attribute, called LOSS INTERVAL. This attribute represents the weight loss percentage interval of each pile in relation to its total inputs. The intervals are presented in Table 2, together with the total number of instances falling in each of them.

Table 2: Breakdown of weight loss intervals.

| Name of the interval | Interval | Samples |
|---|---|---|
| A | ≥ 0% and ≤ 10% | 478 |
| B | ≥ 10% and ≤ 20% | 173 |
| C | ≥ 20% and ≤ 30% | 58 |
| D | ≥ 30% | 17 |
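The derivation of the two created attributes, PERCENTAGE and LOSS INTERVAL, can be illustrated with a short Python sketch. It is a sketch under assumptions, not the procedure actually used: the function names are hypothetical, and because the intervals of Table 2 share their end points, the code assumes half-open intervals, so that a loss of exactly 10% is assigned to class B.

```python
def weight_loss_percentage(reversed_kg, total_input_kg):
    """P = (E * 100) / T, with E the reversed weight and T the total input weight."""
    if total_input_kg <= 0:
        raise ValueError("total input weight must be positive")
    return reversed_kg * 100 / total_input_kg


def loss_interval(percentage):
    """Map a weight loss percentage to the classes A-D of Table 2.

    Assumption: intervals are half-open, i.e. [0, 10), [10, 20), [20, 30), [30, inf).
    """
    if percentage < 0:
        raise ValueError("weight loss percentage cannot be negative")
    if percentage < 10:
        return "A"
    if percentage < 20:
        return "B"
    if percentage < 30:
        return "C"
    return "D"


# Illustrative inputs only: the averages of 'reversals' and 'tot in' from Table 1.
p = weight_loss_percentage(9_220, 216_740)
label = loss_interval(p)
```

With these illustrative inputs the loss is a little above 4%, which falls in class A, the most frequent class in Table 2.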

Analyzing the samples column of Table 2, it can be observed that most instances of the dataset belong to classes A and B, that is, the vast majority of the piles lost less than 20% of their weight. It is necessary to verify the integrity of this information.

One possible way is to compare the weight loss with the number of days of storage in the piles. After all, the longer the storage time, the greater the weight loss, until the variation approaches zero.

For the loss of moisture content in wood stored outdoors, 20% is a small percentage. Given the average number of days of storage (almost 65 days), it can be inferred that the logs in the observed piles had already been cut for a longer time. This information, discovered through descriptive data analysis, is important for the subsequent analysis of the predictive models.

With this, the final dataset intended for the development of the classification models, composed of 726 instances, is ready for the next step of this research, the data mining.

3.2 Parameter Set-up

In this paper, 5 different algorithms are considered to tackle the proposed classification problem. These algorithms were applied through the WEKA software. Table 3 shows the parameters used for each algorithm. We highlight that these are the default values of the tool.

Table 3: Set-up of the hyperparameters used by the machine learning algorithms.

| Algorithm | Hyperparameters |
|---|---|
| FURIA | T-norm: Product; batchSize: 100; checkErrorRate: True; debug: False; doNotCheckCapabilities: False; folds: 3; minNo: 2.0; numDecimalPlaces: 2; optimizations: 2; seed: 1; uncovAction: Rule stretching |
| Ripper | batchSize: 100; checkErrorRate: True; debug: False; doNotCheckCapabilities: False; folds: 3; minNo: 2.0; numDecimalPlaces: 2; optimizations: 2; seed: 1; usePruning: True |
| C4.5 | batchSize: 100; binarySplits: False; collapseTree: True; confidenceFactor: 0.25; debug: False; doNotCheckCapabilities: False; doNotMakeSplit: False; minNumObj: 2; numDecimalPlaces: 2; numFolds: 3; reducedErrorPruning: False; saveInstanceData: False; seed: 1; subtreeRaising: True; unpruned: False; useLaplace: False; useMDLcorrection: True |
| Random Forest | bagSizePercent: 100; batchSize: 100; BreakTiesRandomly: False; calcOutOfBag: False; ComputeAttributeImportance: False; debug: False; doNotCheckCapabilities: False; maxDepth: 0; NumDecimalPlaces: 2; numExecutionSlots: 1; numFeatures: 0; numIterations: 100; OutputOutOfBagComplexity: False; PrintClassifiers: False; Seed: 1; storeOutOfBagPredictions: False |
| Baseline | BatchSize: 100; debug: False; doNotCheckCapabilities: False; numDecimalPlaces: 2 |

In this study, we validate the models with a hold-out approach. That is, the original dataset is split into training and test partitions. To avoid a split that favors the model training, we considered 5 different runs, each with different values (and amount of data), by setting a seed. The seeds of the considered runs are: 1 (Run 1), 1234 (Run 2), 500 (Run 3), 98765 (Run 4) and 999999 (Run 5).
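A seeded hold-out split of this kind can be sketched in plain Python as follows. The sketch does not reproduce how WEKA partitions the data; the helper name `hold_out_split`, the use of integer indices as stand-ins for the instances and the rounding of the cut point are all assumptions made for illustration. The seeds are the ones listed above.

```python
import random

# Run number -> seed, as listed in the text.
SEEDS = {1: 1, 2: 1234, 3: 500, 4: 98765, 5: 999999}


def hold_out_split(instances, train_fraction, seed):
    """Shuffle a copy of the instances with the given seed and cut it in two parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(726))  # stand-in indices for the instances of the final dataset
train, test = hold_out_split(data, 0.75, SEEDS[1])

# The same seed always yields the same partition, which is what makes a run repeatable.
train_again, test_again = hold_out_split(data, 0.75, SEEDS[1])
```

Calling the helper with the other seeds and with training fractions of 0.80, 0.85 and 0.90 would give the remaining run and hold-out combinations.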

4 EXPERIMENTAL RESULTS

In this section, the obtained results are shown. Specifically, Table 4 provides, for each hold-out configuration, the accuracy obtained in the different runs and methods. We point out that the baseline corresponds to predicting the majority class.

For more information about the WEKA software, visit https://www.cs.waikato.ac.nz/ml/weka/.
A seed is a number used by the pseudo-random value generator. Setting this parameter to a fixed value guarantees the reproducibility of the experiment.
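The majority-class baseline mentioned in this section can be sketched as follows. This is an illustrative stand-in, not the WEKA classifier used in the experiments; the function name is hypothetical, and the label list is rebuilt from the class counts of Table 2 only to have something to run on.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Predict the most frequent training class for every test instance.

    Returns the predicted class and the resulting accuracy on the test labels.
    """
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for label in test_labels if label == majority)
    return majority, hits / len(test_labels)


# Class counts taken from Table 2; evaluated on the same labels for illustration only.
labels = ["A"] * 478 + ["B"] * 173 + ["C"] * 58 + ["D"] * 17
predicted_class, accuracy = majority_class_baseline(labels, labels)
```

Any learned model is expected to beat this value on the same test partition; otherwise it has learned little beyond the class distribution.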

To ease the comprehension of the results, for each execution we highlight the highest accuracy in boldface and underline the lowest. Similarly, to provide a general analysis across the whole experiment, for each run we mark with ↑ the largest accuracy over all hold-out configurations and with ↓ the lowest.

Starting with a general analysis of the results, it is noticeable that the FURIA method achieved the largest overall accuracy in 3 different runs. The RF method achieved the largest overall accuracy once, in the last run of the hold-out 75-25. It is also interesting to notice that no method outperformed the Baseline in run 4 of the hold-out 90-10. This behavior is probably due to the generalization caused by the division of the data.

Regarding the cases with the lowest overall accuracy, the Baseline has the lowest value in 4 out of the 5 runs. This is expected behavior, since it is a simple approach that indicates whether the learned models can be considered satisfactory. However, it is necessary to observe that, in one specific situation, the lowest accuracy was obtained by the RF method.

Considering the achieved means, the largest one is obtained by FURIA, which also presented the largest single result among all methods. The reverse occurs with the Baseline approach, which achieved accuracy means around 60%. A satisfactory performance was provided by the RIPPER method, with around 70% accuracy in general. The similarity between the C4.5 and RF results is also noticeable.

Looking more closely at the results of each hold-out, in the first analysis (a) the dominance of FURIA is observable: in 4 of the 5 runs this approach achieved the largest accuracy. A similar situation is noticeable in the other cases (b, c and d), since FURIA achieved the largest accuracy in 3 out of 5 runs. The Baseline is completely outperformed in two analyses, a and b, and in 4/5 and 3/5 of the runs in analyses c and d, respectively.

While RF achieves at least one satisfactory result in every analysis, C4.5 only reaches the largest accuracy in one of them (d). It is interesting to mention that, in the last run of this last hold-out, all four learned models performed equally, with an accuracy of 0.735.

4.1 Statistical Analysis

Analyzing the methods by their accuracy is an interesting approach that indicated the superiority of FURIA. However, the analysis of each hold-out, and of the results in general, may not be enough to support any conclusion. To provide a complete study, a statistical group comparison is performed.

Specifically, the aligned Friedman rank test (Hodges and Lehmann, 1962) is used to compare the group of 5 different approaches; the rankings achieved are shown per column in Table 5. The methods are sorted from the lowest to the highest obtained ranking, and the one with the lowest ranking is considered the control. We also compute Holm's post-hoc test to check whether the control approach is statistically better, showing the obtained adjusted p-value (APV) next to the rank of each method. If there are statistical differences, considering a significance level of 10%, we underline the value.
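The ranking step of the aligned Friedman test can be illustrated with a small Python sketch. It assumes the usual construction: each observation is aligned by subtracting the mean of its run, all aligned observations are ranked together with rank 1 for the best one and average ranks for ties, and the ranks are then averaged per method. The function name is hypothetical and the p-value computation and Holm correction are omitted. The input is the hold-out 75-25 block of Table 4, written in thousandths so that the arithmetic stays exact.

```python
def aligned_friedman_ranks(scores):
    """Average aligned rank per method; scores[i][j] is method j in run i (higher is better)."""
    n_methods = len(scores[0])
    aligned = []
    for row in scores:
        total = sum(row)
        for method, value in enumerate(row):
            # n_methods * (value - row mean), kept in integers to avoid rounding ties apart
            aligned.append((n_methods * value - total, method))
    aligned.sort(key=lambda item: -item[0])  # best aligned observation first

    rank_sums = [0.0] * n_methods
    i = 0
    while i < len(aligned):
        j = i
        while j < len(aligned) and aligned[j][0] == aligned[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j shared by the tie
        for _, method in aligned[i:j]:
            rank_sums[method] += average_rank
        i = j
    return [total / len(scores) for total in rank_sums]


# Columns: FURIA, RIPPER, C4.5, RF, Baseline (hold-out 75-25, accuracy x 1000).
hold_out_75_25 = [
    [746, 730, 705, 648, 623],
    [754, 697, 730, 689, 598],
    [770, 738, 713, 697, 639],
    [697, 689, 664, 631, 582],
    [746, 721, 631, 762, 623],
]
rankings = aligned_friedman_ranks(hold_out_75_25)
```

Under these assumptions, the values in `rankings` can be compared directly with the Ranking column of Table 5 (a).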

From the results of the statistical analysis, it can be concluded that FURIA is the best option to tackle the presented problem. In fact, this method is the control in all situations.

Considering the obtained differences, we can observe that FURIA is statistically superior to the Baseline in 80% of the study. Moreover, its performance is superior to that of C4.5 and RF in some situations, as in a and b. With respect to RIPPER, no differences were found in any case.

4.2 Analyzing the Generated Models

As observed in the previous analysis, FURIA was the method that achieved the best performance in the study. Thus, this subsection analyzes the rules generated by this model.

To understand the rules, it is necessary to state that FURIA uses trapezoidal membership functions with a concept of soft boundaries. For example, as stated by its authors, a generic fuzzy rule $R$ of the form $(A \in (-\infty, -\infty, 6, 9) \mid \text{class})$ indicates that the rule is completely valid for $A \le 6$, invalid for $A > 9$ and partially valid in between.
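A trapezoidal membership function of this kind can be sketched in Python as follows. The function name `trapezoid` and the parameter names `a`, `b`, `c`, `d` are chosen here for illustration; the sketch assumes the usual reading of the four values, namely full membership between the two inner points and a linear transition towards the outer points.

```python
import math


def trapezoid(x, a, b, c, d):
    """Membership of x in the trapezoidal fuzzy set (a, b, c, d).

    1 on [b, c], 0 outside (a, d), linear on the two slopes. Infinite a/b or c/d
    give a set that is open on that side.
    """
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# The generic antecedent from the text: A in (-inf, -inf, 6, 9).
params = (-math.inf, -math.inf, 6, 9)
fully_valid = trapezoid(5.0, *params)   # A <= 6
partially = trapezoid(7.5, *params)     # between 6 and 9
invalid = trapezoid(10.0, *params)      # A > 9
```

The value halfway between 6 and 9 receives a membership of one half, which is the "partially valid" region of the rule.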

In what follows, we provide the rules generated by the model in the cases in which it achieved the largest overall accuracy, that is, those highlighted with ↑ in Table 4. It is important to mention that the generated rules are the same for all these cases. CF denotes the confidence of the rule (Hühn and Hüllermeier, 2009).

• [Rule 1] - (TEMP MAX in [32.5, 33.2, inf, inf]) and (ATMP MEAN in [1014.46, 1014.5, inf, inf]) → LOSS INTERVAL = B (CF = 0.63)

• [Rule 2] - (TEMP MAX in [-inf, -inf, 32.5, 33.2]) → LOSS INTERVAL = A (CF = 0.84)

• [Rule 3] - (PRECIPITATION in [-inf, -inf, 104.4, 105.6]) and (TOT IN in [203680, 246120, inf, inf]) → LOSS INTERVAL = A (CF = 0.9)


Table 4: Results achieved in test by the different approaches and validations.

| Hold-out | Run | FURIA | RIPPER | C4.5 | RF | Baseline |
|---|---|---|---|---|---|---|
| (a) 75-25 | Run 1 | 0.746 | 0.730 | 0.705 | 0.648 | 0.623 |
| (a) 75-25 | Run 2 | 0.754 | 0.697 | 0.730 | 0.689 | 0.598 |
| (a) 75-25 | Run 3 | 0.770 | 0.738 | 0.713 | 0.697 | 0.639 |
| (a) 75-25 | Run 4 | 0.697 | 0.689 | 0.664 | 0.631 | 0.582 |
| (a) 75-25 | Run 5 | 0.746 | 0.721 | 0.631 | 0.762 ↑ | 0.623 |
| (a) 75-25 | #Mean | 0.743 ↑ | 0.715 | 0.689 | 0.685 | 0.613 |
| (b) 80-20 | Run 1 | 0.732 | 0.742 | 0.701 | 0.670 | 0.608 |
| (b) 80-20 | Run 2 | 0.763 ↑ | 0.753 | 0.742 | 0.711 | 0.577 |
| (b) 80-20 | Run 3 | 0.763 | 0.722 | 0.711 | 0.711 | 0.619 ↓ |
| (b) 80-20 | Run 4 | 0.660 | 0.701 | 0.639 | 0.639 | 0.577 |
| (b) 80-20 | Run 5 | 0.742 | 0.701 | 0.711 | 0.742 | 0.619 |
| (b) 80-20 | #Mean | 0.732 | 0.724 | 0.701 | 0.695 | 0.600 ↓ |
| (c) 85-15 | Run 1 | 0.767 ↑ | 0.740 | 0.712 | 0.726 | 0.616 |
| (c) 85-15 | Run 2 | 0.753 | 0.671 | 0.726 | 0.671 | 0.575 |
| (c) 85-15 | Run 3 | 0.795 ↑ | 0.767 | 0.726 | 0.671 | 0.630 |
| (c) 85-15 | Run 4 | 0.644 | 0.658 | 0.616 | 0.548 ↓ | 0.616 |
| (c) 85-15 | Run 5 | 0.712 | 0.712 | 0.712 | 0.740 | 0.589 ↓ |
| (c) 85-15 | #Mean | 0.734 | 0.710 | 0.699 | 0.671 | 0.605 |
| (d) 90-10 | Run 1 | 0.714 | 0.735 | 0.653 | 0.714 | 0.551 ↓ |
| (d) 90-10 | Run 2 | 0.714 | 0.673 | 0.694 | 0.673 | 0.531 ↓ |
| (d) 90-10 | Run 3 | 0.735 | 0.673 | 0.735 | 0.673 | 0.694 |
| (d) 90-10 | Run 4 | 0.633 | 0.673 | 0.612 | 0.612 | 0.714 ↑ |
| (d) 90-10 | Run 5 | 0.735 | 0.735 | 0.735 | 0.735 | 0.653 |
| (d) 90-10 | #Mean | 0.706 | 0.698 | 0.686 | 0.682 | 0.629 |

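Table 5: Statistical results with the aligned Friedman rank test and Holm post-hoc test.

| Hold-out | Method | Ranking | APV |
|---|---|---|---|
| (a) 75-25 | FURIA | 4 | – |
| (a) 75-25 | RIPPER | 10.1 | 0.19 |
| (a) 75-25 | C4.5 | 13.7 | 0.08 |
| (a) 75-25 | RF | 14.2 | 0.08 |
| (a) 75-25 | Baseline | 23 | 0.00 |
| (b) 80-20 | FURIA | 5.6 | – |
| (b) 80-20 | RIPPER | 7.7 | 0.65 |
| (b) 80-20 | C4.5 | 13.4 | 0.18 |
| (b) 80-20 | RF | 15.3 | 0.11 |
| (b) 80-20 | Baseline | 23 | 0.00 |
| (c) 85-15 | FURIA | 1.4 | – |
| (c) 85-15 | RIPPER | 2.6 | 0.33 |
| (c) 85-15 | C4.5 | 2.866667 | 0.23 |
| (c) 85-15 | RF | 3.233333 | 0.08 |
| (c) 85-15 | Baseline | 4.9 | 0.00 |
| (d) 90-10 | FURIA | 8.7 | – |
| (d) 90-10 | RIPPER | 11.3 | 0.73 |
| (d) 90-10 | C4.5 | 12.9 | 0.73 |
| (d) 90-10 | RF | 14.3 | 0.68 |
| (d) 90-10 | Baseline | 17.8 | 0.2 |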

• [Rule 4] - (RU MIN in [-inf, -inf, 21, 24]) and (QTD DAYS in [76, 78, inf, inf]) and (RU MED in [-inf, -inf, 74.64, 74.66]) and (TEMP MEAN in [-inf, -inf, 20.24, 20.25]) → LOSS INTERVAL = C (CF = 0.62)
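To make the reading of such rules concrete, the following Python sketch computes how strongly the rule on PRECIPITATION and TOT IN (class A, CF = 0.9) fires for a given pile. It is a simplified illustration, not FURIA's full inference procedure: it assumes that the antecedent memberships are combined with the product T-norm (the value listed for FURIA in Table 3), it ignores how rule confidences and competing rules are aggregated into a final class, and the pile values, the helper names and the dictionary layout are made up.

```python
import math

INF = math.inf


def trapezoid(x, a, b, c, d):
    """Membership of x in the trapezoidal fuzzy set (a, b, c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Antecedents copied from the rule list: attribute -> (a, b, c, d).
RULE = {
    "PRECIPITATION": (-INF, -INF, 104.4, 105.6),
    "TOT IN": (203680, 246120, INF, INF),
}


def firing_degree(rule, instance):
    """Product of the antecedent memberships (product T-norm, an assumption here)."""
    degree = 1.0
    for attribute, fuzzy_set in rule.items():
        degree *= trapezoid(instance[attribute], *fuzzy_set)
    return degree


# A made-up pile that sits halfway along both soft boundaries.
pile = {"PRECIPITATION": 105.0, "TOT IN": 224900}
degree = firing_degree(RULE, pile)

# A pile well inside both antecedents fires the rule completely.
clear_case = firing_degree(RULE, {"PRECIPITATION": 50.0, "TOT IN": 300000})
```

For the made-up pile each antecedent holds to a degree of about one half, so the rule fires with a degree of about one quarter.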

Rule 1 states that when the maximum temperature is greater than 32.5°C (high temperatures) and the average atmospheric pressure is medium to high, the weight loss of the wood becomes considerable. Typically, maximum temperatures above 32.5°C occur in summer.

Rule 2 states that when the maximum temperature of the period is up to 33.2°C (high temperatures), the weight loss due to moisture content will be small, with a confidence of 84%.

Rule 3 states that when precipitation is low during the storage period and the total input of wood logs in the pile is high, the weight loss due to moisture will be small. This rule is important, as it contradicts the relevant scientific literature, which establishes that rainfall (precipitation) is not a variable to be considered in the variation of moisture content. Based on the data considered in this research, the rainfall in the period is an important variable, with a confidence measure of 90%.

Finally, Rule 4 establishes, with a confidence of 62%, that when the minimum relative humidity of the period is up to 24%, the number of days that the wood logs remain stored in the piles is high (from 76 days), and the average temperature of the period is up to 20.25°C, the loss of moisture in the logs will be considerably high, when a lower weight loss would be expected. The generation of this rule may be due to a sampling problem, as the result differs from the expected one.

5 CONCLUSION

Wood is a scarce resource. It serves as raw material for the industrialization of countless finished products, and even as fuel for factories, and its use has been growing around the world. However, after the tree is felled, the wood gradually begins to lose its moisture content, causing cracks in the logs. Such cracks cause irreparable losses in production processes, as much wood that could be used in industry ends up being used very little, or even being discarded.

The objective of this work was therefore to develop a method for predicting the moisture loss in wood while the logs are stored in piles, a step prior to industrialization, by applying machine learning classification methods.

Furthermore, this work compares and analyzes the results of the different applied algorithms: FURIA, RIPPER, C4.5 and Random Forest. Of these, the classification method using the FURIA algorithm was superior to the others, and statistically superior to the baseline.

Different future works can be considered from this paper, such as tuning the algorithms' hyperparameters and investigating a regression approach.

ACKNOWLEDGEMENTS

This work was supported by the Brazilian research funding agencies CNPq (305805/2021-5) and FAPERGS (Programa de Apoio à Fixação de Jovens Doutores no Brasil - 23/2551-0000126-8).


ICEIS 2023 - 25th International Conference on Enterprise Information Systems
