Automated Classiﬁcation of Building Objects Using Machine Learning

Nadeem Iftikhar (https://orcid.org/0000-0003-4872-8546), Peter Nørkjær Gade, Kasper Møller Nielsen and Jesper Mellergaard

University College of Northern Denmark, Soﬁendalsvej 60, Aalborg, Denmark

Keywords:

Building Information Modeling, Machine Learning, Building Object Classiﬁcation, Digital Tools.

Abstract:

In the construction sector, digital technologies are being employed to support architects, engineers and builders in the creation of digital building models. Although these technologies come equipped with inherent classification systems, they also bring forth certain obstacles. Frequently, these systems categorize building elements at levels coarser than the specificity that is actually needed. To illustrate, these classification systems might allocate values at a broader granularity, such as “exterior wall”, rather than at a more precise level, like “exterior glass wall with no columns”. As a result, the manual classification of building elements at a granular level becomes

essential. Nonetheless, manual classiﬁcation frequently results in inaccuracies and erroneous semantic details,

while also consuming a signiﬁcant amount of time. Precise and prompt classiﬁcation of building objects holds

signiﬁcant importance for activities like cost planning, construction cost management and overall procurement

processes. To address this, the current paper suggests an automated classiﬁcation approach for building ob-

jects, focusing on speciﬁc types, through the utilization of machine learning. The effectiveness of the proposed

system is showcased using real-world data from a prominent architectural ﬁrm based in Scandinavia.

1 INTRODUCTION

The construction sector is experiencing a digital revo-

lution in response to heightened demands for sustain-

ability, safety and user speciﬁcations. This shift to-

wards digital transformation necessitates the adoption

of novel procedures to oversee and harmonize digi-

tal workﬂows. In this context, Building Information

Modeling (BIM, https://www.autodesk.com/industry/aec/bim) stands out as a renowned instrument

employed by architects, engineers and builders to

generate, oversee and distribute digital models. These

models encompass a substantial volume of building

data, comprising both geometric and informational el-

ements. Hence, it is critical to systematically classify

this building information in accordance with industry

standards, ensuring a streamlined and proficient process. A classification system operates much like a universal language, facilitating a clear and unambiguous

exchange of digital data across diverse BIM software

platforms and among various BIM users. The act

of classifying building information serves a dual pur-

pose. On one hand, it enhances communication across

all stakeholders engaged in construction projects. Si-

multaneously, it empowers project collaborators (ar-

chitects, engineers and builders) to effectively align


projects with requirements, schedules and ﬁnancial

allocations. While numerous building design tools

possess the capability to automatically recognize and

classify building objects, either within general cate-

gories like “door” or within speciﬁc families like “sin-

gle ﬂush door,” it remains essential to classify these

objects at the precise type level conforming to na-

tional or international standards before their appli-

cation in a construction project. Nevertheless, the

majority of building design tools lack the inherent

capacity to automatically classify building objects

at the type level or autonomously assign assembly

codes to them. These assembly codes comply with established national or international classification standards and remain easily modifiable over the course

of the project’s life cycle. Additionally, the assem-

bly codes play a pivotal role in dictating the organi-

zational structure of construction information, align-

ing it with the core building objects. Consequently,

the assignment of classiﬁcations at the assembly code

level is often carried out through manual intervention

by architects and engineers. For instance, using the

Cuneco Classification System (CCS, https://ccs.molio.dk/News/About CCS), widely used in Denmark, a product like a “window” can be categorized as “[L]%QQA90102.01”, with the accompanying description indicating its nature as an “exterior multidisciplinary window - type 01”. Despite the availability of software applications designed to facilitate automated management of CCS codes, the correct codes still necessitate manual input in the majority of classification systems.

Iftikhar, N., Gade, P., Nielsen, K. and Mellergaard, J.
Automated Classification of Building Objects Using Machine Learning.
DOI: 10.5220/0012197500003598
In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2023) - Volume 1: KDIR, pages 331-338
ISBN: 978-989-758-671-2; ISSN: 2184-3228

The manual categorization of numerous objects at

a speciﬁc level within a building model introduces

complexities, consumes considerable time and intro-

duces potential ambiguities in the classiﬁcation pro-

cess. Additionally, a previous study (Flager and Hay-

maker, 2007) highlighted that over 50% of the time

and resources invested by architects and engineers are

allocated to the management of design information,

which includes object classiﬁcation. Moreover, errors

during the classiﬁcation process or inaccuracies in se-

mantic details can result in ﬂawed construction cost

estimates, incorrect construction practices, erroneous

material choices and related issues. BIM object classification faces an array of challenges, including the search for relevant international or national BIM standards, establishment and storage of tailored classification tables, collaborative distribution of classification tables among project teams, navigation through multiple classification systems, time-intensive processes and absence of automation. Given these challenges, the acceptable automation of BIM object classification at the precise type level remains an open research concern within the construction sector. This paper attempts to tackle this issue by introducing an approach

that employs supervised machine learning to auto-

mate the classiﬁcation of objects at the type or assem-

bly code level.

In summary, this paper’s key contributions can be

outlined as follows:

− Offering comprehensive insights through ex-

ploratory data analysis of building information;

− Introducing a comprehensive solution for classi-

fying building elements through the utilization of

diverse supervised machine learning approaches;

− Demonstrating practical implementation using ac-

tual building data extracted from a prominent con-

struction project executed by a leading Scandina-

vian architectural ﬁrm.

The paper’s organization is as follows: In Section 2,

a survey of related work is provided. The motivation

driving this work is elaborated in Section 3. The ex-

ploratory data analysis is detailed in Section 4. The

machine learning approach for automated building

object classiﬁcation is presented in Section 5. Sec-

tion 6 outlines the experimental ﬁndings. The paper

concludes in Section 7, also highlighting potential di-

rections for future research.

2 RELATED WORK

The focal point of this section centers on prior re-

search attempts related to automated building object

classiﬁcation through the application of semantic en-

richment. A state-of-the-art review by (Zabin et al.,

2022) indicated that there is a need to integrate ma-

chine learning into BIM processes. Similarly a study

by (Amor and Dimyadi, 2021) examined evolving

approaches for automated compliance checking and

pointed out that research in semantic enrichment of

BIM models is necessary. In addition, (Zhang and

El-Gohary, 2012) suggested a natural language pro-

cessing approach for automated information extrac-

tion from construction regulatory documents. Sim-

ilarly, an automated compliance checking approach

is presented by (Salama and El-Gohary, 2011). A

prototype software based on an inference rule engine

to semantically enrich the building models is intro-

duced by (Belsky et al., 2016) and its extension is

suggested by (Sacks et al., 2021). Further, (Bloch and

Sacks, 2019) applied both supervised machine learning and rule-inferencing to correctly classify room types in residential apartments. The machine learning approach provided better accuracy than the rule-based approach. A data-driven iterative method for automated

classiﬁcation of objects in BIM is presented by (Wu

and Zhang, 2019). Moreover, a deep learning frame-

work to utilize both geometric and relational informa-

tion of BIM objects for classiﬁcation is proposed by

(Luo et al., 2022). Likewise, a deep learning frame-

work for classifying objects based on synthetic data

sets created from BIM objects is presented by (Frías et al., 2022). In addition, (Kim et al., 2019) proposed

an approach for automating the classiﬁcation of build-

ing element instances within BIM. The approach uti-

lized deep learning based classiﬁcation technique that

uses images of objects as inputs. A machine learning

based solution to automatically recognise elements in building information models is proposed by (Bassier

et al., 2017). The solution can efﬁciently classify the

basic components such as ﬂoors, ceilings, roofs, walls

and so on. Additionally, (Emunds et al., 2022) proposed a neural network model based on sparse convolutions for the classification of IFC-based geometry and semantic enrichment of BIM models. Finally, the study conducted by (Koo et al., 2022)

examined the viability of deep and machine learn-

ing models in the context of automatically classifying

subtypes of door and wall elements.

These previous studies emphasize various conceptual aspects and recent advancements in semantic enrichment and automated classification in the construction industry. It can be concluded from these previous

KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval


works that the absence of automation in the classification of building objects at a specific type level remains an open research issue. To the best of our knowledge,

this paper stands among a limited number that con-

centrate on automated building object classiﬁcation

through machine learning. Additionally, this paper

also considers the practical aspects of automated clas-

siﬁcation to enhance operational effectiveness and ef-

ﬁciency within the construction industry.

3 MOTIVATION

The construction industry employs BIM to create digital representations of buildings. Hundreds,

or even thousands, of objects are incorporated into

a model before its completion. In the majority of

instances, these objects are required to be manually

classiﬁed. As previously noted, manual classiﬁcation

within the domain of BIM poses a signiﬁcant chal-

lenge.

Figure 1: BIM complete 3D model.

In this paper, the focus is on automatic classiﬁ-

cation of building objects based on their assembly

codes. The assembly codes used in this paper are

found in the BIM7AA standard (http://bim7aa.dk/index UK.html), where BIM7AA is a

simple encoding method for BIM objects based on

Danish building standards. For instance, a typical

“exterior precast concrete wall” has assembly code

211, and “interior frame structured wall” has as-

sembly code 224, and so forth. While building a

BIM model, an architect/engineer inserts a standard

“wall” and edits it into the desired detailed specifications; these often change over the duration of a project.

Therefore, the correct assembly code based on Dan-

ish building standards is hard to predict. Hence, the

architects/engineers have to point-and-click their way

through every object in the BIM and classify their assembly codes. Classifying hundreds or thousands of objects manually is time-consuming and error-prone. In addition, ambiguous classification results in additional and/or unwanted expenses.

Figure 2: BIM 3D model (walls only).

To provide an illustration, Fig. 1 portrays a com-

prehensive 3D BIM model featuring walls, roofs,

spaces, coverings, stairs and columns. Conversely,

Fig. 2 exclusively displays the walls present within

the model. The data set encompassing all wall el-

ements within the model comprises a total of 4025

walls, where each of these walls has over 160 features. The features span a wide range of data types

including numerical, alphanumerical, unique identi-

ﬁers, binary indicators and categorical attributes.

Table 1: Selected set of wall features.

Wall feature         Value      Description
Area                 92.09      length * height
Base Constraint      PLAN 10    floor level
Length               26489.99   wall’s length
Structural           1          yes 1/no 0
Structural Usage     1          non load-bearing 0/load-bearing 1
Type Id              2147048    wall type
Unconnected Height   4462.99    wall element’s height
Volume               49.39      wall’s volume
Width                540        wall’s width
Assembly code        211        exterior precast concrete wall

For the purpose of illustration, only a subset of 10

features out of 160 (for the outer wall distinguished

by its highlighted color in Fig. 2) has been chosen.

A snapshot of feature data is presented in Table 1.

The final row in Table 1 displays the assembly code,

which is intended to be automatically assigned or pre-

dicted.

4 EXPLORATORY DATA

ANALYSIS

In this section, an exploratory data analysis of the

classiﬁcation problem has been carried out. The pri-

mary objective of this analysis is to investigate the

data set and pinpoint the features that wield an im-

pact on the assembly code. To address the extensive

array of features (totaling 160 in this instance), Principal Component Analysis (PCA, https://www.turing.com/kb/guide-to-principal-component-analysis) has been employed

to reduce their dimensionality. To evaluate the signif-

icance of the chosen features, one method involves

creating a heatmap that visualizes their correlations.

By depicting the correlations between these features,

a heatmap offers a graphical depiction of their inter-

relationships, facilitating the evaluation of their im-

portance. The correlation coefﬁcient is measured on

a scale that spans from +1 to 0 to -1. A correlation

of -1 indicates a robust negative correlation between

two values, while a correlation of +1 indicates strong

positive correlation, and a value of 0 implies no cor-

relation between them.
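As a minimal sketch of how the values behind such a heatmap can be obtained, the snippet below computes pairwise Pearson correlation coefficients with the standard library only. The column names follow Table 1, but the numeric values are hypothetical toy data, and the helper names `pearson` and `correlation_matrix` are illustrative rather than taken from the system described in this paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; this sketch reports 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy values, not taken from the real data set.
walls = {
    "Length": [1000.0, 2000.0, 3000.0, 4000.0],
    "Area": [2.0, 4.0, 6.0, 8.0],
    "Width": [400.0, 300.0, 200.0, 100.0],
}
matrix = correlation_matrix(walls)
```

In this toy example, Length and Area rise together (coefficient +1), while Length and Width move in opposite directions (coefficient -1).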

Figure 3: Correlation heatmap.

The correlation heatmap is depicted in Fig. 3.

Upon analyzing the heatmap, it becomes evident that

Structural and Structural Usage exhibit positive cor-

relation, suggesting that one of them can safely be

removed. The remaining attributes, except for Type

Id, display cross-correlation and do not raise any con-

cerns regarding their relevance. Another feature to

look at is Type Id in relation to the Assembly Code

(the target variable). Both display broadly similar correlation patterns with the other attributes, indicating a degree of similarity in behavior.

This similarity raises the possibility of data leakage,

thus motivating the removal of Type Id. Data leakage

occurs when the model is trained using an attribute

that describes the target variable in some manner. In

such cases, the model is prone to making overly op-

timistic predictions. Furthermore, during exploratory

data analysis, it is noted that thirteen different classi-

ﬁcations are distributed across 4025 wall instances, as

illustrated in Fig. 4.
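One simple way to probe a suspected leak, offered here only as an illustrative sketch and not as the procedure used in this paper (which relies on the correlation heatmap), is to measure how often a feature value alone determines the target. The function name `target_purity` and all values below are hypothetical.

```python
from collections import defaultdict


def target_purity(feature_values, targets):
    """Share of rows whose target equals the most common target
    among all rows that share the same feature value."""
    counts = defaultdict(lambda: defaultdict(int))
    for value, target in zip(feature_values, targets):
        counts[value][target] += 1
    majority_rows = sum(max(by_target.values()) for by_target in counts.values())
    return majority_rows / len(targets)


# Hypothetical toy rows: Type Id, Width and the assembly code of five walls.
type_ids = [2147048, 2147048, 2147050, 2147050, 2147051]
widths = [540, 145, 145, 145, 540]
codes = [211, 211, 224, 224, 221]

leak_type = target_purity(type_ids, codes)   # every Type Id maps to one code
leak_width = target_purity(widths, codes)    # Width only partly determines the code
```

A purity close to 1 for a feature such as Type Id suggests that the feature effectively encodes the label. Note that identifiers with very many distinct values can reach a high purity trivially, so the score is a prompt for inspection rather than proof of leakage.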

The depicted ﬁgure illustrates the representation

of different variations of Assembly Codes within the

data set. The Y-axis corresponds to the count of rows,


Figure 4: Unequal distribution of assembly codes in the

training data set.

while the X-axis corresponds to the various possible

classifications. Five of these classifications, namely 224, 221, 225, 211 and 226, dominate with a combined representation of 3758 instances. Conversely, the remaining eight classifications are sparsely represented, accounting for only 267 examples. This evident imbalance in the data set raises concerns. An imbalanced data set can result in models

that exhibit substantial bias, making it extremely dif-

ﬁcult or even unattainable for the model to accurately

classify the underrepresented types. To address this

issue stemming from imbalanced data, over-sampling

has been implemented. Over-sampling involves gen-

erating multiple additional entries for the minority

classes, thereby achieving a balanced representation.

An effective approach for this is the Synthetic Minority Oversampling Technique (SMOTE) (Chawla

et al., 2002). Through the application of the SMOTE

algorithm, the data set has undergone a substantial ex-

pansion, growing from its original 4025 rows to ap-

proximately 25000 rows.
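The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration written with the standard library; it is not the implementation used in the experiments, and the function name `smote_sketch`, the parameter defaults and the toy points are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class walls described by two scaled features.
minority_walls = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_rows = smote_sketch(minority_walls, n_new=5)
```

Because every synthetic row lies on a line segment between two existing minority rows, the new rows stay inside the region already occupied by that class.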

5 MACHINE LEARNING

APPROACH

In this section, the automated building object clas-

siﬁcation system utilizing machine learning is intro-

duced. It features a web application for user interac-

tion and an API for data and model management. Su-

pervised learning algorithms and preprocessing tech-

niques are used to train the classiﬁer. The system’s

performance is evaluated using metrics, and the best

model is then deployed for label predictions on new

data.

5.1 Problem Formulation

In the realm of construction, a variety of building objects are often encountered, denoted as $O = \{o_1, o_2, \ldots, o_n\}$. Each object $o_i$ is distinguished by a set of features $F_i = \{f_1, f_2, \ldots, f_m\}$ and is assigned an assembly code $C_i$ based on a specific classification standard. The main goal is to devise a classifier $\phi : O \rightarrow C$ that assigns each building object $o_i$ to an assembly code $C_i$, with the aim of minimizing the prediction error $E = \sum_{i=1}^{n} L(C_i, \phi(o_i))$. Here, $L(C_i, \phi(o_i))$ represents a loss function that quantifies the difference between the actual assembly code $C_i$ and the predicted assembly code $\phi(o_i)$.
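To make the prediction error concrete, the following sketch sums a loss over a handful of objects. The paper does not name a specific loss function, so the 0-1 loss used here is an assumption, and `toy_classifier` is a hypothetical rule-based stand-in for the classifier, not a trained model.

```python
def zero_one_loss(actual_code, predicted_code):
    """Assumed loss: 0 for a correct assembly code, 1 otherwise."""
    return 0 if actual_code == predicted_code else 1


def prediction_error(classifier, objects, codes, loss=zero_one_loss):
    """Sum of the loss over all (object, actual code) pairs."""
    return sum(loss(code, classifier(obj)) for obj, code in zip(objects, codes))


def toy_classifier(wall):
    """Hypothetical stand-in for the classifier: thick walls -> 211, else 224."""
    return 211 if wall["Width"] >= 300 else 224


# Hypothetical objects and their actual assembly codes.
walls = [{"Width": 540}, {"Width": 145}, {"Width": 300}]
codes = [211, 224, 221]
error = prediction_error(toy_classifier, walls, codes)
```

With the 0-1 loss, the error is simply the number of misclassified objects; here only the third wall is misclassified.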

5.2 Methodology

To achieve this, supervised machine learning algorithms are proposed for training the classifier $\phi$ using a labeled data set $D = \{(o_1, C_1), (o_2, C_2), \ldots, (o_n, C_n)\}$.

The performance of this classiﬁer is then evaluated

using metrics such as accuracy, balanced accuracy

and F1-score. To further enhance the performance of

the model, PCA is proposed for dimensionality reduc-

tion and SMOTE for addressing data imbalance.

[Figure 5 diagram. Components: BIM Modeling Tool, Web App, API, Database, Preprocessing, Training Data, Test Data, Train Models, Test Models, Best Model. Labeled steps: 1. Export; 2. Request/Response; 3. Request/Return; 4. Save; 5. Process; 6. Collect; 7. Split (Labeled Data); 8. Use; 9. Use; 10. Select; 11. Save/Model Retraining; 12. New Data (Unlabeled Data); 13. Predict Labels; 14. Import (Enriched data).]

Figure 5: Process ﬂow.

5.3 Process Flow

The process ﬂow of the proposed solution is detailed

in Algorithm 1 and visualized in Fig. 5. The solu-

tion employs multi-classiﬁcation algorithms for accu-

rate predictions. Prior to implementation, BIM data

must be manually extracted from a BIM software tool

such as Autodesk Revit (https://www.autodesk.com/products/revit/) or Speckel (https://www.speckel.io/). Post prediction,

the enriched BIM data is manually imported back into

the BIM modeling tool. The solution comprises two

main components: developing a classiﬁer using la-

beled data and utilizing this classiﬁer for predicting

assembly codes. The classiﬁcation process, depicted

in Fig. 5, unfolds through the following subsections.


Result: Enriched CSV file $(O_{new}, C_{pred}, P)$
Step 1: Export BIM data $D$ as CSV file(s) containing objects $O$ (with features $F$) and labels $C$;
Step 2: Load CSV file(s) into web application and select features $F$ and target $C$;
Step 3: Preprocess $F$ to get preprocessed features $\tilde{F}$;
Step 4: Train classifier $\phi$ using selected machine learning algorithms on training subset $(\tilde{F}_{train}, C_{train})$ and evaluate performance using metrics $L$, where $L = \mathrm{evaluate}(\phi, \tilde{F}_{test}, C_{test})$;
Step 5: Predict labels $C_{pred}$ and confidence scores $P$ for new data using best model: $(C_{pred}, P) = \phi_{best}(\tilde{F}_{new})$;

Algorithm 1: Automated building object classiﬁcation.

5.3.1 Export BIM Data

1. CSV ﬁle(s) are manually exported from the BIM

modeling tool(s), containing the required data

(D). The CSV ﬁle(s) should include the features

and labels of the building objects that need to

be classiﬁed according to a speciﬁc classiﬁcation

standard. The features can be numerical, alphanu-

merical, unique identiﬁers, binary indicators, or

categorical attributes. The labels can be assembly

codes or other types of classiﬁcations that follow

a standard encoding method.

5.3.2 Web Application Usage

2. The web application (W ), a user-friendly inter-

face, allows users to interact with the system by

selecting features and requesting predictions. It

communicates with the API, a software interme-

diary that connects the web application with the

database and the machine learning models.

3. The web application provides a “Find Features”

button that prompts the presentation of all potential features extracted from the data. Users can choose the features (F) that are relevant for the classification task and the target (C) variable

that represents the desired output. For example,

“Area”, “Length”, “Width”, “Structural Usage”,

etc. can be selected as features and “Assembly

Code” as target.

4. Data (D) containing the chosen features and tar-

get is stored in a database, which is a structured

collection of data that can be accessed and manip-

ulated by the system. The database ensures that

the data is organized and secure.

− A classiﬁer (φ) is created using selected ma-

chine learning algorithms (Random Forest

(RF), Gradient Boosting (GB), and K-Nearest


Neighbors (KNN)). These algorithms are able

to learn from labeled data and make predictions

for new data. The subsequent preprocessing

phase is initiated when the user activates the

“Create Classiﬁer” button.

5.3.3 Preprocessing Data

5. The preprocessing phase is automated, eliminat-

ing the need for manual involvement. This phase

retrieves the data (D) containing the previously

chosen features and target from the database.

− Data (D) is cleaned by handling missing or

duplicate values, outliers and inconsistencies.

During this phase appropriate methods are also

applied to handle any errors or anomalies in the

data. For example, missing or duplicate val-

ues can be removed or imputed, outliers can be

detected or treated and inconsistencies can be

resolved or corrected.

− Data (D) is scaled or normalized using appro-

priate methods to transform the values of the

features into a common range. This helps to

reduce the effect of different units or magni-

tudes on the performance of the machine learn-

ing models. For example, scaling or normal-

ization methods can include min-max scaling,

standardization or log transformation.

− Categorical data (D) is encoded using appropri-

ate methods to convert categorical features into

numerical values that can be used by the ma-

chine learning models. Categorical features are

those that have a ﬁnite number of possible val-

ues that represent categories or classes. For ex-

ample, encoding methods can include label en-

coding, one-hot encoding or ordinal encoding.

− Dimensionality of data (D) is reduced using

PCA (PCA(D)), which is a technique that trans-

forms a large number of correlated features into

a smaller number of uncorrelated components

that capture most of the variance in the data.

This helps to reduce noise and redundancy in

the data and improve computational efﬁciency

and performance of the machine learning mod-

els.

− Imbalance of data (D) is addressed using

SMOTE (SMOTE(D)), which is a technique

that generates synthetic samples for the minor-

ity classes in the data set to balance their repre-

sentation with the majority classes. This helps

to reduce bias and improve accuracy and gen-

eralization of the machine learning models.

6. Processed data (D) is reinserted into the database

for further use by the system.
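Two of the preprocessing steps listed above, min-max scaling and one-hot encoding, can be sketched with the standard library as shown below. The helper names and the sample values are hypothetical; the paper lists several alternative methods and does not state which one the system applies to which feature.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 vector per value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical wall widths (numeric) and base constraints (categorical).
widths = [540, 145, 80, 300]
scaled = min_max_scale(widths)

levels = ["PLAN 03", "PLAN 12", "PLAN 03", "ROOF"]
categories, encoded = one_hot(levels)
```

Scaling keeps features with large magnitudes, such as Length in millimetres, from dominating distance-based models like KNN, while one-hot encoding avoids imposing an artificial order on categories such as floor levels.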

5.3.4 Training and Testing Models

7. The system splits data (D) into training and testing subsets ($D_{train}$ and $D_{test}$). The training subset is used to train the classifier using the selected

machine learning algorithms. The testing subset

is used to test the classiﬁer and evaluate its per-

formance using evaluation metrics.

8. The classiﬁer (φ) is trained using selected ma-

chine learning algorithms (RF, GB, and KNN)

and the training subset ($D_{train}$). These algorithms

learn from labeled data and make predictions for

new data. The training subset is used to ﬁt the pa-

rameters of the algorithms and optimize the clas-

siﬁer.

9. The classifier (φ) is tested using the testing subset ($D_{test}$) and performance is evaluated using metrics (accuracy, balanced accuracy, and F1-score).

These metrics measure how well the classiﬁer can

predict the correct labels for new data.

10. The best model is selected based on highest met-

ric scores among the three algorithms. The best

model is the one that can achieve the highest accu-

racy, balanced accuracy and F1-score on the test-

ing subset.

11. The best model is saved in the database for future

use and potential retraining.
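The split, train, test and select steps above can be sketched end to end as follows. For brevity, the candidate models here are two K-Nearest Neighbors variants written in plain Python rather than the RF, GB and KNN models used in the paper; the split ratio, the function names and the toy data are all assumptions.

```python
import random
from collections import Counter


def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Shuffle the row indices and cut them into training and testing parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_ratio))
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def knn_predict(train_rows, train_labels, row, k):
    """Majority label among the k training rows closest to `row`."""
    nearest = sorted(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], row)),
    )[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_rows, train_labels, test_rows, test_labels, k):
    """Fraction of test rows whose predicted label matches the actual label."""
    hits = sum(knn_predict(train_rows, train_labels, r, k) == y
               for r, y in zip(test_rows, test_labels))
    return hits / len(test_labels)


# Hypothetical, well-separated toy data: two scaled features per wall.
rows = ([(0.9 + 0.01 * i, 0.9) for i in range(8)]
        + [(0.1 + 0.01 * i, 0.1) for i in range(8)])
labels = [211] * 8 + [224] * 8

x_train, y_train, x_test, y_test = train_test_split(rows, labels)
scores = {k: accuracy(x_train, y_train, x_test, y_test, k) for k in (1, 3)}
best_k = max(scores, key=scores.get)  # candidate with the highest test score
```

In a real setting, the dictionary of scores would hold one entry per trained algorithm, and the selection could combine several metrics rather than accuracy alone.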

5.3.5 Label Prediction

12. New unlabeled data ($D_{new}$), a set of building objects that need to be classified according to a specific classification standard, is loaded from a BIM modeling tool as a CSV file. The “Predict Labels” button can be used to apply the classifier to the new data.

− The new data ($D_{new}$) is preprocessed using the

same methods as before. The system prepro-

cesses the new data ensuring that it is compati-

ble with the classiﬁer and has the same format

and structure as the training and testing data.

− The best model is loaded from the database.

This model was selected based on the highest

metric scores in the previous step.

13. Labels ($C_{pred}$) for new data ($D_{new}$) are predicted using

the best model. The labels are assembly codes

or other types of classiﬁcations that follow a spe-

ciﬁc classiﬁcation standard. Conﬁdence scores

for each prediction are also generated, indicating

how conﬁdent the model is about its prediction.

The enhanced data set is showcased in Table 2.


Table 2: Snapshot of the predicted assembly codes with probabilities.

Area      Base constraint   Length      Structural Usage   Unconnected Height   Volume   Width   Assembly Code   Probability
..        ..                ..          ..                 ..                   ..       ..      ..              ..
19.6495   PLAN 03           4975        0                  4533                 2.7954   145     224             1
1.4679    PLAN 12           1472.4999   0                  1110                 0.2128   145     225             1
4.6728    PLAN 11           2010        0                  2360                 0.3738   80      221             1
11.9191   PLAN 01           2992.5000   1                  4533                 3.5757   300     221             0.8466
0.4799    ANIMAL STABLE     3199.9999   0                  150                  0.036    75      223             1
2.4478    PLAN 03           270         1                  4533                 1.3212   540     221             0.7125
13.7785   PLAN 01           3600        0                  4538                 1.9978   145     224             1
3.81      PLAN 02           2050        0                  2000                 0.3619   95      225             0.6966
13.8608   PLAN 03           3552        0                  4533                 2.0098   145     224             1
11.6124   PLAN 05           2727        0                  4333                 1.5386   132     224             1
22.8693   ROOF              6564.9997   0                  3421                 5.4886   240     217             0.6333
5.015     PLAN 12           2055        0                  2360                 0.3009   60      221             1
2.3101    PLAN 04           3650        1                  650                  0.4435   192     214             0.98
23.3638   PLAN 03           7115.2569   0                  4533                 3.3188   145     212             1
..        ..                ..          ..                 ..                   ..       ..      ..              ..

− An enriched CSV file containing new data ($D_{new}$), predicted labels ($C_{pred}$) and confidence scores (P)

is generated. This ﬁle can be used to update or

enrich the BIM model with accurate and con-

sistent classiﬁcations.

14. The enriched CSV ﬁle is manually imported into a

BIM modeling tool to complete the classiﬁcation

process.

By adhering to this process ﬂow, the suggested solu-

tion facilitates the establishment and utilization of a

classiﬁer, allowing automated prediction of assembly

codes within building objects.
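The final prediction and export steps can be illustrated with the sketch below, which appends a predicted assembly code and a confidence score to each new row and writes the result as CSV text. The paper does not specify how confidence is computed; the vote share used here (as in a KNN vote or the fraction of trees in a forest) is an assumption, and the function names and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def predict_with_confidence(votes):
    """Majority vote over candidate codes; confidence is the winning share."""
    code, count = Counter(votes).most_common(1)[0]
    return code, count / len(votes)


def enrich_csv(rows, votes_per_row):
    """Return CSV text with 'Assembly Code' and 'Probability' columns added."""
    out = io.StringIO()
    fieldnames = list(rows[0]) + ["Assembly Code", "Probability"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, votes in zip(rows, votes_per_row):
        code, probability = predict_with_confidence(votes)
        writer.writerow({**row, "Assembly Code": code,
                         "Probability": round(probability, 4)})
    return out.getvalue()


# Hypothetical new walls and the codes voted for by a hypothetical ensemble.
new_walls = [{"Area": 19.6495, "Width": 145}, {"Area": 11.9191, "Width": 300}]
votes = [[224, 224, 224, 224], [221, 221, 221, 224]]
enriched = enrich_csv(new_walls, votes)
```

Rows with a low confidence score, such as the second wall here, are natural candidates for a manual check before the enriched file is imported back into the BIM modeling tool.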

6 EXPERIMENTS

Within this section, an evaluation of the models em-

ployed to classify building objects is conducted. The

experiments are carried out using real-world data de-

rived from a building project.

6.1 Setup

For conducting multi-class classiﬁcation, three super-

vised machine learning algorithms (Random Forest

(RF), Gradient Boosting (GB) and K-Nearest Neigh-

bors (KNN)) were employed for experimentation. RF,

GB and KNN were chosen for their unique strengths

in handling building object classiﬁcation. RF’s ro-

bustness to overﬁtting and ability to handle high-

dimensional data make it suitable for large data sets.

GB is known for its high accuracy and provides fea-

ture importance, crucial for understanding inﬂuential

characteristics in building object classiﬁcation. KNN

offers a simple yet effective approach, making no as-

sumptions about the data distribution, which is ben-

eﬁcial when classifying objects with similar features.

Collectively, these algorithms provide a comprehen-

sive and robust approach to the classiﬁcation task.

The experiments were conducted on the extended data set comprising approximately 25000 wall objects, each with a list of over 160 features. The algorithms were run on a single-node hardware platform with an 8th Generation

Intel Core i7-8565U 1.8 GHz processor, 32GB DDR4

RAM and 1TB SSD. The reported outcomes were de-

rived from running each algorithm 20 times, and the

results were averaged over the best 5 executions.
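The reporting rule described above, averaging the best 5 of the repeated runs, amounts to a few lines of code. The scores below are hypothetical and the function name `mean_of_best` is illustrative.

```python
def mean_of_best(scores, keep=5):
    """Average of the `keep` highest scores among repeated runs."""
    best = sorted(scores, reverse=True)[:keep]
    return sum(best) / len(best)


# Hypothetical accuracy values from repeated executions of one algorithm.
run_scores = [0.90, 0.93, 0.91, 0.95, 0.92, 0.94, 0.80]
reported = mean_of_best(run_scores, keep=5)
```

Averaging only the best runs gives an optimistic summary, so the figure is best read as an upper-range estimate of typical performance.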

6.2 Test Results

Table 3 displays the classiﬁcation model test results.

It is emphasized that accuracy may not be reliable for

imbalanced data sets, hence balanced accuracy and

F1-score are also considered. Accuracy is the ratio

of correct predictions to the total predictions made.

Balanced accuracy represents average recall per class,

while F1-score is the harmonic mean of precision and

recall, with all having an optimal value of 1 and a

minimum value of 0.
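The three metrics can be computed directly from their definitions, as in the following sketch. The paper does not state how the F1-score is averaged across classes; the unweighted (macro) average used here is an assumption, and the label vectors are hypothetical.

```python
def accuracy(actual, predicted):
    """Correct predictions divided by all predictions."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def class_counts(actual, predicted, label):
    """True positives, false positives and false negatives for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)
    fp = sum(a != label and p == label for a, p in pairs)
    fn = sum(a == label and p != label for a, p in pairs)
    return tp, fp, fn


def balanced_accuracy(actual, predicted):
    """Average recall per class."""
    recalls = []
    for label in sorted(set(actual)):
        tp, _, fn = class_counts(actual, predicted, label)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


def macro_f1(actual, predicted):
    """Unweighted mean of per-class F1 (harmonic mean of precision and recall)."""
    scores = []
    for label in sorted(set(actual)):
        tp, fp, fn = class_counts(actual, predicted, label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: four walls of code 224, two of code 211.
actual = [224, 224, 224, 224, 211, 211]
predicted = [224, 224, 224, 224, 224, 211]
```

In this example the plain accuracy is 5/6, whereas the balanced accuracy is 0.75 because half of the minority class 211 is misclassified, which illustrates why accuracy alone can look overly favourable on imbalanced data.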

Table 3: Evaluation metrics for classiﬁcation models.

Model Accuracy Balanced accuracy F1-score

RF 0.93 0.87 0.94

GB 0.90 0.83 0.91

KNN 0.82 0.71 0.81

Based on the evaluation metrics presented, the RF

model stands out as the most suitable choice, demon-

strating generally high scores (above 85%), which are

deemed acceptable in building object classiﬁcation

contexts.


7 CONCLUSIONS AND FUTURE

WORKS

In the construction sector, seamless collaboration be-

tween stakeholders, such as architects, engineers, and

builders, is essential to avoid miscommunications

arising from different terminologies. Classiﬁcation

systems address this by offering a standardized lan-

guage throughout the project life-cycle, from concep-

tion to maintenance. This process involves assigning

unique codes to each object within a BIM model, thus

facilitating accurate quantity evaluations, cost estima-

tions and comprehensive project planning. In this pa-

per, an automated approach for classifying building

objects at a speciﬁc type level has been presented, uti-

lizing machine learning algorithms such as Random

Forest, Gradient Boosting, and K-Nearest Neighbors.

The effectiveness of this classiﬁcation technique was

veriﬁed with a real-world data set, showing encour-

aging results. The proposed system, although promis-

ing, has limitations including data quality dependency

and possible inaccuracies due to algorithm assump-

tions. Its scalability and adaptability to other projects

or classiﬁcation schemes are yet to be conﬁrmed, with

its current evaluation limited to a speciﬁc project.

Future research should focus on improving data

quality and feature selection, experimenting with var-

ious machine learning algorithms, optimizing system

scalability and conducting assessments across a range

of projects and classiﬁcation schemes.

ACKNOWLEDGEMENTS

We sincerely thank Jan Buthke and the team at LINK

Arkitektur for their significant contribution to this paper

and for providing a real data set that greatly enriched the

quality and relevance of our research.

KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval
