3.3 Algorithms for Classification
Classification is the task of assigning labels to test patterns on the basis of previously labeled training patterns. The process is commonly divided into two phases: a learning phase, in which the classification algorithm is trained, and a classification phase, in which the trained algorithm labels new data. Machine learning itself comes in two broad types: supervised and unsupervised.
All of these algorithms take their data from a single collection, read it from a file, or obtain it through a database query. Among the most widely used machine learning algorithms is ID3 (Iterative Dichotomizer 3), developed by Ross Quinlan [11], which builds a decision tree from a collection of training examples.
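ID3 selects the attribute to split on by maximizing information gain, the reduction in class-label entropy. A minimal sketch of that criterion in Java is given below; the class counts in main are an illustrative toy split, not data from any cited source:

    import java.util.List;

    public class InfoGain {
        // Entropy of a node from its per-class counts: -sum(p_i * log2(p_i)).
        static double entropy(int[] classCounts) {
            int total = 0;
            for (int c : classCounts) total += c;
            double h = 0.0;
            for (int c : classCounts) {
                if (c == 0) continue;
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));  // log base 2
            }
            return h;
        }

        // Information gain of splitting a parent node into the given children.
        static double informationGain(int[] parent, List<int[]> children) {
            int total = 0;
            for (int c : parent) total += c;
            double remainder = 0.0;
            for (int[] child : children) {
                int size = 0;
                for (int c : child) size += c;
                remainder += (double) size / total * entropy(child);
            }
            return entropy(parent) - remainder;
        }

        public static void main(String[] args) {
            // Toy example: 9 positive / 5 negative instances split three ways.
            int[] parent = {9, 5};
            List<int[]> split = List.of(new int[]{2, 3}, new int[]{4, 0}, new int[]{3, 2});
            System.out.printf("gain = %.3f%n", informationGain(parent, split));  // ~0.247
        }
    }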
C4.5 is an extension of ID3: from a set of data containing class labels, it builds a classification model of the classes. It is a machine learning and data mining method that works well on categorization problems and on predicting the value of a target variable, and the distribution of the data with respect to that target is easily understood with the help of a tree classifier. J48 is an extended version of ID3 that adds features such as accounting for missing values, decision tree pruning, continuous attribute value ranges, rule derivation, and more. The J48 algorithm in the WEKA data mining tool is the Java implementation of the C4.5 method, and WEKA offers many options for tree pruning. Pruning can be applied to correct a potential over-fitting situation. In related algorithms, splitting is repeated until each leaf is pure, that is, until the categorization of the data is as accurate as feasible. The algorithm generates the rules that determine the specific identity of the data, and the objective is to grow the decision tree progressively until it reaches a balance of flexibility and accuracy [12]. In the resulting tree, the internal nodes test the characteristics of the data, the branches correspond to the outcomes of those tests, and the leaves represent the classes (records) [13].
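As an illustration, the sketch below shows how J48 might be driven through WEKA's Java API with two of its pruning-related options; the file name training.arff is a placeholder, and the snippet assumes the WEKA library is on the classpath:

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class J48Demo {
        public static void main(String[] args) throws Exception {
            // Load a labeled dataset (placeholder file name).
            Instances data = new DataSource("training.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);  // class = last attribute

            J48 tree = new J48();
            // Pruning controls: -C sets the confidence factor (smaller values
            // prune more aggressively), -M sets the minimum instances per leaf.
            tree.setOptions(new String[] {"-C", "0.25", "-M", "2"});
            tree.buildClassifier(data);

            System.out.println(tree);  // prints the pruned decision tree
        }
    }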
The Bayesian method is used to estimate the likelihood of various hypotheses. The simplest type of Bayesian network is Naive Bayes, in which all attributes are assumed independent given the value of the class variable [14]. Naive Bayes is a straightforward method for building classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from a finite set. There is no single method for training such classifiers; rather, a family of methods is based on the same principle: all naive Bayes classifiers assume that the value of one feature is independent of the value of any other feature, given the class variable. An apple, for instance, might be described as a red, spherical fruit with a diameter of about 10 cm. Regardless of any possible correlations between the color, roundness, and diameter features, a naive Bayes classifier treats each of these properties as contributing independently to the probability that the fruit is an apple. Naive Bayes classifiers can be trained very efficiently for certain probability models in a supervised learning setting. In many practical situations, the parameters of naive Bayes models are estimated by maximum likelihood; in other words, the naive Bayes model can be employed without adopting Bayesian probability or any Bayesian procedures [20].
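To make the independence assumption concrete, the sketch below scores the apple example by multiplying the class prior with one likelihood per feature; every probability in it is invented purely for illustration:

    public class NaiveBayesApple {
        public static void main(String[] args) {
            // Invented training estimates (not from any real dataset):
            double pApple = 0.30;             // prior P(apple)
            double pRedGivenApple = 0.70;     // P(red | apple)
            double pRoundGivenApple = 0.90;   // P(round | apple)
            double pDiamGivenApple = 0.60;    // P(diameter ~ 10 cm | apple)

            double pOther = 0.70;             // prior P(not apple)
            double pRedGivenOther = 0.20;
            double pRoundGivenOther = 0.40;
            double pDiamGivenOther = 0.15;

            // Naive Bayes: multiply the prior by each feature likelihood,
            // treating the features as conditionally independent given the class.
            double scoreApple = pApple * pRedGivenApple * pRoundGivenApple * pDiamGivenApple;
            double scoreOther = pOther * pRedGivenOther * pRoundGivenOther * pDiamGivenOther;

            // Normalize to obtain the posterior P(apple | red, round, ~10 cm).
            double posterior = scoreApple / (scoreApple + scoreOther);
            System.out.printf("P(apple | features) = %.3f%n", posterior);
        }
    }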
CART stands for Classification and Regression Tree. It is a method for building a binary decision tree, in which every internal node splits into exactly two branches.
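CART commonly chooses each binary split by minimizing the Gini impurity of the two resulting children; a minimal sketch of that criterion follows, with invented two-class counts:

    public class GiniSplit {
        // Gini impurity of a node from its per-class counts: 1 - sum(p_i^2).
        static double gini(int[] counts) {
            int total = 0;
            for (int c : counts) total += c;
            double sumSq = 0.0;
            for (int c : counts) {
                double p = (double) c / total;
                sumSq += p * p;
            }
            return 1.0 - sumSq;
        }

        // Weighted impurity of a candidate binary split (two-class children).
        static double splitGini(int[] left, int[] right) {
            int l = left[0] + left[1], r = right[0] + right[1];
            double n = l + r;
            return (l / n) * gini(left) + (r / n) * gini(right);
        }

        public static void main(String[] args) {
            // Toy split of a parent with 6 positive and 4 negative instances.
            System.out.printf("split impurity = %.3f%n",
                    splitGini(new int[]{5, 1}, new int[]{1, 3}));
        }
    }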
The K-NN method determines the category of a test document by evaluating its degree of similarity to the k nearest training documents, storing a specified quantity of classification data. It is an instance-based learning algorithm that categorizes objects using the nearest training examples in feature space. The training examples are represented as points in a multidimensional feature space, and the space is divided into regions according to the categories of the training set. A point in the feature space is assigned to the category that is most common among its k nearest training examples. In most cases, the Euclidean distance is used to compute the distance between vectors. A crucial component of this technique is the availability of a similarity metric for finding the neighbors of a given document. The training step consists solely of storing the feature vectors and class labels of the training set. In the classification phase, the distances between the new vector representing an input document and all stored vectors are computed, and the k closest samples are selected. The category annotated for the document is then predicted from the categories of these nearest points [21].
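The sketch below implements the classification phase just described, combining Euclidean distance with a majority vote over the k nearest stored vectors; the tiny two-dimensional training set is invented for illustration:

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.Map;

    public class Knn {
        // Euclidean distance between two feature vectors.
        static double distance(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }

        // Majority vote among the k training points closest to the query.
        static String classify(double[][] train, String[] labels, double[] query, int k) {
            Integer[] idx = new Integer[train.length];
            for (int i = 0; i < idx.length; i++) idx[i] = i;
            Arrays.sort(idx, Comparator.comparingDouble(i -> distance(train[i], query)));

            Map<String, Integer> votes = new HashMap<>();
            for (int i = 0; i < k; i++) votes.merge(labels[idx[i]], 1, Integer::sum);
            return votes.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .get().getKey();
        }

        public static void main(String[] args) {
            // Invented 2-D training set with two categories, A and B.
            double[][] train = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {8, 9}, {9, 8}};
            String[] labels = {"A", "A", "A", "B", "B", "B"};
            System.out.println(classify(train, labels, new double[]{2, 2}, 3));  // -> A
        }
    }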
For our research, we adopted the J48 classification method, which achieved high accuracy on the dataset sections in [14]. Furthermore, in [22] it obtained the highest classification accuracy (80.46%) for predicting a user's approval of re-orientation systems. The technique is well suited to discrete data, as in our case of predicting the workplace attribute in the new HCP 2019 database [1] using the J48 machine learning algorithm.
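A sketch of this setup through WEKA's Java API follows; the file name hcp2019.arff stands for a hypothetical ARFF export of the HCP 2019 data, and the ten-fold cross-validation shown is one conventional way to estimate accuracy, not necessarily the exact protocol of [22]:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class J48Workplace {
        public static void main(String[] args) throws Exception {
            // Load the dataset (hypothetical ARFF export of HCP 2019).
            Instances data = new DataSource("hcp2019.arff").getDataSet();
            // Assume the class attribute (workplace) is the last column.
            data.setClassIndex(data.numAttributes() - 1);

            // J48 with WEKA's default pruning settings.
            J48 tree = new J48();
            tree.setOptions(new String[] {"-C", "0.25", "-M", "2"});

            // Estimate accuracy with 10-fold cross-validation.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());

            // Train on the full set and print the resulting tree.
            tree.buildClassifier(data);
            System.out.println(tree);
        }
    }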