Vocation Identification for Heavy-duty Vehicles: A Tournament Bracket Approach

Daniel Kobold Jr., Andy Byerly, Rishikesh Mahesh Bagwe, Euzeli Cipriano dos Santos Jr. (1, a) and Zina Ben Miled (1, b)

Department of Electrical and Computer Engineering (IUPUI), Indianapolis, IN 46202, U.S.A.

Allison Transmission, Inc., One Allison Way, Indianapolis, IN 46222, U.S.A.

Keywords:

Heavy-duty Vehicles, Vocation, Classiﬁcation.

Abstract:

The identiﬁcation of the vocation of an unknown heavy-duty vehicle is valuable to parts’ manufacturers.

This study proposes a methodology for vocation identiﬁcation that is based on clustering techniques. Two

clustering algorithms are considered: K-Means and Expectation Maximization. These algorithms are used

to ﬁrst construct the operating proﬁle of each vocation from a set of vehicles with known vocations. The

vocation of an unknown vehicle is then determined by using one-versus-all or one-versus-one assignment.

The one-versus-one assignment is more desirable because it scales with an increasing number of vocations

and requires less data to be collected from the unknown vehicles. These characteristics are important to

parts’ manufacturers since their parts may be installed in different vocations. Speciﬁcally, this paper compares

the one-versus-one bracket and the one-versus-one round-robin tournament assignments to the one-versus-all

assignment. The tournament assignments are able to scale with an increasing number of vocations. However,

the bracket assignment also beneﬁts from a linear time complexity. The results show that despite its scalability

and computational efﬁciency, the bracket vocation identiﬁcation model has a high accuracy and a comparable

precision and recall. The NREL Fleet DNA drive cycle dataset is used to demonstrate these ﬁndings.

1 INTRODUCTION

The ability to identify the vocation of a heavy-duty vehicle from drive cycle data collected during the vehicle’s daily operation is valuable to many parts’ manufacturers because it allows them to track the end-use of the vehicle. Electronic components and sensors are becoming increasingly pervasive in vehicles, which has led to new sources of data. In fact, both OEMs and parts’ manufacturers now have access to a large stream of operational data that can be acquired during maintenance or configuration updates, with the vehicle owner’s consent. However, as opposed to OEMs, parts’ manufacturers do not typically have knowledge of the actual use, application or vocation of the vehicle. Moreover, the same part can be deployed in a large number of varying vocations. The classification of vehicle usage and underlying parts into vocations benefits the component designers, who likely do not have complete awareness of how the vehicle will be used. This classification may be obvious in the OEM service channel, where direct physical interaction with the vehicle allows the identification of the vocation of the vehicle and consequently of the components and chassis (e.g., refuse truck, cement mixer, dump truck, coach bus, transit bus, etc.). However, this information is not directly accessible to the parts’ manufacturer. Vocation classification can be used to: 1) detect when the component is not being used in a manner consistent with the intended vocation (e.g., a coach bus operating as a transit bus) or 2) identify field issues that are specific to the operational use of a given component in the vehicle. These issues can then be addressed via field action (e.g., part configuration updates) or future design improvements to the component.

a https://orcid.org/0000-0003-1584-8082

b https://orcid.org/0000-0001-5401-0368

This paper presents a methodology for identifying the vocation of an unknown heavy-duty vehicle using features collected from the vehicle’s daily drive cycles. The methodology follows a two-step approach. First, the profile of each vocation is established using a set of vehicles with known vocations. Second, the daily drive cycles collected from the unknown vehicle are compared to all vocation profiles and the most likely vocation is selected. The methodology is demonstrated using two widely used clustering algorithms: K-means (KM) (Chakraborty et al., 2020) and expectation maximization (EM) (Shin et al., 2019). However, it can be extended to other clustering algorithms such as particle swarm optimization (PSO) (Kennedy and Eberhart, 1995).

Kobold Jr., D., Byerly, A., Bagwe, R., Santos Jr., E. and Ben Miled, Z.
Vocation Identification for Heavy-duty Vehicles: A Tournament Bracket Approach.
DOI: 10.5220/0010298702590266
In Proceedings of the 7th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2021), pages 259-266
ISBN: 978-989-758-513-5

Most classiﬁcation algorithms are best at handling

two classes (Athimethphat and Lerteerawong, 2012):

a positive and a negative class. These binary classi-

ﬁers have been extended to multiclass models using

the one-versus-all (Scholkopf and Smola, 2001) and

the one-versus-one methodology (Daengduang and

Vateekul, 2017). The one-versus-all methodology consists of an ensemble of classifiers where each classifier is trained to correctly predict one positive class while considering all the remaining classes as negative. This method has a linear complexity with respect to the number of classes (here, vocations). The one-versus-one methodology is also an ensemble of classifiers. However, a classifier is developed for each pair of classes, so c classes require c(c − 1)/2 classifiers, leading to a quadratic complexity.

Vocation identification for heavy-duty vehicles is a multiclass application, and the proposed methodology was inspired by the one-versus-one classification approach, which can accommodate a large number of vocations. The daily measurements of the unknown vehicle are compared to two vocations at a time in a tournament bracket. However, as opposed to the traditional one-versus-one classification, a vocation is eliminated in each round, making the approach linearly scalable with respect to the number of vocations. The proposed methodology is applied to 5 vocations from the NREL Fleet DNA (NREL, 2019) dataset.
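The scaling argument can be made concrete by counting how many vocation profiles or pairwise rounds each assignment scheme has to evaluate for c vocations. The short Python sketch below is illustrative only; the function name `comparisons` and the dictionary keys are not taken from the paper.

```python
def comparisons(num_vocations: int) -> dict:
    """Count the profile evaluations or pairwise rounds per assignment scheme."""
    c = num_vocations
    return {
        "one_versus_all": c,               # every vocation profile is scored once
        "round_robin": c * (c - 1) // 2,   # every unordered pair of vocations
        "bracket": c - 1,                  # one vocation eliminated per round
    }


for c in (5, 8, 20):
    print(c, comparisons(c))
```

For the 5 vocations used in this study the counts are 5, 10 and 4, respectively; the gap between the round-robin and bracket counts widens as vocations are added.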

2 RELATED WORK

The purpose of a classiﬁer is to assign a given data

record to one of the pre-deﬁned classes. Since voca-

tions are known a-priori in our study, using a classiﬁer

with supervised learning would be expected. Some

of the widely recognized classiﬁcation algorithms in-

clude support vector machine (SVM) (Scholkopf and

Smola, 2001), random forest (RF) (Breiman, 2001),

and neural networks. Most of these algorithms are in-

herently two-class (binary) classiﬁers. However, they

have been extended to accommodate multiclass ap-

plications. For instance, SVM was applied to multi-

class classiﬁers using one-versus-one and one-versus-

all ensemble learners (Scholkopf and Smola, 2001).

Similarly, neural networks can use multiple nodes in

the output layer where each node corresponds to a

class (Sagi and Rokach, 2018). RF can also support

multiple classes if multiway trees are used instead of

binary decision trees (Murphy and Pazzani, 1991).

The purpose of a clustering algorithm is to: a)

identify clusters with similar records, b) select a rep-

resentative member for each cluster and c) adequately

assign a record to a cluster. These three aspects vary

from one clustering algorithm to the next. As opposed to a classifier, the first step is performed using unsupervised learning. For example, KM defines the similarity between two records according to a distance measure: the smaller the distance, the more similar the records. Other similarity criteria that are optimized to specific applications are proposed in (Kanemaru et al., 2013) and (Wang et al., 2020).

Once a cluster is identified, a representative member, called the centroid, is selected and refined iteratively as members are added to or removed from the cluster. The centroid is typically calculated by averaging across all the members of the cluster. Other clustering algorithms, such as PSO, derive their efficiency

from the selection of appropriate centroids. Centroids

are mapped to particles in PSO. Each particle moves

in the feature space and its velocity is updated based

on the best position that the particle has found so far

and the current global best position across all parti-

cles.
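The particle update described above can be sketched in a few lines of Python with NumPy. This is a generic PSO step, not code from the paper: the inertia term and the coefficient values (`inertia`, `c_personal`, `c_global`) are assumptions borrowed from common PSO variants, and the function name `pso_step` is hypothetical.

```python
import numpy as np


def pso_step(positions, velocities, personal_best, global_best, rng,
             inertia=0.7, c_personal=1.5, c_global=1.5):
    """Move every particle (candidate centroid) one step in feature space."""
    r1 = rng.random(positions.shape)   # random pull toward the particle's own best
    r2 = rng.random(positions.shape)   # random pull toward the global best
    velocities = (inertia * velocities
                  + c_personal * r1 * (personal_best - positions)
                  + c_global * r2 * (global_best - positions))
    return positions + velocities, velocities


rng = np.random.default_rng(0)
positions = rng.normal(size=(4, 2))    # 4 particles in a 2-feature space
velocities = np.zeros_like(positions)
# Treat particle 0 as the current global best for the sake of the example.
new_positions, new_velocities = pso_step(
    positions, velocities, positions.copy(), positions[0], rng)
print(new_positions)
```

In this toy setup particle 0 does not move, because it already sits on both its personal best and the global best, while the other particles drift toward it.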

The assignment of a record to a cluster also varies

from one clustering algorithm to the next. For KM,

each record is assigned to exactly one cluster based

on the distance between the record and the centroid of

the cluster. This assignment is referred to as a “hard”

assignment. EM uses a “soft” assignment (Wahba,

2002). That is, each record has a probability of be-

longing to each cluster.
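The difference between the two assignment styles can be illustrated with a small Python sketch. It assumes Euclidean distance for the hard assignment and equally weighted spherical Gaussians with a shared `sigma` for the soft assignment; these choices, and the function names, are simplifications for illustration rather than the configuration used in the paper.

```python
import numpy as np


def hard_assignment(record, centroids):
    """KM style: index of the single nearest centroid."""
    distances = np.linalg.norm(centroids - record, axis=1)
    return int(np.argmin(distances))


def soft_assignment(record, centroids, sigma=1.0):
    """EM style: probability of belonging to each cluster."""
    sq_dist = np.sum((centroids - record) ** 2, axis=1)
    log_lik = -0.5 * sq_dist / sigma ** 2
    weights = np.exp(log_lik - log_lik.max())   # subtract the max for stability
    return weights / weights.sum()


centroids = np.array([[0.0, 0.0], [4.0, 0.0]])
record = np.array([1.0, 0.0])
print(hard_assignment(record, centroids))
print(soft_assignment(record, centroids))
```

The hard assignment returns a single cluster index, whereas the soft assignment returns one probability per cluster, and these probabilities sum to one.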

Other important aspects of clustering algorithms

include the relationship among the clusters and the

appropriate number of clusters. Most clustering algo-

rithms assume that all clusters are at the same level.

This type of clustering is referred to as partitioning

(Ester et al., 1996). This is also the type of clustering

being used in this paper. In contrast, hierarchical clus-

tering (McInnes and Healy, 2017) allows some clus-

ters to be a subset of others.

Clustering has been used in several vehicular ap-

plications. For example, it was used in (Kanemaru

et al., 2013) for sharing of trafﬁc congestion informa-

tion. Each cluster of vehicles was used to represent a

given trafﬁc ﬂow thereby allowing the vehicle at the

head of the ﬂow to inform the vehicle at the tail of

the ﬂow of any trafﬁc congestion. In (Wang et al.,

2020), clustering was used to detect anomalous cab

trajectories. Each of the above applications innovates

by proposing a customized similarity measure for the

target application.

The Fleet DNA dataset used in this study was introduced and extensively analyzed in (Duran et al., 2018). In that study, dimension reduction was performed

on the dataset using principal component analysis

(PCA) and cross-correlation to identify the eight most

expressive features in the dataset. These were found

to be aerodynamic speed, characteristic acceleration,

percent of total cycle distance accumulated at speeds

below 55 mph, percent of total cycle time duration

accumulated at vehicle speeds of 0 mph, number of

vehicle stops per mile, mean (nonzero) driving speed,

maximum driving speed and standard deviation of

(nonzero) driving speed. Using these eight features,

the study found that the ﬁrst 6 components of PCA

were able to describe 99% of the variance in the data.

KM was also used to cluster all the drive cycles in the Fleet DNA dataset into three clusters.
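A statement such as "the first 6 components describe 99% of the variance" is obtained from the cumulative explained variance of the principal components. The Python sketch below shows that calculation on synthetic data; it is not the NREL analysis, the unit-variance scaling step is an assumption, and the function name `explained_variance_ratio` is hypothetical.

```python
import numpy as np


def explained_variance_ratio(features):
    """Share of the total variance carried by each principal component."""
    centered = features - features.mean(axis=0)
    scaled = centered / centered.std(axis=0)        # assumption: unit-variance scaling
    singular_values = np.linalg.svd(scaled, compute_uv=False)
    variances = singular_values ** 2                # proportional to PCA eigenvalues
    return variances / variances.sum()


# Synthetic stand-in for an 8-feature drive cycle table (not Fleet DNA data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
features = latent @ rng.normal(size=(3, 8)) + 0.05 * rng.normal(size=(200, 8))

ratio = explained_variance_ratio(features)
cumulative = np.cumsum(ratio)
# Smallest number of components whose cumulative share reaches 99%.
needed = int(np.searchsorted(cumulative, 0.99) + 1)
print(cumulative.round(3), needed)
```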

The above study by NREL helped guide the

methodology proposed in this paper. That said, the

present paper addresses a different problem. The

NREL study (Duran et al., 2018) aims at identifying

a limited number of representative drive cycles across

all US commercial ﬂeets. The aim of the present pa-

per is to identify the speciﬁc vocation of an unknown

vehicle. The methodology is also different since it

demonstrates the use of a clustering algorithm for vo-

cation identiﬁcation. In fact, while targeting a differ-

ent application, the methodology proposed in this pa-

per shares this aspect with the approach for the detec-

tion of anomalous cab trajectories proposed in (Wang

et al., 2020). The algorithm proposed in this paper

enhances this methodology by showing that a one-

versus-one bracket assignment can be efﬁciently ap-

plied to a large number of classes.

3 METHODOLOGY

The proposed methodology creates a model that iden-

tiﬁes the vocation of an unknown vehicle. In the

next subsections, we describe the dataset, the train-

ing phase of the model which establishes the operat-

ing proﬁle of each vocation, and the three vocation

assignment algorithms.

3.1 Dataset

Each vehicle in the Fleet DNA dataset is represented

by a set of records where every record is an aggre-

gation of the drive cycle measurements over a single

day. The features of the records used in this study

are shown in Table 1. Their deﬁnitions are available

in (Duran et al., 2018) and references therein. For

convenience, some of these deﬁnitions are reproduced

below:

• Total Average Speed: Average speed over the trip (including zero speed points).

• Driving Average Speed: Average speed over the trip not including the zero speeds.

• Zero Seconds: Number of seconds at zero speed.

• Average Kinetic Power Density Demand: Mean of the kinetic power density demand (with respect to mass).

Table 1: Feature list. Features marked with ∗ are the ones eliminated in the reduced feature model.

No.   Feature
1     Max Speed (mph)
2     Total Average Speed (mph)
3∗    Total Speed Standard Deviation (mph)
4     Driving Average Speed (mph)
5     Driving Speed Standard Deviation (mph)
6     Zero Seconds (s)
7     Distance Total (miles)
8     Total Stops (count)
9∗    Average Kinetic Power Density Demand (W/kg)
10∗   Cumulative Instantaneous Kinetic Energy Density (J/kg)
11∗   Characteristic Acceleration (m/s²)
12∗   Aerodynamic Speed (m/s)
13    Max Acceleration (ft/s²)
14    Average Acceleration (ft/s²)
15∗   Max Deceleration (ft/s²)

The list of the 15 features shown in Table 1 was

selected among the 350 available variables in the orig-

inal dataset using dimension reduction. Some of the

variables in the original data identify the vehicle, the

deployment or the vocation. These were used to label

the data. A large number of variables were removed

because they had a linear or an inverse relationship

with another variable (e.g., Characteristic Acceler-

ation and Characteristic Deceleration, Average Ac-

celeration and Average Deceleration). Variables re-

lated to potential energy (e.g., Cumulative Instanta-

neous Potential Energy Density and Average Poten-

tial Power Density Demand) were also removed be-

cause they are more dependent on the road elevation

than on the vocation of the vehicle. Moreover, daily records with Zero Seconds > 18,000 s were also removed from all the vehicles because this indicates that the vehicle spent more than 5 hours at zero speed on the given day.
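The idle-day filter amounts to a single threshold test on the Zero Seconds feature. A minimal Python sketch follows; the record layout (a list of dictionaries with a `zero_seconds` key) and the function name are hypothetical and do not reflect the actual Fleet DNA file format.

```python
MAX_ZERO_SECONDS = 18_000  # 5 hours at zero speed


def drop_idle_days(daily_records):
    """Keep only daily records with at most 18,000 s at zero speed."""
    return [rec for rec in daily_records
            if rec["zero_seconds"] <= MAX_ZERO_SECONDS]


# Made-up records for illustration only.
daily_records = [
    {"vehicle": "veh_a", "zero_seconds": 4_200},
    {"vehicle": "veh_a", "zero_seconds": 18_000},
    {"vehicle": "veh_b", "zero_seconds": 21_600},
]
kept = drop_idle_days(daily_records)
print(len(kept))
```

Only the record that exceeds the threshold is dropped; a record sitting exactly at 18,000 s is retained, matching the strict inequality in the text.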

The Fleet DNA dataset includes eight vocations:

Bucket Trucks, Class 8 Tractors, Delivery Vans, De-

livery Trucks, Transit Buses, Refuse Trucks, School

Buses, and Service Vans. The latter three vocations

were eliminated because they did not include sufficient data. For the remaining vocations, the vocation

identiﬁcation model followed a training/testing split

at the vehicle level. This prevents information leakage

that may result from allowing records from the same vehicle to participate in both the training and the testing of the model. After assigning a vehicle to either training or testing, 13 records were randomly sampled without replacement from each vehicle. Each random selection of 13 records was treated as a separate vehicle. This effectively allows a given physical vehicle to appear multiple times in either the training or the testing pool, but never in both. Because the sampling is performed without replacement, each underlying daily drive cycle is used at most once. Moreover, to keep

the training records balanced across vocations, 10 ve-

hicles were selected per vocation for training. The

remaining vehicles were used for testing. This split

approach led to variations in the number of vehicles

available for testing across the vocations (Table 2). In

total, 50 vehicles are used for training and 81 are used

for testing across the 5 vocations.
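One way to realize a vehicle-level split with 13-record samples is sketched below in Python. The paper does not spell out how physical vehicles are ordered or what happens to surplus samples, so the random ordering, the rule of dropping surplus training groups, and the names `sample_groups` and `split_vocation` are all assumptions made for illustration.

```python
import random

RECORDS_PER_SAMPLE = 13   # daily records per sampled "vehicle"
TRAIN_SAMPLES = 10        # sampled vehicles per vocation used for training


def sample_groups(records, rng):
    """Split one physical vehicle's records into disjoint random groups of 13."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    full = len(shuffled) // RECORDS_PER_SAMPLE
    return [shuffled[k * RECORDS_PER_SAMPLE:(k + 1) * RECORDS_PER_SAMPLE]
            for k in range(full)]


def split_vocation(vehicles, seed=0):
    """vehicles: dict mapping vehicle id -> list of daily records (one vocation)."""
    rng = random.Random(seed)
    ids = sorted(vehicles)
    rng.shuffle(ids)
    train, test = [], []
    for vid in ids:
        groups = sample_groups(vehicles[vid], rng)
        if len(train) < TRAIN_SAMPLES:
            # The whole physical vehicle belongs to training; surplus groups
            # beyond the cap are dropped (assumption) so it never leaks into test.
            train.extend(groups[:TRAIN_SAMPLES - len(train)])
        else:
            test.extend(groups)
    return train, test


# Toy vocation: 8 physical vehicles with 30 daily records each.
vehicles = {f"veh{k}": [(f"veh{k}", day) for day in range(30)] for k in range(8)}
train, test = split_vocation(vehicles)
print(len(train), len(test))
```

Because the groups drawn from one physical vehicle are disjoint and all go to the same pool, no daily record is reused and no physical vehicle contributes to both training and testing.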

Table 2: Number of test vehicles in each vocation.

Vocation               Total num. of vehicles   Num. of test vehicles
Bucket Truck (BT)      12                       2
Class 8 Tractor (CT)   43                       33
Delivery Truck (DT)    29                       19
Delivery Van (DV)      26                       16
Transit Bus (TB)       21                       11

Each vocation represents a group of vehicles that

perform similar tasks. A detailed description of each

vocation in the Fleet DNA dataset is provided in (NREL,

2019). Some of the vocations (e.g., Transit Bus) have

a distinct operational proﬁle while others have an op-

erational proﬁle that can be confounded with the re-

maining vocations. Delivery Vans (DV) and Delivery

Trucks (DT) are expected to have similar operating

proﬁles since the main difference between these two

vocations is the vehicle weight, with DT vehicles be-

ing typically heavier than DV vehicles. Bucket Trucks

(BT) perform tasks at the job site and will possibly

spend less time driving from one point to another.

Thus, compared to DT, DV and TB vehicles, the op-

erational proﬁle of BT vehicles should show lower

distances traveled and lower average speeds. Class

8 Tractors (CT) are typically used to haul a trailer

from a source (e.g., distribution center) to a destina-

tion (e.g., customer site). Therefore, CT vehicles are

expected to travel long distances over highways com-

pared to DT or DV vehicles. However, according to

the vocation characteristics in (NREL, 2019), the CT

vocation consists of various types of class 7 and 8 ve-

hicles that can be used for different tasks ranging from

food delivery to long-hauling tasks. This variation ex-

plains some of the results discussed in Section 4.

3.2 Model Development

The training is executed for each vocation indepen-

dently. It starts by randomly selecting a set of ini-

tial centroids for the target vocation from the avail-

able training data. During each iteration, records from

the training data are compared to each centroid of the

vocation. After processing all records, the centroids

are updated and a new training iteration begins. The

records and centroids are denoted as follows:

• $r_i = (r_i[1], r_i[2], \ldots, r_i[n])$ represents a record where each element $r_i[\cdot]$ of the input vector $r_i$ is the value of one of the input features and $n$ is the total number of features.

• $C_v = \{cv_1, cv_2, \ldots, cv_m\}$ is the set of centroids of vocation $v$ where each centroid represents a cluster of the vocation $v \in V = \{BT, CT, DV, DT, TB\}$. The total number of centroids, $m$, for each vocation in this study is fixed.

Under the KM algorithm, each record is assigned

to exactly one cluster which is selected according to

the minimum distance between the record and the

centroids of all clusters. In the case of EM, the as-

signment of a record to a cluster follows a probabilis-

tic measure. This measure is derived using Bayes’

rule, with the assumption that each feature has a nor-

mal distribution and that all the features are indepen-

dent. At the end of each training iteration of either

the KM or EM algorithms, the centroids of the clus-

ters are updated according to the record assignment

derived during the iteration.
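The per-vocation training loop can be sketched for the KM case as follows. This is a plain K-means written with NumPy under stated assumptions: Euclidean distance, a fixed iteration cap, and a rule that keeps the previous centroid when a cluster becomes empty. The function name `kmeans_profile` and the synthetic data are illustrative and are not taken from the paper; the EM variant would replace the hard assignment with probabilistic weights.

```python
import numpy as np


def kmeans_profile(records, m, iterations=50, seed=0):
    """Return m centroids summarizing one vocation's training records."""
    rng = np.random.default_rng(seed)
    # Initial centroids: m distinct training records chosen at random.
    centroids = records[rng.choice(len(records), size=m, replace=False)]
    for _ in range(iterations):
        # Hard assignment: each record joins its nearest centroid.
        distances = np.linalg.norm(
            records[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        updated = centroids.copy()
        for j in range(m):
            members = records[labels == j]
            if len(members):              # keep the old centroid if a cluster empties
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break                         # assignments no longer change the centroids
        centroids = updated
    return centroids


# Synthetic single-feature records with two operating modes (not Fleet DNA data):
# 130 records, mirroring 10 training vehicles with 13 daily records each.
rng = np.random.default_rng(1)
records = np.concatenate(
    [rng.normal(10.0, 1.0, 65), rng.normal(50.0, 1.0, 65)])[:, None]
centroids = kmeans_profile(records, m=2)
print(np.sort(centroids[:, 0]))
```

Running the same function once per vocation yields the set of centroids $C_v$ that serves as that vocation's operating profile.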

3.3 Feature Reduction

Even though the starting dataset was manually re-

duced from 350 parameters to 15 features as de-

scribed in Section 3.1, a minimalist model is desir-

able in order to limit the deployment cost of the voca-

tion identiﬁer and promote its applicability in produc-

tion. This minimalist model should only include the

features that are necessary and practical for vocation

identiﬁcation. Feature reduction was performed using

the wrapper induction method (Khalid et al., 2014).

During each iteration of the feature reduction process,

the standard deviation of each target feature is eval-

uated for each cluster and the feature is removed if

the resulting value is below a certain pre-set threshold

across all the clusters. One feature was considered

per iteration until none of the features had a standard

deviation below this threshold. In addition, features


that are easier to collect (e.g., vehicle speed) were fa-

vored over features that may not be readily available

(e.g., characteristic acceleration and kinetic energy

density). In the remainder of the paper, the model

with the full feature set is labeled FFmodel and the

reduced feature model is labeled RFmodel.
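The standard-deviation test used in each feature reduction iteration can be sketched as below. The sketch assumes that the features have been brought to a comparable scale so that a single threshold is meaningful, and it picks the flattest candidate first; the paper does not state either detail, and the function name `feature_to_drop` is hypothetical.

```python
import numpy as np


def feature_to_drop(records, labels, threshold):
    """Index of one feature whose std is below threshold in every cluster, else None."""
    clusters = np.unique(labels)
    # per_cluster_std[k, f]: standard deviation of feature f inside cluster k.
    per_cluster_std = np.array(
        [records[labels == k].std(axis=0) for k in clusters])
    low_everywhere = (per_cluster_std < threshold).all(axis=0)
    candidates = np.flatnonzero(low_everywhere)
    if candidates.size == 0:
        return None
    # Tie-breaking rule (assumption): drop the candidate with the smallest spread.
    flattest = per_cluster_std[:, candidates].max(axis=0).argmin()
    return int(candidates[flattest])


# Made-up records: 2 clusters, 3 features, the last feature is nearly constant.
records = np.array([[10.0, 1.0, 0.50],
                    [12.0, 3.0, 0.51],
                    [50.0, 2.0, 0.50],
                    [54.0, 6.0, 0.52]])
labels = np.array([0, 0, 1, 1])

first = feature_to_drop(records, labels, threshold=0.1)
reduced = np.delete(records, first, axis=1)
second = feature_to_drop(reduced, labels, threshold=0.1)
print(first, second)
```

In a full wrapper loop the model would be retrained after each removal before the test is repeated, which this sketch leaves out.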

3.4 Vocation Assignment

Once the model is trained, it is exposed to a record $r_i$ from an unknown vehicle. That is, for each vocation $v$ and centroid $cv_j$ of $v$, the conditional probability $P(cv_j \mid r_i)$ is calculated. In the case of KM, this probability measure is binary. The record is then assigned to the vocation $v_T(r_i)$ with the largest probability according to the following equation:

$$v_T(r_i) = \operatorname*{argmax}_{v \in V} \Big( \max_{1 \le j \le m} \{ P(cv_j \mid r_i) \} \Big). \qquad (1)$$

Equation 1 is used for a single daily record from an unknown vehicle. When the unknown vehicle has multiple records, each record can be assigned to a different vocation and a consensus is needed to select the winning vocation. Let $R = \{r_1, \ldots, r_{|R|}\}$ represent the set of records of the unknown vehicle. The winning vocation of the unknown vehicle is the vocation that is assigned the highest number of records. This process is defined by the following equation, where $\mathbb{1}[\cdot]$ equals 1 when the enclosed condition holds and 0 otherwise:

$$\mathrm{voc}_T(R) = \operatorname*{argmax}_{v \in V} \Big( \sum_{i=1}^{|R|} \mathbb{1}[\, v_T(r_i) = v \,] \Big). \qquad (2)$$
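Equations (1) and (2) can be sketched in Python for the KM case, where the binary probability reduces to picking the vocation that owns the nearest centroid. The profile values, the two-feature records and the function names below are made up for illustration; with EM, the distance would be replaced by the largest cluster probability.

```python
from collections import Counter

import numpy as np


def record_vocation(record, profiles):
    """Equation (1), KM reading: vocation owning the centroid nearest to the record."""
    best = {v: np.linalg.norm(c - record, axis=1).min()
            for v, c in profiles.items()}
    return min(best, key=best.get)


def vehicle_vocation(records, profiles):
    """Equation (2): vocation(s) collecting the most daily records."""
    votes = Counter(record_vocation(r, profiles) for r in records)
    top = max(votes.values())
    winners = sorted(v for v, n in votes.items() if n == top)
    return winners, votes          # more than one winner signals a tie


# Hypothetical one-centroid profiles over two unscaled features
# (average speed in mph, daily distance in miles).
profiles = {
    "BT": np.array([[5.0, 10.0]]),
    "CT": np.array([[55.0, 300.0]]),
    "TB": np.array([[15.0, 90.0]]),
}
records = np.array([[50.0, 280.0], [14.0, 100.0], [58.0, 310.0], [52.0, 290.0]])

winners, votes = vehicle_vocation(records, profiles)
print(winners, dict(votes))
```

Returning the full list of top-scoring vocations makes ties visible, which is how the parenthesized tie counts in the one-versus-all tables can arise.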

Equations 1 and 2 show the traditional one-versus-all (T) assignment where all the vocations compete for the same vehicle at once. This assignment has an important limitation since the wrong vocations may weaken the chances of the correct vocation by acquiring several of the records of the unknown vehicle. This aspect is particularly important for the current application because the number of vocations can be large and the number of daily records available for each unknown vehicle is small.

In order to mitigate this potential limitation, the one-versus-one round-robin tournament (R) assignment was investigated. This assignment consists of multiple rounds where each vehicle is exposed to every combination of two vocations. The vocation of choice is the one that is assigned the most records across all of the rounds for a given vehicle as defined in (3) and (4):

$$v(r_i, a, b) = \operatorname*{argmax}_{v \in \{a, b\}} \Big( \max_{1 \le k \le m} \{ P(cv_k \mid r_i) \} \Big) \qquad (3)$$

$$\mathrm{voc}_R(R) = \operatorname*{argmax}_{v \in V} \Big( \sum_{a \ne v} \sum_{i=1}^{|R|} \mathbb{1}[\, v(r_i, a, v) = v \,] \Big) \qquad (4)$$
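The round-robin assignment of (3) and (4) can be sketched as follows, again using the KM reading in which the pairwise winner is the vocation with the nearer centroid. The profiles, records and function names are hypothetical, and the rule for breaking an exact distance tie is an assumption.

```python
from itertools import combinations

import numpy as np


def pairwise_winner(record, profiles, a, b):
    """Equation (3), KM reading: whichever of a, b owns the nearer centroid."""
    dist_a = np.linalg.norm(profiles[a] - record, axis=1).min()
    dist_b = np.linalg.norm(profiles[b] - record, axis=1).min()
    return a if dist_a <= dist_b else b    # exact ties go to a (assumption)


def round_robin(records, profiles):
    """Equation (4): records won by each vocation summed over all pairs."""
    wins = {v: 0 for v in profiles}
    for a, b in combinations(profiles, 2):  # c(c - 1)/2 pairs
        for record in records:
            wins[pairwise_winner(record, profiles, a, b)] += 1
    return max(wins, key=wins.get), wins


# Hypothetical one-centroid profiles (average speed in mph, distance in miles).
profiles = {
    "BT": np.array([[5.0, 10.0]]),
    "CT": np.array([[55.0, 300.0]]),
    "TB": np.array([[15.0, 90.0]]),
}
records = np.array([[50.0, 280.0], [14.0, 100.0], [58.0, 310.0], [52.0, 290.0]])

winner, wins = round_robin(records, profiles)
print(winner, wins)
```

Every pair of vocations is visited once, so the amount of work grows quadratically with the number of vocations.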

Unfortunately, the round-robin assignment has a quadratic time complexity with respect to the number of vocations. The tournament bracket (B) also follows the one-versus-one assignment and consists of multiple rounds where the unknown vehicle is only exposed to two vocations in each round. However, in the bracket assignment, a vocation is eliminated in each round. The vocation that is retained is the one that collects the highest number of records from the unknown vehicle in the round and this vocation proceeds to the next round. The assignment concludes when only one vocation remains.

Equation (5) shows the selection between two vocations a and b for one round of the bracket assignment. This equation is applied recursively in order to determine the winning vocation as shown in (6).

$$w(R, a, b) = \operatorname*{argmax}_{v \in \{a, b\}} \Big( \sum_{i=1}^{|R|} \mathbb{1}[\, v(r_i, a, b) = v \,] \Big) \qquad (5)$$

$$\mathrm{voc}_B(R) = w\Big(R, v_c, w\big(R, v_{c-1}, w(R, v_{c-2}, \ldots)\big)\Big) \qquad (6)$$

where $c$ is the number of vocations in $V = \{v_1, \ldots, v_c\}$, $v(r_i, a, b)$ is the pairwise record assignment of (3), and the innermost term of the recursion is $w(R, v_2, v_1)$. As opposed to the round-robin assignment, the recursion in Equation (6) evaluates (5) only $c − 1$ times, allowing the bracket assignment to have a linear time complexity with respect to the number of vocations.
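A minimal Python sketch of the bracket assignment follows, written as a loop rather than a recursion. It uses the KM reading of the pairwise comparison (nearer centroid wins); the seeding order of the vocations, the tie-breaking rules, the profile values and the function names are assumptions made for illustration.

```python
import numpy as np


def pairwise_winner(record, profiles, a, b):
    """Record-level choice between a and b: nearer centroid wins (KM reading)."""
    dist_a = np.linalg.norm(profiles[a] - record, axis=1).min()
    dist_b = np.linalg.norm(profiles[b] - record, axis=1).min()
    return a if dist_a <= dist_b else b


def round_winner(records, profiles, a, b):
    """Equation (5): the vocation of the pair that collects more records."""
    votes_a = sum(pairwise_winner(r, profiles, a, b) == a for r in records)
    votes_b = len(records) - votes_a
    return a if votes_a >= votes_b else b   # a tied round keeps a (assumption)


def bracket(records, profiles):
    """Equation (6): c - 1 elimination rounds, one vocation dropped per round."""
    order = list(profiles)                  # seeding order is an assumption
    survivor, rounds = order[0], 0
    for challenger in order[1:]:
        survivor = round_winner(records, profiles, survivor, challenger)
        rounds += 1
    return survivor, rounds


# Hypothetical one-centroid profiles (average speed in mph, distance in miles).
profiles = {
    "BT": np.array([[5.0, 10.0]]),
    "CT": np.array([[55.0, 300.0]]),
    "TB": np.array([[15.0, 90.0]]),
}
records = np.array([[50.0, 280.0], [14.0, 100.0], [58.0, 310.0], [52.0, 290.0]])

winner, rounds = bracket(records, profiles)
print(winner, rounds)
```

With an odd number of records per vehicle, such as the 13 used in this study, a round between two vocations cannot end in a tied vote under this hard per-record assignment.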

4 RESULTS AND DISCUSSION

The one-versus-all, round-robin and bracket assign-

ments are applied to the dataset described in Table 2.

During training, the centroids of each vocation are de-

termined using 130 daily records from the vocation.

The model is then exposed to the test vehicles.

4.1 One-versus-All Assignment

Table 3 shows the confusion matrix of the one-versus-

all FFmodel with KM and EM clustering. The results

are presented in this manner in order to facilitate the

analysis of confounding vocations and the identiﬁca-

tion of vocations with unique proﬁles. At the end of

the section, the aggregated accuracy, precision and re-

call of the models are discussed.

The assignment of a vehicle to a vocation follows (2). Each row in Table 3 represents a vocation. The entries are the number of vehicles of the target vocation (row) that are assigned to a given vocation (column). The numbers in parentheses represent the number of ties for each vocation. For example, the CT vocation has a total of 33 test vehicles (Table 2). Using the KM algorithm, 20 out of these vehicles were correctly assigned to the CT vocation. The remaining 13 vehicles were incorrectly assigned as follows: 6 to BT, 3 to DT, 2 to DV, 1 to TB and one vehicle was a tie between DT and TB. The KM FFmodel was able to correctly classify 51 out of the 81 test vehicles whereas the EM FFmodel shows 59 true positives.

Table 3: Vocation assignment of the test vehicles using the traditional one-versus-all KM and EM FFmodels.

Algorithm  Actual  BT     CT      DT      DV      TB
KM         BT      2      0       0       0       0
KM         CT      6      20      3(1)    2       1(1)
KM         DT      2      0       10      4(1)    2(1)
KM         DV      4      1       1       8       2
KM         TB      0      0       0       0       11
EM         BT      2      0       0       0       0
EM         CT      2      21(1)   4       1       4(1)
EM         DT      0      0       15      4       0
EM         DV      4      1       1       10      0
EM         TB      0      0       0       0       11

None of the BT vehicles were assigned to a different vocation under the two FFmodels. Despite the low number of test vehicles in this vocation (Table 3), this is still an indication of the unique BT profile. TB is another vocation with a distinct operational profile, with no vehicles incorrectly classified under either FFmodel. The large number of DT vehicles that are assigned to the DV vocation indicates that the two vocations may be similar, as discussed in Section 3.1.

Table 4: Vocation assignment of the test vehicles using the one-versus-all KM and EM RFmodels.

Algorithm  Actual  BT     CT      DT      DV      TB
KM         BT      2      0       0       0       0
KM         CT      4      22      2(1)    3(1)    1
KM         DT      2(1)   1       11(1)   1(1)    2(1)
KM         DV      4(2)   1       1(1)    7(1)    1
KM         TB      0      0       0       0       11
EM         BT      2      0       0       0       0
EM         CT      3      21(1)   4(1)    0(1)    3(1)
EM         DT      1      1       14      3       0
EM         DV      4(1)   0(1)    0       10(2)   0
EM         TB      0      0       0       0       11

Feature reduction as described in Section 3.3 was

performed on the models. The features that were

eliminated include Total Speed Standard Deviation

and Average Kinetic Power Density Demand. The eliminated features are indicated by a ‘∗’ in Table 1.

The reduced feature model (RFmodel) includes only

9 features which can all be derived from two readily

available parameters: speed and distance traveled.

Table 4 shows the confusion matrix of the RF-

model under KM and EM. The model generated 53

and 58 true positives with KM and EM, respectively.

The number of true positives for the reduced and full

feature models are similar. However, the number of

ties is higher for the reduced feature model. This is

expected as fewer parameters are available to distin-

guish among all the vocations at the same time. The

one-versus-one assignment was introduced to help ad-

dress this limitation.

4.2 Round-robin Assignment

The KM and EM round-robin FFmodels correctly classified 50 and 57 test vehicles, respectively

(Table 5). The number of true positives is comparable

to that of the corresponding traditional one-versus-all

model. However, the round-robin assignment does

not suffer from ties. The numbers of true positives

for the KM and EM RFmodels with round-robin as-

signment are 55 and 58, respectively (Table 6).

Table 5: Vocation assignment of the test vehicles using the round-robin KM and EM FFmodels.

Algorithm  Actual  BT     CT     DT     DV     TB
KM         BT      2      0      0      0      0
KM         CT      5      17     7      2      2
KM         DT      1      0      12     5      1
KM         DV      5      1      1      8      1
KM         TB      0      0      0      0      11
EM         BT      2      0      0      0      0
EM         CT      2      19     4      0      8
EM         DT      0      0      15     4      0
EM         DV      4      0      2      10     0
EM         TB      0      0      0      0      11

As in the case of the one-versus-all assignment,

EM performs better than KM for the round-robin

models. Moreover, compared to the one-versus-all

assignment, the round-robin assignment has a higher

number of true positives for all vocations except for

the CT vocation. As discussed in Section 3.1, this ex-

ception may be due to the fact that the CT vocation is

actually a combination of two or more vocations.

4.3 Bracket Assignment

Tables 7 and 8 show the bracket assignment for the

FFmodel and RFmodel, respectively. Similar to the

round-robin assignment, the bracket assignment does

not suffer from ties and the number of true positives

generated by the respective models is nearly the same.

In fact, the model with the highest number of true positives is the bracket RFmodel. While the difference in performance may be marginal, the bracket RFmodel offers several advantages: it scales linearly with respect to the number of vocations; it is less susceptible to an increasing number of vocations since only two vocations are compared at a time; and it uses a reduced feature set that is readily available.

Table 6: Vocation assignment of the test vehicles using the round-robin KM and EM RFmodels.

Algorithm  Actual  BT     CT     DT     DV     TB
KM         BT      2      0      0      0      0
KM         CT      4      19     6      3      1
KM         DT      0      1      13     2      3
KM         DV      2      1      2      10     1
KM         TB      0      0      0      0      11
EM         BT      2      0      0      0      0
EM         CT      4      20     6      1      2
EM         DT      0      1      15     3      0
EM         DV      5      0      0      11     0
EM         TB      0      0      1      0      10

Table 7: Vocation assignment of the test vehicles using the bracket KM and EM FFmodels.

Algorithm  Actual  BT     CT     DT     DV     TB
KM         BT      2      0      0      0      0
KM         CT      5      19     6      1      2
KM         DT      1      0      12     5      1
KM         DV      5      1      1      8      1
KM         TB      0      0      0      0      11
EM         BT      2      0      0      0      0
EM         CT      2      19     4      1      7
EM         DT      0      0      15     4      0
EM         DV      4      1      1      10     0
EM         TB      0      0      0      0      11

Table 8: Vocation assignment of the test vehicles using the bracket KM and EM RFmodels.

Algorithm  Actual  BT     CT     DT     DV     TB
KM         BT      2      0      0      0      0
KM         CT      4      20     5      3      1
KM         DT      1      1      13     1      3
KM         DV      2      2      1      10     1
KM         TB      0      0      0      0      11
EM         BT      2      0      0      0      0
EM         CT      4      21     4      1      3
EM         DT      2      1      13     3      0
EM         DV      5      0      0      11     0
EM         TB      0      0      0      0      11

The above results focus on the true positive as-

signments generated by each model. They show that

the bracket model delivers the same or higher number

of correct assignments compared to the other models

while being computationally more efﬁcient than the

round-robin model and more scalable than the one-

versus-all model. In the remainder of this section, we

also show that these beneﬁts do not come at the ex-

pense of a signiﬁcantly lower precision or recall.

The average accuracy of the models across all vo-

cations is 85% or higher. The precision and recall

of these models are included in Table 9. This table

shows that for each of the three assignments, EM has

higher precision and recall than KM. The results also

show that the reduced feature models have higher pre-

cision and recall compared to the full feature models.

Finally, the model with the highest precision and recall (i.e., 75.3%) is the one-versus-all EM RFmodel. The

bracket RFmodel has a higher precision and recall

(71.6%) than all round-robin models.
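As a worked example, the aggregate metrics can be recomputed from a confusion matrix. The Python sketch below uses the bracket KM FFmodel counts from Table 7 and assumes that recall is micro-averaged over all test vehicles and that the average accuracy is the mean of the per-vocation one-versus-rest accuracies; the paper does not state the averaging scheme, so this is one plausible reading.

```python
import numpy as np

VOCATIONS = ["BT", "CT", "DT", "DV", "TB"]
# Rows: actual vocation, columns: assigned vocation (bracket KM FFmodel, Table 7).
confusion = np.array([
    [2,  0,  0, 0,  0],
    [5, 19,  6, 1,  2],
    [1,  0, 12, 5,  1],
    [5,  1,  1, 8,  1],
    [0,  0,  0, 0, 11],
])

total = confusion.sum()                     # 81 test vehicles
true_pos = np.diag(confusion)
micro_recall = true_pos.sum() / total       # correctly assigned / all vehicles

actual = confusion.sum(axis=1)              # vehicles per actual vocation
assigned = confusion.sum(axis=0)            # vehicles per assigned vocation
true_neg = total - actual - assigned + true_pos
per_vocation_accuracy = (true_pos + true_neg) / total
mean_accuracy = per_vocation_accuracy.mean()

print(f"micro recall:  {micro_recall:.1%}")   # 64.2%
print(f"mean accuracy: {mean_accuracy:.1%}")  # 85.7%
```

Under these assumptions the recall agrees with the 64.2 reported for the bracket KM FFmodel in Table 9, and the mean accuracy is consistent with the stated level of about 85%.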

5 CONCLUSIONS

This paper introduced a methodology for vocation

identification of heavy-duty vehicles when the number

of vocations is expected to be large and the number of

records available for each unknown vehicle is small.

The proﬁle of the vocation is ﬁrst developed using a

set of training vehicles. This proﬁle consists of a set

of centroids that represent the operating modes of the

vocation. The unknown vehicle is then assigned to a

vocation using a tournament bracket. In each round,

two vocations are compared to the unknown vehicle

and the unlikely vocation is eliminated. This assign-

ment was compared to the one-versus-all and round-

robin assignments. Two models were considered. The

ﬁrst was based on 15 features. Some of these features

included complex variables which may not be acces-

sible to the parts’ manufacturer. The second model

is more practical and was limited to 9 features that

can be derived solely from speed and distance trav-

eled. Compared to the full feature model, the reduced

feature model had higher precision and recall.

Table 9: Precision (P) and recall (R), in percent, of the models with the three different assignments.

                  One-versus-all   Round Robin     Bracket
Model    Metric   KM     EM        KM     EM       KM     EM
FFmodel  P        62.2   73.8      61.7   70.4     64.2   70.4
FFmodel  R        63.0   72.8      61.7   70.4     64.2   70.4
RFmodel  P        66.3   75.3      67.9   67.9     69.1   71.6
RFmodel  R        65.4   75.3      71.6   71.6     69.1   71.6

With the exception of the CT vocation, the number

of true positives for each vocation using the bracket

assignment is also either the same or higher than the

true positives obtained using the one-versus-all and

the round-robin assignments. The bracket assignment

was introduced to avoid some of the drawbacks of the

one-versus-all assignment for this application. The

latter assignment inherently implies the availability of

a large number of records for the unknown vehicles

as these records are exposed to all the clusters of all

the vocations at once. The bracket assignment over-

comes this limitation by comparing two vocations at

a time and was shown in this study to have a com-

parable performance to that of the one-versus-all as-

signment. The bracket assignment was also compared

to a round-robin assignment which scales with an in-

creasing number of vocations. The results show that

the bracket assignment has higher precision and recall

and, most importantly, lower time complexity: a bracket over N vocations needs N - 1 pairwise comparisons, whereas a round-robin needs N(N - 1)/2.
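The elimination procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each vocation profile is an array of centroids, that a match is decided by a nearest-centroid majority vote over the unknown vehicle's daily records, and that ties go to the first vocation. The names `match_winner` and `bracket_assign`, the pairing order, the bye rule, and the toy data are all hypothetical.

```python
import numpy as np


def match_winner(records, profile_a, profile_b):
    """Return 0 if profile A wins the match, 1 if profile B wins.

    Assumed rule: each record votes for the profile that holds its
    nearest centroid; ties are (arbitrarily) given to profile A.
    """
    # Distance from each record to the closest centroid of each profile.
    d_a = np.linalg.norm(records[:, None, :] - profile_a[None, :, :], axis=2).min(axis=1)
    d_b = np.linalg.norm(records[:, None, :] - profile_b[None, :, :], axis=2).min(axis=1)
    votes_a = int(np.sum(d_a <= d_b))
    votes_b = len(records) - votes_a
    return 0 if votes_a >= votes_b else 1


def bracket_assign(records, profiles):
    """Single-elimination bracket; returns (winning label, matches played)."""
    contenders = list(profiles)
    matches = 0
    while len(contenders) > 1:
        next_round = []
        # Pair adjacent contenders; the loser of each match is eliminated.
        for i in range(0, len(contenders) - 1, 2):
            a, b = contenders[i], contenders[i + 1]
            winner = match_winner(records, profiles[a], profiles[b])
            next_round.append(a if winner == 0 else b)
            matches += 1
        # With an odd number of contenders, the last one gets a bye.
        if len(contenders) % 2 == 1:
            next_round.append(contenders[-1])
        contenders = next_round
    return contenders[0], matches


# Toy data: five hypothetical vocations, two centroids each, two features.
profiles = {
    "A": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "B": np.array([[10.0, 10.0], [11.0, 10.0]]),
    "C": np.array([[20.0, 0.0], [21.0, 0.0]]),
    "D": np.array([[0.0, 20.0], [0.0, 21.0]]),
    "E": np.array([[-10.0, -10.0], [-11.0, -10.0]]),
}
# Three daily records from an unknown vehicle.
records = np.array([[20.2, 0.1], [20.9, -0.1], [19.8, 0.3]])

winner, matches = bracket_assign(records, profiles)
print(winner, matches)
```

Each match removes exactly one vocation, so the `matches` counter ends at one less than the number of profiles regardless of the data; the unknown vehicle's records are only ever compared against the centroids of two vocations at a time.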

There are several directions that are being consid-

ered for future work including exploring the possi-

bility of reducing vocation confounding by applying

weights to speciﬁc features. In addition, the proposed

vocation identiﬁcation algorithm relies on features ag-

gregated daily from the duty cycle of the vehicle over

a period of 13 days. Using data points collected over

shorter sample periods will enhance the applicability

of the algorithm to a wide range of vehicles.

ACKNOWLEDGEMENTS

This research was supported in part by Allison Trans-

mission, Inc.

REFERENCES

Athimethphat, M. and Lerteerawong, B. (2012). Bi-

nary classiﬁcation tree for multiclass classiﬁca-

tion with observation-based clustering. In 9th

International Conference on Electrical Engineer-

ing/Electronics, Computer, Telecommunications and

Information Technology, pages 1–4.

Breiman, L. (2001). Random forests. Machine learning,

45(1):5–32.

Chakraborty, A., Faujdar, N., Punhani, A., and Saraswat, S.

(2020). Comparative study of k-means clustering us-

ing iris data set for various distances. In 10th Interna-

tional Conference on Cloud Computing, Data Science

& Engineering (Conﬂuence), pages 332–335.

Daengduang, S. and Vateekul, P. (2017). Applying

one-versus-one SVMs to classify multi-label data with large

labels using Spark. In 9th International Conference on

Knowledge and Smart Technology, pages 72–77.

Duran, A., Phillips, C., Perr-Sauer, J., Kelly, K., and Konan,

A. (2018). Leveraging big data analysis techniques for

US vocational vehicle drive cycle characterization,

segmentation, and development. Technical report, SAE

Technical Paper.

Ester, M., Kriegel, H., Sander, J., Xu, X., et al. (1996).

A density-based algorithm for discovering clusters in

large spatial databases with noise. KDD, 96(34):226–231.

Kanemaru, Y., Matsuura, S., Kakiuchi, M., Noguchi, S., In-

omata, A., and Fujikawa, K. (2013). Vehicle cluster-

ing algorithm for sharing information on trafﬁc con-

gestion. In 13th International Conference on ITS

Telecommunications, pages 38–43. IEEE.

Kennedy, J. and Eberhart, R. (1995). Particle swarm opti-

mization. In International Conference on Neural Net-

works, volume 4, pages 1942–1948. IEEE.

Khalid, S., Khalil, T., and Nasreen, S. (2014). A survey of

feature selection and feature extraction techniques in

machine learning. In Science and Information Con-

ference, pages 372–378. IEEE.

McInnes, L. and Healy, J. (2017). Accelerated hierarchical

density based clustering. In International Conference

on Data Mining Workshops (ICDMW), pages 33–42.

IEEE.

Murphy, P. and Pazzani, M. (1991). ID2-of-3: Constructive

induction of m-of-n concepts for discriminators in de-

cision trees. In Machine Learning Proceedings, pages

183–187. Elsevier.

NREL (2019). Fleet DNA project data.

Sagi, O. and Rokach, L. (2018). Ensemble learning: A sur-

vey. WIREs: Data Mining & Knowledge Discovery,

8(4):1.

Schölkopf, B. and Smola, A. (2001). Learning with kernels:

support vector machines, regularization, optimization,

and beyond. MIT Press.

Shin, Y., Goh, Y., Lee, C., and Chung, J. (2019). Effective

data structure for smart big data systems applying an

expectation-maximization algorithm. In Third World

Conference on Smart Trends in Systems Security and

Sustainability (WorldS4), pages 136–140.

Wahba, G. (2002). Soft and hard classification by

reproducing kernel Hilbert space methods. Proceedings of the

National Academy of Sciences, 99(26):16524–16530.

Wang, J., Yuan, Y., Ni, T., Ma, Y., Liu, M., Xu, G., and

Shen, W. (2020). Anomalous trajectory detection and

classiﬁcation based on difference and intersection set

distance. IEEE Transactions on Vehicular Technology,

69(3):2487–2500.

VEHITS 2021 - 7th International Conference on Vehicle Technology and Intelligent Transport Systems
