Determining Capacity of Shunting Yards by Combining Graph
Classification with Local Search
Arno van de Ven¹, Yingqian Zhang¹, Wan-Jui Lee², Rik Eshuis¹ and Anna Wilbik¹
¹Eindhoven University of Technology, Eindhoven, The Netherlands
²Maintenance Development, NS (Dutch Railways), Utrecht, The Netherlands
Keywords:
Planning and Scheduling, Machine Learning, Convolutional Neural Networks, Classification, Local Search.
Abstract:
Dutch Railways (NS) uses a shunt plan simulator to determine capacities of shunting yards. Central to this
simulator is a local search heuristic. Solving this capacity determination problem is very time consuming, as it
requires solving an NP-hard shunting planning problem, and furthermore, the capacity has to be determined for
a large number of possible scenarios at over 30 shunting yards in The Netherlands. In this paper, we propose
to combine machine learning with local search in order to speed up finding shunting plans in the capacity
determination problem. The local search heuristic models the activities that take place on the shunting yard as
nodes in an activity graph with precedence relations. Consequently, we apply the Deep Graph Convolutional
Neural Network, which is a graph classification method, to predict whether local search will find a feasible
shunt plan given an initial solution. Our experimental results show our approach can significantly reduce the
simulation time in determining the capacity of a given shunting yard. This study demonstrates how machine
learning can be used to boost optimization algorithms in an industrial application.
1 INTRODUCTION
The Dutch Railways (NS) operates 4,800 domestic
trains serving more than 1.2 million
passengers each day. When trains are temporarily not
needed to operate a given timetable they are main-
tained and cleaned at dedicated shunting yards. Here,
NS is dealing with the so-called shunting activities
(Boysen et al., 2012). An example of a shunting yard
is shown in Figure 1.
Figure 1: Shunting yard with specific tracks for inspection
and cleaning activities. Source: www.sporenplan.nl.
NS is expanding its fleet of train units by 37% in
the next five years. The management team questions
whether the capacity of existing shunting yards is sufficient
to handle the expansion. A shunt plan simulator has
been developed to help solve this capacity determina-
tion problem. The simulator is used to both determine
the capacity of shunting yards as well as analyze dif-
ferent scenarios on each shunting yard. Central to the
simulator is a local search heuristic (or LS) (van den
Broek, 2016; van den Broek et al., 2018). LS requires
an initial solution as a starting point, which is created
by a simple sequential algorithm that contains all im-
portant features to be able to apply search operators.
After a predefined running time, LS either returns a
feasible plan, or cannot find any feasible plan.
LS is much more computationally efficient than
the previously formulated mathematical optimization
model (Kroon et al., 2008). However, given that there
are over 30 shunting yards in The Netherlands, and
more than 50 possible scenarios to be evaluated for
each shunting yard, NS has been looking for ways
to speed up capacity determination. In this paper,
we propose to use machine learning to approximate
the local search heuristic, that is, we learn the rela-
tion between the input instances and the correspond-
ing outputs of LS. Given any initial solution, a con-
structed classification model predicts whether LS can
find a feasible solution before actually applying LS. In
this way, LS does not have to evaluate every generated
initial solution, and hence the computation time for
determining the maximum capacity of a given shunting
yard is greatly reduced.
Essential to any local search algorithm is a solu-
tion representation that properly captures all impor-
tant aspects of the solution. The local search heuris-
tic by (van den Broek, 2016; van den Broek et al.,
2018) models the activities that take place on the
shunting yards as nodes in an activity graph. Repre-
senting shunt plans as activity graphs enables us to
use graph classification. Recent research on graph
classification has proven to achieve high accuracy in
predicting the class labels of an arbitrary graph, see
e.g., (Zhang et al., 2018), (Niepert et al., 2016) and
(Kipf and Welling, 2016). Therefore, in this paper,
we use a Deep Graph Convolutional Neural Network
(DGCNN) (Zhang et al., 2018) to train a model that
predicts the class label, i.e., feasible or infeasible, of
each graph in the dataset. To assess the effectiveness
of our approach, we measure the decrease in compu-
tation time in our experiments. We demonstrate how
machine learning methods can be used to boost opti-
mization algorithms in an industrial application.
The rest of our paper is organized as follows. In
Section 2, we describe background information, re-
lated work and clarify the position of our work within
the simulation process. Section 3 shows how we use
DGCNN to approximate LS. Section 4 describes the
experimental setup and the results in terms of prediction
accuracy and reduced computation time. Section 5 concludes.
2 BACKGROUND AND RELATED WORK
2.1 Shunt Plan Simulator
The shunt plan simulator at NS consists of three se-
quential stages: (1) generating an instance of a given
shunting yard, (2) generating an initial solution, and
(3) finding a feasible solution using a local search
heuristic. The maximum capacity of a given shunt-
ing yard is then determined by repeatedly running the
local search heuristic with different instances of dif-
ferent scenarios. After a sufficient number of runs,
the simulation converges towards a number of train
units for which the heuristic can solve at least 95% of
the instances. This number is used to determine the
capacity of the given shunting yard. The capacity is
defined as the number of train units a shunting yard
can serve during a 24-hour time period.
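The following sketch illustrates this loop in Python. It is a simplification; the helpers generate_instance, build_initial_solution and local_search are hypothetical stand-ins for the three simulator stages, not the NS implementation.

# A minimal sketch of the capacity-determination loop, under the
# assumptions stated above.

def feasibility_rate(yard, scenario, n_units, n_runs=100, time_limit=300):
    """Fraction of sampled instances that LS solves within the time limit."""
    solved = 0
    for _ in range(n_runs):
        instance = generate_instance(yard, scenario, n_units)   # stage 1
        initial = build_initial_solution(instance)              # stage 2
        plan = local_search(initial, max_seconds=time_limit)    # stage 3
        if plan is not None:                                    # feasible plan found
            solved += 1
    return solved / n_runs

def determine_capacity(yard, scenario, start=1, threshold=0.95):
    """Largest number of train units solvable in at least 95% of instances."""
    n_units = start
    while feasibility_rate(yard, scenario, n_units) >= threshold:
        n_units += 1
    return n_units - 1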
Figure 2 shows a diagram explaining the software
structure of the simulator. The instance generator is
a parameterizable program, developed by NS, which
derives instances for the Train Unit Shunting Problem
automatically. Instances can be generated for each
shunting yard individually with parameters specifi-
cally based on a day-to-day schedule at that shunt-
ing yard. Examples of parameters are number of train
units, arrival/departure distribution and the set of ser-
vice tasks that can be performed. Parameters can be
changed to test different scenarios.
Figure 2: Diagram of the software structure of the simulator, comprising the instance generator, the initial solution generator, the DGCNN feasibility prediction, the local search heuristic, storage and retrieval, and the capacity analyzer.
The output of the instance generator is a set of arriving
trains (AT), a set of departing trains (DT), and a set of
service tasks for each train unit. For both AT and DT,
the train composition, train units and arrival/departure
times are specified. The set of service tasks lists, for
each train unit, the tasks that have to be done in the
time that the train unit is present on the service site.
Trains can be composed of one or more train units of
the same type, which are a set of carriages that form
a self-propelling vehicle that can drive in both direc-
tions. Of the same train unit type, there exist multi-
ple subtypes, where the subtype indicates how many
carriages the train unit consists of. Figure 3 shows
a train unit type and corresponding subtypes.
Figure 3: Train unit type VIRM with 6 and 4 carriages.
The output of the instance generator serves as in-
put for the initial solution generator. The algorithm
of Hopcroft-Karp (Hopcroft and Karp, 1973) is used
to produce a matching between arriving and depart-
ing train units. Next, a service task schedule is con-
structed in a greedy way, which forms an initial solu-
tion of the given instance. Note that generally initial
solutions are not feasible, that is, an initial solution
may violate the temporal or routing constraints. The
purpose of an initial solution is that it contains all im-
portant features to serve as a starting point for the lo-
cal search heuristic to find a feasible solution. In ear-
lier work (van den Broek, 2016; van den Broek et al.,
2018), 11 operators in LS have been defined to move
through the search space. LS ends when a feasible
solution has been found or when the predefined maximum
runtime has been reached. In the latter case, no
feasible solution is found. Experiments in (van den
Broek, 2016) show that LS is capable of finding feasible
shunt plans in both artificial and real-world scenarios.
The performance of LS has been compared to
a mathematical optimization model developed at NS
that tries to find the optimal solution, and LS is capable
of planning more train units in most experiments.
As it is computationally expensive to use LS to eval-
uate every instance, in this work, we approximate LS
using a machine learning model.
2.2 Optimization Methods with
Machine Learning
In recent years, many studies have investigated boost-
ing optimization using machine learning, see e.g.
(Meisel and Mattfeld, 2010; Lombardi and Milano,
2018). For instance, in (Verwer et al., 2017), the au-
thors use regression models and decision trees to pre-
dict outcomes of auctions, and such predictive mod-
els are consequently used to evaluate and design op-
timal auction parameters. In (Defourny et al., 2012),
the authors combine the estimation of statistical mod-
els to return a decision rule given a state with sce-
nario tree techniques from multi-stage stochastic pro-
gramming. In the context of planning and schedul-
ing, (Peer et al., 2018) develop a Deep Reinforcement
Learning (DRL) solution to decide the best strategy
of parking trains. In their work, the existing opti-
mization model is completely replaced by a machine
learning model.
In our work, we use machine learning techniques
to learn the relation between the input instances and
the corresponding outcomes of local search. Our
work is also similar to the research line of simula-
tion optimization. The idea of simulation optimiza-
tion is to combine meta-heuristic search algorithms
with function approximation models for fitness ap-
proximation in order to reduce the time on determin-
ing the capacity (Carson and Maria, 1997; Amaran
et al., 2016).
The position of our work in the shunt plan sim-
ulator is between the initial solution generation and
applying initial solutions to local search (Figure 2).
After generating an initial solution, a trained classifi-
cation model (DGCNN) predicts whether LS can find
a feasible solution. If the outcome is positive, LS is
applied to find a feasible solution. Otherwise, the neg-
ative outcome leads to discarding the initial solution
and drawing a new instance from the instance genera-
tor. Therefore, accurately predicting feasibility leads
to a decrease in computation time since less time is
wasted on instances that may turn out to be infeasible
(see Section 4).
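A minimal sketch of this gating step, reusing the hypothetical helpers from Section 2.1 and adding the assumed names to_activity_graph and predict_feasible for the graph encoding and the trained DGCNN:

# Sketch of where the trained classifier sits in the pipeline (Figure 2).

def run_with_gate(yard, scenario, n_units, time_limit=300):
    """Draw instances until one is classified feasible, then run LS on it."""
    while True:
        instance = generate_instance(yard, scenario, n_units)
        initial = build_initial_solution(instance)
        if predict_feasible(to_activity_graph(initial)):
            # Classified feasible: hand over to LS (it may still fail,
            # which corresponds to a false positive).
            return local_search(initial, max_seconds=time_limit)
        # Classified infeasible: discard and draw a new instance.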
3 APPROXIMATING LS USING
DGCNN
A shunt plan can be modelled as an activity graph.
Figure 4 shows an example of an activity graph. The
activity nodes, including arrival (A), service (S),
parking (P), movement (M) and departure (D), are
connected by edges indicating the precedence relations.
The solid, black arcs represent the order of operations
of one or more train units. The corresponding
train units of the nodes are given between parentheses.
The blue edges determine the order of the movements,
and the green edge indicates which service task is
completed first. The assigned track for each parking
node and the specific service task for each service
node are shown in subscript.
In this paper, we aim to predict whether an initial
solution that is represented by a given activity graph
can lead to a feasible solution. To this end, we treat
the prediction problem as a graph classification prob-
lem. We are given a graph G = (V, E), where V is a finite
set of nodes and E is a finite set of edges. Node features
encode information about tracks, train units, duration
and activities. Each graph G_i in the dataset has a corresponding
class label y_i ∈ C, where C = {0 (infeasible), 1 (feasible)}.
The accuracy of the derived model is assessed by comparing
the predicted label y'_i with the actual label y_i.
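As an illustration of this encoding, the following sketch builds one small labelled activity graph with networkx; all node names, attribute names and values are invented for the example and do not come from the NS data model.

import networkx as nx

# A minimal, illustrative encoding of one initial solution as a
# labelled activity graph with precedence edges.
def example_activity_graph():
    g = nx.DiGraph()
    g.add_node("A1", label="A", train_units=(2401,), start="12:00")  # arrival
    g.add_node("M1", label="M", train_units=(2401,))                 # movement
    g.add_node("S1", label="S", train_units=(2401,), task="cleaning")
    g.add_node("P1", label="P", train_units=(2401,), track=104)      # parking
    g.add_node("D1", label="D", train_units=(2401,), start="13:00")  # departure
    g.add_edges_from([("A1", "M1"), ("M1", "S1"), ("S1", "P1"), ("P1", "D1")])
    return g

# Dataset: pairs (graph, y) with y in {0, 1} as defined above.
dataset = [(example_activity_graph(), 1)]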
There are many successful machine learning al-
gorithms that could be used to predict feasibility of
initial solutions. However, most algorithms involve
heavy feature engineering on problem instances. Re-
cently, a Deep Graph Convolutional Neural Network
(DGCNN) has been proposed in (Zhang et al., 2018)
for graph classification, which accepts graphs of arbi-
trary structure. The proposed architecture addresses
two main challenges: (1) how to extract useful features
characterizing the rich information encoded in a graph
and (2) how to sequentially read
a graph in a meaningful and consistent order.
To tackle the first challenge, graph convolution
layers are used to extract local substructure features
from nodes and define a consistent node ordering.
Figure 4: The activity graph of a shunt plan. The activity nodes in a shunt graph are encoded with starting and/or ending times. For clarity, only a few starting and ending times are visualized.
Figure 5: The overall structure of the DGCNN used in our problem: input graph, graph convolution layers, 1-D convolution and dense layers. An input graph of arbitrary structure is first passed through multiple graph convolution layers, where node labels are propagated between neighbors. The node features are then passed to traditional CNN structures to learn a predictive model. The figure is slightly modified from (Zhang et al., 2018).
Their graph convolution model effectively mimics
two popular kernels, the Weisfeiler-Lehman Subtree
Kernel (Shervashidze et al., 2011b) and the Propa-
gation Kernel (Neumann et al., 2016), explaining its
graph-level classification performance. To address the
second challenge, a SortPooling layer is introduced,
which sorts the node features under the previously
defined order and unifies input sizes. This is done
because, in contrast to images, graphs lack a tensor
representation with a fixed ordering, which limits the
applicability of neural networks on graphs. Finally,
traditional convolutional and dense layers are added
to read the sorted graph representations and make
predictions. The authors of (Zhang et al., 2018) show
that DGCNN achieves good performance on several
graph datasets, such as social networks. In this paper,
we apply a modified DGCNN, which is described as
follows.
The second localized graph convolution step involves
appending the node labels of neighbouring nodes
to the original node labels. The variety of original node
labels determines how many new node labels will be
created after appending neighbouring node labels. Local
search specifies eight different activities in shunt
graphs. This original representation can be modified
to include more information in the graphs: the
number of original node labels can be increased by
including specific types of activities, to effectively
exploit the graph structure for the classification task.
In our problem, shunt graphs contain, among others,
Parking (P) and Service (S) activity nodes. Instead
of just using P and S as original node labels,
both can be encoded with more information. The specific
parking track can be appended to get P_i, where
i = 1, ..., T and T is the number of parking tracks
on a shunting yard. The specific service task can be
appended to get S_i, where i = 1, ..., ST and ST is
the number of service tasks that can be performed on
a shunting yard. Experiments showed that including
both P_i and S_i is beneficial.
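A small sketch of this label refinement, assuming the illustrative node attributes used in the activity-graph example above:

# Refine P into P_i (one label per parking track) and S into S_i
# (one label per service task); attribute names are illustrative.
def refine_label(node_attrs):
    label = node_attrs["label"]
    if label == "P":
        return f"P_{node_attrs['track']}"   # one label per parking track
    if label == "S":
        return f"S_{node_attrs['task']}"    # one label per service task
    return label                            # other activities keep their label

# e.g. refine_label({"label": "P", "track": 104}) -> "P_104"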
As the train unit shunting problem is a scheduling
problem, the activity nodes in a shunt graph are encoded
with starting and ending times. Therefore, the
nodes in an activity graph are implicitly sorted based
on the starting time. Thus, the sorting function of
SortPooling in DGCNN is redundant and is therefore
removed from our model. Figure 5 shows the network
structure that we use in our problem. It is slightly
modified from DGCNN in (Zhang et al., 2018).
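For concreteness, the following PyTorch sketch implements one graph convolution layer following the propagation rule Z = f(D^-1 (A + I) X W) of (Zhang et al., 2018); it is a simplified single-graph version, not the authors' released implementation.

import torch
import torch.nn as nn

# One DGCNN-style graph convolution layer: add self-loops, aggregate
# neighbour features, normalize by degree, apply a linear map and tanh.
class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, adj, x):
        # adj: (n, n) adjacency matrix; x: (n, in_dim) node features
        a_hat = adj + torch.eye(adj.size(0))           # add self-loops
        deg_inv = 1.0 / a_hat.sum(dim=1, keepdim=True) # D^-1 as a column
        return torch.tanh(deg_inv * (a_hat @ self.lin(x)))

# Because shunt-graph nodes are already ordered by starting time, the
# SortPooling sort is skipped: layer outputs are concatenated and fed
# to the 1-D convolution and dense layers directly.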
4 EXPERIMENTS
We evaluate how much running time can be reduced
in determining capacity in shunting yards with our ap-
proach. To this end, we first generate and analyze data
from the simulator. Then we report the performance
of the DGCNN model on predicting whether initial
plans would lead to feasible plans. From the perfor-
mance of DGCNN, we can finally estimate the differ-
ence of running time with or without using DGCNN
feasibility prediction in the simulator (illustrated in
Figure 2).
4.1 Data Generation
In order to evaluate our method, we generate data in-
stances from the instance generator in the shunt plan
simulator. The instance generator can be specified ac-
cording to a set of input parameters based on the day-
to-day schedule at the given service site. The most
important parameters include: (1) number of train
units, (2) different train unit types and subtypes, (3)
probability distributions of arrivals per train unit type,
and (4) set of service tasks including duration.
We generated 10,000 instances with 21 train units
based on one of the service sites operated by NS. The
number of 21 train units was purposely chosen: an
increasing number of train units increases the difficulty
of finding feasible solutions. Preliminary
experiments showed that instances with fewer
train units are rather easy for the local search algorithm
to solve and hence less insightful
and valuable to the business. For the shunting yard
that we used in the experiments, the instances with 20
to 22 train units are the most interesting for NS, as they
are neither easy nor too difficult for LS. Among them,
initial solutions generated for 21 train units are the
hardest to classify correctly, and therefore they
are considered the most suitable data to explore the
usefulness of our approach to NS.
Initial solutions were created for all instances and
LS was applied to solve them. The maximum running
time for LS to solve each instance is set to 300 sec-
onds. Among 10,000 instances, LS was not able to
find feasible solutions for 2,750 instances. The out-
comes (feasible, infeasible) were recorded as classifi-
cation labels, where feasible instances (class 1) are
initial solutions leading to feasible plans using LS
within 300 seconds, while infeasible ones (class 0)
are those for which LS could not find a feasible plan
within the time limit.
Figure 6: Distribution of iterations for feasible and infeasi-
ble solutions.
Figure 7: Scatter plot of iterations versus runtime.
Figure 8: Histogram of the runtime (in seconds) for feasible
solutions.
Figure 9: T-SNE visualization on the data instances.
Figure 6 visualizes the distribution of iterations
for both feasible and infeasible instances. Regarding
feasible solutions, the minimum and maximum num-
bers of iterations in local search are 108 and 2599
respectively with an average of 733 iterations. The
minimum and maximum number of iterations for in-
feasible solutions are 1057 and 2939 with an average
of 1962 iterations. Clearly, the number of iterations
for infeasible solutions is much higher, because local
search ran for the maximum time of 300 seconds
without finding a feasible solution. Figure
7 shows a scatter plot with the number of iterations
on the x-axis and the runtime on the y-axis. The runtime
of feasible instances increases as the number of
iterations increases. The spread in the beginning is small,
meaning that the time per iteration is quite similar. As
the runtime increases, the spread becomes larger. Figure
8 shows a histogram of the runtime for all feasible
instances. Infeasible instances are omitted for clarity
because their runtime is always around 300 seconds.
Considering feasible instances, the minimum runtime
is 1 second, while the maximum runtime is 300 seconds.
The average runtime is 96 seconds. About 80% of
all feasible instances were solved within 150 seconds.
In addition, we use T-distributed Stochastic
Neighbor Embedding (t-SNE) (van der Maaten and
Hinton, 2008) to visualize the generated data instances.
It models each high-dimensional object by a two-dimensional
point in such a way that similar objects
are modeled by nearby points and dissimilar objects
by distant points with high probability.
t-SNE can be useful for visualizing patterns in data as
well as the separability of classes: clearly separable
clusters indicate that good classification can be achieved,
while mixed clusters indicate the opposite. Figure 9
shows a visualization of the node feature data after
passing the graphs through multiple convolution layers.
The figure indicates that these two classes of graphs
highly overlap and thus form a difficult classification
problem.
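A sketch of how such a visualization can be produced with scikit-learn; the feature matrix and labels below are random placeholders for the convolved node features and class labels described above.

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

# Placeholders: one feature row per graph and its class label.
features = np.random.rand(200, 64)
labels = np.random.randint(0, 2, size=200)

# Embed into two dimensions and plot, coloured by class.
embedded = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="coolwarm", s=8)
plt.title("t-SNE of graph features (0 = infeasible, 1 = feasible)")
plt.show()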
4.2 Predicting Feasibility by DGCNN
The PyTorch (0.4.0) implementation of DGCNN is
used with Python (3.6.4) for the experiments. Train-
ing was done on a 1.7 GHz Intel Core i7 MacBook
Air. The DGCNN implementation is not parallelized,
thus only 1 CPU core is used. Every time a new epoch
begins, training data is randomly shuffled and pro-
cessed in batches of several graphs to enable faster
learning.
When applying DGCNN, we need to determine
the level of details, or node representation, on the
node labels in the graph. We apply the Weisfeiler-
Lehman subtree kernel (Shervashidze et al., 2011a) to
append node labels of the neighbouring nodes to the
original node labels. The appended labels are sorted
alphabetically and compressed into new, shorter la-
bels. At the end of an iteration, the counts of the
original node labels and the counts of the compressed
node labels are represented as a feature vector. Neu-
ral networks are trained on these feature vectors. The
original node labels define how many new node labels
will be created after appending neighbouring node la-
bels. The length of the feature vector depends on the
number of different node labels in the initial solution.
Figure 10 visualizes how the length of the feature vector
changes if the node labels differ for the same graphs.
In the left graph, all nodes originally have the same
node label. The right graph originally contains three
different node labels. The appended and compressed
labels after one iteration of the Weisfeiler-Lehman
subtree kernel are visualized below both graphs. The
feature vectors of both graphs contain the counts of
the compressed node labels after one iteration. As
can be seen, the length of the feature vector grows
as the level of detail (the variety of node labels in
the original graph) increases.
Figure 10: Different feature vectors for different amounts
of node labels.
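The relabeling step can be sketched as follows; the label-string format and the helper name wl_iteration are our own illustrative choices, not the kernel implementation of (Shervashidze et al., 2011a).

from collections import Counter

# One Weisfeiler-Lehman relabeling iteration: extend each node's label
# with the sorted labels of its neighbours, then compress.
def wl_iteration(graph, labels):
    """graph: dict node -> list of neighbours; labels: dict node -> label."""
    extended = {
        v: labels[v] + "|" + ",".join(sorted(labels[u] for u in graph[v]))
        for v in graph
    }
    # Compress every distinct extended label into a short new label.
    compression = {lab: f"c{i}"
                   for i, lab in enumerate(sorted(set(extended.values())))}
    new_labels = {v: compression[extended[v]] for v in graph}
    # Feature vector: counts of original plus compressed labels.
    feature = Counter(labels.values()) + Counter(new_labels.values())
    return new_labels, feature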
Too few node labels provide little discriminative
information, while too many node labels may result
in very dissimilar feature vectors. In both cases,
neural networks may not be able to distinguish between
feasible and infeasible instances. Hence it is important
to select the best number of original node labels.
Based on the data associated with the nodes (Figure 4),
the level of
detail can be determined in the following three ways:
(1) regular labels, (2) regular labels and service tasks,
(3) regular labels, service tasks, and parking loca-
tions. Regular labels are the labels of the 8 node types
(7 illustrated in Figure 4 and one additional activity
called “Saw Move”). This level of detail would result
in the smallest feature vectors. One step further is to
specify the service task as a node type. On the cho-
sen service site, five different service tasks are avail-
able: internal cleaning, soap external cleaning, oxalic
external cleaning, technical checkup A, and techni-
cal checkup B. Either one would replace the regular
service task node (S) resulting in 12 different node la-
bels. The most detailed representation specifies both
parking locations and service tasks, which results in
24 different node labels given 13 different tracks in
the service site.
4.2.1 Sampling
The generated instances are not balanced, with 7205
samples in class 1 and 2795 in class 0. Undersampling
and oversampling are two methods commonly used
to deal with class imbalance. The
risk of undersampling is the loss of information due to
removing potentially important instances, whereas
oversampling increases the risk of overfitting.
We create three datasets with different balancing
strategies: (1) only undersampling, with 2795 sam-
ples for both classes, (2) only oversampling, with
7205 samples for both classes, (3) both under- and
oversampling, with 5000 samples for both classes.
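The three balancing strategies can be sketched as follows, where graphs_0 and graphs_1 are assumed to hold the infeasible and feasible graphs:

import random

# Resample both classes to a common target size: draw without
# replacement to undersample, duplicate at random to oversample.
def balance(graphs_0, graphs_1, target_per_class):
    def resample(pool, n):
        if n <= len(pool):
            return random.sample(pool, n)                     # undersample
        return pool + random.choices(pool, k=n - len(pool))   # oversample
    return resample(graphs_0, target_per_class), resample(graphs_1, target_per_class)

# (1) undersampling:            balance(graphs_0, graphs_1, 2795)
# (2) oversampling:             balance(graphs_0, graphs_1, 7205)
# (3) under- and oversampling:  balance(graphs_0, graphs_1, 5000)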
Together with the three different node representa-
tion strategies, we end up with nine datasets. DGCNN
is applied on all nine datasets to find the best combi-
nation of methods dealing with class imbalance and
the level of detail of node labels. We use 5-fold cross
validation. Table 1 shows the classification perfor-
mance. It shows that the performance increases as we
add more detailed information about the planning in-
stances on the nodes in the graph. The three datasets
with the highest level of detail are highlighted in the
table. In addition, the results show that using under-
sampling is the best of the methods dealing with class
imbalance. The runtime for undersampled datasets
is also significantly lower than when (a combination
with) oversampling is used, which is logical as the
undersampled dataset is smaller.
4.2.2 Hyperparameter Tuning
We use the best performing dataset to tune the hyper-
parameters of DGCNN using grid search. The follow-
ing combination of parameter values has achieved the
best performance and is used to generate the final pre-
diction model: (1) unifying nodes in graph: 0.7; (2)
learning rate: 1 × 10^-5; (3) number of convolution
layers: 3; (4) number of nodes in convolution layers:
64; (5) number of training epochs: 120; (6) batch size:
100.
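A sketch of the grid search over these hyperparameters; the candidate grids are illustrative, and train_and_validate is a hypothetical helper returning the 5-fold cross-validation accuracy for one parameter combination.

from itertools import product

# Candidate values per hyperparameter (illustrative grids).
grid = {
    "sortpooling_k": [0.6, 0.7, 0.8],
    "learning_rate": [1e-4, 1e-5],
    "conv_layers": [2, 3, 4],
    "conv_nodes": [32, 64],
    "epochs": [80, 120],
    "batch_size": [50, 100],
}

# Exhaustively evaluate every combination and keep the best.
best_acc, best_params = 0.0, None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    acc = train_and_validate(params)   # 5-fold CV accuracy (assumed helper)
    if acc > best_acc:
        best_acc, best_params = acc, params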
Table 2 shows the confusion matrix of the final
classification model. Each column of the matrix rep-
resents the instances in a predicted class while each
row represents the instances in an actual class. Each
cell counts the number of instances that corresponds
to the row and column value. Correctly predicted
classes are true negatives (TN; top left cell) and true
positives (TP; bottom right cell). Incorrectly pre-
dicted classes are false negatives (FN; bottom left
cell) and false positives (FP; top right cell). The fi-
nal classification model of DGCNN is able to predict
feasibility of an initial solution with 65.1% accuracy.
This has proven to be a difficult classification problem:
a previous study (Dai, 2018) applied heavy feature
engineering and tested various classifiers for this
classification task, which resulted in a highest accuracy
of 66.3%. However, to derive features, that approach
assumes extensive domain knowledge of the
shunting services planning problem. In comparison,
DGCNN takes initial solutions directly as inputs.
Despite the difficulty of the classification problem,
in the next section we show the value of our approach
in speeding up finding feasible solutions for capacity
determination.
4.3 Accelerating Simulations to
Determine Capacities
Being able to predict feasibility of an initial solution
before applying local search may lead to a decrease
in computation time when determining the maximum
capacity of a service site. We measure the effect of
our approach by calculating the expected difference
in running time with and without using DGCNN. As
every instance was solved by the local search heuristic
and its running time was recorded, we derive in Table
3 the running time of LS without DGCNN for all four
types of instances (TN, FP, FN, and TP), as well as
the average running time of feasible and infeasible in-
stances.
The total running time on the testing data with-
out applying DGCNN in Table 3 is 221,710 seconds,
roughly 62 hours. This is the existing situation, where
the local search algorithm has to evaluate every gen-
erated instance. We call our approach where DGCNN
is applied to predict the feasibility of instances before
applying LS “the new situation”.
We use the following process to estimate the run-
ning time in the new situation. For each instance in
the test set, DGCNN is used to predict whether it
is feasible or infeasible. If feasible, the local search
heuristic is applied to find a feasible solution (or ter-
minate if it turns out to be infeasible given the prede-
fined time limit). If, however, the predicted outcome
is infeasible, this instance is discarded immediately
and a new instance is drawn from the instance gener-
ator. This new instance is again fed to DGCNN, and
the prediction of feasibility leads to either applying
LS, or discarding this instance. This process contin-
ues until all instances have been classified as feasible.
Figure 11 shows a Markov Transition Diagram to vi-
sualize this process, where the probabilities of transi-
tions are obtained from Table 2.
Table 1: Accuracy, standard deviation and runtime of DGCNN on the test sets of the 9 datasets.

                         Undersampling          Oversampling           Under- and oversampling
# node labels            8      12     24       8      12     24       8      12     24
Accuracy (%)             59.89  60.80  62.10    60.15  60.85  61.28    60.15  60.19  60.89
Standard deviation (%)   ±1.28  ±0.92  ±0.88    ±0.47  ±0.74  ±0.16    ±0.56  ±1.06  ±0.50
Runtime (h)              3.6    3.6    3.6      13.1   13.1   13.1     10.4   10.4   10.4
Table 2: Confusion matrix of the final classification model DGCNN.

                        Predicted 0     Predicted 1     Correct / Incorrect
Actual 0 (infeasible)   372 (33.3%)     185 (16.5%)     67% / 33%
Actual 1 (feasible)     205 (18.3%)     356 (31.9%)     63% / 37%
Correct / Incorrect     64% / 36%       66% / 34%       65.1% / 34.9%
Table 3: Runtime per quadrant and average runtimes.

Quadrant           Time (sec)
True negatives     110,877
False positives    56,037
False negatives    24,362
True positives     30,434

Averages           Time (sec)
Average feasible   97.7
Average infeasible 299.7
Figure 11 shows that if an instance is classified
as feasible, it will never leave that state. Note that
being classified as feasible can either be correct (true)
or incorrect (false). Since no new instances will be
generated for instances classified as FP or TP, those
runtimes remain the same in the new situation. If an
instance is classified as infeasible, a new instance is
drawn. This new instance can be transferred to any
other state based on the probabilities. The runtime for
TN and FN will change in the new situation.
The total runtime for the TN instances without
DGCNN is 110,877 seconds. The total runtime decreases
to 62,123 seconds when using DGCNN, a
decrease of 44.0%. The runtime for FN without
DGCNN is 24,362 seconds. The total runtime in-
creases to 34,259 seconds when using DGCNN. This
is because the instances are actually feasible, but in-
correctly classified as infeasible. Therefore, new in-
stances will be generated and some of those will turn
out to be infeasible, causing a longer running time.
Figure 11: Markov Transition Diagram with the transfer probabilities.

While the runtime for FN instances increased by 40.6%,
the total runtime of all instances decreased
by 38,857 seconds to a total of 182,853 seconds.
This decrease in runtime saves 17.5% when determining
the capacity of a service site.
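The expected-runtime argument can be reproduced in simplified form from Tables 2 and 3; the sketch below ignores instance-generation and prediction overhead, so it approximates rather than reproduces the totals reported above.

# Transition probabilities (Table 2) and average LS runtimes (Table 3).
P_TN, P_FP, P_FN, P_TP = 0.333, 0.165, 0.183, 0.319
T_FEASIBLE, T_INFEASIBLE = 97.7, 299.7  # seconds

# Old situation: LS runs on every instance. Actually infeasible
# instances (TN, FP) run for ~300 s; feasible ones (FN, TP) for ~98 s.
t_old = (P_TN + P_FP) * T_INFEASIBLE + (P_FN + P_TP) * T_FEASIBLE

# New situation: instances classified infeasible (TN, FN) are discarded
# at negligible cost and redrawn until one is classified feasible, on
# which LS then runs. That instance is a false positive (still ~300 s)
# with probability P_FP / (P_FP + P_TP).
p_accept = P_FP + P_TP
t_new = (P_FP * T_INFEASIBLE + P_TP * T_FEASIBLE) / p_accept

print(f"expected LS seconds per LS run: {t_old:.0f} -> {t_new:.0f}")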
We have shown that using our approach, roughly
51 hours can be saved in determining whether one
given shunting yard has sufficient capacity in han-
dling 21 train units with one particular scenario. Such
tests have to be done for more than 50 scenarios.
Hence, our approach will save about 100 days on de-
termining whether 21 train units can be handled in the
testing yard. Furthermore, if it is concluded that the
site has sufficient capacity for 21 train units, the sce-
narios with 22 or more train units will be generated
and tested in order to find out the maximum number
of the units that the given shunting yard can deal with.
With 35 service sites in the Netherlands, the time re-
duction using our approach has a great impact.
5 CONCLUSION
In this work we have shown that combining a
Deep Graph Convolutional Neural Network with lo-
cal search leads to a decrease in computation time in
determining capacities of shunting yards. The computation
time decreased by 17.5% when determining the
capacity of one shunting yard, using DGCNN to
predict whether an initial solution will become feasible
after applying local search.
Our results demonstrate how existing research in
graph classification can be used to boost optimization
algorithms in an industrial application. It shows the
value of using machine learning models as approxi-
mation functions of optimization algorithms in find-
ing solutions. As future work, we may increase the
performance of our approach by collecting data in a
more robust way. Since there is a lot of randomness
in creating initial solutions and in the process of finding
feasible solutions, it could be beneficial to apply
local search multiple times to one initial solution.
ACKNOWLEDGEMENTS
The work is partially supported by the NWO funded
project Real-time data-driven maintenance logistics
(project number: 628.009.012).
REFERENCES
Amaran, S., Sahinidis, N. V., Sharda, B., and Bury, S. J.
(2016). Simulation optimization: a review of algo-
rithms and applications. Annals of Operations Re-
search, 240(1):351–380.
Boysen, N., Fliedner, M., Jaehn, F., and Pesch, E. (2012).
Shunting yard operations: Theoretical aspects and
applications. European Journal of Operational Re-
search, 220(1):1–14.
Carson, Y. and Maria, A. (1997). Simulation optimization:
Methods and applications. In Winter Simulation Con-
ference Proceedings, pages 118–126.
Dai, L. (2018). A machine learning approach for optimiza-
tion in railway planning. Master’s thesis, Delft Uni-
versity of Technology.
Defourny, B., Ernst, D., and Wehenkel, L. (2012). Sce-
nario trees and policy selection for multistage stochas-
tic programming using machine learning. Journal on
Computing. Published online before print.
Hopcroft, J. and Karp, R. (1973). An n^(5/2) algorithm for
maximum matchings in bipartite graphs. SIAM Journal
on Computing, 2(4):225–231.
Kipf, T. and Welling, M. (2016). Semi-supervised classi-
fication with graph convolutional networks. CoRR,
abs/1609.02907.
Kroon, L. G., Lentink, R. M., and Schrijver, A. (2008).
Shunting of passenger train units: an integrated ap-
proach. Transportation Science, 42(4):436–449.
Lombardi, M. and Milano, M. (2018). Boosting com-
binatorial problem modeling with machine learning.
In Proceedings of the Twenty-Seventh International
Joint Conference on Artificial Intelligence (IJCAI-18),
pages 5472–5478.
Meisel, S. and Mattfeld, D. (2010). Synergies of operations
research and data mining. European Journal of Oper-
ational Research, 206(1):1–10.
Neumann, M., Garnett, R., Bauckhage, C., and Kersting,
K. (2016). Propagation kernels: efficient graph ker-
nels from propagated information. Machine Learning,
102(2):209–245.
Niepert, M., Ahmed, M., and Kutzkov, K. (2016). Learn-
ing convolutional neural networks for graphs. CoRR,
abs/1605.05273.
Peer, E., Menkovski, V., Zhang, Y., and Lee, W.-J. (2018).
Shunting trains with deep reinforcement learning. In
Proceedings of the 2018 IEEE International Conference
on Systems, Man, and Cybernetics. IEEE.
Shervashidze, N., Schweitzer, P., van Leeuwen, E. J.,
Mehlhorn, K., and Borgwardt, K. M. (2011a).
Weisfeiler-Lehman graph kernels. Journal of Machine
Learning Research, 12(Sep):2539–2561.
Shervashidze, N., Schweitzer, P., van Leeuwen, E. J.,
Mehlhorn, K., and Borgwardt, K. M. (2011b). Weisfeiler-
Lehman graph kernels. Journal of Machine Learning
Research, 12:2539–2561.
van den Broek, R. (2016). Train shunting and service
scheduling: an integrated local search approach. Mas-
ter’s thesis, Utrecht University.
van den Broek, R., Hoogeveen, H., van den Akker, M., and
Huisman, B. (2018). A local search algorithm for train
unit shunting with service scheduling. Transportation
Science, submitted.
van der Maaten, L. and Hinton, G. (2008). Visualizing data
using t-SNE. Journal of Machine Learning Research,
9:2579–2605.
Verwer, S., Zhang, Y., and Ye, Q. C. (2017). Auction opti-
mization using regression trees and linear models as
integer programs. Artificial Intelligence, 244:368–
395.
Zhang, M., Cui, Z., Neumann, M., and Chen, Y. (2018). An
end-to-end deep learning architecture for graph clas-
sification. In AAAI, pages 4438–4445.