Position Paper: Ontology in the Rail Domain

The Railway Core Ontologies

Christopher Morris, John Easton and Clive Roberts

Birmingham Centre for Railway Research and Engineering, University of Birmingham, Birmingham, U.K.

Keywords:

Railway Domain Ontology, Data Integration, Railway Core Ontologies.

Abstract:

This paper presents the railway core ontologies, a group of related ontologies designed to model the rail

domain in detail. The purpose of these ontologies is to enable improved data integration in the rail domain,

which will deliver business beneﬁts in the form of improved customer perceptions and more efﬁcient use of the

rail network. The modularity of the ontologies allows for both detailed modelling of the domain at a high level

and the storing of instance data at lower levels. It concludes that the beneﬁts of improved rail data integration

are best realised through the use of the railway core ontologies.

1 INTRODUCTION

The railway core ontologies are a set of related mod-

els that capture the rail domain in depth, using a

modular structure to ensure that the ontology is as

lightweight as possible for any given application.

In the UK the technical strategy leadership group

reported that “Excluding Network Rail’s own infor-

mation systems, research discovered over 130 infor-

mation systems maintained by approximately 20 sup-

pliers” (Technical Strategy Leadership Group, 2012).

Previous projects also note the fragmented and pro-

prietary nature of the existing information systems. It

has been found that “Where electronic data exchange

standards for rail do exist, many are proprietary bi-

nary formats used to provide point-to-point interfaces

between speciﬁc systems and not intended for use in a

generalised context.” (Easton et al., 2013a). This sit-

uation caused the regulatory body responsible for the

oversight of the UK rail domain, the Ofﬁce of Rail

Regulation, to express concerns as to the current state

of asset condition monitoring (Ofﬁce of Rail Regula-

tion, 2012).

One area which will give rise to increased volumes

of data is that of remote condition monitoring as de-

scribed in (Garcia Márquez et al., 2003). Typical ap-

plications are remote sensors to report on the impend-

ing failure of railway systems such as points, in the

case of infrastructure, or bearings in the case of vehi-

cles. Similarly capable of generating large volumes of

data are inspection vehicles, which regularly run over

the rail network, to inspect the ﬁxed infrastructure.

These often record video, and many sensor readings

per second along with the time and location of the

reading. Data volumes are discussed in the context of

the US rail network by Allan Zarembski (Zarembski,

2014) and in the context of European rail networks

by Núñez et al (Núñez et al., 2014). Numerous stud-

ies, including (García Márquez et al., 2008), suggest

beneﬁts in the form of a reduction in unplanned, reac-

tive, maintenance and a reduction in manual inspec-

tion, may be derived from asset condition monitoring.

Another area experiencing growth is passenger in-

formation. Studies, such as those reported by Dziekan

and Kottenhoff in (Dziekan and Kottenhoff, 2007)

show that customer perceptions and ridership can be

improved by improving customer information.

There already exist a number of ontologies, such

as the REWERSE project, reported in (Lorenz, 2005),

that cover the broader transport domain, though these

do not aim to create a domain ontology for the rail

industry. This does include the locations of inter-

changes between modes (including rail) and the path

(both geographic and network) taken by rail lines,

along with journey planning information. The Ark-

TRANS project serves the multi-modal transport do-

main and has been employed by the Norwegian State

Railway company (NSB) to produce a functioning

journey planner along with being the basis for a num-

ber of other projects. ArkTRANS is introduced in

(Natvig and Westerheim, 2003). Further work has

been done implementing the concepts set out in the

original project, including a domain ontology for

freight transport, set out in (Gönczy et al., 2012) with

Morris, C., Easton, J. and Roberts, C..

Position Paper: Ontology in the Rail Domain - The Railway Core Ontologies.

In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - Volume 2: KEOD, pages 285-290

ISBN: 978-989-758-158-8

285

some cross over to the rail network and a project re-

lating to the maritime domain (Ørnulf, 2011).

Technology has matured since the use of ontol-

ogy for the rail domain was ﬁrst proposed by the In-

teGRail project, the ﬁnal report for which was com-

pleted in 2010 (Köpf, 2010). Much of the work done

for that project however pre-dated the availability of a

number of the tools employed by RaCoOn.

The railway core ontologies aim to allow the rail

domain to share information more readily, for exam-

ple, when a new type of asset is added, its status

can be inferred without further software development.

TThis is discussed by Tutcher et al (Tutcher et al.,

2013), where it is used by both industry and academia.

2 BENEFITS OF USING THE

RAIL CORE ONTOLOGY

The beneﬁts of using the rail core ontology stem from

the ability to draw together disparate data sources.

The following section outlines usage scenarios for the

railway core ontologies.

As an example consider component life-cycle in-

formation stored in a maintenance database. There is

little value in knowing how many times a given com-

ponent has functioned without knowing how many

times it can be expected to function. Taking this a

step further, knowing how a given component per-

forms when it is in good working order and how it

degrades allows for better informed maintenance - it

is possible to replace it before it fails. Another exam-

ple in passenger information systems is knowing that

a train passed a certain signal and was travelling at a

certain speed is useful for basic platform boards, how-

ever, knowing that there are speed restrictions further

along the line allows for more accurate information.

The railway core ontologies serve this purpose

well; not only do they serve as a standard for data in-

terchange, as do XML standards used within the rail

industry (railML), but they allow for any stakeholder

to add their own custom extensions without a need to

consult standards committees and thus avoid the de-

lay that imposes. If, for example, a company is work-

ing on a new product then they can simply create any

extensions to the ontology it may require. These ex-

tensions could then be released at the same time as

the product. Furthermore as the ontologies naturally

grow less functionality will need modelling - it will

be possible to make available product documentation

using existing ontological constructs. For this reason

nothing is limited to pre-deﬁned lists.

The use of ontological reasoning has beneﬁts

above and beyond those from the data integration.

Reasoning can for example be employed to infer the

condition of assets, the location of trains or availabil-

ity of routes, from incomplete data.

There are other beneﬁts of moving the logic out of

the ‘front end’ and into the ontologies. When the front

end is updated to reﬂect a changing IT environment

the ontologies need not be altered, conversely edit-

ing the ontology need not imply development work.

Since changes to the ontology need not be a task for

a developer then as changes are made to the rail net-

work any engineer can update the relevant ontology

to reﬂect them. Providing tools to make this possible

are part of the scope of this project.

2.1 Railway Core Ontology Usage

Scenarios

Discussion with stakeholders, such as the UK’s in-

frastructure manager, Network Rail, and a large en-

gineering ﬁrm along with work stemming from the

Factor-20 project, (Roberts et al., 2011), has identi-

ﬁed the following scenarios where the railway core

ontologies would add value.

2.1.1 Scenario 1 – Forward Planning

A scenario suggested by the UK infrastructure man-

ager was the ability to answer forward looking “what

if” questions relating to the costs and beneﬁts of as-

set maintenance. The example given was of a bridge

which is found to be in a condition such that it will

soon need heavy maintenance. In this example the

available options were to repair or replace it, with a

further choice as to the loading gauge of the replace-

ment. Whilst exact costs will vary on a case by case

basis it should be possible to estimate costs from pre-

vious work. Questions such as how much revenue

would be lost were the bridge to be replaced more

cheaply with one not gauged for freight as well as the

relative cost of repair versus replacement needed ex-

pert assessment, which was done on the basis of less

than ideal data. Data currently resides in a number

of silos: the ﬁnance system, the technical drawing

document management system and train movement

databases being three of the most prominent.

2.1.2 Scenario 2 – Maintenance Timing

A similar forward looking question answering sce-

nario, put forward by the UK infrastructure manager,

that brings together cost centre and operational infor-

mation is determining the best time to carry out main-

tenance. When maintenance is best timed, from a rev-

enue loss perspective is a non trivial question. To an-

swer it information as to the number and ticket type of

KEOD 2015 - 7th International Conference on Knowledge Engineering and Ontology Development

286

passengers must be brought together with information

regarding outstanding maintenance items, whilst also

considering the needs of freight operating companies.

The ontologies could also be of use more directly

in the domain of asset maintenance. This would take

available information from the physical network and

make it available at both an operational level for day

to day work and a corporate level for long term plan-

ning.

2.1.3 Scenario 3 – Train Identiﬁcation

The linking of trains with alerts from track-side mon-

itoring equipment could be more efﬁcient using the

ontologies. Wheel impact load detectors, a device for

detecting trains with misshapen wheels, are a good

example of this. These track mounted sensors mea-

sure how much force a wheel strikes the track with

and, if the force is too great, the wheel is known to

be defective and liable to damage the track. The only

data available from the detector is the time and force

of the impact. Where data from these sensors is found

to be exceptional it is currently a manual process to

identify the offending wheel and repair it. Data from

one system is printed, assessed, phone calls made,

the problematic train identiﬁed and the maintenance

department of the train operating company (who are

distinct from the infrastructure owners in most Euro-

pean countries) informed. Train movement data can

tell you in what order trains passed down a given

line. Making that information available, along with

the time of impact and axle count from the wheel im-

pact load detector would allow a system using ontolo-

gies to infer which wheel of a particular train had trig-

gered the detector. This concept was also studied as

part of the FuTRO project (Tutcher et al., 2013).

2.1.4 Scenario 4 – Predictive Maintenance

Whilst planned maintenance has a lower total cost

than corrective or condition-based maintenance, pre-

dictive maintenance is more cost effective still, as

stated by Umiliacchi et al. (Umiliacchi et al., 2011).

In the case of planned maintenance, assets are re-

paired on a ﬁxed schedule, regardless of condition;

in the case of predictive maintenance, work is done

when a computer model, informed by sensor data,

predicts it will be necessary. The intervening step of

condition-based maintenance has been taken up for

certain asset types, but often only with the use of

crude thresholds and manual inspection. Generally,

measurement equipment, be it integrated in a points

motor or attached to a measurement vehicle, sends

back continuous values for sensor readings, but these

are only taken into account should they be outside of

the speciﬁed envelope. When the reading exceeds the

speciﬁed maximum or minimum, the asset is main-

tained, thus remaining in a safe condition.

Predictive maintenance goes a step beyond this

using the trend exposed by the rate and direction of

change of the underlying data to suggest when main-

tenance would best be scheduled. Whilst this is possi-

ble on an asset by asset, system by system basis, with-

out the use of the railway core ontologies this process

would be very slow and costly, to such an extent that

it is unlikely to be undertaken on a large scale. The

railway core ontologies would allow the implementa-

tion of a network wide system with only asset speciﬁc

data being added, rather than extensive customisation.

2.1.5 Scenario 5 – Customer Information

Train location information is currently disseminated

via a system known as DARWIN, for the purposes of

passenger information. This takes the available train

position data from track-side infrastructure, the accu-

racy of which varies depending on source, and com-

bines it with timetable data. This is then displayed on

the passenger information boards in stations as well

as on a range of websites and mobile applications.

Efforts, described by Easton et al. (Easton et al.,

2013b), are underway to integrate data from GPS

units on trains with that from traditional track circuits

(regularly over a km in length) to improve accuracy.

Track circuits are electric circuits used to ascertain

which section of track a train is on. This is an area in

which the railway core ontologies can help; however

there are far greater beneﬁts to be derived when inte-

gration with other transport modes is considered. In

order as to plan a multimodal journey it is necessary

to have information from all modes available and inte-

grated in one place. Here the rail core ontology is not

the only one that needs consideration; work based on

the Arktrans model (Natvig and Westerheim, 2003)

also needs consideration, as does the General Tran-

sit Feed Speciﬁcation, itself available as linked data –

discussed in (Santos and Moreira, 2014).

3 IMPLEMENTATION

The strength of the railway core ontologies comes

from their modularity. The rail domain is very large

and a model which describes all of it in detail would,

with the currently available technology, be unsuitable

for high volume information retrieval tasks.

Several of the implementation decisions taken

build upon the work of the ﬁrst project focused on

a domain ontology for the rail domain, the Inte-

Position Paper: Ontology in the Rail Domain - The Railway Core Ontologies

287

GRail project. This, as summarised in (Köpf, 2010),

aimed to “allow IMs[Infrastructure Managers] and

RUs[Railway Undertakings] across Europe to act as

a single system.” Amongst other deliverables a “Net-

work Statement Checker” was produced as a demon-

strator. This is a tool to ascertain whether a given

train conﬁguration can run over a (often transnational)

route. This will be based on the availability of auto-

mated signalling systems, gauge, electriﬁcation and

other factors which vary from area to area. This work

was limited primarily by the available technology –

whilst the report was published in 2010 it was written

in 2009 and not all of the tools employed in the im-

plementation of the railway core ontologies existed at

that point.

The architecture of the broader system of which

the ontologies will form a part also requires consider-

ation. As originally suggested by InteGRail the sys-

tem will use a Service Orientated Architecture (SOA).

The system will by its nature be distributed across

many stakeholders. Whilst questions of ‘ownership’

of information in this context are complex, the solu-

tion will involve physically disparate systems owned

not merely by differing stakeholders, but at times,

commercial competitors.

3.1 Provenance

The number of stakeholders gives rise naturally to is-

sues of trust and provenance. These are being tackled,

in part, by the use of the prov ontology

. The issue of

access to information is further addressed by the use

of triple stores that allow selective visibility - for ex-

ample allowing users access to selected graphs.

3.2 Internal Organisation

The modular design of the ontologies is key to

their practicality. The ontologies themselves are com-

posed of a number of different namespaces only a

subset of which need to be imported for any given

task. The ontologies break into three broad cate-

gories:

1. An upper ontology, which may easily be mapped

onto other upper level ontologies as needed. The

upper level deals with concepts such as obser-

vation and place. Figure 1 shows a selection of

classes from this layer.

2. A core ontology, which can be either three or four

dimensional, depending on which ﬁle is imported.

When working with large volumes of data it is en-

visaged that the 3D version will be used, whereas

http://www.w3.org/TR/prov-o/

Figure 1: The upper ontology - http://purl.org/ub/upper.

the 4D version is better suited to model exchange.

Reiﬁcation as a means of representing changes

over time was dismissed owing to the prolifera-

tion of triples it engenders and the reduction in

OWL reasoning capability it causes.

3. Task ontologies: Task speciﬁc ontologies that

represent a deep study into a particular area.

Timetabling, Rolling Stock and Infrastructure

have been implemented, with more expected to

follow. Figure 2 shows a selection of classes from

the Rolling stock module. Note that this includes

such things as locomotive models.

Figure 2: The Rolling Stock sub-ontology.

Note that whilst all lower levels depend on the levels

above, but none of the higher levels depend on any

level below.

The way the namespaces and thus modules of the

ontology inter-relate may be observed from Figure 3,

which shows the namespaces used by the rolling stock

module. The namespace entitled “vocab” forms a part

of the core ontology, “u” is the upper level. Note that,

where possible, external namespaces are used.

Figure 3: Rail Core Ontology Namespaces.

Another implementation choice was that of com-

plexity. Here there is a clear trade off: expressivity vs

computational complexity. This trade-off has been the

subject of much work, both at this centre and within

KEOD 2015 - 7th International Conference on Knowledge Engineering and Ontology Development

288

the wider community. The result was that each mod-

ule is split into two parts: one containing the T-Box,

which can itself be large, and another containing con-

straints which can be far more expressive. The con-

straints ontology must then be kept as small as possi-

ble. In this case the T-box is where possible OWL-RL

compliant. A-Box raw data uses RDF schema reason-

ing whereas the constraints are as expressive as nec-

essary. The triple store with which this was developed

- stardog - allows for constraints to be reasoned over

separately and at a different complexity to other parts

of the ontology.

3.3 The FuTRO project

The FuTRO project “aimed to show how shared, open

access ontologies and linked data could help the UK

rail industry” (Tutcher et al., 2013). The Railway

Core Ontologies provided the network model used by

this project, in particular information as to which as-

set was monitored by which sensor and how these as-

sets interacted was stored. Listing 1 is an exert of the

pattern used to this end.

Listing 1: Observation Pattern. Note that comments and

labels have been omitted to improve readability.

:associatedObservation

a owl:ObjectProperty ;

rdfs:domain u:IndependentThing ;

rdfs:range u:Observation ;

owl:propertyChainAxiom ( :monitoredBy :

observedEvent ) ;

owl:propertyChainAxiom ( :dependsOn :

monitoredBy :observedEvent ) .

A demonstrator was developed, which used in-

ference to show which lines were unavailable in the

event of mechanical failure of railway switches. Sim-

ilarly the ontology has provision for storing the phys-

ical location of the asset such that maintenance teams

can get get to it.

The project has suggested that very high volume

raw data be stored separately to the ontology and only

a key to it be retained. In the case of the FuTRO

project this was accomplished using the REDIS

key

value store where appropriate. Further extensions are

envisaged to allow the ontology to advertise the web

services that could be called to retrieve that data.

Another factor aiding implementation at this time,

which was not available at the time of the InteGRail

project, is the existence of libraries for working with

linked data. The FuTRO project made used of the dot-

NetRDF

to allow software written in the “.Net” fam-

http://redis.io/

http://dotnetrdf.org/ a comprehensive and well docu-

mented open source library for interfacing with triple stores

ily of languages to interface with triple stores, which

removed one of the barriers to integration with the

project industrial partners.

As is stated in the ﬁnal report (Tutcher et al.,

2013) two demonstrators were produced, along with

that report. The train location demonstrator is pub-

licly available and accessible at: http://purl.org/rail/

trainlocator

It demonstrates the use not only of linked data but

also ontological reasoning, to infer a train’s location

even when it is no longer available in the original for-

mat.

4 FUTURE WORK

Although the core ontologies already capture enough

of the rail domain to demonstrate value many areas

require further modelling effort. Remedying this will

require input from domain experts and as such tools

need to be provided to allow railway engineers to edit

the ontology. In order to align with existing indus-

try best-practice for the procurement of new ICT sys-

tems, these tools should, where possible, be commer-

cial off the shelf software (COTS).

There is also a need to enrich samples of existing

data, documenting the process and making available

sample code for use by stakeholders in the rail in-

dustry. Tools enabling conversion from the popular

“railML” XML format would enable a quicker transi-

tion.

Tools to query the ontology and present the results

to non-technical users are also required. Network Rail

has expressed an interest in work around presenting

Key Performance Indicators to senior managers for

example, beyond the work outlined in Section 2.1.

The prioritisation and selection of data sources

for conversion also requires consideration. The pos-

sibility of using a combination of contractual com-

pulsion and reciprocal data exchange, to encourage

those stakeholders that supply assets (railway infras-

tructure, vehicles etc.) to supply them along with data

sheets in a format compatible with the railway core

ontologies needs examination. Other existing data

will need to be prioritised according to its value to

stakeholders.

The industry also requires design patterns for the

larger system and guidance for using these patterns.

Many means exist for webservices to advertise there

availability and this needs to be standardised such

that all stakeholders can access information. Distri-

bution and ownership of information issues also need

addressing - the current solution, discussed in Sec-

tion 3.1 is only partial.

Position Paper: Ontology in the Rail Domain - The Railway Core Ontologies

289

5 CONCLUSIONS

It is clear that much can be achieved by bringing to-

gether data from different sources; data currently re-

sides in isolated silos and value for all stakeholders

exists where it is brought together. This value exists

as money saved by more efﬁcient maintenance, lower

costs incurred due to equipment failure, less staff time

wasted manually compiling reports and less long term

IT expenditure. Beyond the savings there is also value

from improved passenger perceptions: improved cus-

tomer information has been shown to improve cus-

tomer perceptions of the service. Twinned with im-

proved reliability this has potential to increase rider-

ship and freight trafﬁc.

The use of an ontology as opposed to lighter

weight, data only, standards has multiple advantages:

It is easier and quicker to modify, without affecting

front-end applications. The ontologies can be mod-

iﬁed more quickly than could a traditional standard.

Information from one sub-domain can be reused in

another, and can be combined to easily obtain further

information. The ontology can be self documenting.

The disadvantages are skills, which are less common

than traditional IT skills and computational complex-

ity. These can both be overcome; the lack of IT skills

by provision of well designed software and complex-

ity by carefully addressing the trade off between ex-

pressivity and complexity discussed in Section 3.

All these advantages suggest the Rail Domain On-

tologies are the best way to bring together data in the

UK Rail Domain.

REFERENCES

Dziekan, K. and Kottenhoff, K. (2007). Dynamic at-stop

real-time information displays for public transport: ef-

fects on customers. Transportation Research Part A:

Policy and Practice, 41(6):489–501.

Easton, J., Chen, L., Davies, R., Tutcher, J., and Roberts, C.

(2013a). Ontime - Part D7.1. Technical report.

Easton, J., Chen, L., Davies, R., Tutcher, J., and Roberts, C.

(2013b). RSSB Cross industry Railway Information

Systems Workshop 7 th December 2012. Technical

Report January.

García Márquez, F. P., Lewis, R. W., Tobias, A. M., and

Roberts, C. (2008). Life cycle costs for railway condi-

tion monitoring. Transportation Research Part E: Lo-

gistics and Transportation Review, 44(6):1175–1187.

Garcia Márquez, F. P., Schmid, F., and Collado, J. C.

(2003). Wear assessment employing remote condition

monitoring: A case study. Wear, 255(7-12):1209–

1220.

Gönczy, L., Kövi, A., Andreas, P., Thalhammer, A., and

Stavrakantonakis, I. (2012). e-Freight ontology. Tech-

nical Report 233758.

Köpf, H. (2010). InteGrail – Publishable Final Activity Re-

port. Technical report.

Lorenz, B. (2005). A1-d4 ontology of transportation net-

works.

Natvig, M. K. and Westerheim, H. (2003). Joint effort on es-

tablishment of ARKTRANS–a system framework ar-

chitecture for multimodal transport. Technical report.

Núñez, A., Hendriks, J., Li, Z., Schutter, B. D., and

Dollevoet, R. (2014). Facilitating Maintenance De-

cisions on the Dutch Railways Using Big Data: The

ABA Case Study. In Big Data (Big Data), 2014 IEEE

International Conference on, pages 48–53, Washing-

ton, DC.

Ofﬁce of Rail Regulation (2012). Network Rail Monitor -

Quarter 1 of Year 4 of CP4 |1 April 2012 – 21 July

2012. Technical Report July.

Roberts, C., Easton, J., Sharples, S., Golightly, D., and

Davies, R. (2011). Rail Research UK Feasibility Ac-

count ( FA ): Factor 20 – reducing CO 2 emissions

from inland transport by a major modal shift to rail (

EP / H024743 / 1 ). Technical report, Birmingham.

Santos, M. Y. and Moreira, A. (2014). Integrating Public

Transportation Data: Creation and Editing of GTFS

Data. 2:53–62.

Technical Strategy Leadership Group (2012). The Indus-

try’s Rail Technical Strategy 2012. The Rail Technical

Strategy 2012. Technical report.

Tutcher, J., Easton, J., Roberts, C., Myall, R., Hargreaves,

M., and Tiller, C. (2013). Ontology-based data man-

agement for the GB rail industry Feasibility study.

Technical report, RSSB.

Umiliacchi, P., Lane, D., Romano, F., and SpA, A. (2011).

Predictive maintenance of railway subsystems using

an Ontology based modelling approach. pages 22–26.

Zarembski, A. M. (2014). Some Examples of Big Data in

Railroad Engineering. In IEEE BigData 2014, Wash-

ington, DC.

Ørnulf, R. J. (2011). A Maritime ITS Architecture for e-

Navigation and e-Maritime: Supporting Environment

Friendly Ship Transport. pages 1156–1161.

KEOD 2015 - 7th International Conference on Knowledge Engineering and Ontology Development

290