lity. This problem is solved by taking a collaborative
filtering approach, inspired by (de Smet et al., 2017),
that evaluates both athletes and races at the same time.
In this way, equivalent distance approximations can
be computed from a collection of race results, taking
the attendees’ level into account.
The second problem is that previous attempts fully
describe race elevation profiles by one of two global
metrics: either the cumulative elevation gain¹ (Scarf,
2007) or the average elevation gradient of non-loop
races (fell running races) that present a relatively
constant gradient (Kay, 2012). Such a constant gradient
is rarely observed in practice. In the present paper, the
full elevation profile is considered; this allows for a
more realistic relationship extraction.
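To illustrate why global metrics cannot fully describe a profile, the following minimal sketch computes the cumulative elevation gain (the sum of all positive vertical displacements) and the average gradient of a non-loop route. The two profiles, the 1000 m sample spacing and the function names are hypothetical and chosen only for illustration; they are not taken from the data set used in this paper.

```python
def cumulative_elevation_gain(altitudes):
    """Sum of all positive altitude differences between consecutive samples [m]."""
    return sum(max(b - a, 0.0) for a, b in zip(altitudes, altitudes[1:]))


def average_gradient(altitudes, route_length):
    """Net altitude change divided by the route length [m/m]."""
    return (altitudes[-1] - altitudes[0]) / route_length


def local_gradients(altitudes, step):
    """Gradient between consecutive samples spaced `step` meters apart [m/m]."""
    return [(b - a) / step for a, b in zip(altitudes, altitudes[1:])]


# Hypothetical altitudes [m], sampled every 1000 m along a 4000 m route.
STEP = 1000.0
ROUTE_LENGTH = 4000.0
steady = [100.0, 125.0, 150.0, 175.0, 200.0]        # constant climb
front_loaded = [100.0, 200.0, 200.0, 200.0, 200.0]  # one steep climb, then flat

for name, profile in (("steady", steady), ("front_loaded", front_loaded)):
    print(
        name,
        cumulative_elevation_gain(profile),
        average_gradient(profile, ROUTE_LENGTH),
        max(local_gradients(profile, STEP)),
    )
```

Both hypothetical routes share the same cumulative elevation gain and the same average gradient, yet their steepest sections differ by a factor of four. Only the full profile distinguishes them.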
To establish the desired relationship between flat
equivalent distances and route elevation profiles, one
needs equivalent distances for races whose
elevation profiles are known. For this purpose,
race equivalent distances are first established using
race results. This step is referred to as collaborative
filtering (Section 3). Then, taking elevation profile
data as inputs, a regression model that reproduces the
obtained race equivalent distances is built (Section 4).
These two steps are validated by assessing how much
equivalent distances improve race time prediction
compared to actual distances.
In the following, all equations are expressed in
the SI unit system: speeds in [m/s], times in [s] and
elevation gradients in [m/m]. Other units are used in
figures for convenience.
2 DATA SOURCES
The two-step methodology requires two kinds of
data. In the first step (the collaborative filtering part),
race results are used to compute flat equivalent
distances for a set of races. In the second step (the flat
equivalency modelling part), elevation profiles are used
to model flat equivalent distances as a function of the
instant elevation gradient along the routes.
2.1 Race Results
A set of 228 031 race times was gathered by parsing
the official results of 616 Belgian races. They represent
a large variety of endurance races that took place in
2014 and 2015. From these results, a subset of
179 674 race times (445 races, 7480 athletes) is kept to
obtain a data set with properties that allow a
collaborative filtering approach to operate (as explained
in Section 3.3). Race results are used to compute flat
equivalent distances that are then put in relation with
race elevation data.
¹ The cumulative elevation gain is the sum of all positive
vertical displacements along the route.
2.2 Elevation Data
Race route data were collected through
measurements made by runners during the races using their
sports watches. Runners uploaded their tracks and made
them publicly available to the online community.
Those tracks contain data such as geographic
coordinates, timestamps and altitudes. Consumer-grade
GPS-based elevations have poor accuracy (Bauer,
2013). In a previous work (de Smet et al., 2017), route
elevations were gathered by querying publicly
available topography data such as SRTM data (Shuttle
Radar Topography Mission) or Google Maps APIs. They
are both based on radar topography surveys made from
space. It is observed that, in such databases, the
altitude of the treetops is assigned to route parts that
are covered by trees: this causes artificially high
elevation gradients on routes that pass under trees.
Fortunately, some high-end sports watches
include a barometric altimeter that has good relative
accuracy: the altitude is known up to an additive bias
that would need to be calibrated. The relative
accuracy is what is of primary interest, the absolute
altitude accuracy being irrelevant to our purpose as our
analysis is based on elevation gradient only. Routes
recorded with such devices could be found for only
129 of our 445 races. Those 129 races are used to
build our flat equivalent distance model from
elevation data.
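The claim that an additive altitude bias does not affect a gradient-based analysis can be checked with a short sketch. It assumes a plain finite-difference gradient between consecutive samples, which is simpler than the smoothed gradient used in this paper; the altitude values, the bias and the function name are hypothetical.

```python
def finite_difference_gradient(altitudes, step):
    """Gradient between consecutive samples spaced `step` meters apart [m/m]."""
    return [(b - a) / step for a, b in zip(altitudes, altitudes[1:])]


# Hypothetical true altitudes [m], sampled every 10 m.
STEP = 10.0
true_altitudes = [100.0, 102.0, 105.0, 104.0]

# Hypothetical uncalibrated barometric offset [m], constant over the recording.
BIAS = 32.0
measured_altitudes = [a + BIAS for a in true_altitudes]

true_gradient = finite_difference_gradient(true_altitudes, STEP)
measured_gradient = finite_difference_gradient(measured_altitudes, STEP)

# The constant bias cancels in every altitude difference.
print(true_gradient == measured_gradient)
```

Because the gradient only involves altitude differences, any offset that stays constant during the recording cancels out, so no calibration is needed for this purpose.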
Although more than two thirds of the races were
not used in the flat equivalency modelling part, they
are still useful in the collaborative filtering part
because they help to characterize athletes and therefore
improve the flat equivalent distance estimation of the
129 races that are used in the flat equivalency
modelling section.
2.3 Instant Elevation Gradient
Elevation profiles as they are recorded, even by
high-end devices, are noisy signals that need to be
filtered, especially because our application requires
taking the gradient: the derivative of a noisy signal can
take artificially high amplitudes. A simple way to take
the gradient and smooth at the same time is to take the
average altitude over an n-meter distance ahead minus
the average altitude over the same distance behind. The
chosen distance then acts as a smoothing factor. More
formally, if the elevation profile e(x) is re-sampled
every meter, its gradient g(x) at distance x is given