Time-Aware Contrastive Representation Learning for Road Network and
Trajectory
Ashraful Islam Shanto Sikder and Naushin Nower
Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh
{bsse1124, naushin}@iit.du.ac.bd
Keywords:
Contrastive Learning, Mutual Information, Transformer, Hard Negative Sampling.
Abstract:
Modeling and learning representations for road networks and vehicle trajectories is essential for improving
various Intelligent Transportation System (ITS) applications. Existing methods often treat road network and
trajectory data separately, focus only on one, employ two-step processes that result in information loss and
error propagation, or ignore temporal dynamics. To address these limitations, we propose a framework called
Time-Aware Contrastive Representation Learning for Road Network and Trajectory (TCRLRT). Our approach
introduces an end-to-end model that simultaneously learns road network and trajectory representations, en-
hanced by a temporal encoding module that captures temporal information and a synthesized hard negative
sampling module to enhance the discriminative power of the learned representations. We validate the ef-
fectiveness of TCRLRT through extensive experiments conducted on two real-world datasets, demonstrating
improved performance over baseline methods across multiple downstream tasks. The results highlight the
advantages of joint representation learning with temporal modeling and hard negative sampling, leading to
robust and versatile representations.
1 INTRODUCTION
Vehicle technology and intelligent transportation sys-
tems (ITS) (Yangxin Lin and Ma, 2017) are key to
enhancing safety, efficiency, and sustainability. A
fundamental challenge within ITS is to accurately
model and understand the interactions between road
networks and vehicle trajectories. Effective represen-
tation learning (Yoshua Bengio and Vincent, 2013)
for road networks and trajectories transforms complex
spatial-temporal data into machine-interpretable for-
mats, facilitating various applications such as travel
time estimation, traffic speed inference, route predic-
tion, etc.
A road network can be described as a type of
graph structure that represents the interconnected lay-
out of road segments, encompassing both the topolog-
ical structure and additional contextual details about
the connections between these segments. On the other
hand, a trajectory represents sequential data com-
posed of successive road segments that capture spa-
tial and temporal movement patterns, embedding the
dynamic nature of mobility and the associated seman-
tics. The structural and temporal characteristics of
road networks and vehicle movements can be cap-
tured through representation learning methods, and
the learned representations can be directly used in a
variety of downstream tasks by fine-tuning.
Most existing representation learning models ei-
ther focus solely on road networks or trajectory
data (Tobias Skovgaard Jepsen and Nielsen, 2019;
Ning Wu and Pan, 2020; Meng-xiang Wang and
Yu, 2019). Treating them separately ignores the valuable inter-relations between them.
Recent approaches have demonstrated that integrating road network representations into trajectory learning, or vice versa, leads to more robust representations
(Peng Han and Zhang, 2021; Yile Chen and Ellison,
2021; Yu Zheng and Ma, 2009). Existing methods of-
ten adopt a two-stage approach, where they first learn
the representation of one aspect and then use it as a
foundation for the other. However, this type of approach leaves room for error propagation between the stages and cannot directly define objectives that capture the cross-scale relationship between road network and trajectory.
Contrastive learning (Ting Chen and Hinton,
2020) has recently emerged as a promising technique
in semi-supervised settings, leveraging cross-scale in-
formation to learn the inter-related representations
effectively, by maximizing their mutual information
(Philip Bachman and Buchwalter, 2019). JCLRNT (Zhenyu Mao and Zhao, 2022) introduced a joint contrastive learning framework that learns road network and trajectory representations simultaneously. However, they overlooked temporal dynamics, which are crucial for representation learning.

Figure 1: Two different trajectories with the same source and destination (a) and the underlying road network (b) in Xi'an.
To address these limitations, we propose Time-Aware Contrastive Representation Learning for Road Network and Trajectory (TCRLRT). Our model is
an end-to-end approach that jointly learns representa-
tions of road networks and trajectories while incorpo-
rating temporal information through a learnable tem-
poral encoding module. This enables the model to
capture not only the spatial structure but also the time-
sensitive nature of vehicle movements. Additionally,
we enhance the training process through the synthe-
sis of hard negative examples (Joshua Robinson and
Jegelka, 2020), which helps the model learn more dis-
criminative representations and improves its overall
performance.
In summary, our key contributions include:
1. An integrated model that jointly learns road and
trajectory representations, enriched with temporal
information to capture dynamic traffic behavior.
2. Incorporating a hard negative sampling strategy to
optimize the training process and enhance model
robustness.
3. Comprehensive experimental validation on real-
world datasets across multiple downstream tasks
demonstrates the effectiveness and versatility of
the proposed approach.
2 RELATED WORKS
Representation learning in road networks and tra-
jectories has attracted substantial attention in recent
years due to its importance in various traffic-related
applications. Existing research can be broadly cate-
gorized into road network representation learning, tra-
jectory representation learning, and joint approaches
that integrate both types of data.
The study of road network representation learn-
ing typically aims to capture road segments’ struc-
tural and functional properties. Traditional graph
embedding methods, such as Node2Vec (Grover and
Leskovec, 2016), employ biased random walks and
skip-gram models to learn node embeddings, mak-
ing them general-purpose but often insufficient for
road-specific tasks. Similarly, DGI (Deep Graph Info-
max) (Velickovic et al., 2019) leverages unsupervised
learning to maximize mutual information between lo-
cal and global graph representations but lacks explicit
traffic and road-specific adaptations. More special-
ized methods have been developed to address these
limitations. For example, RFN (Relational Fusion
Networks) (Tobias Skovgaard Jepsen and Nielsen,
2019) introduces a more targeted approach, model-
ing interactions among nodes and edges through rela-
tional views and message passing. IRN2Vec (Meng-
xiang Wang and Yu, 2019) focuses on capturing the
relationships between road segment pairs using sam-
ples from the shortest paths, enhancing the embed-
ding process by incorporating task-related informa-
tion through multi-objective learning. HRNR (Hi-
erarchical Road Network Representation) (Ning Wu
and Pan, 2020) advances these efforts by employing a
hierarchical GNN (Scarselli et al., 2009) architecture
to embed functional and structural properties at mul-
tiple levels—from road segments to larger structural
regions. Despite these advancements, many of these
methods either neglect trajectory data or only utilize
it in isolated post-processing steps, missing out on
potentially mutually beneficial learning between road
segments and traffic movement.
Trajectory representation learning methods pri-
marily focus on modeling sequential movement data
for downstream tasks such as travel time prediction
and similar trajectory search. T2Vec (Xiucheng Li
and Wei, 2018) employs an
encoder-decoder structure with LSTM (Hochreiter
and Schmidhuber, 1997) units to handle noisy tra-
jectory sequences and reconstruct trajectories to en-
hance representation learning. Advanced methods
such as Toast (Yile Chen and Ellison, 2021) go fur-
ther by integrating road network context with trajec-
tory data, applying a Transformer-based module to
incorporate auxiliary traffic information. This multi-
step approach has demonstrated success in improv-
ing trajectory-based task performance. Similarly,
GTS (Graph Trajectory Similarity) (Peng Han and
Zhang, 2021) combines POI embeddings and GNN-
LSTM networks to represent trajectories by learn-
ing both point-wise and sequence-level dependencies.
Although these approaches address trajectory repre-
sentation to varying degrees, they often do so with-
out a unified approach that fully integrates road net-
work data, which can lead to suboptimal performance
in downstream applications.
Recently, there have been efforts to create inte-
grated models that leverage both road and trajec-
tory data to embed interconnected elements simul-
taneously. Joint Contrastive Learning of Road Net-
work and Trajectories (JCLRNT) (Zhenyu Mao and
Zhao, 2022) presents a significant step forward by
employing a contrastive learning framework to maxi-
mize mutual information between road and trajectory
representations. However, it ignores modeling the
temporal information in the learning phase. START
(Jiawei Jiang and Wang, 2023) also proposes a frame-
work for utilizing the road network and trajectories
simultaneously, including temporal embeddings based on minute-of-day and day-of-week indices. However, it
focuses only on trajectory representation learning.
Our proposed model builds on these existing ap-
proaches by addressing their limitations and further
enhancing the learning process. Specifically, we
design an end-to-end contrastive learning framework with within-scale and cross-scale mutual information maximization, incorporating temporal information through a separate temporal encoder. The temporal information is modeled as a time-ordered sequence that replaces the ordinary positional encoding.
We also provide the model with synthesized harder
negative samples. This allows for a more comprehen-
sive and robust representation, leading to improved
performance across various road and trajectory-based
tasks compared to the baseline methods.
3 PRELIMINARIES
In this section, we introduce the notation and prelim-
inaries, followed by the formal problem definition.
Scalars are represented in italics (e.g., n), vectors in
lowercase boldface (e.g., h), matrices in uppercase
boldface (e.g., A), and sets in script capitals (e.g., G).
3.1 Notations and Definitions
Road Network: A road network is modeled as a directed graph $\mathcal{G} = (\mathcal{S}, \mathbf{A}_s)$, where $\mathcal{S}$ is the set of vertices representing road segments, with $|\mathcal{S}|$ as the number of segments. The adjacency matrix $\mathbf{A}_s \in \mathbb{R}^{|\mathcal{S}| \times |\mathcal{S}|}$ has binary entries $\mathbf{A}_s[s_i, s_j]$ indicating whether there is a common intersection between the end of segment $s_i$ and the start of segment $s_j$.

Trajectory: A trajectory $T$ is a time-ordered sequence of pairs of consecutive road segments and timestamps, represented as $T = [(s_i, t_i)]_{i=1}^{m}$, where $s_i \in \mathcal{S}$ denotes the $i$-th road segment in the trajectory, and $t_i$ is the visit timestamp for $s_i$. Trajectories capture the movement of an object within the road network $\mathcal{G}$.

Representation Learning for Road Networks and Trajectories: Given a road network $\mathcal{G} = (\mathcal{S}, \mathbf{A}_s)$ and a set of historical trajectories $\mathcal{D}$, the objective is to learn a representation matrix $\mathbf{H}_s \in \mathbb{R}^{|\mathcal{S}| \times d}$, where the $i$-th row, $\mathbf{h}_{s_i}$, represents the embedding for road segment $s_i$. Additionally, for each trajectory $T \in \mathcal{D}$, we aim to learn a representation vector $\mathbf{h}_T \in \mathbb{R}^{d}$.
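To make the notation concrete, the following sketch shows one possible in-memory layout for a road network $\mathcal{G} = (\mathcal{S}, \mathbf{A}_s)$ and a trajectory $T$; the class and field names are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class RoadNetwork:
    """Directed road-segment graph G = (S, A_s) (illustrative layout)."""
    num_segments: int        # |S|
    adjacency: np.ndarray    # binary matrix A_s of shape (|S|, |S|)

    def successors(self, i: int) -> np.ndarray:
        # Segments whose start shares an intersection with the end of segment i.
        return np.flatnonzero(self.adjacency[i])


# A trajectory T = [(s_i, t_i)] for i = 1..m: segment ids with visit timestamps.
Trajectory = List[Tuple[int, float]]

if __name__ == "__main__":
    A = np.zeros((4, 4), dtype=np.int8)
    A[0, 1] = A[1, 2] = A[2, 3] = 1                      # a simple chain of segments
    g = RoadNetwork(num_segments=4, adjacency=A)
    traj: Trajectory = [(0, 0.0), (1, 35.0), (2, 90.0)]  # uneven visit times
    print(g.successors(1), len(traj))
```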
4 PROPOSED TCRLRT METHOD
This section introduces a novel road and trajectory
representation learning model called Time-Aware
Contrastive Representation Learning for Road Net-
work and Trajectory (TCRLRT). Figure 2 overviews
our proposed model. The proposed TCRLRT takes
the road network and trajectory sequence as input.
These inputs are processed through an encoding module, which consists of a graph encoder (GAT) for road network representation learning, a sequence encoder for trajectory encoding, and a temporal encoder. We then compute a loss function with three components that estimate mutual information (MI) for road-road, trajectory-trajectory, and road-trajectory pairs. Finally, we jointly maximize these three MI estimators to obtain the representations.

Figure 2: Proposed Time-Aware Contrastive Representation Learning for Road Network and Trajectory Model.
4.1 Encoding Module
The encoding module includes a graph encoder for
road network representation and a sequence encoder
for trajectory representation. We also use a tempo-
ral encoder to encode temporal information in trajec-
tory representations. We use Graph Attention Net-
works (GATs) (Petar Velickovic and Bengio, 2017)
and a Transformer Encoder (Ashish Vaswani and
Polosukhin, 2017) for graph and sequence encoding,
respectively.
4.1.1 Graph Encoder for Road Segment
Representations
Since road networks are directed graphs, spectral
methods are not suitable. We represent road segments
using GAT:
$$\mathbf{H}_s = \mathrm{GAT}(\mathbf{V}_s, \mathbf{A}_s) \quad (1)$$

Here, $\mathbf{V}_s$ is the initial embedding matrix for road segments, $\mathbf{A}_s$ is the adjacency matrix, and $\mathbf{H}_s$ is the output of the graph encoder, i.e., the representation matrix of the graph with $|\mathcal{S}|$ road segments. The $i$-th row of $\mathbf{H}_s$ is the representation vector for the $i$-th road segment in the road segment set $\mathcal{S}$. GAT enables effective handling of directed graphs and has demonstrated superior performance.
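As a rough illustration of Eq. (1), the snippet below encodes road segments with a single-head graph attention layer written in plain PyTorch over a dense adjacency matrix; the layer configuration and the toy sizes are our assumptions, not the exact setup used in TCRLRT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGATLayer(nn.Module):
    """Minimal single-head graph attention layer over a dense adjacency matrix."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a_src = nn.Linear(out_dim, 1, bias=False)
        self.a_dst = nn.Linear(out_dim, 1, bias=False)

    def forward(self, V: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        H = self.W(V)                                            # (|S|, d)
        # Attention logits e_ij = LeakyReLU(a_src . h_i + a_dst . h_j).
        e = F.leaky_relu(self.a_src(H) + self.a_dst(H).T, 0.2)   # (|S|, |S|)
        # Keep only edges of A_s (plus self-loops) before the softmax.
        mask = (A + torch.eye(A.size(0))) > 0
        alpha = torch.softmax(e.masked_fill(~mask, float("-inf")), dim=-1)
        return F.elu(alpha @ H)                                  # H_s as in Eq. (1)


# Toy usage: V_s holds initial segment embeddings, A_s the binary adjacency matrix.
num_segments, dim = 500, 64
V_s = torch.randn(num_segments, dim)
A_s = (torch.rand(num_segments, num_segments) < 0.01).float()
H_s = SimpleGATLayer(dim, dim)(V_s, A_s)          # road-segment representations
```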
4.1.2 Sequence Encoder for Trajectory
Representation
Trajectories are encoded as sequences, using
road segment representations to generate a
trajectory representation. Given a trajectory
$\tau = \{(s_1, t_1), (s_2, t_2), \ldots, (s_n, t_n)\}$, we represent the input to the sequence encoder as:

$$\mathbf{H}_\tau = \{\mathbf{h}_{s_1}, \mathbf{h}_{s_2}, \ldots, \mathbf{h}_{s_n}\} \quad (2)$$

where $\mathbf{h}_{s_i} \in \mathbb{R}^{d}$ is the $i$-th row of $\mathbf{H}_s$, representing the embedding of the road segment $s_i$ in the trajectory. The input to the sequence encoder, $\mathbf{H}_\tau$, is combined with $\mathrm{TE}$, the temporal encoding of the time sequence of the trajectory, in place of traditional positional encoding. The sequence encoder processes $\mathbf{H}_\tau$ to produce the final trajectory embedding $\mathbf{h}_\tau$, incorporating both the structural information from road segments and the temporal encoding $\mathrm{TE}$ to capture the timing of the trajectory data effectively.
4.1.3 Temporal Encoding
Traditional positional encoding can only represent ba-
sic sequential orders. However, trajectory data in-
volves visit records distributed unevenly across the
temporal axis. Inspired by the work of (Huaiyu Wan and Lin, 2022), we replace the positional encoding with a temporal encoding built with two significant modifications: (1) replacing the position indexes with the prefix sum of time differences between consecutive pairs in trajectories, and (2) using trainable parameters.
Formally, traditional positional encoding in transformers is defined as:

$$\mathrm{PE}(o) = [\cos(\omega_1 o), \sin(\omega_1 o), \ldots, \cos(\omega_d o), \sin(\omega_d o)] \quad (3)$$

where $\omega_k = \frac{1}{10000^{2k/d}}$, $\mathrm{PE}$ represents the positional encoding function, $o$ is the position index, $\omega_k$ are the positional parameters, and $2d$ is the dimension of the encoding vector. The limitation of using positional encoding in trajectory representation is that it does not accurately capture the temporal sequence of the trajectory. Positional encoding assumes a uniform distribution of positions, which may work well in NLP and vision tasks where words or image pixels are evenly spaced. However, the temporal distance between consecutive road segments in a trajectory is often non-uniform. To address this, we use a temporal encoding defined as:

$$\mathrm{TE}(t) = [\cos(\omega_1 t), \sin(\omega_1 t), \ldots, \cos(\omega_d t), \sin(\omega_d t)] \quad (4)$$

Here, $\mathrm{TE}$ replaces the position index $o$ with an absolute timestamp $t$, and the parameters $\{\omega_1, \omega_2, \ldots, \omega_d\}$ are set to be trainable. This approach enables the transformer to capture meaningful temporal distances between records directly within its encoder. We incorporate this temporal encoding into our model by combining it with the trajectory encoding as follows:
$$\mathbf{h}_\tau = \mathrm{Pool}(\mathrm{TransEnc}(\mathbf{H}_\tau + \mathrm{TE})) \quad (5)$$

where $\mathrm{TransEnc}(\cdot) : \mathbb{R}^{|\tau| \times d} \rightarrow \mathbb{R}^{|\tau| \times d}$ is the transformer encoder applied to the input sequences, followed by a mean-pooling operation $\mathrm{Pool}(\cdot) : \mathbb{R}^{|\tau| \times d} \rightarrow \mathbb{R}^{d}$. The outputs $\mathbf{H}_s$ and $\mathbf{h}_\tau$, generated by the GAT and transformer encoder, respectively, are used as the final representations of the road network and trajectory.
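The sketch below is a minimal reading of Eqs. (3)-(5): trainable-frequency temporal encodings are added to the segment embeddings, a Transformer encoder processes the sequence, and mean pooling yields $\mathbf{h}_\tau$. The hyperparameters (d = 128, 4 heads, 2 layers) and the cos/sin ordering are illustrative assumptions.

```python
import torch
import torch.nn as nn


class TemporalEncoding(nn.Module):
    """TE(t) = [cos(w_k t), sin(w_k t)] with trainable frequencies w_k (Eq. 4)."""

    def __init__(self, d_model: int):
        super().__init__()
        assert d_model % 2 == 0
        k = torch.arange(d_model // 2, dtype=torch.float32)
        # Initialized like sinusoidal positional encoding, then learned.
        self.omega = nn.Parameter(1.0 / (10000.0 ** (2 * k / d_model)))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, seq_len) visit times, e.g. prefix sums of time gaps in seconds.
        phase = t.unsqueeze(-1) * self.omega                  # (batch, seq, d/2)
        return torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)


class TrajectoryEncoder(nn.Module):
    """h_tau = Pool(TransEnc(H_tau + TE)) as in Eq. (5)."""

    def __init__(self, d_model: int = 128, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.te = TemporalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, H_tau: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        z = self.encoder(H_tau + self.te(t))                  # (batch, seq, d)
        return z.mean(dim=1)                                  # mean pooling -> (batch, d)


# Toy usage: segment embeddings gathered from H_s (Eq. 2) plus visit times.
H_tau = torch.randn(8, 20, 128)                               # batch of 8 trajectories
t = torch.cumsum(torch.rand(8, 20) * 60.0, dim=1)             # uneven, increasing times
h_tau = TrajectoryEncoder()(H_tau, t)                         # (8, 128)
```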
4.2 Negative Sampling
In contrastive learning, we need an anchor, posi-
tive samples, and negative samples that contribute to
learning the contrast in representations. Positive sam-
ples are typically taken as an augmented or denoised
version of the original anchor. For negative sampling,
methods such as random index-based sampling and
minibatch sampling are commonly used. We adopt a
synthesized negative sampling approach similar to MoCHi
(Yannis Kalantidis and Larlus, 2020) to create hard
negatives for an effective and optimized representa-
tion learning process. This approach combines pos-
itive samples with some negatives to generate syn-
thetic hard negatives, which helps the model learn
more challenging examples. Given embeddings of
a sample $x$, we generate synthetic hard negatives by combining a positive sample $\mathbf{x}^{+}$ with negative samples $\mathbf{x}^{-}$ using a weighted average:

$$\mathbf{x}_{\mathrm{mixed}} = \alpha \cdot \mathbf{x}^{-} + (1 - \alpha) \cdot \mathbf{x}^{+} \quad (6)$$

where $\alpha$ is chosen to be between 0.3 and 0.7 to balance the contribution of positive and negative samples.
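A minimal sketch of the mixing step in Eq. (6), in the spirit of MoCHi: each synthetic hard negative is a convex combination of the positive and a sampled negative. The tensor shapes are illustrative, and the final L2 normalization is an assumption borrowed from MoCHi's practice rather than a detail stated above.

```python
import torch
import torch.nn.functional as F


def synthesize_hard_negatives(pos: torch.Tensor,
                              negs: torch.Tensor,
                              alpha_range=(0.3, 0.7)) -> torch.Tensor:
    """Mix positives into negatives: x_mixed = a*x_neg + (1 - a)*x_pos (Eq. 6).

    pos:  (batch, d)    positive embeddings aligned with each anchor
    negs: (batch, k, d) k sampled negatives per anchor
    """
    batch, k, _ = negs.shape
    a = torch.empty(batch, k, 1).uniform_(*alpha_range)    # per-pair mixing weight
    mixed = a * negs + (1.0 - a) * pos.unsqueeze(1)        # (batch, k, d)
    # L2-normalize so synthetic negatives lie on the same unit sphere (assumption).
    return F.normalize(mixed, dim=-1)


# A smaller alpha keeps the mixture closer to the positive, i.e. a harder negative.
pos = torch.randn(8, 128)
negs = torch.randn(8, 16, 128)
hard_negs = synthesize_hard_negatives(pos, negs)           # (8, 16, 128)
```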
5 CONTRASTIVE LOSS FUNCTION
The contrastive loss function is designed to optimize
latent space representations by maximizing the mu-
tual information (MI) for positive pairs (related sam-
ples) and minimizing it for negative pairs. We use
the loss function defined by (Zhenyu Mao and Zhao, 2022), which consists of three components:
Road-Road Contrastive Loss: The road-road contrastive loss ($\mathcal{L}_{SS}$) measures the MI between each road segment and its contextual neighbors. The context $C(s_i)$ of a road segment $s_i$ includes its structural neighbors, i.e., segments with a direct connection recorded in the adjacency matrix $\mathbf{A}_s$. Formally, the road-road loss $\mathcal{L}_{SS}$ is defined as:

$$\mathcal{L}_{SS} = \frac{1}{|\mathcal{S}|} \sum_{s_i \in \mathcal{S}} \frac{1}{|C(s_i)|} \sum_{s_j \in C(s_i)} I(\mathbf{h}_{s_i}, \mathbf{h}_{s_j}) \quad (7)$$
Trajectory-Trajectory Contrastive Loss: The trajectory-trajectory loss $\mathcal{L}_{TT}$ is formulated using the contrastive objective:

$$\mathcal{L}_{TT} = \frac{1}{|\mathcal{T}|} \sum_{\tau_i \in \mathcal{T}} I(\mathbf{h}_{\tau_i}, \mathbf{h}_{\tilde{\tau}_i}) \quad (8)$$

where $\tilde{\tau}_i$ is a noisy version of trajectory $\tau_i$. Noisy trajectories are generated using techniques such as random masking and replacements, along with a "detour" strategy that replaces part of the trajectory with an alternative path sharing the same start and end points.
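For concreteness, a lightweight sketch of the random-masking and replacement augmentations used to build the noisy view; the detour augmentation requires route search on the road network and is omitted, and the probabilities are illustrative.

```python
import random
from typing import List


def noisy_trajectory(segments: List[int],
                     num_segments: int,
                     p_mask: float = 0.15,
                     p_replace: float = 0.10) -> List[int]:
    """Create a noisy view by randomly dropping or replacing road segments."""
    out = []
    for s in segments:
        r = random.random()
        if r < p_mask:
            continue                                     # random masking: drop it
        if r < p_mask + p_replace:
            out.append(random.randrange(num_segments))   # random replacement
        else:
            out.append(s)
    return out if out else list(segments)                # never return an empty view


# Example: a 6-segment trajectory on a network with 6,161 segments.
print(noisy_trajectory([10, 11, 57, 58, 59, 60], num_segments=6161))
```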
Road-Trajectory Contrastive Loss: The road-trajectory contrastive loss $\mathcal{L}_{ST}$ is defined as:

$$\mathcal{L}_{ST} = \frac{1}{|\mathcal{T}|} \sum_{\tau_j \in \mathcal{T}} \left( \frac{1}{|\mathcal{S}|} \sum_{s_i \in \mathcal{S}} w_{\tau_j}[s_i] \cdot I(\mathbf{h}_{s_i}, \mathbf{h}_{\tau_j}) \right) \quad (9)$$

where $w_{\tau}[s_i]$, representing the RS-T distance, combines the original trajectory length and the length of an alternative route, providing a flexible "soft" weighting for potential positive samples:

$$w_{\tau}[s_i] = \frac{|\tau|}{|\tau| + \delta(s_i, \tau)} \quad (10)$$

Here, $\delta(s_i, \tau)$ represents the minimum number of segments from $s_i$ to any segment in $\tau$.
Overall Loss Function: The overall loss $\mathcal{L}$ is computed as a weighted sum of $\mathcal{L}_{SS}$, $\mathcal{L}_{TT}$, and $\mathcal{L}_{ST}$:

$$\mathcal{L} = \lambda_{SS} \cdot \mathcal{L}_{SS} + \lambda_{TT} \cdot \mathcal{L}_{TT} + \lambda_{ST} \cdot \mathcal{L}_{ST} \quad (11)$$

where the $\lambda$s are weight parameters such that $\lambda_{SS} + \lambda_{TT} + \lambda_{ST} = 1$. In all cases, the Jensen-Shannon mutual information estimator is used to enhance stability against variations in the number of negative samples.
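A compact sketch of how the three terms could be combined as in Eq. (11), with a softplus-based Jensen-Shannon MI estimator over discriminator scores of positive and negative pairs; this is a simplified reading (the soft weights of Eqs. (9)-(10) are omitted), not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F


def jsd_mi(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon MI lower bound from scores of positive/negative pairs."""
    return (-F.softplus(-pos_scores)).mean() - F.softplus(neg_scores).mean()


def total_loss(pos_ss, neg_ss, pos_tt, neg_tt, pos_st, neg_st,
               lambdas=(0.1, 0.1, 0.8)) -> torch.Tensor:
    """L = lambda_SS*L_SS + lambda_TT*L_TT + lambda_ST*L_ST (Eq. 11).

    Each L_* is the negative of its JSD MI estimate, so minimizing the total
    loss jointly maximizes the three mutual-information terms.
    """
    lam_ss, lam_tt, lam_st = lambdas
    return (lam_ss * -jsd_mi(pos_ss, neg_ss)
            + lam_tt * -jsd_mi(pos_tt, neg_tt)
            + lam_st * -jsd_mi(pos_st, neg_st))


# Scores are typically dot products between paired embeddings (e.g. h_si . h_tau).
pos, neg = torch.randn(32), torch.randn(256)
loss = total_loss(pos, neg, pos, neg, pos, neg)
```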
6 EXPERIMENTS
We evaluate our proposed framework on two real-world datasets and four traffic-related tasks and compare it with state-of-the-art methods.
6.1 Datasets and Preprocessing
The datasets used in this study are provided by the
GAIA project in collaboration with Didi and con-
sist of two months of car-hailing trip data from the
cities of Xi’an and Chengdu, China. Each dataset in-
cludes GPS records for individual trips. Road net-
work data for both cities was gathered from Open
Street Map, and a map-matching algorithm was em-
ployed to align the GPS coordinates to specific road
segments. Through this process, trajectories were
converted into sequences of road segments. To en-
sure quality, we filtered out trajectories that included
fewer than three road segments or had a duration
shorter than one minute. The Xi'an dataset contains 6,161 road segments with 15,779 edges, whereas the same numbers for the Chengdu dataset are 6,632 and 17,038. The average number of road segments per trip is 31.11 for Xi'an and 30.87 for Chengdu.
6.2 Downstream Tasks and Benchmarks
We conduct four downstream traffic tasks: two road segment-based tasks and two trajectory-based tasks. We compare our method to
several state-of-the-art road and trajectory representa-
tion learning methods, as well as graph representation
learning methods. Methods designed solely for spe-
cific tasks are excluded from the comparison, as we
aim to learn robust representations for various tasks.
Task-specific methods often include tailored repre-
sentations and components, resulting in an inconsis-
tent and unfair comparison.
6.2.1 Road Segment-Based Tasks
To assess the representation of road networks, we fo-
cus on two main tasks: (1) road label classification
and (2) traffic speed prediction.
Road Label Classification: This task is analogous to
node classification in graphs. Road-type labels, such
as motorways and living streets, are collected from
the Open Street Map. The five most common label
types are selected as prediction targets. A classifier
composed of a fully connected layer followed by a
softmax layer is applied to the road segment represen-
tations. The performance is evaluated using Micro-F1
(Mi-F1) and Macro-F1 (Ma-F1) scores.
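A possible sketch of this probe (illustrative tensor sizes; the five-class, 128-dimensional setup follows the description above and Section 6.3):

```python
import torch
import torch.nn as nn

# Linear + softmax probe over frozen 128-d road-segment representations.
classifier = nn.Sequential(nn.Linear(128, 5), nn.Softmax(dim=-1))

h_s = torch.randn(6161, 128)            # learned road-segment embeddings (frozen)
road_type_probs = classifier(h_s)       # (|S|, 5) probabilities over label classes
pred = road_type_probs.argmax(dim=-1)   # predicted road-type label per segment
# In practice the probe is trained with cross-entropy on the logits.
```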
Traffic Speed Prediction: This is a regression task
where the objective is to predict the average speed on
each road segment, calculated from trajectory data.
A linear regression model is trained using the road
representations, and the evaluation is conducted using
Mean Absolute Error (MAE) and Root Mean Squared
Error (RMSE).
6.2.2 Comparison with Existing Methods
We compare our approach to various advanced road
and graph representation methods:
Node2Vec: Learns node embeddings by explor-
ing neighborhoods within w-hops through param-
eterized random walks.
DGI: A contrastive learning approach that max-
imizes the mutual information between node and
graph representations.
RFN: Builds node and edge representations based
on relational views, using message passing for in-
teraction.
IRN2Vec: Captures relationships between road
segment pairs using shortest path samples and
multi-objective learning.
HRNR: Utilizes a hierarchical GNN-based archi-
tecture with three levels to capture structural and
functional properties.
Toast: Incorporates auxiliary traffic context to
train a skip-gram model and uses a Transformer
module to extract travel-related semantics.
JCLRNT: JCLRNT applies a unified framework
to learn road network representations by using
within-scale road-road contrast and cross-scale
road-trajectory contrast with an adaptive weight-
ing strategy to optimize road-trajectory represen-
tation.
6.2.3 Trajectory-Based Tasks
To evaluate trajectory representations, we focus on
two main tasks: (1) trajectory similarity search and
(2) travel time prediction.
Trajectory Similarity Search: The objective is to
identify the most similar trajectory to a given query
trajectory from a database. Trajectory representations
are used to calculate similarity scores and rank the
results in descending order. Performance metrics in-
clude Hit Ratio@10 (HR@10) and Mean Rank (MR).
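A sketch of this retrieval protocol and the two metrics, assuming cosine similarity between trajectory embeddings as the scoring function (the similarity measure itself is not fixed above):

```python
import torch
import torch.nn.functional as F


def evaluate_similarity_search(queries: torch.Tensor,
                               database: torch.Tensor,
                               targets: torch.Tensor,
                               k: int = 10):
    """Return (HR@k, Mean Rank) for retrieval by embedding similarity."""
    sims = F.normalize(queries, dim=-1) @ F.normalize(database, dim=-1).T
    order = sims.argsort(dim=-1, descending=True)          # ranked candidate ids
    # 1-based rank of the ground-truth trajectory for each query.
    ranks = (order == targets.unsqueeze(1)).float().argmax(dim=-1) + 1
    return (ranks <= k).float().mean().item(), ranks.float().mean().item()


queries = torch.randn(100, 128)             # query trajectory embeddings
database = torch.randn(5000, 128)           # candidate trajectory embeddings
targets = torch.randint(0, 5000, (100,))    # index of the true match per query
hr_at_10, mean_rank = evaluate_similarity_search(queries, database, targets)
```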
Travel Time Prediction: This task involves predict-
ing the travel time for a given trajectory.
6.2.4 Benchmarks for Trajectory
Representation
The following methods are used as benchmarks for
trajectory representation:
Para2Vec: Learns paragraph embeddings by treating each trajectory as a paragraph.
T2Vec: An encoder-decoder model that recon-
structs trajectories from noisy sequences of road
segments using LSTM units.
Toast: Description already provided in 6.2.2.
GTS: Learns embeddings for points of interest
(POIs) followed by trajectory encoding using a
GNN-LSTM network.
JCLRNT: Description already provided in 6.2.2.
6.3 Simulation Settings
The training dataset comprises 500,000 trajecto-
ries, and we train the model using the Adam opti-
mizer (Kingma and Ba, 2015) with a batch size of 64
over 10 epochs.

Table 1: Performance Comparison for Road Label Classification and Traffic Speed Inference. Road Label Classification is reported with Mi-F1/Ma-F1; Traffic Speed Inference with MAE/RMSE.

| Method | Xi'an Mi-F1 | Xi'an Ma-F1 | Chengdu Mi-F1 | Chengdu Ma-F1 | Xi'an MAE | Xi'an RMSE | Chengdu MAE | Chengdu RMSE |
|---|---|---|---|---|---|---|---|---|
| Node2Vec | 0.524 | 0.495 | 0.586 | 0.559 | 7.12 | 9.00 | 6.41 | 8.22 |
| DGI | 0.463 | 0.337 | 0.475 | 0.358 | 6.43 | 8.41 | 6.12 | 7.98 |
| RFN | 0.516 | 0.484 | 0.577 | 0.570 | 6.89 | 8.77 | 6.57 | 8.43 |
| IRN2Vec | 0.497 | 0.458 | 0.531 | 0.506 | 6.52 | 8.52 | 6.60 | 8.59 |
| HRNR | 0.541 | 0.527 | 0.631 | 0.609 | 7.03 | 8.82 | 6.52 | 8.45 |
| Toast | 0.602 | 0.599 | 0.692 | 0.659 | 5.95 | 7.70 | 5.71 | 7.44 |
| JCLRNT | 0.637 | 0.629 | 0.729 | 0.701 | 4.69 | 6.85 | 5.02 | 7.08 |
| Proposed TCRLRT | **0.645** | **0.634** | **0.742** | **0.713** | **4.57** | **6.78** | **4.96** | **7.01** |

Table 2: Performance Comparison for Similar Trajectory Search and Travel Time Estimation. Similar Trajectory Search is reported with MR/HR@10; Travel Time Estimation with MAE/RMSE.

| Method | Xi'an MR | Xi'an HR@10 | Chengdu MR | Chengdu HR@10 | Xi'an MAE | Xi'an RMSE | Chengdu MAE | Chengdu RMSE |
|---|---|---|---|---|---|---|---|---|
| Para2Vec | 216 | 0.251 | 279 | 0.205 | 220.5 | 302.7 | 244.7 | 345.4 |
| T2Vec | 46.1 | 0.781 | 38.6 | 0.806 | 165.2 | 240.7 | 207.5 | 311.0 |
| Toast | 10.1 | 0.885 | 13.7 | 0.905 | 127.8 | 190.8 | 175.6 | 265.0 |
| GTS | 11.0 | 0.889 | 12.9 | 0.896 | 126.3 | 186.7 | 176.1 | 267.9 |
| JCLRNT | 8.87 | 0.928 | 9.54 | 0.912 | 121.9 | 179.5 | 163.6 | 243.5 |
| Proposed TCRLRT | **8.50** | **0.932** | **9.10** | **0.916** | **120.0** | **178.2** | **162.0** | **241.5** |

First, the representation vectors for
the road segments and trajectories are extracted from
both the benchmark models and our proposed model.
These vectors, standardized to a dimension of 128, are
used in various downstream tasks. The trajectory dataset is split into training and evaluation sets based on the date, ensuring that there is no overlap. Temporal sequences are constructed as absolute time-ordered sequences with a prefix sum starting at 0. The values of $\lambda_{SS}$, $\lambda_{TT}$, and $\lambda_{ST}$ are found to be optimal at 0.1, 0.1, and 0.8, respectively. The parameter $\alpha$ for generating negative samples via mixing is set to 0.3; a lower value is chosen to produce harder samples.
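As a small illustration of the temporal-sequence construction described above, the snippet below converts raw visit timestamps into the prefix sum of consecutive time gaps, starting at 0 (timestamps assumed to be in seconds):

```python
def to_relative_times(timestamps):
    """Prefix sums of consecutive time differences, starting at 0."""
    rel, total = [0.0], 0.0
    for prev, cur in zip(timestamps, timestamps[1:]):
        total += cur - prev
        rel.append(total)
    return rel


# Raw visit times (seconds) of a 4-segment trajectory -> [0.0, 35.0, 95.0, 110.0]
print(to_relative_times([1_700_000_000, 1_700_000_035, 1_700_000_095, 1_700_000_110]))
```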
6.4 Results and Analysis
The simulation results for the four tasks are presented
in Tables 1 and 2, with the best results highlighted in
bold. Higher values of Mi-F1, Ma-F1, and HR@10 indicate better performance (↑), while lower values of MAE, RMSE, and MR indicate better performance (↓). General methods such as Node2Vec, DGI, and Para2Vec perform poorly, as they do not capture the
unique characteristics of road networks and trajecto-
ries. Methods like IRN2Vec and T2Vec perform bet-
ter due to their richer contextual information. Toast
and GTS show improvements in travel-time estima-
tion and trajectory retrieval. JCLRNT employs con-
trastive learning but does not consider temporal mod-
eling. Our approach incorporates temporal informa-
tion and a hard negative sampling strategy to optimize
training, outperforming JCLRNT and other baselines.
7 CONCLUSIONS
In this paper, we proposed a representation learning model for road networks and trajectories. Our
approach introduces an end-to-end framework that
jointly learns road network and trajectory represen-
tations, incorporating a learnable temporal encoding
and synthesizing harder negatives for optimized train-
ing. We conducted experiments on two real-world
datasets, evaluating the model on four downstream
tasks: two focused on road segments and two on tra-
jectories. For future work, we plan to enhance hard
negative sampling techniques specifically tailored for
road network and trajectory-based tasks. We also plan
to incorporate more modalities of features like text
and images to model the representation learning with
heterogeneous graphs.
REFERENCES
Ashish Vaswani, Noam Shazeer, N. P. J. U. L. J. A. N. G.
L. K. and Polosukhin, I. (2017). Attention is all you
need. CoRR, abs/1706.03762.
Grover, A. and Leskovec, J. (2016). node2vec: Scal-
able feature learning for networks. In Proceedings
of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, page
855–864. Association for Computing Machinery.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural Comput., 9(8):1735–1780.
Huaiyu Wan, Yan Lin, S. G. and Lin, Y. (2022). Pre-
training time-aware location embeddings from spatial-
temporal trajectories. IEEE Transactions on Knowl-
edge and Data Engineering, 34(11):5510–5523.
Jiawei Jiang, Dayan Pan, H. R. X. J. C. L. and Wang,
J. (2023). Self-supervised trajectory representation
learning with temporal regularities and travel seman-
tics. In 39th IEEE International Conference on Data
Engineering, ICDE 2023, Anaheim, CA, USA, April
3-7, 2023, pages 843–855. IEEE.
Joshua Robinson, Ching-Yao Chuang, S. S. and Jegelka, S.
(2020). Contrastive learning with hard negative sam-
ples. ArXiv, abs/2010.04592.
Kingma, D. P. and Ba, J. (2015). Adam: A method for
stochastic optimization. In 3rd International Confer-
ence on Learning Representations, ICLR 2015, San
Diego, CA, USA, May 7-9, 2015, Conference Track
Proceedings.
Meng-xiang Wang, Wang-Chien Lee, T.-y. F. and Yu, G.
(2019). Learning embeddings of intersections on road
networks. In Proceedings of the 27th ACM SIGSPA-
TIAL International Conference on Advances in Ge-
ographic Information Systems, pages 309–318, New
York, NY, USA. Association for Computing Machin-
ery.
Ning Wu, Xin Wayne Zhao, J. W. and Pan, D. (2020).
Learning effective road network representation with
hierarchical graph neural networks. In Proceedings
of the 26th ACM SIGKDD International Conference
on Knowledge Discovery & Data Mining, pages 6–
14, New York, NY, USA. Association for Computing
Machinery.
Peng Han, Jin Wang, D. Y. S. S. and Zhang, X. (2021). A
graph-based approach for trajectory similarity compu-
tation in spatial networks. In Proceedings of the 27th
ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, pages 556–564, Virtual Event, Sin-
gapore. Association for Computing Machinery.
Petar Velickovic, Guillem Cucurull, A. C. A. R. P. L. and
Bengio, Y. (2017). Graph attention networks. ArXiv,
abs/1710.10903.
Philip Bachman, R. D. H. and Buchwalter, W. (2019).
Learning Representations by Maximizing Mutual In-
formation Across Views, pages 1–11. Curran Asso-
ciates Inc., Red Hook, NY, USA.
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M.,
and Monfardini, G. (2009). The graph neural net-
work model. IEEE Transactions on Neural Networks,
20(1):61–80.
Ting Chen, Simon Kornblith, M. N. and Hinton, G. (2020).
A simple framework for contrastive learning of visual
representations. In Proceedings of the 37th Interna-
tional Conference on Machine Learning, pages 1–11.
JMLR.org.
Tobias Skovgaard Jepsen, C. S. J. and Nielsen, T. D. (2019).
Graph convolutional networks for road networks. In
Proceedings of the 27th ACM SIGSPATIAL Interna-
tional Conference on Advances in Geographic Infor-
mation Systems, pages 460–463, New York, NY, USA.
Association for Computing Machinery.
Velickovic, P., Fedus, W., Hamilton, W. L., Liò, P., Bengio,
Y., and Hjelm, R. D. (2019). Deep graph infomax. In
7th International Conference on Learning Represen-
tations, ICLR 2019, New Orleans, LA, USA, May 6-9,
2019. OpenReview.net.
Xiucheng Li, Kaiqi Zhao, G. C. C. S. J. and Wei, W. (2018).
Deep representation learning for trajectory similarity
computation. In 2018 IEEE 34th International Con-
ference on Data Engineering (ICDE), pages 617–628.
IEEE.
Yangxin Lin, P. W. and Ma, M. (2017). Intelligent trans-
portation system (its): Concept, challenge and oppor-
tunity. In 2017 IEEE 3rd International Conference on
Big Data Security on Cloud (BigDataSecurity), IEEE
International Conference on High Performance and
Smart Computing (HPSC), and IEEE International
Conference on Intelligent Data and Security (IDS),
pages 167–172. IEEE.
Yannis Kalantidis, Mert Bülent Sariyildiz, N. P. P. W. and
Larlus, D. (2020). Hard negative mixing for con-
trastive learning. In Advances in Neural Information
Processing Systems 33: Annual Conference on Neural
Information Processing Systems 2020, NeurIPS 2020,
December 6-12, 2020, virtual. NeurIPS.
Yile Chen, Xiucheng Li, G. C. Z. B. C. L. Y. L. A. K. C.
and Ellison, R. (2021). Robust road network repre-
sentation learning: When traffic patterns meet travel-
ing semantics. In Proceedings of the 30th ACM Inter-
national Conference on Information and Knowledge
Management, pages 211–220, Virtual Event, Queens-
land, Australia. Association for Computing Machin-
ery.
Yoshua Bengio, A. C. and Vincent, P. (2013). Representa-
tion learning: A review and new perspectives. IEEE
Transactions on Pattern Analysis and Machine Intel-
ligence, 35(8):1798–1828.
Yu Zheng, Lizhu Zhang, X. X. and Ma, W.-Y. (2009). Min-
ing interesting locations and travel sequences from
GPS trajectories. In Proceedings of the 18th Inter-
national Conference on World Wide Web, pages 791–
800, Madrid, Spain. Association for Computing Ma-
chinery.
Zhenyu Mao, Ziyue Li, D. L. L. B. and Zhao, R. (2022).
Jointly contrastive representation learning on road net-
work and trajectory. In Proceedings of the 31st ACM
International Conference on Information & Knowl-
edge Management, pages 1501–1510, New York, NY,
USA. Association for Computing Machinery.