
tleneck along the Brown line, and hence a more advanced strategy should be enforced at these stations to improve social distancing.
5 CONCLUSIONS
In this paper, we addressed the problem of multi-class transit assignment in a congested network using a multi-agent cooperative route guidance system trained with reinforcement learning. We decomposed the assignment of passengers to routes into a decision-making problem at each stop and formulated it as a Markov game. We then proposed TransMARL, a multi-agent reinforcement learning method based on the Proximal Policy Optimization (PPO) algorithm with several adaptations, including parameter sharing and a curriculum learning component that speeds up training. Empirical results show that, in terms of solution quality, TransMARL reduces overcrowding on critical routes by 57.79% at the cost of only an 8-minute travel time delay.
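To illustrate the parameter-sharing adaptation, the following minimal sketch shows a single actor-critic network reused by every stop-level agent, with the agents' transitions pooled into one batch for the clipped PPO update. The code is written in PyTorch for brevity; the layer sizes, hyperparameters, and observation layout are illustrative assumptions rather than the exact TransMARL configuration.

```python
# Minimal parameter-sharing PPO sketch (illustrative dimensions and
# hyperparameters; not the exact TransMARL configuration).
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """One actor-critic network reused by every stop-level agent."""
    def __init__(self, obs_dim, n_routes, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh())
        self.pi_head = nn.Linear(hidden, n_routes)  # route-choice logits
        self.v_head = nn.Linear(hidden, 1)          # value estimate

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Categorical(logits=self.pi_head(h))
        return dist, self.v_head(h).squeeze(-1)

def ppo_update(net, optimizer, obs, actions, old_logp, advantages, returns,
               clip=0.2, vf_coef=0.5, ent_coef=0.01):
    """One clipped-PPO step on a batch pooled from all agents' transitions."""
    dist, value = net(obs)
    ratio = torch.exp(dist.log_prob(actions) - old_logp)
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1 - clip, 1 + clip) * advantages)
    loss = (-surrogate.mean()
            + vf_coef * (returns - value).pow(2).mean()
            - ent_coef * dist.entropy().mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because every agent queries the same network, experience collected at any stop improves the policy applied at all stops, which is what makes the approach tractable as the number of stop-level agents grows.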
Several improvements could be incorporated into the multi-agent actor-critic model to increase overall performance. This work has focused mainly on centralized learning with decentralized execution and parameter sharing, leaving the study of other approaches to inducing cooperative behavior outside the scope of this paper. The following ideas could be tested:
1. The importance of inter-agent communication for tasks that require synchronization has long been studied. To achieve strong coordination, a shared communication memory can be used: the agents then learn an information-sharing and extraction protocol through this shared memory (Pesce and Montana, 2020).
2. In a partially observable environment, each agent has no knowledge of the other agents' goals/destinations. Consequently, the agents must infer the other agents' hidden goals and policies in order to solve the task. There are several ways to do this: learn a separate representation of the other agents' policies, use the agent's own policy to predict the other agents' actions, or learn the other agents' policies directly from their raw observations.
3. Even though the above-mentioned ideas could potentially improve overall performance significantly, making these approaches scale to a large number of agents remains a challenge. One possible solution is to estimate the degree of influence that each other agent's policy has on the current agent's reward and to take only the agents with a high degree of influence into consideration (Jaques et al., 2019), as sketched below.
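To make the last idea concrete, the sketch below approximates, in the spirit of Jaques et al. (2019), the influence of a neighboring agent on the current agent as the expected KL divergence between the current agent's policy conditioned on the neighbor's counterfactual action and its marginal policy, and keeps only the most influential neighbors. The function names and data layout are hypothetical.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def influence_on_current_agent(conditional_policies, p_other_action):
    """Expected KL between pi_i(.|a_j) and the marginal pi_i(.).

    conditional_policies[a_j] is the current agent's action distribution
    given that the neighbor takes counterfactual action a_j (array of shape
    [n_actions_j, n_actions_i]); p_other_action is the distribution over a_j
    used to form the marginal.
    """
    marginal = np.einsum('j,ja->a', p_other_action, conditional_policies)
    return sum(p * kl(cond, marginal)
               for p, cond in zip(p_other_action, conditional_policies))

def top_k_influencers(influence_scores, k=3):
    """Keep only the k neighbors whose policies most affect the current agent."""
    ranked = sorted(influence_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [agent_id for agent_id, _ in ranked[:k]]
```

In our setting, the conditional policies could come from counterfactual forward passes of the shared policy network, and neighbors below the cut-off would simply be excluded from the agent's observation or communication set.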
The path allocation model developed in this work focuses on two user classes, guided and unguided agents, which rests on an assumption of homogeneous users. In practice, however, each passenger has different route preferences and therefore a different utility cost function. For instance, passengers with the same origin and destination may choose different routes depending on whether they prefer a time-efficient or a congestion-free trip. An important direction for future work is to develop a personalized route recommendation that takes individual route preferences into account.
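As a simple illustration of such personalization, the sketch below scores candidate routes with a per-passenger weighted sum of travel time and expected crowding and recommends the cheapest one; the route names, attribute values, and weights are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    travel_time_min: float   # expected in-vehicle plus transfer time
    crowding_level: float    # e.g., expected load factor in [0, 1]

def personalized_cost(route, w_time, w_crowd):
    """Per-passenger generalized cost; lower is better."""
    return w_time * route.travel_time_min + w_crowd * route.crowding_level

def recommend(routes, w_time, w_crowd):
    return min(routes, key=lambda r: personalized_cost(r, w_time, w_crowd))

routes = [Route("direct but crowded", 28.0, 0.9),
          Route("longer detour", 36.0, 0.4)]
print(recommend(routes, w_time=1.0, w_crowd=2.0).name)   # time-focused rider -> direct
print(recommend(routes, w_time=1.0, w_crowd=30.0).name)  # congestion-averse rider -> detour
```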
ACKNOWLEDGEMENTS
We thank the Department of Transportation, Taipei City Government, for providing the AFC data. This study was supported in part by NSTC, Taiwan, Grants 112-2221-E-259-016-MY3, MOST 111-2221-E-001-017-MY3, and MOST 109-2327-B-010-005.
REFERENCES
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283.
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48.
Cascetta, E. and Cantarella, G. E. (1991). A day-to-day and within-day dynamic stochastic assignment model. Transportation Research Part A: General, 25(5):277–291.
Currie, G. (2010). Quick and effective solution to rail overcrowding: Free early bird ticket experience in Melbourne, Australia. Transportation Research Record, 2146(1):35–42.
Department of Transportation, T. C. G. (2019). Taipei MRT Automatic Fare Collection Data. https://rnd.ntut.edu.tw/p/406-1042-97509,r1647.php?Lang=zh-tw. [Online; accessed 28-January-2024].
Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P., Strouse, D., Leibo, J. Z., and De Freitas, N. (2019). Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In International Conference on Machine Learning, pages 3040–3049. PMLR.