THESIS
2020
1 online resource (xiv, 147 pages) : illustrations (some color)
Abstract
Nowadays, urban systems are widely deployed in major cities, e.g., ride-sharing systems,
express (courier) systems, take-out food systems, and emergency medical service systems.
While these systems have significantly modernized and facilitated citizens' daily life, they
face severe operational challenges: for example, how to match passengers to drivers in a
ride-sharing system, or how to dispatch couriers in real time in an express system.
Previously, operation problems in urban systems have often been tackled with methods from
operations research, e.g., optimization or heuristic algorithms tailored to practical system
settings. Since an urban system typically needs to generate a sequence of real-time actions
that maximizes the total reward over a long horizon, reinforcement learning is a natural
choice. Moreover, because such systems are often large and complex, deep learning is needed
to capture rich, representative features of the environment.
In this thesis, we investigate how Deep Reinforcement Learning (DRL) can effectively
learn operation policies for urban systems. Depending on how an urban system operates,
its operation process can be described by Central-Agent Reinforcement Learning (CARL)
or Multi-Agent Reinforcement Learning (MARL). For a system whose operation is described
by CARL, we focus on how to properly formulate the problem and design each component of
the model, i.e. the state, action, and immediate reward, so as to optimize the system's
final target; a toy sketch of such a formulation is given below. Taking the take-out food
system as an example, we propose a Deep Reinforcement Order Packing (DROP) model to solve
its operation problem.
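To make the CARL-style formulation concrete, the following is a minimal, self-contained sketch of how the state, action, and immediate reward might be laid out for an order-packing agent. Everything here, including the `Order` and `PackingEnv` names, the toy travel-time model, and the lateness-based reward, is an illustrative assumption rather than the actual DROP model.

```python
# Minimal sketch of a CARL-style formulation for take-out order packing.
# All names (Order, PackingEnv) and the reward definition are illustrative
# assumptions, not the thesis's actual DROP model.
import random
from dataclasses import dataclass

@dataclass
class Order:
    pickup_time: int   # minute the order becomes ready at the restaurant
    deadline: int      # promised delivery minute

class PackingEnv:
    """A central agent decides, at each step, which pending orders to pack
    into one dispatch."""

    def __init__(self, horizon: int = 60, seed: int = 0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.pending = []   # orders waiting to be packed
        return self._state()

    def _state(self):
        # State: current time plus a compact view of pending orders.
        return (self.t, tuple((o.pickup_time, o.deadline) for o in self.pending))

    def step(self, action):
        # Action: indices of pending orders packed into one dispatch.
        packed = [self.pending[i] for i in action]
        chosen = set(action)
        self.pending = [o for i, o in enumerate(self.pending) if i not in chosen]
        # Immediate reward: negative lateness of the packed orders (an assumed
        # proxy for the system's final target, e.g. on-time delivery rate).
        delivery_time = self.t + 10 + 2 * len(packed)   # toy travel-time model
        reward = -sum(max(0, delivery_time - o.deadline) for o in packed)
        # New orders arrive stochastically as time advances.
        self.t += 1
        if self.rng.random() < 0.7:
            self.pending.append(Order(self.t, self.t + self.rng.randint(20, 40)))
        done = self.t >= self.horizon
        return self._state(), reward, done

# A random-policy rollout, just to show the interaction loop a DRL agent would fill in.
env = PackingEnv()
state, total, done = env.reset(), 0.0, False
while not done:
    n = len(env.pending)
    action = random.sample(range(n), k=random.randint(0, n)) if n else []
    state, reward, done = env.step(action)
    total += reward
print("episode return:", total)
```

In practice the state would be encoded by a deep network and the random policy replaced by a learned one; the sketch only shows where the three components sit in the interaction loop.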
For a system whose operation is described by MARL, besides designing each component of
the model, we also need to guarantee that the agents in the system cooperate properly.
Taking the express system, in which many couriers work, as an example, we propose a Deep
Reinforcement Courier Dispatching (DRCD) model to solve its operation problem. DRCD
guarantees cooperation among couriers only to some extent rather than globally; we
therefore further propose a Cooperative Multi-Agent Reinforcement Learning (CMARL) model
that guarantees global cooperation among couriers by incorporating a second Markov
Decision Process along the agent sequence, sketched below. Experiments on real-world data
confirm the superiority of DROP, DRCD, and CMARL over baseline methods.
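The idea of an additional MDP running along the agent sequence can also be illustrated with a small sketch. Below, couriers within one time step act one after another, and the inter-agent state carries earlier couriers' choices so later couriers can avoid clashing; the names (`InterAgentState`, `courier_policy`) and the greedy stand-in policy are assumptions, not the thesis's CMARL design.

```python
# Hypothetical sketch of an MDP along the agent sequence: within one time step,
# couriers act in order, and the inter-agent state records earlier choices so
# later couriers can cooperate instead of claiming the same task.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class InterAgentState:
    remaining_tasks: list                          # task ids not yet claimed this step
    claimed: dict = field(default_factory=dict)    # courier id -> task id

def courier_policy(courier_id: int, state: InterAgentState) -> Optional[int]:
    """Stand-in for a learned policy: pick the lowest-id remaining task.
    A DRL policy would instead score tasks from features of `state`."""
    return min(state.remaining_tasks) if state.remaining_tasks else None

def dispatch_one_timestep(courier_ids, tasks):
    """Roll the inter-agent MDP over the courier sequence for one time step."""
    state = InterAgentState(remaining_tasks=list(tasks))
    for cid in courier_ids:                    # the "time axis" here is the agent order
        choice = courier_policy(cid, state)    # transition depends on earlier agents
        if choice is not None:
            state.claimed[cid] = choice
            state.remaining_tasks.remove(choice)
    return state.claimed

# Example: three couriers, two tasks; no task is claimed twice because later
# couriers observe what earlier couriers already took.
print(dispatch_one_timestep(courier_ids=[7, 3, 9], tasks=[101, 102]))
```

A learned policy would replace the greedy rule; the sequential transition is what the sketch relies on to keep the couriers' assignments consistent with one another.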
In MARL, besides cooperation among agents, competition may also exist, although it is
not common in modern urban systems. We briefly discuss this scenario at the end to make
the thesis complete.