However, we should not exclude the possibility that the DQN's problems are related to the chosen hyperparameters, especially those of the convolutional networks; choosing larger filters, for instance, might mitigate this problem. Moreover, if the DQN already reached an excellent solution on its own in a different implementation, adding a wide component might improve neither the training speed nor the results.
In conclusion, we believe that integrating a well-chosen wide component into the WDQN model can substantially speed up learning. Conversely, adding a deep component to a linear agent can overcome its linearity limitation by turning it into a non-linear model. Care is needed, however, when choosing which features to integrate into the combined agent. Lastly, a favorable wide component can compensate for the deep component's difficulty in learning from too few examples.
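To make this combination concrete, the following is a minimal PyTorch sketch of one plausible wide-and-deep Q-value head, in which a linear (wide) term over per-action hand-crafted features phi(s, a) is added to the output of a small convolutional Q-network. The class name, layer sizes, feature shapes, and the additive combination are our illustrative assumptions, not the exact WDQN implementation evaluated in this paper.

```python
import torch
import torch.nn as nn


class WideDeepQNetwork(nn.Module):
    """Illustrative wide & deep Q-head: Q(s, a) = Q_deep(s)[a] + w . phi(s, a)."""

    def __init__(self, n_actions, n_wide_features, frame_channels=4):
        super().__init__()
        # Deep component: a small convolutional Q-network over raw frames.
        self.deep = nn.Sequential(
            nn.Conv2d(frame_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )
        # Wide component: a single linear weight vector over hand-crafted
        # features phi(s, a), one feature vector per action.
        self.wide = nn.Linear(n_wide_features, 1, bias=False)

    def forward(self, frames, wide_features):
        # frames: (batch, channels, H, W); wide_features: (batch, n_actions, n_wide_features)
        q_deep = self.deep(frames)                     # (batch, n_actions)
        q_wide = self.wide(wide_features).squeeze(-1)  # (batch, n_actions)
        return q_deep + q_wide                         # combined Q-values


# Example shapes (batch of 2, four stacked 84x84 frames, 5 actions, 7 wide features):
net = WideDeepQNetwork(n_actions=5, n_wide_features=7)
q_values = net(torch.zeros(2, 4, 84, 84), torch.zeros(2, 5, 7))  # -> shape (2, 5)
```

Because the two heads are summed, the wide weights are trained with the same temporal-difference target as the deep network, which is what allows the linear part to act as a fast, interpretable complement to the slower deep part.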
6 CONCLUSION
Our research shows that WDQN agents can outperform both linear and DQN agents in score, winning rate, and learning speed. The chosen features play an important role in achieving these results, although the selected feature(s) can also impose learning limitations. The research demonstrates that combining a neural network with a linear agent improves results: the model can learn non-linear relationships, incorporate information about the interactions between specific features, and remain adaptable to uncertainty. Furthermore, the wide component can compensate for the weaknesses of a non-linear agent by helping it learn faster and concentrate on less obvious but important features.
Our method is straightforward and applicable to various deep reinforcement learning contexts. For real-world applications such as robotics, the combination of linear and non-linear functions in our Wide and Deep Reinforcement Learning provides an interesting tool for integrating new devices, such as sensors, into DRL agents in the form of features, or for incorporating expert knowledge through human-chosen features. Future work could extend WDQN to newer DQN-related algorithms and develop methods that make WDQNs easier to implement, for example by automatically deriving the learning rate of the wide component from that of the deep component (sketched below) to reduce the number of hyperparameters. In addition, one could research how to ensure that the deep component can override the influence of detrimental features in the wide component.
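As one illustration of the learning-rate idea, the sketch below couples the wide component's learning rate to the deep component's through a single scale factor using optimizer parameter groups. The helper name, the choice of RMSprop, and the scale value are illustrative assumptions rather than a recommendation, and the model is assumed to expose .deep and .wide submodules as in the sketch above.

```python
import torch


def make_wdqn_optimizer(model, lr_deep=2.5e-4, wide_scale=10.0):
    """Hypothetical helper: derive the wide learning rate from the deep one,
    so only lr_deep (and a fixed scale) remain as hyperparameters."""
    return torch.optim.RMSprop([
        {"params": model.deep.parameters(), "lr": lr_deep},
        {"params": model.wide.parameters(), "lr": lr_deep * wide_scale},
    ])
```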
ACKNOWLEDGEMENTS
We thank our colleagues from the Chair for Bioinformatics and Information Mining of the University of Konstanz, who provided insight and corrections that greatly assisted the research. We are especially grateful to Christoph Doell and Benjamin Koger for their continuous assistance with the paper's structure and the engineering of the deep neural networks.