generation of a bigger synthetic dataset, which shares
similarities with the 49,800 neighbourhoods. We also
plan to integrate prices and points of interest (which
could reflect the nature of a neighbourhood; for instance,
an organic shop is usually found in middle- or
upper-class neighbourhoods). A fourth perspective is
the correlation between variables, which are not totally
independent. For instance, a rural area is more
likely to be classified as countryside and to host
houses. The prediction of a given variable could therefore
impact the classification of others, especially the most
difficult ones such as the geographical or social variables.
Finally, we plan to release a tool named predihood to let
researchers implement and test their classification
algorithms on our dataset.
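As an illustration of the correlation perspective mentioned above, the following minimal sketch measures the association between two categorical variables with Cramér's V (0 for independent variables, 1 for perfectly associated ones). This measure is our choice for the example, not a method used in this work; the function name `cramers_v`, the variable names and the toy labels are hypothetical and do not come from our dataset.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences of equal length."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts per pair of categories
    rows = Counter(xs)            # marginal counts of the first variable
    cols = Counter(ys)            # marginal counts of the second variable
    k = min(len(rows), len(cols))
    if k < 2:
        return 0.0                # a constant variable carries no association
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy labels (assumption): six neighbourhoods in which the
# landscape value fully determines the building type.
landscape = ["countryside", "countryside", "urban", "urban", "countryside", "urban"]
building = ["houses", "houses", "towers", "towers", "houses", "towers"]
association = cramers_v(landscape, building)
print(association)
```

A value close to 1 for such a pair would suggest that the prediction of one variable could be supplied as an additional feature when classifying the other.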
ACKNOWLEDGEMENTS
This work has been partially funded by LABEX
IMU (ANR-10-LABX-0088) from Université de
Lyon, in the context of the program "Investissements
d’Avenir" (ANR-11-IDEX-0007) from the French
Research Agency (ANR), during the HiL project (see footnote 15).
15. http://imu.universite-lyon.fr/projet/hil/
Predicting the Environment of a Neighborhood: A Use Case for France