label the rest of the data with a more naïve approach).
Nonetheless, the difference is very small, and the approach still
improves on the other methods tested.
In future work, we suggest exploring the security
implications of integrating personal information
into PCFG models. As we have highlighted in
this work, language and culture shape password
structures. Thus, websites and other stakeholders
that use passwords as an authentication method
should consider the cultural patterns specific to their
users.
REFERENCES
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T.
(2016). Enriching word vectors with subword
information. CoRR, abs/1607.04606.
Caliński, T. and Harabasz, J. (1974). A dendrite method for
cluster analysis. Communications in Statistics - Theory
and Methods, 3:1–27.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018).
BERT: pre-training of deep bidirectional transformers
for language understanding. CoRR, abs/1810.04805.
Generic Human (2012). Complexity dictionary.
https://stackoverflow.com/questions/8870261/how-to-split-text-without-spaces-into-list-of-words.
Accessed: 2020-03-06.
Golla, M. and Dürmuth, M. (2018). On the accuracy of
password strength meters. In Proceedings of the 2018
ACM SIGSAC Conference on Computer and
Communications Security, CCS ’18, pages 1567–1582,
New York, NY, USA. Association for Computing
Machinery.
Hitaj, B., Gasti, P., Ateniese, G., and Pérez-Cruz, F. (2017).
PassGAN: A deep learning approach for password
guessing. CoRR, abs/1709.00440.
Hranický, R., Zobal, L., Ryšavý, O., Kolář, D., and Mikuš,
D. (2020). Distributed PCFG password cracking. In
Chen, L., Li, N., Liang, K., and Schneider, S., editors,
Computer Security – ESORICS 2020, pages 701–719,
Cham. Springer International Publishing.
Ma, J., Yang, W., Luo, M., and Li, N. (2014). A study of
probabilistic password models. In 2014 IEEE
Symposium on Security and Privacy, pages 689–704.
Maoneke, P. B., Flowerday, S., and Isabirye, N. (2018). The
Influence of Native Language on Password
Composition and Security: A Socioculture Theoretical View.
In Janczewski, L. J. and Kutylowski, M., editors, 33rd
IFIP International Conference on ICT Systems
Security and Privacy Protection (SEC), volume AICT-529
of ICT Systems Security and Privacy Protection, pages
33–46, Poznan, Poland. Springer International
Publishing. Part 1: Authentication.
Martin, L., Muller, B., Ortiz Suárez, P. J., Dupont, Y.,
Romary, L., de la Clergerie, É. V., Seddah, D., and Sagot,
B. (2020). CamemBERT: a tasty French language model.
In Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics.
Melicher, W., Ur, B., Segreti, S. M., Komanduri, S., Bauer,
L., Christin, N., and Cranor, L. F. (2016). Fast, lean,
and accurate: Modeling password guessability using
neural networks. In 25th USENIX Security Symposium
(USENIX Security 16), pages 175–191, Austin, TX.
USENIX Association.
Mori, K., Watanabe, T., Zhou, Y., Akiyama Hasegawa,
A., Akiyama, M., and Mori, T. (2019). Comparative
analysis of three language spheres: Are linguistic
and cultural differences reflected in password
selection habits? In 2019 IEEE European Symposium on
Security and Privacy Workshops (EuroS&PW), pages
159–171.
Morris, R. and Thompson, K. (1979). Password security: A
case history. Commun. ACM, 22(11):594–597.
Princeton University (2010). About WordNet.
https://wordnet.princeton.edu/.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to
the interpretation and validation of cluster analysis.
Journal of Computational and Applied Mathematics,
20:53–65.
Speer, R., Chin, J., Lin, A., Jewett, S., and Nathan, L.
(2018). Luminosoinsight/wordfreq: v2.2.
Thurner, S., Hanel, R., and Corominas-Murtra, B. (2014).
Understanding Zipf’s law of word frequencies through
sample-space collapse in sentence formation. Journal
of the Royal Society Interface, 12.
Veras, R., Collins, C., and Thorpe, J. (2014). On the seman-
tic patterns of passwords and their security impact.
Wang, D., Wang, P., He, D., and Tian, Y. (2019). Birthday,
name and bifacial-security: Understanding
passwords of Chinese web users. In 28th USENIX Security
Symposium (USENIX Security 19), pages 1537–1555,
Santa Clara, CA. USENIX Association.
Wang, D., Zhang, Z., Wang, P., Yan, J., and Huang, X.
(2016). Targeted online password guessing: An
underestimated threat.
Weir, M. (2017). pcfg_cracker. https://github.com/lakiw/pcfg_cracker.
Accessed: 2020-03-03.
Weir, M., Aggarwal, S., de Medeiros, B., and Glodek,
B. (2009). Password cracking using probabilistic
context-free grammars. pages 391–405.
Wheeler, D. L. (2016). zxcvbn: Low-budget password
strength estimation. In 25th USENIX Security
Symposium (USENIX Security 16), pages 157–173, Austin,
TX. USENIX Association.
Yu, S., Xu, C., and Liu, H. (2018). Zipf’s law in 50
languages: its structural pattern, linguistic interpretation,
and cognitive motivation.
Zipf, G. (1949). Human behavior and the principle of least
effort. Addison-Wesley, Cambridge.
Sociocultural Influences for Password Definition: An AI-based Study
549