ponent, as its performance would be measured by its
usability for the security researchers using our
framework. The best metric would be a usability study
with, for example, a derived SUS questionnaire and a
group of security researchers as participants.
Unfortunately, this was not feasible within the scope
of this project, so we concentrated on a constructive
approach in designing the assessment module and leave
a final evaluation as future work. By design, the
framework concentrates on a few crucial features to
determine the harmfulness of a leak. Most users should
agree that
a leak is more dangerous if more people than average
are affected. It is also common sense that a leak of
cleartext passwords or credit card information is more
critical than a leak of password hashes. The
question of whether some data types in our critical
category are more harmful than others is not part of
this project, but could be investigated in future work.
The priority of the framework was to clearly identify
a leak that is definitely harmful, to issue a warning
via the web interface, and optionally to notify its
operator by email. Because the modules are assembled
in a pipeline, each stage relies on the work of its
predecessors. If no affected user count can be
detected, it cannot be compared to the average affected
user count. For this reason, we implemented a medium
threat level area in which each user has to decide on
their own whether a leak is worth investigating
further. The threat level at which an email is sent is
configurable via a threshold variable. This shortcoming
is not rooted in the process itself, but results from
the limited information provided by the news articles
in the first place.
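The assessment logic described in this section can be illustrated with a minimal sketch. It is not the implementation used in the framework: the names LeakFacts, assess, should_email and CRITICAL_TYPES, the data-type labels, and the exact combination rules are assumptions chosen for illustration. The sketch only reflects the stated ideas that an above-average affected user count and critical data types raise the threat level, that a missing user count leads to the medium area, and that email notification depends on a configurable threshold.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet, Optional


class ThreatLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumption: illustrative labels for the "critical" data-type category.
CRITICAL_TYPES = frozenset({"cleartext_password", "credit_card"})


@dataclass(frozen=True)
class LeakFacts:
    """Facts extracted from a news article by earlier pipeline stages."""
    data_types: FrozenSet[str]
    affected_users: Optional[int] = None  # None: no count found in the article


def assess(facts: LeakFacts, average_affected: float) -> ThreatLevel:
    """Map extracted facts to a threat level (assumed combination rules)."""
    if facts.affected_users is None:
        # Without a count, no comparison with the average is possible,
        # so the leak falls into the medium area and the user decides.
        return ThreatLevel.MEDIUM
    critical = bool(facts.data_types & CRITICAL_TYPES)
    above_average = facts.affected_users > average_affected
    if critical and above_average:
        return ThreatLevel.HIGH
    if critical or above_average:
        return ThreatLevel.MEDIUM
    return ThreatLevel.LOW


def should_email(level: ThreatLevel, email_at_threat_level: ThreatLevel) -> bool:
    """Notify the operator once the level reaches the configured threshold."""
    return level >= email_at_threat_level
```

With the threshold set to ThreatLevel.HIGH, only leaks that are clearly harmful trigger an email, while leaks in the medium area remain visible in the web interface for manual review.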
8 CONCLUSION
Login credentials are crucial information about the
digital identities of individuals. Attacks on service
providers regularly bring these identities into the
hands of criminals. The information is then used for
identity theft, or it is aggregated into larger
collections and sold. This paper contributes a new
approach to identifying digital identity leakage as the
foundation for a larger framework that proactively
informs affected individuals. To this end, a wide
variety of newspapers and blogs is crawled on a regular
basis. After relevant articles have been identified,
the substantial information is extracted and forwarded
to a security analyst. We show that relevant articles
can be reliably classified by in-depth text analysis.
Furthermore, it is possible to extract the information
required to identify the source of the leak and the
number of identities affected. Running as a service,
the framework continuously monitors the Internet for
related information at low operational cost. Even
though the described software cannot replace security
analysts, it decisively supports them in tracking down
identity leaks, be they small or huge collections of
stolen identities.