
Workshop on Building Analysis Datasets and Gather-
ing Experience Returns for Security, BADGERS ’12,
page 17–24.
Allodi, L. and Massacci, F. (2013). Poster: Analysis of
exploits in the wild. In IEEE Symposium on Security
& Privacy.
Bellon, S., Koschke, R., Antoniol, G., Krinke, J., and
Merlo, E. (2007). Comparison and evaluation of clone
detection tools. IEEE Transactions on Software Engi-
neering, 33(9):577–591.
Cao, A. and Dolan-Gavitt, B. (2022). What the fork? find-
ing and analyzing malware in github forks. In Proc.
of NDSS, volume 22.
Chien, O. K., Hoong, P. K., and Ho, C. C. (2014). A com-
parative study of hits vs pagerank algorithms for twit-
ter users analysis. In 2014 International Conference
on Computational Science and Technology (ICCST),
pages 1–6. IEEE.
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong,
M., Shou, L., Qin, B., Liu, T., Jiang, D., et al. (2020).
Codebert: A pre-trained model for programming and
natural languages. arXiv preprint arXiv:2002.08155.
Forum of Incident Response and Security Teams (FIRST)
(2023). Common vulnerability scoring system version
4.0: Specification document. Accessed on Jun 10th,
2024.
Gong, Q., Liu, Y., Zhang, J., Chen, Y., Li, Q., Xiao, Y.,
Wang, X., and Hui, P. (2023). Detecting malicious
accounts in online developer communities using deep
learning. IEEE Transactions on Knowledge and Data
Engineering, 35(10):10633–10649.
Gonzalez, D., Zimmermann, T., Godefroid, P., and Sch
¨
afer,
M. (2021). Anomalicious: automated detection
of anomalous and potentially malicious commits on
github. In Proceedings of the 43rd International Con-
ference on Software Engineering: Software Engineer-
ing in Practice, ICSE-SEIP ’21, page 258–267.
Gy
¨
ongyi, Z., Garcia-Molina, H., and Pedersen, J. (2004).
Combating web spam with trustrank. In Proceed-
ings of the Thirtieth International Conference on Very
Large Data Bases - Volume 30, VLDB ’04, page
576–587. VLDB Endowment.
Householder, A. D., Chrabaszcz, J., Novelly, T., Warren, D.,
and Spring, J. M. (2020). Historical analysis of exploit
availability timelines. In 13th USENIX Workshop on
Cyber Security Experimentation and Test (CSET 20).
USENIX Association.
Hu, Y., Zhang, J., Bai, X., Yu, S., and Yang, Z. (2016). In-
fluence analysis of github repositories. SpringerPlus,
5.
Jacobs, J., Romanosky, S., Suciu, O., Edwards, B., and
Sarabi, A. (2023). Enhancing vulnerability prioritiza-
tion: Data-driven exploit predictions with community-
driven insights.
Kornblum, J. (2006). Identifying almost identical files using
context triggered piecewise hashing. Digital Investi-
gation, 3:91–97.
lazyhope (2023). Pythonclonedetection. Accessed on Jun
10th, 2024.
Lopes, C. V., Maj, P., Martins, P., Saini, V., Yang, D., Zitny,
J., Sajnani, H., and Vitek, J. (2017). D
´
ej
`
avu: a map
of code duplicates on github. Proc. ACM Program.
Lang., 1(OOPSLA).
Neil, L., Mittal, S., and Joshi, A. (2018). Mining threat
intelligence about open-source projects and libraries
from code repository issues and bug reports. In 2018
IEEE International Conference on Intelligence and
Security Informatics (ISI), pages 7–12.
Nguyen, P., Rocco, J., Rubei, R., and Di Ruscio, D. (2020).
An automated approach to assess the similarity of
github repositories. Software Quality Journal, 28.
PoolC (2021). 1-fold-clone-detection-600k-5fold. Ac-
cessed on Jun 10th, 2024.
Rokon, M. O. F., Islam, R., Darki, A., Papalexakis, E. E.,
and Faloutsos, M. (2020). SourceFinder: Finding mal-
ware Source-Code from publicly available reposito-
ries in GitHub. In 23rd International Symposium on
Research in Attacks, Intrusions and Defenses (RAID
2020), pages 149–163, San Sebastian.
Sabottke, C., Suciu, O., and Dumitras, T. (2015). Vulner-
ability disclosure in the age of social media: Exploit-
ing twitter for predicting Real-World exploits. In 24th
USENIX Security Symposium (USENIX Security 15),
pages 1041–1056.
Sajnani, H., Saini, V., Svajlenko, J., Roy, C. K., and Lopes,
C. V. (2016). Sourcerercc: Scaling code clone de-
tection to big-code. In 2016 IEEE/ACM 38th Inter-
national Conference on Software Engineering (ICSE),
pages 1157–1168.
sangHa0411 (2022). Python clone detection. Accessed on
Jun 10th, 2024.
Schiappa, M., Chantry, G., and Garibay, I. (2019). Cy-
ber security in a complex community: A social me-
dia analysis on common vulnerabilities and exposures.
In 2019 Sixth International Conference on Social Net-
works Analysis, Management and Security (SNAMS),
pages 13–20.
Shrestha, P., Sathanur, A., Maharjan, S., Saldanha, E.,
Arendt, D., and Volkova, S. (2020). Multiple social
platforms reveal actionable signals for software vul-
nerability awareness: A study of github, twitter and
reddit. PLOS ONE, 15:e0230250.
Suciu, O., Nelson, C., Lyu, Z., Bao, T., and Dumitras, T.
(2022). Expected exploitability: Predicting the devel-
opment of functional vulnerability exploits. In 31st
USENIX Security Symposium (USENIX Security 22),
pages 377–394, Boston, MA. USENIX Association.
Wyss, E., De Carli, L., and Davidson, D. (2022). What
the fork? finding hidden code clones in npm. In
2022 IEEE/ACM 44th International Conference on
Software Engineering (ICSE), pages 2415–2426.
Yadmani, S. E., The, R., and Gadyatskaya, O. (2023). Be-
yond the surface: Investigating malicious cve proof of
concept exploits on github.
Zhang, Y. and Wang, T. (2021). Cceyes: An effective tool
for code clone detection on large-scale open source
repositories. In 2021 IEEE International Conference
on Information Communication and Software Engi-
neering (ICICSE), pages 61–70.
ICISSP 2025 - 11th International Conference on Information Systems Security and Privacy
38