into computational overhead that dynamic analysis
incurs (Chang et al., 2008).
8 CONCLUSIONS
Our work puts a spotlight on the problem of discovering
malicious and vulnerable browser extensions by
detecting duplication. To address the problem, we
presented DeDup.js, an approach that uses
similarity analysis to achieve two goals: detecting
potentially malicious extensions during the approval
process and discovering malicious extensions.
We implemented and deployed an instance of
DeDup.js and analyzed more than 422k browser extensions
stored in the Web Store over a year. In summary,
DeDup.js: 1) detected more than 7k extensions that
should not have been published in the Web Store, and
also found more than 1k malicious extensions still online
that send users' queries to external servers without
the users' knowledge; and 2) detected 53 malicious
extensions, of which Google has already taken down 36
while the rest are under investigation. We did so using as
input the IDs of 17 already known malicious extensions, thus
demonstrating how DeDup.js can change the game of
malware detection in browser extensions.
ACKNOWLEDGMENTS
This work was partially supported by the Swedish
Foundation for Strategic Research (SSF), the Swedish
Research Council (VR), and Facebook.
REFERENCES
Avast (2021). Backdoored browser extensions hid malicious
traffic in analytics requests.
https://decoded.avast.io/janvojtesek/backdoored-browser-extensions-hid-malicious-traffic-in-analytics-requests/.
Badihi, S., Akinotcho, F., Li, Y., and Rubin, J. (2020). Ardiff:
Scaling program equivalence checking via iterative
abstraction and refinement of common code. In FSE.
Bowman, B. and Huang, H. H. (2020). Vgraph: A robust
vulnerable code clone detection system using code
property triplets. In Euro S&P.
Calleja, A., Tapiador, J., and Caballero, J. (2019). The
malsource dataset: Quantifying complexity and code
reuse in malware development. IEEE Transactions on
Information Forensics and Security, 14(12).
Chang, W. and Chen, S. (2016). Extensionguard: Towards
runtime browser extension information leakage detection.
In CNS, pages 154–162.
Chang, W., Streiff, B., and Lin, C. (2008). Efficient and
extensible security enforcement using dynamic data
flow analysis. In CCS, pages 39–50.
Chen, Q. and Kapravelos, A. (2018). Mystique: Uncovering
information leakage from browser extensions. In CCS,
pages 1687–1700.
Cheung, W. T., Ryu, S., and Kim, S. (2016). Development
nature matters: An empirical study of code clones in
javascript applications. Empirical Softw. Engg., 21(2).
Dev.Opera (2021). Publishing guidelines.
https://dev.opera.com/extensions/publishing-guidelines/.
Extension Monitor (2021). Breaking down the chrome web
store. https://extensionmonitor.com/blog/breaking-down-the-chrome-web-store-part-1.
Gabel, M. and Su, Z. (2010). A study of the uniqueness of
source code. In FSE.
Google (2021a). Frequently asked questions.
https://developer.chrome.com/docs/webstore/faq/#faq-listing-108.
Google (2021b). Program policies.
https://developer.chrome.com/docs/webstore/program_policies/.
Google (2021c). Shared modules.
https://developer.chrome.com/apps/shared_modules.
Google (2021d). Unwanted software policy.
https://www.google.com/about/unwanted-software-policy.html.
Google (2021e). What types of apps or extensions are not
allowed in the store?
https://developer.chrome.com/docs/webstore/faq/#faq-gen-22.
Kamiya, T., Kusumoto, S., and Inoue, K. (2002). Ccfinder:
a multilinguistic token-based code clone detection system
for large scale source code. IEEE Transactions on
Software Engineering, 28(7):654–670.
Kapravelos, A., Grier, C., Chachra, N., Kruegel, C., Vigna,
G., and Paxson, V. (2014). Hulk: Eliciting malicious
behavior in browser extensions. In USENIX.
Kim, S., Woo, S., Lee, H., and Oh, H. (2017). Vuddy: A
scalable approach for vulnerable code clone discovery.
In S&P.
Kornblum, J. (2006). Identifying almost identical files using
context triggered piecewise hashing. Digital Investigation,
3:91–97.
Li, Z., Zou, D., Xu, S., Jin, H., Qi, H., and Hu, J. (2016).
Vulpecker: An automated vulnerability detection system
based on code similarity analysis. In ACSAC.
Lopes, C. V., Maj, P., Martins, P., Saini, V., Yang, D., Zitny,
J., Sajnani, H., and Vitek, J. (2017). Déjàvu: A map of
code duplicates on github. ACM Program. Lang.
nic.cz (2021). Hledání škodlivého kódu mezi doplňky.
https://blog.nic.cz/2020/11/19/hledani-skodliveho-kodu-mezi-doplnky/.
Pantelaios, N., Nikiforakis, N., and Kapravelos, A. (2020).
You’ve Changed: Detecting Malicious Browser Extensions
through their Update Deltas. In CCS.
Roussev, V. (2010). Data fingerprinting with similarity digests.
In Chow, K.-P. and Shenoi, S., editors, Advances
in Digital Forensics VI, pages 207–226.
Roy, C. K., Cordy, J. R., and Koschke, R. (2009). Compari-
son and evaluation of code clone detection techniques
and tools: A qualitative approach. Science of Computer
Programming, 74(7):470–495.
Vislavski, T., Rakić, G., Cardozo, N., and Budimac, Z.
(2018). Licca: A tool for cross-language clone detection.
In SANER, pages 512–516.
DeDup.js: Discovering Malicious and Vulnerable Extensions by Detecting Duplication