Ethayarajh, K. (2019). How Contextual are Contextualized
Word Representations? Comparing the Geometry of
BERT, ELMo, and GPT-2 Embeddings. In Proceed-
ings of the 2019 Conference on Empirical Methods
in Natural Language Processing (EMNLP-IJCNLP),
pages 55–65.
Ettinger, A. (2020). What BERT is not: Lessons from a
new suite of psycholinguistic diagnostics for language
models. Transactions of the Association for Computational Linguistics, 8:34–48.
Gamallo, P. (2017). The role of syntactic dependencies in
compositional distributional semantics. Corpus Lin-
guistics and Linguistic Theory, 13(2):261–289.
Gamallo, P. (2019). A dependency-based approach to word
contextualization using compositional distributional
semantics. Journal of Language Modelling, 7(1):53–92.
Gamallo, P. and Garcia, M. (2018). Dependency parsing
with finite state transducers and compression rules.
Information Processing & Management, 54(6):1244–
1261.
Gamallo, P., Sotelo, S., Pichel, J. R., and Artetxe,
M. (2019). Contextualized translations of phrasal
verbs with distributional compositional semantics and
monolingual corpora. Computational Linguistics,
45(3):395–421.
Goldberg, Y. (2019). Assessing BERT's syntactic abilities.
CoRR, abs/1901.05287.
Grefenstette, E. and Sadrzadeh, M. (2011a). Experimental
support for a categorical compositional distributional
model of meaning. In Conference on Empirical Meth-
ods in Natural Language Processing (EMNLP 2011),
pages 1394–1404.
Grefenstette, E. and Sadrzadeh, M. (2011b). Experimenting
with transitive verbs in a DisCoCat. In Workshop on
Geometrical Models of Natural Language Semantics
(EMNLP 2011).
Hashimoto, K., Stenetorp, P., Miwa, M., and Tsuruoka,
Y. (2014). Jointly learning word representations
and composition functions using predicate-argument
structures. In Proceedings of the 2014 Conference on
Empirical Methods in Natural Language Processing
(EMNLP 2014), pages 1544–1555. ACL.
Hashimoto, K. and Tsuruoka, Y. (2015). Learning embed-
dings for transitive verb disambiguation by implicit
tensor factorization. In Proceedings of the 3rd Work-
shop on Continuous Vector Space Models and their
Compositionality, pages 1–11, Beijing, China. ACL.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,
Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,
V. (2019). RoBERTa: A robustly optimized BERT pre-
training approach. CoRR, abs/1907.11692.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013). Linguistic
regularities in continuous space word representations.
In Proceedings of the 2013 Conference of the North
American Chapter of the Association for Computa-
tional Linguistics: Human Language Technologies
(NAACL 2013), pages 746–751, Atlanta, Georgia.
Mitchell, J. and Lapata, M. (2008). Vector-based mod-
els of semantic composition. In Proceedings of the
Association for Computational Linguistics: Human
Language Technologies (ACL 2008), pages 236–244,
Columbus, Ohio.
Mitchell, J. and Lapata, M. (2009). Language models
based on semantic composition. In Proceedings of
Empirical Methods in Natural Language Processing
(EMNLP-2009), pages 430–439.
Mitchell, J. and Lapata, M. (2010). Composition in dis-
tributional models of semantics. Cognitive Science,
34(8):1388–1439.
Partee, B. (2007). Privative adjectives: Subsective plus coercion.
In Bäuerle, R., Reyle, U., and Zimmermann, T. E., editors,
Presuppositions and Discourse. Elsevier.
Pilehvar, M. T. and Camacho-Collados, J. (2019). WiC:
the word-in-context dataset for evaluating context-
sensitive meaning representations. In Proceedings of
the 2019 Conference of the North American Chapter
of the Association for Computational Linguistics: Hu-
man Language Technologies (NAACL 2019), pages
1267–1273. ACL.
Polajnar, T., Rimell, L., and Clark, S. (2015). An explora-
tion of discourse-based sentence spaces for compos-
itional distributional semantics. In Proceedings of
the First Workshop on Linking Computational Models
of Lexical, Sentential and Discourse-level Semantics,
pages 1–11. Association for Computational Linguist-
ics.
Reimers, N. and Gurevych, I. (2019). Sentence-BERT:
Sentence embeddings using Siamese BERT-networks.
In Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing (EMNLP-
IJCNLP).
Rogers, A., Kovaleva, O., and Rumshisky, A. (2020). A
primer in BERTology: What we know about how BERT
works. CoRR, abs/2002.12327.
Sachan, D. S., Zhang, Y., Qi, P., and Hamilton, W. (2020).
Do syntax trees help pre-trained transformers extract
information? CoRR, abs/2008.09084.
Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2020).
DistilBERT, a distilled version of BERT: smaller, faster,
cheaper and lighter. CoRR, abs/1910.01108.
Turney, P. D. (2013). Domain and function: A dual-
space model of semantic relations and compositions.
Journal of Artificial Intelligence Research (JAIR),
44:533–585.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. In Advances in
Neural Information Processing Systems, volume 30,
pages 5998–6008. Curran Associates, Inc.
Weir, D. J., Weeds, J., Reffin, J., and Kober, T. (2016).
Aligning packed dependency trees: A theory of com-
position for distributional semantics. Computational
Linguistics, 42(4):727–761.
Yu, L. and Ettinger, A. (2020). Assessing phrasal repres-
entation and composition in transformers. In Pro-
ceedings of the 2020 Conference on Empirical Meth-
ods in Natural Language Processing (EMNLP), pages
4896–4907, Online. Association for Computational
Linguistics.