
tag assigned by the original tagger would be kept. In some cases, the rules could involve more generic patterns and associated tag assignments. Consider, for example, the phrase . . . a lousy quack ! (see Table 2). In general, the word quack can be either a noun or a verb; here it is evidently a noun. This case would be handled correctly by a rule such as DET ADJ {NOUN or VERB} ⇒ DET ADJ NOUN, which, in this case, should be applied only at the end of a sentence. The curly brackets indicate that, according to dictionary information (rather than training data alone), the only possible tags for the last word are those listed. Similarly, this rule would also handle phrases such as . . . a bitter harvest (when it ends a sentence), where harvest could be a verb or a noun but here is a noun.
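The rule above can be sketched as a small post-correction pass over a tagger's output. This is a minimal illustration under stated assumptions, not the implementation used in this work: the toy lexicon `LEXICON` and the function `apply_det_adj_rule` are hypothetical, and punctuation is assumed to carry the tag "." as in the universal tagset of Petrov et al. (2012).

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: for each word, the set of tags a dictionary allows.
LEXICON: Dict[str, Set[str]] = {
    "a": {"DET"},
    "lousy": {"ADJ"},
    "bitter": {"ADJ"},
    "quack": {"NOUN", "VERB"},
    "harvest": {"NOUN", "VERB"},
}

PUNCT = "."  # assumed punctuation tag (universal tagset convention)


def apply_det_adj_rule(words: List[str], tags: List[str],
                       lexicon: Dict[str, Set[str]]) -> List[str]:
    """Rewrite DET ADJ {NOUN or VERB} as DET ADJ NOUN at the end of a sentence.

    The rule fires only when the pattern ends the sentence (ignoring trailing
    punctuation) and the dictionary says the last word can only be a NOUN or a VERB.
    Returns a new tag list; the input list is left unchanged.
    """
    fixed = list(tags)
    # Skip trailing punctuation so that "a lousy quack !" still counts as sentence-final.
    end = len(fixed)
    while end > 0 and fixed[end - 1] == PUNCT:
        end -= 1
    if end < 3:
        return fixed
    det, adj, last = end - 3, end - 2, end - 1
    allowed = lexicon.get(words[last].lower(), set())
    if fixed[det] == "DET" and fixed[adj] == "ADJ" and allowed == {"NOUN", "VERB"}:
        fixed[last] = "NOUN"
    return fixed


# Usage on the sentence-final fragment from the text, with a hypothetical tagger error.
print(apply_det_adj_rule(["a", "lousy", "quack", "!"], ["DET", "ADJ", "VERB", "."], LEXICON))
# → ['DET', 'ADJ', 'NOUN', '.']
```

Restricting the rule to sentence-final position and to words whose dictionary entry lists exactly NOUN and VERB keeps it from overriding the original tagger in contexts the rule was not designed for.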
7 CONCLUSION
In this paper, we have introduced a novel, challenging test set for POS tagging, with a single tagged word per sentence. In developing the data set, we deliberately chose cases where at least one of four standard benchmark POS taggers fails to assign the correct POS tag. We then applied a state-of-the-art DNN-based POS tagger to our data set as a true out-of-sample test and found a considerable drop in accuracy, from around 0.95–0.97 on standard POS data sets (in line with values reported in the literature) to around 0.87 on our set, illustrating that POS tagging still presents significant challenges. Importantly, we explicitly removed ambiguous cases from our new data set, so that linguistic ambiguity cannot be invoked as an explanation for tagger failure. Indeed, as our analysis shows, there are many cases where POS tagging is quite straightforward, yet both the four benchmark taggers and the DNN-based tagger (albeit to a lesser degree) still fail to assign the correct tag.
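Because each sentence in the test set carries exactly one gold tag, scoring a tagger reduces to comparing its output at a single position while still passing it the full sentence as context. The sketch below assumes a simple in-memory format; `ScoredSentence`, `single_word_accuracy` and the `always_noun` baseline are hypothetical names for illustration, not the data set's actual file format or the evaluation code used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A tagger maps a tokenized sentence to one tag per token.
Tagger = Callable[[List[str]], List[str]]


@dataclass
class ScoredSentence:
    tokens: List[str]  # the full tokenized sentence, given to the tagger as context
    index: int         # position of the single word that carries a gold tag
    gold: str          # gold POS tag for that word


def single_word_accuracy(tagger: Tagger, items: Sequence[ScoredSentence]) -> float:
    """Fraction of sentences whose one gold-tagged word receives the gold tag."""
    if not items:
        raise ValueError("empty test set")
    correct = 0
    for item in items:
        predicted = tagger(item.tokens)  # tag the whole sentence, then score one position
        if predicted[item.index] == item.gold:
            correct += 1
    return correct / len(items)


# Usage with a deliberately naive, hypothetical baseline that tags everything NOUN.
def always_noun(tokens: List[str]) -> List[str]:
    return ["NOUN"] * len(tokens)


items = [
    ScoredSentence(["a", "lousy", "quack", "!"], 2, "NOUN"),
    ScoredSentence(["a", "bitter", "harvest"], 1, "ADJ"),
]
print(single_word_accuracy(always_noun, items))  # → 0.5
```

Any tagger that returns one tag per input token can be plugged in as `tagger`; tokenization must match the one used to fix `index`.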
REFERENCES
Akbik, A., Blythe, D., and Vollgraf, R. (2018). Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649.
Bird, S., Klein, E., and Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.
Brants, T. (2000). TnT: A statistical part-of-speech tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing, ANLC ’00, pages 224–231, USA. Association for Computational Linguistics.
Brill, E. (1992). A simple rule-based part of speech tagger. In Proc. of the Third Conference on Applied Natural Language Processing, ANLC ’92, pages 152–155, USA. Association for Computational Linguistics.
Chiche, A. and Yitagesu, B. (2022). Part of speech tagging: a systematic review of deep learning and machine learning approaches. Journal of Big Data, 9(1):1–25.
Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), pages 1–8.
Francis, W. N. and Kucera, H. (1979). Brown corpus manual. Technical report, Department of Linguistics, Brown University, Providence, Rhode Island, US.
Halácsy, P., Kornai, A., and Oravecz, C. (2007). Poster paper: HunPos – an open source trigram tagger. In Proc. of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 209–212, Prague, Czech Republic. Association for Computational Linguistics.
Ling, W., Luís, T., Marujo, L., Astudillo, R. F., Amir, S., Dyer, C., Black, A. W., and Trancoso, I. (2015). Finding function in form: Compositional character models for open vocabulary word representation. arXiv preprint arXiv:1508.02096.
Liu, S. and Ritter, A. (2022). Do CoNLL-2003 Named Entity Taggers Still Work Well in 2023? arXiv preprint arXiv:2212.09747.
Manning, C. D. (2011). Part-of-speech tagging from 97% to 100%: Is it time for some linguistics? In Gelbukh, A. F., editor, Computational Linguistics and Intelligent Text Processing, pages 171–189, Berlin, Heidelberg. Springer Berlin Heidelberg.
Marcus, M. P., Santorini, B., Marcinkiewicz, M. A., and Taylor, A. (1999). Treebank-3. Linguistic Data Consortium, Philadelphia.
Petrov, S., Das, D., and McDonald, R. (2012). A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 2089–2096, Istanbul, Turkey. European Language Resources Association (ELRA).
Ramshaw, L. A. and Marcus, M. P. (1999). Text chunking using transformation-based learning. In Natural Language Processing Using Very Large Corpora, pages 157–176. Springer.
Toutanova, K., Klein, D., Manning, C. D., and Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 252–259.
Wu, Z., Deshmukh, A. A., Wu, Y., Lin, J., and Mou, L. (2023). Unsupervised Chunking with Hierarchical RNN. arXiv preprint arXiv:2309.04919.
Yasunaga, M., Kasai, J., and Radev, D. (2017). Robust multilingual part-of-speech tagging via adversarial training. arXiv preprint arXiv:1711.04903.
ICAART 2024 - 16th International Conference on Agents and Artificial Intelligence