articles. In this case, the CRF model outperforms the other two models in terms of average precision, recall, and f-measure over all tag classes. Table 6 presents the performance results when the models are trained and tested on both the title and the first paragraph of each news article. As the table shows, the BERT model outperforms the BiLSTM and CRF models in terms of both average recall and average f-measure.
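The per-class and averaged scores discussed here correspond to standard entity-level evaluation. As a minimal sketch (not the authors' evaluation code), the following uses the seqeval library on invented gold and predicted tag sequences, assuming a BIO scheme and per-class averaging as in the tables:

```python
# Minimal sketch of entity-level evaluation with seqeval; the gold
# and predicted sequences below are invented, not drawn from TAGWAR.
from seqeval.metrics import classification_report

y_true = [["B-LOC", "O", "B-DAT", "I-DAT", "O"]]
y_pred = [["B-LOC", "O", "B-DAT", "O",     "O"]]

# Prints precision, recall, and f-measure per tag class, plus
# macro and weighted averages over all classes.
print(classification_report(y_true, y_pred))
```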
Looking at the three tables together, the CRF and BiLSTM models perform best when extracting the information from the titles only. This is intuitive given that titles are short and concise compared to the full text of the article, which typically contains a large body of information that does not relate directly to the incident. In contrast, the BERT model attains its best performance when the information is extracted from the full text, except in terms of precision. It is also evident that testing on the title and first paragraph, which typically reveal most of the relevant information about an incident early on, yields better performance than testing on the full content of the article, except for the BERT model.
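For concreteness, the three input granularities compared here (title only, title plus first paragraph, and full text) could be derived from an article record along the following lines; this is a hypothetical helper, and the field names are assumptions rather than details from the paper:

```python
# Hypothetical helper for the three input granularities; the
# "title"/"paragraphs" field names are assumed, not from the paper.
def input_text(article: dict, mode: str) -> str:
    if mode == "title":
        return article["title"]
    if mode == "title+first":
        return article["title"] + " " + article["paragraphs"][0]
    return article["title"] + " " + " ".join(article["paragraphs"])
```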
When investigating the individual tags in the case of using the full text (Table 4), the highest performance is obtained for the LOC (location of the incident) and DAT (date of the incident) tags for both the BiLSTM and CRF models. As for the BERT model, the highest performance is obtained for the NCV tag (number of non-civilians involved in the incident), followed by the DAT tag. On the other hand, when using the titles only, or the titles and first paragraphs (Table 5 and Table 6), the ACT (actor) and LOC tags achieve the highest performance for the CRF model. For the BiLSTM model, the best-performing tags are CHD (number of child casualties) and LOC. Finally, in the case of the BERT model, the CIV (number of civilian casualties) and NCV tags achieve the highest performance.
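To make these tags concrete, the following invented example shows how a news title might look under a BIO tagging scheme with this tag set; the sentence and its labels are hypothetical and not drawn from TAGWAR:

```python
# Hypothetical BIO-tagged title; an invented example, not from TAGWAR.
# ACT = actor, CIV = number of civilian casualties,
# LOC = incident location, DAT = incident date.
title = ["Shelling", "by", "government", "forces", "kills", "12",
         "civilians", "in", "Aleppo", "on", "Friday"]
tags  = ["O", "O", "B-ACT", "I-ACT", "O", "B-CIV",
         "O", "O", "B-LOC", "O", "B-DAT"]
for token, tag in zip(title, tags):
    print(f"{token:12s}{tag}")
```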
5 CONCLUSION
In this paper, we described TAGWAR, a dataset consisting of 804 manually sequence-tagged news articles about the Syrian war. We then used TAGWAR to train and test three sequence tagging models, namely a BiLSTM model, a BERT model, and a CRF model, to automatically tag news articles. Overall, the BERT model performed best when trained and tested on TAGWAR. Moreover, all models except BERT performed better when trained on the titles only, or on the titles and first paragraphs, than on the full content of the news articles. BERT, in contrast, was not significantly sensitive to this choice of training input.
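As an illustration of how such a BERT tagger could be instantiated (the paper does not specify its implementation, so the checkpoint name and label inventory below are assumptions), one possible setup with the Hugging Face Transformers library is:

```python
# Sketch only: one plausible BERT token-classification setup with
# Hugging Face Transformers. The checkpoint name and the BIO label
# inventory are assumptions, not details taken from the paper.
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O"] + [f"{p}-{t}"
                  for t in ["LOC", "DAT", "ACT", "CIV", "NCV", "CHD"]
                  for p in ["B", "I"]]
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# Fine-tuning would then proceed with a standard training loop over
# the tokenized and label-aligned TAGWAR examples.
```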
We perceive our work as paving the way towards automatic fake news detection around the Syrian conflict. In particular, we plan to deploy, extend, and hone our information extraction models on a large dataset of curated news articles about the Syrian war in order to automatically extract various pieces of information on war incidents. This can in turn form the basis of a robust, end-to-end fact checking pipeline that validates sequence-tagged news articles against witness databases such as the Violations Data Center (VDC, https://vdc-sy.net/en/), one of the leading repositories documenting the human toll of the Syrian war.