bag-of-words logistic regression, yet it failed to out-
perform deep learning models.
Further, in the LOGO-CV setup, we observed
that removing groups with large numbers of docu-
ments such as Al-Boraq or Alarabiya significantly
boosted the predictive performance of the opposite
class. However, because the class label of the test group is unknown
at prediction time, we cannot determine in advance which groups to
exclude from training. We plan to extend this work by exploring ways
to automatically select training data, such as selecting the top k
most similar documents for every test document or the top k groups
with the highest in-group similarity
variance. We would also like to implement different
data-driven ensemble models, such as learning a new
logistic regression model that takes the predicted
probabilities of the individual models as predictors.
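
The ensemble idea described above (a logistic regression trained on the predicted probabilities of the individual models) can be illustrated with a minimal sketch. It is not our implementation: the three base models are hypothetical, the probabilities and labels are made-up toy values, and the function names `fit_meta_logreg` and `predict_meta`, the learning rate, and the number of epochs are all assumptions chosen for illustration. The meta-model is fit with plain batch gradient descent so that the example needs only the Python standard library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_logreg(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on base-model probabilities.

    base_probs: one row per document, one column per base model.
    labels:     binary class labels (0 or 1), one per document.
    lr, epochs: illustrative defaults (assumptions, not tuned values).
    """
    n_models = len(base_probs[0])
    n_docs = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = sigmoid(z) - y  # gradient of the log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n_docs for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_docs
    return weights, bias


def predict_meta(weights, bias, x):
    # Ensemble probability for one document, given its base-model probabilities.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy held-out predictions from three hypothetical base models (made-up values).
held_out_probs = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.4, 0.3],
    [0.3, 0.2, 0.2],
]
held_out_labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_logreg(held_out_probs, held_out_labels)
p_high = predict_meta(weights, bias, [0.85, 0.75, 0.80])
p_low = predict_meta(weights, bias, [0.15, 0.25, 0.20])
print(weights, bias, p_high, p_low)
```

In practice, the probabilities used to fit the meta-model should come from documents that the base models did not see during training (for example, from the held-out group in each LOGO-CV fold); otherwise the meta-model would be fit to overconfident base predictions.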
Predicting Violent Behavior using Language Agnostic Models