Table 3: Performance comparison of the proposed TM framework with clustering- and outlier-based novelty detection algorithms (all values in %).

Algorithm             20 Newsgroup  Spooky author  CMU movie  BBC sports    WOS
LOF                          52.51          50.66      48.84       47.97  55.61
Feature Bagging              67.60          62.70      64.73       54.38  69.64
HBOS                         55.03          48.55      48.57       49.53  55.09
Isolation Forest             52.01          48.66      49.10       49.35  54.70
Average KNN                  76.35          57.76      56.21       55.54  79.22
K-Means clustering           81.00          61.30      49.20       47.70  41.31
One-class SVM                83.70          43.56      51.94       83.53  36.32
TM framework                 82.50          63.15      68.15       89.47  70.37
that our framework surpasses the other algorithms on three of the datasets and ranks second on the remaining two (20 Newsgroup and Web of Science). However, on Web of Science, where known and novel classes share many similar words, our method is surprisingly surpassed by the distance-based Average KNN algorithm. One-class SVM comes closest to our TM framework on 20 Newsgroup, where it scores slightly higher (83.70 % vs. 82.50 %), and on BBC sports. This may be due to its linear structure, which prevents overfitting on imbalanced and small datasets.
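As a quick check of the comparison above, the following sketch recomputes the best-scoring algorithm for each dataset directly from the Table 3 values. The names `SCORES`, `best_per_dataset`, and `mean_score` are illustrative helpers, not part of the paper, and the unweighted mean over datasets is only a convenience summary that the paper itself does not report.

```python
# Values copied from Table 3 (in %); each list follows the column order of DATASETS.
DATASETS = ["20 Newsgroup", "Spooky author", "CMU movie", "BBC sports", "WOS"]
SCORES = {
    "LOF":                [52.51, 50.66, 48.84, 47.97, 55.61],
    "Feature Bagging":    [67.60, 62.70, 64.73, 54.38, 69.64],
    "HBOS":               [55.03, 48.55, 48.57, 49.53, 55.09],
    "Isolation Forest":   [52.01, 48.66, 49.10, 49.35, 54.70],
    "Average KNN":        [76.35, 57.76, 56.21, 55.54, 79.22],
    "K-Means clustering": [81.00, 61.30, 49.20, 47.70, 41.31],
    "One-class SVM":      [83.70, 43.56, 51.94, 83.53, 36.32],
    "TM framework":       [82.50, 63.15, 68.15, 89.47, 70.37],
}


def best_per_dataset(scores, datasets):
    """Return {dataset: (best algorithm, its score)} via a column-wise maximum."""
    winners = {}
    for col, name in enumerate(datasets):
        algo = max(scores, key=lambda a: scores[a][col])
        winners[name] = (algo, scores[algo][col])
    return winners


def mean_score(scores):
    """Unweighted mean over all datasets, rounded to two decimals (summary only)."""
    return {algo: round(sum(vals) / len(vals), 2) for algo, vals in scores.items()}


if __name__ == "__main__":
    for dataset, (algo, value) in best_per_dataset(SCORES, DATASETS).items():
        print(f"{dataset:14s} best: {algo} ({value:.2f} %)")
    means = mean_score(SCORES)
    for algo in sorted(means, key=means.get, reverse=True):
        print(f"{algo:20s} mean: {means[algo]:.2f} %")
```

Taking the maximum per column makes the "three of five datasets" statement easy to verify, and the same helpers can be reused if further baselines are added to the table.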
5 CONCLUSION
In this paper, we studied the problem of novelty detection in multiclass text classification. We proposed a score-based TM framework for novel class detection. We first used the clauses of the TM to produce a novelty score that distinguishes between known and novel classes. Then, we adopted a machine learning classifier that performs novelty classification using the novelty scores provided by the TM. The experimental results on various datasets demonstrate the effectiveness of our proposed framework. Our future work includes experimenting on a large text corpus with multiple classes and studying the properties of the novelty score theoretically.
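The two-stage pipeline summarized above can be illustrated with a toy sketch. It is not the authors' implementation, and several parts are assumptions: the clauses are hand-written conjunctions over a binary bag-of-words (a trained TM would learn them), the hypothetical `novelty_score` is simply the negated strongest known-class vote sum, and a one-dimensional decision stump (`fit_stump`) stands in for the machine learning classifier used in the paper. All names (`Clause`, `CLAUSES`, `class_sums`, `novelty_score`, `fit_stump`) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical hand-written conjunction of literals over a binary bag-of-words."""
    include: frozenset                # words that must be present
    exclude: frozenset = frozenset()  # words that must be absent
    polarity: int = 1                 # +1 votes for the class, -1 votes against it

    def fires(self, words):
        return self.include <= words and not (self.exclude & words)


# Assumed clause banks for two known classes; a trained TM would learn these.
CLAUSES = {
    "sport": [Clause(frozenset({"match"})), Clause(frozenset({"goal"})),
              Clause(frozenset({"election"}), polarity=-1)],
    "politics": [Clause(frozenset({"election"})), Clause(frozenset({"vote"})),
                 Clause(frozenset({"goal"}), polarity=-1)],
}


def class_sums(words):
    """TM-style class sum: positive-polarity clause votes minus negative ones."""
    return {c: sum(cl.polarity for cl in bank if cl.fires(words))
            for c, bank in CLAUSES.items()}


def novelty_score(text):
    """Assumed score: the weaker the strongest known-class vote, the more novel."""
    words = frozenset(text.lower().split())
    return -max(class_sums(words).values())


def fit_stump(scores, labels):
    """Stage-2 stand-in: threshold maximizing training accuracy (label 1 = novel)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


if __name__ == "__main__":
    train = [("the match ended with a late goal", 0),
             ("the election vote was close", 0),
             ("a new recipe for bread", 1),
             ("quantum chips run cold", 1)]
    threshold = fit_stump([novelty_score(t) for t, _ in train],
                          [y for _, y in train])
    for text in ["the goal was great", "bread recipe ideas"]:
        print(text, "-> novel" if novelty_score(text) >= threshold else "-> known")
```

Separating the score from the decision rule mirrors the framework's design: the TM only has to rank documents by how well known-class clauses explain them, while the second-stage classifier learns where to draw the line between known and novel.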
ICAART 2021 - 13th International Conference on Agents and Artificial Intelligence