Measuring the Novelty of Natural Language Text using the

Conjunctive Clauses of a Tsetlin Machine Text Classiﬁer

Bimal Bhattarai

, Ole-Christoffer Granmo

and Lei Jiao

Department of Information and Communication Technology, University of Agder, Grimstad, Norway

Keywords:

Novelty Detection, Deep Learning, Interpretable, Tsetlin Machine.

Abstract:

Most supervised text classiﬁcation approaches assume a closed world, counting on all classes being present

in the data at training time. This assumption can lead to unpredictable behaviour during operation, whenever

novel, previously unseen, classes appear. Although deep learning-based methods have recently been used

for novelty detection, they are challenging to interpret due to their black-box nature. This paper addresses

interpretable open-world text classiﬁcation, where the trained classiﬁer must deal with novel classes during

operation. To this end, we extend the recently introduced Tsetlin machine (TM) with a novelty scoring mech-

anism. The mechanism uses the conjunctive clauses of the TM to measure to what degree a text matches the

classes covered by the training data. We demonstrate that the clauses provide a succinct interpretable descrip-

tion of known topics, and that our scoring mechanism makes it possible to discern novel topics from the known

ones. Empirically, our TM-based approach outperforms seven other novelty detection schemes on three out of

ﬁve datasets, and performs second and third best on the remaining, with the added beneﬁt of an interpretable

propositional logic-based representation.

1 INTRODUCTION

In recent years, deep learning-based techniques have

achieved superior performance on many text clas-

siﬁcation tasks. Most of the classiﬁers use super-

vised learning, assuming a closed-world environ-

ment (Scheirer et al., 2013). That is, the classes pre-

sented in the test data (or during operation) are also

assumed to be presented in the training data. How-

ever, when facing an open-world environment, new

classes may appear after training (Bendale and Boult,

2016). In such cases, assuming a closed world can

lead to unpredictable behaviour. For example, a chat-

bot interacting with a human user will regularly face

new user intents that it has not been trained to rec-

ognize. A chatbot for banking services may, for in-

stance, have been trained to recognize the intent of

applying for a loan. However, it will provide mean-

ingless responses if it fails to recognize that asking for

a lower interest rate is new and different user intent.

The problem with neural network-based supervised

classiﬁers that use the typical softmax layer is that

https://orcid.org/0000-0002-7339-3621

https://orcid.org/0000-0002-7287-030X

https://orcid.org/0000-0002-7115-6489

they erroneously force novel input into one of the pre-

viously seen classes by normalizing the class output

scores to produce a distribution that sums to 1.0. In-

stead, a robust classiﬁer should be able to ﬂag input as

a novel, rejecting to label it according to the presently

known classes. Recently, many important application

areas make use of novelty detection such as medi-

cal applications, fraud detection (Veeramreddy et al.,

2011), sensor networks (Zhang et al., 2010), and text

analysis (Basu et al., 2004). For a further study of

these classes of techniques, the reader is referred to

(Pimentel et al., 2014).

Outlier decision boundary

Outlier detection Novelty detection

trained example

distant

known class

novel example

Figure 1: Visualization of outlier detection and novelty de-

tection.

Figure 1 illustrates the problem of novelty detection,

i.e., recognizing when the data fed to a classiﬁer is

novel and somehow differs from the data that was

available during training. In brief, after training on

410

Bhattarai, B., Granmo, O. and Jiao, L.

Measuring the Novelty of Natural Language Text using the Conjunctive Clauses of a Tsetlin Machine Text Classiﬁer.

DOI: 10.5220/0010382204100417

In Proceedings of the 13th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2021) - Volume 2, pages 410-417

ISBN: 978-989-758-484-8

data with known classes (blue data points to the right

in Figure 1), the classiﬁer should be able to detect

novel data arising from new, and previously unseen

classes (yellow data points to the right in the ﬁgure).

We will refer to the known data points as positive ex-

amples and the novel data points as negative exam-

ples. Note that the problem of novelty detection is

closely related to so-called outlier detection (Chan-

dola et al., 2009; Pincus, 1995). However, as exempli-

ﬁed to the left in Figure 1, the latter problem involves

ﬂagging data points that are part of an already known

class, yet deviating from the other data points, e.g.,

due to measurement errors or anomalies (red point).

Problem Deﬁnition: Generally, in multiclass classi-

ﬁcation, we have a set of example data points X =

), x

∈ R

, where i = (1,2,3,...,N) indexes the

positive examples, s is the dimensionality of the data

point x

, and y

is the label of x

, assigning it a class.

For any data point x

, also referred to as a feature vec-

tor, a classiﬁer function ˆy = F (x; X) is to assign the

data point a predicted class ˆy, after the function has

been ﬁtted to the training data X. Additionally, a nov-

elty scoring function z(x; X ), also ﬁtted on X, calcu-

lates a novelty score, so that a threshold σ can be used

to recognize novel input. In other words, the classi-

ﬁer is to return the correct class label while simulta-

neously being capable of rejecting novel examples.

Under the above circumstances, standard super-

vised learning would fail, particularly for methods

based on building a discriminant function, such as

neural networks. As shown to the left of Figure 1,

a discriminant function captures the discriminating

“boundaries” between the known classes and cannot

readily be used to discern novel classes. Still, a tra-

ditional way of implementing novelty detection is to

threshold the entropy of the class probability distribu-

tion (Hendrycks and Gimpel, 2017). Such methods

are not actually measuring novelty but closeness to

the decision boundary. Such an approach thus leads

to undetected novel data points when the data points

are located far from the decision boundary. Another,

and perhaps more robust, approach is to take advan-

tage of the class likelihood function, which estimates

the probability of the data given the class.

Paper Contributions: Unlike traditional methods,

in this paper we leverage the conjunctive clauses

in propositional logic that a TM builds. The TM

clauses represent frequent patterns in the data, and

our hypothesis is that these frequent patterns charac-

terize the known classes succinctly and comprehen-

sively. Thus, we establish a novelty scoring mech-

anism based simply on counting how many clauses

match the input. This score can, in turn, be thresh-

olded manually to ﬂag novel input. However, for

more robust novelty detection, we train several stan-

dard machine learning techniques to ﬁnd accurate

thresholds. The main contributions of our work can

thus be summarized as follows:

• We devise the ﬁrst TM-based approach for nov-

elty detection, leveraging the clauses of the TM

architecture.

• We illustrate how the new technique can be used

to detect novel topics in text, and compare the

technique against widely used approaches on ﬁve

different datasets.

2 RELATED WORK

The perhaps most common approach to novelty de-

tection is distance-based methods (Hautamaki et al.,

2004), which assumes that the known or seen data

are clustered together while novel data has a high dis-

tance to the clusters. The major drawback of these

methods is computational complexity when perform-

ing clustering or nearest neighbor search in a large

dataset. Early work on novelty detection also includes

one-class SVMs (Sch

olkopf et al., 2001), which are

only capable of using the positive training examples

to maximize the class margin. This shortcoming

is overcome by the Center-Based Similarity (CBS)

space learning method (Fei and Liu, 2015), which

uses binary classiﬁers over vector similarities of train-

ing examples transformed into the center of the class.

To build a classiﬁer for detecting novel class distri-

butions, Chow et al. proposed a conﬁdence score-

based method that suggests an optimum input rejec-

tion rule (Chow, 1970). The method is relatively ac-

curate, but it does not scale well to high-dimensional

datasets.

The novelty detection method OpenMax (Bendale

and Boult, 2016) is more recent, and estimates the

probability of the input belonging to a novel class.

To achieve this, the method employs an extra layer,

connected to the penultimate layer of the original net-

work. However, the computational complexity of the

method is high, and the underlying inference cannot

easily be interpreted for quality assurance. Lately, Yu

et al. (Yu et al., 2017) adopted the Adversarial Sam-

ple Generation (ASG) framework (Hautamaki et al.,

2004) to generate positive and negative examples in

an unsupervised manner. Then, based on those exam-

ples, they trained an SVM classiﬁer for novelty detec-

tion. Furthermore, in computer vision, Scheirer et al.

introduced the concept of open space risk to recog-

nize novel image content (Scheirer et al., 2013). They

proposed a “1-vs-set machine” that creates a decision

Measuring the Novelty of Natural Language Text using the Conjunctive Clauses of a Tsetlin Machine Text Classiﬁer

411

space using a binary SVM classiﬁer, with two parallel

hyperplanes bounding the non-novel regions. In Sec-

tion 4, we compare the performance of our new TM-

based approach with the most widely used approaches

among those mentioned above.

3 TSETLIN MACHINE-BASED

NOVELTY DETECTION

In this section, we propose our approach to novelty

detection based on the TM. First, we explain the ar-

chitecture of the TM. Then, we show how novelty

scores can be obtained from the TM clauses. Finally,

we integrate the TM with a rule-based classiﬁer for

novel text classiﬁcation.

3.1 Tsetlin Machine (TM) Architecture

The TM is a recent approach to pattern clas-

siﬁcation (Granmo, 2018) and regression (Dar-

shana Abeyrathna et al., 2020). It builds on a classic

learning mechanism called a Tsetlin automaton (TA),

developed by M. L. Tsetlin in the early 1960s (Tsetlin,

1961). In all brevity, multiple teams of TA combine to

form the TM. Each team is responsible for capturing a

frequent sub-pattern in high precision by composing a

conjunctive clause. In-built resource allocation prin-

ciples guide the teams to distribute themselves across

the underlying sup-patterns of the problem. Recently,

the TM has performed competitively with the state-

of-the-art techniques including deep neural networks,

for text classiﬁcation (Berge et al., 2019), and aspect-

based sentiment analysis (Yadav et al., 2021). Further,

the convergence of TM has been studied in (Zhang

et al., 2020). In what follows, we propose a new

scheme that extends the TM with the capability to rec-

ognize novel patterns.

Figure 2 describes the building blocks of a TM.

As seen, a vanilla TM takes a vector X = (x

,...,x

)

of binary features as input (Figure 3). We binarize

text by using binary features that capture the pres-

ence/absence of terms in a vocabulary, akin to a bag

of words, as done in (Berge et al., 2019). However,

as opposed to a vanilla TM, our scheme does not out-

put the predicted class. Instead, it calculates a novelty

score per class.

Together with their negated counterparts, ¯x

¬x

= 1 − x

, the features form a literal set, L =

,...,x

, ¯x

,..., ¯x

}. A TM pattern is formu-

lated as a conjunctive clause C

or C

−

, where j =

(1,...,m/2) denotes an index of a clause, and the su-

perscript describes the polarity of a clause. In more

detail, the total number of clauses, m, is divided into

two parts, where half of the clauses are assigned a

positive polarity and the other half are assigned a neg-

ative polarity. Any clause, regardless of the polarity,

is formed by ANDing a subset of the literal set. For

example, the j

clause with positive polarity, C

(X),

can be expressed as:

(X) =

∈L

∏

∈L

. (1)

where L

⊆ L is the set of literals that are involved in

the expression of C

(X) after training. For instance,

given clause C

(X) = x

, it consists of the literals

= {x

} and outputs 1 if x

= x

= 1. The output

of a conjunctive clause is determined by evaluating it

on the input literals. When a clause outputs 1, this

means that it has recognized a pattern in the input.

Conversely, the clause outputs 0 when no pattern is

recognized. The clause outputs, in turn, are combined

into a classiﬁcation decision through summation and

thresholding using the unit step function u:

ˆy = u

m/2

∑

j=1

(X) −

m/2

∑

j=1

−

(X)

. (2)

That is, the classiﬁcation is performed based on a

majority vote, with the positive clauses voting for

y = 1 and the negative for y = 0. The classiﬁer

ˆy = u (x

¯x

+ ¯x

− x

− ¯x

¯x

), e.g., captures the

XOR-relation.

A clause is composed by a team of TA, each TA

deciding to Exclude or Include a speciﬁc literal in

the clause. The TA learns which literals to include

based on reinforcement: Type I feedback is designed

to produce frequent patterns, while Type II feedback

increases the discriminating power of the patterns (see

(Granmo, 2018) for details).

3.2 Novelty Detection Architecture

For novelty detection, however, we here propose to

treat all clause output as positive, disregarding clause

polarity. This is because both positive and nega-

tive clauses capture patterns in the training data, and

thus can be used to detect novel input. We use this

sum of absolute clause outputs as a novelty score,

which denotes the resemblance of the input to the

patterns formed by clauses during training. The re-

sulting modiﬁed TM architecture is captured by Fig-

ure 3, showing how four different outputs (i.e., two

per class) are produced by the TM. These outputs

form the basis for novelty detection.

The overall novelty detection architecture is shown

in Figure 4. Each TM (one per class) produces two

ICAART 2021 - 13th International Conference on Agents and Artiﬁcial Intelligence

412

Novelty Score

Figure 2: Optimization of clauses in TM for generating novelty score.

Text Input

Binary Encoded Input

Class 1 Class 2

Positive

Novelty Score

Positive

Novelty Score

Input

Clauses

Output

Negative

Novelty Score

Negative

Novelty Score

Figure 3: Multiclass Tsetlin Machine (MTM) framework to

produce the novelty score for each class.

novelty scores for its respective class, one based on

the positive polarity clauses and another based on the

negative polarity clauses. The novelty scores are nor-

malized and then given to a classiﬁer, such as de-

cision tree (DT), k-nearest neighbor (KNN), support

vector machine (SVM), and logistic regression (LR).

The output from these classiﬁers are “Novel” or “Not

Novel”, i.e. 0 or 1 in the ﬁgure.

For illustration purposes, instead of using a ma-

chine learning algorithm to decide upon novelty, one

can instead use a simple rule-based classiﬁer, intro-

Known Class Novel Class

Novelty Score Novelty Score

Classifier

Output (0 or 1)

Figure 4: Novelty detection architecture.

ducing a classiﬁcation threshold T . Then the novelty

score for the input sentence can be compared with T

to detect whether a sentence is novel or not. That is, T

decides how many clauses must match to qualify the

input as non-novel. The classiﬁcation function F (X)

for a single class can accordingly be given as:

F (X) =











1, if

∑

m/2

j=1

(X) > T,

1, if

∑

m/2

j=1

−

(X) < −T,

0, otherwise.

Measuring the Novelty of Natural Language Text using the Conjunctive Clauses of a Tsetlin Machine Text Classiﬁer

413

4 EXPERIMENTS AND RESULTS

In this section, we present the experimental evaluation

of our proposed TM model and compare it with other

baseline models across ﬁve datasets.

4.1 Datasets

We used the following datasets for evaluation.

• 20-newsgroup Dataset: The dataset contains 20

classes with a total of 18,828 documents. In

our experiment, we consider the two classes

“comp.graphics” and “talk.politics.guns” as

known classes and the class “rec.sport.baseball”

as novel. We take 1000 samples from both known

and novel classes for novelty classiﬁcation.

• CMU Movie Summary Corpus: The dataset con-

tains 42,306 movie plot summaries extracted from

Wikipedia and metadata extracted from Freebase.

In our experiment, we consider the two movie cat-

egories “action” and “horror” as known classes

and “fantasy” as novel.

• Spooky Author Identiﬁcation Dataset: The

dataset contains 3,000 public domain books from

the following horror ﬁction authors: Edgar Allan

Poe (EAP), HP Lovecraft (HPL), and Mary Shel-

ley (MS). We train on written texts from EAP and

HPL while treating texts from MS as novel.

• Web of Science Dataset (Kowsari et al., 2017):

This dataset contains 5,736 published papers, with

eleven categories organized under three main cat-

egories. We use two of the main categories as

known classes, and the third as a novel class.

• BBC Sports Dataset: This dataset contains 737

documents from the BBC Sport website, orga-

nized in ﬁve sports article categories and collected

from 2004 to 2005. In our work, we use “cricket”

and “football” as the known classes and “rugby”

as novel.

4.2 A Case Study

To cast light on the interpretability of our scheme, we

here use substrings from the 20 Newsgroup dataset,

demonstrating novelty detection on a few simple

cases. First, we form an indexed vocabulary set V ,

including all literals from the dataset. The input text

is binarized based upon the index of the literals in V .

For example, if a word in the text substring has been

assigned index 5 in V , the 5

position of the input

vector is set to 1. If a word is absent from the sub-

string, its corresponding feature is set to 0. Let us

consider substrings from the two known classes and

the novel class from the 20 Newsgroup dataset.

• Class: comp.graphics (known)

Text: Presentations are solicited on all aspects

of Navy-related scientiﬁc visualization and virtual

reality.

Literals: “Presentations”, “solicited”, “as-

pects”, “Navy”, “related”, “scientiﬁc”, “visual-

ization”, “virtual”, “reality”.

• Class: talk.politics.guns (known)

Text: Last year the US suffered almost 10,000

wrongful or accidental deaths by handguns alone.

In the same year, the UK suffered 35 such deaths.

Literals: “Last ”, “year”, “US”, “suffered”,

“wrongful”, “accidental”, “deaths”, “hand-

guns”, “UK”, “suffered”.

• Class: rec.sport.baseball (Novel)

Text: The top 4 are the only true contenders in

my mind. One of these 4 will deﬁnitely win the

division unless it snows in Maryland.

Literals: “top ”, “only”, “true”, “contenders”,

“mind”, “win”, “division”, “unless”, “snows”,

“Maryland”.

After training, the two known classes form conjunc-

tive clauses that capture literal patterns reﬂecting the

textual content. For the above example, we get the

following clauses:

• C

= “Presentations” ∧ “aspects” ∧ “Navy” ∧

“scientiﬁc” ∧ “virtual” ∧ “reality” ∧ “year” ∧

“US” ∧ “mind” ∧ “division”

• C

−

= ¬(“suffered” ∧ “accidental” ∧ “unless” ∧

“snows”)

• C

= “last” ∧ “year” ∧ “US” ∧ “wrongful” ∧

“deaths” ∧“accidental” ∧“handguns” ∧“Navy”

∧ “snows” ∧ “Maryland” ∧ “divisions”

• C

−

= ¬(“presentations” ∧ “solicited” ∧

“virtual” ∧ “top” ∧ “win”)

Table 1: Novelty score example when known text sentence

is passed to the model.

Class C

−

Known +6 -3 +3 -5

Novel +2 -1 +1 -2

Here, the clauses from each class captures the fre-

quent patterns from the class. However, it may also

contain certain literals from other classes. The posi-

tive polarity clauses provide evidence on the presence

of a class, while negative polarity clauses provide ev-

idence on the absence of the class. The novelty score

for each class is calculated based on the propositional

ICAART 2021 - 13th International Conference on Agents and Artiﬁcial Intelligence

414

Table 2: Accuracy of different machine learning classiﬁers to detect novel class in various dataset.

Dataset DT KNN SVM LR NB MLP

20 Newsgroup 72.5 % 82 % 78.0 % 72.75 % 69.0 % 82.50 %

Spooky action author 53.42 % 57.89 % 63.15 % 52.63 % 58.68 % 63.15 %

CMU movie 61.05 % 64.73 % 62.10 % 55.00 % 58.94 % 68.68 %

BBC sports 84.21 % 85.96 % 75.43% 70.17 % 73.68 % 89.47 %

Web of Science 64.70 % 67.97 % 69.93 % 67.10 % 62.09 % 70.37 %

formula formed by the clauses (Figure 2). In general,

for input from known classes, the novelty scores are

higher. This is because the clauses have been trained

to vote for or against input from the known class. For

example, when we pass a known class to our model,

the clauses might produce scores as in Table 1 (for

illustration purposes).

The scores are then used as features to prepare a

dataset for employing machine learning classiﬁers to

enhance novelty detection. As exempliﬁed in the ta-

ble, novelty scores for known classes are relatively

higher than those of novel classes. This allows the

ﬁnal classiﬁer to robustly recognize novel input, as

explored empirically below.

4.3 Empirical Evaluation

We divided the task into two experiments, i.e., 1)

Novelty score calculation 2) Novelty/Known Class

classiﬁcation. In the ﬁrst experiment, we employ the

known classes to train the TM. The TM runs for 100

epochs with a hyperparameter setting of 5000 clauses,

a threshold T of 25, and a sensitivity s of 15.0. Then,

we use the clauses formed by the trained TM model to

calculate the novelty scores for both known and novel

classes. We adopt an equal number of examples to

gather the novelty score from both known and novel

classes. In the second experiment, the novelty score

generated from the ﬁrst experiment is forwarded as

input to standard machine learning classiﬁers, such

as DT, KNN, SVM, LR, NB, and MLP to classify

whether a text is novel or known.

The experimental results for all datasets are shown

in Table 2. As seen, multilayer perceptron (MLP)

(hidden layer sizes 100,30 and ReLU activation with

stochastic gradient descent) is superior for all of the

datasets. In our experiments, the 20 Newsgroups and

BBC sports datasets yielded better results than the

three other data sets, arguably because of the sharp

distinctiveness of examples in the known and novel

classes. We further believe that the novelty scores

are clustered based on known and novel classes; thus,

distance-based methods seem effective in classiﬁca-

tion. We plotted the novelty score of thousand text

samples from known and novel classes using our

framework, which shows how scores differ signif-

2 0 0 4 0 0 6 0 0 8 0 0 1 0 0 0

1 0 0

2 0 0

3 0 0

4 0 0

5 0 0

S c o r e

S a m p l e

K n o w n c l a s s

N o v e l c l a s s

Figure 5: Visualization of differences in novelty score for

known and novel classes.

icantly for each sample as visualized in Figure 5.

Moreover, we can see from the graph that when the

texts from the novel and known classes are discrimi-

native, the TM produces distinct novelty scores, thus

improving the ability of machine learning classiﬁers

to detect novel texts. In all brevity, we believe that the

clauses of a TM capture frequent patterns in the train-

ing data, and thus novel samples will reveal them-

selves by not ﬁtting with a sufﬁciently large number

of clauses. The number of triggered clauses can there-

fore be utilized to measure novelty score. Also, the

clauses formed by trained TM models should only to

a small degree trigger on novel classes, producing dis-

tinctively low scores.

We compared the performance of our TM frame-

work with different clustering and outlier detection

algorithms, such as Cluster-based Local Outlier Fac-

tor (CBLOF), Feature Bagging (neighbors, n = 35),

Histogram-base Outlier Detection (HBOS), Isolation

Forest, Average KNN, K-Means clustering, and One-

class SVM. The evaluation was performed on the

same preprocessed datasets for a fair comparison.

To make comparison more robust, we preprocessed

the data for the baseline algorithms using count vec-

torizer, term frequency-inverse document frequency

(TF-IDF), and Principle component analysis (PCA).

Additionally, we utilized the maximum possible out-

lier fraction (i.e., 0.5) for these methods. The perfor-

mance comparison is given in Table 3, which shows

Measuring the Novelty of Natural Language Text using the Conjunctive Clauses of a Tsetlin Machine Text Classiﬁer

415

Table 3: Performance comparison of proposed TM framework with cluster and outlier-based novelty detection algorithms.

Algorithms 20 Newsgroup Spooky action author CMU movie BBC sports WOS

LOF 52.51 % 50.66 % 48.84 % 47.97 % 55.61 %

Feature Bagging 67.60 % 62.70 % 64.73 % 54.38 % 69.64 %

HBOS 55.03 % 48.55 % 48.57 % 49.53 % 55.09 %

Isolation Forest 52.01 % 48.66% 49.10 % 49.35% 54.70 %

Average KNN 76.35 % 57.76 % 56.21 % 55.54 % 79.22 %

K-Means clustering 81.00 % 61.30 % 49.20 % 47.70 % 41.31 %

One-class SVM 83.70 % 43.56 % 51.94 % 83.53 % 36.32 %

TM framework 82.50 % 63.15% 68.15 % 89.47 % 70.37 %

that our framework surpasses the other algorithms on

three of the datasets and performs competitively in the

remaining two. However, in datasets like Web of Sci-

ence, where there are many similar words shared be-

tween known and novel classes, our method is sur-

prisingly surpassed by the distance-based algorithm

(i.e., Average KNN). One-class SVM closely follows

the performance of our TM framework, which may be

due to its linear structure that prevents overﬁtting on

imbalanced and small datasets.

5 CONCLUSION

In this paper, we studied the problem of novelty de-

tection in multiclass text classiﬁcation. We proposed

a score-based TM framework for novel class detec-

tion. We ﬁrst used the clauses of the TM to pro-

duce a novelty score, distinguishing between known

and novel classes. Then, a machine learning classiﬁer

is adopted for novelty classiﬁcation using the novelty

scores provided by the TM. The experimental results

on various datasets demonstrate the effectiveness of

our proposed framework. Our future work includes

using a large text corpus with multiple classes for ex-

perimentation and studying the properties of the nov-

elty score theoretically.

REFERENCES

Basu, S., Bilenko, M., and Mooney, R. J. (2004). A proba-

bilistic framework for semi-supervised clustering. In

Proceedings of the Tenth ACM SIGKDD International

Conference on Knowledge Discovery and Data Min-

ing, KDD ’04, page 59–68, New York, NY, USA. As-

sociation for Computing Machinery.

Bendale, A. and Boult, T. E. (2016). Towards open set deep

networks. In The IEEE Conference on Computer Vi-

sion and Pattern Recognition (CVPR).

Berge, G. T., Granmo, O.-C., Tveit, T. O., Goodwin, M.,

Jiao, L., and Matheussen, B. V. (2019). Using the

Tsetlin machine to learn human-interpretable rules for

high-accuracy text categorization with medical appli-

cations. IEEE Access, 7:115134–115146.

Chandola, V., Banerjee, A., and Kumar, V. (2009).

Anomaly detection: A survey. ACM Comput. Surv.,

41(3).

Chow, C. K. (1970). On optimum recognition error and re-

ject tradeoff. IEEE Trans. Information Theory, 16:41–

46.

Darshana Abeyrathna, K., Granmo, O.-C., Zhang, X., Jiao,

L., and Goodwin, M. (2020). The regression Tsetlin

machine: A novel approach to interpretable nonlinear

regression. Philosophical Transactions of the Royal

Society A: Mathematical, Physical and Engineering

Sciences, 378(2164):20190165.

Fei, G. and Liu, B. (2015). Social media text classiﬁca-

tion under negative covariate shift. In Proceedings of

the 2015 Conference on Empirical Methods in Natu-

ral Language Processing, pages 2347–2356, Lisbon,

Portugal. Association for Computational Linguistics.

Granmo, O.-C. (2018). The Tsetlin machine - A game

theoretic bandit driven approach to optimal pattern

recognition with propositional logic. arXiv preprint

arXiv:1804.01508.

Hautamaki, V., Karkkainen, I., and Franti, P. (2004). Out-

lier detection using k-nearest neighbour graph. In Pro-

ceedings of the 17th International Conference on Pat-

tern Recognition, 2004. ICPR 2004., volume 3, pages

430–433 Vol.3.

Hendrycks, D. and Gimpel, K. (2017). A baseline for de-

tecting misclassiﬁed and out-of-distribution examples

in neural networks. In 5th International Conference

on Learning Representations, ICLR 2017, Toulon,

France, April 24-26, 2017, Conference Track Pro-

ceedings. OpenReview.net.

Kowsari, K., Brown, D., Heidarysafa, M., Meimandi, K.,

Gerber, M., and Barnes, L. (2017). Hdltex: Hierar-

chical deep learning for text classiﬁcation. 2017 16th

IEEE International Conference on Machine Learning

and Applications (ICMLA), pages 364–371.

Pimentel, M. A. F., Clifton, D. A., Clifton, L., and

Tarassenko, L. (2014). Review: A review of novelty

detection. Signal Process., 99:215–249.

Pincus, R. (1995). Barnett, v., and lewis t.: Outliers in sta-

tistical data. 3rd edition. j. wiley & sons 1994, xvii.

582 pp., £49.95. Biometrical Journal, 37(2):256–256.

Scheirer, W. J., de Rezende Rocha, A., Sapkota, A., and

Boult, T. E. (2013). Toward open set recognition.

IEEE Transactions on Pattern Analysis and Machine

Intelligence, 35(7):1757–1772.

ICAART 2021 - 13th International Conference on Agents and Artiﬁcial Intelligence

416

Sch

olkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., and

Williamson, R. (2001). Estimating support of a

high-dimensional distribution. Neural Computation,

13:1443–1471.

Tsetlin, M. L. (1961). On the behavior of ﬁnite au-

tomata in random media. Avtomatika i Telemekhanika,

22:1345–1354.

Veeramreddy, J., Prasad, V., and Prasad, K. (2011). A

review of anomaly based intrusion detection sys-

tems. International Journal of Computer Applica-

tions, 28:26–35.

Yadav, R. K., Jiao, L., Granmo, O.-C., and Goodwin,

M. (2021). Human-level interpretable learning for

aspect-based sentiment analysis. In The Thirty-Fifth

AAAI Conference on Artiﬁcial Intelligence (AAAI-21).

AAAI.

Yu, Y., Qu, W.-Y., Li, N., and Guo, Z. (2017). Open

category classiﬁcation by adversarial sample genera-

tion. In Proceedings of the Twenty-Sixth International

Joint Conference on Artiﬁcial Intelligence. Interna-

tional Joint Conferences on Artiﬁcial Intelligence Or-

ganization.

Zhang, X., Jiao, L., Granmo, O.-C., and Goodwin, M.

(2020). On the convergence of Tsetlin machines

for the identity-and not operators. arXiv preprint

arXiv:2007.14268.

Zhang, Y., Meratnia, N., and Havinga, P. (2010). Out-

lier detection techniques for wireless sensor networks:

A survey. IEEE Communications Surveys Tutorials,

12(2):159–170.

Measuring the Novelty of Natural Language Text using the Conjunctive Clauses of a Tsetlin Machine Text Classiﬁer

417