benchmark, only the S model failed to achieve perfect
scores. There are no significant differences between
the classifiers: for each model, all of them performed
equally well.
Table 5: Best accuracy and corresponding F1-score metrics
obtained by the NFAs for the analyzed classifiers.
Accuracy F1-score
Model C
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
0.91 0.91 0.91 0.91 0.90 0.90 0.90 0.90
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
In this paper, we have proposed a method for transforming
an NFA with three types of states (accepting, rejecting
and non-conclusive) into a weighted-frequency automaton,
which can be further transformed into a probabilistic
NFA. The transformation process is generic, since
customizable weights control the relative importance
of the different types of states and/or transitions.
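
The step from weighted frequencies to probabilities can be illustrated with a minimal sketch. It assumes that each transition carries a usage count and a "kind" label, that the weight of a kind scales the count, and that the weighted counts leaving a state are normalized to sum to one. The data layout, the kind names, the weight values and the function name are all hypothetical; they are not the definitions used in the paper.

```python
from collections import defaultdict


def normalize_weighted_frequencies(freq, weights):
    """Turn weighted transition counts into per-state probabilities.

    freq:    {(state, symbol, next_state): (count, kind)}  (hypothetical layout)
    weights: {kind: weight}  (hypothetical, user-chosen importance per kind)
    """
    # Scale each raw count by the weight assigned to its kind.
    weighted = {t: count * weights[kind] for t, (count, kind) in freq.items()}

    # Sum the weighted counts leaving each source state.
    totals = defaultdict(float)
    for (state, _symbol, _target), value in weighted.items():
        totals[state] += value

    # Divide by the per-state total so outgoing probabilities sum to 1.
    return {
        t: (value / totals[t[0]] if totals[t[0]] > 0 else 0.0)
        for t, value in weighted.items()
    }


# Toy automaton fragment; counts, kinds and weights are made up.
freq = {
    (0, "a", 1): (3, "accepting"),
    (0, "b", 2): (1, "rejecting"),
    (1, "a", 1): (2, "non_conclusive"),
}
weights = {"accepting": 1.0, "rejecting": 0.5, "non_conclusive": 0.25}

probs = normalize_weighted_frequencies(freq, weights)
for transition, p in sorted(probs.items()):
    print(transition, round(p, 3))
```

Changing the entries of `weights` shifts probability mass between transitions without touching the underlying counts, which is the kind of control the customizable weights are meant to provide.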
We have evaluated the proposed probabilistic automata
on a classification task performed over two distinct
benchmarks. The first one, based on real-life samples
of peptide sequences, proved to be quite challenging,
yielding relatively low quality metrics. The second
benchmark, based on a random sampling of a language
described by a regular expression, enabled us to show
the power of probabilistic NFAs, producing accuracy
scores of 0.81–1.00 with F1-scores ranging from 0.69
to 1.00. It also demonstrated that, given a
representative sample of an underlying language, a
probabilistic NFA can achieve very good classification
quality, even without sophisticated parameter tuning.
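
The gap between the reported accuracy and F1-score ranges follows from how the two metrics are defined: accuracy counts every correct decision, whereas F1 ignores true negatives. The following is a minimal sketch for a binary task, with label 1 assumed to be the positive class. The function name and the toy labels are illustrative and are not taken from the benchmarks.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    # F1 is the harmonic mean of precision and recall; defined as 0 if denom is 0.
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up, imbalanced sample: 3 positive words, 7 negative words.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, F1={f1:.2f}")
```

With one missed positive and one false alarm on this imbalanced sample, accuracy stays at 0.80 while F1 drops to about 0.67, so a lower F1 bound next to a higher accuracy bound is not contradictory.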
In the future, we plan to apply heuristics to tune the
weights so that the classifiers perform even better,
especially on real-life benchmarks. Given the generic
nature of the proposed weighted-frequency automata, we
also plan to consider a parallel ensemble of
classifiers that differ not only in their weights but
also in how probabilities are combined.
