benchmark, only the S_k model failed to achieve perfect scores. Clearly, there are no significant differences between the classifiers, as all four performed identically for every model.
Table 5: Best accuracy and corresponding F1-score metrics obtained by the NFAs for the analyzed classifiers.

Model      |        Accuracy        |        F1-score
           | C_MM  C_MA  C_SM  C_SA | C_MM  C_MA  C_SM  C_SA
P_k        | 1.00  1.00  1.00  1.00 | 1.00  1.00  1.00  1.00
P_(k+2)    | 1.00  1.00  1.00  1.00 | 1.00  1.00  1.00  1.00
P⋆_k       | 1.00  1.00  1.00  1.00 | 1.00  1.00  1.00  1.00
P⋆_(k+2)   | 1.00  1.00  1.00  1.00 | 1.00  1.00  1.00  1.00
S_k        | 0.91  0.91  0.91  0.91 | 0.90  0.90  0.90  0.90
S_(k+2)    | 1.00  1.00  1.00  1.00 | 1.00  1.00  1.00  1.00
S⋆_k       | 1.00  1.00  1.00  1.00 | 1.00  1.00  1.00  1.00
S⋆_(k+2)   | 1.00  1.00  1.00  1.00 | 1.00  1.00  1.00  1.00
5 CONCLUSIONS
In this paper, we have proposed a method to transform an NFA with three types of states (accepting, rejecting, and non-conclusive) into a weighted frequency automaton, which can be further transformed into a probabilistic NFA. The transformation process is generic, since it allows one to control the relative importance of the different types of states and/or transitions through customizable weights.
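To make the idea concrete, the following Python sketch shows one plausible form of the transformation; it is not the paper's exact algorithm, and the names to_probabilistic, freq, state_kind, and weights are hypothetical. Each transition's observed frequency is scaled by a weight attached to the type of its target state, and the scaled values are then normalized per source state to yield transition probabilities.

# A minimal sketch, not the paper's exact algorithm: derive transition
# probabilities from weighted transition frequencies. All names are
# illustrative assumptions.
from collections import defaultdict

def to_probabilistic(transitions, freq, state_kind, weights):
    # transitions: set of (src, symbol, dst) triples of the NFA.
    # freq[t]: how often transition t is used when parsing the sample.
    # state_kind[q]: 'accepting' | 'rejecting' | 'non-conclusive'.
    # weights: tunable importance factor per state kind.
    weighted = {t: weights[state_kind[t[2]]] * freq[t] for t in transitions}
    totals = defaultdict(float)
    for (src, sym, dst), w in weighted.items():
        totals[src] += w  # total weighted mass leaving each state
    # Normalize so that outgoing probabilities of each state sum to 1.
    return {t: (w / totals[t[0]] if totals[t[0]] else 0.0)
            for t, w in weighted.items()}

# Example: favoring transitions that enter accepting states.
prob = to_probabilistic(
    transitions={(0, 'a', 1), (0, 'a', 2)},
    freq={(0, 'a', 1): 3, (0, 'a', 2): 1},
    state_kind={1: 'accepting', 2: 'rejecting'},
    weights={'accepting': 2.0, 'rejecting': 0.5, 'non-conclusive': 1.0})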
We have evaluated the proposed probabilistic automata on a classification task over two distinct benchmarks. The first one, based on real-life samples of peptide sequences, proved to be quite challenging, yielding relatively low quality metrics. The second benchmark, based on a random sampling of a language described by a regular expression, enabled us to show the power of probabilistic NFAs, producing accuracy scores of 0.81–1.00 and F1-scores ranging from 0.69 to 1.00. It also allowed us to show that, given a representative sample of the underlying language, a probabilistic NFA can achieve very good classification quality even without sophisticated parameter tuning.
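To illustrate how a probabilistic NFA can act as a classifier, the sketch below propagates probability mass through the automaton and accepts a word when the mass reaching accepting states exceeds a threshold. This forward-style computation is one plausible reading, not necessarily how the paper combines probabilities; classify, initial, and accepting are assumed names.

# Illustrative sketch only: score a word with a probabilistic NFA by
# forward propagation of probability mass over its transitions.
def classify(word, prob, initial, accepting, threshold=0.5):
    # prob: dict mapping (src, symbol, dst) to a transition probability.
    mass = {initial: 1.0}
    for sym in word:
        nxt = {}
        for (src, s, dst), p in prob.items():
            if s == sym and src in mass:
                # Accumulate mass over all nondeterministic paths.
                nxt[dst] = nxt.get(dst, 0.0) + mass[src] * p
        mass = nxt
    score = sum(p for q, p in mass.items() if q in accepting)
    return score >= threshold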
In the future, we plan to apply heuristics to tune the weights so that the classifiers perform even better, especially on real-life benchmarks. Given the generic nature of the proposed weighted frequency automata, we also plan to consider a parallel ensemble of classifiers, differing not only in their weights, but also in how probabilities are combined.
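Such an ensemble could be sketched as follows, under the assumption that each member maps a word to a score in [0, 1]; simple averaging stands in here for the probability-combination schemes to be explored.

# Hypothetical ensemble sketch: combine the scores of several
# probabilistic-NFA classifiers, e.g. built with different weights.
def ensemble_classify(word, scorers, threshold=0.5):
    # scorers: callables mapping a word to a probability in [0, 1].
    scores = [score(word) for score in scorers]
    return sum(scores) / len(scores) >= threshold  # averaged vote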