
tested. However, the vast majority of the puzzles tested were not locally minimal, and the number of prediction iterations they required was equal to the number of missing values. In our study, we not only performed experiments on locally minimal puzzles with 10, 11, or 12 missing values - the hardest well-posed order 2 puzzles - and attained a completion rate greater than 99 percent, but did so in a single prediction step, albeit with a more intricate model.
Yang et al. (Yang et al., 2023) trained a generative pre-trained transformer (GPT) based model on the dataset used by Palm et al. (Palm et al., 2018) and tested it with iterative predictions. They reported superior results, solving more than 99 percent of the puzzles, although on the hardest puzzles the completion rate dropped to 96.7 percent. However, their model required a sequence of prediction steps to reach the solution and did not provide a solution in one end-to-end prediction; 32 prediction iterations were needed to achieve these results on order 3 puzzles. The baseline for our study is the model used by Yang et al. (Yang et al., 2023), modified for order 2 puzzles. We show that our model achieves competitive results in fewer prediction iterations than their model requires.
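The iterative scheme used by such baselines can be sketched as follows. This is illustrative code, not the authors' implementation: the `candidates` function is a hypothetical stand-in for a trained network's confidence scores (here it simply applies the Sudoku constraints), but the fill-one-cell-per-iteration loop is the essential pattern, which is why a puzzle with k missing values can require up to k prediction iterations.

```python
# Illustrative sketch of an iterative prediction loop on an order 2
# (4x4) board: each iteration fills only the single most confident
# missing cell. `candidates` stands in for a trained model's scores.

def candidates(board, r, c):
    """Values not ruled out by row, column, and 2x2 box constraints."""
    used = set(board[r]) | {board[i][c] for i in range(4)}
    br, bc = 2 * (r // 2), 2 * (c // 2)
    used |= {board[br + i][bc + j] for i in range(2) for j in range(2)}
    return [v for v in range(1, 5) if v not in used]

def solve_iteratively(board, max_iters=16):
    """Fill one cell per iteration until the board is complete."""
    board = [row[:] for row in board]
    for _ in range(max_iters):
        missing = [(r, c) for r in range(4) for c in range(4)
                   if board[r][c] == 0]
        if not missing:
            return board
        # "most confident" cell = fewest remaining candidate values
        n, r, c = min((len(candidates(board, r, c)), r, c)
                      for r, c in missing)
        if n == 0:
            return None  # contradiction: no value fits this cell
        board[r][c] = candidates(board, r, c)[0]
    return None
```

In contrast, a single-step model must emit all missing values in one forward pass, with no opportunity to condition later cells on earlier predictions.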
As can be seen above, the significant results so far have been achieved only by systems that integrate sequences of interdependent prediction stages. In this paper we propose a model that achieves competitive results in a single prediction stage.
3 DEEP LEARNING METHODS
Our paper introduces a deep learning approach specifically targeted at tackling order 2 Sudoku puzzles. These are 4x4 puzzles that, while smaller in scale compared to the standard order 3 Sudokus, present a unique appeal for scientific investigation. Given their relative simplicity, both in terms of representation and analysis, focusing our research on order 2 puzzles enables more rapid training and facilitates quicker attainment of results.
We consider Sudoku to be akin to a sophisticated, multi-layered sequence completion problem. With this perspective, we developed a deep learning neural network that leverages LSTM modules designed for sequence completion. This approach has yielded results that are on par with current leading models.
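For readers unfamiliar with the building block, a single LSTM step can be sketched in NumPy as below. This is a generic textbook LSTM cell, not our Multiverse architecture (described in Section 3.2), and the stacked weight layout is an illustrative assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over input x with hidden state h and cell state c.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,)
    bias, with the four gate blocks stacked as [input, forget, output,
    candidate] (an illustrative layout).
    """
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate cell update
    c_new = f * c + i * g         # blend old cell state with update
    h_new = o * np.tanh(c_new)    # expose a gated view of the cell
    return h_new, c_new
```

Applied across the 16 cells of an order 2 puzzle, such a recurrence lets the prediction for each cell be conditioned on the cells already seen, which is what makes the sequence-completion framing natural.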
While our demonstrated results are limited to order 2 puzzles, we maintain that these puzzles are sufficiently complex to serve as a sound foundation for creating a successful model for higher-order Sudoku problems. The importance of studying 4x4 puzzles lies in the opportunity they provide to build, test, and refine models that could be efficiently scaled to more intricate Sudoku variants. This makes them an essential stepping stone in advancing deep learning methodologies for solving larger and more complex problems.

Table 1: Well Posed Order 2 Puzzle Count.

H   WP        LM        H   WP        LM
4   25728     25728     10  2204928   0
5   284160    58368     11  1239552   0
6   1041408   1536      12  522624    0
7   2141184   0         13  161280    0
8   2961024   0         14  34560     0
9   2958336   0

H - The number of hints in the puzzle.
WP - Well Posed, LM - Locally Minimal.
In this section we present the technical details of our methods, in particular the data composition and the structure of our models.
3.1 Datasets
Since most of the research into Sudoku has focused on order 3 puzzles, we did not find an existing dataset of order 2 puzzles, so we created our own.
There exist exactly 288 unique order 2 solved Su-
doku boards. Those boards represent 85632 puzzles
which are both well posed and locally minimal, each
containing only 4, 5, or 6 hints. It is possible to create
a larger number of well posed puzzles by adding more
hints, although those puzzles are not locally minimal.
Figures 2 and 3 show examples of the various possible types of puzzles and their solutions. Table 1
shows the full number of well posed puzzles with 4 to
14 hints and how many of them are locally minimal.
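On order 2 boards, the two properties counted in Table 1 are cheap enough to check by brute force. The sketch below is our illustrative code (not the generation procedure from the appendix): a puzzle is well posed if backtracking finds exactly one solution, and locally minimal if, additionally, removing any single hint breaks that uniqueness.

```python
# Brute-force checks of the two properties in Table 1 for order 2
# (4x4) Sudoku boards, where 0 marks a missing value.

def valid(board, r, c, v):
    """Is value v allowed at (r, c) under row/column/box constraints?"""
    if v in board[r] or v in (board[i][c] for i in range(4)):
        return False
    br, bc = 2 * (r // 2), 2 * (c // 2)
    return all(board[br + i][bc + j] != v
               for i in range(2) for j in range(2))

def count_solutions(board, limit=2):
    """Count solutions by backtracking, stopping early at `limit`."""
    for r in range(4):
        for c in range(4):
            if board[r][c] == 0:
                total = 0
                for v in range(1, 5):
                    if valid(board, r, c, v):
                        board[r][c] = v
                        total += count_solutions(board, limit - total)
                        board[r][c] = 0
                        if total >= limit:
                            return total
                return total
    return 1  # no missing cells: the board itself is the one solution

def well_posed(board):
    return count_solutions([row[:] for row in board]) == 1

def locally_minimal(board):
    hints = [(r, c) for r in range(4) for c in range(4) if board[r][c]]
    for r, c in hints:
        reduced = [row[:] for row in board]
        reduced[r][c] = 0
        if well_posed(reduced):
            return False  # a hint is redundant
    return well_posed(board)
```

Enumerating all hint subsets of the 288 solved boards with these checks reproduces counts such as those in Table 1.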
Our primary dataset consists of all 85632 well posed and locally minimal order 2 puzzles. Details on the generation process of the puzzles are provided in the appendix. The training of our models was performed on a subset of 77069 puzzles using 9-fold cross-validation (we divided the puzzles into 10 subsets and held one subset out of the process entirely).
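The split described above can be sketched as follows. The shuffling scheme, the seed, and the choice of which subset is held out are our illustrative assumptions; only the 10-way division with one held-out subset comes from the text.

```python
import random

def make_splits(n_items=85632, n_subsets=10, seed=0):
    """Divide item indices into 10 subsets, hold one subset out, and
    build (train, validation) index pairs for 9-fold cross-validation
    over the remaining subsets."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    subsets = [idx[i::n_subsets] for i in range(n_subsets)]
    holdout, folds = subsets[-1], subsets[:-1]
    splits = []
    for i, val in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i
                 for j in fold]
        splits.append((train, val))
    return holdout, splits
```

Each of the 9 (train, validation) pairs covers the same 77069 puzzles, and the held-out subset never enters the process.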
The puzzles and their solutions are represented as strings of digits, where missing values are denoted by zeros. Since the values are discrete and categorical, we one-hot encoded them into five-digit binary vectors to make processing easier.
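A minimal sketch of this encoding (the exact layout in our pipeline may differ): each of the 16 characters of a puzzle string maps to one of five binary vectors, with index 0 reserved for missing values.

```python
def one_hot_encode(puzzle):
    """Encode a 16-character order 2 puzzle string (digits 0-4, with
    0 marking a missing value) as a list of five-digit binary
    vectors, one per cell."""
    vectors = []
    for ch in puzzle:
        v = [0] * 5
        v[int(ch)] = 1  # set the bit for this cell's digit
        vectors.append(v)
    return vectors
```

The result is a 16x5 binary matrix per puzzle, which avoids imposing a spurious ordinal relationship between the digit values.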
3.2 Machine Learning Models
Below we describe the models used in this study. The first section describes our main model, a neural network architecture we call the Multiverse, which is
ICAART 2024 - 16th International Conference on Agents and Artificial Intelligence