The categorisation function corresponds to the
purpose of the test and can be deduced directly from
its test statistic. In the case of the Monobit test, the
categorisation function was applied to individual bits,
but in general it can process parts of the examined
bitstream or its transformed equivalent. We also
assume that the number of categories is arbitrary (not
only two), and so is the related number of degrees of
freedom.
C Parameters and Settings
The population size was changed to one, since only
one P-value can be taken for the correct statistical
interpretation (see Section 3.4). This compensates
for the fact that the whole process must be repeated
several times (for the correct interpretation of
results). The fitness value of the GA was changed to
the P-value, since the P-value is the most relevant
output of the testing procedure.
Note 4. It is possible to use the two-sample $\chi^2$
test statistic as the fitness value, but it does not
reflect the degrees of freedom.
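To make the remark in Note 4 concrete, the following sketch relates the statistic to the P-value. It assumes the common two-sample form for equal sample sizes; the symbols $R_i$, $S_i$ (category frequencies of the two samples), $c$ (number of categories) and $\nu$ are notation introduced here for illustration, and the definition actually used is the one in Section A of this Appendix.

```latex
\chi^2 = \sum_{i=1}^{c} \frac{(R_i - S_i)^2}{R_i + S_i},
\qquad
P = \Pr\left[ X \ge \chi^2 \right], \quad X \sim \chi^2_{\nu}, \quad \nu = c - 1 .
```

The same value of $\chi^2$ yields different P-values for different $\nu$, so only the P-value is comparable across tests with different numbers of categories.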
Since the population consists of only one circuit,
the crossover probability is automatically set to zero.
To choose appropriate circuit settings, a deeper
analysis of the evolution is needed. Let us consider a
fixed setting of the circuit parameters (number of
layers, number of gates in a layer, ...). Since the
circuit resources are limited, they should be used
effectively. This means that the pool of available
operations should consist of complex operations,
because complex operations constructed from trivial
ones consume a lot of the available resources.
On the other hand, too many defined operations
significantly enlarge the space in which the GA works
and could mislead the evolution process. Therefore the
set of operations should consist of complex but
meaningful operations. In the case of a stream cipher,
operations used in the cipher design can be considered
meaningful. Therefore we use only simple byte-wise
“Boolean” operations such as AND, OR, NOR and NOT.
The main problem of the previous approach is that
the output from the last layer of the circuit is
interpreted as a single bit. This leads to a loss
of distinguishing ability of the circuits, since the
results of many gates are often discarded. To avoid
this, more bits from the last layer should be used for
the interpretation. This fits our framework well, since
the categorisation function of the test can work with
arbitrarily many categories. It can be expected that we
get the most sensitive test of randomness if every
category is defined by a single output value of the
circuit, i.e., if C with $2^m$ categories is represented
by a circuit $C'$ with $m$ output bits. Unfortunately,
application of the $\chi^2$ test (Section A of this
Appendix) requires that the frequency in each category
be at least 5. This means that for $C'$ with 8 output
bits there should be either more test vectors (more
than the k = 1000 used) or fewer categories. We have
reduced the number of categories. For 1000 test
vectors, it must be smaller than 1000/5 = 200. In such
a case each category could be defined by 7 bits of a
circuit output byte ($2^7 = 128$ categories). Of course,
the GA does not fill the categories evenly, and
therefore we need to use even fewer categories.
For our experiments we have chosen 8 categories
defined by the last 3 bits of the 8 output bits.
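The category choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "last 3 bits" is read here as the three least significant bits of the output byte, and the function names `categorise`, `category_frequencies` and `max_categories` are hypothetical, not taken from the original implementation.

```python
from collections import Counter


def categorise(output_byte, bits=3):
    """Return the category index given by the lowest `bits` bits of the byte.

    Assumption: the "last" bits are the least significant ones.
    """
    return output_byte & ((1 << bits) - 1)


def category_frequencies(outputs, bits=3):
    """Count how many circuit outputs fall into each of the 2**bits categories."""
    counts = Counter(categorise(b, bits) for b in outputs)
    return [counts[c] for c in range(1 << bits)]


def max_categories(num_vectors, min_frequency=5):
    """Upper bound on the number of categories if each must be hit
    at least `min_frequency` times by `num_vectors` test vectors."""
    return num_vectors // min_frequency
```

With k = 1000 test vectors, `max_categories(1000)` gives the bound of 200 quoted in the text, which admits 7 bits (128 categories) but not 8 bits (256 categories). Using only 3 bits leaves a wide margin for the uneven filling of categories.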
D Implementation and Model Testing
In this part we describe tests that confirm the
correctness of the statistical model and the correctness
of its implementation. We want to confirm that P-values
computed by the two-sample $\chi^2$ test (from category
frequencies) are distributed uniformly on the interval
[0, 1]. Besides the statistical model we also need to
check our implementation of the statistical tests
(two-sample $\chi^2$ test, KS test).
Firstly, we have tested the implementation of the
KS test. We analysed $10^7$ sets P of t = 300 uniformly
distributed randomly generated real numbers from the
interval [0, 1]. Using the KS test we have obtained
a total of 497496 test statistic values that were
located in the critical region defined by α = 0.05.
This value represents 4.97% of all tested sets and is
in good agreement with the expected 5% value.
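A check of this kind can be sketched with the Python standard library alone. The code below is not the authors' implementation: it assumes the asymptotic Kolmogorov distribution with Stephens' small-sample correction for the P-value, and the function names and the reduced number of sets are illustrative choices.

```python
import math
import random


def ks_statistic_uniform(sample):
    """KS distance between the empirical CDF of `sample` and U[0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Approximate P-value of the KS statistic `d` for sample size `n`.

    Assumption: asymptotic Kolmogorov series with Stephens' correction.
    """
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the true value is ~1
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def rejection_rate(num_sets, t=300, alpha=0.05, seed=0):
    """Fraction of uniform random sets of size `t` rejected at level `alpha`."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(num_sets):
        sample = [rng.random() for _ in range(t)]
        if ks_pvalue(ks_statistic_uniform(sample), t) < alpha:
            rejected += 1
    return rejected / num_sets
```

Calling `rejection_rate(2000)` runs a much smaller experiment than the $10^7$ sets reported above. For a correct KS implementation the returned fraction should lie close to α, with a sampling error of roughly half a percentage point at this size.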
Secondly, to check the $\chi^2$ test implementation we
simulate the circuit generation process. We generated
300 pairs of number samples $S_{i,1}, S_{i,2}$,
$i \in \{1, \dots, 300\}$, from the set $\{0, \dots, 7\}$.
For each $i$, both samples $S_{i,1}, S_{i,2}$ were
randomly generated according to the given random
distribution. We applied the two-sample $\chi^2$ test
to compare the samples $S_{i,1}, S_{i,2}$ for each
$i \in \{1, \dots, 300\}$. We obtained the set P of 300
P-values $P_i$. This set was tested by the KS test for
uniformity on the interval [0, 1]. We repeated the
whole process $10^4$ times (runs) and observed that 5.1%
of the KS test statistic values were located in the 5%
critical region. This observation is also in good
agreement with the model. All performed tests indicate
that the statistical model and our implementations of
the KS test and the two-sample $\chi^2$ test are
correct.
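The second check can be sketched as follows. This is a simplified illustration, not the original code, and it rests on several assumptions: both samples are drawn uniformly from {0, ..., 7} (the text only says "the given random distribution"), each sample has 1000 elements by default, the two-sample statistic uses the equal-sample-size form with one degree of freedom fewer than the number of non-empty categories, and the KS decision uses the asymptotic 5% critical value 1.358 with Stephens' correction. All function names are hypothetical.

```python
import math
import random
from collections import Counter

KS_CRITICAL_5PCT = 1.358  # asymptotic 5% point of the Kolmogorov distribution


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution (integer df)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0  # term = chi^(2r-1) / (1*3*...*(2r-1))
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * pdf * total


def two_sample_chi2_pvalue(counts_a, counts_b):
    """P-value of the two-sample chi-square test (equal sample sizes assumed)."""
    stat, used = 0.0, 0
    for a, b in zip(counts_a, counts_b):
        if a + b == 0:
            continue  # category empty in both samples: skip it
        stat += (a - b) ** 2 / (a + b)
        used += 1
    return chi2_sf(stat, used - 1)


def ks_rejects_uniform(pvalues):
    """True if the KS test rejects uniformity on [0, 1] at the 5% level."""
    xs = sorted(pvalues)
    n = len(xs)
    d = max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))
    return d * (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) > KS_CRITICAL_5PCT


def one_run(rng, pairs=300, sample_size=1000, categories=8):
    """Generate `pairs` sample pairs, compare each, and KS-test the P-values."""
    values = range(categories)
    pvalues = []
    for _ in range(pairs):
        a = Counter(rng.choices(values, k=sample_size))
        b = Counter(rng.choices(values, k=sample_size))
        pvalues.append(two_sample_chi2_pvalue([a[c] for c in values],
                                              [b[c] for c in values]))
    return ks_rejects_uniform(pvalues)


def rejection_rate(runs, seed=0, **kwargs):
    """Fraction of runs in which the KS test rejects uniformity of the P-values."""
    rng = random.Random(seed)
    return sum(one_run(rng, **kwargs) for _ in range(runs)) / runs
```

A call such as `rejection_rate(200)` repeats the procedure far fewer times than the $10^4$ runs reported above, so its result is correspondingly noisier. If the model and both test implementations are sound, the rate should stay near 5%.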