4 NEW ANALYSIS
In this part, we combine a data mining algorithm with the
cryptanalysis of RC4. We implemented this idea with one of the
data mining tools, namely WEKA, which is available from the
Computer Science Department of the University of Waikato
(http://www.cs.waikato.ac.nz/ml/weka/). For data preparation,
we gathered our data based on the weakness of IVs in RC4
(S. R. Fluhrer and Shamir, 2001). To implement this, we made
small changes to the RC4 algorithm. These changes are described
in the next section.
4.1 Data Preparation
In the real world, the key entered by the user is converted into
ASCII codes. For this reason, we chose our keys from ASCII codes
32 to 126, the printable characters that are most common in
practice; the full table can be found at
http://www.asciitable.com.
As seen in Figure 2, we added the zeroth, first, second and third
permutations to our data, based on the idea of FMS. To obtain
more realistic results, we used the .NET random function to
generate random keys. We concatenated the IVs to the key, as
happens in real transmission. These IVs can be obtained easily
from the packet because, as we explained in previous sections,
IVs are not encrypted in the header. We also chose keys with a
length of 5 bytes to be more realistic, as is done when creating
ad hoc networks in Windows XP. Another change we made was to
eliminate the XOR part from the PRGA, which is line number 14 in
Figure 1. This was done for several reasons. First, the plaintext
plays no role except in the XOR part. Second, and most
importantly, many attacks rely on fixed or known content in some
part of the encrypted data, such as the SNAP field, or on known
plaintext such as email content.
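The seed construction described above (a random 5-byte key of printable ASCII codes with the IV concatenated in front) can be sketched as follows. This is only an illustrative sketch: Python's `random` module stands in for the .NET random function used by the authors, the function names `make_seed` and `make_seeds` are hypothetical, and the IV value (3, 63, 1) and the count of 4000 seeds are the ones the text reports for the experiment.

```python
import random

IV = (3, 63, 1)                  # fixed IV reported in the text
KEY_LEN = 5                      # 5-byte keys
ASCII_LO, ASCII_HI = 32, 126     # printable ASCII range used for keys


def make_seed(rng, iv=IV, key_len=KEY_LEN):
    """Return IV || key, where key holds key_len random printable-ASCII codes."""
    key = [rng.randint(ASCII_LO, ASCII_HI) for _ in range(key_len)]
    return list(iv) + key


def make_seeds(count, rng_seed=0):
    """Generate `count` seeds; rng_seed only makes the sketch reproducible."""
    rng = random.Random(rng_seed)
    return [make_seed(rng) for _ in range(count)]


seeds = make_seeds(4000)
```

Each seed is an 8-element list whose first three entries are the IV and whose last five are the key codes.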
For the analysis of RC4, it is convenient to replace the original
algorithm, which works on bytes (Z/256Z), with a version over
Z/64Z. We did this because of hardware limitations. The resulting
pseudo-code used for data generation is shown in Figure 2. As a
complementary, larger-scale check, we also tested the idea on a
limited number of samples modulo 256. As shown in Figure 3, we
tested 4000 different seeds (line number 1). These seeds (line
number 8) are composed of the fixed IV (3, 63, 1), based on FMS,
concatenated with the ASCII codes of different keys. These keys
are generated by the .NET random function. As shown in Figure 2,
we produced five-byte keys in lines 3 to 7.
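To make the reduced-modulus variant concrete, the following is a minimal sketch of RC4 over Z/nZ with the final XOR step left out, so that the PRGA returns the raw keystream. It is not the authors' Figure 2 program: the function names `ksa` and `prga` and the parameter `n` are assumptions of this sketch, and it does not reproduce how the paper records the atb0 to atb3 attributes.

```python
def ksa(seed, n=64):
    """Key-scheduling over Z/nZ: return the permutation S after n swaps."""
    s = list(range(n))
    j = 0
    for i in range(n):
        # seed values larger than n-1 are reduced by the modulo here
        j = (j + s[i] + seed[i % len(seed)]) % n
        s[i], s[j] = s[j], s[i]
    return s


def prga(s, count, n=64):
    """Return `count` raw keystream words; no XOR with plaintext is applied."""
    s = list(s)          # work on a copy so the caller's state is untouched
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % n
        j = (j + s[i]) % n
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % n])
    return out


# Example: the IV (3, 63, 1) followed by a hypothetical 5-character key.
seed = [3, 63, 1] + [ord(c) for c in "hello"]
keystream = prga(ksa(seed, n=64), count=64, n=64)
```

With `n=256` the same two functions are ordinary RC4, which gives a convenient way to check the sketch against published RC4 keystream test vectors before switching to `n=64`.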
The word 'save' in Figure 2 indicates that the data following
'save' are stored for later use in the data mining input file. As
shown in Figure 2, line 8 saves the seeds; in Figure 3 the seed
appears as key (@attribute key). Lines 11 to 35 are composed of
two parts: KSA and PRGA. In the KSA, we saved the first, second,
third and fourth permutations (lines 19 to 22), named atb0, atb1,
atb2 and atb3, respectively. The PRGA output, before XORing with
the plaintext, is saved in line 33 and is shown in Figure 3 as
PRGAout (@attribute PRGAout).
The output of our program in Figure 2 is converted into the
format expected by WEKA. The input data of WEKA is shown in
Figure 3. The part under '@data' is the data produced by the
program in Figure 2. As an example, we show just two of the
256000 rows: 64 (our data modulo 64) multiplied by 4000 (seeds)
equals 256000, which is the total number of rows generated by the
program in Figure 2.
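The conversion into WEKA's input format can be illustrated with a small ARFF writer. This is a sketch under stated assumptions rather than the authors' converter: the helper name `to_arff` and the relation name are hypothetical, the attribute names follow Figure 3, and the nominal value sets are derived from the rows actually present instead of being listed in full as in the paper's header.

```python
ATTRIBUTES = ["atb0", "atb1", "atb2", "atb3", "key", "prgaout"]


def to_arff(rows, relation="RC4"):
    """Render rows of six integers as ARFF text with nominal attributes."""
    lines = ["@relation %s" % relation]
    for col, name in enumerate(ATTRIBUTES):
        # Assumption: take the nominal values from the data itself.
        values = sorted({row[col] for row in rows})
        lines.append("@attribute %s {%s}" % (name, ",".join(map(str, values))))
    lines.append("@data")
    for row in rows:
        lines.append(",".join(map(str, row)))
    return "\n".join(lines) + "\n"


# The two sample rows shown in Figure 3.
sample_rows = [(3, 3, 3, 3, 3, 60), (1, 0, 0, 0, 63, 48)]
arff_text = to_arff(sample_rows)
```

For separate training and testing files it would probably be safer to pass one fixed value list per attribute, so that both files carry the same header.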
The data were given to WEKA, and we used supervised learning to
classify them. The key attribute is our target, which means that
the software and algorithm should predict the key value based on
the other attributes. We used the J48 algorithm for our
classification; J48 is the WEKA classifier package's own version
of C4.5. We used 256000 rows of data for training and another
256000 rows for testing in WEKA. The training and testing files
contained different data in the same format. Together, these two
sets comprise 512000 rows of data generated from 8000 seeds.
@relation 'RC4-weka.filters.unsupervised.attribute.Remove-R1,6-7-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last'
@attribute atb0 {0,1,2,3,4,5,6,7,...,61,62,63}
@attribute atb1 {0,1,2,3,4,5,6,7,...,61,62,63}
@attribute atb2 {0,1,2,3,4,5,6,7,...,61,62,63}
@attribute atb3 {0,1,2,3,4,5,6,7,...,61,62,63}
@attribute key {1,3,32,33,34,...,123,124,125}
@attribute prgaout {0,1,2,3,4,5,6,7,...,60,61,62,63}
@data
3,3,3,3,3,60
1,0,0,0,63,48
., ., ., ., .
Figure 3: Input data for WEKA.
4.2 Results of WEKA
These are the results (outputs) of WEKA:
SECRYPT 2009 - International Conference on Security and Cryptography