X = X_{p(1)} + X_{p(2)} + … + X_{p(n)}    (3)
As a final state, we obtain n parts of our evidence file X. In the best case a single part spans the entire logical file; this happens when the file was deleted but not overwritten. More often we will recover a part of X smaller than 512 bytes in slack space, with the remaining parts in unallocated space. We assume that the boundary between the left and right part of each basic cell of the cluster model falls at the end of a sector.
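The partitioning into sector-aligned parts can be sketched in Python; the helper name and the 512-byte sector constant are illustrative assumptions, not part of the paper's tooling:

```python
SECTOR_SIZE = 512  # assumed sector size in bytes

def split_into_parts(data: bytes, sector_size: int = SECTOR_SIZE) -> list[bytes]:
    """Split an evidence file X into sector-aligned parts X_p(1)..X_p(n).

    Part boundaries fall at sector ends, matching the assumption that
    each basic cell of the cluster model ends on a sector boundary.
    """
    return [data[i:i + sector_size] for i in range(0, len(data), sector_size)]

parts = split_into_parts(b"A" * 1300)
# 1300 bytes -> two full 512-byte parts plus a 276-byte remainder
```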
3 N-BYTE HASH
The mathematical model shows that standard hashing algorithms will not work with partially erased files. We cannot predict which part of a file we will be able to recover, which is the main reason why reducing the input length to less than the length of the file is necessary. There is an algorithm (Kornblum, 2006) that takes blocks as input, H(X_{p(1-512)}); in most cases a file system block has 512 bytes, but this option has disadvantages (Menezes, 1996). Blocks can have sizes other than 512 bytes depending on the file system used, and it is very hard to convert the algorithm and the correlated hash tables to work with file systems of a different block length (Henson, 2003). The next disadvantage is that we can miss a part of the evidence file in its last block, because we cannot reliably predict the RAM slack content. RAM slack can be explained with the mathematical model: in Figure 1, file Y ends exactly at the end of block 5 of a cluster. More likely it would end in the middle of a block, in which case RAM slack is created from the end of the file to the end of the block. Most operating systems deal with this by wiping to the end of the block, but in older MS systems it could be filled with random data from Random Access Memory, which is why it is called RAM slack.
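The size of the RAM slack region described above is a simple remainder calculation; a minimal sketch, assuming a 512-byte block size:

```python
BLOCK_SIZE = 512  # common file-system block size in bytes (an assumption)

def ram_slack_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the number of RAM-slack bytes in the file's last block.

    If the file ends exactly on a block boundary (like file Y ending at
    the end of block 5 in Figure 1) there is no RAM slack; otherwise
    the remainder of the last block is slack.
    """
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder

ram_slack_bytes(2560)  # ends on a block boundary -> 0
ram_slack_bytes(2600)  # ends 40 bytes into a block -> 472 slack bytes
```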
The third disadvantage of block input is performance. Taking 512-byte blocks forces us to hash every byte on the hard disk. Hashing every byte on the disk is essential when a hash function is used to preserve evidence, as one of the most important steps in creating a chain of custody; in our application, however, it is unnecessary and inefficient. Solution performance depends on two main factors. The first is the number of I/O operations on the hard drive: disk read/write operations and the interface for connecting drives are still a bottleneck in computers. The second factor is the computation time of the hash function. Cryptographic hash functions are designed to be fast in both hardware and software implementations, but they obviously still affect computer performance. That is why we focus on n < 512 variants of block hashing. Two cryptographic hash functions are widely used in computer forensics: MD5 (Ronald Rivest, Message-Digest algorithm 5), with a 128-bit hash value, and SHA-1, designed by the National Security Agency (NSA), which produces a 160-bit output based on principles similar to those used in MD5. In this research we focus on the Message-Digest algorithm (White, 2005).
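The two digest lengths quoted above can be confirmed with Python's standard `hashlib` module; the sample input is arbitrary:

```python
import hashlib

# MD5 yields a 128-bit (16-byte) digest; SHA-1 yields a 160-bit (20-byte) digest.
md5_bits = hashlib.md5(b"evidence").digest_size * 8
sha1_bits = hashlib.sha1(b"evidence").digest_size * 8
print(md5_bits, sha1_bits)  # 128 160
```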
The MD5 algorithm first divides the input data into 512-bit blocks (Menezes, 1996). At the end of the last block, 64 bits are appended to record the length of the original input. If the input is shorter than a full block, it is padded up to 448 bits. The padding function works as follows: a single "1" bit is appended to the message, and then "0" bits are appended until the length in bits of the padded message becomes congruent to 448 modulo 512.
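The padding rule can be expressed directly as code. This sketch computes the total padded length in bits for a given message length, following the "1" bit, "0" fill, and 64-bit length field described above (the function name is illustrative):

```python
def md5_padded_bits(message_bits: int) -> int:
    """Total length in bits of an MD5 input after padding.

    A single '1' bit is appended, then '0' bits until the length is
    congruent to 448 mod 512; finally a 64-bit length field is added,
    so the padded message is always a multiple of 512 bits.
    """
    length = message_bits + 1            # the mandatory '1' bit
    while length % 512 != 448:           # pad with '0' bits
        length += 1
    return length + 64                   # 64-bit original-length field

md5_padded_bits(0)    # empty message still pads to one 512-bit block
md5_padded_bits(440)  # 55 bytes fit in a single 512-bit block
```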
As a result, the minimum input of N-byte sector hashing for one full round of the algorithm is 448 bits (56 bytes), and that is why we choose 56 bytes as the input length in our algorithm. We also considered a 120-byte input (two full rounds of MD5). A standard full block input consists of 512 bytes of data plus a 64-bit length record plus 448 bits of padding, which results in a 576-byte MD5 input (9 full rounds).
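The round counts can be checked by counting 512-bit compression blocks. Note that this sketch applies strict MD5 padding, where the mandatory "1" bit means the largest payload fitting in a single round is 55 bytes; the function name is illustrative:

```python
def md5_blocks(message_bytes: int) -> int:
    """Number of 512-bit compression rounds MD5 runs for a given input size.

    Each round processes 64 bytes; padding ('1' bit plus '0' fill) and the
    64-bit length field extend the message to the next multiple of 512 bits.
    """
    bits = message_bytes * 8 + 1         # message plus the mandatory '1' bit
    while bits % 512 != 448:             # '0'-bit padding
        bits += 1
    return (bits + 64) // 512            # add length field, count blocks

md5_blocks(55)   # -> 1 round
md5_blocks(119)  # -> 2 rounds
md5_blocks(512)  # -> 9 rounds, matching the 576-byte full-block input
```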
When creating hash tables compatible with the presented algorithm, one should take into account that the same record can be ascribed to several different files, and that each file produces several hash records, depending on its length. This characteristic is described in more detail in the practical research implementation section.
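A hash table with these two properties can be sketched as a mapping from each n-byte chunk digest to the set of files containing that chunk; the helper name and the 56-byte window are assumptions for illustration:

```python
import hashlib
from collections import defaultdict

N = 56  # n-byte input length chosen in the paper

def build_hash_table(files: dict[str, bytes], n: int = N) -> dict[str, set[str]]:
    """Map each n-byte chunk hash to the set of files containing that chunk.

    One file yields several records (one per chunk), and identical chunks
    in different files share a single record, as described above.
    """
    table: defaultdict[str, set[str]] = defaultdict(set)
    for name, data in files.items():
        for i in range(0, len(data), n):
            table[hashlib.md5(data[i:i + n]).hexdigest()].add(name)
    return dict(table)

table = build_hash_table({"a.bin": b"\x00" * 112,
                          "b.bin": b"\x00" * 56 + b"\x01" * 56})
# the all-zero chunk appears in both files, so its record lists both
```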
4 PRACTICAL RESEARCH
IMPLEMENTATION
We have implemented the function h(X_{p(1...n)}) based on the Message-Digest 5 cryptographic hash function. We performed several tests in the same software and hardware environment with n equal to 48, 112, and 512. The tests were carried out to show the efficiency of each method, and each test was repeated 20 times to determine and reduce the error rate.
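A minimal timing harness in this spirit, hashing every n-byte part with MD5 and averaging over repeated runs, might look as follows; the function names and the 1 MiB sample buffer are illustrative, not the paper's actual test software:

```python
import hashlib
import time

def hash_parts(data: bytes, n: int) -> list[str]:
    """Hash every n-byte part of the input with MD5 and collect the digests."""
    return [hashlib.md5(data[i:i + n]).hexdigest()
            for i in range(0, len(data), n)]

def time_run(data: bytes, n: int, repeats: int = 20) -> float:
    """Average wall-clock time over repeated runs, mirroring the 20-trial tests."""
    start = time.perf_counter()
    for _ in range(repeats):
        hash_parts(data, n)
    return (time.perf_counter() - start) / repeats

sample = bytes(1024 * 1024)  # 1 MiB of zeros as stand-in evidence data
for n in (48, 112, 512):
    print(n, time_run(sample, n))
```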