header with those of the observer. The correlation
between the TCP timestamps and the measured
machine's time is computed with a linear
programming technique, using Graham's convex-hull
algorithm, since variable network delay renders
simple linear regression insufficient (Graham, 1972;
Moon et al., 1999). For the experiment, the authors
tested the algorithm on 69 machines in a campus
computer laboratory and ran the measurement for 38
days, computing clock skews over 12- and 24-hour
intervals. The experiment succeeded in
demonstrating the validity of this approach, for the
clock skew estimates for any given machine are
approximately constant over time and different
machines have detectably different clock skews.
The work of these authors was commented on by
Fink (Fink, 2007), who, conversely, treats the
problem as one of statistics and regression rather
than linear programming and optimization. Along
similar lines, the author proposes a method for
computing the sample size required to produce a
clock-skew estimate that lies within a fixed margin
of the true population clock skew. The sample-size
formula was further validated by introducing network
delays and analysing correlations with hardware
characteristics.
The work in (Polčák and Franková, 2014) also
explores remote computer identification based on the
estimation of clock skew computed from network
packets; their measurements, however, were difficult
to take, as they required analysing network traffic
and an external reference time for comparison. Salo
(Salo, 2007) proposed a solution to this problem by
comparing two different clocks: the one used by the
CPU and the independent one used to maintain the
internal timer. The proposed methodology, however,
required a long execution time to generate the
fingerprint.
In (Sanchez-Rola et al., 2018), the authors use code
execution time as a way to precisely identify
different devices, exploiting the fact that the time
a computer spends executing an instruction depends
on how many clock cycles the instruction requires
and on the duration of each cycle.
3 METHOD PROPOSAL
Our underlying idea is to observe timestamps taken
from the internal clock and to compute the offset of
each timestamp with respect to the previous one. The
assumption is that, if the clock shows a constant
skew, then a regular pattern is present in the clock
signals.
The main differences between this proposal and
those present in the literature are the following:
- The skew is locally computed. That is, it is not
computed by a fingerprinter through a remote
observation of the fingerprint; rather, it is
computed by a server-side program observing the
timestamps of the internal clock of the host on
which the program is running.
- The granularity of the timestamps is the
microsecond, in order to bring out the subtle
differences in clock signals. In our server-side
implementation, we adopted the microtime PHP
function (Sklar and Trachtenberg, 2003).
- The skew is quickly computed and does not require
a long-running observation of the fingerprint. To
this end, the algorithm uses a CPU-intensive cycle,
which takes a few seconds to complete. The number of
cycles is a parameter that has been tuned
empirically; we discuss it in the implementation
sub-section below.
- The skew is not computed by comparing the
timestamps with those derived from another source
(e.g., the observer or an NTP server): it is
self-referential. Therefore, the fingerprint is
autonomously computed.
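As a rough illustration of this self-referential measurement, the following Python sketch collects microsecond timestamps across CPU-intensive cycles using only the local clock. Note that the paper's actual implementation is a server-side PHP program using microtime; the function name, the busy-loop workload, and its iteration count here are illustrative assumptions.

```python
import time

def observe_timestamps(n):
    """Collect n + 1 microsecond timestamps from the local clock only,
    one per CPU-intensive cycle (self-referential: no observer, no NTP)."""
    stamps = []
    for _ in range(n + 1):
        # Truncate nanoseconds to microseconds to match the granularity
        # discussed above.
        stamps.append(time.monotonic_ns() // 1_000)
        # A busy loop stands in for the CPU-intensive cycle; the
        # iteration count is an assumption, not the paper's tuned value.
        acc = 0
        for k in range(10_000):
            acc += k
    return stamps
```

In practice the workload between two readings only needs to be long enough for the microsecond counter to advance regularly; tuning that length corresponds to tuning the number of CPU cycles mentioned above.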
Let T_i be the timestamp observed at the i-th CPU
cycle, for i = 0, …, n, n ∈ ℕ. We define

offset_i = T_i − T_{i-1}, for i = 1, …, n, and

O = { offset_i : i ∈ {1, …, n}, n ∈ ℕ }.
Here, an offset is the incremental step in clock
signals, that is, the difference between consecutive
timestamps. Of course, this step can take different
values, since the observation process is strongly
affected by other processes that may slow down the
system and cause abnormal delays (i.e., outliers) in
the observations. However, if we consider a
sufficiently high value of n, the incremental steps
converge to a stable offset.
Let

D = { x_i | x_i ∈ O and x_i ≠ x_j if i ≠ j,
1 ≤ i, j ≤ m, m ∈ ℕ }

be the set of the distinct values of the observed
offsets. Of course, m ≤ n, because the cardinality
of D is usually lower than that of O, since numerous
repeated offsets can be observed.
Now, let f(x_i) be the frequency of x_i, i.e., the
number of times that x_i appears in O. Finally,
given the host H, the fingerprint of H is defined as

fp_H = k, such that k ∈ D and
f(k) = max{ f(x_i) : 1 ≤ i ≤ m }.
Put simply, the fingerprint of H is the offset
occurring with the highest frequency.
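The definitions above can be sketched in Python under the same assumptions: the offsets are the successive timestamp differences, f is an occurrence count, and fp_H is the most frequent offset. The helper name clock_fingerprint is illustrative, not from the paper.

```python
from collections import Counter

def clock_fingerprint(timestamps):
    """Given observed timestamps T_0, ..., T_n, return fp_H,
    the offset with the highest frequency."""
    # offset_i = T_i - T_{i-1}, for i = 1, ..., n
    offsets = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]
    # f(x_i): number of times each distinct offset x_i appears in O
    freq = Counter(offsets)
    # fp_H = k such that f(k) = max f(x_i)
    return freq.most_common(1)[0][0]
```

For example, the timestamp sequence 0, 2, 4, 6, 9, 11 yields the offsets 2, 2, 2, 3, 2, where 3 is an outlier caused by a delayed observation, so the fingerprint is 2.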