analysis which in the past could have taken months
of physical surveillance. It also gives unprecedented
access to someone’s innermost thoughts from the con-
tent of conversations, or search histories. If policing
is to use this ability, it is vital it does so responsi-
bly and sensitive to the ethical issues that arise. As
well as new investigative opportunities, advances in
technology offer opportunities to expand DF services.
Rapid growth in cloud services will allow us to sim-
plify and rationalise DF data storage. These same
cloud services allow investigations access to more
processing power, to harness the power of automation
and explore the potential of new and evolving tech-
nologies such as machine learning.”
So far, forensic practitioners have been very hesitant to rely too heavily on ML/AI in trustworthy forensic processes. The reasons lie, on the one hand, in issues of accuracy and proficiency and, on the other hand, in concerns regarding explainability and interpretability.
The accuracy of such methods is discussed in
(UNICRI and INTERPOL, 2023) as: “Accuracy
corresponds to the degree to which an AI system
can make correct predictions, recommendations or
decisions. It is important that agencies verify that
any system they are developing and/or intend to use is
highly accurate, as using inaccurate AI systems can
result in various types of harm. [...] The accuracy of
an AI system is dependent on the way the system was
developed, and in particular the data that was used
to train it. In fact, training the system with sufficient
and good quality data is paramount to building a
good AI model. [...] In most cases, it is preferable
that the training data relates to the same or a similar
context as the one where the AI system will be used.”
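To make the quoted notion of accuracy concrete for the steganalysis setting discussed below, the following minimal Python sketch estimates accuracy as the fraction of correct predictions on a labelled evaluation set; the detector interface, class names and example data are purely illustrative assumptions and are not taken from (UNICRI and INTERPOL, 2023).

# Minimal, illustrative sketch: estimating the accuracy of a (hypothetical)
# multi-class steganalysis detector on a labelled evaluation set.
# 'classify_image' and the file/label pairs are assumptions for illustration only.

from typing import Callable, Iterable, Tuple

LABELS = ("cover", "jsteg", "jphide", "outguess")  # example class names

def estimate_accuracy(classify_image: Callable[[str], str],
                      labelled_images: Iterable[Tuple[str, str]]) -> float:
    """Return the fraction of images whose predicted class matches the ground truth."""
    total = 0
    correct = 0
    for path, true_label in labelled_images:
        total += 1
        if classify_image(path) == true_label:
            correct += 1
    return correct / total if total else 0.0

# Example usage with a dummy detector that always predicts 'cover':
if __name__ == "__main__":
    test_set = [("img_001.jpg", "cover"), ("img_002.jpg", "jphide"),
                ("img_003.jpg", "cover")]
    print(estimate_accuracy(lambda _path: "cover", test_set))  # -> 0.666...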
The definitions for explainability and inter-
pretability have already been discussed in Section 1
above. In response to the issue of explainability re-
quirements, (UNICRI and INTERPOL, 2023) points
toward the research field of ‘explainable AI’, which
“[...] aims to ensure that even when humans cannot
understand ‘how’ an AI system has reached an out-
put, they can at least understand ‘why’ it has pro-
duced that specific output. This field distinguishes
explainability in a narrow sense, as different from
interpretability. [...] In the context of criminal in-
vestigations, the explainability of AI systems used to
obtain or analyze evidence is particularly important.
In fact, in some jurisdictions, criminal evidence ob-
tained with the support of AI systems has been chal-
lenged in courts on the basis of a lack of understand-
ing of the way the systems function. While the re-
quirements for evidence admissibility are different in
each jurisdiction, a sufficient degree of explainability
needs to be ensured for any AI system used to obtain
and examine criminal evidence. This helps guaran-
teeing, alongside the necessary technical competen-
cies, that law enforcement officers involved in investi-
gations and forensic examinations have sufficient un-
derstanding of the AI systems used to be able to as-
certain and demonstrate the validity and integrity of
criminal evidence in the context of criminal proceed-
ings.”
2.2 Multi-Class Steganalysis with
Stegdetect
In their seminal paper, (Provos and Honeyman, 2002) criticise the steganalysis approaches that were state of the art at the time of their publication in 2002 as being practically irrelevant, due to faulty basic assumptions (modelling steganalysis as a two-class problem and statistically over-fitting to the training sets). In contrast to these approaches, Provos and Honeyman construct a multi-class, pattern-recognition-based image steganalysis detector called Stegdetect: each input image is considered to be a member of one of four classes; either it is an unmodified cover, or it is the result of applying one of three different steganographic tools (JSteg, JPHide and OutGuess 0.13b) that were among the state of the art at that time.
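This multi-class setup can be illustrated by the following schematic Python sketch: a set of per-tool tests assigns scores to an input image, and the image is labelled either as an unmodified cover or with the tool whose test responds most strongly. The test functions, scores and threshold are hypothetical placeholders; they do not reproduce the statistical tests actually implemented in Stegdetect.

# Schematic sketch of a four-class decision as described for Stegdetect:
# an image is labelled as unmodified cover or as output of one of three tools.
# The per-tool test functions and the scores they return are hypothetical
# placeholders, not the statistical tests actually used by Stegdetect.

from typing import Callable, Dict

def classify_image(path: str,
                   tool_tests: Dict[str, Callable[[str], float]],
                   threshold: float = 0.5) -> str:
    """Return 'cover' or the name of the tool whose test responds most strongly."""
    scores = {tool: test(path) for tool, test in tool_tests.items()}
    best_tool, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_tool if best_score >= threshold else "cover"

# Example usage with dummy tests (real tests would inspect JPEG coefficients):
if __name__ == "__main__":
    dummy_tests = {
        "jsteg":    lambda p: 0.1,
        "jphide":   lambda p: 0.7,
        "outguess": lambda p: 0.2,
    }
    print(classify_image("suspect.jpg", dummy_tests))  # -> 'jphide'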
Stegdetect is then applied blindly (i.e., without knowledge of the true class) to two million images downloaded from eBay auctions and one million images obtained from USENET archives. As a result, Stegdetect flags over 1% of all images as having been steganographically altered (mostly by JPHide) and therefore as containing hidden messages.
Based on these findings, the authors also describe in (Provos and Honeyman, 2002) a second tool called Stegbreak for plausibility considerations, i.e., for verifying the existence of messages hidden by JPHide in the images identified by Stegdetect. Their verification approach is based on the assumption that at least some of the passwords used as embedding keys for the steganographic embedding are weak passwords. Based on this assumption, they implement in Stegbreak a dictionary attack using JPHide’s retrieval function and large multi-language dictionaries (about 1,800,000 words). This attack is applied to all images that have been flagged as stego objects by the statistical analyses in Stegdetect.
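The principle of this verification step can be sketched as a plain dictionary-attack loop in Python; the retrieval function, wordlist handling and plausibility check shown here are illustrative assumptions and do not reproduce Stegbreak’s actual JPHide-specific retrieval code.

# Illustrative sketch of a dictionary attack in the spirit of Stegbreak:
# try every candidate password from a wordlist against a retrieval function
# and report the first password that yields a plausible hidden message.
# 'try_retrieve' is a hypothetical stand-in for a tool-specific extraction routine.

from typing import Callable, Iterable, Optional, Tuple

def dictionary_attack(image_path: str,
                      wordlist: Iterable[str],
                      try_retrieve: Callable[[str, str], Optional[bytes]]
                      ) -> Optional[Tuple[str, bytes]]:
    """Return (password, message) for the first successful retrieval, else None."""
    for password in wordlist:
        message = try_retrieve(image_path, password)
        if message is not None:          # a plausibility check would go here,
            return password, message     # e.g. header/length sanity tests
    return None

# Example usage with a dummy retrieval function that never succeeds:
if __name__ == "__main__":
    words = (w.strip() for w in ["123456", "password", "letmein"])
    print(dictionary_attack("flagged.jpg", words, lambda p, pw: None))  # -> None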
To verify the correctness of their tools, Provos and Honeyman insert tracer images into every Stegbreak job. As expected, the dictionary attack finds the correct passwords for these tracer images. However, it does not find a single genuine hidden message. Even though the result of this large-scale investigation is negative, the method-