new, previously unknown, information by
automatically extracting information from different
written resources” (Fan, 2006).
For our data exploration we chose to use a
combination of two visually appealing discovery
techniques, known as Formal Concept Analysis
(FCA) (Stumme, 2002) and Emergent Self
Organizing Maps (ESOM) (Ultsch, 2005). Formal
Concept Analysis (FCA) arose twenty-five years ago
as a mathematical theory (Stumme, 2002). FCA was
first used as an exploratory data analysis
and knowledge enrichment technique for analysing
domestic violence cases in the Netherlands
(Poelmans, 2008). In this setup, FCA is used as a
concept generation engine, distilling formal concepts
from the unstructured documents. We complement
the knowledge discovery based on FCA with the
ESOM. Emergent Self Organizing Maps are a
special class of topographic maps, which are
particularly suited for high-dimensional data
visualization.
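
To illustrate what "distilling formal concepts" means, the
following is a minimal sketch, not the tooling used in this
research. It assumes a toy formal context in which objects are
reports and attributes are terms; the report identifiers, the
terms and the function names are invented for illustration. A
formal concept is a pair (extent, intent) in which the extent is
exactly the set of reports sharing all terms of the intent, and
the intent is exactly the set of terms shared by all reports of
the extent.

```python
from itertools import combinations

# Toy formal context (hypothetical data): report id -> terms found in it.
context = {
    "r1": {"hit", "ex-partner"},
    "r2": {"hit", "ex-partner", "home"},
    "r3": {"hit", "stranger"},
}


def common_terms(reports, context, all_terms):
    """Terms shared by every report in `reports` (all terms if empty)."""
    shared = set(all_terms)
    for report in reports:
        shared &= context[report]
    return shared


def reports_with(terms, context):
    """Reports that contain every term in `terms`."""
    return {r for r, found in context.items() if terms <= found}


def formal_concepts(context):
    """Enumerate all formal concepts by closing every subset of reports.

    Brute force: exponential in the number of reports, so only
    suitable for tiny examples like this one.
    """
    all_terms = set().union(*context.values())
    reports = sorted(context)
    concepts = set()
    for size in range(len(reports) + 1):
        for subset in combinations(reports, size):
            intent = common_terms(subset, context, all_terms)
            extent = reports_with(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering these concepts by inclusion of their extents yields the
concept lattice that FCA visualizes. On realistic data a dedicated
FCA algorithm would be needed instead of this exhaustive
enumeration.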
In this paper we aim to show the reader
how we used these tools to browse
the data in search of new knowledge for
classifying new cases. The result of the research is a
case labelling system that automatically and
correctly assigns the domestic violence or non-
domestic violence label to a large portion of the
incoming cases.
The rest of this paper is structured as follows. In
section 2, the dataset is discussed. In section 3, the
use of FCA and ESOM for knowledge discovery is
presented and applied to the data at hand. In section
4, the ensuing detection process is summarized.
Section 5 concludes the paper.
2 THE DATASET
According to the Dutch police authorities and the
department of Justice, domestic violence can be
characterized as serious acts of violence committed
by someone from the domestic sphere of the victim.
Violence includes all forms of physical assault. The
domestic sphere includes all partners, ex-partners,
family members, relatives and family friends of the
victim. Family friends are those persons who have a
friendly relationship with the victim and who
(regularly) meet the victim in his/her home (Van
Dijk, 1997).
The dataset we report on in this paper consists
of a selection of 4814 police reports describing a
whole range of violent incidents from the year 2007.
The domestic violence cases for that period are a
subset of this dataset. This selection came about by,
among other things, removing from a larger set those
police reports that did not contain a crime report
made by a victim, which is necessary for
establishing domestic violence. A victim report is
missing, for example, when a police officer is sent
to an incident and later writes a report of
his/her findings, while the victim has not made an
official statement to the police. The follow-up
reports referring to previous cases were also
removed from the initial set of reports. Ultimately,
this gave rise to a set of 4814 reports that were used
as input for our investigation. From these reports,
the person who reported the crime, the suspect, the
persons involved in the crime, the witnesses, the
project code and the statement made by the victim to
the police were extracted. Of the 4814 reports, 1657
were classified by police officers as domestic
violence; the others were not.
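
The selection steps and the resulting class balance can be
sketched as follows. This is a simplified illustration under
stated assumptions: the `Report` record and its two boolean
fields are hypothetical, and the actual selection involved
further criteria ("among other things") that are not modelled
here. Only the counts 4814 and 1657 are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    has_victim_statement: bool  # hypothetical field
    is_follow_up: bool          # hypothetical field


def select_reports(reports):
    """Keep reports with a victim statement that are not follow-ups."""
    return [
        r for r in reports
        if r.has_victim_statement and not r.is_follow_up
    ]


# Invented sample records, for illustration only.
sample = [
    Report("a", has_victim_statement=True, is_follow_up=False),
    Report("b", has_victim_statement=False, is_follow_up=False),
    Report("c", has_victim_statement=True, is_follow_up=True),
]
selected = select_reports(sample)

# Class balance of the dataset described above.
n_total = 4814
n_domestic = 1657
n_other = n_total - n_domestic
share_domestic = n_domestic / n_total

print([r.report_id for r in selected])
print(n_other, round(share_domestic, 3))
```

With these counts, roughly one third of the selected reports
carry the domestic violence label, so the two classes are
unevenly represented.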
3 HUMAN-CENTERED KNOWLEDGE DISCOVERY
In the literature, the need for exploratory data
analysis has often been described (Marchionini,
2006). When beginning the analysis of a new dataset
of which very little is known a priori, the first step is
to explore the data. Data mining should be primarily
concerned with making it easy, convenient and
practical for organizations with many users to
explore very large databases without
requiring years of training in data analysis
(Fayyad, 2002). Unfortunately, much attention and
effort have been focused on the development of data
mining techniques, but only a minor effort has been
devoted to the development of tools that support the
analyst in the overall discovery task (Brachman,
1996). A human-centered approach is proposed. A
significant part of the art of data mining is the user’s
intuition with respect to the tools (Smyth, 2002). We
argue that the combined use of FCA and ESOM
fulfils this need. The visual representations of both
tools provide a clear guide to the user for exploring
the data.
Additionally, we aim to develop a classifier
for automatically classifying cases as domestic or as
non-domestic violence. Comprehensibility of the
performed classification is a key requirement:
the user must understand the motivations
behind the model’s prediction (Martens, 2004). In
the domain of police investigations, the lack of
comprehensibility is a major issue and causes a
ICEIS 2009 - International Conference on Enterprise Information Systems