Efficient processing and data analysis, using
video analytics, text mining, data mining and
distributed “smart edge” video processing;
Unified data representation, storage and access,
using Semantic Web technologies;
CCTV operator and intelligence analyst
decision support, using data visualisation and
geo-localisation, data analytics and production
rule engines.
Components in each of the three areas are
implemented as independent software systems. The
system components are loosely coupled and
communicate using SOAP and RESTful Web
Services depending on suitability and security
requirements. The architecture, design and
validation of the MOSAIC system are supported by
the UI-REF Framework, which has provided the basis
for methodologically guided, holistic requirements
elicitation and prioritisation, including the resolution
of socio-ethical, legal and security requirements
(Badii, 2008). This has underpinned the design of all
use-cases as well as data representation, modelling
and validation of system performance. Figure 1
depicts a summary overview of the individual
components of the MOSAIC system.
MOSAIC uses text and data mining to integrate
realistic example police information databases
containing structured information (e.g. databases of
criminal convictions) as well as ones containing
unstructured information (e.g. full-text police officer
notes describing observations made), and MOSAIC
uses video analytics to identify trajectories and
events in CCTV security video camera footage. A
central data storage component represents the data
extracted by all of these methods using Semantic
Web standards. A criminal social network analysis
tool, an advanced geospatial visualisation system
and data-driven decision support functionalities
support system users and can be used to suggest
specific actions to the system users where
appropriate.
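To illustrate how facts extracted by different methods might share one Semantic Web representation, the following minimal Python sketch stores them as RDF-style triples and serialises them as N-Triples lines. The namespace `http://example.org/mosaic#`, the property names and the `TripleStore` class are assumptions made for this example; they are not taken from the MOSAIC implementation.

```python
from dataclasses import dataclass

# Hypothetical namespace and vocabulary, chosen for this sketch only.
EX = "http://example.org/mosaic#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Triple:
    subject: str             # IRI
    predicate: str           # IRI
    obj: str                 # IRI, or literal text when is_literal is True
    is_literal: bool = False

    def to_ntriples(self) -> str:
        """Serialise the triple as one N-Triples line."""
        if self.is_literal:
            text = (self.obj.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            obj = f'"{text}"'
        else:
            obj = f"<{self.obj}>"
        return f"<{self.subject}> <{self.predicate}> {obj} ."


class TripleStore:
    """In-memory stand-in for a central storage component."""

    def __init__(self):
        self._triples = set()

    def add(self, *triples):
        self._triples.update(triples)

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            (t for t in self._triples
             if subject in (None, t.subject)
             and predicate in (None, t.predicate)
             and obj in (None, t.obj)),
            key=lambda t: (t.subject, t.predicate, t.obj),
        )


store = TripleStore()
person = EX + "person/42"
# Fact taken from a structured database record.
store.add(Triple(person, RDF_TYPE, EX + "Person"))
# Fact extracted from free text by text mining.
store.add(Triple(person, EX + "phoneNumber", "01234 567890", is_literal=True))
# Fact derived from video analytics.
store.add(Triple(EX + "event/7", EX + "involves", person))

for triple in store.match(subject=person):
    print(triple.to_ntriples())
```

Because every source writes into the same subject-predicate-object form, a single pattern query such as `store.match(obj=person)` can return facts regardless of which analysis produced them.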
3 TEXT AND DATA MINING
3.1 Text Mining
The Text Mining (TM) component identifies
relevant knowledge from sanitised Police reports
and from sanitised free text fields in crime-related
databases, by detecting entities and entity
relationships. Named Entity Recognition (NER) and
TM are applied through a pipeline of linguistic and
semantic processors that share a common knowledge
base of crime patterns, abbreviations, police
terminology, acronyms and jargon. The shared
knowledge base guarantees a uniform interpretation
layer for the diverse information from different
sources.
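The idea of a processor pipeline that consults one shared knowledge base can be sketched in a few lines of Python. The processor names, the `KnowledgeBase` and `Document` classes and the sample entries are hypothetical and serve only as an illustration; the actual MOSAIC processors are linguistic and semantic analysers that are considerably richer than these two steps.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    abbreviations: dict   # abbreviation -> expansion (illustrative entries)
    jargon: dict          # jargon term -> plain meaning (illustrative entries)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def expand_abbreviations(doc, kb):
    """Replace known abbreviations (case-sensitive, whole words) in the text."""
    if not kb.abbreviations:
        return doc
    pattern = r"\b(?:" + "|".join(map(re.escape, kb.abbreviations)) + r")\b"
    doc.text = re.sub(pattern, lambda m: kb.abbreviations[m.group(0)], doc.text)
    return doc


def tag_jargon(doc, kb):
    """Annotate jargon terms with their meaning and character offsets."""
    for term, meaning in kb.jargon.items():
        term_pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(term_pattern, doc.text, re.IGNORECASE):
            doc.annotations.append((term, meaning, m.start(), m.end()))
    return doc


def run_pipeline(doc, kb, processors):
    """Apply each processor in order; all of them read the same knowledge base."""
    for process in processors:
        doc = process(doc, kb)
    return doc


KB = KnowledgeBase(
    abbreviations={"RTC": "road traffic collision",
                   "VRM": "vehicle registration mark"},
    jargon={"misper": "missing person"},
)
report = Document("RTC reported; VRM partially seen near the misper address.")
result = run_pipeline(report, KB, [expand_abbreviations, tag_jargon])
print(result.text)
print(result.annotations)
```

Passing the same `KB` object to every processor is what gives texts from different sources a uniform interpretation: an abbreviation is expanded in the same way whether it occurs in a report or in a free-text database field.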
The automatic linguistic analysis of textual
documents is based on morpho-syntactic, semantic,
Semantic Role Labelling (SRL) and NER criteria.
At the heart of the TM system is McCord’s theory of
Slot Grammar (McCord, 1980; McCord, 1990). The
system analyses each sentence, cycling through all
its possible constructions and trying to assign the
context-appropriate meaning – the “right” sense – to
each word. Each slot structure can be partially or
fully instantiated and can be filled with
representations from one or more statements to
incrementally build the meaning of a statement.
This includes most of the treatment of coordination,
which uses a method of “factoring out” unfilled slots
from elliptical coordinated phrases. The parser, a
bottom-up chart parser that incrementally builds a
syntactic tree, employs a parse evaluation scheme
both for pruning away unlikely analyses during
parsing and for ranking the final analyses. Because
semantic information is included directly in the
dependency grammar structures, the system can
combine it with semantic role relationships (agent,
object, where, when, how, cause, etc.). The Word
Sense Disambiguation (WSD) algorithm also
considers concepts linked by super-subordinate
relations in order to find the appropriate senses
for the lemmas being analysed.
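The use of super-subordinate relations in sense selection can be illustrated with a deliberately simplified Python sketch. It is not the MOSAIC WSD algorithm: it assumes a toy sense inventory (`SENSES`) and toy hypernym chains (`HYPERNYMS`), both invented for this example, and it scores each candidate sense by the number of superordinate concepts it shares with the senses of the context lemmas.

```python
# Toy sense inventory: lemma -> candidate senses (hypothetical data).
SENSES = {
    "bank": ["bank#institution", "bank#riverside"],
    "building_society": ["building_society#institution"],
    "river": ["river#watercourse"],
}

# Toy hypernym chains: sense -> superordinate concepts (hypothetical data).
HYPERNYMS = {
    "bank#institution": ["financial_institution", "institution", "organisation"],
    "bank#riverside": ["slope", "landform", "natural_object"],
    "building_society#institution": ["financial_institution", "institution",
                                     "organisation"],
    "river#watercourse": ["stream", "body_of_water", "natural_object"],
}


def ancestors(sense):
    """Return the set of superordinate concepts recorded for a sense."""
    return set(HYPERNYMS.get(sense, ()))


def disambiguate(lemma, context_lemmas):
    """Pick the sense of `lemma` sharing most superordinates with the context.

    Returns None for an unknown lemma; ties go to the first listed sense.
    """
    candidates = SENSES.get(lemma, [])
    if not candidates:
        return None

    def score(sense):
        own = ancestors(sense)
        total = 0
        for ctx in context_lemmas:
            # Best overlap over all senses of this context lemma (0 if unknown).
            overlaps = [len(own & ancestors(s)) for s in SENSES.get(ctx, [])]
            total += max(overlaps, default=0)
        return total

    return max(candidates, key=score)


print(disambiguate("bank", ["river", "walk"]))
print(disambiguate("bank", ["building_society"]))
```

In this toy setting the riverside sense is chosen next to "river" because both share the superordinate `natural_object`, while the institutional sense is chosen next to "building_society". A production system would draw these relations from a full lexical resource and combine them with the syntactic and semantic-role evidence described above.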
Appropriate heuristics have been implemented
for MOSAIC in order to identify specific
pre/suffixes, linguistic patterns and data formats for
the English language so as to recognise key entities
in text: dates, addresses, person names, locations,
licence plate numbers, brands and manufacturer
names, web entities, bank accounts and phone
numbers. Once entities and semantic roles have
been retrieved, the TM component then extracts
entity relationships from the text. Two entities that
are linked by a direct relationship such as agent-
object/agent will have a very strong bond. Entities
that have a relationship of proximity are also
extracted. This type of approach allows police
analysts to discover important relationships between
entities, even though these are not linked by a
syntactic dependency, for instance a person’s name
followed by a phone number in parentheses –
proximity in the sentence – or a date and the author