Clearly, some items will be more important in
the framework than others. Importance might reflect how accurately an item measures the framework
construct, the authority of the intelligence (e.g. a
formal judgement carries more weight than a data
analysis) and its “real world” value (e.g. mortality
measures should outweigh bureaucratic process
measures). Items are weighted accordingly as part
of the mapping process, and can even be given a
“super-weighting” that allows them to “trump”
everything else in the group.
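As a rough illustration, the selection logic for a super-weighted item might look like the following sketch (the data structures, and the rule that a flagged item overrides the rest of its group, are our assumptions rather than the system's actual implementation):

def effective_items(items):
    # items: list of (score, weight, is_super) tuples for one item group.
    # If any item carries a "super-weighting", it trumps the rest of the
    # group; otherwise all items contribute according to their weights.
    super_items = [(s, w) for s, w, is_super in items if is_super]
    return super_items if super_items else [(s, w) for s, w, _ in items]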
2.3 Analysing Information
2.3.1 Item Level
The analysis process brings together many different
types of data. For each item of information, we
assess the difference between the observed result for
a particular organisation and an expected level of
performance on a common scale using the most
appropriate analysis for that item. The outcome of
this analysis is an “oddness” score, which is a
statistical measure of how far each organisation’s
performance is from the expected level for that
measure. None of our methods penalise (or reward)
organisations simply for being at the bottom (or top)
of a list - they are designed to look for genuine
differences from our expectation. It is entirely
possible that all organisations will be performing
similarly to expectation on a data item.
We make a number of “stock” analysis methods
available to the user, who will be strongly guided by the information type towards an appropriate choice.
The system will also suggest analysis settings based
on characteristics of the data.
Our analysis methods are tailored to data type
(proportions, ratios etc) and take account of the
possibility that an organisation’s results may be
affected by chance variation. To do this we use a
modified Z score (Spiegelhalter, 2005).
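For a proportion-type item, a minimal sketch of such a score is shown below. The additive overdispersion term is a simplified stand-in for the fuller Winsorised adjustment described by Spiegelhalter (2005), and the function and its parameters are illustrative assumptions:

import math

def z_score(observed, n, expected, overdispersion=0.0):
    # observed: observed proportion for the organisation (r / n)
    # n: denominator (e.g. number of patients)
    # expected: expected proportion (national average, policy target,
    #           or benchmark-group average)
    # overdispersion: extra between-organisation variance, if estimated
    sampling_var = expected * (1 - expected) / n   # binomial variance
    return (observed - expected) / math.sqrt(sampling_var + overdispersion)

# Example: 46 deaths in 400 cases against an expected rate of 8%
print(z_score(46 / 400, 400, 0.08))   # about 2.6: "oddly" high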
The expected level of performance against which
an organisation is compared can be calculated in
several ways. For some items, organisations are
compared against the national average of all
organisations. In other cases - waiting times, for example - an expected level of performance has
been set down for organisations in government
policies. For some data items we recognise that
organisations’ performance may be significantly
influenced by factors beyond their control. There are
two main ways we adjust for this. Either the ‘raw’
data are standardised (for example by age and sex)
before import or we may set our expectation for that
organisation as the average performance of a group
of other organisations with similar local
circumstances (referred to as the ‘benchmark
group’). We use various benchmark groups in our analysis, defined by factors such as deprivation, population turnover and disease prevalence.
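A hedged sketch of how the expected level might be selected for each item is given below; the item attributes (policy_target, use_benchmark) are illustrative assumptions rather than the system's actual configuration:

from statistics import mean

def expected_level(item, results, benchmark_groups):
    # results: {org_id: observed value}
    # benchmark_groups: {org_id: list of peer org_ids with similar
    #                    local circumstances}
    if item.get("policy_target") is not None:        # e.g. waiting times
        return {org: item["policy_target"] for org in results}
    if item.get("use_benchmark"):                    # peer-group average
        return {org: mean(results[p] for p in benchmark_groups[org])
                for org in results}
    national = mean(results.values())                # national average
    return {org: national for org in results}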
Where data are categorical, we place analysis results on the common risk scale by assuming an underlying normal distribution in the frequency data and assessing the distance between each observation and the expectation (either an imposed target or the ordinal category that contains the median observation).
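One plausible reading of this approach is sketched below, assuming a latent normal scale; the midpoint (ridit-style) scoring is our illustration, not the documented method:

from statistics import NormalDist

def ordinal_z_scores(ratings):
    # ratings: {org_id: ordinal category, lower = worse}
    n = len(ratings)
    values = list(ratings.values())
    z = {}
    for org, c in ratings.items():
        below = sum(1 for v in values if v < c) / n
        at_or_below = sum(1 for v in values if v <= c) / n
        # midpoint of the category's cumulative band, mapped to the
        # quantile of the assumed underlying normal distribution
        z[org] = NormalDist().inv_cdf((below + at_or_below) / 2)
    return z

Under this scoring, the category containing the median observation maps close to zero, consistent with the default expectation described above.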
Free text comments are scored by analysts as discussed in section 2.1.4. These analyst scores are then translated into a score that is nominally equivalent to the scores on the common risk scale.
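A purely hypothetical translation might map a five-point analyst rating onto nominal positions on the common scale:

# Hypothetical mapping from a 1-5 analyst rating of free text comments
# (1 = most concerning) to a nominal position on the common "oddness"
# scale; the actual translation used is not specified here.
COMMENT_RATING_TO_RISK = {1: 2.0, 2: 1.0, 3: 0.0, 4: -1.0, 5: -2.0}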
2.3.2 Pattern Detection at Group Level
For organisations whose performance over a range
of items appears to be “oddly” poor, we infer that
there may be a risk of failure against the given
framework. However, there are many reasons why
an organisation that raises concerns in our analysis
might be found legitimately to be compliant by
inspection. The organisation will have access to
much better local sources of evidence than are
available to the Commission at a national level for
risk assessment. It will also have the benefit of the most up-to-date information. It might also be that, while the organisation is not performing well compared with other organisations, it is still
meeting the minimum needed for acceptable
performance against the framework.
For each item of information, we assess whether
the organisation’s result was in line with what we
would expect, as outlined in section 2.3.1 above.
The results for all items mapped to an item group
(including qualitative information) are then
aggregated together. This produces an overall group
“oddness” score that is directly comparable to the item-level oddness scores.
Our main method of combining the results from each item of information is not a simple average; instead, it is designed to highlight patterns
of poor performance. For example, an item group
may be assessed as being at high risk where several
items of information are worse or tending towards
worse than expected, but none exceed the threshold
to be notable in their own right.
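A Stouffer-style weighted combination is one way to realise this behaviour; the sketch below is our illustration of the idea rather than the model's actual rule:

import math

def group_oddness(item_z, weights):
    # item_z, weights: parallel sequences of item oddness scores and
    # their broad weights (see section 2.2)
    num = sum(w * z for w, z in zip(weights, item_z))
    return num / math.sqrt(sum(w * w for w in weights))

# Four items each mildly worse than expected (z = 1.5): none notable
# alone, but together the group scores 3.0 and stands out.
print(group_oddness([1.5, 1.5, 1.5, 1.5], [1, 1, 1, 1]))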
When combining this volume of information,
rules-based or directly weighted aggregation models
that finely balance every item against each other
become unsustainably complex. Our model uses the broad weights discussed in section 2.2 and then automatically avoids double counting by adjusting for the degree of auto-correlation within the item group.
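Extending the sketch above, correlated items can be prevented from double counting by including their pairwise correlations in the variance of the combined score; the correlation matrix here is an illustrative assumption:

import math

def group_oddness_adjusted(item_z, weights, corr):
    # corr[i][j]: estimated correlation between items i and j;
    # correlated items inflate the variance, so near-duplicates
    # add little to the combined score.
    k = len(weights)
    num = sum(w * z for w, z in zip(weights, item_z))
    var = sum(weights[i] * weights[j] * corr[i][j]
              for i in range(k) for j in range(k))
    return num / math.sqrt(var)

# Two perfectly correlated items count as one, not two:
print(group_oddness_adjusted([2.0, 2.0], [1, 1], [[1, 1], [1, 1]]))  # 2.0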