matching ground truth images. However, it is well
known that such an approach will often fail
under conditions where the required segmentation is
related to the meaning of the segmented image.
Indeed, if a reliable unsupervised objective method
existed, it would form the basis of
one of the best (if not the best) image segmentation
algorithms to date. That is not the case. Hence,
researchers still rely on either subjective evaluation,
which requires ground truth, or supervised objective
evaluation, which also requires ground truth. Our
method falls under supervised objective
evaluation, and it is assessed, as it ought to be, by
subjective visual inspection.
2 METHODOLOGY
ISAT assesses the quality of segmentation of any
image. To do so, ISAT requires not the original
image but two other images representing the ideal
and actual segmentation of the original image. As a
matter of terminology, the ideally segmented image,
which is usually drawn by hand, is called the ground
truth image (or GT). The other image represents the
result of a segmentation procedure, which is usually
executed by machine, and is called the Machine
Segmented image (or MS). Both of these images are
binary images, in that they exhibit the boundaries of
the segmented regions as black curves on a white
background. In all following calculations, it is the
GT that functions as a reference of presumed truth
against which a MS image is judged.
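As an illustration of how regions can be recovered from such a boundary image, the following is a minimal sketch and not ISAT's actual implementation. It assumes the image is given as a list of rows in which 1 marks a white (region) pixel and 0 marks a black (boundary) pixel, and it assumes 4-connectivity; the function name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(boundary_img):
    """Label 4-connected white regions of a binary boundary image.

    Assumed encoding (not taken from the paper): 1 = white region pixel,
    0 = black boundary pixel. Boundary pixels keep the label 0.
    Returns (labels, region_count).
    """
    height, width = len(boundary_img), len(boundary_img[0])
    labels = [[0] * width for _ in range(height)]
    region_count = 0
    for r in range(height):
        for c in range(width):
            if boundary_img[r][c] != 1 or labels[r][c] != 0:
                continue  # boundary pixel, or already labelled
            region_count += 1
            labels[r][c] = region_count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and boundary_img[ny][nx] == 1
                            and labels[ny][nx] == 0):
                        labels[ny][nx] = region_count
                        queue.append((ny, nx))
    return labels, region_count
```

Applied to both GT and MS, a routine of this kind would yield the labelled regions on which matching operates.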
To carry out any kind of segmentation quality
assessment, connected regions in both GT and MS
images must first be established. Then, crucially, every
region in GT must, if possible, be matched with
one or more regions in MS. Note that one region in
GT may match one region in MS; that region in GT
would then be correctly segmented if the overlap
between the two regions is great enough, or missed if
the overlap is insufficient. Also, more than one
region in GT may be matched with one region in
MS; those regions in GT would then be under-segmented.
On the other hand, multiple regions in MS may
correspond to one region in GT; that region in GT
would then be over-segmented. Finally, every region
that exists in MS but does not correspond to any
region(s) in GT is considered noise. Region-based
accuracy is calculated as the ratio of the number of
correctly matched regions in MS to the sum of the
number of regions in GT and the number of noise regions
(which come from MS). All of the above measures
are based on equivalent measures proposed by
Hoover et al. (Hoover, 1996).
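The categories and the accuracy ratio described above can be sketched as follows. This is an illustrative reading, not ISAT's code: it assumes that the associations between GT and MS regions have already been decided and are supplied as (GT id, MS id) pairs, and it assumes a simple precedence when a region could fall into more than one category. The names `classify_regions` and `region_accuracy` are hypothetical.

```python
from collections import defaultdict


def classify_regions(pairs, gt_ids, ms_ids):
    """Classify GT regions and find noise regions in MS.

    pairs: iterable of (gt_id, ms_id) associations, assumed given.
    Returns (status, noise): status maps each GT id to one of
    'correct', 'missed', 'over-segmented', 'under-segmented';
    noise lists the MS ids associated with no GT region.
    """
    gt_to_ms = defaultdict(set)
    ms_to_gt = defaultdict(set)
    for g, m in pairs:
        gt_to_ms[g].add(m)
        ms_to_gt[m].add(g)

    status = {}
    for g in gt_ids:
        partners = gt_to_ms.get(g, set())
        if not partners:
            status[g] = "missed"
        elif len(partners) > 1:
            status[g] = "over-segmented"    # several MS regions, one GT region
        elif len(ms_to_gt[next(iter(partners))]) > 1:
            status[g] = "under-segmented"   # several GT regions, one MS region
        else:
            status[g] = "correct"
    noise = [m for m in ms_ids if not ms_to_gt.get(m)]
    return status, noise


def region_accuracy(status, noise):
    """Correct regions divided by (all GT regions + noise regions)."""
    n_correct = sum(1 for s in status.values() if s == "correct")
    denominator = len(status) + len(noise)
    return n_correct / denominator if denominator else 0.0
```

For example, with five GT regions of which one is correct, and one noise region in MS, the accuracy would be 1 / (5 + 1).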
An ideal segmentation, from a
region-based perspective, is one in which every region in
GT is matched exclusively with exactly one region
in MS, with zero noise (i.e., no unmatched regions in
MS). In this case, ISAT returns a region-based
accuracy of 100%. Note that matching
requires an overlap between the two matched
regions exceeding a pre-set threshold, which we
currently set to 66%; it should not be set to 100%.
Keeping the threshold below 100% ensures that the
number of correctly segmented
regions reflects human conceptions of region-based
segmentation, where the number of approximately
matched regions (e.g., red blood cells) matters more
than the precise fit of every matched region (e.g.,
one blood cell).
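The threshold test can be sketched as below. The text does not state here which region the overlap is measured against, so this sketch assumes, in the spirit of Hoover et al., that the shared pixels must exceed the threshold as a fraction of both regions. Regions are represented as sets of (row, column) pixel coordinates, and `is_match` and `OVERLAP_THRESHOLD` are hypothetical names.

```python
OVERLAP_THRESHOLD = 0.66  # the 66% setting mentioned in the text


def is_match(gt_pixels, ms_pixels, threshold=OVERLAP_THRESHOLD):
    """Return True if two regions overlap by more than `threshold`.

    Assumption (not confirmed by the text): the shared pixels must exceed
    the threshold as a fraction of BOTH the GT region and the MS region.
    """
    if not gt_pixels or not ms_pixels:
        return False
    shared = len(gt_pixels & ms_pixels)
    return (shared / len(gt_pixels) > threshold
            and shared / len(ms_pixels) > threshold)
```

Under this reading, a 10 x 10 region shifted sideways by two pixels still shares 80% of its area with the original and would count as matched, whereas a shift of five pixels (50% shared) would not.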
Once regions have been identified in both GT and MS
and matched between the two, it is possible to
compute all region-based segmentation quality
measures. It also becomes possible to compute the
second set of measures, the pixel-based segmentation
quality measures. These
measures sound familiar, but they are applied
differently from the well-known True Positive, False
Negative, True Negative and False Positive
measures used in innumerable studies in image
processing (Bushberg, 2002). We describe the
final pixel-based measures intuitively here; the
following sub-sections describe all the measures in
full detail. In brief, the final pixel-based measures
provide a normalized image-wide quantitative
assessment of the quality of the fit between the
regions of GT and those they were matched with in
MS. As such, our sensitivity is the percentage of
pixels of regions of GT that were matched with
regions in MS. Specificity is the percentage of pixels
of the backgrounds of the various regions in GT that
were in fact assigned to backgrounds of the
matching regions in MS. We define the background
of a region as those pixels that belong to the image
but not to that region, and we exclude the pixels of
the edges between regions from all calculations.
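One possible reading of these definitions is sketched below; it is not the authors' exact procedure. It assumes that counts are accumulated over all matched (GT region, MS region) pairs before normalizing, and that the caller supplies the set of non-edge pixels. Regions are sets of (row, column) coordinates, and `pixel_measures` is a hypothetical name.

```python
def pixel_measures(matched_pairs, valid_pixels):
    """Image-wide sensitivity and specificity over matched region pairs.

    matched_pairs: iterable of (gt_region, ms_region) pixel sets.
    valid_pixels: all image pixels that are not edge pixels (assumed given).
    Assumption: per-pair counts are summed, then normalized once.
    """
    tp = fn = tn = fp = 0
    for gt_region, ms_region in matched_pairs:
        g = gt_region & valid_pixels      # drop edge pixels
        m = ms_region & valid_pixels
        g_background = valid_pixels - g   # image pixels outside the GT region
        m_background = valid_pixels - m
        tp += len(g & m)                        # region pixels recovered
        fn += len(g - m)                        # region pixels lost
        tn += len(g_background & m_background)  # background kept as background
        fp += len(g_background - m_background)  # background taken into region
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

With a 10 x 10 image, a GT region covering the top five rows and an MS region covering only the top four, this sketch gives a sensitivity of 0.8 and a specificity of 1.0; an MS region covering the top six rows instead gives 1.0 and 0.8.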
An ideally segmented image, from a pixel-based
perspective, is similar to an ideally segmented image
from a region-based point of view, with one
exception. Using the red blood cells example, every
blood cell boundary in the MS image must perfectly
fit the boundary of the corresponding blood
cell in the GT image; any deviation, no matter how
small, will reduce either sensitivity or specificity and
hence the overall pixel-based measure of accuracy,
which is a weighted average of the two.
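Written out, a weighted average of the two measures takes the following general form. The weight w is hypothetical notation introduced here for illustration; its value is not given in this section.

```latex
% w is an assumed, unspecified weight; it is not taken from the paper.
\[
  \mathrm{Accuracy}_{\mathrm{pixel}}
    = w \cdot \mathrm{Sensitivity} + (1 - w) \cdot \mathrm{Specificity},
  \qquad 0 \le w \le 1 .
\]
```

For a perfect fit, sensitivity and specificity both equal 1, so the pixel-based accuracy equals 1 whatever the weight; any boundary deviation lowers at least one term and therefore the average, provided that term carries non-zero weight.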