Clupea Harengus: Intraspecies Distinction using Curvature Scale Space
and Shapelets
Classification of North-sea and Thames Herring using Boundary Contour
of Sagittal Otoliths
James Mapp¹, Mark Fisher¹, Anthony Bagnall¹, Jason Lines¹, Sally Songer² and Joe Scutt Phillips²
¹School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, England, U.K.
²CEFAS Laboratory, Pakefield Road, Lowestoft, Suffolk NR33 0HT, U.K.
Keywords:
CSS, Curvature, Scale-space, Shapelets, Otolith, Intraspecies, Classification, Random-forest,
Image-processing, LOOCV, Cross-validation.
Abstract:
We present a study comparing Curvature Scale Space (CSS) representation with Shapelet transformed data
with a view to discriminating between sagittal otoliths of North-Sea and Thames Herring using otolith bound-
ary and boundary metrics. CSS transformed boundaries combined with measures of their circularity, eccen-
tricity and aspect-ratio are used to classify using nearest-neighbour selections with distance being computed
using CSS matching methods. Shapelet transformed data are classified using a number of techniques (Nearest-
Neighbour, Naive-Bayes, C4.5, Support Vector Machines, Random and Rotation Forest) and compared to CSS
classification results. Both methods use Leave One Out Cross Validation (LOOCV). We describe the method
of encoding and the matching algorithm used during CSS classification and give an overview of Shapelet
transforms and the classifiers that are used on the data. It is shown that whilst CSS forms part of the MPEG-7
standard and performs better than random selection, it can be significantly out-performed by recent additions
to machine-learning methods in this application. Shapelets also show that with regard to intra-species dis-
tinction, the discriminatory features of otolith boundaries may lie not in the major inflection points, but the
boundary points between them.
1 INTRODUCTION
Otoliths are calcium carbonate structures present in
many vertebrates and found within the sacculus of the
pars inferior. Whilst three types of otolith (sagittae,
lapilli and asterisci) are found in each chamber
of fishes, it is primarily the sagittal otoliths that are
used in studies as they are larger and thus easier to
prepare and observe. They vary markedly in shape
and size between species, but are similar in shape
across stocks of the same species (Figure 1).
Otoliths hold information that can be used by
‘expert readers’ to determine several key factors.
Some of these determinations, however, are harder
to make and (arguably) more critical to fisheries
scientists or fisheries authorities (Begg et al., 2005).
Fisheries management requires that stocks be accurately
identified (Stransky, 2005) so that decisions on their
management can be made. Analysis of otolith boundaries
may allow estimation of stock composition,
determining whether the samples obtained from an area
or areas are all in fact from one stock, or from mul-
tiple stocks mixed together (Duarte-Neto et al., 2008;
Campana and Casselman, 1993; DeVries et al., 2002).
Figure 1: Otoliths from North-Sea Herring (a), Thames
Herring (b) and two distinct populations of Plaice (c and
d).
Mapp J., Fisher M., Bagnall A., Lines J., Warne S. and Scutt Phillips J. (2013).
Clupea Harengus: Intraspecies Distinction using Curvature Scale Space and Shapelets - Classification of North-sea and Thames Herring using
Boundary Contour of Sagittal Otoliths.
In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, pages 138-143
DOI: 10.5220/0004226101380143
Copyright © SciTePress
Currently, for more advanced classification tasks
such as age determination or stock identification,
otolith samples are often prepared by segmenting
and polishing (Begg et al., 2001; Bolle et al., 2004).
Our study focuses on whether fish from two stocks
(North-Sea and Thames) of the same species (Her-
ring) can be classified using otolith boundaries alone.
Collections of otolith specimens were prepared by
the Centre for Environment, Fisheries and Aquaculture
Science (CEFAS). The set of images includes
populations of Herring from two distinct stocks,
North-Sea and Thames Herring, which have been
labelled by CEFAS. The set also includes a number of
images of Plaice otoliths which we use when testing
the system. The captured images were received with
no information other than the population to which
they belong.
This paper compares two methods of image
boundary representation: Curvature scale space, part
of the MPEG-7 standard (MPEG-7, 2003); and
Shapelets, a new method first proposed by Eamonn
Keogh’s research group (Ye and Keogh, 2009)
and recently extended by UEA’s machine learning
group (Lines et al., 2012). The CSS-transformed data
are classified using the standard CSS matching
algorithm, whereas the Shapelet data are classified
using a number of methods: Nearest-Neighbour,
Naive-Bayes, C4.5, Support Vector Machines (linear and
quadratic), Random Forest and Rotation Forest.
2 SHAPE REPRESENTATION
2.1 Image Segmentation and Boundary
Extraction
Before encoding can be implemented the boundary
of each otolith must be identified. Images are seg-
mented using simple supervised thresholding scripted
in MATLAB R2010b (MATLAB, 2010). The bound-
ary is then determined by performing a logical ‘OR’
function between the denoised mask and a morpho-
logical dilation of the mask using a square 3x3 struc-
turing element. Each boundary is encoded as a set of
boundary pixel coordinate pairs and stored in a data
structure, with elements representing the coordinate
arrays, image path, and the class. The upper-leftmost
point/pixel on the boundary is used as the start point
and coordinates are extracted in a counter-clockwise
direction in all cases. Boundaries are normalised so
that the origin is located at the boundary centroid, and
subsampled to five-hundred equally spaced boundary
points.
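The normalisation step described above can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in the study, and the function name `normalise_boundary`, the use of the mean of the boundary points as the centroid, and linear interpolation along arc length are all assumptions made for illustration.

```python
import math

def normalise_boundary(points, n_samples=500):
    """Centre a closed boundary on its centroid and resample it to
    n_samples points equally spaced along the contour (by arc length)."""
    n = len(points)
    # Assumption: the centroid is taken as the mean of the boundary points.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centred = [(x - cx, y - cy) for x, y in points]

    # Cumulative arc length around the closed contour (last segment wraps).
    closed = centred + [centred[0]]
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    resampled = []
    seg = 0
    for k in range(n_samples):
        target = total * k / n_samples
        # Advance to the segment that contains the target arc length.
        while seg < n - 1 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        x0, y0 = closed[seg]
        x1, y1 = closed[seg + 1]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled
```

Resampling by arc length rather than by pixel index keeps the spacing uniform even though diagonal pixel steps are longer than horizontal or vertical ones.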
2.2 CSS Image Construction
Curvature Scale-Space (CSS) (Mokhtarian and Mack-
worth, 1992) forms the basis for contour-based shape
descriptors within MPEG-7 (Bober, 2001; Zhang and
Lu, 2003b; Zhang and Lu, 2003a). As such it forms
an ideal starting point for boundary based shape clas-
sification of otoliths and has been used for several
other studies in this field (Parisi-Baradad et al., 2005;
Abbasi et al., 1999; Jalba et al., 2006). Research has
shown that CSS encoding and its associated matching
algorithm can be an effective and robust (to noise,
scale and rotation) method of matching query images
to database models when combined with global
parameters such as Circularity and Eccentricity (Abbasi
et al., 1999; Amanatiadis et al., 2011).
In addition to intraspecies distinction, we also
discriminate between two species of fish (Herring and
Plaice) and between classes from the MPEG-7 SHAPE
database (MPEG7, 1999) with a view to benchmarking
our CSS implementation. The SHAPE database
contains 20 images per class for 70 classes, and was
developed for testing MPEG-7 shape descriptors. For
use in our study we remove the ‘device’ classes as
they hold deliberate within-class variance. We also
exclude the ‘spring’ class as the shape itself causes
problems when creating CSS data.
The CSS images are constructed using the procedure
laid out in previous work (Abbasi et al., 1999).
The process involves iteratively smoothing the original
boundary curve of the image using a 1-D Gaussian
function of increasing kernel width and, after each
iteration, computing the curvature of the boundary and
locating where the curvature ‘crosses’ zero. This process
continues until there are no remaining zero-crossings on
the boundary. The result is stored as coordinate pairs
(maxima) representing the points along the boundary and
the iteration or “evolution” at which zero-crossings
converge.
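The evolution process can be sketched as follows. This is an illustrative Python outline rather than the implementation used in the study. The function names, the truncation of the Gaussian at three standard deviations, the finite-difference curvature and the `max_sigma` safety limit are assumptions. The sketch records every zero-crossing at every evolution; extracting the maxima of the resulting contours, which is what the CSS representation stores, is omitted.

```python
import math

def gaussian_kernel(sigma):
    """Sampled 1-D Gaussian, truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_closed(values, sigma):
    """Circular convolution of one coordinate of a closed contour."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    return [sum(kernel[j + radius] * values[(i + j) % n]
                for j in range(-radius, radius + 1))
            for i in range(n)]

def curvature_zero_crossings(xs, ys, sigma):
    """Indices at which the curvature of the smoothed contour changes sign."""
    n = len(xs)
    sx, sy = smooth_closed(xs, sigma), smooth_closed(ys, sigma)
    kappa = []
    for i in range(n):
        # Central differences. The curvature denominator is always positive,
        # so only the numerator is needed to find sign changes.
        dx = (sx[(i + 1) % n] - sx[i - 1]) / 2
        dy = (sy[(i + 1) % n] - sy[i - 1]) / 2
        ddx = sx[(i + 1) % n] - 2 * sx[i] + sx[i - 1]
        ddy = sy[(i + 1) % n] - 2 * sy[i] + sy[i - 1]
        kappa.append(dx * ddy - dy * ddx)
    return [i for i in range(n) if kappa[i] * kappa[(i + 1) % n] < 0]

def css_image(xs, ys, sigma0=1.0, step=0.1, max_sigma=200.0):
    """(boundary index, sigma) pairs for all zero-crossings until none remain."""
    points, sigma = [], sigma0
    while sigma <= max_sigma:  # max_sigma is a hypothetical safety limit
        crossings = curvature_zero_crossings(xs, ys, sigma)
        if not crossings:
            break
        points.extend((i, sigma) for i in crossings)
        sigma += step
    return points
```

A convex contour such as a circle yields an empty CSS image, since its curvature never changes sign.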
As in previous work we initiate the process using
a kernel width (σ) of 1, increasing by 0.1 after
each evolution, and compute a small number of global
metrics to assist in the matching process (Mokhtarian
et al., 1996; Abbasi et al., 1999): Circularity and
Eccentricity (of the boundary), and Aspect-Ratio (of the
CSS image created during the process).
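For illustration, the two boundary metrics might be computed as below. The exact definitions used in the study are not stated here, so this sketch assumes two common ones: circularity as perimeter squared over area, and eccentricity as the ratio of the larger to the smaller eigenvalue of the covariance matrix of the boundary points. Both function names are hypothetical.

```python
import math

def circularity(points):
    """Perimeter squared over area for a closed polygon (assumed definition)."""
    pts = list(points)
    closed = pts + [pts[0]]
    perimeter = sum(math.hypot(x1 - x0, y1 - y0)
                    for (x0, y0), (x1, y1) in zip(closed, closed[1:]))
    # Shoelace formula for the enclosed area.
    area = abs(sum(x0 * y1 - x1 * y0
                   for (x0, y0), (x1, y1) in zip(closed, closed[1:]))) / 2
    return perimeter ** 2 / area

def eccentricity(points):
    """Ratio of larger to smaller eigenvalue of the 2x2 point covariance
    matrix (assumed definition); 1.0 for a shape with no preferred axis."""
    pts = list(points)
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return (half_trace + root) / (half_trace - root)
```

Under these definitions both metrics are invariant to translation, rotation and uniform scaling, which is what makes them usable as a coarse pre-filter before CSS matching.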
2.3 Time Series Shapelets
A Shapelet is a time series data mining primitive
that can be used to measure similarity between series
based on small common shapes that occur at any point
in the series. The original work on Shapelets was pre-
sented in (Ye and Keogh, 2009), where a recursive
ClupeaHarengus:IntraspeciesDistinctionusingCurvatureScaleSpaceandShapelets-ClassificationofNorth-seaand
ThamesHerringusingBoundaryContourofSagittalOtoliths
139
decision tree algorithm is presented using Shapelets
as the branching criteria. Of particular interest is the
work in (Lines et al., 2012), where it is proposed
that Shapelets can be used to construct a filter for
transforming time series data. Transforming data in
this manner moves away from the previous emphasis
of tree-based classification, allowing any traditional
classification algorithm to be used with Shapelets.
We use the Java implementation of (Lines et al.,
2012) to build a filter for transforming the Herring
boundary data. The implementation utilises infor-
mation gain to test the discriminatory power of a
Shapelet. To initialise the filter, three parameters are
required: minimum/maximum Shapelet lengths, and
the number of Shapelets to be extracted. We ex-
tracted the best 100 Shapelets between the lengths of
40 and 120 from the dataset for the transformation
process. This transformation is then carried out for
each boundary by calculating the distance from the
series to each of the extracted Shapelets, and each dis-
tance is used as an attribute in the transformed data.
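The transformation step can be sketched in a few lines. This is a Python illustration, not the Java implementation of (Lines et al., 2012). It assumes that the distance from a series to a Shapelet is the smallest Euclidean distance between the Shapelet and any window of the same length in the series, with both z-normalised first. The function names are hypothetical.

```python
import math

def znorm(values):
    """Zero-mean, unit-variance copy of a sequence (all zeros if constant)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window
    of the same length in the series (assumed distance definition)."""
    m = len(shapelet)
    target = znorm(shapelet)
    best = float("inf")
    for start in range(len(series) - m + 1):
        window = znorm(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, target)))
        best = min(best, dist)
    return best

def shapelet_transform(series_list, shapelets):
    """One row per series, one attribute (distance) per shapelet."""
    return [[shapelet_distance(s, sh) for sh in shapelets]
            for s in series_list]
```

With 100 extracted Shapelets, each boundary becomes a vector of 100 distances, which any standard classifier can consume.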
3 CLASSIFICATION
3.1 CSS Classification
With each boundary stored as the locations of its
curvature-crossing maxima, images can be compared
to one another using previously defined methods (Ab-
basi et al., 1999; Mokhtarian et al., 1996). This re-
sults in a measure of similarity between the two sets
of maxima and can be used to compare one image
to a number of models in a database and find the
model with the greatest similarity. We are able to
dismiss a number of models by comparing the CSS
image/model aspect-ratios as well as the circularity
and eccentricity metrics, dismissing as dissimilar any
pairs with metric ratios greater than a given thresh-
old (T).
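One way to read this pre-filtering step is sketched below. The precise form of the "metric ratio" is not given here, so the sketch assumes the relative difference |a - b| / max(a, b), which for positive metrics always lies below 1. Under that assumption T = 1 rejects nothing, consistent with T = 1 being reported as "no threshold". The dictionary keys and function names are hypothetical.

```python
METRICS = ("circularity", "eccentricity", "aspect_ratio")

def relative_difference(a, b):
    """|a - b| / max(a, b) for positive metrics; lies in [0, 1)."""
    return abs(a - b) / max(a, b)

def passes_global_filter(query, model, threshold):
    """True if every global metric of the query is within the threshold
    of the model's, so that full CSS matching is worth attempting."""
    return all(relative_difference(query[k], model[k]) <= threshold
               for k in METRICS)

def candidate_models(query, models, threshold):
    """Models that survive the global-metric filter for this query."""
    return [m for m in models if passes_global_filter(query, m, threshold)]
```

A single threshold shared by all three metrics, as here, mirrors the set-up described in the text.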
Tests are performed using multiple values for T
on both the Herring and SHAPE datasets. Initially
it was thought that T may be pattern independent.
However, the results show that optimum T values are
significantly different between the two data sets, and use of
one on the other set would result in significant
under-performance. As in (Abbasi et al., 1999) we set
alignments for all maxima at 80% of the largest evolution
level; however, we use a more generous matching
distance at 40% of maximum distance.
CSS classification is performed using Nearest-Neighbour
(NN) selection, with model distances
equal to their dissimilarity. The set of images is
tested using LOOCV and results are returned as the
percentage of selections that resulted in the correct class.
Classification using the same methods is carried out
on images from the MPEG-7 SHAPE database to
benchmark our CSS implementation. For comparison
we also process otoliths of Plaice (Pleuronectes
platessa) alongside Herring otoliths (HvP) to determine
whether inter-species distinction is possible using
the same method.
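The evaluation loop can be sketched as follows, assuming a precomputed matrix of pairwise dissimilarities (for example, CSS matching costs). The function name and the majority-vote rule for k greater than 1 are assumptions. Each image is held out in turn and classified from its k nearest remaining models.

```python
from collections import Counter

def loocv_knn_accuracy(dist, labels, k=1):
    """Leave-one-out accuracy of k-NN given a square dissimilarity matrix."""
    n = len(labels)
    correct = 0
    for q in range(n):
        # Every other sample acts as a model; the query itself is left out.
        others = sorted((j for j in range(n) if j != q),
                        key=lambda j: dist[q][j])
        votes = Counter(labels[j] for j in others[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[q]
    return correct / n
```

With two classes and odd k, as in the 1, 3 and 5-NN tests reported here, the vote cannot tie.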
3.2 Boundary and Shapelet
Classification
We use LOOCV when classifying both the Univariate
Boundaries (UV-Bs) and the Shapelet transform
thereof, using a variety of available classifiers so that
results of transformed and non-transformed boundaries
can be compared. CSS maxima (as Boundary-Point/Evolution
pairs) are tested using the same classifiers
in addition to using the standard CSS matching
procedure in order to evaluate the matching method
itself. Two datasets are created for this testing: CSSa
(maxima sorted by distance along the boundary) and
CSSb (maxima sorted by evolution magnitude). The
classifiers used on these sets are implemented in the Java
machine learning toolkit WEKA (Hall et al., 2009) and are:
1-Nearest Neighbour with Euclidean distance (NN);
Naive Bayes (NB); C4.5 Decision Tree (C4.5);
Support Vector Machines with linear and quadratic
kernels (SVML/SVMQ); Random Forest ensemble with
100 base classifiers (RaF); Rotation Forest ensemble
with 30 base classifiers (RoF); and 1-Nearest Neighbour
with dynamic time warping distance (NNDTW), performed on
UV-B data only.
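For readers unfamiliar with the distance used by NNDTW, a minimal dynamic time warping sketch is given below. It assumes an unconstrained warping path and squared pointwise cost. The configuration actually used in the study (for example, any warping window) is not stated, and the function name is hypothetical.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    using squared pointwise cost and no warping-window constraint."""
    inf = float("inf")
    n, m = len(a), len(b)
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: insertion, deletion, or match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return math.sqrt(prev[m])
```

Unlike Euclidean distance, this measure tolerates local stretching along the boundary, which may partly compensate for small differences in start point or sampling.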
4 RESULTS
4.1 CSS Classification
Results from classification using the CSS matching
implementation can be seen in Figures 2 and 3. The
figures show results given eccentricity, circularity and
aspect-ratio thresholds in the range 0 to 0.40. It can be
seen that the thresholds returning peak performance in
these cases are significantly different to one another,
and use of one problem’s optimised threshold value
for the other would result in under-performance.
In all cases the inclusion of a threshold for the
global parameters improves, or at least does not hinder,
accuracy. Table 1 shows results of the two classifications
using no threshold (where T=1), and when using
the peak performance global threshold (value for
each case given in table). Alongside are shown results
when classifying Herring/Plaice otoliths (interspecies
distinction) both with and without peak values for T.
Figure 2: Graph showing results of North-Sea/Thames Herring
classification using 1, 3 and 5-NN over multiple threshold
values.
Figure 3: Graph showing results of SHAPE database image
classification using 1, 3 and 5-NN over multiple threshold
values.
Table 2 shows sensitivity and specificity (or
Sensitivity NS and Sensitivity Th) of the CSS matching
algorithm. The figures show that the CSS matching
technique is generally more sensitive to Thames Herring
otoliths than to North Sea otoliths. The table
shows results using both the peak performance threshold
and no threshold (divided by the double vertical
lines).
The increase in accuracy, sensitivity and speci-
ficity when global thresholds are implemented sup-
ports the idea that the two classes of boundary have
significant overlap in scale space, where boundaries
that show differences while unprocessed may have
much the same representation.
Table 1: The LOOCV classification accuracies using our
CSS matching implementation on all three tests: North-Sea
vs Thames (NSvTh), classes from the SHAPE database
(SHAPE) and Herring vs Plaice (HvP).
Test    Threshold       1-NN   3-NN   5-NN
NSvTh   T = 1           55%    56%    50%
NSvTh   peak T = 0.01   61%    61%    45%
SHAPE   T = 1           87%    69%    59%
SHAPE   peak T = 0.25   91%    74%    67%
HvP     T = 1           100%   99%    100%
HvP     peak T = 0.20   100%   100%   100%
Table 2: Confusion matrices including sensitivity (Se) and
specificity (Sp) for NSvTh classification using 1, 3 and 5-NN
selection (rows: query, columns: result). Left of the double
vertical lines: peak T value; right: no threshold (T=1).
               Peak T               No threshold (T=1)
       Query   NS   Th   Se/Sp  ||  NS   Th   Se/Sp
1-NN   NS      26   24   0.52   ||  21   29   0.42
       Th      15   35   0.70   ||  16   34   0.68
3-NN   NS      29   21   0.58   ||  25   25   0.50
       Th      18   32   0.64   ||  19   31   0.62
5-NN   NS      24   26   0.48   ||  25   25   0.50
       Th      29   21   0.42   ||  25   25   0.50
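The Se/Sp column of Table 2 and the accuracies in Table 1 follow directly from the confusion counts. The short sketch below is a Python illustration with a hypothetical function name. It recomputes the per-class sensitivity and the overall accuracy for the 1-NN, peak-T matrix.

```python
def class_sensitivities(confusion):
    """confusion[query][result] holds counts. Returns the fraction of each
    query class that was classified correctly, plus overall accuracy."""
    recall = {c: row[c] / sum(row.values()) for c, row in confusion.items()}
    total = sum(sum(row.values()) for row in confusion.values())
    accuracy = sum(confusion[c][c] for c in confusion) / total
    return recall, accuracy

# 1-NN with peak T, counts taken from Table 2 (rows: query, columns: result).
peak_1nn = {"NS": {"NS": 26, "Th": 24}, "Th": {"NS": 15, "Th": 35}}
recall, accuracy = class_sensitivities(peak_1nn)
```

For this matrix the sketch gives 0.52 for North-Sea, 0.70 for Thames and an accuracy of 61%, matching the 1-NN entries in Tables 1 and 2. In a two-class problem the sensitivity of one class is the specificity of the other, which is why a single Se/Sp column suffices.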
4.2 Shapelet Classification
The results in Table 3 demonstrate promising
classification results using the Shapelet-transformed data.
The best performing classifier on this data is the Random
Forest algorithm, which at 87% was over 20 percentage
points more accurate than a simple one-nearest
neighbour approach (64%). It is also interesting to note
that the standard decision tree implementation using C4.5
is outperformed by the simple Naive Bayes classifier.
Table 3 also shows results of the same classification
algorithms using Univariate Boundary data (UV-B) and
CSS maxima ‘coordinate’ pairs (CSSa and CSSb).
Table 3: The LOOCV classification accuracies of a range
of algorithms using Univariate Boundary data (UV-B),
Shapelet transformed data (Shapelet) and CSS maxima
(CSSa/CSSb).
Classifier   UV-B   Shapelet   CSSa   CSSb
NN           58%    64%        55%    49%
NB           63%    77%        55%    50%
C4.5         56%    74%        46%    56%
SVML         59%    75%        65%    70%
SVMQ         58%    71%        59%    67%
RaF          68%    87%        54%    54%
RoF          61%    78%        52%    52%
NNDTW        65%    N/A        N/A    N/A
ClupeaHarengus:IntraspeciesDistinctionusingCurvatureScaleSpaceandShapelets-ClassificationofNorth-seaand
ThamesHerringusingBoundaryContourofSagittalOtoliths
141
From the top five Shapelets it was seen that the
discriminating features determined by Shapelet encoding
fall in areas of very low curvature. This may
explain the comparatively poor results produced by
the CSS method on the Herring (NSvTh) tests, whereas
the same system performs well on other datasets. If
discriminatory portions of the boundary mainly fall in areas of
low curvature, then curvature modelling techniques
will have difficulty distinguishing between the two
classes. Also, where discriminatory features depend
on the angle of a section with respect to the boundary
centroid or other sections of the curve, CSS methods
will be unable to distinguish between them, as curvature
maxima give no indication of angle with respect
to non-regional areas. It is noted, however, that using
curvature maxima did offer some improvement on
random selection; therefore maxima must offer some
level of discriminatory power.
5 CONCLUSIONS
Our study has shown that curvature scale space works
well for classification where class boundaries are
significantly different (between species or between
classes of the SHAPE database). Nearest-Neighbour
classification of Herring performed better using UV-Bs
than using the CSS-transformed data alone (CSSa, CSSb
in Table 3). The use of CSS’s own matching algorithm
and global thresholds increases accuracy further,
to around the same accuracy as other classification
methods (of UV-Bs). Even the best CSS
encoding/classifying method is significantly outperformed
by the Shapelet method of encoding, regardless of which
classifier is used to process the data. The method offers
accuracy (3-26 percentage points better than the 61%
achieved by CSS matching), sensitivity and specificity far
above the CSS matching methods used.
Our results compare well with previous studies
of stock discrimination using otoliths. Studies of
dolphinfish otoliths (Duarte-Neto et al., 2008) using
Fourier descriptors of the boundary show results
in the region of 57-70%, which is comparable to our
CSS matching implementation. Campana and Casselman
(1993) produce results of 67% using otolith boundary
alone; however, other results in the same study fall far
below those discussed in this work. Results from mackerel
classification (DeVries et al., 2002) show 80-86% accuracy
which, while significantly above our CSS implementation,
compares well with our Shapelet classification
results. Our Shapelet approach also matches the
accuracies of studies which use microstructure analysis
(Petursdottir et al., 2006), and where shape alone
returned results equal to or lower than our CSS methods.
Results from tests on the SHAPE database com-
pare well with previous work (Abbasi et al., 1999;
Latecki et al., 2000). Whilst our results have shown
a moderate (15%) improvement on those results it
should be noted that we have excluded several classes
from our SHAPE set, and some previous results are
given with restricted sets themselves.
6 FURTHER WORK
For the purpose of this study we have returned re-
sults using multiple values for the global threshold.
However this threshold itself is used for three dif-
ferent global measures; circularity, eccentricity and
maxima aspect-ratio, rather than each having their
own limit. Further work should attempt to better
utilise the three metrics. Existing and subsequent
metrics should be assessed individually for effective-
ness rather than using a single threshold. Whilst we
have determined that the threshold cannot be set using
separate (SHAPE) datasets, the threshold itself should
be set using some form of double-cross validation if
possible (Stone, 1974) to avoid overfitting the thresh-
old or thresholds to the specific classification prob-
lem.
As Shapelets appear to show that important
information is available in areas of low curvature, it
is possible that methods to enhance CSS for shapes
with shallow concavities (Abbasi et al., 2000) may
increase classification accuracy. We plan to complement
the method used with these enhancements to determine
whether they are suitable for our problem. It
may further be possible that other enhancement methods,
such as those that encode convex shapes (Kopf
et al., 2005), improve results using the CSS matching
technique.
REFERENCES
Abbasi, S., Mokhtarian, F., and Kittler, J. (1999). Curvature
scale space image in shape similarity retrieval. Multi-
media systems, 7(6):467–476.
Abbasi, S., Mokhtarian, F., and Kittler, J. (2000). Enhanc-
ing css-based shape retrieval for objects with shallow
concavities. Image and Vision Computing, 18(3):199–
211.
Amanatiadis, A., Kaburlasos, V., Gasteratos, A., and Pa-
padakis, S. (2011). Evaluation of shape descriptors
for shape-based image retrieval. Image Processing,
IET, 5(5):493–499.
Begg, G., Campana, S., Fowler, A., and Suthers, I. (2005).
Otolith research and application: current directions in
ICPRAM2013-InternationalConferenceonPatternRecognitionApplicationsandMethods
142
innovation and implementation. Marine and Freshwa-
ter Research, 56(5):477–483.
Begg, G., Overholtz, W., Munroe, N., et al. (2001). The
use of internal otolith morphometrics for identifica-
tion of haddock (melanogrammus aeglefinus) stocks
on georges bank. Fishery Bulletin, 99(1).
Bober, M. (2001). Mpeg-7 visual shape descriptors. Cir-
cuits and Systems for Video Technology, IEEE Trans-
actions on, 11(6):716–719.
Bolle, L., Rijnsdorp, A., van Neer, W., Millner, R., van
Leeuwen, P., Ervynck, A., Ayers, R., and Ongenae, E.
(2004). Growth changes in plaice, cod, haddock and
saithe in the north sea: a comparison of (post-) me-
dieval and present-day growth rates based on otolith
measurements. Journal of sea research, 51(3):313–
328.
Campana, S. and Casselman, J. (1993). Stock discrimina-
tion using otolith shape analysis. Canadian Journal of
Fisheries and Aquatic Sciences, 50(5):1062–1083.
DeVries, D., Grimes, C., and Prager, M. (2002). Using
otolith shape analysis to distinguish eastern gulf of
mexico and atlantic ocean stocks of king mackerel.
Fisheries Research, 57(1):51–62.
Duarte-Neto, P., Lessa, R., Stosic, B., and Morize, E.
(2008). The use of sagittal otoliths in discriminating
stocks of common dolphinfish (coryphaena hippurus)
off northeastern brazil using multishape descriptors.
ICES Journal of Marine Science: Journal du Conseil,
65(7):1144–1152.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reute-
mann, P., and Witten, I. (2009). The weka data min-
ing software: an update. ACM SIGKDD Explorations
Newsletter, 11(1):10–18.
Jalba, A., Wilkinson, M., and Roerdink, J. (2006). Shape
representation and recognition through morphologi-
cal curvature scale spaces. Image Processing, IEEE
Transactions on, 15(2):331–341.
Kopf, S., Haenselmann, T., and Effelsberg, W. (2005).
Enhancing curvature scale space features for robust
shape classification. In Multimedia and Expo, 2005.
ICME 2005. IEEE International Conference on, pages
4–pp. IEEE.
Latecki, L., Lakamper, R., and Eckhardt, T. (2000). Shape
descriptors for non-rigid shapes with a single closed
contour. In Computer Vision and Pattern Recognition,
2000. Proceedings. IEEE Conference on, volume 1,
pages 424–429. IEEE.
Lines, J., Davis, L., Hills, J., and Bagnall, A. (2012). A
shapelet transform for time series classification. In
Proc. 18th ACM SIGKDD.
MATLAB (2010). version 7.11.0 (R2010b). The Math-
Works Inc., Natick, Massachusetts.
Mokhtarian, F., Abbasi, S., and Kittler, J. (1996). Robust
and efficient shape indexing through curvature scale
space. In Proceedings of the 1996 British Machine
Vision Conference BMVC, volume 96.
Mokhtarian, F. and Mackworth, A. (1992). A theory of mul-
tiscale, curvature-based shape representation for pla-
nar curves. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 14(8):789–805.
MPEG-7 (2003). Multimedia content description interfaces.
part 6: Reference software. ISO/IEC 15938-6.
MPEG7 (1999). Ce shape-1 part b, 1400 binary shape im-
ages.
Parisi-Baradad, V., Lombarte, A., García-Ladona, E.,
Cabestany, J., Piera, J., and Chic, O. (2005). Otolith
shape contour analysis using affine transformation invariant
wavelet transforms and curvature scale space
representation. Marine and freshwater research,
56(5):795–804.
Petursdottir, G., Begg, G., and Marteinsdottir, G. (2006).
Discrimination between icelandic cod (gadus morhua
l.) populations from adjacent spawning areas based
on otolith growth and shape. Fisheries research,
80(2):182–189.
Stone, M. (1974). Cross-validatory choice and assessment
of statistical predictions. Journal of the Royal Statis-
tical Society. Series B (Methodological), pages 111–
147.
Stransky, C. (2005). Geographic variation of golden redfish
(sebastes marinus) and deep-sea redfish (s. mentella)
in the north atlantic based on otolith shape analysis.
ICES Journal of Marine Science: Journal du Conseil,
62(8):1691–1698.
Ye, L. and Keogh, E. (2009). Time series shapelets: A
new primitive for data mining. In Proc. 15th (ACM
SIGKDD).
Zhang, D. and Lu, G. (2003a). A comparative study of cur-
vature scale space and fourier descriptors for shape-
based image retrieval. Journal of Visual Communica-
tion and Image Representation, 14(1):39–57.
Zhang, D. and Lu, G. (2003b). Evaluation of mpeg-7 shape
descriptors against other shape descriptors. Multime-
dia Systems, 9(1):15–30.
ClupeaHarengus:IntraspeciesDistinctionusingCurvatureScaleSpaceandShapelets-ClassificationofNorth-seaand
ThamesHerringusingBoundaryContourofSagittalOtoliths
143