a number of studies conducted on intra-rater and
inter-rater reliability (Kuhlemeier et al., 1998; McCullough
et al., 2001; Stoeckli et al., 2003; Scott et al., 1998).
Evidently, the evaluation process requires more
objective methods for quantifying the measures
involved. However, as of
the writing of this article, only a few attempts have
been made to meet this demand. Chen et al. proposed
a computer-aided method that measures and
quantifies oral movement (Chen et al., 2001). Aung et
al. proposed automatic identification of a number of
anatomical landmarks using a 16-point active shape
model (Aung et al., 2010b). In a different study, Aung
et al. introduced a semi-automatic approach to deter-
mine the transit time of the bolus (Aung et al., 2010a).
Kellen et al. proposed a semi-automatic method to
track the hyoid bone (Kellen et al., 2010). It is worth
mentioning here that in the work of Kellen et al., the
region-of-interest is identified manually by means of
user interaction.
This research concentrates on the problem of
quantifying the movement of the hyoid bone. In this
work, a semi-automatic method is introduced that
attempts to identify and track the hyoid bone in
fluoroscopic videos. The cervical vertebrae are
identified as well, in order to establish a reference
system relative to the patient. To limit image
processing to the relevant area of the image, the
regions-of-interest are automatically identified before
identifying the hyoid bone and the cervical vertebrae.
The rest of the paper is organized as follows. Sec-
tion 2 presents the proposed method. The results are
presented in Section 3. Section 4 concludes the article
by commenting on the results. A number of directions
for future work are pointed out in Section 5.
2 PROPOSED METHOD
The proposed method attempts to quantify the move-
ment of the hyoid bone in fluoroscopic videos. Addi-
tionally, a referencing system relative to the patient is
established by identifying the cervical vertebrae (see
Section 2.3). Using a classification-based approach,
the regions-of-interest are automatically identified in
order to limit image-processing operations to a
sub-region of the image. Objects inside the
regions-of-interest are then identified by matching
user-defined templates.
2.1 Identifying the Region-of-Interest
The proposed method identifies the regions-of-interest
using a method similar to the one proposed
by Huang et al., in which the lumbar vertebrae are
detected using a learning-based method (Huang et al.,
2009). Such a method is fast, requires no user
interaction and can be tuned to achieve high accuracy. In
this research, the regions-of-interest are automatically
identified using the Haar classifier. The Haar classi-
fier uses Haar features to classify sub-regions in the
image and search the image for target objects (Viola
and Jones, 2001). Instead of using the original fea-
tures, an extended feature-set is used in this research
which includes tilted features (Lienhart and Maydt,
2002).
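As an illustration of what a Haar feature computes, the following minimal sketch evaluates an upright two-rectangle feature using an integral image (summed-area table), as in Viola and Jones (2001). It is not the implementation used in this research: the function names and the toy patch are assumptions made for the example, and the tilted features of the extended set, which rely on a rotated summed-area table, are not shown.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y) and size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Upright two-rectangle (edge) feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Hypothetical 4x4 patch: bright left half, dark right half.
patch = [[9, 9, 1, 1] for _ in range(4)]
ii = integral_image(patch)
feature_value = two_rect_feature(ii, 0, 0, 4, 4)
```

Once the integral image is built, every rectangle sum costs four table lookups regardless of the rectangle size, which is what makes scanning many sub-regions of a frame affordable.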
The classifier is trained to identify the region-of-interest
containing the cervical vertebrae. For training
purposes, two sets of example images are prepared.
The cervical vertebrae are present in one set (set of
positive samples), and absent from the other (set of
negative samples). As of the writing of this article,
there is no conclusive study that dictates the optimum
number of samples. However, Lienhart et al. conducted
an empirical study on the training process with
5000 positive samples and 3000 negative samples, where
the positive samples were derived from 1000 images
(Lienhart et al., 2003). In this research, the same number
of samples is used. For the negative samples,
high-resolution random images are utilized.
The training process utilizes the AdaBoost method
to iteratively classify the samples into their corresponding
classes, minimizing the classification error
at each step (Freund and Schapire, 1995). A single
Haar feature serves as the input to a weak classifier.
At each step, the AdaBoost method combines multiple
weak classifiers in order to generate a boosted classifier.
To speed up the detection process, a cascade of
boosted classifiers is used.
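The boosting step can be sketched as follows. This is a minimal discrete AdaBoost with threshold stumps as weak classifiers, each reading a single feature value. It is an illustrative assumption, not the training code used in this research: the one-dimensional toy data, the function names and the stage threshold are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] >= thr else -pol


def adaboost(X, y, rounds):
    """Combine `rounds` weak classifiers into a boosted classifier (labels are +1 / -1)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, row):
    return sum(alpha * stump_predict(s, row) for alpha, s in ensemble)


def predict(ensemble, row):
    return 1 if score(ensemble, row) >= 0 else -1


def cascade_accepts(stages, row):
    """A cascade rejects a sample as soon as one boosted stage scores below its threshold."""
    return all(score(ens, row) >= thr for ens, thr in stages)


# Hypothetical feature values: no single threshold separates the two classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
boosted = adaboost(X, y, rounds=3)
predictions = [predict(boosted, row) for row in X]
```

On this toy set a single stump misclassifies at least two samples, whereas three boosted stumps classify all six correctly. The cascade helper shows why detection is fast: most sub-regions are discarded by an early stage and never reach the later ones.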
A separate classifier is not required for
identifying the region-of-interest for
the hyoid bone. In the fluoroscopic videos, the hyoid
bone is always located on the left side of the
region-of-interest for the cervical vertebrae. This observation
suggests that the region-of-interest for the hyoid bone
can be inferred from the region-of-interest for the
cervical vertebrae by mirroring the latter to the left.
Figure 2 shows the identified regions-of-interest for the
hyoid bone and the cervical vertebrae in one of the
frames from the videos.
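One possible reading of the mirroring step is sketched below. It assumes that a region-of-interest is an axis-aligned rectangle given as (x, y, width, height) in pixel coordinates and that mirroring means reflecting the rectangle across its own left edge. Both the representation and the clamping at the image border are assumptions made for this example, and the function name is hypothetical.

```python
def mirror_roi_left(roi):
    """Reflect an (x, y, w, h) rectangle across its left edge, clamped at x = 0."""
    x, y, w, h = roi
    new_x = max(0, x - w)
    new_w = x - new_x  # narrower than w if the mirrored box would leave the image
    return (new_x, y, new_w, h)


# Hypothetical vertebrae region-of-interest in a fluoroscopic frame.
vertebrae_roi = (300, 120, 150, 260)
hyoid_roi = mirror_roi_left(vertebrae_roi)
```

The hyoid region shares its vertical extent with the vertebrae region, so no additional detection pass is needed once the vertebrae have been located.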
2.2 Tracking
After the regions-of-interest are identified, the
objects of interest (each cervical vertebra and the
hyoid bone) must be identified and tracked
throughout the video. Template matching is used to
accomplish this task. Before tracking can be started,
VISAPP 2013 - International Conference on Computer Vision Theory and Applications