Boosted Tree Classifier for in Vivo Identification of Early Cervical
Cancer using Multispectral Digital Colposcopy
Nilgoon Zarei¹, Dennis Cox², Pierre Lane¹, Scott Cantor³, Neely Atkinson², Jose-Miguel Yamal⁴,
Leonid Fradkin⁵, Daniel Serachitopol², Sylvia Lam¹, Dirk Niekerk⁶, Dianne Miller⁷,
Jessica McAlpine⁷, Kayla Castaneda⁵, Felipe Castaneda⁵, Michele Follen⁵ and Calum MacAulay¹
¹Integrative Oncology, BC Cancer Research Centre and University of British Columbia,
Vancouver, British Columbia, Canada
²Rice University, Houston, U.S.A.
³University of Texas, MD Anderson Cancer Center, Houston, U.S.A.
⁴University of Texas, Houston, U.S.A.
⁵Brookdale Hospital and Medical Center, New York, U.S.A.
⁶BC Cancer Agency, Vancouver, Canada
⁷Vancouver General Hospital, Vancouver, Canada
Keywords: Boosted Tree Classifier, Machine Learning, Image Processing, Multispectral Digital Colposcopy, Cervical
Cancer.
Abstract: Background: Cervical cancer develops over several years; screening and early diagnosis have decreased the
incidence and mortality threefold over the last fifty years. Opportunities for the application of imaging and
automation in the screening process exist in settings where resources are limited. Methods: Patients with
high-grade squamous intraepithelial lesions (SIL) underwent imaging with Multispectral Digital
Colposcopy (MDC) prior to undergoing loop excision of the cervix. The image taken with white light was
annotated by a clinician. The excised specimen was mapped by the study histopathologist blinded to the
MDC data. This map was used to define areas of high grade in the excised tissue. Eleven reviewers mapped
the histopathologic data onto the MDC images. The reviewers’ maps were analyzed and areas of agreement
were calculated. We compared the result of a boosted tree classifier with a previously developed ensemble
classifier. Results: Using a boosted tree classifier we obtained a sensitivity of 95%, a specificity of 96%, and
an accuracy of 96% on the training sets. When we applied the classifier to a test set, we obtained a
sensitivity of 82%, a specificity of 81%, and an accuracy of 81%. The boosted tree classifier performed
better than the previously developed ensemble classifier. Conclusion: These promising results
show that boosted tree analysis of MDC images could be used as an adjunct to
colposcopy and would yield greater diagnostic accuracy than existing methods.
1 INTRODUCTION
Cervical cancer is a preventable disease. However,
approximately 500,000 patients are diagnosed with
cervical cancer every year, and about half that many
succumb to the disease. Cervical cancer has
decreased in incidence and mortality in all countries
with organized screening and detection programs.
These programs are costly and require many
trained personnel. Automated detection of cervical
cancer and its precursors could improve cancer
management in low- and middle-income countries
where resources do not permit large screening
infrastructure (world cancer research, 2012).
Cervical intraepithelial neoplasia (CIN) or SIL
are cervical cancer precursors that can develop
into cancer over three to twenty years. This long
transition period makes cervical cancer an ideal
candidate for early detection and treatment. Optical
technologies such as fluorescence and reflectance
spectroscopy have been extensively investigated
as effective and non-invasive methods for cancer