work studies consider only two categories of tissue
(tumor and non-tumor), which is not representative of
the complex structure of histological images: a typical
section of solid tumor is a very heterogeneous
structure. They also consider a single sub-type of
breast carcinoma, invasive carcinoma (IC) (Cruz-Roa
et al.). Previous studies do not take into account the
non-invasive breast cancer type called “in situ
carcinoma” despite its frequency (20 to 25% of newly
diagnosed breast cancers). Reporting the presence
of invasive and/or in situ carcinoma is a challenging
part of a diagnostic pathology workup, since the
treatment options for the two differ significantly.
ion might be crucial to identify areas
where a full resolution analysis should be performed.
There are very few whole-slide breast cancer datasets
with pixel-level annotations. Regarding breast cancer
pathological datasets, Spanhol et al. introduced the
Breast Cancer Histopathological Image Classification
(BreakHis) dataset, which is composed of 2,480 benign
and 5,429 malignant samples of microscopic images of
breast tumor tissue (Spanhol et al.). However, these
two categories of tissue are not enough because they
do not reflect the complexity of tissue diversity. To
tackle this shortcoming, the Grand Challenge on Breast
Cancer Histology Images (BACH) released an annotated
whole-slide image dataset (Aresta et al., 2018). The
organizers provided 10 pixel-wise annotated regions for
the benign, in situ and invasive carcinoma classes
present in an entire sampled tissue, which represent
partially annotated masks. In recent years,
deep learning models, especially convolutional neural
networks (CNNs) (LeCun et al.), have emerged as a
new and more powerful approach for automatic
segmentation of pathological images. The power of a
CNN-based model lies in its deep architecture, which
allows for learning relevant features at lower levels of
abstraction. Hou et al. (2016) proposed training a
patch-based CNN together with a decision fusion model,
forming a two-level model (patch-level and image-level)
to classify WSIs into tumor subtypes and grades. Chen
et al. proposed an encoder-decoder architecture for
gland segmentation in both benign and malignant cases
(Chen et al., 2016a). Cruz-Roa et al. presented a
classification approach for detecting the presence and
extent of invasive breast cancer on WSIs using a
ConvNet classifier (Cruz-Roa et al.).
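As an illustration of the two-level idea mentioned above (patch-level predictions combined into an image-level decision), the following minimal sketch fuses patch labels by simple majority vote. This is a deliberately simplified stand-in and not the decision fusion model of Hou et al.; the function name `fuse_patch_predictions` and the label strings are hypothetical.

```python
from collections import Counter


def fuse_patch_predictions(patch_labels):
    """Return the most frequent patch label as the image-level label.

    Majority voting is an assumed, simplified fusion rule used only for
    illustration; it is not the fusion model described in the cited work.
    """
    if not patch_labels:
        raise ValueError("at least one patch prediction is required")
    counts = Counter(patch_labels)
    # most_common(1) returns a one-element list [(label, count)].
    return counts.most_common(1)[0][0]
```

For example, `fuse_patch_predictions(["invasive", "in_situ", "invasive"])` selects the label that occurs most often among the patches.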
The greatest challenge in the medical imaging domain,
especially in pathology, is dealing with small
datasets and a limited number of annotated samples,
especially when employing supervised convolutional
learning algorithms that require large amounts of
labeled data for training. Previous studies that
investigated breast cancer pathological image analysis
did not provide a proper quantitative and qualitative
evaluation of the parameters for training deep CNNs
from scratch with only a few annotated samples.
Contributions
The contribution of this paper is twofold: first, since
there is no publicly available annotated data for this
task, we developed a new dataset; second, we conducted
a set of experiments to evaluate several CNN
architectures and settings on that new type of data.
More precisely, we:
• developed a new dataset of WSIs with different
subtypes of breast cancer. The dataset consists of
11 fully annotated whole-slide images.
• proposed a fully automatic framework. We ap-
plied machine learning algorithms to extract the
predictive model, and more precisely, we applied
and adapted a patch-based deep learning approach
on our new dataset. While our model relies on ex-
isting architectures (SegNet (Badrinarayanan and
Kendall, 2017), U-Net (Ronneberger et al.), FCN
(Long et al., 2015) and DeepLab (Chen et al.,
2016b)), the originality of our work resides in a
deep analysis of the parameters of the model.
• conducted several experiments to evaluate the set-
tings of each step of the proposed framework in
order to get the optimal set of parameters when
dealing with this new data for a tissue-level seg-
mentation task.
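The patch-based approach named in the list above relies on cutting each whole-slide image into fixed-size tiles. A minimal sketch of that tiling step is given below, assuming a regular grid defined by a patch size and a stride; the function name `patch_origins` and its parameters are hypothetical and do not come from the paper, whose actual patch settings are studied in the experiments.

```python
def patch_origins(height, width, patch_size, stride):
    """List the (row, col) top-left corners of patches on a regular grid.

    Only patches that fit entirely inside the image are kept. A stride
    smaller than patch_size yields overlapping patches.
    """
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    rows = range(0, height - patch_size + 1, stride)
    cols = range(0, width - patch_size + 1, stride)
    return [(row, col) for row in rows for col in cols]
```

Each returned origin `(row, col)` can then be used to slice both the image and its annotation mask over `[row:row + patch_size, col:col + patch_size]`, so that every training patch keeps its pixel-level labels.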
The paper is organized as follows: in Section 2, we
present the new data set that we built. Section 3
presents the framework we developed as well as an
overview of the experiments and evaluation measures.
Section 4 presents the details of the experiments and
their results. Section 5 provides the main
recommendations related to the influence of the model
parameters. Section 6 concludes this paper and
discusses future work.
2 NEW ANNOTATED DATASET
This work involved anonymized breast cancer slides
from the archives of the pathology department of the
Toulouse University Cancer Institute. The breast cancer
images were acquired with a 3DHISTECH Panoramic Digital
Slide Scanner. This selection was reviewed by an expert
pathologist to confirm the presence of at least one of
the two categories of carcinoma considered in this
study. To describe the complexity of the tissue
structures present in the image
Deep Analysis of CNN Settings for New Cancer Whole-slide Histological Images Segmentation: The Case of Small Training Sets