CXR were shown to have learned correlations in the data rather than clinical information (DeGrave et al., 2020). Because most models for COVID-19 detection are trained on a mixture of COVID-19-negative pre-pandemic CXRs and COVID-19-positive cases, it becomes simpler to learn shortcuts, such as the dataset from which an image originates, than more complex features such as lung opacities. While these shortcuts lead to excellent performance on datasets similar to the training dataset, catastrophic failure occurs once the model is tested on a different dataset. Among others, laterality markers, patient positioning and the hospital system were identified as features strongly influencing the decisions of the algorithms (DeGrave et al., 2020).
The goal of this study was thus to develop and
validate an automatic method to detect markers and
written labels in CXR images. Such a method could
then be used for automatic obscuration of markers in
large datasets, promoting the learning of generic and
meaningful features and thus improving performance
and robustness.
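As a minimal illustration of this obscuration step, the sketch below blacks out a set of detected bounding boxes in a CXR array; the (x, y, width, height) box format and the function name are illustrative assumptions, not part of the method described in this paper.

```python
import numpy as np

def obscure_labels(image: np.ndarray, boxes) -> np.ndarray:
    """Black out detected marker/label regions in a CXR.

    `boxes` is assumed to hold (x, y, width, height) rectangles
    in pixel coordinates; any detector output can be converted
    to this format before masking.
    """
    out = image.copy()
    for x, y, w, h in boxes:
        out[y:y + h, x:x + w] = 0  # replace the region with black pixels
    return out
```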
2 METHODS
2.1 Datasets
Four different datasets were used in this study, obtained from different sources. The first dataset, hereinafter referred to as the Mixed dataset (1,395 CXRs), is composed of a combination of multiple public CXR datasets, namely the CheXpert (Irvin et al., 2019) (7 CXRs), ChestXRay-8 (Wang et al., 2017) (226 CXRs), Radiological Society of North America Pneumonia Detection Challenge (RSNA-PDC) (Kaggle, 2018) (639 CXRs) and COVID DATA SAVE LIVES¹ (199 CXRs) datasets, as well as of COVID-19 CXR public repositories, namely COVID-19 IDC (Cohen et al., 2020) (265 CXRs), COVIDx (Wang and Wong, 2020) (4 CXRs), Twitter² (9 CXRs) and the Sociedad Española de Radiología Médica (SERAM) website³ (46 CXRs).

¹ https://www.hmhospitales.com/coronavirus/covid-data-save-lives
² https://twitter.com/ChestImaging
³ https://seram.es/images/site/TUTORIAL CSI RX TORAX COVID-19 vs 4.0.pdf
The second and third datasets, hereinafter referred to as the BIMCV and COVIDGR datasets (289 and 300 CXRs, respectively), each come from a single hospital system public dataset: the combined BIMCV-COVID19-PADCHEST (Bustos et al., 2020) (248 CXRs) and BIMCV-COVID-19+ (Vayá et al., 2020) (41 CXRs) datasets, and the COVIDGR (Tabik et al., 2020) dataset, respectively. The fourth dataset is a private collection of 597 CXRs collected retrospectively at the Centro Hospitalar de Vila Nova de Gaia e Espinho (CHVNGE) in Vila Nova de Gaia, Portugal, between the 21st of March and the 22nd of July of 2020. All data were acquired under approval of the CHVNGE Ethical Committee and anonymized prior to any analysis to remove personal information.
All CXRs were selected randomly from both nor-
mal and pathological cases after exclusion of views
other than postero-anterior and antero-posterior.
2.2 CXR Annotation
In order to set a ground truth for training and evaluation of the algorithms, manual annotation of all labels was performed using in-house software. The software presented CXRs from a randomly selected subset and allowed for window center/width adjustment, zooming and panning. Rectangles of any size could be drawn on the image to cover the labels, and the corresponding coordinates were saved. Figure 1 shows examples of manually annotated bounding boxes.
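For concreteness, one way such rectangle annotations could be serialized is sketched below; the actual file format and field names used by the in-house software are not specified here, so this record layout is purely hypothetical.

```python
import json

# Hypothetical annotation record: one entry per manually drawn
# rectangle, stored as the pixel coordinates of the top-left
# corner plus width and height. File name and field names are
# illustrative only.
annotation = {
    "image": "cxr_00042.png",
    "boxes": [
        {"x": 12, "y": 980, "width": 96, "height": 40},   # e.g. a laterality marker
        {"x": 840, "y": 15, "width": 210, "height": 35},  # e.g. a written label
    ],
}

with open("cxr_00042.json", "w") as f:
    json.dump(annotation, f, indent=2)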
2.3 Automatic Label Detection
The automatic label detection model is based on YOLOv3 (You Only Look Once, Version 3) (Redmon and Farhadi, 2018). The network is composed of a feature extraction backbone, DarkNet-53 (Redmon and Farhadi, 2018), which is used to obtain an M × M × N feature map F, where M is the spatial grid size and N is the number of feature maps. This feature map F is then convolved to obtain an M × M × B × 6 output tensor, where B is a predefined number of objects to predict per grid point, containing each predicted object's confidence score, class probability and bounding box position and dimensions.
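A sketch of how this output tensor can be reshaped and split into its components is shown below; the channel ordering is an assumption for illustration, as is the single "label" class implied by the six output values per box.

```python
import numpy as np

M, B = 13, 3  # example grid size and boxes per grid point (illustrative values)

# Stand-in for the M x M x (B*6) convolutional head output.
raw = np.random.randn(M, M, B * 6)

pred = raw.reshape(M, M, B, 6)
box = pred[..., 0:4]          # (tx, ty, tw, th): position offsets and size deviations
objectness = pred[..., 4]     # confidence score that a box contains an object
class_prob = pred[..., 5]     # class probability (a single "label" class here)
```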
One particular characteristic of YOLOv3 is that the bounding box dimensions are not explicitly predicted by the network but are defined in relation to predefined bounding box templates, commonly referred to as anchors. The anchors are learned a priori, before the training of YOLOv3, and correspond to the cluster centers of a k-means clustering that maximizes the IoU (intersection over union) between the anchors and the ground-truth bounding boxes of the training set. The network then learns to predict the deviation (in width and height) from each of these predefined anchors, thus defining each predicted object.
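The anchor computation and the anchor-relative decoding can be sketched as follows; the function names are ours, while the 1 − IoU distance and the exponential width/height mapping follow the published YOLO formulation (Redmon and Farhadi, 2018).

```python
import numpy as np

def iou_wh(wh, anchors):
    """IoU between boxes and anchors, compared by dimensions only.

    `wh` is an (N, 2) array of ground-truth widths/heights and
    `anchors` is (K, 2); boxes are aligned at a common corner, so
    the intersection area is min(w) * min(h). Returns (N, K).
    """
    inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0])
             * np.minimum(wh[:, None, 1], anchors[None, :, 1]))
    union = (wh[:, None, 0] * wh[:, None, 1]
             + anchors[None, :, 0] * anchors[None, :, 1] - inter)
    return inter / union

def anchor_kmeans(wh, k, iters=100, seed=0):
    """k-means on box dimensions with 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = iou_wh(wh, anchors).argmax(axis=1)  # nearest anchor = highest IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)  # move center to cluster mean
    return anchors

# At inference time, a predicted deviation (tw, th) is mapped back to
# absolute box dimensions relative to its anchor (pw, ph):
#   bw = pw * exp(tw),  bh = ph * exp(th)
```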