removed, but MS lesions come in different locations
and sizes. Moreover, brain MRI images suffer from
noise artifacts and non-uniformities, and are
affected by the intrinsic differences in the anatomy of
human brains. The lack of annotated data (many MRI
images come without a corresponding lesion mask) is
also a serious limitation: it can cause
misclassification problems and reduce
lesion-identification performance in both Machine
Learning and Deep Learning approaches. To overcome
these problems, Artificial Intelligence requires very
large training sets to correctly “learn” to identify the
relevant features (tissues/lesions) in the image.
Unfortunately, on top of the physiological variability
of human brains and MS lesions, different data-sets
also come in different file formats, are generated by
different equipment that introduces different
artifacts, and have different dimensions and numbers
of slices.
On the other hand, Deep Learning methods typically
expect, as input, images in a standard image file
format (such as PNG), so an image stored in a typical
medical image format (such as NIfTI or DICOM)
generally cannot be fed directly to a deep learning
architecture. Existing software and tools for image
pre-processing have some limitations. Firstly, they
are often maintained by separate groups: when
updating to a new version of one of these tools,
versioning problems and inconsistencies may occur if
such an update is not supported by the other tools or
plugins, such as CBS Tools, JIST, TOADS-CRUISE, and
BrainSuite.
Secondly, they are not easy to customize and often
have strict requirements in terms of settings, making
it difficult to obtain homogeneous and well-structured
data-sets. All the mentioned problems often force
scientists to rely on a single data-set, which
severely affects the performance of the Machine
Learning or Deep Learning pipeline.
Figure 1: Steps of the proposed pipeline.
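To make the format gap described above concrete, the following minimal sketch turns a volumetric image into a list of 8-bit 2D slices, the form in which it could then be written to a standard image format such as PNG. It is an illustration under stated assumptions, not the conversion used in this paper: the function name `volume_to_uint8_slices`, the choice of the last axis as the slice axis, and the min-max rescaling over the whole volume are all hypothetical. A synthetic NumPy array stands in for the voxel data, which in practice might be read from a NIfTI file (for example with nibabel) and written out with an imaging library.

```python
import numpy as np


def volume_to_uint8_slices(volume, axis=2):
    """Rescale a 3D volume to 0-255 and split it into 2D uint8 slices.

    `volume` stands in for the voxel array of an MRI image; `axis` is the
    assumed slice axis. Min-max rescaling over the whole volume is an
    illustrative choice, not the normalisation described in the paper.
    """
    data = np.asarray(volume, dtype=np.float64)
    lo, hi = data.min(), data.max()
    if hi > lo:
        scaled = (data - lo) / (hi - lo) * 255.0
    else:
        # Constant volume: avoid dividing by zero and emit black slices.
        scaled = np.zeros_like(data)
    scaled = np.rint(scaled).astype(np.uint8)
    # Move the slice axis to the front so iteration yields one 2D slice at a time.
    return [np.ascontiguousarray(s) for s in np.moveaxis(scaled, axis, 0)]


# Synthetic 4x4x3 volume standing in for real MRI data.
demo = np.arange(48, dtype=np.float32).reshape(4, 4, 3)
slices = volume_to_uint8_slices(demo)
print(len(slices), slices[0].shape, slices[0].dtype)
```

Rescaling over the whole volume rather than per slice keeps intensities comparable between slices of the same subject; a per-slice rescaling would be an equally plausible variant with different trade-offs.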
3 MATERIALS AND METHODS
Pre-processing is crucial for any data-analysis
pipeline, and this is particularly true for medical im-
ages. The proposed pipeline (summarized in Figure
1) includes the following typical pre-processing steps
for brain MRI images (which are usually performed
by separate and independent tools):
• MRI sequence selection: selecting the MRI
modality that will be processed. Our pipeline
focuses on T2w images, as MS lesions are mostly
visible in this modality;
• Image registration: registering an image means,
in this case, matching it with a reference model.
It is a key step and a prerequisite for all
applications that compare data-sets among subjects
or across time (Toga, 2019); raw MRI images are not
registered and may have different spacing and slice
resolution (Alam, 2016). The registration step
consists of a set of transformations of the raw image
that optimise a similarity index with the reference
image. The registered image is obtained by linearly
interpolating the initial image domain into the new
domain. Image files consist of a variable number of
slices, each slice corresponding to a different
longitudinal brain section: when the number of slices
in the original image is not consistent with the
number of slices in the atlas, the missing slices are
interpolated. Our pipeline registers each MRI image,
making sure it contains the same number of slices as
the reference image, and then saves it back as a
NIfTI file;
• Brain-extraction: also known as skull-stripping,
this step removes tissues that are not of interest,
such as the skull and dura mater. Including non-brain
tissues is a known source of errors (Rehman, 2020).
There are several possible brain-extraction methods,
for instance relying on deep learning techniques or
on traditional morphological operations (Kalavathi,
2016). The brain-extraction method proposed in this
paper is an adaptation of the method by Gambino et
al. (Gambino, 2011), and uses a combination of
morphological operations;
• Bias field correction and noise reduction: this
step corrects the bias field, i.e. a low-frequency
intensity non-uniformity that appears in the image
data as inhomogeneity and uneven illumination;
• Final data-set creation: images (and correspond-
ing masks) are saved back as NIfTI files and as
BIOINFORMATICS 2022 - 13th International Conference on Bioinformatics Models, Methods and Algorithms