Visual Insights in Human Cancer Mutational Patterns: Similarity-Based Cancer Classification Using Siamese Networks

Rocco Zaccagnino, Clelia De Felice, Marco Russo and Rosalba Zizza

Dip. di Informatica, University of Salerno, Salerno, Italy

Keywords:

Cancer Detection, Siamese Neural Networks, Mutational Signature, Explainable AI, Information

Visualization.

Abstract:

In recent years, a number of innovations concerning the diagnosis and treatment of diseases through the ap-

plication of genomics have opened the door to the detailed analysis of somatic mutation patterns in human

cancers. Several AI-based systems have been proposed to identify correlations between mutations and type of

cancer. However, the use of AI in Bioinformatics still presents two main limitations: (i) explainability, i.e., the ability of the methods to at least partially explain and motivate their behavior, and (ii) usability, i.e., the strong limitations encountered when such methods are actually used in real bio-medical contexts and scenarios. In this work, we propose a novel ML-based cancer-type detection system which integrates explainability and usability techniques. To this aim, we first formulate the cancer-type detection problem using the similarity-based classification paradigm. Then, given a cancer sample, we assume that a set of somatic mutation features is available, which can be interpreted as a cancer mutational view of the sample itself. Finally, we propose the use of a special Machine Learning model defined for learning similarity functions, namely the Siamese Neural Network (SNN). The proposed SNN learns to take a pair of cancer mutational views as input, and to compute a similarity score that can be used to verify whether the corresponding samples are similar or not.

Preliminary experiments carried out to assess the effectiveness of the proposed system show high performance, reaching an f1 score of 97.61%, and highlight how the similarity-based classification paradigm could be more suitable than the commonly used classification paradigm for the formulation of the cancer-type detection problem.

1 INTRODUCTION

1.1 Cancer and Somatic Mutations

In recent years, a number of technical innovations

have been developed regarding the diagnosis and

treatment of diseases through the application of ge-

nomics. The most evident result is the standardiza-

tion of tumor proﬁling techniques based on recur-

rent targeted mutations analysis. This has led to an

evident efﬁcacy of molecularly targeted therapies on

distinct types of tumor by exploiting information re-

garding shared genetic features. Today, based on re-

cent large-scale exome and genome-sequencing stud-

ies, we know that major tumour types present speciﬁc

patterns of somatic mutations (Kandoth et al., 2013;

Lawrence et al., 2013; Ciriello et al., 2013).

In this direction, several research initiatives have been launched recently. As an example, at Memorial Sloan Kettering Cancer Center, an NGS panel named msk-impact has been developed to show the feasibility and utility of large-scale prospective clinical sequencing of tumors to guide clinical management. msk-impact has been used to detect all protein-coding mutations, copy number alterations, and selected promoter mutations and structural rearrangements in 410 cancer-associated genes, covering 62 principal tumor types sequenced from more than 10,000 patients. The result is a comprehensive and detailed catalog of somatic mutations for every tumor sequenced, publicly available online.

1.2 Contribution of this Work

Explainable and Usable AI. Artificial Intelligence (AI), and in particular Machine Learning (ML), systems are increasingly used in Bioinformatics. This is because the massive amounts of bio-medical data, including heterogeneous high-dimensional data, introduce challenges to existing ML methods (Karim et al., 2021), which are increasingly being used successfully for data analysis and interpretation.

https://www.mskcc.org/

http://cbioportal.org/msk-impact

Zaccagnino, R., De Felice, C., Russo, M. and Zizza, R.
Visual Insights in Human Cancer Mutational Patterns: Similarity-Based Cancer Classification Using Siamese Networks.
DOI: 10.5220/0012399600003657
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2024) - Volume 1, pages 462-470
ISBN: 978-989-758-688-0; ISSN: 2184-4305

To date, the use of AI techniques in Bioinformat-

ics has two main limitations. The ﬁrst is the so-called

explainable AI (XAI), i.e., the ability of the methods

to partially explain or motivate their behavior, while

the second is about the usable AI, i.e., the actual use

of such systems in real-world scenarios.

While ML models are able to address complex

problems, their “black-box” nature raises concerns

about transparency and accountability, which also

overshadow their ability to solve the problems them-

selves. The ﬁeld of XAI aims to make AI systems

more transparent by explaining how they make deci-

sions and so to enhance the human-comprehensibility,

reasoning, transparency, and accountability.

As mentioned earlier, another strong limitation of the use of AI in Bioinformatics concerns the actual “usability” of such systems in real-world scenarios. Advanced ML models facing really complex problems often suffer from scalability problems. In some cases, the cause can be found in the “classification” paradigm used to formulate the problem: there are n classes of samples, and the model is trained on a training set to classify a new sample into one of these n classes. This approach, especially in Bioinformatics, can suffer from several issues, including the enormous amount of data on which the model must be trained, the strong class imbalance that can arise when working on real data, and above all the problem of scaling the model when new classes of samples must be classified. In this case, the model must be retrained on the whole set of data, with a severe impact on the computational effort, which is a problem especially in contexts where a timely response can be crucial.

Proposed Strategy. We propose a novel ML-based cancer-type detection system with the aim of integrating explainability and usability techniques. We first formulate the problem in terms of similarity-based classification (Chen et al., 2009). Given a cancer sample, we assume that a set of somatic mutation features is available, which can be interpreted as a cancer mutational view of the sample itself. Then, according to the central idea of the similarity-based classification paradigm, we define a model which does not simply learn to classify a cancer sample by observing its cancer mutational view, but which learns, starting from a set of sample pairs, a similarity function, and is therefore able to tell whether two samples are similar or not. Clearly, the more representative of the problem the starting set of samples is, the more accurate the function is. The advantage of this approach is that, once the similarity function has been learned, the model can also be applied to new samples (even of a cancer type never seen during training) to find out which classes they are most similar to. Furthermore, to make the system scalable on large amounts of data, we keep track, for each cancer-type class, of one single representative view, and use these views to find out which classes are most similar to a test view, with great benefits both in terms of memory and privacy.

There are numerous examples of works in Bioinformatics based on the similarity-based classification paradigm (Mathai and Kirchmair, 2020). In this paper, we propose the use of special ML models defined for learning similarity functions, i.e., Siamese Neural Networks (SNN). We define a novel SNN which, given a pair of cancer mutational views, outputs a similarity score that can be used to verify whether they are similar. The proposed solution is based on the following two main ideas that, in our opinion, could mitigate the issues discussed above. First, the somatic mutation features of a cancer sample could be used as a “similarity view” that can be exploited as an effective feature embedding for ML methods. Second, we show that the SNN increases the level of discrimination strength within the proposed cancer mutational views (Bell and Bala, 2015).

Several studies have been proposed in the litera-

ture to face the problem of using ML techniques to

determine tumour organ of origin and histology using

the patterns of somatic mutation identiﬁed by whole

genome DNA sequencing, such as (Jiao et al., 2020).

However, most of these are based on the classiﬁcation

paradigm. Furthermore, several works use SNNs in Bioinformatics (Bechar et al., 2023; Narmatha et al., 2023), but to the best of our knowledge this is the first attempt to propose a similarity-based classification paradigm based on SNNs exploiting somatic mutation features for the cancer-type detection problem.

Our Contributions:

• A novel cancer-type detector integrating explainability and usability techniques, based on cancer mutational views used to train SNNs to verify the similarity between cancer samples.

• Preliminary experiments to assess the effectiveness of the proposed method; results obtained on a dataset of somatic mutation features show accuracy 89.25%, precision 97.60%, recall 97.63%, and f1 score 97.61%, highlighting the advantages of the similarity-based classification paradigm.

Source code and files are available online (https://github.com/FLaTNNBio/few-shot-learning-for-cancer-detection/tree/master).

2 THE PROPOSED SYSTEM

In this section, we describe a novel ML-based cancer-

type detection system. We assume that the reader is

familiar with ML notions. For further details, refer

to (Tan et al., 2016).

2.1 Overview

Here, we provide an overview of the scenario in which

the proposed system can be placed (Figure 1).

• Usability. The system must be designed to manage views in a scalable and efficient way. The typical scenario we envisage is one in which the system stores cancer mutational views, to be compared from time to time with new test cancer samples whose type must be determined. More in detail, at every moment it keeps in memory a representative view of each type of cancer analyzed up to that moment. Each time a new cancer sample $c_t$ must be detected, the corresponding cancer mutational view $s_t$, named test view, is provided to the system; during the search, $s_t$ is compared with every stored enrollment view; then, the system returns the type of cancer corresponding to the enrollment view $s_e$ (corresponding to a specific cancer sample $c_e$) which is most similar to $s_t$, formally denoted with $c_t \sim c_e$. We assume that if this level of similarity does not exceed a threshold (established during the training of the SNN S), then $s_t$ is a sample of a new type of cancer and therefore will be stored as a view of this new type.

The advantages of such a system are numerous. First, there is no need to keep in memory a huge amount of data relating to samples to be used for re-training the ML model: for each type of cancer, only the view of a representative sample is stored. Furthermore, a significant implication concerns data privacy, which in this case only has to be ensured for a very small set of data.

• Explainability. S has been designed to integrate the attention mechanism, through special layers. “Attention” was first used in computer vision, inspired by the idea of mimicking the ability of the human brain to deal with massive amounts of visual input. Attention layers mainly consist of a weighted mean reduction, where each element is weighted in proportion to its contribution to the mean. One way to interpret the attention weights is to plot them as a feature heatmap, where each row corresponds to an output item, each column corresponds to an input feature, and the color or intensity of each cell indicates the level of the attention weight. This helps to visualize which parts of the input are more important for each output. Thus, by visualizing the feature heatmap of the attention layer we can interpret the relation between the features and better understand the key issues which affect the performance of S. As we will see in Section 3, such heatmaps can be used to highlight the most relevant somatic mutations in the several cancer types.
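As a rough illustration of the weighted mean reduction and of the heatmap reading described above, the following sketch computes softmax attention weights for a few toy input vectors and stacks them into a matrix that could be plotted as a heatmap. The scoring vector `query`, the helper names, and the toy data are assumptions made for illustration only; they are not taken from the implementation of S.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(features, query):
    """Score each feature by its product with a (hypothetical) learned
    query value, then normalize the scores with softmax."""
    scores = [f * q for f, q in zip(features, query)]
    return softmax(scores)

def weighted_mean(features, weights):
    """Weighted mean reduction: each element counts in proportion to its weight."""
    return sum(f * w for f, w in zip(features, weights))

# Toy data: 3 instances of one class, 4 input features each (assumed values).
instances = [
    [0.1, 2.0, 0.3, 0.0],
    [0.2, 1.8, 0.1, 0.1],
    [0.0, 2.2, 0.4, 0.2],
]
query = [1.0, 1.0, 1.0, 1.0]  # hypothetical learned parameters

# One row of attention weights per instance: this matrix is what a heatmap shows.
heatmap = [attention_weights(x, query) for x in instances]
summaries = [weighted_mean(x, w) for x, w in zip(instances, heatmap)]

for row in heatmap:
    print(" ".join(f"{w:.2f}" for w in row))
```

In this toy example the second feature receives the largest weight in every row, which is the kind of column-wise pattern one would look for in a real heatmap.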

2.2 Cancer Mutational View

The dataset used for our experiments is extracted from

the msk-impact (Kübler et al., 2019), a genomic pro-

ﬁling dataset generated by Memorial Sloan Kettering

Cancer Center. It contains molecular proﬁling data of

10,945 successfully sequenced tumor samples from

10,336 individuals, for 62 principal tumor types. The

dataset, generated using NGS technologies, includes

molecular features that are relevant for cancer diagno-

sis, prognosis, and treatment, such as protein-coding

mutations, copy number alterations (CNAs), and se-

lected promoter mutations and structural rearrange-

ments in 410 cancer-associated genes.

To extract the data used for our experiments, we first downloaded this dataset, and then merged the following files into a csv file: data_cna.txt, data_sv.txt, data_clinical_sample.txt, data_clinical_patient.txt. The resulting dataset consists of 433 features, organized into:

• Clinical info (13): Sample ID, Cancer Type,

Mutation Count, Sex, Sample Type, DNA

Input, Matched Status, Oncotree Code,

Overall Survival Status, Patient’s

Vital Status, Sample Collection Source,

Smoking History, Somatic Status.

• Structural variations info (10): Site1 Chr,

Site1 Region, Site1 Hugo Symbol, Site2

Chr, Site2 Region, Site2 Hugo Symbol,

Class, Connection Type, Tumor Variant

Count, Breakpoint Type.

• Copy Numbers (410).

https://www.cbioportal.org/study/summary?id=msk_impact_2017

BIOINFORMATICS 2024 - 15th International Conference on Bioinformatics Models, Methods and Algorithms

Figure 1: The overall scenario. The system manages cancer mutational views. To store a view $s$, it must be entered and, if the corresponding cancer type is not yet in the system, it is saved together with the cancer type; to detect a view $s_t$, it must be compared with all the stored views; a similarity ranking $s_1, \ldots, s_n$ is built by using the SNN S, and if the similarity score between $s_t$ and $s_1$ is greater than or equal to a threshold, then the cancer type of $s_t$ is the same as that of $s_1$.

At the end of this extraction phase, each cancer sample was represented by a set of 433 features. Then, the dataset underwent a normalization procedure for numeric features and a one-hot encoding for non-numeric features, thus reaching a total of 2181 features. The increase in the number of features is due to the use of one-hot encoding, which is known to generate large vectors, since the size of the vector generated for a feature is equal to the number of its possible values. To reduce the high dimensionality of the input data, several techniques from the literature can be applied, such as feature selection. However, in order to limit the loss of information that could occur by choosing which input features to keep and which to discard, in this work we have decided to use Principal Component Analysis (PCA), trying several values for the number of components. The best results have been obtained with 1403 components. For each cancer sample, this set of 1403 components is the cancer mutational view.
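To make the growth in the number of features concrete, the sketch below normalizes one numeric column and one-hot encodes one categorical column on three toy samples. The normalization scheme is not specified in the text, so min-max scaling is an assumption here; the feature values are also invented for illustration, and the subsequent PCA step is not shown.

```python
def min_max_normalize(values):
    """Rescale numeric values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode a categorical column as one 0/1 vector per row.
    The vector length equals the number of distinct values."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        rows.append(vec)
    return categories, rows

# Toy samples with one numeric and one categorical feature (assumed values).
mutation_count = [3, 10, 7]
sample_type = ["Primary", "Metastasis", "Primary"]

normalized = min_max_normalize(mutation_count)
categories, encoded = one_hot(sample_type)

# Each sample becomes: [normalized numeric features] + [one-hot columns].
vectors = [[n] + e for n, e in zip(normalized, encoded)]
print(categories)  # column order of the one-hot block
print(vectors)
```

Here two original features become three columns; with many categorical features, each having many distinct values, the same mechanism accounts for the jump from 433 to 2181 features.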

Finally, since one of the goals of this work is to compare detection by classification with detection by similarity-based classification, we retained, of the 62 types of cancer in the starting dataset, only those with a minimum number of instances sufficient for classification models to perform well. This is because, as is known, a strong class imbalance is a problem when training classification models. From empirical observations and preliminary experiments, we have observed that guaranteeing a minimum of 30 instances per class allows us to obtain a classification model, used as the baseline in our comparison, with excellent performance (see Section 3 for further details).

Table 1 reports the 16 types of cancer, i.e., the classes of our problem, which have at least 30 instances, indicating for each of them the exact number of instances (#instances).

Table 1: Number of instances for each cancer-type class.

Cancer type                    #instances
Prostate Cancer                336
Non-Small Cell Lung Cancer     313
Breast Cancer                  242
Soft Tissue Sarcoma            104
Colorectal Cancer              100
Glioma                         158
Hepatobiliary Cancer           70
Melanoma                       66
Esophagogastric Cancer         63
Pancreatic Cancer              56
Bone Cancer                    54
Cancer of Unknown Primary      43
Bladder Cancer                 42
Ovarian Cancer                 40
Head and Neck Cancer           35
Endometrial Cancer             32

2.3 The Proposed Siamese NN

In this section, we first describe the SNN S trained to compute the similarity between two cancer mutational views, and then give details of the pseudo-code.

Siamese Architecture and One-Shot Learning. Given a pair of cancer mutational views $s(c_1)$ and $s(c_2)$, where $c_1$ and $c_2$ are cancer samples, S computes a similarity score $S(s(c_1), s(c_2))$. Then, to verify that $c_1$ and $c_2$ are of the same type, the following rule is used by the system:

$$S(s(c_1), s(c_2)) \geq \delta \implies c_1 \sim c_2$$


where δ ∈ [0, 1] is the cancer mutational view thresh-

old empirically estimated during the training of S . In

the following, we provide details about the architec-

ture and the training of S . S consists of three sec-

tions: the branches, the info, and the similarity.

The branches section consists of two identical subnetworks, each one defined as follows. It starts with a Linear layer using ReLU activation, which takes as input the cancer mutational view and returns a vector of size 1754. This layer is followed by 5 blocks, each consisting of: (i) one Linear layer with ReLU activation function, returning a vector of size 750, (ii) one Dropout layer with probability 0.1, and (iii) one BatchNormalization layer. The info section essentially consists of two layers, each taking as input the concatenation of the outputs $o_1$ and $o_2$ of the two identical subnetworks described above: one Attention layer, used to integrate S with the attention mechanism described in Section 2.1, and one Lambda layer, used to compute the Euclidean distance between $o_1$ and $o_2$. As for the similarity section, the concatenation of the Attention layer output and of the Lambda layer output is given as input to 3 blocks, each consisting of: (i) one Linear layer with ReLU activation function, returning a vector of size 320, (ii) one Dropout layer with probability 0.1, and (iii) one BatchNormalization layer. The blocks are followed by a Linear layer with output of size 1 (“similar or not similar”) and Sigmoid activation function.
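The layer sizes of one branch can be traced with a small framework-free sketch. The layer sizes follow the description above; the input size of 1403 is taken from the view size given in Section 2.2, and the function names are hypothetical. Only the fully connected layers are counted, and the sketch says nothing about how the attention layer is parameterized, since the text does not state it.

```python
import math

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def branch_layers(view_size=1403, first=1754, hidden=750, blocks=5):
    """List (name, input size, output size) for one branch subnetwork."""
    layers = [("Linear+ReLU", view_size, first)]
    size = first
    for _ in range(blocks):
        layers.append(("Linear+ReLU", size, hidden))
        layers.append(("Dropout(0.1)", hidden, hidden))
        layers.append(("BatchNormalization", hidden, hidden))
        size = hidden
    return layers

def euclidean_distance(o1, o2):
    """Distance between two branch outputs, as computed by the Lambda layer."""
    return math.dist(o1, o2)

layers = branch_layers()
total = sum(linear_params(i, o) for name, i, o in layers if name.startswith("Linear"))

for name, n_in, n_out in layers:
    print(f"{name:20s} {n_in:5d} -> {n_out:5d}")
print("Linear parameters in one branch:", total)
```

Each branch therefore maps a 1403-dimensional view to a 750-dimensional output, and the two outputs are what the info section concatenates and compares.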

One of the most interesting advantages of using SNNs is the ability to adopt the One-Shot Learning strategy, shown to be effective in identifying new classes based on one (or only a few) examples. The idea is to learn patterns and similarities on previously seen classes instead of fitting the ML model to fixed classes, in order to be able to classify previously unseen classes using one instance. This strategy is very helpful in the scenario described in Section 2.1. Indeed, it allows us to define a detection system “calibrated” on a significant initial set of cancer types, i.e., with an SNN trained on an initial set of cancer mutational views corresponding to a “representative” set of cancer types; a new cancer type can then be added to the system without having to retrain the network, simply by saving a reference cancer mutational view, which is used every time during the detection tasks. S is trained using One-Shot Learning (Algorithm 1).
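The pair-based training data used by this strategy can be sketched as follows. The helpers loosely mirror the roles of GetSimilarPairs and GetDissimilarPairs in Algorithm 1, but their signatures, the exhaustive pairing, and the toy views are assumptions for illustration; the actual implementation may sample pairs differently.

```python
from itertools import combinations

def get_similar_pairs(samples_by_class):
    """All pairs of views drawn from the same class, labelled 1."""
    pairs = []
    for views in samples_by_class.values():
        pairs.extend((a, b, 1) for a, b in combinations(views, 2))
    return pairs

def get_dissimilar_pairs(samples_by_class):
    """All pairs of views drawn from two different classes, labelled 0."""
    pairs = []
    for (_, views_a), (_, views_b) in combinations(samples_by_class.items(), 2):
        pairs.extend((a, b, 0) for a in views_a for b in views_b)
    return pairs

def leave_one_class_out(samples_by_class, held_out):
    """Training classes for one iteration: every class except the held-out one."""
    return {c: v for c, v in samples_by_class.items() if c != held_out}

# Toy two-dimensional views (assumed values), keyed by cancer type.
data = {
    "Glioma": [[0.1, 0.9], [0.2, 0.8]],
    "Melanoma": [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
    "Bone Cancer": [[0.5, 0.5]],
}

train = leave_one_class_out(data, held_out="Bone Cancer")
similar = get_similar_pairs(train)
dissimilar = get_dissimilar_pairs(train)
print(len(similar), len(dissimilar))
```

Holding one class out of the pair generation is what lets the evaluation measure how well the learned similarity transfers to a cancer type that was never seen during training.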

Pseudocode:

• One-Shot Learning (Algorithm 1). It takes as input the dataset C of cancer samples organized into N cancer-type classes, and the chosen cancer mutational view similarity threshold. First, the algorithm initializes the weights of S (line 1), and an empty list one-shot-accuracy which will contain the accuracy obtained at each evaluation step (line 2). Then, for each cancer-type class $t_i \in C$, $t_i$ is split into $t_i^l$ (labelled samples) and $t_i^u$ (unlabelled samples) (line 5). Each of the remaining N − 1 classes is split into two balanced subsets (line 11): the first one is added to the training set (line 12), from which the methods GetSimilarPairs and GetDissimilarPairs generate the similar and dissimilar training pairs (lines 14 and 15), while the second one is added to the evaluation pool (line 13). Thus, the training process (line 17) and the testing process (line 18) run, excluding $t_i$. For the evaluation, the method GetOtherPairs (line 20) is used to build a set of evaluation pairs $P_e$. Then, using the method Voting, each instance $e \in P_e$ is classified using the class with the highest votes. Finally, the trained S and the average accuracy one-shot-accuracy are returned.

• The overall detection system (Algorithm 2). It takes as input a cancer_sample and the type of request (“storage” or “detect”). At the beginning, the type of request is checked. If a “storage” is required, then the system first checks whether the cancer type of the sample is already stored in the database, using the method GetCancerType (line 4). If a cancer type has been found, then the system reports that a cancer mutational view for the cancer type of the input sample is already stored. Otherwise, the cancer type of the input sample is not stored, and the system saves the cancer mutational view of cancer_sample as enrollment view through the method SaveCancerView (line 8). Instead, if a “detect” is required, the most similar view is searched within the system (line 12).
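The storage and detection requests described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `similarity` function is a stand-in for the trained SNN (it simply maps Euclidean distance to a score in (0, 1]), the class and method names are hypothetical, and the caller passes the cancer type and view explicitly instead of a raw cancer sample.

```python
import math

def similarity(view_a, view_b):
    """Stand-in for the trained SNN: maps Euclidean distance to (0, 1]."""
    return math.exp(-math.dist(view_a, view_b))

class DetectionSystem:
    def __init__(self, threshold):
        self.threshold = threshold
        self.enrolled = {}  # cancer type -> single representative view

    def store(self, cancer_type, view):
        """Storage request: keep one enrollment view per cancer type."""
        if cancer_type in self.enrolled:
            return "cancer-type already exists!"
        self.enrolled[cancer_type] = view
        return "cancer-type stored!"

    def detect(self, test_view):
        """Detection request: return the type of the most similar enrollment
        view, or None when no score reaches the threshold."""
        best_type, best_score = None, -1.0
        for cancer_type, view in self.enrolled.items():
            score = similarity(test_view, view)
            if score > best_score:
                best_type, best_score = cancer_type, score
        if best_type is not None and best_score >= self.threshold:
            return best_type
        return None

system = DetectionSystem(threshold=0.5)
print(system.store("Glioma", [0.1, 0.9]))
print(system.store("Melanoma", [0.9, 0.1]))
print(system.store("Glioma", [0.2, 0.8]))
print(system.detect([0.15, 0.85]))
print(system.detect([5.0, 5.0]))
```

Because only one view per cancer type is kept, a detection request costs one similarity evaluation per enrolled type, and adding a type requires no retraining.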

3 PRELIMINARY EXPERIMENTS

Here, we report the results obtained during prelimi-

nary experiments carried out to assess the effective-

ness of the proposed detection system. To this aim,

we have compared the performance obtained by the

proposed SNN S described in Section 2.3, with that

obtained by a baseline Deep Neural Network (DNN)

trained to classify the cancer type of cancer samples.

In these experiments, such a baseline DNN has been

obtained by extracting only one of the subnetworks of

the branches section of S .


Algorithm 1: S One-Shot Learning.
Input : C = {t_1, . . . , t_N}, threshold
Output: ⟨S, one-shot-accuracy⟩
1 S ← InitializeSiamese(S);
2 one-shot-accuracy ← [];
3 for i = 1 to N do
4     /* Select “new” cancer type t_i */
5     ⟨t_i^l, t_i^u⟩ ← SplitSamplesByCancerType(t_i, 0.5);
6     training_set_i ← ∅;
7     testing_set_i ← ∅;
8     /* Build training/testing sets without t_i */
9     for j = 1 to N do
10        if j ≠ i then
11            ⟨t_j^l, t_j^u⟩ ← SplitSamplesByCancerType(t_j, 0.5);
12            training_set_i ← training_set_i ∪ t_j^l;
13            testing_set_i ← testing_set_i ∪ t_j^u;
14    P_s ← GetSimilarPairs(training_set_i);
15    P_d ← GetDissimilarPairs(training_set_i);
16    /* Train and Test Siamese NN */
17    S ← Train(S, P_s ∪ P_d, threshold, “Triplet Loss”);
18    accuracy ← Test(S, testing_set_i, threshold);
19    /* One-Shot Evaluation */
20    P_e ← GetOtherPairs(t_i, {t_1, . . . , t_{i−1}, t_{i+1}, . . . , t_N});
21    correct ← 0;
22    for k = 1 to |P_e| do
23        x ← Voting(P_e[k], S);
24        if x == i then
25            /* Correct classification */
26            correct ← correct + 1;
27    accuracy_i ← (correct / |P_e|) · 100;
28    one-shot-accuracy.append(accuracy_i);
29 return ⟨S, Average(one-shot-accuracy)⟩;

3.1 Results

We have split, using a stratified approach, the dataset into a training set, consisting of 80% of the cancer samples of the dataset (1,403 samples), and a testing set, consisting of the remaining 20% (351 samples). Then, the training set has been split into two subsets: (i) the first one, consisting of 80% of the training samples (1,122 samples), used to train both S and the baseline DNN, and (ii) the second one, consisting of the remaining 20% (281 samples), used to validate both S and the baseline DNN. Finally, the testing set has been used to test the two networks.

Algorithm 2: The proposed detection system.
Input : cancer_sample, request
Output: outcome
1 /* Check type of request */
2 if request == “storage” then
3     /* Storage request */
4     test-view ← GetCancerType(cancer_sample);
5     if test-view != null then
6         return “cancer-type already exists!”;
7     else
8         SaveCancerView(cancer_sample);
9         return “cancer-type stored!”;
10 else
11     /* Detection request */
12     most_similar_view ← Back-End(cancer_sample);
13     if most_similar_view != null then
14         return most_similar_view.type();
15     else
16         return “Cancer type not found!”;

Table 2 (resp. Table 3) reports the average per-

formance achieved during the testing of the baseline

DNN (resp. S ). As we can see, the average perfor-

mances achieved by S are evidently superior to those

achieved by the baseline DNN.

Table 2: Baseline DNN average testing performance.

Accuracy Precision Recall F1 score

0.7380 0.8499 0.7977 0.7879

Table 3: S average testing performance.

Accuracy Precision Recall F1 score

0.8925 0.9760 0.9763 0.9761
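As a sanity check on Table 3, the F1 score can be recomputed as the harmonic mean of the reported average precision and recall. This is only a consistency check: when metrics are averaged over classes or runs, the average F1 generally differs from the harmonic mean of the average precision and recall, so exact agreement is not guaranteed in general.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Average precision and recall reported for S in Table 3.
f1_s = f1_score(0.9760, 0.9763)
print(f"{f1_s:.4f}")
```

For S the result agrees with the reported F1 score of 0.9761 up to rounding.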

This is even more evident if we look at the data reported in Table 4, which shows the accuracy achieved by both models for each of the 16 cancer-type classes. Notice that for 5 classes (Bladder Cancer, Breast Cancer, Cancer of Unknown Primary, Hepatobiliary Cancer, Non-Small Cell Lung Cancer) the baseline DNN shows performance superior to that achieved by S, while for the remaining 11 classes S proves to be more effective. However, the maximum gap between the two models when the baseline DNN is better than S, i.e., 1.0000 − 0.9069 = 0.0931 for the class Bladder Cancer, is much lower than the maximum gap in the opposite case, i.e., 0.6949 − 0.1818 = 0.5131 for the class Endometrial Cancer.

Furthermore, the baseline DNN tends to overﬁt for

the classes that have a higher number of instances,

while the S network has a more stable behavior, try-

ing to distribute the accuracy more uniformly among

the various classes. This can be deduced from the

performances achieved in the worst cases, which are

much lower for the baseline DNN (0.1818 for the

Endometrial Cancer) than for S (0.6949 for the

Endometrial Cancer).

Table 4: Accuracy achieved by the baseline DNN and S for each of the 16 cancer-type classes.

Cancer type                    DNN accuracy    S accuracy
Prostate Cancer                0.8513          0.9605
Non-Small Cell Lung Cancer     0.9824          0.9524
Breast Cancer                  1.0000          0.9392
Soft Tissue Sarcoma            0.3030          0.7037
Colorectal Cancer              0.8823          0.9581
Glioma                         0.7391          0.8280
Hepatobiliary Cancer           0.9230          0.8666
Melanoma                       0.8666          0.9318
Esophagogastric Cancer         0.7000          0.7900
Pancreatic Cancer              0.7000          0.8333
Bone Cancer                    0.9166          0.9537
Cancer of Unknown Primary      0.8333          0.8314
Bladder Cancer                 1.0000          0.9069
Ovarian Cancer                 0.4285          0.7984
Head and Neck Cancer           0.5000          0.9296
Endometrial Cancer             0.1818          0.6949
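The per-class comparison can be recomputed directly from the values in Table 4. The sketch below copies the two accuracy columns, computes the difference for every class, and extracts the largest gap in each direction; the variable names are arbitrary.

```python
# (DNN accuracy, S accuracy) per class, copied from Table 4.
table4 = {
    "Prostate Cancer": (0.8513, 0.9605),
    "Non-Small Cell Lung Cancer": (0.9824, 0.9524),
    "Breast Cancer": (1.0000, 0.9392),
    "Soft Tissue Sarcoma": (0.3030, 0.7037),
    "Colorectal Cancer": (0.8823, 0.9581),
    "Glioma": (0.7391, 0.8280),
    "Hepatobiliary Cancer": (0.9230, 0.8666),
    "Melanoma": (0.8666, 0.9318),
    "Esophagogastric Cancer": (0.7000, 0.7900),
    "Pancreatic Cancer": (0.7000, 0.8333),
    "Bone Cancer": (0.9166, 0.9537),
    "Cancer of Unknown Primary": (0.8333, 0.8314),
    "Bladder Cancer": (1.0000, 0.9069),
    "Ovarian Cancer": (0.4285, 0.7984),
    "Head and Neck Cancer": (0.5000, 0.9296),
    "Endometrial Cancer": (0.1818, 0.6949),
}

# Positive difference: S is more accurate; negative: the baseline DNN is.
diff = {name: round(s - dnn, 4) for name, (dnn, s) in table4.items()}

dnn_better = sorted(name for name, d in diff.items() if d < 0)
worst_for_s = min(diff, key=diff.get)  # largest gap in favour of the DNN
best_for_s = max(diff, key=diff.get)   # largest gap in favour of S

print(dnn_better)
print(worst_for_s, -diff[worst_for_s])
print(best_for_s, diff[best_for_s])
```

The two extreme gaps correspond to the Bladder Cancer and Endometrial Cancer figures quoted in the text.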

3.2 Attention Feature Heatmaps

As explained in Section 2.1, one of the main goals of this work is to design a cancer-type detection system that is explainability oriented. To this aim, an attention layer has been integrated in the structure of S and used to produce special feature heatmaps which can help to visualize which parts of the input are more important for the detection. Figure 2 shows the feature heatmaps generated using the attention layer of S. We remark that, to facilitate the viewing and interpretation of the heatmaps, we have superimposed special dotted rectangles whose color is that indicated by the heatmap and whose size is proportional to the intensity of the highlighted areas.

As we can see, for each cancer-type class $C_i$, the corresponding heatmap has size $750 \times |C_i|$, where 750 is the size of the input vector of the attention layer, and $|C_i|$ indicates the number of instances of $C_i$. The most evident aspect that emerges from the visualization of the heatmaps is that each class activates a specific set of features of the vector given in input to the attention layer. This allows us to identify a sort of visual pattern extracted from the cancer mutational views given in input. However, it is important to underline that, in this preliminary version of the work, this explainability component still needs a lot of work before it can be profitably used for analysis. What is missing at the moment is a correspondence between the areas highlighted in the heatmaps and the corresponding features in the view which in fact determine the activation of the various areas.

https://github.com/FLaTNNBio/few-shot-learning-for-cancer-detection/tree/master

In the same way, however, it is important to under-

line how the production of visual information to sup-

port the analysis of this type of problem, as well as

orienting the system towards the question of explain-

ability, makes it open to the possibility of integrating

Information Visualization (IV) techniques. IV tech-

niques consist in computerized methods that involve

selecting, transforming and representing data in a vi-

sual form that facilitates human interaction for ana-

lyzing and understanding the data (Tao et al., 2004).

IV techniques have been used in many areas of Bioin-

formatics. Although they have been successfully used

in many biological domains, such as structure visual-

ization, expression proﬁle analysis, sequence analy-

sis, visualization of genome, pathway and hierarchi-

cal data, in our opinion the study of the speciﬁc pat-

terns of somatic mutations in the major cancer types

is still challenging. We believe that a system such as

the one proposed in this paper, i.e., oriented towards

an explainable and usable approach, although still in-

complete and in a preliminary form, can provide inter-

esting starting points for future work in this direction.

4 DISCUSSION AND CONCLUSION

Although the obtained results are interesting, there are

some obvious limitations that need to be addressed.

The proposed method is a preliminary attempt

to simultaneously satisfy explainability and usabil-

ity needs when applying AI techniques in Bioinfor-

matics. In our opinion, the potential in the use of

feature maps, on which however to date there is in-

sufﬁcient evidence to demonstrate their effectiveness

in terms of explainability, is ampliﬁed by the use of

SNNs whose advantage in terms of usability is evi-

dent. However, we plan to use explainability tech-

niques that can return a heatmap with respect to the

input sequence, which is easier to interpret.

We used the term “view” and not “signature” as the latter was already introduced in the literature, and there are different methods to calculate signatures. We cannot consider what we obtained as a real “signature”, since on the downloaded dataset we only considered molecular features (of which 13 are variables with clinical information) and performed PCA.

Figure 2: Attention feature heatmaps generated by S for each of the cancer-type classes.

Further investigations will be carried out with the

aim to collect larger datasets to evaluate the perfor-

mance of the model in a wider range of contexts.

REFERENCES

Bechar, M. E. A., Guyader, J.-M., El Bouz, M., Douet-

Guilbert, N., Al Falou, A., and Troadec, M.-B. (2023).

Highly performing automatic detection of structural

chromosomal abnormalities using siamese architec-

ture. Journal of Molecular Biology, 435(8):168045.

Bell, S. and Bala, K. (2015). Learning visual similarity for

product design with convolutional neural networks.

ACM transactions on graphics (TOG), 34(4):1–10.

Chen, Y., Garcia, E. K., Gupta, M. R., Rahimi, A., and Caz-

zanti, L. (2009). Similarity-based classiﬁcation: Con-

cepts and algorithms. Journal of Machine Learning

Research, 10(3).

Ciriello, G., Miller, M. L., Aksoy, B. A., Senbabaoglu, Y.,

Schultz, N., and Sander, C. (2013). Emerging land-

scape of oncogenic signatures across human cancers.

Nature genetics, 45(10):1127–1133.

Jiao, W., Atwal, G., Polak, P., Karlic, R., Cuppen, E.,

Danyi, A., De Ridder, J., van Herpen, C., Lolkema,

M. P., et al. (2020). A deep learning system accu-

rately classiﬁes primary and metastatic cancers using

passenger mutation patterns. Nature communications,

11(1):728.

Kandoth, C., McLellan, M. D., Vandin, F., Ye, K., Niu, B.,

Lu, C., Xie, M., Zhang, Q., McMichael, J. F., Wycza-

lkowski, M. A., et al. (2013). Mutational landscape

and signiﬁcance across 12 major cancer types. Na-

ture, 502(7471):333–339.

Karim, M. R., Beyan, O., Zappa, A., Costa, I. G., Rebholz-

Schuhmann, D., Cochez, M., and Decker, S. (2021).

Deep learning-based clustering approaches for bioin-

formatics. Brieﬁngs in bioinformatics, 22(1):393–

415.

Kübler, K., Karlić, R., Haradhvala, N. J., Ha, K., Kim, J.,

Kuzman, M., Jiao, W., Gakkhar, S., Mouw, K. W.,

Braunstein, L. Z., et al. (2019). Tumor mutational

landscape is a record of the pre-malignant state.

BioRxiv, page 517565.

Lawrence, M. S., Stojanov, P., Polak, P., Kryukov, G. V.,

Cibulskis, K., Sivachenko, A., Carter, S. L., Stewart,

C., Mermel, C. H., Roberts, S. A., et al. (2013). Muta-

tional heterogeneity in cancer and the search for new

cancer-associated genes. Nature, 499(7457):214–218.

Mathai, N. and Kirchmair, J. (2020). Similarity-based

methods and machine learning approaches for target

prediction in early drug discovery: performance and

scope. International Journal of Molecular Sciences,

21(10):3585.

Narmatha, P., Gupta, S., Lakshmi, T. V., and Manikavelan, D. (2023). Skin cancer detection from dermoscopic images using deep siamese domain adaptation convolutional neural network optimized with honey badger algorithm. Biomedical Signal Processing and Control, 86:105264.

Tan, P.-N., Steinbach, M., and Kumar, V. (2016). Introduc-

tion to data mining. Pearson Education India.

Tao, Y., Liu, Y., Friedman, C., and Lussier, Y. A. (2004). In-

formation visualization techniques in bioinformatics

during the postgenomic era. Drug Discovery Today:

BIOSILICO, 2(6):237–245.
