ANALYSIS OF CORRELATION STRUCTURES IN RENAL CELL
CARCINOMA PATIENT DATA
Italo Zoppis¹, Massimiliano Borsani¹, Erica Gianazza², Clizia Chinello², Francesco Rocco⁴, Giancarlo Albo⁴, André M. Deelder³, Yuri E. M. van der Burgt³, Marco Antoniotti¹, Fulvio Magni² and Giancarlo Mauri¹
¹ Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
² Department of Experimental Medicine, University of Milano-Bicocca, Monza, Italy
³ Department of Parasitology, Leiden University Medical Center, Leiden, The Netherlands
⁴ Department of Specialistic Surgical Sciences, “Ospedale Maggiore Policlinico” Foundation, Milano, Italy
Keywords:
Proteomics, Mass spectrometry, Hypotheses testing, Clinical analysis, Correlation, Bipartite graphs.
Abstract:
Mass Spectrometry (MS)-based technologies represent a promising area of research in clinical analysis. They are primarily concerned with measuring the relative intensity (abundance) of many protein/peptide molecules associated with their mass-to-charge ratios over a particular range of molecular masses. These measurements (generally referred to as proteomic signals or spectra) constitute a large amount of information that requires adequate tools to be investigated and interpreted. Following the methodology for testing hypotheses, we investigate the proteomic signals of the most common type of Renal Cell Carcinoma, the Clear Cell variant (ccRCC). Specifically, the aim of our investigation is to detect changes in the signal correlations from the control group to the case group (ccRCC or non-ccRCC). To this end, we sample and represent each population group through a graph providing, as will be defined below, the observed signal correlation structure. This way, graphs establish abstract frames of reference in our analysis, giving the opportunity to test hypotheses over their properties. In other terms, changes are detected by testing graph property modifications from group to group. We show the results by reporting the mass-to-charge values which identify bounded regions where changes have been detected. The main interest in handling these regions is to perceive which signal ranges are associated with some specific factors of interest (e.g., studying differentially expressed peaks between case and control groups) and thus to suggest potential biomarkers for future analysis or for clinical monitoring. Data were collected from patients and healthy volunteers at the Ospedale Maggiore Policlinico Foundation (Milano, Italy).
Zoppis I., Borsani M., Gianazza E., Chinello C., Rocco F., Albo G., M. Deelder A., E. M. van der Burgt Y., Antoniotti M., Magni F. and Mauri G.
ANALYSIS OF CORRELATION STRUCTURES IN RENAL CELL CARCINOMA PATIENT DATA.
DOI: 10.5220/0003856702510256
In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2012), pages 251-256
ISBN: 978-989-8425-90-4
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)
1 INTRODUCTION
Renal Cell Carcinoma (RCC) is the most common tumor in the adult kidney and accounts for about 3-4% of all adult malignancies (Brannon and Rathmell, 2010). The most frequent histological subtype (60-80%) is the Clear Cell variant (ccRCC). There are currently no biomarkers available for its early detection, for an efficient prognosis, or for optimal predictive therapeutic approaches (Drucker, 2005). At present, proteomics represents a good tool for defining biomarkers in biological fluids which can characterize and predict multifactorial diseases.
In this context, Mass Spectrometry (MS) techniques have recently been playing an important role in studying biological samples. They are primarily concerned with measuring the relative intensity (abundance) of many protein/peptide molecules associated with their mass-to-charge ratios over a particular Dalton range. The resulting measurements are often displayed as a graph, a protein/peptide profile (Fig. 1), in which each peak (or signal) identifies the pair of values given by the intensity (related to the abundance) of a molecule (y-axis) and its specific molecular mass-to-charge ratio (x-axis).
Figure 1: A typical protein/peptide profile.
The final interest in handling the huge amount of data produced from these analyses is to perceive which peaks are associated with some specific factors of interest (e.g., studying differentially expressed peaks between case and control groups) and thus to suggest potential biomarkers for future analysis (Latterich et al., 2008). However, to our knowledge, most of these studies fail to consider the following key points.
Constrained Classification. Case/control discrimination requirements for real-world problems are often constrained by a given true positive or false positive rate to ensure that the classification error for the most important class is within a desired limit.
Relational Information. Many domains are best described by relational models in which instances of multiple types are related to each other in complex ways; see for example (Getoor and Taskar, 2007). In this case, some features of one entity are often correlated with features of related entities. It is intuitive that, just as some features are not helpful for mining data sets, some relations might provide information for clustering or classification algorithms. When it comes to analyzing differentially expressed peaks in a case/control classification problem, comparisons are generally performed between protein/peptide profiles of different groups or between statistics summarizing the peaks' properties within a group (Solassol et al., 2006). Actually, different neighborhoods in the m/z spectra can be (anti)correlated with each other and this property, in turn, may change from group to group. In such a situation, the incorporation of relational information may increase the performance of the system for “difficult” data sets.
In order to manage the above issues, we formulate our framework as follows.
1. The constrained classification requirement is met by following a standard hypothesis testing approach. This way, one must decide between a null hypothesis and an alternative hypothesis. A level of significance $\alpha$ (called the size of the test) is imposed on the false alarm probability (type I error), and one seeks a test that satisfies this constraint. The experimental design which derives from this formulation provides us with a tool for detecting regions of the proteomic spectra characterized by properties differentially expressed from group to group. Specifically, in these regions correlations between signals are a “powerful” discrimination factor between groups. This detection is our primary interest in this paper.
2. Relational information is introduced by giving new graph representations for the observed samples. This way, as is commonly done to represent relationships among many interacting entities, we express correlations between signals in the m/z spectra of a patient group. Throughout, we call these representations correlation structures (in short, templates). The arguments of our hypotheses state conjectures over specific graph (i.e., template) properties. Therefore, by testing hypotheses over properties, we can decide whether these graphs have changed from the control group to the case groups (i.e., either ccRCC or non-ccRCC groups).
Given the above concerns, this paper is laid out as follows. In section 2 we introduce the preliminaries and notation. In section 3 we formulate the problem. In section 4 we report the clinical setting and some numerical results. Finally, in section 5 we conclude the paper by discussing some open issues of this work.
2 BASIC DEFINITIONS AND
NOTATION
Graphs are important structures to model a wide range of natural phenomena, particularly when one has to represent complex systems of interactions among entities. Throughout this paper $G = (V_1 \cup V_2, E)$ denotes an oriented bipartite graph; that is, $V_1$ and $V_2$ are two sets of vertices such that the arcs in $E \subseteq V_1 \times V_2$ connect vertices in one set with vertices in the other: i.e., $E$ is a set of ordered pairs $(v_i, v_j)$ with $v_i \in V_1$ and $v_j \in V_2$, constrained not to contain both of the arcs $(v_i, v_j)$ and $(v_j, v_i)$. Given an oriented bipartite graph $G = (V_1 \cup V_2, E)$, the subgraph of $G$ given by $\tilde{G} = (\tilde{A}, \tilde{E})$, with $\tilde{A} \subseteq V_1 \cup V_2$ and $\tilde{E} \subseteq E$, is a biclique if $(v_1, v_2) \in \tilde{E}$ for all $v_1 \in (\tilde{A} \cap V_1)$ and $v_2 \in (\tilde{A} \cap V_2)$. Bicliques are, therefore, “extreme” forms of highly inter-connected bipartite graphs, and they will be of interest in defining indexes for our analysis. The number of vertices $N_v = |V_1 \cup V_2|$ and the number of arcs $N_e = |E|$ are generally called the order and the size of the graph.
Moreover, graphs can generally be “summarized” in a compact way by various graph properties. Among all the properties in the literature (Brandes and Erlebach, 2005), here we focus on cohesion. A well-known index to characterize this notion is density. We treat the subject in order to give a “local” scale of characterization for it. While, in general, a “global” density characterizes the cohesion of the whole graph, with a local density index, as we will define below, we wish to analyze the cohesion (i.e., by testing hypotheses) on differently located parts of the graph. Before formally introducing this notion we give the following definition.
BIOINFORMATICS 2012 - International Conference on Bioinformatics Models, Methods and Algorithms
Definition 1 (Neighborhood). Let $G = (V_1 \cup V_2, E)$ be an oriented bipartite graph with $V_1, V_2$ two well-ordered sets of vertices. We call $M_{i,j,k}(G) = (\tilde{A}, \tilde{E})$ an $(i,j,k)$-neighborhood (or simply, a neighborhood $M_{i,j,k}$ centered in $(v_i, v_j)$) the subgraph of $G$ induced by $\tilde{A} = \tilde{V}_{i,k} \cup \tilde{V}_{j,k}$, where $\tilde{V}_{i,k} = \{v_{i-k}, \ldots, v_i, \ldots, v_{i+k}\}$ and $\tilde{V}_{j,k} = \{v_{j-k}, \ldots, v_j, \ldots, v_{j+k}\}$.¹
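As a minimal sketch of Definition 1, the fragment below builds the two index windows of a neighborhood and collects the arcs it induces. It assumes, beyond what the paper states, that vertices are identified by their 0-based position in the two well-ordered sets and that a neighborhood crossing the border of either set is rejected rather than truncated; the function names are hypothetical.

```python
def neighborhood_indices(i, j, k, n1, n2):
    """Return the index windows of the (i, j, k)-neighborhood.

    i, j  : 0-based positions of the center (v_i, v_j) in V1 and V2
    k     : ray of the neighborhood
    n1, n2: number of vertices in V1 and V2
    Assumption: windows that do not fit inside the orderings are rejected.
    """
    if k < 0:
        raise ValueError("the ray k must be non-negative")
    if i - k < 0 or i + k >= n1 or j - k < 0 or j + k >= n2:
        raise ValueError("the neighborhood does not fit inside V1 x V2")
    rows = list(range(i - k, i + k + 1))  # positions i-k, ..., i, ..., i+k in V1
    cols = list(range(j - k, j + k + 1))  # positions j-k, ..., j, ..., j+k in V2
    return rows, cols


def neighborhood_arcs(arcs, i, j, k, n1, n2):
    """Arcs of the subgraph induced by the two windows (arcs is a set of pairs)."""
    rows, cols = neighborhood_indices(i, j, k, n1, n2)
    row_set, col_set = set(rows), set(cols)
    return {(a, b) for (a, b) in arcs if a in row_set and b in col_set}


if __name__ == "__main__":
    toy_arcs = {(0, 1), (2, 3), (3, 4), (7, 8)}  # illustrative arcs, not real data
    print(neighborhood_indices(3, 4, 1, 10, 10))
    print(sorted(neighborhood_arcs(toy_arcs, 3, 4, 1, 10, 10)))
```

Each window contains $2k + 1$ vertices, so a neighborhood of ray $k$ can hold at most $(2k + 1)^2$ arcs.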
We are now able to give the following definition.
Definition 2 (Local Density). Let $G = (V_1 \cup V_2, E)$ be an oriented bipartite graph and $M_{i,j,k} = (\tilde{A}, \tilde{E})$ a neighborhood of size $S$ (i.e., with $S = |\tilde{E}|$ arcs) centered in $(v_i, v_j)$. We define the local density of $G$ in $M_{i,j,k}$ as
$$\mathrm{den}(M_{i,j,k}) = \frac{S}{|\tilde{V}_{i,k} \times \tilde{V}_{j,k}|}. \qquad (1)$$
The local density is the ratio of the number of arcs among a subset of vertices to the total number of possible arcs. It thus provides a measure of “how close” $M_{i,j,k}$ is to being an oriented biclique.
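Equation 1 can be illustrated with a short sketch. It assumes that arcs are stored as a set of 0-based index pairs and that the neighborhood fits inside the graph; the function name is hypothetical.

```python
def local_density(arcs, i, j, k):
    """Local density (Eq. 1) of the neighborhood of ray k centered in (v_i, v_j).

    arcs is a set of (row, column) index pairs, with rows indexing V1 and
    columns indexing V2 (0-based positions, an assumption of this sketch).
    """
    rows = range(i - k, i + k + 1)
    cols = range(j - k, j + k + 1)
    # S: number of arcs that fall inside the neighborhood
    size = sum(1 for (a, b) in arcs if a in rows and b in cols)
    # |V_ik x V_jk|: number of possible arcs between the two windows
    return size / (len(rows) * len(cols))


if __name__ == "__main__":
    biclique = {(a, b) for a in range(3) for b in range(3)}
    diagonal = {(0, 0), (1, 1), (2, 2)}
    print(local_density(biclique, 1, 1, 1))  # every possible arc is present
    print(local_density(diagonal, 1, 1, 1))  # 3 arcs out of 9 possible
```

A value of 1 corresponds to a biclique and a value of 0 to a neighborhood without arcs.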
Since our primary interest is to detect which regions of the spectra express different properties from the control group to the case group (in our case, correlation structure properties), we stress this point with the following definition.
Definition 3 (Bipartite Graph Region). Let $G = (V_1 \cup V_2, E)$ be an oriented bipartite graph with $V_1, V_2$ two well-ordered sets of vertices. We say that $S$ is a region of $G$ if it is the subgraph $S = (\tilde{V}_1 \cup \tilde{V}_2, \tilde{E})$ induced through the two sequences of vertices $\tilde{V}_1$ and $\tilde{V}_2$.
From a formal point of view, Definition 3 says nothing more than that $S$ is a subgraph induced by a set of vertices. We give this definition purely as a matter of convenience, to point out that any region of the proteomic spectra (i.e., a sequence of mass-to-charge ratio values) is represented here through a region of a bipartite graph. We use this term widely in section 3 to formulate our testing procedures.
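The correspondence between a range of mass-to-charge values and a graph region can be sketched as follows. The sketch assumes that the m/z values are available as a sorted list and that a region is selected by a closed interval of m/z values; both helper names and the toy values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def region_indices(mz_values, low, high):
    """Positions of the sorted m/z values that fall in [low, high]."""
    start = bisect_left(mz_values, low)
    stop = bisect_right(mz_values, high)
    return list(range(start, stop))


def induced_region(arcs, rows, cols):
    """Arcs of the subgraph induced by the vertex sequences rows (V1) and cols (V2)."""
    row_set, col_set = set(rows), set(cols)
    return {(a, b) for (a, b) in arcs if a in row_set and b in col_set}


if __name__ == "__main__":
    mz = [1000.0, 1500.0, 1719.0, 1900.0, 2084.0, 3000.0]  # illustrative values
    rows = region_indices(mz, 1719.0, 2084.0)
    toy_arcs = {(2, 3), (3, 4), (0, 5)}
    print(rows)
    print(sorted(induced_region(toy_arcs, rows, rows)))
```

Using the same index sequence for both vertex sets is only one possible choice; Definition 3 allows two different sequences.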
3 PROBLEM FORMULATION
In this section we formally define the problem inside the standard test of hypotheses framework. The subjects of our formulation are tests concerning graph properties which can be easily obtained from the following new sample representations. We start by considering a population of interest divided into two groups: case and control subjects, respectively. This population expresses the signal intensity values observable in different regions over the spectra. We sample and represent each population group through graphs which provide the observed signal correlation structure, as will be defined below in section 3.1. This way, graphs establish abstract frames of reference in our analysis, giving the opportunity to test hypotheses over their properties (section 3.2). In other terms, changes are detected by testing graph property modifications from group to group. The whole procedure provides the mass-to-charge Dalton ranges bounding the regions where significant changes have been detected.
¹ We also refer to the pair $(v_i, v_j)$ and the constant $k$ as, respectively, the center and the ray of the neighborhood.
3.1 Correlation Structure Representation
As is commonly done to represent structures of many interacting entities, we can express correlations inside patients' groups through a graph whose vertices are specific mass-to-charge ratios and whose arcs “express” correlations between the signal intensities at these specific mass-to-charge values. We call the resulting representation the (observed) correlation structure (briefly, template). More formally, we denote the groups of control and case subjects with $I^{ctrl}$ and $I^{case}$ respectively. We assume that each group (for instance $I^{ctrl}$) can be expressed through a product $I^{ctrl}_{m_1} \times I^{ctrl}_{m_2} \times \ldots \times I^{ctrl}_{m_n}$ of spaces $I^{ctrl}_{m_i}$, $i \in [n]$², given by all potential intensity values whose mass-to-charge ratio is $m_i$. We also assume that each $I^{ctrl}_{m_i}$ is endowed with a distribution function $f^{ctrl}_{I_{m_i}}$. More generally, let us give the following definition for any group of patients $g$ on which a distribution $f^{g}_{I_{m_i}}$ is defined.
Definition 4 (Template). By sampling from each pair $(f^{g}_{I_{m_i}}, f^{g}_{I_{m_j}})$, with $i \in [n]$, $j \in [n]$, two sets of i.i.d. random variables $\{I^{g}_{m_i,1}, I^{g}_{m_i,2}, \ldots, I^{g}_{m_i,n}\}$ and $\{I^{g}_{m_j,1}, I^{g}_{m_j,2}, \ldots, I^{g}_{m_j,n}\}$, we call template (of $g$) the bipartite graph $R^{g} = (V_1 \cup V_2, E)$ with vertices $V_1 = \{m_1, m_2, \ldots, m_n\}$ and $V_2 = \{m_1, m_2, \ldots, m_n\}$. Moreover, $(m_i, m_j) \in E$ only if the absolute value of Pearson's correlation coefficient exceeds a threshold $\delta$. That is,
$$|\rho^{g}_{i,j}| \geq \delta, \quad \text{with} \quad \rho^{g}_{i,j} = \frac{\sum_{k=1}^{n} (I^{g}_{m_i,k} - \bar{I}^{g}_{m_i})(I^{g}_{m_j,k} - \bar{I}^{g}_{m_j})}{\sqrt{\sum_{k=1}^{n} (I^{g}_{m_i,k} - \bar{I}^{g}_{m_i})^2} \, \sqrt{\sum_{k=1}^{n} (I^{g}_{m_j,k} - \bar{I}^{g}_{m_j})^2}}, \qquad (2)$$
where $\bar{I}^{g}_{m_i}$ and $\bar{I}^{g}_{m_j}$ are the sample means.
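A minimal sketch of Definition 4 follows: it computes the coefficient of Eq. 2 for every pair of m/z values and keeps the pairs that reach the threshold. Several choices are assumptions of this sketch and are not stated in the paper: only pairs with $i < j$ are kept, as one simple way to respect the orientation constraint and to skip the trivial self-correlations; a constant signal is treated as uncorrelated; the comparison is non-strict. The function names and the toy intensities are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long samples (Eq. 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
           * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    if den == 0.0:
        return 0.0  # assumption: a constant signal is treated as uncorrelated
    return num / den


def build_template(intensities, delta):
    """Arcs of the template of one group.

    intensities[i] holds the sampled intensities observed at m/z value m_i.
    Assumption: only pairs with i < j are kept (orientation constraint).
    """
    n_mz = len(intensities)
    return {(i, j)
            for i in range(n_mz)
            for j in range(i + 1, n_mz)
            if abs(pearson(intensities[i], intensities[j])) >= delta}


if __name__ == "__main__":
    toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, -1, 1, -1]]  # not real data
    print(sorted(build_template(toy, 0.75)))
```

Because the absolute value is used, strongly anticorrelated signals (such as the first and third toy rows) also produce an arc.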
² We use the bracket notation $[n]$ to denote the set $\{1, \ldots, n\}$ of the first $n$ positive integers.
Figure 2: The bipartite graph for RCC data (template) with one region and two neighborhoods.
Notice that, given the template $R^{g} = (V_1 \cup V_2, E)$ and any region $S$ of $R^{g}$, we can easily provide a set of densities $\{d_1, d_2, \ldots, d_n\}$ by observing a set of neighborhoods in $S$. For example, Fig. 2 shows a subgraph of $R^{g}$ with one region and two neighborhoods $M^{g}_1$ and $M^{g}_2$.³ Clearly, these neighborhoods provide the set of local density values $D^{g}_S = \{\mathrm{den}(M^{g}_1), \mathrm{den}(M^{g}_2)\}$. We assume that the values in $D^{g}_S$ are observations from a distribution (of densities) referred to the region $S$. Throughout, for any pair of templates $R^{ctrl}$ and $R^{case}$, we will consider the sets of densities $D^{ctrl}_S$ and $D^{case}_S$ as samples of observations realized in a common region $S$ to test local hypotheses over a (density) population.
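The set of densities observed in a region can be sketched as below. The paper does not specify how the neighborhoods inside a region are chosen; this sketch assumes that every center whose neighborhood fits entirely inside the region is used, and that centers are visited in a fixed order so that the densities of two templates can be paired. The function names are hypothetical.

```python
def local_density(arcs, i, j, k):
    """Local density (Eq. 1) of the neighborhood of ray k centered in (v_i, v_j)."""
    rows = range(i - k, i + k + 1)
    cols = range(j - k, j + k + 1)
    size = sum(1 for (a, b) in arcs if a in rows and b in cols)
    return size / (len(rows) * len(cols))


def region_densities(arcs, rows, cols, k):
    """Densities of all ray-k neighborhoods lying inside a region.

    rows and cols are the contiguous, sorted index sequences of the region.
    Assumption: centers closer than k to the region border are skipped.
    """
    inner_rows = rows[k:len(rows) - k]
    inner_cols = cols[k:len(cols) - k]
    return [local_density(arcs, i, j, k) for i in inner_rows for j in inner_cols]


if __name__ == "__main__":
    region = list(range(5))
    full = {(a, b) for a in region for b in region}  # toy biclique, not real data
    print(region_densities(full, region, region, 1))
```

Calling the function with the same region and ray on two templates yields two equally long lists whose entries refer to the same neighborhoods.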
3.2 Hypothesis Testing
We recall that statistical hypotheses (denoted $H_0$ and $H_A$) are competing statements concerning the population parameters. The rationale for establishing our hypotheses is deciding whether a pathology (for instance, ccRCC) has modified the cohesion of a control group's correlation structure. Since we use density to analyze cohesion, for two groups of densities to be consistent with the above rationale it suffices that $\mu^{ctrl} \neq \mu^{case}$, where $\mu^{ctrl}$ and $\mu^{case}$ are the means in the control and case groups of densities. Therefore, given (i) the (paired) samples of densities $D^{ctrl} = \{X_1, X_2, \ldots, X_n\}$ from controls and $D^{case} = \{Y_1, Y_2, \ldots, Y_n\}$ from cases, (ii) their differences $D = \{D_i : D_i = X_i - Y_i, X_i \in D^{ctrl}, Y_i \in D^{case}\}$, (iii) the sample mean $\tilde{D}$ and (iv) the sample standard deviation of the difference scores $S_d$, we can reject the null $H_0 : \mu^{ctrl} = \mu^{case}$ (no change) in favor of the alternative $H_A : \mu^{ctrl} \neq \mu^{case}$ using
$$T = \frac{\tilde{D}}{S_d / \sqrt{n}} \qquad (3)$$
as test statistic which, in turn, follows a Student's t-distribution with $n - 1$ degrees of freedom if $H_0$ is true. Thus, we apply a classical two-sample, paired t-test, rejecting the null when the realization $t$ of the statistic in expression 3 is such that $|t| > t_{1-\alpha/2}(n-1)$, where $t_{1-\alpha/2}(n-1)$ is the $(1-\alpha/2)$ quantile of Student's t-distribution with $n - 1$ degrees of freedom.
³ For the sake of clarity, to specify the group $g$ from which the neighborhood $M$ is drawn, we also use the notation $M^{g}$.
As argued above, the use of local densities gives us the opportunity to analyze the cohesion in different parts of the graph. This way, we can consider different regions over the spectra through different “local statistics”, and perform different tests. Specifically, as noted in section 3.1, given a common region $S$ for both templates $R^{ctrl}$ and $R^{case}$, we obtain two sets of densities $D^{ctrl}_S$ and $D^{case}_S$. As previously stated, using these data as observations provided by sampling both the control and the case groups in $S$, we are able to test $H_0 : \mu^{ctrl}_S = \mu^{case}_S$ against $H_A : \mu^{ctrl}_S \neq \mu^{case}_S$ for any region $S$; that is, by observing different regions, we test the cohesion modifications from group to group in different parts of the spectra. Given the above arguments, we can define different classes of case/control tests through the following procedures:
Control vs. ccRCC Tests (Noted as CVR Tests).
1. We represent $R^{ctrl}$ by sampling, from each pair $(f^{ctrl}_{I_{m_i}}, f^{ctrl}_{I_{m_j}})$ in the control group, the sets of i.i.d. random variables $\{I^{ctrl}_{m_i,1}, I^{ctrl}_{m_i,2}, \ldots, I^{ctrl}_{m_i,n}\}$ and $\{I^{ctrl}_{m_j,1}, I^{ctrl}_{m_j,2}, \ldots, I^{ctrl}_{m_j,n}\}$.
2. We represent $R^{rcc}$ by sampling, from each pair $(f^{rcc}_{I_{m_i}}, f^{rcc}_{I_{m_j}})$ in the ccRCC group, the sets of i.i.d. random variables $\{I^{rcc}_{m_i,1}, I^{rcc}_{m_i,2}, \ldots, I^{rcc}_{m_i,n}\}$ and $\{I^{rcc}_{m_j,1}, I^{rcc}_{m_j,2}, \ldots, I^{rcc}_{m_j,n}\}$.
3. Given any region $S$ common to both $R^{ctrl}$ and $R^{rcc}$, we obtain the local densities $D^{ctrl}_S = \{\mathrm{den}(M^{ctrl}_1), \mathrm{den}(M^{ctrl}_2), \ldots, \mathrm{den}(M^{ctrl}_n)\}$ and $D^{rcc}_S = \{\mathrm{den}(M^{rcc}_1), \mathrm{den}(M^{rcc}_2), \ldots, \mathrm{den}(M^{rcc}_n)\}$. Then, for each $S$, we employ these sets (as observations from a density population) together with Eq. 3 (as test statistic) in the following test: $H_0 : \mu^{ctrl}_S = \mu^{rcc}_S$ vs. $H_A : \mu^{ctrl}_S \neq \mu^{rcc}_S$, where $\mu^{ctrl}_S$ and $\mu^{rcc}_S$ are, respectively, the (population) means of the densities in the control and ccRCC groups.
Control vs. Non-ccRCC Tests (CVNR Tests).
1. We represent $R^{ctrl}$ by sampling, from each pair $(f^{ctrl}_{I_{m_i}}, f^{ctrl}_{I_{m_j}})$ in the control group, the sets of i.i.d. random variables $\{I^{ctrl}_{m_i,1}, I^{ctrl}_{m_i,2}, \ldots, I^{ctrl}_{m_i,n}\}$ and $\{I^{ctrl}_{m_j,1}, I^{ctrl}_{m_j,2}, \ldots, I^{ctrl}_{m_j,n}\}$.
2. We represent $R^{nrc}$ by sampling, from each pair $(f^{nrc}_{I_{m_i}}, f^{nrc}_{I_{m_j}})$ in the non-ccRCC group, the sets of i.i.d. random variables $\{I^{nrc}_{m_i,1}, I^{nrc}_{m_i,2}, \ldots, I^{nrc}_{m_i,n}\}$ and $\{I^{nrc}_{m_j,1}, I^{nrc}_{m_j,2}, \ldots, I^{nrc}_{m_j,n}\}$.
3. Given any region $S$ common to both $R^{ctrl}$ and $R^{nrc}$, we obtain the local densities $D^{ctrl}_S = \{\mathrm{den}(M^{ctrl}_1), \mathrm{den}(M^{ctrl}_2), \ldots, \mathrm{den}(M^{ctrl}_n)\}$ and $D^{nrc}_S = \{\mathrm{den}(M^{nrc}_1), \mathrm{den}(M^{nrc}_2), \ldots, \mathrm{den}(M^{nrc}_n)\}$. Then, for each $S$, we employ these sets (as observations from a density population) together with Eq. 3 (as test statistic) in the following test: $H_0 : \mu^{ctrl}_S = \mu^{nrc}_S$ vs. $H_A : \mu^{ctrl}_S \neq \mu^{nrc}_S$, where $\mu^{ctrl}_S$ and $\mu^{nrc}_S$ are, respectively, the means of the densities in the control and non-ccRCC population groups.
ccRCC vs. non-ccRCC Tests (RVNR Tests).
1. We represent $R^{rcc}$ by sampling, from each pair $(f^{rcc}_{I_{m_i}}, f^{rcc}_{I_{m_j}})$ in the ccRCC group, the sets of i.i.d. random variables $\{I^{rcc}_{m_i,1}, I^{rcc}_{m_i,2}, \ldots, I^{rcc}_{m_i,n}\}$ and $\{I^{rcc}_{m_j,1}, I^{rcc}_{m_j,2}, \ldots, I^{rcc}_{m_j,n}\}$.
2. We represent $R^{nrc}$ by sampling, from each pair $(f^{nrc}_{I_{m_i}}, f^{nrc}_{I_{m_j}})$ in the non-ccRCC group, the sets of i.i.d. random variables $\{I^{nrc}_{m_i,1}, I^{nrc}_{m_i,2}, \ldots, I^{nrc}_{m_i,n}\}$ and $\{I^{nrc}_{m_j,1}, I^{nrc}_{m_j,2}, \ldots, I^{nrc}_{m_j,n}\}$.
3. Given any region $S$ common to both $R^{rcc}$ and $R^{nrc}$, we obtain the local densities $D^{rcc}_S = \{\mathrm{den}(M^{rcc}_1), \mathrm{den}(M^{rcc}_2), \ldots, \mathrm{den}(M^{rcc}_n)\}$ and $D^{nrc}_S = \{\mathrm{den}(M^{nrc}_1), \mathrm{den}(M^{nrc}_2), \ldots, \mathrm{den}(M^{nrc}_n)\}$. Then, for each $S$, we employ these sets (as observations from a density population) together with Eq. 3 (as test statistic) in the following test: $H_0 : \mu^{rcc}_S = \mu^{nrc}_S$ vs. $H_A : \mu^{rcc}_S \neq \mu^{nrc}_S$, where $\mu^{rcc}_S$ and $\mu^{nrc}_S$ are, respectively, the means of the densities in the ccRCC and non-ccRCC population groups.
We point out that the tests in each of the above classes share the same alternative conjecture but use test statistics related to different parts of the graph. We also note that, by evaluating which tests perform best, we may observe in which regions of the spectra there are the best chances of seeing discriminative effects between alternatives.
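The decision rule built on Eq. 3 can be sketched in a few lines. The sketch uses only the standard library, so the critical value $t_{1-\alpha/2}(n-1)$ is passed in by the caller (for instance, read from a table of Student's t quantiles); the function names and the density values in the example are hypothetical and are not taken from the study.

```python
import math


def paired_t_statistic(x, y):
    """Test statistic of Eq. 3 for two paired samples of densities."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two paired samples with at least two observations are required")
    diffs = [a - b for a, b in zip(x, y)]          # D_i = X_i - Y_i
    n = len(diffs)
    mean_d = sum(diffs) / n                        # sample mean of the differences
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    s_d = math.sqrt(var_d)                         # sample standard deviation S_d
    if s_d == 0.0:
        raise ValueError("all differences are equal; the statistic is undefined")
    return mean_d / (s_d / math.sqrt(n))


def reject_null(x, y, critical_value):
    """Two-sided decision: reject H0 when |t| exceeds the supplied quantile."""
    return abs(paired_t_statistic(x, y)) > critical_value


if __name__ == "__main__":
    ctrl = [0.80, 0.75, 0.90, 0.85, 0.70]  # illustrative densities, not real data
    case = [0.60, 0.65, 0.70, 0.55, 0.60]
    # 2.776 is the tabulated 0.975 quantile of Student's t with 4 degrees of freedom
    print(paired_t_statistic(ctrl, case))
    print(reject_null(ctrl, case, 2.776))
```

Running one such test per region and per class (CVR, CVNR, RVNR) gives the list of regions in which the null hypothesis is rejected.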
4 CLINICAL SETTING AND
NUMERICAL RESULTS
The above analysis has been applied to samples collected, after informed consent from all subjects participating in the study, at the Ospedale Maggiore Policlinico Foundation (Milano, Italy) using a standardized protocol. As a first step, midstream morning urine (100 mL) was collected in tubes. After centrifugation at 3000 rpm for 10 minutes, samples were divided into aliquots. For peptide and protein profiling, the eluates from Weak Cation Exchange magnetic bead extraction were automatically spotted onto a Matrix-Assisted Laser Desorption Ionization (MALDI) target plate. All samples were analyzed using an UltraFlex II MALDI-TOF/TOF MS instrument (Bruker Daltonics), and mass spectra were acquired in positive linear mode in the m/z range of 1000-12000. ClinProTools 2.2 software (Bruker Daltonics) was used for all MS data interpretation procedures (Bosso et al., 2008).
4.1 Clinical Data
The sample cohort consists of 85 control subjects (58 men, 27 women) and 102 Renal Cell Carcinoma patients (64 men, 38 women). The mean age was 45 years for controls (range 30–68 years) and 64 years for patients (range 33–88 years). The pathological group could be divided into patients affected by the clear cell subtype (ccRCC) and patients affected by other histological subtypes (79 ccRCC and 23 non-ccRCC, respectively). ccRCC samples were classified according to the 2002 TNM (tumor-node-metastasis) classification system.
4.2 Numerical Results
Before discussing the numerical results, it might be useful to remember that the decisions of a statistical test depend on a number of factors, e.g., the sample size, the test statistic, the significance level and the critical value. Moreover, we introduced new parameters which may influence the result as well, i.e., the threshold $\delta$ (employed for the template representation) and the neighborhood ray $K$. We also stress that, in each class CVR, CVNR and RVNR (as defined in section 3.2), tests follow common conjectures (e.g., $\mu^{ctrl} = \mu^{rcc}$ and $\mu^{ctrl} \neq \mu^{rcc}$) but use statistics referred to different regions over the spectra. With the above concerns in mind, we summarize the targets of our experiments as follows.
1. For each class of tests, we evaluate (empirically) which threshold $\delta$ and ray $K$ detect the lowest number of correlation structure changes from control to case groups. In other terms, for different pairs of $\delta$ and $K$ we count the number of significant tests rejecting the null hypothesis. For this, we constrain $\delta$ to range within a set of high Pearson's correlation coefficients.
2. By using the values of $\delta$ and $K$ obtained above, we detect the mass-to-charge ratio bounds which identify modified regions over the spectra, that is, regions where we have detected a correlation structure modification at a specific level of significance.
Specifically, we first established a fixed number of regions (i.e., 7), a set of arbitrary thresholds $T = \{0.75, 0.76, 0.77, 0.78, 0.79, 0.80\}$ and a set of arbitrary rays $R = [6]$. Then, for each combination of $\delta \in T$ and $K \in R$, we evaluated (for each class of tests) the number of significant tests rejecting the null hypothesis over the spectra. In Table 1 we report, for each class, both the pair $(\delta, K)$ employed to detect the lowest number (i.e., $n = 1$) of tests rejecting the null, and the mass-to-charge ranges which identify the rejection regions at a 5% significance level.
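The parameter sweep described above can be sketched as a plain grid search. The sketch assumes that the per-pair count of rejected tests is supplied by a callback (here the hypothetical `count_rejections`), that "lowest number" means the smallest positive count, and that ties are broken by the smallest $(\delta, K)$ pair; none of these details is specified in the paper, and the toy callback does not reproduce the study's results.

```python
from itertools import product


def select_parameters(thresholds, rays, count_rejections):
    """Sweep all (delta, K) pairs and pick the one with the fewest rejections.

    count_rejections(delta, k) is a hypothetical callback returning how many
    regional tests reject the null for that pair.
    Assumptions: pairs with zero rejections are ignored; ties are broken by
    the smallest (delta, K) pair.
    """
    counts = {(d, k): count_rejections(d, k) for d, k in product(thresholds, rays)}
    positive = {pair: c for pair, c in counts.items() if c > 0}
    if not positive:
        return None, counts
    best = min(positive, key=lambda pair: (positive[pair], pair))
    return best, counts


if __name__ == "__main__":
    thresholds = [0.75, 0.76, 0.77, 0.78, 0.79, 0.80]  # the set T
    rays = [1, 2, 3, 4, 5, 6]                          # the set R = [6]

    def toy_counts(delta, k):
        # Placeholder standing in for the real testing pipeline.
        return 1 if (delta, k) == (0.75, 2) else 3

    best, counts = select_parameters(thresholds, rays, toy_counts)
    print(best, len(counts))
```

In a full pipeline the callback would build both templates with the given threshold, compute the regional densities with the given ray, and run the paired test in each of the 7 regions.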
Table 1: Mass-to-Charge regions for Control vs. Case.
Test class   δ      K   From (m/z)   To (m/z)
CVR          0.75   2   1719         2084
CVNR         0.75   2   1719         2084
RVNR         0.75   2   4625         5374
5 CONCLUSIONS
This study showed the possibility of using the extracted peptides to separate healthy subjects from tumor patients and, above all, to distinguish non-ccRCC from ccRCC. By testing hypotheses on a specific graph property (i.e., density), we derived decision procedures able to provide the clinical modeler with lists of Dalton ranges where distinguishing regions have been detected. We point out that, from a clinical perspective, in order to apply this approach (for example, to decide the membership group of new subjects), it will be necessary to compute a correlation matrix (whose components are given by Eq. 2) over a set of technical replicates. This will be the most obvious extension for our next work when new (biological and technical) samples become available. Moreover, we can summarize as follows some further extensions in which we are immediately interested.
(I) We need to determine conclusively the identity of the lists of signals in any differentially expressed region. The theoretical framework of section 3 was employed to detect spectral signals for their biological importance (for instance, to suggest potential biomarkers for future analyses) even though their identity is not yet established. Identification of the peptides/proteins generating these signals is a very laborious process, implying the analysis of the urine extract with different MS approaches. Therefore, in order to recognize multiple candidate biomarkers for a specific disease, it is important first to determine their diagnostic “power” and then to investigate further their biological role in the disease mechanisms.
(II) The dominant approach to classifier design in clinical studies has been to minimize the probability of error; see for example (Dudoit et al., 2002). Yet it is clear that failing to detect a malignant tumor has drastically different consequences than erroneously flagging a benign tumor. In other words, classification requirements are often constrained by a given true positive rate (which bounds the type II error) and a given false positive rate (type I error) to ensure that the classification error for the most important class is within a desired limit. In order for our procedures to take both of these requirements into account, it is necessary to constrain the type II error as well. We point out that here, by constraining only the type I error, we applied the methodology mainly to provide the list of modified regions.
ACKNOWLEDGEMENTS
This work was supported by grants of the Italian Ministry of Research: PRIN 2006, FIRB 2007 (RBRN07BMCT_11), FAR 2006–2011, and in part by both the EuroKUP COST Action (BM0702) and the NEDD project of the “Regione Lombardia”.
REFERENCES
Bosso, N., Chinello, C., Picozzi, S., Gianazza, E., Mainini, V., Galbusera, C., Raimondo, F., Perego, R., Casellato, S., Rocco, F., Ferrero, S., Bosari, S., Mocarelli, P., Kienle, M. G., and Magni, F. (2008). Human urine biomarkers of renal cell carcinoma evaluated by ClinProt. Proteomics - Clin. App., 2:1036–1046.
Brandes, U. and Erlebach, T., editors (2005). Network Analysis: Methodological Foundations, volume 3418 of Lect. Notes in Computer Science. Springer.
Brannon, A. and Rathmell, W. (2010). Renal cell carcinoma: where will the state-of-the-art lead us? Curr. Oncol. Rep., 12:193–201.
Drucker, B. (2005). Renal cell carcinoma: current status and future prospects. Cancer Treat. Rev., 31:536–545.
Dudoit, S., Fridlyand, J., and Speed, T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. J. of the American Stat. Assoc., 97(457):77–87.
Getoor, L. and Taskar, B. (2007). Introduction to Statistical Relational Learning. The MIT Press.
Latterich, M., Abramovitz, M., and Leyland-Jones, B. (2008). Proteomics: New technologies and clinical applications. Eur. Jour. Cancer, 44:2737–2741.
Solassol, J., Jacot, W., Lhermitte, L., Boulle, N., Maudelonde, T., and Mangé, A. (2006). Clinical proteomics and mass spectrometry profiling for cancer detection. Expert Rev. Proteomics, 3(3):311–320.