Our goal is to define a feature vector that represents each pocket of P_SES in a discriminant way, so as to facilitate processing and statistical analysis. Many of the vector’s parameters need a reference plane, which we have identified with the plane containing the largest CH triangle involved in the pocket.
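As a minimal sketch of how such a reference plane could be obtained, the Python fragment below picks the largest-area triangle among those supplied and returns the plane through it. It assumes the CH triangles involved in the pocket are already available as triples of 3D vertices; the names `reference_plane` and `height` are hypothetical and not part of the original method. The sign of the normal depends on the vertex order of the chosen triangle, so heights are defined only up to a global sign unless the orientation is fixed separately.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _norm(a):
    return math.sqrt(_dot(a, a))

def reference_plane(triangles):
    """Return (point, unit_normal) for the plane of the largest-area triangle.

    `triangles` is an iterable of (v0, v1, v2) vertex triples (assumed input).
    The triangle area is half the norm of the edge cross product, so the
    largest norm identifies the largest triangle.
    """
    def edge_cross(t):
        return _cross(_sub(t[1], t[0]), _sub(t[2], t[0]))

    best = max(triangles, key=lambda t: _norm(edge_cross(t)))
    n = edge_cross(best)
    length = _norm(n)
    if length == 0.0:
        raise ValueError("all triangles are degenerate")
    return best[0], tuple(c / length for c in n)

def height(point, plane):
    """Signed distance of `point` from `plane` = (origin, unit_normal)."""
    origin, normal = plane
    return _dot(_sub(point, origin), normal)
```

For example, `height(p, reference_plane(tris))` gives the signed distance of a surface point `p` from the reference plane, which is the kind of quantity a height-based parameter would use.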
2.2 Basic Features
The problem of defining an optimal feature set is complicated because, besides building robust models, it is also important to reduce the amount of resources required to describe the data accurately and without ambiguity, starting from a very large set of redundant and relevant information. An expert can help, but can usually construct only a set of application-dependent features.
An extended set of general features, which in particular cases can be partitioned into well-organized, effective subsets, is the following: i) Pocket Volume (Laskowski et al., 1996); ii) Surface to Volume Ratio; iii) Skewness and Kurtosis of the Height Distribution (Blunt and Jiang, 2003); iv) Mouth Aperture (in detail, we consider area, perimeter, and the perimeter to area ratio); v) Travel Depth (Coleman and Sharp, 2006; Giard et al., 2008); vi) Top Peaks and Valleys; vii) Summit Density and Mean Summit Curvatures (the average of the principal curvatures of both peaks and valleys) (Coleman et al., 2005; Cantoni et al., 2009a); viii) Interfacial Area Ratio; and ix) Residue Conservation (Glaser et al., 2006); the conservation score for each residue in a given protein can be obtained from the ConSurf-HSSP database (Glaser et al., 2005).
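To make feature iii) concrete, the following Python sketch computes the skewness and kurtosis of a list of heights. It assumes that the heights have already been measured as signed distances from the reference plane, and that the two parameters are the usual standardized third and fourth central moments; the text does not spell out the exact formula, so this is an assumption. The function name `height_moments` is hypothetical.

```python
def height_moments(heights):
    """Return (skewness, kurtosis) of a sequence of heights.

    Uses population central moments: skewness = m3 / m2**1.5 and
    kurtosis = m4 / m2**2 (non-excess, so a Gaussian gives 3).
    """
    n = len(heights)
    if n < 2:
        raise ValueError("need at least two height samples")
    mean = sum(heights) / n
    dev = [h - mean for h in heights]
    m2 = sum(d ** 2 for d in dev) / n
    if m2 == 0.0:
        raise ValueError("flat height distribution: moments undefined")
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

A positive skewness indicates a distribution dominated by a few high values, a negative one by a few low values; the kurtosis measures how heavy the tails of the height distribution are.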
3 THE DATA STRUCTURE
One of the most successful approaches to shape analysis and description is the structural one. We think that it is particularly fruitful in proteomics, where morphology plays a fundamental role. A complex shape, like the Re, is segmented into its components (the set of pockets); each pocket can subsequently be decomposed into simpler regions, and the complete description is given in terms of the regions’ features and their spatial relationships. Pocket shapes, however, can be rather complex and not directly decomposable into simpler regions. In that case we can re-apply to each pocket the same segmentation process that splits the Re into pockets, and this process can be executed recursively. In this way a sequence of approximations is built and, at each stage, exact measures of the remaining concavities, based on the parameters described above, are given. This hierarchical structural description and analysis, guided by the concavities (Borgefors and Sanniti Di Baja, 1992), seems to us very promising.
The basic structure of the approach was first introduced in (Arcelli and Sanniti Di Baja, 1978), and in (Borgefors and Sanniti Di Baja, 1996) it was finally named the concavity tree: “components of C (Re in our
3D case) for which the internal section of the perime-
ter (surface in 3D) exceeds the external section are
structured concavities. A more sophisticated anal-
ysis of these regions is performed to extract further
features. The envelopes CH of the concavity regions
are computed using the same process as that applied
to the original pattern. Merging can occur while fill-
ing meta-concavities, so the concavity regions must
be labeled and processed individually. For each con-
cavity region, its meta-concavities are identified. The
process continues until all regions are convex”.
The final result is a hierarchical structure, the (meta) concavity tree. At each level the concavities can be analyzed and described by means of the feature vector above, computed at each node: the features defined for concavities can obviously also be computed for the meta-concavities.
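The recursive construction can be illustrated with a simplified 2D sketch in Python. It works on a set of integer grid cells, takes the convex hull of the cell centres, and treats each 4-connected component of the hull minus the region as a concavity, recursing until a region equals its own filled hull. This is only an illustration under stated assumptions: it omits the significance test on internal versus external perimeter sections and the handling of merging described in the quoted passage, and all names (`ConcavityNode`, `build_tree`, `hull_fill`, `components`) are hypothetical.

```python
from dataclasses import dataclass, field

def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(points)
    if len(pts) <= 2:
        return pts
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and _cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain
    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]

def hull_fill(cells):
    """All grid cells lying inside or on the convex hull of `cells`."""
    hull = convex_hull(cells)
    n = len(hull)
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return {(x, y)
            for x in range(min(xs), max(xs) + 1)
            for y in range(min(ys), max(ys) + 1)
            if all(_cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0
                   for i in range(n))}

def components(cells):
    """Split a set of cells into 4-connected components."""
    remaining, out = set(cells), []
    while remaining:
        seed = remaining.pop()
        comp, stack = {seed}, [seed]
        while stack:
            x, y = stack.pop()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        out.append(comp)
    return out

@dataclass
class ConcavityNode:
    cells: set            # region described by this node
    level: int            # 0 = original shape, 1 = concavities, ...
    children: list = field(default_factory=list)

def build_tree(cells, level=0):
    """Recursively attach the concavities of `cells` as child nodes."""
    node = ConcavityNode(set(cells), level)
    for comp in components(hull_fill(node.cells) - node.cells):
        node.children.append(build_tree(comp, level + 1))
    return node
```

In this sketch a node without children is a terminating node, and the feature vector would be computed from the `cells` of each node. The recursion stops because the hull vertices of a region belong to the region itself, so the filled hull of each concavity is strictly smaller than that of its parent.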
The “concavities” (three “pockets” and one “tunnel”) and four second-level meta-concavities of a 2D example are shown in Figure 1. Figure 2 shows the concavities and meta-concavities of levels two, three and four for the tunnel of level one. The corresponding concavity tree is shown in Figure 3. Note that terminating nodes, i.e. regions without significant concavities, are highlighted with a bordeaux contour.
Figure 1: A 2D example with a tunnel and three pockets in a section composed of three connected components (in brown). The closed black curve corresponds to the first-level convex hull, and the brown dash-dotted border encloses the area under analysis. Part of the second level, with three meta-concavities (A, B, C), is shown; five third-level terminating-node components (A1, A2, C1, D1, and D2) are also highlighted.
PROTEINS POCKETS ANALYSIS AND DESCRIPTION