distances between the atoms composing each active
(respectively inactive) molecule of the learning set.
Finally, ILP was applied to genomic annotations of
proteins from public databases (e.g., Pfam,
InterPro, PROSITE) in order to predict protein-protein
interactions for one specific species (Tran et
al., 2005).
To the best of our knowledge, our study is one of
the first to characterize 3D protein-binding sites
using ILP. It is a natural follow-up to
previous applications of ILP that focused on the
prediction of protein 3D structure, now that
protein 3D structures are becoming increasingly available.
As for post-ILP analysis, our approach is
innovative and represents a step forward in the
interpretation of ILP results within a
knowledge discovery process. Indeed, it offers the
expert effective assistance in exploring the
learning results, including confronting them with domain
knowledge. By facilitating theory interpretation, our
approach puts the tricky problem of heuristic
parameter selection into perspective: it makes it
possible to benefit from more than one theory. Without it,
an upstream investigation into the effect of numerous
parameters on discriminative selection criteria is
required, as reported by Turcotte et al. (2001).
We are convinced that our approach can be
adapted to other learning problems. Using FCA
makes it possible to discover higher-level
knowledge units by extracting, from the formal
concepts, first-order logic association rules between
the ILP rule bodies (Pasquier et al., 1999). Another
perspective of this study concerns the scaling up of
ILP programs. Theories can be produced on distinct
descriptor subsets corresponding to distinct views
of the examples. An FCA-based joint interpretation of
the resulting theories can then enable the discovery
of rules involving descriptors from the distinct
subsets.
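To make this perspective concrete, the following minimal Python sketch builds a toy formal context and derives formal concepts and association rules from it. It is an illustration under stated assumptions, not the implementation used in this study: the context (objects are examples, attributes are identifiers of the ILP rules covering them), the names site1 to site4 and r1 to r3, and the function names are all hypothetical. The sketch is also propositional: it relates rule identifiers by co-coverage, whereas the association rules discussed above relate first-order rule bodies. The concept enumeration is a naive closure over attribute subsets, suitable only for tiny contexts.

```python
from itertools import combinations

# Toy formal context (hypothetical data): each example is mapped to the
# set of ILP rules, possibly from several theories, that cover it.
context = {
    "site1": {"r1", "r2"},
    "site2": {"r1", "r2", "r3"},
    "site3": {"r2", "r3"},
    "site4": {"r1", "r2"},
}


def extent(attrs, context):
    """Examples covered by every rule in attrs."""
    return frozenset(o for o, covered in context.items() if attrs <= covered)


def intent(objs, context):
    """Rules shared by every example in objs (all rules if objs is empty)."""
    shared = set().union(*context.values())
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset.

    Naive and exponential in the number of rules.
    """
    attrs = sorted(set().union(*context.values()))
    concepts = set()
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            ext = extent(frozenset(subset), context)
            concepts.add((ext, intent(ext, context)))
    return concepts


def association_rules(context, min_conf=0.6):
    """Pairs (a, b, confidence) meaning 'examples covered by a tend to be
    covered by b', kept when confidence >= min_conf."""
    attrs = sorted(set().union(*context.values()))
    rules = []
    for a in attrs:
        ext_a = extent(frozenset({a}), context)
        if not ext_a:
            continue
        for b in attrs:
            if a == b:
                continue
            both = extent(frozenset({a, b}), context)
            conf = len(both) / len(ext_a)
            if conf >= min_conf:
                rules.append((a, b, conf))
    return rules


concepts = formal_concepts(context)
rules = association_rules(context)
```

In this toy context, every example covered by r1 or by r3 is also covered by r2, so r2 behaves as a more general rule. If r1 and r3 came from theories learned on distinct descriptor subsets, the concept whose intent contains both would point to examples where the two views agree.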
REFERENCES
Aloy, P., Russell, R., 2003. InterPreTS: Protein Interaction
Prediction through Tertiary Structure. Bioinformatics
Applications Note, 19 (1): 161-162.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G.,
Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P.
E., 2000. The Protein Data Bank. Nucleic Acids
Research, 28: 235-242.
De Raedt, L., 2008. Logical and Relational Learning.
Springer.
Diella, F., Gould, C. M., Chica, C., Via, A., Gibson, T. J.,
2008. Phospho.ELM: a database of phosphorylation
sites – update 2008. Nucleic Acids Res., 36 (Database
issue): D240-4.
Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S-
H., 1999. Recognition of a protein fold in the context
of the SCOP classification. Proteins: Structure,
Function, and Genetics, 35(4): 401-407.
Durek, P., Schudoma, C., Weckwerth, W., Selbig, J.,
Walther, D., 2009. Detection and characterization of
3D-signature phosphorylation site motifs and their
contribution towards improved phosphorylation site
prediction in proteins. BMC Bioinformatics, 10: 117.
Finn, P., Muggleton, S., Page, D., Srinivasan, A., 1998.
Pharmacophore Discovery Using the Inductive Logic
Programming System PROGOL. Machine Learning,
30(2-3):241-273.
Ganter, B., Wille, R., 1999. Formal concept analysis:
Mathematical foundations. Springer, Heidelberg,
Germany.
Guharoy, M., Chakrabarti, P., 2005. Conservation and
relative importance of residues across protein-protein
interfaces. PNAS, 102(43):15447-15452.
Humphrey, W., Dalke, A., Schulten, K., 1996. VMD-
Visual Molecular Dynamics. J. Molec. Graphics, 14:
33-38.
Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan,
N. J., Chung, S., Emili, A., Snyder, M., Greenblatt, J.
F., Gerstein, M., 2003. A Bayesian networks approach
for predicting protein-protein interactions from
genomic data. Science, 302(5644): 449-53.
Jones, S., Thornton, J., 1997. Analysis of protein-protein
interaction sites using surface patches. J. Mol. Biol.,
272: 121-32.
King, R., 2011. Logic, Automation, and the Future of
Biology. Proceedings of the Spring School on
Modelling Complex Biological Systems, Sophia-
Antipolis, France.
Muggleton, S., 1991. Inductive Logic Programming. New
Generation Computing, 8(4): 295-318.
Muggleton, S., De Raedt, L., 1994. Inductive Logic
Programming: Theory And Methods. Journal of Logic
Programming, 19/20: 629-679.
Obata, T., Yaffe, M. B., Leparc, G. G., Piro, E. T.,
Maegawa, H., Kashiwagi, A., Kikkawa, R., Cantley, L.
C., 2000. Peptide and protein library screening defines
optimal substrate motifs for AKT/PKB. J. Biol. Chem.,
275: 36108-36115.
Page, D., Craven, M., 2003. Biological applications of
multi-relational data mining. SIGKDD Explorations,
5(1): 69-79.
Page, D., Srinivasan, A., 2003. ILP: A Short Look Back
and a Longer Look Forward. Journal of Machine
Learning Research, 4: 415-430.
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L., 1999.
Efficient mining of association rules using closed
itemset lattices. Journal of Information Systems,
24(1): 25-46.
Punta, M. et al., 2012. The Pfam protein families database.
Nucleic Acids Research, 40 (Database Issue): D290-
D301.
Smith, G., Sternberg, M., 2002. Prediction of protein-
protein interactions by docking methods. Current
Opinion in Structural Biology, 12(1):28-35.