types of descriptor sets. In our case, the suggested
KDD approach succeeded in unifying the ligand-
and structure-based approaches for virtual screening.
The prediction models based on the CONF descriptor set can now be tested as knowledge-based filters in the VSM-G screening funnel, upstream of the flexible docking step, in order to reduce the number of molecules that must be processed by the docking software.
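The funnel idea can be sketched in a few lines of Python: a trained activity model acts as a cheap pre-filter, and only the molecules it predicts as active are passed on to docking. This is a minimal illustration under stated assumptions, not the VSM-G interface or the CONF models themselves. `prefilter`, `predict_active`, the threshold rule and the toy descriptor values are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a trained activity model mapping a descriptor vector
# to True (predicted active) or False (predicted inactive).
Model = Callable[[Sequence[float]], bool]


def prefilter(library: Dict[str, Sequence[float]], model: Model) -> List[str]:
    """Keep only the molecules the model predicts as active.

    Only these survivors would be sent to the (expensive) flexible docking step.
    """
    return [mol_id for mol_id, descriptors in library.items() if model(descriptors)]


def predict_active(descriptors: Sequence[float]) -> bool:
    # Hypothetical stand-in for a learned model: one threshold rule on the first descriptor.
    return descriptors[0] > 0.5


if __name__ == "__main__":
    # Toy library: molecule id -> descriptor vector (invented values for illustration only).
    library = {
        "mol_1": [0.9, 0.1],
        "mol_2": [0.2, 0.7],
        "mol_3": [0.6, 0.4],
        "mol_4": [0.1, 0.3],
    }
    to_dock = prefilter(library, predict_active)
    print(to_dock)  # ['mol_1', 'mol_3']
    print(f"docking {len(to_dock)} of {len(library)} molecules")
```

Passing the model in as a plain function keeps the filtering step independent of how the classifier was learned, so any rule set or tree produced by the KDD process could be slotted in at that stage of the funnel.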
We see two main directions for future work. Firstly, we plan to apply relational data mining methods, which mine relational data directly, in order to produce more expressive regularities (Finn et al., 1998; Dzeroski & Lavrac, 2001; Page & Craven, 2003). This would allow us to take into account the chemical groups composing a ligand as well as atom-specific attributes. Secondly, we want to explore various definitions of ligand activity, together with sets of relational descriptors, in order to produce improved activity prediction models.
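To illustrate what a relational description adds over a flat descriptor vector, the Python sketch below stores a ligand as a list of atoms carrying their own attributes plus the chemical groups those atoms belong to, and evaluates one existential rule over that structure. The attributes (element, partial charge), the group labels and the rule `has_oxygen_in_group` are hypothetical examples chosen for illustration; an ILP system such as PROGOL would learn rules of this kind from data rather than have them written by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    atom_id: str
    element: str
    partial_charge: float  # atom-specific attribute (illustrative)


@dataclass
class Group:
    name: str            # chemical group label, e.g. "carboxylic_acid" (illustrative)
    atom_ids: List[str]  # identifiers of the atoms that form this group


@dataclass
class Ligand:
    name: str
    atoms: List[Atom]
    groups: List[Group] = field(default_factory=list)


def has_oxygen_in_group(ligand: Ligand, group_name: str, max_charge: float) -> bool:
    """Hypothetical relational rule:
    there exists a group G named `group_name` and an atom A in G
    such that A is an oxygen with partial charge <= max_charge."""
    atoms_by_id = {a.atom_id: a for a in ligand.atoms}
    return any(
        atoms_by_id[aid].element == "O" and atoms_by_id[aid].partial_charge <= max_charge
        for g in ligand.groups
        if g.name == group_name
        for aid in g.atom_ids
    )


if __name__ == "__main__":
    # Toy ligand with invented values; real ligands differ in atom and group counts.
    lig = Ligand(
        name="toy_ligand",
        atoms=[Atom("a1", "C", 0.1), Atom("a2", "O", -0.6), Atom("a3", "O", -0.5)],
        groups=[Group("carboxylic_acid", ["a1", "a2", "a3"])],
    )
    print(has_oxygen_in_group(lig, "carboxylic_acid", -0.4))  # True
    print(has_oxygen_in_group(lig, "aromatic_ring", -0.4))    # False
```

Because the number of atoms and groups varies from one ligand to the next, such an existential test does not map onto a fixed column of a flat attribute table without first choosing some aggregation; relational learners instead search over rules of this shape directly.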
REFERENCES
Beautrait, A. et al., 2008. Multiple-step virtual screening using VSM-G: overview and validation of fast geometrical matching enrichment. Journal of Molecular Modeling 14, 135-148.
Bennett, D.J., Carswell, E.L., Cooke, A.J., Edwards, A.S., Nimz, O., 2008. Design, structure activity relationships and X-ray co-crystallography of non-steroidal LXR agonists. Curr Med Chem 15, 195-209.
Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., Shindyalov, I., Bourne, P., 2000. The Protein Data Bank. Nucleic Acids Research 28, 235-242.
Cai, W., Xu, J., Shao, X., Leroux, V., Beautrait, A., Maigret, B., 2008. SHEF: a vHTS geometrical filter using coefficients of spherical harmonic molecular surfaces. J Mol Model 14, 393-401.
Dzeroski, S., Lavrac, N. (Eds.), 2001. Relational Data Mining. Springer.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., 1996. From Data Mining to Knowledge Discovery: an Overview. MIT Press, Cambridge, MA.
Feher, M., 2006. Consensus scoring for protein-ligand interactions. Drug Discovery Today 11, 421-428.
Finn, P., Muggleton, S., Page, D., Srinivasan, A., 1998. Pharmacophore Discovery Using the Inductive Logic Programming System PROGOL. Machine Learning 30(2-3), 241-270.
Halgren, T.A., Murphy, R.B., Friesner, R.A., Beard, H.S., Frye, L.L., Pollard, W.T., Banks, J.L., 2004. Glide: A New Approach for Rapid, Accurate Docking and Scoring. J. Med. Chem. 47, 1750-1759.
Janowski, B.A. et al., 1999. Structural requirements of ligands for the oxysterol liver X receptors LXRalpha and LXRbeta. Proc Natl Acad Sci U S A 96, 266-271.
Jones, G., Willett, P., Glen, R.C., Leach, A.R., Taylor, R., 1997. Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267, 727-748.
Jorgensen, W.L., 2004. The Many Roles of Computation in Drug Discovery. Science 303(5665), 1813-1818.
Karp, P., Lee, T., Wagner, V., 2008. BioWarehouse: Relational Integration of Eleven Bioinformatics Databases and Formats. In: Data Integration in the Life Sciences, LNCS 5109. Springer, Berlin/Heidelberg.
Kirchmair, J., Distinto, S., Schuster, D., Spitzer, G., Langer, T., Wolber, G., 2008. Enhancing drug discovery through in silico screening: strategies to increase true positives retrieval rates. Current Medicinal Chemistry 15, 2040-2053.
Köppen, H., 2009. Virtual screening - what does it give us? Curr Opin Drug Discov Devel 12(3), 397-407.
Krovat, E.M., Steindl, T., Langer, T., 2005. Recent Advances in Docking and Scoring. Current Computer-Aided Drug Design 1, 93-102.
Lala, D.S., 2005. The liver X receptors. Curr Opin Investig Drugs 6, 934-943.
Maron, O., Lozano-Perez, T., 1998. A framework for multiple-instance learning. In: Advances in Neural Information Processing Systems (NIPS), pp. 570-576. MIT Press.
Page, D., Craven, M., 2003. Biological applications of multi-relational data mining. SIGKDD Explorations 5(1), 69-79.
Spencer, T.A. et al., 2001. Pharmacophore analysis of the nuclear oxysterol receptor LXRalpha. J Med Chem 44, 886-897.
Winkler, D.A., 2002. The role of quantitative structure-activity relationships in molecular discovery. Briefings in Bioinformatics 3, 73-86.
Witten, I., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques, second ed. Morgan Kaufmann.
A KDD APPROACH FOR DESIGNING FILTERING STRATEGIES TO IMPROVE VIRTUAL SCREENING