5.5 displays a SMILES string for this substance. Fig.
5.6 shows its 2-D structure.
Figure 5.5: Nelarabine SMILES String – This is a linear
string representing Nelarabine. The fragment in bold [red]
was used to search for potential leads.
Figure 5.6: Nelarabine 2-D Structure – This formula
displays hexagonal and pentagonal rings and atoms and
groups (say NH2 = Amino), linked by chemical bonds.
The fragment COC1=NC(N)=NC2=C1N=C,
highlighted in bold [red] in Fig. 5.5, served as search
input. The result sets seen in Table 3 (from 249 up to
429 results) are manageable. The number of substances
of interest is of the same order of magnitude.
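As a minimal sketch of how the search input relates to the full string, the snippet below locates the fragment inside the Nelarabine SMILES string by plain character matching. Both strings are copied from the text; the helper name `locate_fragment` is hypothetical and is not part of the original work. Note that the fragment is a character-level slice, so it need not be a valid SMILES string on its own (here ring-closure label 2 is opened but never closed).

```python
# Full SMILES string and search fragment, both copied from the text.
NELARABINE_SMILES = "COC1=NC(N)=NC2=C1N=CN2C1OC(CO)C(O)C1O"
SEARCH_FRAGMENT = "COC1=NC(N)=NC2=C1N=C"


def locate_fragment(smiles, fragment):
    """Return (start, end) character offsets of fragment within smiles.

    Hypothetical helper: plain substring matching, no chemistry involved.
    """
    start = smiles.find(fragment)
    if start < 0:
        raise ValueError("fragment is not a substring of the SMILES string")
    return start, start + len(fragment)


start, end = locate_fragment(NELARABINE_SMILES, SEARCH_FRAGMENT)
# Part of the string that was NOT used as search input.
remainder = NELARABINE_SMILES[end:]
print(f"fragment spans characters {start}..{end}; remainder = {remainder}")
```

The fragment turns out to be a prefix of the full string, so the remainder is simply the tail that follows it.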
Table 3: Nelarabine Search Results.
#  Drug/Lead                     Search rank (engine/rank)  Notes
1  6-O-Methyl Guanosine          yahoo/1                    guanine derivative used in drug design
2  alfuzosin                     google/2                   treats benign prostatic hyperplasia
3  6-methoxy-9-methyl-9H-purine  yahoo/7                    substance for drug design
Result set sizes: bing = 429; google = 276; yahoo = 249.
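The statement that the result sets are of the same order of magnitude can be checked with a few lines of arithmetic. The sketch below uses only the three sizes reported under Table 3; the variable names are illustrative assumptions.

```python
import math

# Result set sizes as reported under Table 3.
RESULT_SET_SIZES = {"bing": 429, "google": 276, "yahoo": 249}

smallest = min(RESULT_SET_SIZES.values())
largest = max(RESULT_SET_SIZES.values())

# Ratio between the largest and the smallest result set.
spread_ratio = largest / smallest

# Decimal order of magnitude (floor of log10) for each engine.
magnitudes = {
    engine: math.floor(math.log10(size))
    for engine, size in RESULT_SET_SIZES.items()
}
same_order = len(set(magnitudes.values())) == 1

print(f"range: {smallest}..{largest}, ratio: {spread_ratio:.2f}, "
      f"same order of magnitude: {same_order}")
```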
6 DISCUSSION
Some preliminary conclusions from the case studies are:
a) one can control the jump size between
consecutive leads in the Lead-&-Search
protocol by controlling the leads' overlap;
b) randomly sliced SMILES strings give small
result sets, being improbable combinations of
letters; the risk of semantic ambiguity is low;
c) one expects an approximately inversely
proportional relation between a component's
string size and its result set size;
d) direct search of drug names, say Vancomycin,
is too weak to be of value for discovery.
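Points (a) and (b) above can be illustrated with a minimal sketch of character-level slicing. It is an assumption of this sketch, not a statement of the authors' implementation, that a lead is a fixed-length contiguous slice and that the "jump size" equals the slice length minus the overlap. The function names `random_slice` and `consecutive_leads` are hypothetical.

```python
import random

# Nelarabine SMILES string, copied from the text.
NELARABINE_SMILES = "COC1=NC(N)=NC2=C1N=CN2C1OC(CO)C(O)C1O"


def random_slice(smiles, length, rng):
    """Pick one random contiguous slice of `length` characters."""
    if not 0 < length <= len(smiles):
        raise ValueError("length must be between 1 and len(smiles)")
    start = rng.randrange(len(smiles) - length + 1)
    return smiles[start:start + length]


def consecutive_leads(smiles, length, overlap):
    """Return fixed-length slices in which each slice shares `overlap`
    characters with the previous one, so the jump is length - overlap."""
    if not 0 <= overlap < length <= len(smiles):
        raise ValueError("need 0 <= overlap < length <= len(smiles)")
    step = length - overlap
    last_start = len(smiles) - length
    return [smiles[i:i + length] for i in range(0, last_start + 1, step)]


# Large overlap: small jumps, hence many closely related leads.
small_jumps = consecutive_leads(NELARABINE_SMILES, length=10, overlap=7)
# Small overlap: large jumps, hence few and more distant leads.
large_jumps = consecutive_leads(NELARABINE_SMILES, length=10, overlap=2)

# A seeded generator keeps the random pick reproducible.
lead = random_slice(NELARABINE_SMILES, length=10, rng=random.Random(0))
print(len(small_jumps), len(large_jumps), lead)
```

Raising the overlap shortens the jump between consecutive leads and yields more of them; lowering it does the opposite. Slices produced this way are character strings and are not guaranteed to be chemically valid SMILES fragments.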
The Lead notion has been used within a closed
computational framework (Wise, 1983; Exman,
1988), but has not yet been applied to the Web.
Another linear molecular naming system is
InChI (McNaught, 2006). SMILES is more readable
than InChI, but conveys less information. Our choice
of SMILES can be revisited if this proves necessary.
In order to demonstrate the actual efficiency of
the approach for drug discovery, an extensive
investigation of a variety of drug families is needed.
6.1 Main Contribution
Our main contribution is the randomized "Lead"
proposal phase added to "Search", forming the
"Lead-&-Search" protocol, a powerful discovery
mechanism on the Web.
REFERENCES
Exman, I. and D. H. Smith, "Get a Lead & Search: A
Strategy for Computer-Aided Drug Design", in Symp.
Expert Systems Applications in Chemistry, ACS, 196th
National Meeting, Los Angeles, p. COMP-69, (1988).
Homans, S. W., "NMR Spectroscopy Tools for Structure-
Aided Drug Design", Angewandte Chemie Int. Ed.
Vol. 43, pp. 290-300, (2004).
Konyk, M., A. De Leon and M. Dumontier, "Chemical
Knowledge for the Semantic Web", in A. Bairoch, S.
Cohen-Boulakia, and C. Froidevaux (eds.): DILS,
LNBI 5109, pp. 169-176, Springer, Berlin (2008).
McNaught, Alan, "The IUPAC International Chemical
Identifier: InChI", Chemistry Int., Vol. 28 (6) (2006).
OpenSMILES Standard – http://www.opensmiles.org/
Draft (November 2007).
Searls, D. B., "Data integration: challenges for drug
discovery", Nature Reviews Drug Discovery 4, 45-58
(January 2005).
Weininger, D., "SMILES, a chemical language and
information system. 1. Introduction to methodology
and encoding rules", J. Chem. Inf. Comput. Sci. Vol.
28. pp. 31-36 (1988).
Weininger, D., Weininger, A., Weininger, J.L. "SMILES.
2. Algorithm for generation of unique SMILES
notation", J. Chem. Inf. Comput. Sci, 29, pp. 97-101
(1989).
Wise, M., R. D. Cramer, D. Smith and I. Exman,
"Progress in 3-D Drug Design: the use of Real Time
Colour Graphics and Computer Postulation of
Bioactive Molecules in DYLOMMS", in J. Dearden
(ed.), Quantitative Approaches to Drug Design, Proc.
4th European Symp. on "Chemical Structure-
Biological Activity: Quantitative Approaches", Bath
(U.K.), pp. 145-146, Elsevier, Amsterdam, 1983.
COC1=NC(N)=NC2=C1N=CN2C1OC(CO)C(O)C1O
KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval