[Zinc-fans] question about searching exact structures in ZINC

John J. Irwin jji at cgl.ucsf.edu
Tue Sep 30 04:50:29 PDT 2008


Hi Rafael

Rafael Gozalbes wrote:
> Dear Zinc-fans,
>
> I have tried to search for exact structures in the ZINC database without 
> success, and I would like to know if it is possible to perform such a 
> query (let's say, is it possible for example to retrieve solely pyridine 
> if in the query I indicate the SMILES of pyridine?).
>   
If you use n1ccccc1 100 as the line in the SMILES window, 100 means
"Tanimoto 100%", i.e. identical fingerprints. As you see, it finds more
than just the molecule you are looking for (pyridine is actually on page
2 in this example), because different molecules can share the same
fingerprint. So, our fingerprint method is somewhat imprecise. This is
an area we will return to, but it is not our primary focus, which is
instead on providing ready-to-dock databases for 3D virtual screening.
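
To illustrate why a Tanimoto threshold of 100% can return more than one
molecule, here is a minimal Python sketch. It assumes fingerprints are
represented as sets of "on" bit positions; the bit numbers and variable
names are hypothetical and are not taken from the ZINC fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints match.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical bit sets, for illustration only.
pyridine_bits = {1, 4, 9}
lookalike_bits = {1, 4, 9}     # a different molecule that sets the same bits
larger_bits = {1, 4, 9, 12}    # a molecule that sets one extra bit

print(tanimoto(pyridine_bits, lookalike_bits))  # identical fingerprints
print(tanimoto(pyridine_bits, larger_bits))     # 3 shared bits out of 4
```

A score of 1.0 only says that the two bit sets are equal, so any
molecule whose fingerprint collides with the query's passes a 100%
threshold.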
> I have tried to perform my query by using the Tanimoto and Tversky 
> thresholds as indicated in the HELP pages, but without success. 
> Furthermore, I have the impression that Tversky thresholds do not work 
> and the result is the same as putting only the Tanimoto number.
>   
If you send me specific examples that you feel do not work, I will look
at them. It "works" for me, given our not-quite-right fingerprints.
For instance:

n1ccccc1 100 finds pages of hits, including pyridine (ZINC ID 895354) on
page 2 </srchdb.pl?zinc=895354>.

n1ccccc1 100 0 100 finds molecules containing pyridine, generally. It
also finds indoles (!). Hmmm. Anyway, they are generally larger
molecules that _include_ pyridine.

n1ccccc1 100 100 0 in this case finds more or less the same as n1ccccc1
100 above ("contained in...").
However, other patterns will find something different in this way. Thus
C2CCC1CCCC1C2 100 100 0
finds molecules having the 5- and 6-membered rings separately as well as
5- and 6-membered rings fused as drawn. It also seems to find a few other
things. OK, OK, our fingerprints are not quite right! But I hope you
will agree that this interface provides a pragmatic way to filter fairly
rapidly through 10^7 molecules to a number that can be inspected by eye.
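
The "contains" versus "contained in" behaviour of asymmetric Tversky
weights can be sketched in Python as follows. This is the textbook
Tversky index on sets of fingerprint bits, not the ZINC implementation;
how the numeric columns of the query line map onto alpha and beta is not
stated here, and the bit sets and names below are hypothetical.

```python
def tversky(query, target, alpha, beta):
    """Tversky index: alpha weights query-only bits, beta target-only bits."""
    query, target = set(query), set(target)
    common = len(query & target)
    only_query = len(query - target)
    only_target = len(target - query)
    denom = common + alpha * only_query + beta * only_target
    if denom == 0:
        # Convention chosen for this sketch: nothing shared, nothing penalised.
        return 0.0
    return common / denom


# Hypothetical bit sets, for illustration only.
query = {1, 4, 9}
bigger = {1, 4, 9, 12, 17}   # sets every query bit plus two more
fragment = {1, 4}            # sets only a subset of the query bits

print(tversky(query, bigger, 1.0, 0.0))    # extra target bits are ignored
print(tversky(query, fragment, 0.0, 1.0))  # missing target bits are ignored
print(tversky(query, bigger, 1.0, 1.0))    # equal weights reduce to Tanimoto
```

With alpha = 1 and beta = 0, any target whose fingerprint includes all
of the query bits scores 100%, which behaves like a "contains" search.
Swapping the weights scores 100% for targets whose bits all appear in
the query, which behaves like "contained in".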

Happy docking

John
UCSF ZINC Team