[Zinc-fans] basic question re: numbers

John J. Irwin jji at cgl.ucsf.edu
Fri Jan 23 13:29:46 PST 2009


Hi Martin

Thanks for your interest in ZINC and its contents.

Martin Stoermer wrote:
> Hi folks,
>
> A question I've been meaning to ask for a while, but put off because I
> thought I could work it out for myself...
>
> • According to "Fun Zinc facts", this morning there are about 22M
> compounds "in" ZINC
This is correct.
> • If I add up all the compounds "loaded" under the individual vendors
> (http://zinc.docking.org/vendor0/) it comes to ~13M compounds
> • The Property subset "Zinc everything" (#10) has 8.5M compounds.
Also correct.
>
> Does this mean that ~9M compounds are in the DB but not fully loaded,
> or is it that they don't pass filtering, or both? I know that
> the property subsets are created and of course become slightly
> redundant almost immediately so #10 will not ever be the same 22M
> compounds as in "Fun Zinc facts", but where are the other compounds in
> the vendor-supplied-database -> "Zincification" - pipeline?
There are several factors at work. Filtering is obviously important:
many molecules do not qualify for any particular subset. Two other big
factors are:

* Compounds that were once available for purchase but, as far as we
know, no longer are. These drop out of the ready-to-download subsets,
but remain in ZINC in case they reappear in a future catalog.

* The ready-to-download subsets are static, and take as much as a month
to generate, so there are often molecules in ZINC that haven't yet been
included in any exported subset.
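
As a rough illustration, the gaps between the three counts you quote can
be tallied with the short Python sketch below. The figures are the
approximate numbers from your message, and the constant and function
names are made up for this example. The sketch only does the
subtraction; it does not try to attribute either gap to a specific cause.

```python
# Approximate counts quoted in this thread, in compounds.
TOTAL_IN_ZINC = 22_000_000      # "Fun Zinc facts"
LOADED_BY_VENDOR = 13_000_000   # sum over the individual vendor pages
SUBSET_EVERYTHING = 8_500_000   # property subset #10, "Zinc everything"


def count_gaps(total, vendor_loaded, subset):
    """Return the differences between the three quoted counts."""
    return {
        "total_minus_vendor_loaded": total - vendor_loaded,
        "vendor_loaded_minus_subset": vendor_loaded - subset,
    }


gaps = count_gaps(TOTAL_IN_ZINC, LOADED_BY_VENDOR, SUBSET_EVERYTHING)
for name, n in gaps.items():
    # Report in millions, matching the precision used in the thread.
    print(f"{name}: ~{n / 1e6:.1f}M")
```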

In our view the "lead like" (1M or so) and the "fragment like" (200K or
so) subsets are the most useful for ligand discovery and are probably
all that most users will need or want. It is very important to our own
work to keep the purchasability rate of ZINC high, so we are rather
cavalier about removing from the exported subsets any compounds that we
do not think you will be able to acquire.

I hope this is useful.

Good docking!

John
UCSF ZINC Team


