[Zinc-fans] Comparison mol2 and smiles files
John J. Irwin
jji at cgl.ucsf.edu
Thu Dec 6 14:35:39 PST 2007
Hi Emanuele
I am finally getting back to your question. You are quite correct. I
just did:
> zcat sial_p0.smi.gz | awk '{print $2}' | sort -u > smiles_codes
> zcat sial_p0.?.mol2.gz | grep ZINC | sort -u > mol2_p0_codes
> wc -l smiles_codes mol2_p0_codes
114763 smiles_codes
112069 mol2_p0_codes
> diff smiles_codes mol2_p0_codes |wc -l
4265
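A note on the mol2 step: `grep ZINC` prints every line that contains the string ZINC, not only the molecule names. Here is a minimal sketch that extracts only the names. It assumes the standard Tripos layout, in which the molecule name is the first line after each `@<TRIPOS>MOLECULE` record header. The function name `mol2_names` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each molecule name from mol2 input.
# Assumes the Tripos convention that the name is the first line after
# every "@<TRIPOS>MOLECULE" header. Reads the named files, or stdin if none.
mol2_names() {
    # 'prev' holds the previous line; when it was a MOLECULE header,
    # the current line is the molecule name (print its first field).
    awk 'prev ~ /^@<TRIPOS>MOLECULE/ { print $1 } { prev = $0 }' "$@"
}
```

For example, `zcat sial_p0.?.mol2.gz | mol2_names | sort -u > mol2_p0_codes` would take the place of the `grep ZINC` pipeline.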
I agree that there are a little over 2,500 differences between the mol2 and
SMILES files for Sigma-Aldrich in ZINC version 7, a little over 2% of the
library. This is, of course, wrong, and we will try to do better in
future versions.
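Note that `diff ... | wc -l` counts lines of diff output. That total includes hunk headers such as `12a13` and, for changed blocks, `---` separator lines, so it overstates the number of missing IDs. Below is a sketch that counts the IDs present in only one list, using the standard `comm` utility. The helper name `compare_ids` is hypothetical. The helper assumes one ID per line in each input file.

```shell
#!/bin/sh
# Hypothetical helper: count IDs found in only one of two ID lists.
# Each missing ID is counted exactly once, unlike `diff | wc -l`.
compare_ids() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation; force byte order.
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    # comm -23: lines only in the first file; comm -13: only in the second.
    # tr strips the leading blanks some wc implementations print.
    n1=$(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')
    n2=$(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')
    rm -f "$a" "$b"
    echo "only in $1: $n1"
    echo "only in $2: $n2"
}
```

Running `compare_ids smiles_codes mol2_p0_codes` would print one count for each file. Replacing `-23` or `-13` with the same `comm` call minus `| wc -l` lists the actual IDs instead of counting them.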
Thank you for reporting this. I have put it on the ZINC errata page,
which I am finally getting around to updating.
http://wiki.compbio.ucsf.edu/wiki/index.php/ZINC:Errata
John
Emanuele wrote:
> Dear all ZINC-fans,
>
> I downloaded the Asinex and Sigma-Aldrich databases from ZINC version 7
> in both SMILES and MOL2 formats. For both databases I found differences
> in the molecules present in the archives; that is, some molecules are
> present in the multi-mol2 file but not in the SMILES file, and vice
> versa.
>
> Is this possible, or did I make an error in the comparison?
>
> Thanks in advance
> Regards,
>
> Emanuele
> _______________________________________________
> Zinc-fans mailing list
> Zinc-fans at docking.org
> http://blur.compbio.ucsf.edu/mailman/listinfo/zinc-fans
>