Complementarity between public and commercial databases: new opportunities in medicinal chemistry informatics

Christopher Southan, Péter Várkonyi, Sorel Muresan

Research output: Contribution to journal > Article > peer-review


The last two years have seen a dramatic expansion in public cheminformatics, as exemplified by the approximately five-fold growth of PubChem from over 50 contributing data sources. Consequently, medicinal chemists who were hitherto limited to commercial databases now also have access to public sources that they can download and/or query directly over the Web. The range of public sources, particularly where they link out to structured bioinformatic and biological data, already offers utilities that have no commercial equivalent. This work reviews compound content comparisons between selected public and commercial databases that capture bioactive content. We focused particularly on those that specify relationships between compounds and their protein targets. Our stringent filtering produced lower unique compound numbers than those reported for individual databases and thereby facilitated standardised comparisons of content. The resultant matrix shows the pairwise comparison of each database and selected subsets. Overall, the matrix showed an unexpected degree of non-overlap, thereby emphasising the complementarity gained from combining public and commercial sources. This conclusion is supported by a Venn-type analysis of GVKBIO, WOMBAT (both commercial) and PubChem (public). These databases show not only overlap but also unique bioactive content in each case because of their different strategies for source selection and data collection.
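The pairwise matrix and Venn-type analysis described in the abstract can be illustrated with plain set operations. The following is a minimal sketch rather than the authors' actual workflow: it assumes each database has already been reduced to a set of standardised, unique structure identifiers (the identifier scheme, the source names `source_a`, `source_b`, `source_c`, and the toy data are all hypothetical).

```python
from itertools import combinations


def pairwise_overlap(sources):
    """Return {(a, b): number of shared compounds} for every pair of sources."""
    return {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }


def venn_regions(sources):
    """Count compounds by the exact combination of sources that contain them."""
    regions = {}
    for compound in set().union(*sources.values()):
        members = frozenset(
            name for name, ids in sources.items() if compound in ids
        )
        regions[members] = regions.get(members, 0) + 1
    return regions


# Hypothetical toy data: each set stands in for one database's
# filtered, unique compound identifiers (not real database content).
sources = {
    "source_a": {"c1", "c2", "c3", "c4"},
    "source_b": {"c3", "c4", "c5"},
    "source_c": {"c4", "c6"},
}

overlap = pairwise_overlap(sources)
regions = venn_regions(sources)

for pair, shared in overlap.items():
    print(pair, shared)
for members, count in regions.items():
    print(sorted(members), count)
```

In this sketch, the regions keyed by a single source name correspond to content unique to that source, which is the quantity the abstract highlights as evidence of complementarity.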

Original language: English
Pages (from-to): 1502-8
Number of pages: 7
Journal: Current Topics in Medicinal Chemistry
Issue number: 15
Publication status: Published - 2007


  • Chemistry, Pharmaceutical
  • Computational Biology
  • Databases, Factual
  • Humans

