Two Billion Compounds?
I've been cracking my skull against a peculiar problem this week:
How many unique compounds have ever been made?*
I'm referring to those produced by humankind over the past 250 years - give or take a decade - of formal chemistry effort. CAS claims 100 million molecules in its collection and predicts, at the current rate of registration, another 650 million over the next 50 years.
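As a quick sanity check on the CAS numbers, the projection works out to the following average registration rate and 50-year total. This assumes the 650 million is spread evenly over the period; CAS may well model the growth differently.

```latex
\[
r = \frac{650 \times 10^{6}\ \text{compounds}}{50\ \text{yr}} = 13 \times 10^{6}\ \text{compounds/yr},
\qquad
N_{50} \approx 100 \times 10^{6} + 650 \times 10^{6} = 750 \times 10^{6}\ \text{compounds}
\]
```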
Certainly other databases exist, one large, well-curated example being ChemSpider (34 million), but I'm sure its Venn diagram against CAS overlaps quite a bit. Ditto PubChem, which, according to ChemConnector, had over 37 million structures in 2009, but also lots of errors, duplicates, and isotopomers, to hear him tell it. Outside the med-chem arena, there are exciting new collections such as the Aspuru-Guzik lab's Clean Energy Project, which aims to identify photovoltaic materials. Surely the assembled collection of privately held corporate data from all chemistry, pharma, biotech, and engineering firms must include another windfall: ~200 million compounds?
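The Venn-diagram point is the crux of any tally: simply adding database sizes double-counts the shared entries. Here is a minimal Python sketch of counting by set union instead, assuming each record has already been reduced to a canonical structure key (an InChIKey, for instance). The keys below are made-up placeholders, not real database contents.

```python
# Minimal sketch: count unique compounds across overlapping collections.
# Assumption: every record is already reduced to a canonical key
# (e.g. an InChIKey); the keys used here are placeholders, not real data.


def union_count(*collections: set[str]) -> int:
    """Number of distinct keys across all collections (set union)."""
    merged: set[str] = set()
    for keys in collections:
        merged |= keys
    return len(merged)


def overlap_count(a: set[str], b: set[str]) -> int:
    """Keys present in both collections: the middle of the Venn diagram."""
    return len(a & b)


cas = {"KEY-A", "KEY-B", "KEY-C", "KEY-D"}
chemspider = {"KEY-C", "KEY-D", "KEY-E"}

# Inclusion-exclusion for two sets: |A ∪ B| = |A| + |B| - |A ∩ B|
assert union_count(cas, chemspider) == len(cas) + len(chemspider) - overlap_count(cas, chemspider)
print(union_count(cas, chemspider))  # 5, not the naive 4 + 3 = 7
```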
So, let's try a thought exercise - say we limit the set of what we call "made," or synthesized. We won't consider polymers, whether natural (DNA, polysaccharides) or artificial (Teflon, urethanes). Screening collections, libraries, and combinatorics: unless someone produced >1 mg, I'm leaving it out. Metal complexes and salts are in, since most of the time our inorganic and formulations colleagues still produce quantities you can hold and measure (and get a melting point on!).
Granted, by referring explicitly to the public and private chemistry databases, I'm not including dark reactions: those failed experiments, or perhaps non-optimal yields, that never make it to publication. Based on my lab career (and that of my hood-mates), I'd say there are a comfortable 5-10 molecules made for every 1 that gets reported somewhere. Of course, since many of those are literature preps or repeat reactions, I don't think they inflate the count that much; truly novel molecules tend to creep into papers and patents somehow.
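The dark-reaction argument can be put as a back-of-envelope formula: reported compounds, plus the unreported extras, discounted by the share of those extras that are genuinely new. A sketch, assuming the post's 5-10 ratio and using the CAS figure of 100 million as the reported count; `novel_fraction` is a hypothetical knob, not a number from the post.

```python
# Back-of-envelope sketch of the "dark reaction" multiplier.
# reported:       compounds that reached a paper, patent, or database
# ratio:          compounds made per reported one (the post suggests 5-10)
# novel_fraction: HYPOTHETICAL share of the unreported extras that are new
#                 structures rather than literature preps or repeat reactions


def unique_made(reported: float, ratio: float, novel_fraction: float) -> float:
    """Estimated unique compounds made, including unreported novel ones."""
    if ratio < 1 or not (0.0 <= novel_fraction <= 1.0):
        raise ValueError("ratio must be >= 1 and novel_fraction in [0, 1]")
    unreported = reported * (ratio - 1)
    return reported + unreported * novel_fraction


# Sweep the post's 5-10 range against a couple of assumed novelty shares.
for ratio in (5, 10):
    for novel in (0.0, 0.05):
        estimate = unique_made(100e6, ratio, novel)
        print(f"ratio={ratio}, novel_fraction={novel}: {estimate:,.0f}")
```

With a small novelty share the estimate stays close to the reported count, which is the post's point: most of the extra lab output is repeats.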
Chemical space gurus, I apologize - I only want to count things that have been bottled, columned, purified, and analyzed. Large computational data sets of billions - unless they've been made and characterized - aren't up for consideration. Neither are metabolites isolated from plants or microbes; no fair counting what we relied on other organisms to make. S'posing this means we also leave out decomposition products and geological materials.
So them's the rules: 1 mg produced and characterized, non-polymeric, must have been made or produced with human hands. Salts and metals are in, along with isotopomers and stereoisomers.
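The rules above amount to a simple filter. A minimal sketch with a hypothetical `Compound` record (the field names are illustrative, not any database's schema). The post says both "1 mg" and ">1 mg", so treating exactly 1 mg as passing is an assumption here.

```python
from dataclasses import dataclass


# Hypothetical record type; field names are illustrative only.
@dataclass
class Compound:
    name: str
    mass_mg: float        # largest amount ever produced, in milligrams
    characterized: bool   # purified and analyzed
    polymer: bool
    made_by_humans: bool  # False for isolated metabolites, minerals, etc.


def meets_rules(c: Compound) -> bool:
    """At least 1 mg, characterized, non-polymeric, and human-made.

    Salts, metal complexes, isotopomers, and stereoisomers are not excluded
    by any rule, so each passes as its own entry.
    """
    return (
        c.mass_mg >= 1.0
        and c.characterized
        and not c.polymer
        and c.made_by_humans
    )


candidates = [
    Compound("new ligand", 25.0, True, False, True),
    Compound("library member", 0.2, True, False, True),
    Compound("Teflon", 1e6, True, True, True),
    Compound("berry metabolite", 5.0, True, False, False),
]
print([c.name for c in candidates if meets_rules(c)])  # ['new ligand']
```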
What do readers and commenters think? My guess is in the title of this post.
--
*On the Twitter, Peter Kenny points out that I should, in truth, be asking after compounds, not molecules. Fair enough.
** Another reader points out that ZINC15, the database of "stuff you can buy now," only includes ~10M at present.
[Photo: Berries by the side of the road, 2016. Not counted in billions.]