NIST Upgrades Widely Used Database of Molecular ‘Fingerprints’

When scientists need to identify an unknown compound, they do what a police detective might do. They get fingerprints—in this case, the “molecular fingerprints” of the unknown compound—and run them through a database of fingerprints from known suspects to look for a match.

One of the world’s largest and most widely used databases of molecular fingerprints is the NIST Mass Spectral Library, and that library just got larger still. On June 6, NIST added fingerprints from more than 25,000 compounds to the library, bringing the total number to more than 265,000.



Article originally published on Reprinted with permission.



This library contains fingerprints of organic compounds—a class of carbon-containing molecules that exist in an endless variety, both natural and man-made.

“This library is used by scientists and engineers in virtually every industry,” said Stephen Stein, the NIST chemist who oversees the Mass Spectral Library. He rattled off just a few uses: diagnosing medical conditions, conducting forensic investigations, identifying environmental pollutants and developing new fuels.

“And anything having to do with food,” he said, since the taste of a food is determined by the complex mixture of organic molecules within it. “The flavor and fragrance industries live and die by this stuff.”

In this 1948 photo, a NIST staff member operates an early mass spectrometer. Credit: NIST


To generate the molecular fingerprint of an organic compound, scientists put a sample of the compound into a laboratory instrument called a mass spectrometer. In the most common technique, the instrument heats the sample to vaporize it, then bombards it with a beam of high-energy electrons. This breaks the molecules into electrically charged fragments, which the instrument separates according to their mass.

When you line up the fragments in order of their mass-to-charge ratio, you get the molecule’s distinctive “mass spectrum,” which looks like a barcode and functions like a fingerprint.
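Matching an unknown spectrum against a library can be illustrated with a short program. This is a minimal sketch, not NIST’s search software, which uses its own scoring. It assumes that each spectrum is stored as a mapping from integer mass-to-charge value to peak intensity and that similarity is measured with a plain cosine (dot-product) score. The function names and the toy spectra for "compound_A" and "compound_B" are hypothetical, invented for illustration.

```python
import math

# Hypothetical representation: integer m/z value -> peak intensity.
# Real library entries carry much more information than this.
Spectrum = dict[int, float]


def normalize(spectrum: Spectrum) -> Spectrum:
    """Scale intensities so the tallest peak equals 100 (relative intensity).

    This matches how spectra are usually plotted; it does not change the
    cosine score below, which ignores overall scale.
    """
    base = max(spectrum.values())
    return {mz: 100.0 * value / base for mz, value in spectrum.items()}


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Cosine of the angle between two spectra treated as vectors over m/z."""
    shared = set(a) & set(b)
    dot = sum(a[mz] * b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(unknown: Spectrum, library: dict[str, Spectrum], top: int = 3):
    """Rank library entries by similarity to the unknown, best first."""
    query = normalize(unknown)
    scores = [
        (name, cosine_similarity(query, normalize(ref)))
        for name, ref in library.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    # Toy spectra with made-up peaks; not real compounds.
    library = {
        "compound_A": {43: 100.0, 58: 40.0, 71: 10.0},
        "compound_B": {55: 80.0, 69: 100.0, 83: 30.0},
    }
    unknown = {43: 90.0, 58: 38.0, 71: 12.0}
    for name, score in best_matches(unknown, library):
        print(f"{name}: {score:.3f}")
```

With these toy values, "compound_A" ranks first with a score close to 1, and "compound_B", which shares no peaks with the unknown, scores 0. Real search programs also weight peaks and handle noise and mixtures, which this sketch leaves out.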

The number of organic compounds in the world is astronomical, and any database can only hope to capture a tiny fraction of them. So Stein and his colleagues have to focus on the compounds they think are most important.


The mass spectrum for the synthetic opioid fentanyl. The mass spectrum is like a molecular fingerprint and is used to identify unknown compounds. The red lines represent the charged fragments of the molecule created during analysis. The vertical axis shows the relative intensity, or amount, of each fragment, while the horizontal axis shows the mass-to-charge ratio for each fragment. Credit: NIST


Among the important compounds whose fingerprints are included in this upgrade are many dangerous drugs. These include dozens of synthetic cannabinoids—aka “synthetic marijuana”—which can cause psychotic episodes, seizures and death. Also included are more than 30 types of fentanyl, the synthetic opioid that is driving an epidemic of overdoses nationwide.

Having the fingerprints of these compounds in the Mass Spectral Library will help law enforcement and public health officials fight the spread of these new and dangerous substances.

NIST has released the latest version of the Mass Spectral Library, and the software needed to run it, to more than 60 distributors that bundle the data and software into mass spectrometry instruments. Owners of existing instruments can also download the latest version from distributors online.

The NIST Mass Spectral Library is actually several libraries, each covering a variation of the basic analytical method. The library covering a technique called tandem mass spectrometry has grown by more than 65 percent in the number of compounds it includes. For more information on the various libraries and software tools, check out NIST’s Mass Spectrometry Data Center.

NIST has been publishing its Mass Spectral Library since 1989. To ensure that the data in that library is accurate, NIST scientists apply a very high level of quality control. “It’s a very specialized activity, and nobody else does it at the level and scale we do,” Stein said.