Massively Parallel Sequencing: Blazing New Trails in DNA Analysis – ISHI News

Mar 16 2022



Today’s blog is written by guest blogger Ken Doyle, Promega. Reposted from The ISHI Report with permission.


Analysis of short tandem repeats (STRs) by capillary electrophoresis remains the most popular method for human identification in forensic laboratories. However, interest in massively parallel sequencing (MPS) is growing rapidly, as reflected in the increasing number of MPS-related presentations and posters at the International Symposium on Human Identification (ISHI) over the past few years.



MPS enables high-throughput analysis of forensic samples, and MPS-based methods can provide information on many more loci than conventional STR analysis, thereby offering greater discriminatory power. MPS is especially useful for processing challenging samples, such as mixtures or degraded DNA. Modifications to a typical sequencing workflow, such as hybridization-capture sequencing, have made it possible to obtain DNA profiles from samples that previously yielded no useful information by conventional STR analysis. Recently, a hybridization-capture MPS workflow was validated by the Armed Forces Medical Examiner System’s Armed Forces DNA Identification Laboratory (AFMES-AFDIL) and used to identify historical human remains (see Technological Innovation Aids the Identification of Fallen Soldiers).


At ISHI 32, held in Orlando, Florida (and virtually), several presentations and posters discussed a variety of human identification applications for MPS. Three presentations by graduate students are highlighted here.

DNA Analysis of Historical Remains

MPS has proved valuable in working with historical human remains and other highly degraded samples. Mitochondrial DNA (mtDNA) analysis is typically used to identify older samples, because each cell contains many copies of mtDNA but only two copies of the nuclear genome, so mtDNA is more likely to survive in usable quantities. Forensic scientists at AFMES-AFDIL routinely work with skeletal samples that have been compromised by chemical treatment or otherwise degraded over time. A critical step in the analysis workflow is the initial extraction of DNA from the samples.


Elena I. Zavala, a PhD student at the Max Planck Institute for Evolutionary Anthropology, presented the results of a research project comparing DNA extraction and MPS library preparation methods for historical samples. The project was carried out in collaboration with AFMES-AFDIL, including FS2 Research Analyst Jacqueline T. Thomas (SNA International LLC contractor supporting AFMES-AFDIL).


Conventional STR analysis often fails with historical human remains for several reasons. The samples may be chemically treated for preservation, which results in permanent changes to the DNA. In addition, the DNA may be highly fragmented and contaminated with both non-human DNA from the environment and more recent human DNA. “This means that DNA fragments of interest are limited and are likely too small for conventional methods,” Thomas explains. “Conventional methods utilize PCR, requiring DNA fragments that are approximately 100 base pairs and larger, while many of our historical samples average 60 base pairs and lower.” MPS-based analysis can overcome the size limitations, as sequencing libraries can be prepared successfully from very small DNA fragments.
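To make the size argument concrete, the short Python sketch below estimates what share of a sample’s DNA fragments would be long enough for a PCR-based assay with the roughly 100 base pair minimum Thomas describes. The fragment lengths are invented for illustration (chosen to average about 60 base pairs), and the 30 base pair floor for short-fragment library preparation is a hypothetical value, not a figure from the study.

```python
from statistics import mean

def fraction_at_least(lengths: list[int], min_len: int) -> float:
    """Share of fragments whose length (bp) is at least min_len."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= min_len) / len(lengths)

# Hypothetical fragment lengths (bp) for a degraded skeletal extract.
fragments = [35, 42, 48, 51, 55, 58, 60, 63, 70, 110]

PCR_MIN_BP = 100     # approximate minimum quoted for conventional PCR-based typing
LIBRARY_MIN_BP = 30  # hypothetical floor for short-fragment library preparation

print(f"mean length: {mean(fragments):.1f} bp")                              # 59.2 bp
print(f"usable by PCR: {fraction_at_least(fragments, PCR_MIN_BP):.0%}")       # 10%
print(f"usable for library prep: {fraction_at_least(fragments, LIBRARY_MIN_BP):.0%}")  # 100%
```

In this toy extract, only one fragment in ten clears the PCR threshold, while all of them remain available to a library preparation that accepts very short inserts.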


Extracting usable DNA from historical remains—typically, from bones or teeth—poses its own challenges. “It is important to have a very sterile environment, starting with sampling and powdering of the bone samples, to avoid introducing modern contaminants,” Thomas says. Several variables need to be optimized for successful DNA extraction, including the amount of bone powder extracted, the composition and relative volume of the buffer used to demineralize the bone powder, and the temperature and time for demineralization. Thomas notes that demineralization of powdered bone or teeth should yield a crude extract that has released as much DNA as possible from the bone cells, or osteocytes, without causing further damage to the DNA.


The method used to purify DNA from the crude extract is also critical. Some popular DNA purification methods rely on size-exclusion spin columns, in which DNA fragments smaller than a certain size or molecular weight limit are eliminated. If such a column, normally designed to purify DNA for STR analysis, is used with historical DNA samples, the process may yield little or no usable DNA, because the STR analysis workflow is based on larger DNA fragments. In addition, Thomas says, “the purification step of the extraction must sufficiently purify out inhibitors that could interfere with the various enzymes used in library preparation.” Although historical and modern samples may contain many of the same inhibitors, the buffers and enzymes in library preparation kits may be less tolerant of them than the reagents in conventional STR kits. “Overall, the extraction method should aim to retain as much DNA as possible in a purified extract,” Thomas says, “so that the library preparation has access to as many informative sequences as possible.”


The research team tested several forensic DNA extraction protocols with or without a DNA repair step using uracil DNA glycosylase (UDG), a protocol optimized for ancient DNA, and a combined forensic-ancient DNA extraction protocol. Evaluation of the sequencing data showed that the ancient DNA extraction protocol produced the highest proportion of informative sequences. This protocol is currently being tested at AFMES-AFDIL for casework samples.


Zavala’s research compared three different library preparation methods: the KAPA Hyper Prep kit, which has been used extensively at AFDIL; the SRSLY™ kit (Claret Bioscience); and an automated protocol, specifically developed for ancient DNA samples at the Max Planck Institute (MPI). The KAPA kit requires double-stranded input DNA, while the other two methods can prepare libraries from single-stranded DNA.


Since the DNA from the samples tested was highly fragmented, it’s likely that a large proportion of it was single-stranded; therefore, the KAPA kit did not produce the most complete results with these samples. “This is why we wanted to test the effectiveness of the two single-stranded DNA library preps for this sample set,” Zavala says. Sequencing data showed that both single-stranded library preparation methods produced a significantly higher proportion of informative sequences, compared to the double-stranded protocol.


Zavala expects that hybridization-capture sequencing, which can be applied to both mtDNA and nuclear DNA, could increase future success rates for identification of ancient DNA samples. She also sees the potential for increasing throughput and decreasing per-sample costs by automating workflows.


Using the Skin Microbiome for Human Identification

The skin is the largest human organ, and it plays host to a diverse collection of microorganisms, collectively known as the skin microbiome. Multiple factors affect the composition of the skin microbiome, but a core group of microorganisms tends to remain fairly stable. Understanding the interaction between the skin microbiome and skin cells has guided key discoveries in fields from cosmetology to dermatology.


Now, the skin microbiome is showing promise in yet another field—that of human identification. Allison Sherier discussed the latest results of her research into this novel forensic application. Sherier is a PhD candidate at the Center for Human Identification, University of North Texas Health Science Center, and she was an ISHI Ambassador in 2019.


In her presentation, Sherier summarized previous studies showing that the composition of the skin microbiome remains relatively stable for up to 3 years. The skin microbiome is also an abundant source of DNA: a skin swab typically contains human DNA equivalent to only four diploid cells, but enough microbial DNA to allow accurate identification. For casework samples, microbial DNA profiling can provide valuable information that complements human DNA profiling. “In a case where nuclear DNA is left behind but it is too little or too degraded to obtain a complete profile,” Sherier says, “the microbial profile collected on the same swab may be able to provide supportive information for human identification.”


Microbiome characterization, as in metagenomic studies, is typically performed by sequencing the 16S ribosomal RNA gene. While this approach can provide relatively rapid and simple identification of bacterial species, it has limitations when bacterial DNA is used for human identification. “These limitations include insufficient resolution at the species and strain level, copy number variation, inaccurate phylogenetic predictions, sample preparation bias and PCR bias,” Sherier says. As an alternative, she recommends whole-genome sequencing (WGS). However, WGS does not always cover the entire genome of every microorganism present in a skin microbiome sample. In her research, Sherier used a targeted sequencing panel known as hidSkinPlex. This custom panel enriches sequencing libraries for 286 characteristic marker sequences from microorganisms commonly found in the skin microbiome. “The hidSkinPlex allows for high power of discrimination for human identification, by focusing on specific and informative skin microorganisms,” Sherier says.

The challenge Sherier faced was how to select variants in the microbial sequences that would be informative for human identification. Her approach built on the same process used to select ancestry-informative markers (AIMs) in human populations. Sherier explains, “AIMs are genetic markers that have large differences in allele frequencies between human populations, and these differences have been used to support inferring ancestry of an unknown donor sample.” The central principle behind this process is a statistical value known as the fixation index, or Fst. “It’s one of the most common methods used to quantify genetic differentiation between populations, based on estimates of population heterozygosity between and within subpopulations,” Sherier says.


The index represents the probability that two alleles, drawn randomly from a subpopulation, are identical by descent. It is calculated as:


Fst = 1 – (Hw/Hb)


where Hw is the mean number of pairwise differences within a population, and Hb is the mean number of pairwise differences between two populations. Two populations with identical allele frequencies would have Fst = 0, while two populations each fixed for a different allele (i.e., sharing no alleles and with no variation within either population) would have Fst = 1. In the case of Sherier’s research, each population is the skin microbiome from a single individual. “Fst may provide insight into whether two microbial alleles are identical by descent between two microbial populations,” Sherier says.
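To make the formula concrete, here is a minimal Python sketch of the pairwise-difference estimator Fst = 1 – (Hw/Hb), applied to short sequences that are assumed to be already aligned. It is a toy illustration, not Sherier’s pipeline: the function names are hypothetical, and Hw is taken as the average of the two within-population means, which is one common convention.

```python
from itertools import combinations, product

def pairwise_differences(a: str, b: str) -> int:
    """Number of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def mean_within(pop: list[str]) -> float:
    """Mean pairwise differences among sequences from one population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        raise ValueError("need at least two sequences per population")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)

def hudson_fst(pop1: list[str], pop2: list[str]) -> float:
    """Fst = 1 - Hw/Hb, with Hw averaged over the two populations."""
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    between = [pairwise_differences(a, b) for a, b in product(pop1, pop2)]
    h_b = sum(between) / len(between)
    if h_b == 0:
        raise ValueError("Fst is undefined when populations show no differences")
    return 1 - h_w / h_b

# Each population is fixed for a different sequence: Hw = 0, so Fst = 1.
print(hudson_fst(["AAAA", "AAAA"], ["TTTT", "TTTT"]))  # 1.0
# Some variation within each population: Hw = 1, Hb = 3.5, so Fst = 5/7.
print(hudson_fst(["AAAA", "AAAT"], ["TTTT", "TTTA"]))
```

With small samples the estimate can fall slightly below 0 when two populations are nearly identical, a known property of sample-based Fst estimators.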


For Sherier’s project, the research team collected skin swabs from the non-dominant hand of 51 individuals in triplicate, extracted DNA, enriched the samples prior to library preparation using the hidSkinPlex panel, and sequenced the libraries using an Illumina MiSeq® system. Pairwise Fst comparisons were conducted between all samples, and the data were analyzed to classify unknown samples to the individual that they most resembled. The accuracy of classification ranged from 88% to 95%, suggesting that AIMs in targeted microorganisms can improve the accuracy of human identification.
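One simple way to “classify unknown samples to the individual that they most resembled” is nearest-neighbor assignment on the pairwise Fst matrix, scored by leave-one-out accuracy. The sketch below assumes that approach; the source does not specify the exact classifier, and the matrix values and donor labels are hypothetical.

```python
def nearest_neighbor_accuracy(fst: list[list[float]], labels: list[str]) -> float:
    """Leave-one-out accuracy of assigning each sample to the individual of
    the other sample with the smallest pairwise Fst."""
    n = len(labels)
    if n < 2 or len(fst) != n or any(len(row) != n for row in fst):
        raise ValueError("need a square Fst matrix with at least two samples")
    correct = 0
    for i in range(n):
        # The closest other sample is the one least differentiated from sample i.
        j = min((k for k in range(n) if k != i), key=lambda k: fst[i][k])
        correct += labels[j] == labels[i]
    return correct / n


# Hypothetical Fst matrix for four swabs from two donors, A and B.
labels = ["A", "A", "B", "B"]
fst = [
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.70, 0.80],
    [0.80, 0.70, 0.00, 0.20],
    [0.90, 0.80, 0.20, 0.00],
]
print(nearest_neighbor_accuracy(fst, labels))  # 1.0
```

Because the study collected swabs in triplicate, each sample always has replicates from the same donor in the reference set, which is what makes a nearest-neighbor rule like this meaningful.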


Sherier notes that it is premature to compare the accuracy of identification using the skin microbiome to conventional human identification analysis using STR markers. “After reliable markers are selected, and an analysis panel is developed, then more research can be done on the stability and transfer rate of the human skin microbiome,” she says.

Ancestry Information from Autosomal STRs

The growing popularity of consumer DNA tests for genealogy has reinforced the use of specific single-nucleotide polymorphisms (SNPs) for ancestry inference. STRs have largely been ignored in studies of geographic ancestry because of their high mutation rate. However, this same feature is precisely what makes STRs suitable for individual identification in forensic casework: a high mutation rate produces many alleles per locus, so unrelated individuals rarely share the same profile.


Laurence Devesse, PhD student at King’s College London and senior field applications specialist at Verogen Inc., presented an approach to combine STR and SNP information using MPS. A single DNA profile suitable for ancestry determination as well as searching against forensic DNA databases would offer considerable cost and time savings.


“Traditionally, autosomal STRs have not been considered to any serious extent in the field of ancestry estimation, due to the limited contrast in allelic frequencies between populations,” Devesse says. Conventional STR analysis assigns alleles based on the length of the PCR amplicon, i.e., the number of “repeats” contained at the STR locus as revealed by capillary electrophoresis. “If we see two peaks,” Devesse says, “we assume it’s a heterozygote profile.” In other words, there are two alleles of different lengths at that locus. “If we see one peak, we assume it’s homozygous, so both alleles are exactly the same.” However, when these same samples were analyzed by MPS, Devesse discovered that there were, in fact, sequence variations within the repeats, even though the amplicons were the same length. Thus, MPS revealed allelic variation that was missed by conventional STR analysis using capillary electrophoresis.
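The difference between length-based and sequence-based calls can be sketched in a few lines of Python. This is a simplified illustration: the “amplicon” here is only the repeat region (no flanking sequence, stutter, or sizing against an allelic ladder), and the AGAT/AGAC motifs and function names are hypothetical.

```python
def length_allele(repeat_region: str, motif_len: int = 4) -> str:
    """Length-based allele name, as capillary electrophoresis would infer it:
    full repeats, plus '.x' for a partial repeat."""
    full, partial = divmod(len(repeat_region), motif_len)
    return f"{full}.{partial}" if partial else str(full)

def zygosity(allele_seqs: tuple[str, str], by_sequence: bool) -> str:
    """Call zygosity from two alleles, comparing either length or full sequence."""
    if by_sequence:
        distinct = set(allele_seqs)
    else:
        distinct = {length_allele(s) for s in allele_seqs}
    return "heterozygous" if len(distinct) == 2 else "homozygous"

# Hypothetical alleles: same length (5 repeats), one internal AGAT -> AGAC variant.
allele_1 = "AGAT" * 5
allele_2 = "AGAT" * 2 + "AGAC" + "AGAT" * 2

print(length_allele(allele_1), length_allele(allele_2))   # 5 5
print(zygosity((allele_1, allele_2), by_sequence=False))  # homozygous
print(zygosity((allele_1, allele_2), by_sequence=True))   # heterozygous
```

Comparing full sequences would also capture variants in the flanking regions, which a length-only call cannot see.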


For example, at locus D12S391, Devesse observed seven different sequence variants of allele 20 among 200 samples from the white British population. Sequence variations appeared both within the repeat region and in the flanking regions of STRs. Devesse points out the need to understand and characterize these variants, and she also notes, “We need frequencies for all of these new alleles that we’re finding.”
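Estimating frequencies for sequence-defined alleles, and seeing how many of them hide behind a single length-based allele, can be sketched as follows. The observed allele copies are hypothetical, and the repeat count is simplified to sequence length divided by motif length, ignoring partial repeats and flanking sequence.

```python
from collections import Counter, defaultdict

def allele_frequencies(observed: list[str]) -> dict[str, float]:
    """Relative frequency of each distinct allele sequence among observed copies."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

def variants_per_length_allele(observed: list[str], motif_len: int = 4) -> dict[int, int]:
    """Number of distinct sequences sharing each length-based repeat count."""
    groups: dict[int, set[str]] = defaultdict(set)
    for seq in observed:
        groups[len(seq) // motif_len].add(seq)
    return {repeats: len(seqs) for repeats, seqs in sorted(groups.items())}

# Hypothetical allele copies observed at one locus (repeat region only).
observed = [
    "AGAT" * 5,
    "AGAT" * 5,
    "AGAT" * 2 + "AGAC" + "AGAT" * 2,
    "AGAT" * 6,
]
print(variants_per_length_allele(observed))  # {5: 2, 6: 1}
for seq, freq in allele_frequencies(observed).items():
    print(len(seq) // 4, seq, freq)
```

Here the length-based allele 5 splits into two sequence alleles, each of which needs its own frequency estimate.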


Devesse genotyped approximately one thousand samples from five global population groups using the MiSeq FGx™ System (Verogen). Her team prepared sequencing libraries using the ForenSeq™ DNA Signature Prep Kit, which, with Primer Mix A, simultaneously targets 27 autosomal STRs, 7 X STRs, 24 Y STRs and 94 SNPs. “What we see is that across all the markers studied, across all populations, we have a huge increase in the number of discernible alleles observed,” she says.


Further analysis brought new insights. “I started noticing just how many of these autosomal alleles only appeared to be present in one of the population groups I was studying,” Devesse says. At D12S391, she observed 38 alleles that only appeared in one population group. Most of these population-specific variants were sequence-based, not length-based. “My question was: can we predict differences in populations using sequence-specific STR alleles?”
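Finding alleles that appear in only one population group is a simple set operation once alleles are keyed by their sequence. In the sketch below, the population names and allele labels are hypothetical. An allele seen in a single group in a finite sample may still occur elsewhere at low frequency, so “population-specific” here means “observed in only one group.”

```python
def population_specific_alleles(observations: dict[str, list[str]]) -> dict[str, str]:
    """Return {allele: population} for alleles observed in exactly one population."""
    seen_in: dict[str, set[str]] = {}
    for pop, alleles in observations.items():
        for allele in set(alleles):
            seen_in.setdefault(allele, set()).add(pop)
    return {a: next(iter(pops)) for a, pops in seen_in.items() if len(pops) == 1}

# Hypothetical sequence-based allele labels at one locus.
observations = {
    "Pop1": ["20_v1", "20_v2", "21_v1"],
    "Pop2": ["20_v1", "22_v1"],
    "Pop3": ["20_v1", "21_v1"],
}
print(population_specific_alleles(observations))
```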


After mapping allele frequencies, Devesse identified sets of common alleles that could indeed be used to distinguish population groups, both on their own and in combination with ancestry-informative SNPs. Previous studies by other researchers using capillary electrophoresis data suggested that adequate population differentiation required a much larger marker set: up to 779 non-CODIS STR loci. However, Devesse points out that typing so many loci “is often not really feasible in the context of a forensic investigation.”


Attempting to narrow the scope, Devesse found that the majority of samples could be assigned to the correct population group using data from just 27 autosomal STRs. Next, she looked at whether information from ancestry-informative SNPs could be combined with autosomal STR data. She reanalyzed a subset of the DNA samples using a primer set that included 56 ancestry-informative SNPs. The addition of ancestry-informative SNP data improved ancestry assignment from autosomal STRs, but the improvement was not significant in all population groups. “The most interesting finding,” Devesse says, “is that over 85% of the samples were assigned to the correct population cluster…based on autosomal STR sequencing data alone.” She adds, “The ability to predict ancestry on a continental basis from STR results alone is an exciting, untapped resource that MPS analysis makes possible.”
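The article does not describe the assignment method Devesse used. As a generic illustration of how sequence-based allele frequencies can drive ancestry assignment, the sketch below implements a simple likelihood classifier. It assumes Hardy-Weinberg proportions and independent loci, uses hypothetical loci, alleles and frequencies, and applies an arbitrary floor frequency to alleles not seen in a population.

```python
import math

def genotype_log_likelihood(genotype: tuple[str, str], freqs: dict[str, float],
                            floor: float) -> float:
    """Log probability of a genotype under Hardy-Weinberg proportions."""
    a, b = genotype
    p = freqs.get(a, floor)  # unseen alleles get a small floor frequency
    q = freqs.get(b, floor)
    return math.log(p * p) if a == b else math.log(2 * p * q)

def assign_population(profile: dict[str, tuple[str, str]],
                      pop_freqs: dict[str, dict[str, dict[str, float]]],
                      floor: float = 1e-3) -> tuple[str, dict[str, float]]:
    """Pick the population maximizing the summed log-likelihood across loci,
    treating loci as independent."""
    scores = {
        pop: sum(genotype_log_likelihood(g, loci.get(locus, {}), floor)
                 for locus, g in profile.items())
        for pop, loci in pop_freqs.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical sequence-allele frequencies at two loci in two populations.
pop_freqs = {
    "Pop1": {"L1": {"x": 0.8, "y": 0.2}, "L2": {"m": 0.7, "n": 0.3}},
    "Pop2": {"L1": {"x": 0.1, "y": 0.9}, "L2": {"m": 0.2, "n": 0.8}},
}
best, scores = assign_population({"L1": ("x", "x"), "L2": ("m", "n")}, pop_freqs)
print(best)  # Pop1
```

Working in log space avoids numerical underflow when many loci are multiplied together, and adding SNP genotypes to the profile is simply a matter of including more loci in the sum.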


What’s Next?

Future developments in MPS workflows should further lower the costs to make large-scale typing more accessible. Sherier points to the need for more user-friendly software and bioinformatics pipelines. Increasing acceptance of these methods will also require “better teaching and training tools, such as virtual reality, to educate current forensic scientists and future forensic scientists,” she concludes.


One innovation Zavala would like to see is an efficient method to assess the quality of skeletal remains before beginning DNA extraction. “A way to cheaply and quickly screen remains, so that the correct analytical techniques are applied, would revolutionize the way samples are handled,” she says.


For ancestry testing using STRs, Devesse says the main advance that will accelerate its uptake in forensics is the user-friendliness of ancestry prediction software. “The data are often there; they just need interpreting and reporting.”