Analyzing Population Structure for Forensic STR Markers in Next Generation Sequencing Data

Thursday September 26th, 2019 // 9:50 am - 10:10 am // Oasis 1-2

After attending this presentation, attendees will better understand the framework behind the notion of population structure. Population structure, expressed in accessible terms such as allele dosages and matching proportions within and between individuals or populations, can be measured relative to a reference set of populations. This presentation impacts the forensic science community by demonstrating the effect of sequence data on θ estimates, a measure integral to DNA evidence evaluations.
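The matching-proportion idea can be made concrete with a small sketch. It assumes an estimator of the form β = (M_W − M_B) / (1 − M_B), where M_W is the within-population allele matching proportion and M_B is the average matching proportion between pairs of populations. This form is our assumption for illustration and is not necessarily the exact estimator used in the presentation. The function names and the toy allele data are hypothetical.

```python
from collections import Counter


def within_matching(alleles):
    """Probability that two distinct alleles drawn without replacement
    from one population sample are identical in state."""
    counts = Counter(alleles)
    n = len(alleles)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def between_matching(alleles_a, alleles_b):
    """Probability that one allele drawn from each of two samples match."""
    ca, cb = Counter(alleles_a), Counter(alleles_b)
    return sum(ca[u] * cb[u] for u in ca) / (len(alleles_a) * len(alleles_b))


def beta_estimates(samples):
    """Return (per-population beta, overall beta) for one locus.

    Each population's within-matching proportion is compared with the
    average between-population matching proportion over all pairs."""
    r = len(samples)
    m_within = [within_matching(s) for s in samples]
    pair_values = [between_matching(samples[i], samples[j])
                   for i in range(r) for j in range(i + 1, r)]
    m_between = sum(pair_values) / len(pair_values)
    per_pop = [(m - m_between) / (1 - m_between) for m in m_within]
    overall = (sum(m_within) / r - m_between) / (1 - m_between)
    return per_pop, overall


# Hypothetical allele calls at one locus for two populations.
pop_a = ["12", "12", "13", "13"]
pop_b = ["13", "14", "14", "14"]
per_pop, overall = beta_estimates([pop_a, pop_b])
```

Because the estimator is linear in M_W, the overall value is simply the mean of the population-specific values. Here those values are 5/21 and 3/7, giving an overall value of 1/3.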


Forensic DNA interpretation has centered on the analysis of short tandem repeats (STRs), traditionally relying on capillary electrophoresis (CE) to determine the alleles (repeat numbers) present in a DNA sample. Match probabilities calculated when evaluating such DNA evidence profiles rely on appropriate estimation of the population structure quantity θ (theta).
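The following sketch shows how θ enters a match probability. It uses the θ-corrected genotype match probability formulas recommended in the 1996 NRC II report. Loci are combined by multiplication, which assumes independence across loci. The function names are hypothetical, and the allele frequencies below are invented for illustration.

```python
import math


def genotype_match_probability(p_a, p_b=None, theta=0.0):
    """Single-locus match probability with a theta correction.

    p_b=None means the profile is homozygous for allele a;
    otherwise it is heterozygous for alleles a and b."""
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:
        return ((2 * theta + (1 - theta) * p_a)
                * (3 * theta + (1 - theta) * p_a) / denom)
    return (2 * (theta + (1 - theta) * p_a)
            * (theta + (1 - theta) * p_b) / denom)


def profile_match_probability(genotypes, theta):
    """Product of single-locus values, assuming independent loci."""
    return math.prod(genotype_match_probability(p_a, p_b, theta)
                     for p_a, p_b in genotypes)


# Hypothetical two-locus profile: heterozygote (0.1, 0.2), homozygote 0.05.
profile = [(0.1, 0.2), (0.05, None)]
no_structure = profile_match_probability(profile, theta=0.0)
with_structure = profile_match_probability(profile, theta=0.01)
```

With θ = 0 the formulas reduce to 2p_a p_b and p_a², giving 0.04 × 0.0025 = 0.0001 here. A positive θ raises the match probability, which is why an underestimated θ is not conservative.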


With the introduction of next generation sequencing (NGS), a new dimension has been added to the field of forensic genetics, and NGS captures more information than CE systems. Because STR analysis is well established in the forensic community, NGS-based STR profiles must remain backward compatible with CE-based profiles so that existing DNA databases can still be used. As long as this compatibility is maintained, NGS methods are expected to continue to be implemented, underscoring the need to facilitate NGS-based population genetics analysis.


In recent years, studies have reported population statistics demonstrating an increase in discrimination power from distinguishing the nucleotide sequences of alleles with identical size. If NGS data are to be used for match probabilities, population structure must be accommodated, which requires values of θ for NGS data. This presentation will detail an appropriate approach to estimating θ from sequence data. The approach yields single-locus and multi-locus estimates, as well as population-level estimates that can be applied in quantifying the strength of genomic evidence. Results will show that initial θ estimates from sequence data are greater than those for CE data, suggesting that current CE-based recommendations may not be conservative for NGS analysis.
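The sketch below contrasts sequence-based and length-based (CE-style) estimates. It makes three assumptions. First, each allele is represented as a hypothetical (repeat_length, sequence) tuple. Second, the CE view keeps only the length. Third, loci are combined by a ratio of sums, Σ(M_W − M_B) / Σ(1 − M_B), which is one common convention for matching-proportion estimators and is not necessarily the presentation's method. The data are invented and are not meant to reproduce the reported results.

```python
from collections import Counter
from itertools import combinations


def _matching(a, b=None):
    """Within-sample (b is None) or between-sample allele matching proportion."""
    if b is None:
        counts = Counter(a)
        n = len(a)
        return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    ca, cb = Counter(a), Counter(b)
    return sum(ca[u] * cb[u] for u in ca) / (len(a) * len(b))


def multilocus_beta(loci):
    """loci: list of loci; each locus is a list of per-population allele lists.

    Numerators and denominators are summed over loci before dividing."""
    num = den = 0.0
    for samples in loci:
        m_w = sum(_matching(s) for s in samples) / len(samples)
        pairs = [_matching(x, y) for x, y in combinations(samples, 2)]
        m_b = sum(pairs) / len(pairs)
        num += m_w - m_b
        den += 1 - m_b
    return num / den


def to_length(loci):
    """Collapse (repeat_length, sequence) alleles to CE-style length alleles."""
    return [[[length for length, _ in s] for s in samples] for samples in loci]


# Hypothetical locus: two 13-repeat sequence variants ("a", "b") of equal length.
pop1 = [(12, "x"), (12, "x"), (13, "a"), (13, "a")]
pop2 = [(12, "x"), (13, "b"), (13, "b"), (13, "b")]
loci = [[pop1, pop2]]

beta_sequence = multilocus_beta(loci)           # isoalleles kept distinct
beta_length = multilocus_beta(to_length(loci))  # isoalleles merged by length
```

In this toy case the two 13-repeat variants never match between populations at the sequence level. They do match once collapsed to length. The resulting estimate is 1/3 for sequence alleles versus −1/6 for length alleles.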


Sanne (Elise) Aalbers

Research Scientist, Department of Biostatistics, University of Washington (Seattle)

Sanne Aalbers' main area of interest is forensic statistics: in her BSc thesis, she analyzed the statistical evidence in the Lucia de Berk case (famous in the Netherlands for its miscarriage of justice) and for her MSc thesis, she developed novel likelihood ratio models for gunshot residue comparisons.
