DNA as an Eyewitness: How Phenotyping is Being Used to Predict a Suspect’s Appearance

Like many teenage girls in the 1990s, Dr. Susan Walsh was fascinated by Agent Dana Scully’s work as a forensic pathologist on the television series The X-Files. She was already interested in understanding how humans work, and watching Agent Scully use science to try to solve highly irregular cases helped inspire her to pursue a degree in Biochemistry from University College Cork, Ireland, followed by a Master’s degree in Forensic DNA Profiling at the University of Central Lancashire.


Written by Ann MacPhetridge, Promega



She went on to earn a PhD in Forensic Genetics from Erasmus University and then came to the US to do a post-doc at Yale University.


“… when [I was ready] to do a PhD, I thought long and hard. And that’s what I say to students as well: ‘Don’t just jump into a PhD, this is 4 to 5 years of your life, so you’ve got to hold back and really, really make sure that the topic that you’re picking is what you want to do.’ I literally just Googled ‘appearance forensics DNA’ and up popped [Dr.] Manfred [Kayser]. I sent him an email and the rest is history, really.”


After her post-doc at Yale University, she applied for a combined biology and forensics professorship at Indiana University–Purdue University Indianapolis, where she has been since August 2014.


Recognizing that the field of forensics had much to gain from DNA phenotyping, she wanted to research methodologies that would extract more information from the DNA in crime scene samples. Dr. Walsh’s earlier work focused on using single nucleotide polymorphisms (SNPs) to better understand the genetics that determine human physical appearance and ancestry. In forensic casework, this information can provide law enforcement with key intelligence such as eye, hair, or skin color. While this information is not comprehensive, it can give law enforcement an investigative lead in an otherwise tough missing persons case or in a mass disaster situation.


“As we deal with quantitative traits where different genes and also environmental factors can contribute to the phenotype, we try to unveil the genetic basis of certain traits using genome-wide association approaches and next-generation sequencing, amongst others. We then try to determine what are the most-predictive biomarkers and develop molecular tools to predict these traits for practical applications in forensics and anthropological studies.”


An Overview of the DNA Phenotyping Process

The forensic DNA phenotyping workflow is not unlike the one laboratories already perform on their casework samples. Laboratories follow the same steps and use the same types of instruments they would for autosomal STR analysis: the crime scene sample is collected, DNA is extracted from that sample and quantified, and then an STR profile is generated. The results from that profile can then be used to determine next steps. Where there is no hit in a database, intelligence-gathering methods such as DNA phenotyping may be a valuable option. Forensic DNA phenotyping requires only minute amounts of DNA. In fact, the typical input amounts of 500 pg to 1 ng of DNA used in STR analysis on capillary electrophoresis (CE) instruments are much greater than what is required to perform DNA phenotyping. While there are additional cleanup steps and an extra PCR step or two, the workflow is generally the same. “We look at peaks that are similar in color to that of STRs and where instead a nucleotide is shown, we have an A, C, T, and G to show the variants that we need.”


Massively parallel sequencing (MPS) has revolutionized DNA phenotyping. Capillary electrophoresis limited the number of SNPs that could be used in assays. With MPS, laboratories can combine hundreds to thousands of SNPs. “We have the freedom now to say that although it’s only introducing a 0.05% increase in prediction accuracy, just do it because it’s not impacting our assay design. This gives us a tremendous amount of freedom when designing prediction models because we can include minor contributors, interactions, and it helps push our research. We’ll see more and more accurate prediction models made because we’re able to put in 100s more variants than before.”


Another Tool in the Toolbox 

The accuracy of forensic DNA phenotyping has improved tremendously in recent years. Much of the difference in accuracy between assays relates to how continuous the trait is. For instance, there are fewer categories associated with eye color than with hair color or skin color, so eye color is generally easier to predict. Results are often reported in categories rather than as continuous color, because a continuous color value is more difficult to predict even though it describes the phenotype more accurately.


Pigmentation prediction is approximately 80% accurate, with many of the errors occurring at the boundaries between categories. The relevant variants are known, but how they combine to produce color in various populations is still being studied, with more work to be done. Prediction tools also exist for hair structure, male pattern baldness, and height.
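To make the idea of categorical prediction more concrete, here is a minimal sketch of how a model might turn SNP genotypes into category probabilities and decline to make a call when no category is a clear winner. It uses multinomial logistic regression as an illustrative technique; the article does not say which model Dr. Walsh's lab uses. The marker names (`snp_a`, `snp_b`), the weights, and the 0.7 threshold are all invented for illustration and do not correspond to any real assay.

```python
import math

# Hypothetical weights per eye color category. Genotypes are coded as the
# number of copies (0, 1, or 2) of a particular allele at each marker.
# None of these names or numbers come from a real prediction model.
WEIGHTS = {
    "blue":         {"bias": 0.0,  "snp_a": 0.0, "snp_b": 0.0},
    "intermediate": {"bias": -2.0, "snp_a": 1.5, "snp_b": 0.5},
    "brown":        {"bias": -4.0, "snp_a": 3.0, "snp_b": 1.0},
}


def predict_probabilities(genotype):
    """Return a probability for each category using a softmax over linear scores."""
    scores = {
        category: w["bias"] + sum(w[snp] * count for snp, count in genotype.items())
        for category, w in WEIGHTS.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {category: math.exp(score - top) for category, score in scores.items()}
    total = sum(exps.values())
    return {category: value / total for category, value in exps.items()}


def call_category(genotype, threshold=0.7):
    """Report the most probable category, or 'inconclusive' if it is below the threshold."""
    probabilities = predict_probabilities(genotype)
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] >= threshold else "inconclusive"
```

In this toy model, a genotype that sits between categories (for example, one copy at each marker) yields nearly equal probabilities for every category, which mirrors the observation that errors cluster at category boundaries.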


“Height, although it has been studied the longest, is also one of the most difficult traits. You need approximately 700 variants to get at least 70% accuracy. And even at that you are talking millimeters to centimeters differences for the phenotype. We can get the extremes, very tall or very small, but at the end of the day you are investing many, many markers for something that I don’t think we’re going to significantly improve on. The key with phenotyping is to know where the wins are. Recognizing that we won’t be able to understand the environmental impact on many of the appearance traits.”


Dr. Walsh’s lab has worked with law enforcement, but those agencies have shared few details about how the cases concluded. Engaging with practitioners at conferences such as the International Symposium on Human Identification enables her to discuss practitioner needs in more detail with laboratory staff. Hearing firsthand which modifications to a workflow would and would not be acceptable helps direct the assays she and her team are developing.


DNA phenotyping can provide investigative leads in several key areas. In missing persons cases and mass disasters, DNA phenotyping can help paint a picture of the victim: “Trying to give them back their life. Gives you a direction. Gives the victim their voice back.”


When a heinous murder or sexual assault has been committed and an active investigation is underway, a laboratory is often pushed to turn around its casework results as soon as possible. What many laboratories may not know is that a phenotype can be generated in the same amount of time as a DNA profile. Since DNA phenotyping requires very little DNA for the assay, running an STR assay concurrently with a phenotypic assay could provide relevant, actionable information, particularly when conventional DNA profiling proves uninformative.


Genetic genealogy, a method that has seen a massive increase in interest since the arrest of the Golden State Killer in April 2018, could also benefit from DNA phenotyping. Genetic genealogy often produces hundreds of scenarios for the investigative team to pursue. Using the phenotypic results, detectives could narrow down the number of potential suspects. For example, if the phenotypic result predicts that the DNA comes from an individual with blue eyes, investigators could concentrate on families where blue eyes are more prevalent to find that connection.
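The blue-eyes example can be sketched as a simple prioritization step. This is only an illustration under stated assumptions: the `FamilyBranch` structure, the function names, and the idea that eye colors are recorded per family member are all hypothetical, and a real investigation would weigh far more evidence. The sketch reorders the branches without excluding any of them, in keeping with phenotyping serving as an investigative lead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FamilyBranch:
    """Hypothetical record of one family branch surfaced by a genealogy search."""
    name: str
    eye_colors: List[str]  # recorded eye colors of known members (assumed data)


def trait_prevalence(branch, trait):
    """Fraction of recorded members in the branch who show the given trait."""
    if not branch.eye_colors:
        return 0.0
    return branch.eye_colors.count(trait) / len(branch.eye_colors)


def prioritize(branches, predicted_trait):
    """Order branches so those where the predicted trait is most prevalent come first."""
    return sorted(
        branches,
        key=lambda branch: trait_prevalence(branch, predicted_trait),
        reverse=True,
    )
```

Calling `prioritize(branches, "blue")` would place branches with the highest share of blue-eyed members at the top of the list for follow-up.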


To those readers who may be concerned about privacy violations, Dr. Walsh points out that DNA phenotyping is very good at predicting broad groups and does not produce individualized results. In a case, investigators can then narrow their focus based on these broad group predictions. At the end of the day, it remains nothing more than an investigative lead. She feels that it’s actually less intrusive than other techniques and tools used by law enforcement, such as CCTV.


Hurdles for Wider Adoption 

The shortage of validated commercial kits is limiting the adoption of this methodology. Incorporating kits and protocols developed in a research laboratory is not a viable option for most casework laboratories, which simply lack the expertise and resources required to optimize a kit. Instead, mass-produced kits that have undergone robust manufacturing and quality control processes are needed for DNA phenotyping to be more widely adopted by the community.


In addition to the lack of available kits, bioinformatics continues to be a hurdle. Laboratories often lack bioinformatics experts. The vast amount of data generated by MPS is daunting to laboratories that are often juggling backlogs, staffing shortages, training challenges, and ever-shortening deadlines. For analysts to feel comfortable testifying in court, there must be a clear understanding of the algorithms used to analyze the data. This understanding must be balanced with the need to make kits that are user-friendly and easy to troubleshoot.


From Eye Color to Facial Morphology 

Facial morphology is a largely unexplored research area with tremendous potential in terms of prediction. Still in its early stages, the current work is devoted to understanding the genetics and how genes contribute to pathways. Dr. Walsh pointed out that this basic groundwork for eye color and pigmentation was done more than a decade ago. It’s essential to understand what the genes are doing, how they are contributing to a pathway, and how they may differ between populations. Forensics and biomedical teams are learning from one another’s research and applying that knowledge to their own, resulting in advances that benefit both groups. Understanding what contributes to bone structure and cartilage is one such area where both groups are focusing their research energies.


In collaboration with prominent researchers such as Dr. Manfred Kayser (Erasmus University), Dr. Peter Claes (KU Leuven) and Dr. Mark Shriver (Penn State University), Dr. Walsh and her team focus much of their research on genome-wide studies and on establishing which genes are essential for prediction and which are not, with the goal of determining which variants are causal and contribute to pigmentation and facial morphology.


Much of the work on facial morphology is centered on dividing the face into several regions and analyzing what variation exists between individuals in that particular area of the face.


“An interesting aspect of the face is that it is not flat. The face is a 3D structure. Trying to predict not only one dimension, [but] trying to predict several dimensions. The work of [Dr.] Peter Claes is really revolutionary. He is able to take his engineering research and apply it to the face. Peter is taking a data driven approach to understand the face. He uses the face’s variation of multiple individuals to tell us what structure exists within the face. From there we try to understand what a gene may be doing in this part of the face…What kind of variation exists between individuals…it’s so exciting.”


While the technology allows for predicting the curve of a lip or the structure of a chin, predicting the sides of one’s face is nearly impossible. Body mass index, age, and other environmental factors contribute to the fat tissue that is deposited on the bone and influence the makeup of the sides of the face. Even once it becomes possible to predict a complete facial appearance, it will not be feasible to predict one specific face and its various features. Because of variations in environment and diet, the programs will need to produce several faces for identification purposes. “It will be up to the multiple faces that are predicted to give an impression of what a face could look like by combining the features of several faces. But we’re still a long way from that. Let’s find the genes and then we can see what we can do next.”