Beyond Genetic Genealogy: Building Family Trees to Investigate Crime

“Class! Class!” Diahan Southard raises her voice and claps twice, cutting through the buzz of conversation that fills the room.


Dutifully, her class responds. “Yes! Yes!”


It’s a technique that Southard has learned from raising three children and over 20 years of teaching. Her audience is broken up into small groups, gathered around tables in a ballroom at the Palm Springs Convention Center, California. Palm Springs is a trendy resort town in the Coachella Valley, in the middle of the Colorado Desert and surrounded by mountains. It’s frequented by Hollywood stars and seems at odds with the deadly seriousness marking many of the sessions at the 30th International Symposium on Human Identification (ISHI 30), held in September 2019. Topics at the conference ranged from improving the response to mass fatalities to forensic taphonomy research, or the study of how human bodies decompose. Southard’s class, titled “Can You Solve Your Case Using Genetic Genealogy?” closed out the final day of the conference.


Written by: Ken Doyle, Promega



The class, attended by around a hundred people, constituted a diverse group—forensic scientists, law enforcement officials, academic researchers and yes, genealogists. What united them was their determination to learn more about the field of genetic genealogy, a discipline that has recently seen a surge in popularity. The goal of Southard’s class was to introduce key principles of genetic genealogy through hands-on exercises.


Genetic Genealogy and Forensics

Genetic genealogy combines the traditional discipline of genealogy with modern DNA analysis techniques. At a fundamental level, it builds genetic relationships among individuals—family trees—based on analysis of their DNA that determines how much of the genome two individuals share. The technique, by itself, isn’t new. What’s gained recent attention is the use of genetic genealogy by law enforcement to solve crimes, known as investigative genetic genealogy or forensic genetic genealogy (FGG). That application was the focus of Southard’s class.


The case that put the spotlight squarely on FGG had remained unsolved for over forty years. A serial killer and rapist, known by several names but dubbed the Golden State Killer by the media, left a trail of victims across ten California counties from 1976 through 1986. Ultimately, it was FGG that delivered the breakthrough that Paul Holes, the lead investigator, needed to crack the case in April 2018.


However, the publicity generated in the Golden State Killer case had an unintended consequence. Along with another case using FGG later that year, the media attention prompted GEDmatch, the genetic genealogy research site that Holes and his team used, to change its terms of service regarding the use of its data by law enforcement. Previously, over a million DNA profiles in GEDmatch had been automatically opted in for law enforcement use. With the change in policy, all DNA profiles were opted out by default, and people uploading their profiles had to specifically opt in to permit access to their data by law enforcement. As of October 1, 2019, only 163,000 out of 1.3 million users had opted in, making the database considerably less useful for FGG. However, the number of opt-ins is growing, thanks to appeals from key figures in the genetic genealogy field.


The Origin Story

Southard’s fascination with the subject of genetic genealogy has its roots in high school. Her biology teacher managed to scrounge up used pipettes and left-over supplies from a local laboratory, so that his students could develop hands-on technical experience. As a result, Southard’s biology class learned basic DNA cloning techniques.


“Having a background in science means that I know that science isn’t scary,” Southard says. She cites a fear of the underlying science as one of the biggest obstacles to those who are new to the field. In addition, she says, her undergraduate education taught her the importance of the scientific method. “You make a hypothesis and then generate data, without bias toward or against your hypothesis. Then you evaluate how your data fit into that hypothesis.”


Further, Southard credits her high school English teacher for setting her on the career path that eventually led her to build her own genetic genealogy education and consulting company, Your DNA Guide. “He told all of us graduating seniors,” she says, “that the best thing we could do when we got to college was find a professor who was researching something we were interested in and get involved.” For Southard, that “something” turned out to be the archaeogenetics laboratory of Dr. Scott Woodward at Brigham Young University. “It’s studying the genetics of mummies…and dead things,” she explains. Her initial project involved analyzing teeth and bone samples from bodies found in an ancient Egyptian cemetery outside Cairo. The analysis was built on mitochondrial DNA testing to identify maternal family relationships. Although the group was able to obtain a large collection of mitochondrial DNA profiles, the challenge they faced was not having anything to use as a reference to build genealogical networks. The need for a treasure trove of DNA data, from across the world, soon became apparent.


Woodward’s research led to the formation of the Sorenson Molecular Genealogy Foundation (SMGF), named after local philanthropist James Sorenson, in 1999. “I still distinctly remember sitting in the basement of the Benson building (affectionately called the Fishbowl) on the BYU campus,” Southard recalls, “where Dr. Woodward explained how we could create a database of DNA and genealogy. At some point in the not-too-distant future, we would be able to tell where someone—anyone—came from, by just looking at their DNA.”


The SMGF began building the first genetic DNA database by collecting samples from students at Brigham Young University and developing their family trees. Soon, its efforts spread, and its reach grew to include samples from across the globe. While other college students were partying on the weekends, Southard says, she and her colleagues were traveling across the US and around the world, educating people about genetic genealogy and collecting blood samples from volunteers. “I would carry home a cooler of blood [samples] on the airplane,” Southard says, “and Monday morning, I was back in the lab.”


By 2012, the Sorenson Database contained over 100,000 DNA samples and familial pedigrees, encompassing 2.8 million genealogical records and 2.4 million genotypes. The public database contained both Y-chromosome data (for tracing paternal lineage) and mitochondrial DNA information (for tracing maternal lineage). Although the database also contained a repository of autosomal DNA information, those data were not made publicly available.


After the death of James Sorenson in 2008, enthusiasm for the project waned. In 2012, the Sorenson Database was acquired by a genealogy company whose founders were also graduates of Brigham Young University. In 2015, use of the data generated negative media and public attention, as a result of a false lead in a case from 1996 involving the murder of a young woman named Angie Dodge. As a result, the company took down the Sorenson Database, claiming it had been used by law enforcement in a manner that violated the principles on which the SMGF was established.


“When the database was taken down, it did feel like the end of an era,” Southard says. There was an understanding, during the data collection, that the information would always be free and available. Southard felt personally responsible for all of the samples she had collected, and all the volunteers whom she had encouraged to participate in the project.


From Genealogist to Entrepreneur

The sale of the Sorenson Database marked a turning point in Southard’s career. After the sale, she explains, full-time work was not an option. “I had three small kids, and I wanted to be a mom, first and foremost.” Two of her colleagues from the SMGF joined her in launching a consulting business. “That lasted for a bit, but then they both got other full-time jobs, so I created Your DNA Guide,” Southard says. “There was then, and there is now, such a huge gap between what the companies provide for the test taker and what customers need to do to actually find answers.” That knowledge gap has propelled Your DNA Guide to success, sending Southard on a quest to educate the general public, forensic investigators, and law enforcement officials about the methods and applications of genetic genealogy. Currently, she leads a team of genetic genealogists who work with clients all over the world, helping them make sense of their DNA and family history through educational content that includes a blog, frequently asked questions, videos, research and personal mentoring services. Southard is also sought after as a speaker at events and regularly conducts genetic genealogy workshops, such as the one at ISHI 30.


When asked how she balances her professional and personal lives—a challenge faced by many entrepreneurs who work from a home office—Southard makes her priorities clear. “When my kids walk in the door from school, my eyes are on them. My attention is there.” She sets ground rules for herself that include family dinners where electronic devices are banned from the table, and never working on Sundays. “If I’m ever tempted to work in the evening and someone suggests a card game,” she says, “there’s no choice. Play the game. Every time.” Southard admits that she does occasionally break her own rules, but having the rules helps her stay in control of her schedule.


Building Relationships

At Southard’s ISHI 30 workshop, the participants take on the role of an FGG investigator, trying to solve a cold case in which a woman was murdered over 30 years ago. Southard doles out information a little bit at a time, starting with an analysis of a DNA sample from the crime scene. The goal of the exercise is to identify people who may be related to the killer, by building genetic networks based on common ancestors. Just as happened in the Angie Dodge case, Southard leads the class through several twists and turns, identifying second and third cousins of the suspect. Although the process appears complex, it’s based on a simple principle that involves a unit of measure called a centimorgan (cM). Technically, a centimorgan is a measurement of the DNA recombination frequency within a region on a chromosome, but it is often equated to a length of DNA. In humans, on average, 1 cM corresponds to approximately 1 million base pairs. As an article on Southard’s web site explains, “Your total shared cM tells you how much DNA you share with another match. In general, the more DNA you share with a match, the higher the cM number will be and the more closely related you are.”


Any genetic genealogy service, such as those offered by 23andMe or AncestryDNA, provides a report that includes the total amount of DNA sequence you share with other people in the database. A free online tool, DNA Painter, includes an option that makes use of information from the Shared centiMorgan Project to predict relationships. For example, entering 200 cM of shared DNA into the tool returns a range of possibilities: there is a 45% probability that the match is a half second cousin, a second cousin once removed, or even a half great-great aunt or uncle. At the top of the chart, a parent and child will typically share 3,300-3,720 cM of DNA. At the other end, say 20 cM, the relationships become substantially more difficult to trace, stretching to sixth or seventh cousins.
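To make the arithmetic behind these numbers concrete, here is a minimal Python sketch. It is not how DNA Painter or the Shared centiMorgan Project calculate their probabilities; those tools rely on observed distributions from real test takers. The sketch only uses the textbook average fraction of autosomal DNA shared by each relationship, and it assumes a round figure of 3,400 cM for a parent and child, chosen from within the range quoted above. The function names and the relationship table are hypothetical, introduced purely for illustration.

```python
# Illustrative sketch only: ranks relationships by how close their
# theoretical average shared cM is to an observed value. Real tools use
# empirical distributions, so their answers can differ from this.

# Assumption: a parent and child (50% shared) report about 3,400 cM.
PARENT_CHILD_CM = 3400.0

# Theoretical average fraction of autosomal DNA shared.
EXPECTED_FRACTION = {
    "parent/child": 0.5,
    "grandparent/grandchild": 0.25,
    "first cousin": 0.125,
    "first cousin once removed": 0.0625,
    "second cousin": 0.03125,
    "second cousin once removed": 0.015625,
    "third cousin": 0.0078125,
}


def expected_cm(relationship):
    """Average shared cM for a relationship, scaled from parent/child."""
    return PARENT_CHILD_CM * EXPECTED_FRACTION[relationship] / 0.5


def closest_relationships(shared_cm, top=3):
    """Relationships whose expected cM lies nearest the observed value."""
    ranked = sorted(
        EXPECTED_FRACTION,
        key=lambda rel: abs(expected_cm(rel) - shared_cm),
    )
    return ranked[:top]


def approx_base_pairs(cm):
    """Rough length in base pairs, using 1 cM ~ 1 million base pairs."""
    return cm * 1_000_000


if __name__ == "__main__":
    for rel in closest_relationships(200):
        print(f"{rel}: about {expected_cm(rel):.0f} cM expected")
    print(f"200 cM is roughly {approx_base_pairs(200):,} base pairs")
```

Because actual sharing varies widely around these averages, a single cM value is compatible with several relationships at once, which is why the tools report a range of possibilities rather than one answer.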


In the genetic genealogy workshop, the search for the elusive killer results in building several genetic networks based on shared cM data, which Southard has her class plot out on traditional family tree diagrams. It’s nearly impossible for an FGG investigation to unearth a single match; more often than not, the process uncovers tens or even hundreds of possibilities. Narrowing down the list involves a lot of traditional investigative work: searching through obituaries, newspaper records, census information or even old store receipts. It’s certainly not as glamorous a process as often depicted in television shows.


A Path Forward

Forensic investigators who are new to the field of FGG may be daunted by the learning curve initially. The first piece of advice Southard offers is to get your own DNA analyzed and plot out your family tree. “Hands down the best way to learn,” she says, “is to watch it work in your own family, with people you know. The more you can understand about your known relationships, the more you will be able to tackle the unknown.”


Particularly when using FGG in a criminal case, Southard notes that it’s important to respect the space and follow the rules. She says that the leads provided by genetic genealogy databases are often compared to a concerned citizen calling a tip line to report suspicious activity in the neighborhood. However, DNA information is a lot more personal and sensitive. Southard admonishes users of the technique to proceed cautiously. “While the guidelines around how to use these data are still evolving, honor what has been established. That’s the best way to ensure the longevity of this technology.”


Southard is optimistic about the future of FGG. She sees the technique becoming easier to use with the development of software tools that can automate some of the labor-intensive tasks involved in finding matches and building family trees. “23andMe recently released a tool that can reconstruct a tree for a group of individuals who are second cousins or closer,” she says. “This kind of tool can certainly speed up the work that we’re doing.”


In the end, Southard’s class exercise at ISHI 30 did not yield a definitive result. Addressing the class at the conclusion of the session, Southard says she was torn by the decision of whether or not to have the investigative trail end with the positive identification of the killer. However, the exercise reflects the reality that FGG isn’t a magic wand, and some real-world cases that employ FGG still remain unsolved.


That reality, however, shouldn’t prevent an investigator from using FGG in a case that could benefit from the technique. The limitations of the data set available for law enforcement use should also not prove to be a deterrent. As Southard says, “You don’t need a million samples to solve your case. You only need a few. And perhaps the few you need are already there in the database.”