Manually Degraded SNP Microarray Data on GEDmatch Top Matches for Forensic Genetic Genealogy

Forensic Genetic Genealogy (FGG) has recently become a valuable tool in the forensic science community and is having a great impact on the resolution of previously unsolved cases, including homicides, sexual assaults, and Unidentified Human Remains (UHR) cases.


FGG employs SNP sequence data uploaded to genetic genealogy databases (e.g., FamilyTreeDNA® and GEDmatch PRO®) to identify genetic relatives (i.e., genetic matches) of the unknown individual. Family trees are then constructed from the genetic matches to arrive at a candidate identity for the unknown individual.


SNP sequencing (i.e., SNP microarray) typically requires high-quality/high-quantity DNA samples. Degraded DNA samples, however, are regularly encountered in forensic investigations. Therefore, a critical analysis is needed of how degraded DNA, and the incomplete SNP data it produces, affects downstream FGG analysis within the genetic genealogy databases.


Addressing this potential issue, this study investigates how manually degraded SNP DNA data files affect the top ten genetic matches generated in GEDmatch.


Following informed consent, three volunteers provided their own downloaded raw DNA SNP microarray data. Once received by the principal investigator, the data files were anonymized and subjected to a randomized manual deletion protocol using Microsoft Excel. The protocol removed increasing percentages of the overall SNP data profile, giving a total of nine modified files for each donor (minus 5%, 10%, 15%, 20%, 25%, 30%, 40%, and 50% deletion). Each file was uploaded to GEDmatch as a “Research File”, and a list of the top ten genetic matches based on shared DNA (total shared cM value) was produced.
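The study performed the deletion manually in Microsoft Excel. Purely as an illustration, the same idea of removing a random percentage of SNP rows could be scripted as in the sketch below. The function names, the row format, and the use of a fixed random seed are assumptions made for this example and are not part of the study protocol.

```python
import random


def degrade_snp_rows(rows, fraction, seed=None):
    """Return a copy of `rows` with `fraction` of the rows removed at random.

    `rows` is a list of SNP records (for example, lines of a raw data file
    with header lines already stripped). The order of surviving rows is kept.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    rng = random.Random(seed)  # seeded so a degraded file can be regenerated
    n_delete = round(len(rows) * fraction)
    drop = set(rng.sample(range(len(rows)), n_delete))
    return [row for i, row in enumerate(rows) if i not in drop]


def make_degraded_series(rows, percents=(5, 10, 15, 20, 25, 30, 40, 50), seed=0):
    """Build one degraded copy per deletion percentage (hypothetical helper)."""
    return {p: degrade_snp_rows(rows, p / 100, seed=seed + p) for p in percents}


if __name__ == "__main__":
    # Synthetic placeholder records, not real genotype data.
    demo_rows = [f"rs{i}\t1\t{1000 + i}\tAA" for i in range(1000)]
    for percent, kept in make_degraded_series(demo_rows).items():
        print(f"minus {percent}%: {len(kept)} SNP rows remain")
```

Each deletion level here is drawn independently from the full file. Whether successive levels should instead be nested (each file a subset of the previous one) is a design choice the sketch does not settle.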


Each modified file was examined using autosomal One-to-Many matching, autosomal One-to-One Q-Matching, and Segment Searching, to investigate how values and top matches were altered with increased deletion of data.


The results highlight various changes among top matches, including, but not limited to, increases and decreases in total shared cM value, increases and decreases in the quality scores of matching segments on a one-to-one basis, and changes to the percentage confidence in predicted relationships.


Additionally, the ranking of each donor’s top ten genetic matches changed as the deleted percentage increased: compared with the original full DNA SNP data file, some matches moved up in rank, some moved down, and some were lost from the top ten list completely.
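The rank comparison described above can be expressed as a simple bookkeeping routine. The sketch below assumes each top-match list is an ordered list of match identifiers, best match first. The function name and the identifiers in the usage example are hypothetical and do not come from the study data.

```python
def compare_top_matches(baseline, degraded):
    """Classify how matches in `baseline` fare in the `degraded` list.

    Both arguments are ordered lists of match identifiers, best match first.
    """
    base_rank = {match: i + 1 for i, match in enumerate(baseline)}
    new_rank = {match: i + 1 for i, match in enumerate(degraded)}
    report = {"moved_up": [], "moved_down": [], "unchanged": [], "lost": [], "new": []}
    for match, rank in base_rank.items():
        if match not in new_rank:
            report["lost"].append(match)        # dropped out of the list
        elif new_rank[match] < rank:
            report["moved_up"].append(match)    # smaller rank number = higher
        elif new_rank[match] > rank:
            report["moved_down"].append(match)
        else:
            report["unchanged"].append(match)
    # Matches that appear only after degradation.
    report["new"] = [match for match in degraded if match not in base_rank]
    return report


if __name__ == "__main__":
    full_file = ["A", "B", "C", "D"]   # hypothetical match IDs
    minus_20 = ["B", "A", "C", "E"]
    print(compare_top_matches(full_file, minus_20))
```

Running the same comparison for every deletion level against the full-file list would show at which percentage each original match first changes rank or drops out.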


Practically, these findings highlight potential issues for match assessment, because the top ten genetic matches are typically the most valuable starting point in an FGG investigation.


As FGG use grows, it is important to understand how to assess the information coming from a subject’s matches, particularly when dealing with degraded DNA samples. Overall, this study emphasizes the need for further empirical research to assess the impact of degraded DNA samples in FGG investigations.