Ellen Greytak of Parabon Nanolabs shares how her team predicts a person’s appearance from DNA, and how they’re using the technology to help solve crimes.
I’m Ellen McRae Greytak. I’m the director of bioinformatics at Parabon Nanolabs, and we do DNA phenotyping: predicting a person’s appearance, ancestry, and so on, just from a DNA sample.
We start from what we call genotype-to-phenotype data. That just means a big database where, for thousands of samples, we have both phenotype information, such as eye color (are their eyes blue, green, or brown?), and SNP genotype data. We have about 1 million SNPs for each person, along with their genotype at each site (for example, are they AA or GG?), and what we want to know is which of those SNPs are significant for eye color. So we go through each individual SNP and score how well its genotype data associate with the phenotype data. If people who have AA at a given SNP always have blue eyes, that would be a really strong association. It’s never that simple, but that’s what we’re looking for.
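The interview doesn’t name the statistic used to score each SNP. As a minimal sketch, assuming a Pearson chi-square test of independence (one common choice for a genotype-by-trait table), a per-SNP scan could look like the following. The SNP names and the six-person dataset are hypothetical.

```python
from collections import Counter

def chi_square(genotypes, phenotypes):
    """Pearson chi-square statistic for a genotype-by-phenotype contingency table.

    Larger values mean the genotype at this SNP departs more from independence
    with the phenotype, i.e. a stronger association.
    """
    n = len(genotypes)
    observed = Counter(zip(genotypes, phenotypes))
    genotype_totals = Counter(genotypes)
    phenotype_totals = Counter(phenotypes)
    stat = 0.0
    for g, g_total in genotype_totals.items():
        for p, p_total in phenotype_totals.items():
            expected = g_total * p_total / n  # count expected if SNP and trait were independent
            stat += (observed[(g, p)] - expected) ** 2 / expected
    return stat

def rank_snps(snp_table, phenotypes):
    """Score every SNP independently; return (snp_id, score) pairs, strongest first."""
    scores = {snp: chi_square(genos, phenotypes) for snp, genos in snp_table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: six people, two SNPs.
eye_color = ["blue", "blue", "brown", "brown", "blue", "brown"]
snp_table = {
    "snp_A": ["AA", "AA", "GG", "GG", "AA", "AG"],  # AA carriers all have blue eyes
    "snp_B": ["AG", "GG", "AG", "GG", "AA", "AA"],  # each genotype split evenly by eye color
}

if __name__ == "__main__":
    for snp, score in rank_snps(snp_table, eye_color):
        print(snp, round(score, 2))
```

On this toy data, snp_A scores 6.0 and snp_B scores 0.0. With a million SNPs, a real scan would also need to correct for the sheer number of tests performed.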
The thing is, it actually gets much more complicated than that. If you look through the genomics literature, you find what’s called missing heritability. There are traits that we know are genetic: they pass through families and are very clearly heritable, but when we look across the genome, we don’t find enough information to explain all the variation in that phenotype. That’s missing heritability: we’re only explaining some of the heritability.
One idea for what might be missing is interactions among SNPs. You might have a set of five SNPs, none of which individually has an effect on the phenotype, but if you look at all five together, certain combinations of genotypes have a significant effect. But if you’ve got a million SNPs and you need to look at five-way combinations, that’s far more than you could ever test. You’d be doing calculations until the end of the universe. We don’t have time for that.
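To put the “end of the universe” remark in numbers, the snippet below counts the distinct five-SNP sets among 1 million SNPs with `math.comb` and converts that count to years, assuming a hypothetical (and generous) rate of one billion combination tests per second.

```python
import math

n_snps = 1_000_000
combo_size = 5
tests_per_second = 1e9              # hypothetical throughput
seconds_per_year = 365.25 * 24 * 3600
age_of_universe_years = 13.8e9      # approximate

combos = math.comb(n_snps, combo_size)  # number of distinct five-SNP sets
years = combos / tests_per_second / seconds_per_year

if __name__ == "__main__":
    print(f"{combos:.3e} combinations")
    print(f"{years:.2e} years, about {years / age_of_universe_years:.0f}x the age of the universe")
```

That works out to roughly 8.3 × 10^27 combinations and about 2.6 × 10^11 years, many times the current age of the universe, which is why an exhaustive search is off the table.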
So we’ve developed software based on what’s called an evolutionary algorithm, or genetic algorithm. It starts from a random population of SNP combinations. We’re still scoring: we look at each combination of SNPs and score how well it predicts the phenotype. If a combination scores well, it gets to survive, breed, and mutate, so we get to look all around it at other possible combinations of SNPs. If it scores poorly, it dies off, and the space around it doesn’t get searched. In that way, we explore the important parts of the space without spending the rest of our lives searching the entire space. That’s how we’re trying to discover SNP associations that haven’t been found before.
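The interview describes the search only at a high level. The sketch below is a generic genetic algorithm, not Parabon’s software: the toy dataset, the majority-vote fitness score, the crossover and mutation rules, and every parameter value are hypothetical. Each candidate is a set of k SNPs; high scorers survive and breed, and low scorers are dropped.

```python
import random
from collections import Counter, defaultdict

def make_toy_data(n_samples=400, n_snps=30, causal=(3, 11, 24), seed=0):
    """Hypothetical data: genotypes coded as 0/1/2 allele counts, and a binary
    phenotype driven by an interaction among the three `causal` SNPs."""
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_samples)]
    # Toy interaction: phenotype is 1 when the summed genotype at the causal SNPs is odd.
    phenotypes = [sum(row[i] for i in causal) % 2 for row in genotypes]
    return genotypes, phenotypes

def fitness(combo, genotypes, phenotypes):
    """Fraction of samples predicted correctly by taking the majority
    phenotype within each joint-genotype group of the combination."""
    groups = defaultdict(Counter)
    for row, y in zip(genotypes, phenotypes):
        groups[tuple(row[i] for i in combo)][y] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(phenotypes)

def crossover(parent_a, parent_b, k, rng):
    """Breed: draw k SNPs from the union of both parents' SNPs."""
    pool = sorted(set(parent_a) | set(parent_b))
    return tuple(sorted(rng.sample(pool, k)))

def mutate(combo, n_snps, rng):
    """Mutate: swap one SNP for a random SNP not already in the combination."""
    combo = list(combo)
    outside = [s for s in range(n_snps) if s not in combo]
    combo[rng.randrange(len(combo))] = rng.choice(outside)
    return tuple(sorted(combo))

def evolve(genotypes, phenotypes, k=3, pop_size=40, generations=60, seed=1):
    """Return the best k-SNP combination scored during the search, and its fitness."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    cache = {}

    def score(combo):
        if combo not in cache:
            cache[combo] = fitness(combo, genotypes, phenotypes)
        return cache[combo]

    # Start from a random population of SNP combinations.
    population = [tuple(sorted(rng.sample(range(n_snps), k))) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        survivors = ranked[: pop_size // 2]  # high scorers survive; the rest die off
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, k, rng), n_snps, rng))
        population = survivors + children
    for combo in population:  # make sure the final generation is scored too
        score(combo)
    best = max(cache, key=cache.get)
    return best, cache[best]

if __name__ == "__main__":
    genotypes, phenotypes = make_toy_data()
    best, best_score = evolve(genotypes, phenotypes)
    print("best combination:", best, "fitness:", round(best_score, 3))
```

Because the causal SNPs are planted in the toy data, you can check whether the search recovers them. On real data, a score like this one would need safeguards against overfitting, since splitting samples into many joint-genotype groups inflates in-sample accuracy.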
That’s just the data mining. Then we build the results into a predictive model. We take the SNPs we’ve found and say, “OK, we know that the AA genotype is highly associated with blue eyes.” So our predictive model says, “If I see AA, I’m going to give a higher probability to blue eyes.” That’s basically what the predictive modeling is.
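The interview doesn’t say what type of model Parabon uses. As an illustrative stand-in, the sketch below uses a naive Bayes classifier, which assumes SNPs contribute independently given eye color (real models need not), to turn genotype counts into eye-color probabilities. The SNP name and training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(samples, labels, alpha=1.0):
    """Count genotype frequencies per eye color.

    samples: list of dicts mapping SNP id -> genotype string (e.g. "AA").
    labels: the eye color observed for each sample.
    alpha: Laplace smoothing so an unseen genotype never gets probability zero.
    """
    class_counts = Counter(labels)
    genotype_counts = defaultdict(Counter)  # (eye color, SNP) -> genotype counts
    genotype_values = defaultdict(set)      # SNP -> genotypes seen in training
    for sample, label in zip(samples, labels):
        for snp, genotype in sample.items():
            genotype_counts[(label, snp)][genotype] += 1
            genotype_values[snp].add(genotype)
    return class_counts, genotype_counts, genotype_values, alpha

def predict_proba(model, sample):
    """Return P(eye color | genotypes) under the naive Bayes assumption."""
    class_counts, genotype_counts, genotype_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # prior frequency of this eye color
        for snp, genotype in sample.items():
            n_values = max(len(genotype_values[snp]), 1)
            seen = genotype_counts[(label, snp)][genotype]
            score += math.log((seen + alpha) / (count + alpha * n_values))
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max before exponentiating, for stability
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}

# Hypothetical training data: one SNP, six people.
samples = [{"snp_A": g} for g in ["AA", "AA", "AG", "GG", "GG", "AG"]]
labels = ["blue", "blue", "blue", "brown", "brown", "brown"]
model = train(samples, labels)

if __name__ == "__main__":
    print(predict_proba(model, {"snp_A": "AA"}))
```

For an AA carrier, this toy model assigns blue eyes a probability of 0.75, while a GG carrier gets 0.25: seeing AA raises the probability of blue eyes, as described above.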
For kinship, we also started from scratch. It’s a similar approach: we also have a database, except now, instead of eye color being the phenotype we want to predict, it’s “I have these two genomes, and they’re related at this distance,” for example, “they’re third-degree relatives.” So we take the same approach, except instead of looking at one person’s genotypes, we have two genomes to look at. What we’re looking at is how similar the genomes are and whether we can use that information to predict their relatedness.
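A minimal sketch of “how similar are these genomes” feeding a relatedness prediction, assuming genotypes coded as 0/1/2 allele counts, a simple identity-by-state (IBS) similarity, and a nearest-average classifier fit on labeled reference pairs. This is not Parabon’s method; practical kinship tools typically weight by allele frequency or look for shared DNA segments, and the reference similarities below are hypothetical, not real calibration data.

```python
from collections import Counter, defaultdict

def ibs_similarity(genome_a, genome_b):
    """Identity-by-state similarity in [0, 1].

    With 0/1/2 coding, two people share 2 - |a - b| alleles at each SNP
    (2 = identical genotypes, 0 = opposite homozygotes).
    """
    if len(genome_a) != len(genome_b):
        raise ValueError("genomes must cover the same SNPs")
    shared = sum(2 - abs(a - b) for a, b in zip(genome_a, genome_b))
    return shared / (2 * len(genome_a))

def fit_centroids(reference_pairs):
    """Average similarity for each relatedness label in a reference database."""
    sums, counts = defaultdict(float), Counter()
    for similarity, label in reference_pairs:
        sums[label] += similarity
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}

def predict_relatedness(similarity, centroids):
    """Assign the label whose average similarity is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - similarity))

# Hypothetical reference pairs (similarity, relationship).
reference_pairs = [
    (0.90, "1st degree"), (0.88, "1st degree"),
    (0.80, "2nd degree"), (0.78, "2nd degree"),
    (0.74, "3rd degree"), (0.73, "3rd degree"),
    (0.66, "unrelated"), (0.65, "unrelated"),
]
centroids = fit_centroids(reference_pairs)

if __name__ == "__main__":
    sim = ibs_similarity([0, 1, 2, 1, 0, 2], [0, 1, 1, 1, 0, 2])
    print(round(sim, 3), predict_relatedness(sim, centroids))
```

The gaps between distant degrees of relatedness are small, which is one reason a reference database reaching out to distant relatives and unrelated pairs matters for calibrating predictions like these.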
We’ve got a database that goes out to seventh-degree relatives, as well as many unrelated pairs, and we’re using it to predict two people’s relatedness. The value of that for forensics is twofold. Maybe the perpetrator of a crime is related to another perpetrator, a victim, or a neighbor. You could test any pair of people you suspect might be related.
The challenge with faces is describing a face in a way that can be predicted. From our point of view, if you can measure something, you can predict it, as long as there is some underlying genetics. So we take the face and turn it into a combination of variables. How wide is the face? How long? How angular? How big is the nose? All of those can be turned into numbers that can then be predicted. Just as we did with eye color, we look for the genes involved and ask how they can be used to predict what this person’s face is like. This specific person’s face sits somewhere along all of these variables (maybe it’s wide and short, or long and narrow), and every face lives somewhere in that space, which we call face space. We just need to be able to predict those variables.
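The interview names example face variables (width, length, angularity, nose size) but not the model used to predict them. A minimal sketch, assuming each face variable is an additive linear function of SNP allele counts fit by ordinary least squares with NumPy; the variable names, effect sizes, and data are simulated and hypothetical.

```python
import numpy as np

FACE_VARIABLES = ["face_width", "face_length", "nose_size"]  # hypothetical measurements

def fit_face_model(genotypes, faces):
    """Ordinary least squares for every face variable at once.

    genotypes: (n_people, n_snps) allele counts 0/1/2.
    faces: (n_people, n_variables) measurements, one column per face variable.
    Returns a (1 + n_snps, n_variables) matrix; row 0 holds the intercepts.
    """
    X = np.column_stack([np.ones(len(genotypes)), genotypes])
    coefs, *_ = np.linalg.lstsq(X, faces, rcond=None)
    return coefs

def predict_face(coefs, genotype_row):
    """Predict one person's position in face space from their genotypes."""
    x = np.concatenate([[1.0], genotype_row])
    return dict(zip(FACE_VARIABLES, x @ coefs))

def simulate(n_people=500, seed=0):
    """Hypothetical data: each face variable is additive in four SNPs plus noise."""
    rng = np.random.default_rng(seed)
    true_effects = np.array([
        [0.5, 0.0, 0.1],
        [0.0, -0.3, 0.0],
        [0.2, 0.2, 0.0],
        [0.0, 0.0, 0.4],
    ])  # rows = SNPs, columns = FACE_VARIABLES
    genotypes = rng.integers(0, 3, size=(n_people, true_effects.shape[0]))
    faces = 1.0 + genotypes @ true_effects + rng.normal(0.0, 0.05, size=(n_people, 3))
    return genotypes, faces, true_effects

if __name__ == "__main__":
    genotypes, faces, true_effects = simulate()
    coefs = fit_face_model(genotypes, faces)
    print(predict_face(coefs, genotypes[0]))
```

Fitting all face variables in one call treats face space as a vector of numbers predicted together, which mirrors the “turn the face into variables” framing; a real model would involve far more SNPs and measurements.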
When we use this for casework, we try to emphasize both what we can predict and what we can exclude. We don’t just say, “This person has blue eyes,” because a lot of the time the person may have dark blue eyes that sit on the line between blue and green, and it’s very difficult to predict, or even to tell by looking at them, whether their eyes are blue or green. That hurts our confidence. So whenever we report a result, we say, for example, that this person has blue eyes and we have 75% confidence in that, because maybe their eyes are green. However, we may also be able to say with 99% confidence that this person does not have hazel, brown, or black eyes. That’s what’s really important for investigators. We can tell them they don’t need to look at the people on their suspect list who have dark eyes, because those people really don’t match the profile we’ve come up with: there’s only about a 1% chance that this person has dark eyes, versus a 99% chance that they have blue or green eyes. That’s what we try to focus on: how can you use this information to narrow a list of possible matches?
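A sketch of the reporting logic described above, not Parabon’s reporting code, assuming the model outputs one probability per eye-color category. The 75% and 99% figures come from the example in the interview; the split among hazel, brown, and black and the suspect list are hypothetical.

```python
def report(probabilities, exclusion_level=0.99):
    """Return the most likely category, its probability, the categories that can be
    excluded together at `exclusion_level` confidence, and that confidence."""
    best = max(probabilities, key=probabilities.get)
    excluded, excluded_mass = [], 0.0
    # Exclude the least likely categories first while their combined probability
    # stays within 1 - exclusion_level (small tolerance for floating-point rounding).
    for category in sorted(probabilities, key=probabilities.get):
        if excluded_mass + probabilities[category] > 1 - exclusion_level + 1e-12:
            break
        excluded.append(category)
        excluded_mass += probabilities[category]
    return best, probabilities[best], excluded, 1 - excluded_mass

# Hypothetical prediction; the split among the dark colors is made up for illustration.
eye_probs = {"blue": 0.75, "green": 0.24, "hazel": 0.005, "brown": 0.004, "black": 0.001}
suspects = [  # hypothetical suspect list
    {"name": "Suspect 1", "eye_color": "brown"},
    {"name": "Suspect 2", "eye_color": "blue"},
    {"name": "Suspect 3", "eye_color": "green"},
    {"name": "Suspect 4", "eye_color": "hazel"},
]

if __name__ == "__main__":
    best, p_best, excluded, confidence = report(eye_probs)
    print(f"Most likely: {best} ({p_best:.0%})")
    print(f"Excluded with {confidence:.0%} confidence: {', '.join(excluded)}")
    remaining = [s["name"] for s in suspects if s["eye_color"] not in excluded]
    print("Suspects still consistent with the profile:", remaining)
```

Here blue is reported at 75%, hazel, brown, and black are excluded together at 99% confidence, and only the blue- and green-eyed suspects remain on the list.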
We’ve been using this in casework. We’ve done several dozen cases at this point where investigators come to us and say, “I’ve had this case hanging over me for years. I have no witnesses, and I just don’t know who to look for.” If they still have DNA from the case, we can run our genotyping assay, analyze the data, and tell them, “OK, you’re looking for someone who matches this profile. The vast majority of the people you see walking down the street are not going to match it, so you need to focus on the smaller subset of your possible matches.”
What we really want to see in the future, as this gets out there: like I said, right now it’s mostly the oldest, worst cold cases, but imagine a crime occurs and, instead of spending five years looking at every person around, you narrow it down to 5% of the population. Suddenly you’re only looking for this blue-eyed person, and that’s really going to change the efficiency of these investigations. That’s what we’re really looking for going forward.
WOULD YOU LIKE TO SEE MORE ARTICLES LIKE THIS? SUBSCRIBE TO THE ISHI BLOG BELOW!