Wednesday, February 16, 2011

Applying priors on TMRCA based on surname

The match or non-match of a surname can be used as a prior to improve the accuracy of TMRCA calculations. The TMRCA calculation by itself computes only a likelihood based on the DNA data, but surnames carry additional information. If you have different surnames, it is unlikely that your TMRCA is in the last 500 years. Or, to turn that around in a Bayesian fashion, if your TMRCA is within the last 500 years, it is unlikely that your surnames will be different. If your TMRCA is, say, 2000 years ago, it is very likely that your surnames will be different. However, there is some chance that they will end up the same by luck.

So I have worked out a simple prior that encodes this information and applied it to two people who have similar likelihoods: one is from Sweden (different surname) and one is from Scotland with my Johnston surname. See the plot above. The black curve is the likelihood from the DNA data. The red and green solid curves show the priors for surname-match (green) and surname-not-match (red). The dashed lines show the product of the likelihood and the surname prior, with blue being the result depending on whether the surnames match or not. I chose 1200 years ago as about the time when people with a TMRCA of that date would have a roughly 50-50 chance of ending up with the same surname, and chose an effective width for this sigmoid function of roughly 100 years. I also decided not to let this function get too low at distant times, since there is always some chance that you end up with the same surname by luck. I picked 10% for that parameter. Clearly one can fiddle around with these parameters.
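To make the shape of the prior concrete, here is a rough Python sketch of a sigmoid with the three parameters mentioned above (midpoint 1200 years, width 100 years, floor 10%). Several things in it are my assumptions for illustration and not necessarily what went into the plot: the exact logistic form, taking the not-match prior as one minus the match prior, and above all the stand-in likelihood, which is just a made-up bump peaking at 1000 years and not a real calculation from DNA data.

```python
import math

MIDPOINT_YEARS = 1200.0  # TMRCA at which a surname match is roughly 50-50
WIDTH_YEARS = 100.0      # effective width of the sigmoid
FLOOR = 0.10             # chance of sharing a surname by luck at distant times


def p_same_surname(t):
    """Prior weight for a surname match given a TMRCA of t years ago.

    Assumed form: a logistic curve falling from ~1 to FLOOR around MIDPOINT_YEARS.
    With the floor included, the value at the midpoint is 0.55, not exactly 0.5.
    """
    x = min((t - MIDPOINT_YEARS) / WIDTH_YEARS, 700.0)  # clamp to avoid overflow
    return FLOOR + (1.0 - FLOOR) / (1.0 + math.exp(x))


def p_different_surname(t):
    """Assumed not-match prior: the complement of the match prior."""
    return 1.0 - p_same_surname(t)


def toy_likelihood(t):
    """Hypothetical stand-in for the DNA likelihood: a bump in log(t) peaking at 1000 years."""
    return math.exp(-0.5 * ((math.log(t) - math.log(1000.0)) / 0.5) ** 2)


def map_tmrca(prior, years):
    """Grid value of t that maximizes likelihood(t) * prior(t)."""
    return max(years, key=lambda t: toy_likelihood(t) * prior(t))


years = range(50, 4001, 10)
map_match = map_tmrca(p_same_surname, years)
map_nonmatch = map_tmrca(p_different_surname, years)
print("MAP TMRCA, same surname:     ", map_match)
print("MAP TMRCA, different surname:", map_nonmatch)
```

With the same toy likelihood for both people, the surname-match prior pulls the peak toward more recent times and the not-match prior pushes it toward more distant times. The actual numbers depend entirely on the made-up likelihood and on the parameter choices, so they should not be read as results.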

The prior moves the TMRCA for the Johnston to slightly more recent years and trims the tail at more distant years. The effect on the person from Sweden is the opposite. Just from the surname and geographical difference, we are unlikely to have a TMRCA within the period of surname creation. That fact moves the TMRCA to more distant times. The Johnston person's maximum a posteriori TMRCA ends up at about half that of the person from Sweden.

