My genotyping results, plus a brief introduction to population genetics

My 23andMe results finally came, and I’ve spent what little free time I’ve had this week exploring them. If you are unfamiliar, 23andMe is a personal genotyping service. In short, I sent them some DNA, and they identified which variants I carry at a large number of known sites and gave me the results. The genotyping method used by 23andMe is different from genome sequencing: instead of determining every single nucleotide in my entire genome, they look only at specific positions known to vary between people, called single-nucleotide polymorphisms (SNPs), and identify which allele I carry at each one. This is much cheaper and faster than sequencing (you may remember that your DNA contains a lot of non-coding regions, redundancies, etc.), and still provides a good amount of meaningful results.

My friend Razib was kind enough to include my raw data in his hobby genealogy dataset that he’s been playing with in a program called ADMIXTURE. The idea behind this software is that you input the genetic data for a group of people, and the program determines the relative contribution of hypothetical “parent” populations to each individual’s genome. Let’s look at some examples:

The K=2 at the bottom means that this plot was generated assuming only two parent populations. Usually when you set K=2 you wind up separating African vs. non-African genetic components, but this dataset doesn’t have any African populations. It is almost entirely Eurasian, so the first split is actually east (Asian) vs. west (European). The red component shows the proportion of European ancestry, and the teal component shows the proportion of Asian ancestry, if we assume that these are the only two reference populations. The biggest problem with this kind of analysis is that it is often hard to determine which values of K are meaningful. Razib has a good caveat on the limitations of ADMIXTURE here. Anyhow, my data is at the very bottom. As you can see, when K=2 I am overwhelmingly European with a tiny Asian component. This is more or less in line with other white Americans and many Europeans (see the Irish, Swedish, and French samples), but also very similar to, say, Palestinians. Clearly we need to add more parent populations.
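If you like seeing the arithmetic, here is a rough Python sketch of what “parent populations” means in this kind of model. Each individual gets a set of ancestry fractions that sum to 1, each parent population gets an allele frequency at every SNP, and the individual’s expected allele frequency is just the weighted average. ADMIXTURE works in the other direction, estimating both the fractions and the frequencies from the genotype data. The function name and all of the numbers below are invented for illustration; none of them come from Razib’s dataset.

```python
# Toy illustration of the admixture model. All values are made up.

def expected_allele_freq(q, p):
    """Expected allele frequency at each SNP for one individual.

    q: ancestry fractions over K parent populations (must sum to 1).
    p: allele frequencies, where p[k][j] is population k at SNP j.
    """
    if abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("ancestry fractions must sum to 1")
    n_snps = len(p[0])
    # Weighted average of the parent frequencies, one value per SNP.
    return [sum(q[k] * p[k][j] for k in range(len(q))) for j in range(n_snps)]


# Hypothetical K=2 setup: two parent populations, three SNPs.
parent_freqs = [
    [0.9, 0.2, 0.5],  # hypothetical "west" parent population
    [0.1, 0.8, 0.5],  # hypothetical "east" parent population
]
individual_q = [0.75, 0.25]  # 75% "west", 25% "east"

freqs = expected_allele_freq(individual_q, parent_freqs)
print([round(f, 3) for f in freqs])
```

Each bar in the plots is essentially one individual’s list of fractions (the `individual_q` above) drawn as colored segments. Raising K just means allowing more entries in that list.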

Now we have K=3, so three theoretical parent populations. The first thing you may notice is that the Pima group separates out from the others as the only group with a large amount of lime green. The Pima are a Native American ethnic group, and therefore the most genetically distant from the other groups because of the amount of time they spent isolated from the rest of the world. So now our three reference groups are European (still red), Asian (now blue), and “Native American” (lime green). However, notice that the group with the second-largest green component is the Yakut, a Siberian ethnic group from Russia. It is more likely that when we set K=3, the third group is actually a Siberian component. This makes sense because Native American ethnic groups are widely believed to be descended from a Siberian group that crossed the Bering Strait into Alaska. The green seen in the other populations would then likely be a result of Siberian groups moving the other way, into western Eurasia. Again, my data is at the bottom. If we assume K=3, I have much less Asian genetic input, because most of it has actually segregated out into Siberian input.

You with me so far? Let’s get a little crazy.

This time K=12. The parent populations can be roughly described as: Dai (I am actually unsure exactly where this population is from, but signs point to the China/Myanmar border area), Druze (Near East ethnic group, primarily in the countries surrounding Israel), Lahu (Southeast Asian), Southern European, Arab (1), Native American (1), Northern European, Arab (2), Siberian, Northeast Asian, Native American (2), and South Asian. The two different Native American groups likely represent a north/south split, but that is just speculation on my part.

As you can see, I am somewhat unsurprisingly very European. 93% of my genome shuffles out as European, although I am a bit surprised that it is more southern than northern, especially considering that all of my genealogical lines that I can trace back to a country of origin are overwhelmingly from the British Isles, with French and German minorities. This difference is probably due to early northward migrations of southern European populations into the British Isles, although I cannot rule out the possibility of a more recent southern European ancestor. My paternal grandmother’s lines are largely unknown with respect to country of origin, and I always thought she was a bit swarthy (see photo). I have had multiple strangers approach me to ask if I’m Italian (I don’t see it personally, but whatever), so I think it is somewhat likely that she had more recent Mediterranean ancestry.

I have roughly the same amount of Near East and Arab ancestry as your average white American or European, but I seem to have a bit more Southeast Asian? Not much more, but I’m still not sure how to reconcile that. I have the barest trace of Native American ancestry, a little bit on each side of the split, but not noticeably more than your average American. So, much to my family’s chagrin, I think it is very unlikely that I will find any recent Native American ancestors in my genealogical searches. Sorry guys.

  1. Razib says:

    >nice post michelle. re: dai, they're the south chinese parent population of the thai groups. in yunnan.

  2. >It occurred to me that the 'northern' European component might be less of what I'm thinking of and more of a– what's the word, Scandinavian?– thing, but in that case I would expect the Swedish population to have a lot more of it. So I'm right back to where I started.

  3. Zack says:

    >Who represents North European at K=12? Russians, White Americans (I assume CEU from HapMap?) and Swedes all seem to have less than 50% of it.


