Haplogroups, Haplotypes and genealogy, oh my


I received some criticism the other day over my own criticism of the use of haplogroups in genealogy -- the finding and tracing of relatives. My language was imprecise so I want to make a correction and explore the issue in a bit more detail.

One of the most basic facts of inheritance is that while most of your DNA is a mishmash of your parents (and all their ancestors before them) two pieces of DNA are passed down almost unchanged. One is the mitochondrial DNA, which is passed down from the mother to all her children. The other is the Y chromosome, which is passed down directly from father to son. Girls don't get one. Most of the father's X chromosome is passed down unchanged to his daughters (but not his sons) but of course they can't pass it unchanged to anybody.

This allows us to track the ancestry of two lines. The maternal line tracks your mother, her mother, her mother, her mother and so on. The paternal line tracks your father, his father and so on. The paternal line should, in theory, match the surname, but for various reasons it sometimes doesn't. Females don't have a Y, but they can often find out what Y their father had if they can sequence a sample from him, his sons, his brothers and other male relatives who share his surname.

The ability to do this got people very excited. DNA that can be tracked back arbitrarily far in time has become very useful for the study of human migrations and population genetics. The DNA is normally passed down completely but every so often there is a mutation. These mutations, if they don't kill you, are passed down. The various collections of mutations are formed into a tree, and the branches of the tree are known as haplogroups. For both kinds of DNA, there are around a couple of hundred haplogroups commonly identified. Many DNA testing companies will look at your DNA and tell you your MTDNA haplogroup, and if male, your Y haplogroup.

There is another way to type Y DNA, however, known as the short tandem repeat (STR) test. This test relies on changes in the so-called "junk DNA" of repeated sequences. SNP (single-nucleotide polymorphism) mutations, which drive haplogroups, are rare, but STR changes are more common. Your list of STRs is called a haplotype and is much more specific.

Services like 23andMe give you your haplogroups. You will share a haplogroup with your siblings, one of your parents, one of your grandparents, one of your great-grandparents and so on. You will also share it, by descent, with some fraction of all those people's descendants. (With two haplogroups you share with 2 of each generation.)

Your family tree is very bushy. Go back 10 generations, and you have 2^10 or 1024 ancestors at that level. Most people actually have slightly fewer because we all start duplicating ancestors before that point. Go back 100 generations (2,000 years) and you would have a million trillion trillion ancestors, if not for the fact that each one re-appears billions of trillions of times in your family tree. Yes, more than a Sagan. However, the maternal and paternal lines identify just 2 different ancestors out of that huge tree, the one on the far left and the one on the far right.
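The doubling arithmetic above is easy to check. Here is a minimal sketch; the function name `theoretical_ancestors` is my own, and it counts slots in the tree only, deliberately ignoring the duplicate ancestors mentioned above.

```python
def theoretical_ancestors(generations: int) -> int:
    """Ancestor slots at a given generation back, ignoring duplicated people."""
    return 2 ** generations


# 10 generations back, and 100 generations back (about 2,000 years
# at the 20 years per generation the text implies).
for g in (10, 100):
    slots = theoretical_ancestors(g)
    print(f"{g} generations back: {slots:.3e} slots in the tree")
```

The second figure comes out a little over 10^30, which is the "million trillion trillion" quoted above. Since far fewer people than that have ever lived, nearly every slot must be a repeat.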

Rules of the patriarchy aside, these ancestors are not any more special or important than any others. What is different is that we can see them in the DNA. Sometimes, because they are the only thing about the past we can see, we get overly excited, as in the old joke:

A scientist is seen crawling around a lamp-post looking for something. When asked what he is doing, he says, "looking for my car keys."

"Did you lose them here?" asks the other scientist.

"No, I lost them over there by my car. But the light is here."

I've watched people looking for relatives get very excited to discover somebody with the same haplogroup. They feel a connection. They start to feel that they must be related to the person, or if they are a suspected relative, that they will be related along the paternal or maternal lines. In fact, all the haplogroup shows is a common ancestor thousands or even tens of thousands of years ago. One common ancestor out of millions of common ancestors. For if you go back that far, all people from our rough geographic region are common ancestors, not just the mothers and fathers of the haplogroups. People get really excited about the idea of a mitochondrial Eve and a Y-Adam, even though we are also commonly descended from everybody who was alive and offspring-productive in those eras. Everybody was the ancestor of everybody if you go back that far. In fact, the "everybody then was the ancestor of everybody now" period known as the "universal ancestor" point would be fairly recent if it were not for the geographic isolation of populations in Australia, the Americas and many islands. Everybody today who has successful children will be the ancestor of everybody in perhaps just 1,000 years if current trends continue. That's because now we have airplanes, and over even 50 generations at a zero-population-growth rate of 2.3 children per couple, you get a million trillion theoretical descendants.
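The descendant figure can be checked the same way. This is a rough sketch under a simplifying assumption that is mine, not a demographic model: every descendant pairs with a non-descendant and each such couple has 2.3 children, so the count multiplies by 2.3 each generation. The function name is hypothetical.

```python
def theoretical_descendants(children_per_couple: float, generations: int) -> float:
    """Descendant slots after some generations, ignoring overlap between lines."""
    return children_per_couple ** generations


total = theoretical_descendants(2.3, 50)
print(f"50 generations at 2.3 children per couple: about {total:.1e}")
```

The result is on the order of 10^18, a million trillion. As with ancestors, the real number of distinct people is capped by the world population, so the lines must overlap enormously.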

Some haplogroups are more rare than others, and some are much more common within certain ethnic groups and locations, due to the slowness of pre-20th century human migration. As such they help population geneticists track patterns of migration.

Because there are so few haplogroups, a match is not particularly unlikely, and if you have one of the haplogroups that is particularly common in your ethnic or geographic grouping, a random match is actually fairly likely. A non-match on haplogroups does confirm that the other person is not your relative on that very specific family line, but not that they aren't your relative. A non-match with a sibling or other close relative for whom the known family tree demands a match does indicate a mismatch of genetic and known parentage. And a match on a very close relative (1st cousin and perhaps 2nd cousin) does indicate it is probable (but not certain) the common ancestor will be on the appropriate line.

But on the other hand, consider a haplogroup match with somebody more distant, like a possible 5th cousin. There have been 6 generations from the common ancestor to you. The common ancestors will be one of your 32 pairs of g-g-g-g-grandparents. Of those 32 pairs, one has your maternal line g-g-g-g-grandmother. (Let's look at the maternal haplogroup for now.)

But those 32 pairs all started their own family trees, and the children, 6 generations on, are you and your cousins, all the way up to 5th cousins.

How many such children do they have? That varies a lot, but one thing that doesn't vary too much is that only half will pass down their haplogroup. For Y-haplogroup, only the sons will get it at all. For MT-haplogroup, all will get it but only the girls will pass it on.

So for that 6-generation mother who gave you your haplogroup, only 1/32nd of her descendants will get her haplogroup, including your immediate family. In fact, many of them won't be 5th cousins because they are closer relatives. For example, if the matriarch had just one daughter and 2 sons, then none of your 5th cousins from her got the haplogroup, as everybody is closer. If she had two daughters, the other daughter (gggg-aunt) passes it on but with no multiplier that generation. It varies a lot but there's an argument that typically only 1/64th of her descendants will get her haplogroup and be 5th (but not 4th or closer) cousins. If we also assume that all the other ancestors had children at a similar rate, we now see that the odds of a 5th cousin or closer having also gotten her haplogroup are about 1 in 2000, and if we remove the closer cousins about 1 in 1700.
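Here is one way to arrive at a number near the 1 in 2000 quoted above. It is a sketch under stated assumptions: sons and daughters are equally common, every ancestral pair leaves similar numbers of descendants, and the 1/64 figure is taken as given. All names are my own, and the refinement to 1 in 1700 is not derived here.

```python
def carrier_fraction(generations_down: int) -> float:
    """Fraction of a woman's descendants, this many generations below her,
    who carry her mitochondrial haplogroup. All her children carry it, but
    only daughters pass it on, so it halves each later generation."""
    return 0.5 ** (generations_down - 1)


GENERATIONS = 6                                  # common ancestor down to you
ancestor_pairs = 2 ** GENERATIONS // 2           # 32 pairs of g-g-g-g-grandparents
all_carriers = carrier_fraction(GENERATIONS)     # 1/32 of her descendants
fifth_cousins_only = all_carriers / 2            # the rough 1/64 from the text

# Only one of the 32 pairs includes the maternal-line ancestor.
one_in = ancestor_pairs / fifth_cousins_only
print(f"carriers among her descendants: 1 in {1 / all_carriers:.0f}")
print(f"5th cousin shares it by descent: about 1 in {one_in:.0f}")
```

Changing `GENERATIONS` shows how fast the odds fall off with each extra degree of cousinhood.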

With the Y-group, as only half the people in the current generation have it, it's twice as unlikely.

Thus the problem. A 5th cousin sharing your haplogroup has only 1 in 1700 odds of doing it through the common ancestor. But there are only around 200 haplogroups. So it's much more likely they share it just by chance at this level. Worse, if your haplogroup is a common one in your population, it is a great deal more likely to have happened by chance. Of course this doesn't mean it didn't happen by descent, and in fact there will be just a few for whom it did. But it is a bad assumption to make.
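To put the two numbers side by side, here is a small sketch of the comparison. It assumes, unrealistically, that the roughly 200 haplogroups are equally common and that a chance match is independent of descent; the function name is my own. With a common haplogroup the chance term grows and the result gets worse.

```python
def prob_match_is_by_descent(p_descent: float, p_chance: float) -> float:
    """Given that a known 5th cousin shares your haplogroup, the probability
    that the match came through the common ancestor rather than by chance."""
    # A match happens by descent, or by chance when descent did not supply it.
    p_match = p_descent + (1 - p_descent) * p_chance
    return p_descent / p_match


p_descent = 1 / 1700   # figure from the text
p_chance = 1 / 200     # assumes about 200 equally likely haplogroups

posterior = prob_match_is_by_descent(p_descent, p_chance)
print(f"chance is {p_chance / p_descent:.1f}x more likely than descent")
print(f"probability the match is by descent: {posterior:.1%}")
```

Under these assumptions only around one match in ten among 5th cousins would actually trace through the shared maternal line.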

Another way to see this is to imagine that surnames are passed down as reliably as Y chromosomes. They aren't, because there are lots of non-sired children due to adoption, infidelity and even sperm donation. In addition, surnames change over time and many cultures did not even have surnames in the fairly recent past. As you probably realize, only a tiny fraction of your 5th cousins share your surname -- again just one in 2,000. (One in 4,000 if the women of your generation have all changed surname, which of course does not always happen any more.) The haplogroup is something akin to the first two letters of your surname. So if I meet a cousin, and all I learn is that his or her birth surname starts with "Te" the odds are again much more likely that it's something else, and not Templeton. No surname, not even Smith or Chang, is as frequent as some haplogroups are in their main populations, though.

When it comes to the haplotypes, things are much more specific. Some companies, like Family Tree DNA, will scan 37 or 67 of the STR markers. They claim that a match on 37 markers indicates a recent ancestor "within the period of human record keeping" which is known as the era of genealogy. And a match on the 67 markers indicates a common ancestor "within recent times." Though they don't specify a time, it is implied that it's less than 2 centuries. Such matches are more useful for genealogy. Now it still remains the case that only a small fraction of your distant cousins will share your Y and its haplotype, so the odds of finding the sort of matches hoped for are still low. But unlike the haplogroup, if you find a match in the haplotype it is quite probably real, particularly if the surname matches.

Over time all of this is going to get better, and in particular it's going to get better through the non-haploid DNA, which involves all your ancestors, not just these 2 at the edges of your bushy tree.

In particular, the more people map their DNA, the more individual segments will start getting tracked and even attributed to particular individuals. We should be able to get partial reconstructions of the genomes of people who lived several generations ago by sequencing enough of their descendants. From these reconstructions, combined with more traditional family tree maps, DNA sequencing should, before too long, be able to plunk you down pretty precisely into a fairly complete family tree. Because while most people don't know their great-great grandparents (or even the great-grandparents) all it takes is some descendants who did know them to key in and sequence the appropriate data. Just one link will tie people into a vast web.

How much does this mean? Probably not a lot. I see no special bond among 3rd and later cousins, not even among 2nd cousins. I'm told I had a 3rd cousin who lived on the next street over when I was a kid. I don't even remember anything about that. It may have an effect on people who perhaps hold prejudices against some ethnic groups who discover they are part of that group, who knows?

Our connection to the more distant "haplomothers" who were the first members of a haplogroup is even more remote. Most haplogroups formed 10,000 to 30,000 years ago. This is past the "universal ancestor point" which means that you are descended from almost everybody alive in that era, not just from the haplomother. She's just one of millions from whom you are descended. And because you only have under 30,000 protein encoding genes, chances are you got none of your genome (except the mitochondria) from her. And your mitochondria are already almost identical to everybody else's on the planet. She is a truly meaningless ancestor, and the only thing that makes us pay attention is that we can identify approximately when she lived, and track broad populations of people using that.


Can two brothers with the same parents have different haplogroup numbers? Let's say mine is R-M417 and my brother is R-M512. Is that possible?

These are your male haplogroups. Do your maternal ones also mismatch? If your paternal groups don't match, well, the hard truth is it's pretty unlikely that's an error, and a significant chance that you have different biological fathers. If you were not aware of that, it's obviously big news that should be approached with great caution. It could be your real father (the one who raised you) is fully aware of this and decided not to tell you. It could be he's not aware, and spreading this information could damage your family. It could be your mother is aware and your real father is not aware, and she decided not to tell him. It could even be that she's not aware (though she would likely be aware of the possibility.)

If you also mismatch on the maternal then adoption is the most likely explanation. In this case obviously your parents are aware and decided to not disclose this information.

Of course, every so often there is a mutation that creates a new haplogroup but this is extremely, extremely rare. But it's obvious why people would hope for this instead of the above explanations.

You can learn a bit more by doing the autosomal DNA comparison with your brother, or other known relatives whom you can trust. This will let you find out more specifics without talking to your parents. (If you can scan your parents of course you learn more.) For example, which one of you is unrelated to your parent (or whether both of you are.)

Once you know this you have a hard decision.

  • If you mismatch on both maternal and paternal groups, it is likely one or both of you are adopted, and your parents obviously know.
  • If you mismatch on only the paternal group, then your mother probably (but not certainly) already knows, and if you have to talk to your parents, start with her.
  • If you have other brothers, and one of you matches the other brother, then it makes it slightly more likely your father might not know about the situation. If all of you differ then things like adoption and sperm donation are modestly more likely, but it's far from sure.

The big thing to be careful about would be approaching your father, in case he doesn't know. What if the knowledge makes him distrust or break up with your mother? What if it makes him change his view of his relationship to you?

In the end though, don't panic. You are not that unusual. Various studies have estimated that anything from 3.5% to as much as 10% of people are not genetically related to the man named on their birth certificate. It happens a lot, to many hundreds of millions of people. Yes, there is something that people like to keep secret at the root of it, but it's not so unusual.

My brother and I share paternal haplogroups R-M412
on our Maternal side I am H and my brother is H33

Does this difference infer anything?

If you have the same mother, you will have the same maternal haplogroup. However, before you panic, there can be transcription errors (and extremely rarely, mutations) that would cause a different report. Since you are both in the H subtree, this could be the explanation.

You realise you share the same haplogroup as the blood found on the shroud of Turin?

I was adopted and have tested H1a. As I understand it, my mother would have been that group, as would her mother. Also any maternal aunts, great aunts etc in that specific matriarchal line. So if I locate a half sister from that mother, or sister of that mother, she would share that haplogroup. If I were to locate a 3rd cousin with that group, there is a 1/16 chance that the cousin shares that group from a common maternal ancestor, but also a 1/182 chance of having that group by simple coincidence. Given the comparative probabilities, it would seem prudent to at least work up a tree to compare with another cousin who may not share the same haplogroup. At least the matriarchal great great grandmother would be a reasonable place to start a search for an unknown commonality. Telling an adopted person that they have DNA matches doesn't mean much if there is no way to computer match trees of two or more matches. After all, the adopted person has no tree to start with.

While there are a few things you can learn from it -- and you seem to actually know what it is, which is more than I can say for most -- I would say that 99% of the time when people use the maternal haplogroup they are misunderstanding it. Yes, it can be used for confirming certain very close relationships. For example, if your mother doesn't share your maternal haplogroup, then you're adopted. Get more than a few cousinhoods away and it's mostly confusing. And for ancient ancestry it is very highly misleading.

Is it possible to share a maternal Haplogroup with a third cousin but to actually be related on the paternal side?

These are largely irrelevant. You can share them with anybody. They have a mild use in a cheap paternity test since you must share them with your male or female ancestry line. So of course you can share a haplogroup with anybody in the world, regardless of relationship. There are only a few hundred of them.

My mother is deceased. Testing at 23andme shows my haplogroup to be H15a1. My 1c (son of my mom’s sister) has a haplogroup of H15a. Could my variant be a transcription error since it’s so close, or do we indeed come from a different maternal line?

Could a gene drive, potentially created with the help of CRISPR, be used to create a new haplogroup?

Also, I’m struggling a bit with wrapping my head around the mitochondrial eve. As I understand it, she was not the first human - so wouldn’t she have still inherited her haplogroup from her mother? Making her mother (or some other mother up the chain) the mitochondrial eve? What makes the eve that we’ve identified special as compared to those that came before her?

Also, do you agree that “Haplogroup Gene Drives & The Mitochondrial Eve” would be a killer name for a band?

I appreciate any insight you’d be willing to share - thanks!

I think it's too much of a mouthful for a band name.

You would need to edit the germline to create a new haplogroup. I don't think you would need the full gene drive, which works on the regular diploid DNA. You only need to alter the haploid DNA, which would be easier. Not sure if there is any value in creating a new haplogroup though.

Mitochondrial eve is not really all that special a person. People get confused by it. Yes, of course, she has the same maternal haplogroup as her mother and all the women before her until the one where the mutation that created that haplogroup happened.

She is not "eve" in any sense like the one in the bible. It was a romantic name. What people don't realize is that everybody alive in her day is also your ancestor (unless they didn't have descendants at all.) Everyone. All the women. All the men.

And not only that, everybody after them too, for thousands of years. The "universal ancestor point" -- the last time when everybody alive (who had successful children) was the ancestor of everybody today -- comes much, much later than "mitochondrial eve." She's just the one for whom the path to her from you (and everybody else) is always along the maternal line. After all, if she was monogamous, you are also descended from her partner, always through the maternal line until the very last descendant.

My haplogroup is R-L21 and my son’s is a subclase of mine, R-L371. I’ve learned my son’s contains the SNP marker L371. I assume his is different from mine because his new test was more thorough than mine, which is several years old?

I was too quick to submit my comment. ‘Subclase” should be “subclade. “Os” should be “is”. %**?!! Autocorrect!!

It may be he was done on a new chip with more SNPs. You would have to check with the lab you used. Presumably the autosomal DNA confirmed your parentage, so don't worry much about haplogroups.

My mother has haplogroup L3e 1a1a and my haplogroup is M13b2. Why is that? Is it even possible to have different groups if she is my mother?

If you are talking about the maternal haplogroup (which is all that is given for women) then I am afraid the answer is no. I hope you are prepared for this sort of news -- it is always a risk and they warn people before taking most DNA tests. This means that you were adopted (by your mother) or possibly a donor egg was used in your conception. Or, though it's very rare, it could be a "switched at birth" hospital error. Of course she is still your mother, the one who raised you -- I think that people should call adoptive parents the "real parents" and genetic parents should be just that -- genetic only parents.

Do you only have the haplogroups? Most DNA services these days offer much more, the full DNA scan, which also makes it extremely clear about close relations. A person will share a full 50% of their DNA with a genetic parent or child.

Though I must say your story is odd. If your mother adopted you, and decided not to tell you, it is strange that she agreed to take a DNA test with you. With men, it is fairly common for them to be unaware of a missing genetic link to a child. It's almost impossible for women to be unaware of it. The switched-at-birth situation is one of the very few explanations. There have also been cases of errors during IVF at fertility clinics, with embryos switched.

Any full DNA test will also tell you your relation to your father and other nearby relatives. If you are female, haplogroups will not reveal anything about your relation to your father.

Of course, I don't need to tell you that your mother is the one who has loved and cared for you your whole life, and the lack of genetic link does not diminish that in any way, and in fact some would argue it makes it even stronger, for she loved you just the same without you coming from her egg. Don't ever forget that.

You'll want to talk to friends who were adopted and found their genetic parents. Sometimes that's a positive experience, on occasion it is a negative one. With a full DNA test there is some chance you would find this. Think with care before deciding if you want to do it.

Ok so my maternal haplogroup is u5a2a, and a 4th cousin and I are supposedly related on my paternal side since my dad took the DNA test, (my mother is deceased). This cousin's maternal side is u5b3b. So could that mean we are double cousins somehow? Neither of us can find the connection except we are both from small towns in East Texas. My dad maternal is J1c1a and paternal is R-L51. The connection to this cousin says P - so not sure if it's my paternal or her paternal. I'm starting to think it's her paternal. However when I search Not Father's side, she doesn't show up, but when I search Father's side, she does.

Pretty much ignore haplogroups-- and definitely ignore them if they are not identical. But usually ignore them even if identical, other than for parentage if you have nothing else. They are a gimmick.

Father is J-L26 and son is J-M172. 47.5% match. Is this a mutation and if so curious how you might know which one is the mutation? Could test another younger male generation. No concern, just interested.

Do you mean your father and your son? Or a father and his son (neither of whom is you, since you used a female name)? For a father and his son, the most likely explanation is the scary one -- see my advice above about it -- that this father is not the genetic sire of this son, though he is still his father in every other way.

If you mean your father and your son, they would not have the same Y-chromosome at all. You don't have any Y-chromosome as a woman. Your son would have his genetic sire's Y chromosome, which would not be related to your father.

I manage a kit for a family member. She is Haplogroup C1a and has no other C1a matches. I was hoping to use this to find possible matches on her mothers side. From what I read above it seems like no?
Another question, would someone who is, say, C1 or C1b be considered a possible match? Just trying to squeeze any extra information out of a bunch of 4th cousins. Thanks

Haplogroups have close to zero value in tracing relatives. I can't say it's zero but the value is so rare and obscure you should treat it as zero.
