In my article two weeks ago about the odds of knowing a cousin I puzzled over the question of how many 3rd cousins a person might have. This is hard to answer, because it depends on figuring out how many successful offspring per generation the various levels of your family (and related families) have. Successful means that they also create a tree of descendants. This number varies a lot among families, it varies a lot among regions and it has varied a great deal over time.
I received some criticism the other day over my own criticism of the use of haplogroups in genealogy -- the finding and tracing of relatives. My language was imprecise so I want to make a correction and explore the issue in a bit more detail.
One of the most basic facts of inheritance is that while most of your DNA is a mishmash of your parents (and all their ancestors before them) two pieces of DNA are passed down almost unchanged. One is the mitochondrial DNA, which is passed down from the mother to all her children. The other is the Y chromosome, which is passed down directly from father to son. Girls don't get one. Most of the mother's X chromosome is passed down unchanged to her sons (but not her daughters) but of course they can't pass it unchanged to anybody.
This allow us to track the ancestry of two lines. The maternal line tracks your mother, her mother, her mother, her mother and so on. The paternal line tracks your father, his father and so on. The paternal line should, in theory, match the surname, but for various reasons it sometimes doesn't. Females don't have a Y, but they can often find out what Y their father had if they can sequence a sample from him, his sons, his brothers and other male relatives who share his surname.
The ability to do this got people very excited. DNA that can be tracked back arbitrarily far in time has become very useful for the study of human migrations and population genetics. The DNA is normally passed down completely but every so often there is a mutation. These mutations, if they don't kill you, are passed down. The various collections of mutations are formed into a tree, and the branches of the tree are known as haplogroups. For both kinds of DNA, there are around a couple of hundred haplogroups commonly identified. Many DNA testing companies will look at your DNA and tell you your MTDNA haplogroup, and if male, your Y haplogroup.
Last week, I wrote about interesting experiences finding Cousins who were already friends via genetic testing. 23andMe's new "Relative Finder" product identifies the other people in their database of about 35,000 to whom you are related, guessing how close. Surprisingly, 2 of the 4 relatives I made contact with were already friends of mine, but not known to be relatives.
Many people are very excited about the potential for services like Relative Finder to take the lid off the field of genealogy. Some people care deeply about genealogy (most notably the Mormons) and others wonder what the fuss is. Genetic genealogy offers the potential to finally link all the family trees built by the enthusiasts and to provably test already known or suspected relationships. As such, the big genealogy web sites are all getting involved, and the Family Tree DNA company, which previously did mostly worthless haplogroup studies (and more useful haplotype scans,) is opening up a paired-chromosome scan service for $250 -- half the price of 23andMe's top-end scan. (There is some genealogical value to the deeper clade Y studies FTDNA does, but the Mitochondrial and 12-marker Y studies show far less than people believe about living relatives. I have a followup post about haplogroups and haplotypes in genealogy.) Note that in March 2010, 23andMe is offering a scan for just $199.
The cost of this is going to keep decreasing and soon will be sub-$100. At the same time, the cost of full sequencing is falling by a factor of 10 every year (!) and many suspect it may reach the $100 price point within just a few years. (Genechip sequencing only finds the SNPs, while a full sequencing reads every letter (allele) of your genome, and perhaps in the future your epigenome.
Discover of relatives through genetics has one big surprising twist to it. You are participating in it whether you sign up or not. That's because your relatives may be participating in it, and as it gets cheaper, your relatives will almost certainly be doing so. You might be the last person on the planet to accept sequencing but it won't matter.
Bizarrely, Jonathan Zittrain turns out to be my cousin -- which is odd because I have known him for some time and he is also very active in the online civil rights world. How we came to learn this will be the first of my postings on the future of DNA sequencing and the company 23andMe.
(Follow the genetics for part two and other articles.)
23andMe is one of a small crop of personal genomics companies. For a cash fee (ranging from $400 to $1000, but dropping with regularity) you get a kit to send in a DNA sample. They can't sequence your genome for that amount today, but they can read around 600,000 "single-nucleotide polymorphisms" (SNPs) which are single-letter locations in the genome that are known to vary among different people, and the subject of various research about disease. 23andMe began hoping to let their customers know about how their own DNA predicted their risk for a variety of different diseases and traits. The result is a collection of information -- some of which will just make you worry (or breathe more easily) and some of which is actually useful. However, the company's second-order goal is the real money-maker. They hope to get the sequenced people to fill out surveys and participate in studies. For example, the more people fill out their weight in surveys, the more likely they might notice, "Hey, all the fat people have this SNP, and the thin people have that SNP, maybe we've found something."
However, recently they added a new feature called "Relative Finder." With Relative Finder, they will compare your DNA with all the other customers, and see if they can find long identical stretches which are very likely to have come from a common ancestor. The more of this they find, the more closely related two people are. All of us are related, often closer than we think, but this technique, in theory, can identify closer relatives like 1st through 4th cousins. (It gets a bit noisy after this.)
Relative Finder shows you a display listing all the people you are related to in their database, and for some people, it turns out to be a lot. You don't see the name of the person but you can send them an E-mail, and if they agree and respond, you can talk, or even compare your genomes to see where you have matching DNA.
For me it showed one third cousin, and about a dozen 4th cousins. Many people don't get many relatives that close. A third cousin, if you were wondering, is somebody who shares a great-great-grandparent with you, or more typically a pair of them. It means that your grandparents and their grandparents were "1st" cousins (ordinary cousins.) Most people don't have much contact with 3rd cousins or care much to. It's not a very close relationship.
However, I was greatly shocked to see the response that this mystery cousin was Jonathan Zittrain. Jonathan and I are not close friends, more appropriately we might be called friendly colleagues in the cyberlaw field, he being a founder of the Berkman Center and I being at the EFF. But we had seen one another a few times in the prior month, and both lectured recently at the new Singularity University, so we are not distant acquaintances either. Still, it was rather shocking to see this result. I was curious to try to figure out what the odds of it are.