The odds of knowing your cousins: 23andMe Part 1
Bizarrely, Jonathan Zittrain turns out to be my cousin -- which is odd because I have known him for some time and he is also very active in the online civil rights world. How we came to learn this will be the first of my postings on the future of DNA sequencing and the company 23andMe.
(Follow the genetics for part two and other articles.)
23andMe is one of a small crop of personal genomics companies. For a cash fee (ranging from $400 to $1000, but dropping with regularity) you get a kit to send in a DNA sample. They can't sequence your genome for that amount today, but they can read around 600,000 "single-nucleotide polymorphisms" (SNPs) which are single-letter locations in the genome that are known to vary among different people, and the subject of various research about disease. 23andMe began hoping to let their customers know about how their own DNA predicted their risk for a variety of different diseases and traits. The result is a collection of information -- some of which will just make you worry (or breathe more easily) and some of which is actually useful. However, the company's second-order goal is the real money-maker. They hope to get the sequenced people to fill out surveys and participate in studies. For example, the more people fill out their weight in surveys, the more likely they might notice, "Hey, all the fat people have this SNP, and the thin people have that SNP, maybe we've found something."
However, recently they added a new feature called "Relative Finder." With Relative Finder, they will compare your DNA with all the other customers, and see if they can find long identical stretches which are very likely to have come from a common ancestor. The more of this they find, the more closely related two people are. All of us are related, often closer than we think, but this technique, in theory, can identify closer relatives like 1st through 4th cousins. (It gets a bit noisy after this.)
Relative Finder shows you a display listing all the people you are related to in their database, and for some people, it turns out to be a lot. You don't see the name of the person but you can send them an E-mail, and if they agree and respond, you can talk, or even compare your genomes to see where you have matching DNA.
For me it showed one third cousin, and about a dozen 4th cousins. Many people don't get many relatives that close. A third cousin, if you were wondering, is somebody who shares a great-great-grandparent with you, or more typically a pair of them. It means that your grandparents and their grandparents were "1st" cousins (ordinary cousins.) Most people don't have much contact with 3rd cousins or care much to. It's not a very close relationship.
However, I was greatly shocked when the response revealed that this mystery cousin was Jonathan Zittrain. Jonathan and I are not close friends; we might more appropriately be called friendly colleagues in the cyberlaw field, he being a founder of the Berkman Center and I being at the EFF. But we had seen one another a few times in the prior month, and both lectured recently at the new Singularity University, so we are not distant acquaintances either. Still, it was rather shocking to see this result. I was curious to try to figure out what the odds of it are.
I've tried to reach about a dozen cousins in this system, and only 4 have responded. The 2nd turned out to be Jonathan's mother, whose account he handles. (This results in some more curious conclusions I will detail later.) The 4th came last week, and amazingly it was another friend. My newly discovered cousin Asya is also not a close friend but somebody I have known entirely in social circles for over 15 years, knowing each other well enough to have attended parties at each other's houses and so on. What can the odds of this be?
There is one important detail to add to this. My grandmother was an Ashkenazi Jew out of Vitebsk (Belarus), and both cousin JZ and cousin Asya are more fully of Ashkenazi ancestry from Eastern Europe. If you have the Jewish DNA on 23andMe, it finds a lot of cousins for you -- around 800 to 1,000 from their database of around 35,000 customers. Non-Jews tend to match only about 100 to 200. Quite simply, the Ashkenazi community was tight knit, with a fair bit of inbreeding. Thanks to this, we share more DNA with others in our group than average people do, and just about everybody is some level of cousin -- the only question is how close. Still, only a few will share enough DNA to be ranked as 3rd cousins. They rank two people as third cousins when about 0.75% of their DNA shows as being "identical through descent." (A 2nd cousin match requires about 3%, and a 1st cousin match about 12%. Siblings share about 50%, each being a different mix of their 4 grandparents. Each descent cuts the identical DNA in half, and each more distant level of cousin involves two descents.)
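The halving rule in that parenthetical can be worked through as a rough sketch. It assumes full cousins and takes 12.5% as the textbook expected share for 1st cousins (the "about 12%" above); the function name is only illustrative, and real matches vary quite a bit around these averages, especially in an inbred population.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Expected fraction of DNA identical by descent for full n-th cousins.

    Assumption: 1st cousins share 12.5% on average. Each further cousin
    level adds two descents (one on each side of the family), and each
    descent halves the shared DNA, so each level divides the share by 4.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 0.125 / 4 ** (cousin_degree - 1)


for degree in range(1, 6):
    share = expected_shared_fraction(degree)
    print(f"cousin level {degree}: about {share * 100:.2f}% shared")
```

By this arithmetic a 3rd cousin lands near 0.78%, close to the roughly 0.75% threshold mentioned above, and each level beyond that drops by another factor of four.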
As a test, in fact, I did a DNA comparison with another friend of Jewish ancestry and we did indeed have a small match, branding us as cousins of unknown distance. Because 23andMe knows about the Ashkenazi inbreeding, they don't try to make claims on low-level matches among us. For this friend, whom I'll call cousin B, my 0.11% level of sharing would typically make us suspected 4th or 5th cousins. So it is possible that JZ and Asya are slightly more distant cousins than the tool predicts.
You want more? The first cousin Asya contacted turned out to be somebody she knew! However, in this case she did not know him well. She knew him as a moderately well known blogger whose site she has participated in, and with whom she has had online conversations.
Working out the probabilities of this is difficult. However, in all the cases above, there is no physical connection between me and my cousins. My family left Vitebsk in the late 19th century, and moved to England and then Canada. JZ's family emigrated earlier, and while there are Russian Jews in his family tree we did not find a link to the 2 surnames I know. Asya was born in Russia and is a modern immigrant. She has recent ancestors from near Vitebsk. Cousin B was also born in Russia, but he does have recent ancestors from Vitebsk. So in all cases the migrations were disjoint. This is common for the diaspora from Eastern Europe. Indeed, the one place I probably don't have many relatives is Vitebsk itself. In 1941, the murderers of Einsatzgruppe B made a base in Vitebsk, following the invading military and slaughtering all the Jews in that town, including, presumably, what remained there of the family of myself and these cousins. I never knew them, of course, but it is still disturbing to think about.
This diaspora is spread over Europe, Canada, Israel, the USA and a number of other countries. There are perhaps 600,000,000 or more people living in the places to which the descendants of my (and JZ's and Asya's) g-g-g-grandparents migrated. They did not settle in any one place, nor do I live in any particular place of concentration, and JZ lives in Boston, not the Bay Area (where Asya, Cousin B and I all moved to.)
So the next question is, how many 3rd cousins does a typical person have? This is hard to answer, and it's an answer that changes quite a bit each generation. To come up with a general answer, you need to figure out how many successful children a typical couple has. Successful children are children who themselves have children and grandchildren. A study in Iceland came up with numbers that were quite low -- around 8 to 9 grandchildren per couple, or 3 per generation. The reason this seems low is that when I look in my family tree I see plenty of large families 2 generations back. My grandfather had 8 brothers. My Jewish grandmother was one of 11. Good Jewish families were expected to be fecund, and I also think there was a surge around the start of the 20th century when people suddenly found all their kids were thriving, when before you might have 12 and only see half of them thrive. I have not found a good demographer's report to come up with real numbers. In my immediate family though, I see that my mother's parents only had 5 genetic grandchildren who themselves bred, and my father's parents had 8 such grandchildren, more like the Icelandic numbers.
It's not a good assumption, but if you assume a fecundity number per generation of 3, you get around 600 3rd cousins. My general formula is (2f)^(c+1)/2, where f is breeding children per generation and c is the cousin number (3 for 3rd cousin.) So with f=4 it's 2000 3rd cousins, f=5 yields 5,000. f=5 is quite a lot -- 25 successful grandchildren, each the start of their own line, per pair of grandparents.
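To make the arithmetic easy to check, here is a small sketch of that formula. It reads the expression as (2f) raised to the power c+1, then divided by 2, which is the reading that reproduces the figures quoted above; the function name is just an illustrative choice.

```python
def estimated_cousins(f: float, c: int) -> float:
    """Rough count of c-th cousins using the formula (2f)^(c+1) / 2.

    f: breeding children per couple per generation (an assumed average)
    c: cousin number (3 for 3rd cousins)
    """
    return (2 * f) ** (c + 1) / 2


for fecundity in (3, 4, 5):
    count = estimated_cousins(fecundity, 3)
    print(f"f={fecundity}: roughly {count:.0f} 3rd cousins")
```

The exact values are 648, 2048 and 5000, which I have rounded to 600, 2000 and 5,000 in the text.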
Thus the issue: If 2000 cousins are truly and randomly spread among 600,000,000 people, then one in 300,000 people will be a cousin. If you have a circle of "friends" of 1,000 then the odds of even one of them being a cousin are 1 in 300 -- again if all things are evenly distributed. They probably aren't, but there are no obvious clumping factors, other than the suggestion that Jews are prevalent in the high-tech, higher-prosperity circles from which an abnormal number of my friends are drawn. Because I am only ancestrally Jewish, and was not raised that way, it is not the case that most of my friends are Jewish, though Jews are well over-represented in my circle for whatever reason. However, even if 500 of my 3rd cousins are among the world's 12 million Jews, the odds are still 24,000 to one against a random Jew I meet being my 3rd cousin. And if you include partial Jews like me, it's even less likely.
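Those odds can be checked with a short sketch. It assumes what the paragraph assumes, that cousins are spread evenly and that each friend is an independent draw from the population, which is certainly not quite true; the function name is illustrative only.

```python
def chance_of_at_least_one(cousins: int, population: int, friends: int) -> float:
    """Probability that at least one of `friends` people is a cousin,
    assuming cousins are evenly spread and friends are independent draws."""
    p_single = cousins / population
    return 1 - (1 - p_single) ** friends


# 2000 third cousins among 600,000,000 people, and a circle of 1,000 friends.
p = chance_of_at_least_one(2000, 600_000_000, 1000)
print(f"at least one cousin among friends: about 1 in {1 / p:.0f}")

# 500 third cousins among the world's 12 million Jews.
print(f"random Jewish acquaintance: 1 in {12_000_000 // 500}")
```

With probabilities this small, the exact calculation and the simple shortcut of multiplying 1 in 300,000 by 1,000 friends give nearly the same answer.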
But now we get to the part that seems the most crazy. If the odds of knowing a 3rd cousin at all are one in 300 (or even 1 in 100) what are the odds that the very first cousins encountered in the small database would be ones I knew? That seems through the roof. However, there is a big factor affecting this. Both JZ and I are in 23andMe for a bunch of similar reasons. First of all, we are friends with Anne, a co-founder of 23andMe. I did some consulting for them and got in as a free customer. We are both early adopters with keen interest in privacy-related issues and genetics. As such we were also participants in the Beta release of the Relative Finder. That reduces the strangeness of JZ being my first contact, though not entirely. Cousin Asya, however, joined 23andMe for its original "medical-scan" purpose, as did the blogger she knew. Together it still seems quite bizarre, both by intuition and through math. Nobody else in the online forums of 23andMe has reported a story like this. (In fact, they mostly complain about how non-responsive the cousins they do try to contact in the database are. Most members seem to not want to contact distant cousins, particularly if they signed up primarily for medical reasons.)
As you might guess, Relative Finder has a number of rather big consequences for privacy and the future of genetics. 23andMe is not the only one doing this. Family Tree DNA, a company whose prime focus is genealogy, has begun a program of relative finding through DNA just recently. For years people have also been misled into thinking they can do genealogy through haplogroups, which are groupings read from either the mitochondrial DNA (which is passed down largely unchanged from mother to child) or the Y-chromosome DNA, which is passed down from father to son. Because these can be traced back through all-maternal (or all-paternal) lines into the distant past, and are easy to read, people get very excited about them, but in truth they reveal almost nothing about genealogy. For example, for your 5th cousins, only 1/2000th of them will share your maternal haplogroup because they are your 5th cousins. Far more of those who share it do so simply by chance, especially in some ethnic groups.
In part two, I write about some of those privacy questions, and the coming (in just a few years) exposure of almost all deep family secrets (adoptions, sperm donations, and children-by-infidelity). And I'll wonder if anything can be done about it, because it seems difficult to imagine what. And I'll explore some even more details of my relationship with Jonathan Zittrain.
A month or so after we learned this, Jonathan happened to sit on the couch next to mine in a lounge in Davos, Switzerland. (He was speaking, I was just party-crashing.) The coincidences keep coming. His interpretation? He wonders which of us will first ask the other for money. The bar was free so we don't yet know, but I'm happy to buy him a drink next time.