The privacy risks of genetic genealogy (23andMe part 2)
Last week, I wrote about the interesting experience of finding, via genetic testing, cousins who were already friends. 23andMe's new "Relative Finder" product identifies the other people in their database of about 35,000 to whom you are related, and guesses how close the relationship is. Surprisingly, 2 of the 4 relatives I made contact with were already friends of mine, but not known to be relatives.
Many people are very excited about the potential for services like Relative Finder to take the lid off the field of genealogy. Some people care deeply about genealogy (most notably the Mormons) and others wonder what the fuss is. Genetic genealogy offers the potential to finally link all the family trees built by the enthusiasts and to provably test already known or suspected relationships. As such, the big genealogy web sites are all getting involved, and the Family Tree DNA company, which previously did mostly worthless haplogroup studies (and more useful haplotype scans), is opening up a paired-chromosome scan service for $250, half the price of 23andMe's top-end scan. (There is some genealogical value to the deeper clade Y studies FTDNA does, but the mitochondrial and 12-marker Y studies show far less than people believe about living relatives. I have a followup post about haplogroups and haplotypes in genealogy.) Note that in March 2010, 23andMe is offering a scan for just $199.
The cost of this is going to keep decreasing and soon will be sub-$100. At the same time, the cost of full sequencing is falling by a factor of 10 every year (!) and many suspect it may reach the $100 price point within just a few years. (Genechip sequencing only finds the SNPs, while a full sequencing reads every letter (allele) of your genome, and perhaps in the future your epigenome.)
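To put the "factor of 10 every year" claim in concrete terms, here is a rough sketch of the arithmetic. The starting price is purely a hypothetical number I picked for illustration, not a quoted price, and the function name is my own.

```python
def years_until_price(start_cost, target_cost, yearly_factor=10):
    """Count whole years until the cost falls to target_cost or below,
    assuming it is divided by yearly_factor once per year."""
    years = 0
    cost = float(start_cost)
    while cost > target_cost:
        cost /= yearly_factor
        years += 1
    return years


# Hypothetical starting price of $100,000 for a full sequence (an assumption).
print(years_until_price(100_000, 100))
```

If the tenfold yearly decline holds, each extra factor of 10 in the starting price only adds one more year to the wait, which is why "a few years" is plausible almost regardless of where the price starts.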
Discovery of relatives through genetics has one big surprising twist to it. You are participating in it whether you sign up or not. That's because your relatives may be participating in it, and as it gets cheaper, your relatives will almost certainly be doing so. You might be the last person on the planet to accept sequencing but it won't matter.
What this means is that every deep, dark family secret that people thought was long buried is going to come out in around 5 years or so. Right now, users of genetic services are already discovering secrets in the immediate family, in particular that "daddy is not my daddy." 23andMe and the others even warn pretty strongly that you might discover this, no matter how sure you are that it isn't true.
It's true quite often. A study in the 40s, cited by Jared Diamond in "The Third Chimpanzee," suggested that at least 10%, and probably more, of the babies leaving hospitals were not related to the man on the birth certificate. This politically incorrect result was buried. It may have gone down thanks to birth control. Other studies dispute this number and claim a rate of around 2-3%, but varying widely among populations, at least among fathers who were confident of their parentage. There are also cases where the raising father is fully aware he is not the sire, but the child is not.
There are also 120,000 adoptions per year in the USA, about 2.5% of all births. I don't know how many of these are kept secret from the child into adulthood. In many cases with adoptions from single mothers, the gene-father may be entirely unaware.
There are also sperm donations that are known to the mother and the donor but not to the child, and in which the donor and recipient are promised complete anonymity from one another. In such situations, those knowing they have such a secret might at first decide to avoid participating in genetic sharing. However, this doesn't stop their relatives from doing it, and because the secret might not be known to the relatives, it is very hard for the secret-holder to ask their relatives not to participate.
Thus the problem. The child of a sperm-donor may enter a genetic database, and be told she is related to a cousin, or even aunt/uncle/grandparent. This cousin will turn out to be a relative of her gene-father. If the child knows she is from a sperm donation, it will be obvious to her that she has found the sperm-father's family. A child who doesn't know will eventually figure things out.
The same things apply to adoptions, infidelities and the children of rape. It has already been applied in the criminal justice system, where people have been placed at a crime scene because they left DNA which was matched not with them, but with a relative who had once been arrested.
Now when both parties want to be found, this is often great. In fact, in my own family, two "wild oats" who were children put up for adoption by my grandfather and an uncle, found the family to positive results. I even helped a cousin adopted by another uncle find her birth-mother to positive results. So lots of good will happen. But some families and marriages will also be torn asunder.
In a way, it's time for a public service announcement to go out.
Do you have a family secret regarding somebody who doesn't know who their genetic parent is?
Wondering when it might be time to break the news? Better do it sooner, rather than later.
My own experience with my newfound cousin Jonathan Zittrain showed how surprises can be learned even from those quite aware of their genetic relationships. After I was identified as Jonathan's cousin, I made contact with another cousin who turned out to be his mother. That makes perfect sense, until you learn that I share almost twice as much DNA with Jonathan as I do with his mother, and there are different segments I share with her that I don't share with him (as expected.) Barring an error in the data, this means I am related to both of Jonathan's parents, and at least by the averages, probably more closely related to his father. His father is deceased, so not so easy to gene-sequence.
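The reasoning here rests on one simple fact: a child gets half of his mother's genome, so on average only about half of whatever I share with the mother should reach the son. A small sketch of that check follows. The percentages in the example are made up for illustration (they are not my actual 23andMe numbers), the function names are my own, and for distant cousins the "half" is only an average, since a single shared segment tends to be passed on whole or not at all.

```python
def expected_via_mother(shared_with_mother_pct):
    """Average share expected to reach the child through the mother alone:
    the child inherits half of her genome."""
    return shared_with_mother_pct / 2


def unexplained_by_mother(shared_with_mother_pct, shared_with_child_pct):
    """Sharing with the child beyond what the mother's side would explain
    on average. A clearly positive value hints at a second path, through
    the father."""
    extra = shared_with_child_pct - expected_via_mother(shared_with_mother_pct)
    return max(0.0, extra)


# Hypothetical figures: 0.4% shared with the mother, 0.75% with her son.
print(expected_via_mother(0.4))
print(unexplained_by_mother(0.4, 0.75))
```

When the son shares almost twice what the mother does, most of the match cannot have come through her, which is what points to the father's side.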
Because our connection is almost surely through my 2 great-grandparents from Vitebsk, who would presumably be siblings or cousins of his parents' ancestors, it either means each of his parents is related to each of the two great-grandparents (or perhaps their 4 parents if the relationship is more distant) or, more probably, his two parents are both related to the same common ancestor, which is to say his parents are cousins at the 2nd to 4th cousin level. Now it turns out this is ordinary and not really anything to be concerned about, and Jonathan had no problem with me telling this story, but there are people who might freak out on learning their parents might be cousins.
(Aside from the fact that such inbreeding was very common in the close-knit Ashkenazi community, an Icelandic study showed that in fact 3rd cousins who marry have more successful grandchildren than any other type of marriage, including 1st, 2nd, 4th and all other cousins and totally unrelated people.)
While you might not think it is too world-shaking to learn this sort of thing, the important point was that Jonathan and his mother learned it not because of their own sequencing but because of my ability as a more distant relative to compare with both of them. As such, relative finding will result in the discovery, through 3rd parties, of many other things, some of them potentially more embarrassing.
At first, Relative Finder was "opt in" so you could not even be contacted unless you opted in, but you showed up tantalizingly in the list. After the beta, they allowed you to initiate contact with any relative, though you did not learn any more about them unless they responded. This is riskier than they imagine. Consider receiving a mail like this:
Hello to my new 2nd cousin. I am curious to learn about you because we have our own family tree well mapped out and I know all my 2nd cousins. I am wondering how you fit in? Is it possible you are adopted?
For somebody who didn't opt in, an e-mail like this (featuring the name and profile of the sender, as it usually does), even if never responded to, could be quite shocking. It may be necessary to make even being able to read such messages an opt-in act, where all you learn is that "some relatives" wish to contact you, without telling you more until you opt in and agree to a warning about the risk of revealed secrets.
This turns out to be not much of a burden. Most genealogy-enthusiasts on 23andMe were quite disappointed when the Relative Finder went out of Beta, because the response rate from people who did not opt in turns out to be very, very low. In fact, it makes sense to at least offer an "eager to meet" flag that people can put on their account so that those seeking to contact relatives can cherry pick the ones who are likely to respond. At least that might cut down on the contacts done to more distant relatives that the real fanatics seem to do in large numbers -- they are almost spam when you consider how many relatives the system finds.
Relative Finder does disclose a little data in advance of contact, namely sex, country and one or two "haplogroups." They use a couple of hundred haplogroups, with widely varying distributions, but for a male, two haplogroups might identify you to one in 40,000 -- which is to say, uniquely in the 23andMe database (except for your brothers.) Because of that I was able to spot, for example, somebody who had a high probability of being one of the family who donated full access to their DNA under pseudonyms so that users could see what the product did with a full family of scans. (Later, the pseudonyms were unmasked with permission from the parties, but this is beside the point.)
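The "one in 40,000" figure is just the product of the two haplogroup counts. Here is a back-of-the-envelope sketch, assuming exactly 200 haplogroups of each kind (my rounding of "a couple of hundred"), that the two are independent, and that everybody is spread evenly across them. That last assumption is the weak one, since the real distributions vary widely: people in common haplogroups are harder to single out, and people in rare ones are easier.

```python
def haplogroup_combinations(y_groups, mt_groups):
    """Number of distinct (paternal, maternal) haplogroup pairs."""
    return y_groups * mt_groups


def expected_others_matching(database_size, y_groups, mt_groups):
    """Expected number of OTHER people in the database with the same pair,
    under the (unrealistic) assumption of a uniform spread."""
    return (database_size - 1) / haplogroup_combinations(y_groups, mt_groups)


# 200 is an assumed round number; 35,000 is the database size cited earlier.
print(haplogroup_combinations(200, 200))
print(expected_others_matching(35_000, 200, 200))
```

Under these assumptions there is, on average, fewer than one other person in the database who shares your pair, so the pair comes close to being a fingerprint even before any contact is made.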
The business end
As the number of companies in this business grows, there will be a network effect and a bit of the "Highlander" problem -- there can be only one. That's because when it comes to genetic comparison, you want to compare against the largest database. Until whole-genome becomes cheap, there will be some other competitive differences but eventually it will all be down to database size.
We don't know who will lead in size, but it won't be very good to start up in a lower position. One thing that startups will do is offer to let people transfer their data from another company and get a free or cheap account. DecodeMe.com, for example, did a special offering free accounts to 23andMe members. They could do this because 23andMe, along with DecodeMe and Navigenics, let you download your scan results. There will be contention over this. The industry leader may want to stop letting people download, but companies that don't let you download should, quite rightly, lose business because they effectively lock you in. How much they will lose is another question. A savvy consumer might pass such a company by but others might accept lock-in in exchange for a lower price.
On the other hand, with the cost of sequencing dropping all the time, the lock-in will not last for long.
The smaller companies will realize that they must pool resources to compete. This is not very easy to do while following their privacy policies. There are algorithms, known as cryptographic blinding, which can allow parties to perform operations (such as comparisons) without knowing what data they are working on. However, some companies will probably elect to be lazier and just pass around the data, or pass it temporarily to a 3rd party comparing service.
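To give a flavor of how two companies could compare records without handing them over, here is a toy sketch of one well-known approach, a Diffie-Hellman-style private set intersection. This is my own illustration, not anything these companies have announced. It assumes each record has already been boiled down to a string (the `"seg..."` names are hypothetical), it only handles exact matches while real DNA comparison is fuzzier, and the modulus is far too small for real use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, NOT secure


def random_key():
    """Secret exponent coprime to P - 1, so that blinding never maps two
    different values to the same result."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def to_group(item):
    """Hash a string to a number in 1..P-1."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(values, key):
    """Raise each value to the secret exponent, modulo P."""
    return [pow(v, key, P) for v in values]


def shared_items(items_a, items_b):
    """Simulate both companies in one process. In practice each side keeps
    its key private and only blinded values cross the wire."""
    key_a, key_b = random_key(), random_key()
    a_once = blind([to_group(x) for x in items_a], key_a)  # A sends to B
    b_once = blind([to_group(x) for x in items_b], key_b)  # B sends to A
    a_twice = blind(a_once, key_b)       # B re-blinds A's values, in order
    b_twice = set(blind(b_once, key_a))  # A re-blinds B's values
    # (h^a)^b == (h^b)^a, so equal items collide after double blinding.
    return [item for item, v in zip(items_a, a_twice) if v in b_twice]


print(shared_items(["seg1", "seg2", "seg3"], ["seg3", "seg4", "seg1"]))
```

Each side only ever sees the other's values after they have been scrambled with a key it does not hold, yet matching items still line up. A real deployment would need a properly sized group and protection against a partner who simply guesses likely values.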
Either way, the likely result is all the small companies joining a big pool to get larger than the biggest company, which then will be forced to decide whether to fight or join. One way or another there will be a big pool.
I'm sure all the companies have thought about how they might apply relative matching with social networking. While I still believe it was unusual to have a 3rd and 4th cousin in my social network, it's not that unlikely to have many more distant cousins. When it comes to 10th cousins, I figure you have a couple hundred million of them, and so many will be known to you. As such, expect to see a Facebook application which tells you how related you are to all your Facebook friends.
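For the curious, here is the sort of crude model that produces a number of that size. It assumes every couple in every generation has the same number of surviving children and that no two of your ancestors are related to each other, which is certainly false that far back, so treat the result as an order of magnitude at best. The function name and the choice of 3 children per couple are my own assumptions.

```python
def nth_cousins(n, children_per_couple):
    """Rough count of nth cousins. You have 2**n ancestral couples n + 1
    generations back. In each such family, one child is your ancestor and
    the remaining siblings each lead to children_per_couple**n descendants
    in your generation."""
    couples = 2 ** n
    siblings_of_ancestor = children_per_couple - 1
    return couples * siblings_of_ancestor * children_per_couple ** n


# Sanity check: with 2 children per couple you get 4 first cousins.
print(nth_cousins(1, 2))
# With an assumed 3 surviving children per couple, 10th cousins:
print(nth_cousins(10, 3))
```

The result swings wildly with the family size you plug in, which is why I only claim "a couple hundred million" as a ballpark.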
As full sequencing becomes cheap, and genealogy trees are brought in, expect algorithms to even start telling you how you are related, naming the ancestors. With the sequencing of large numbers of people, it will be possible to learn more and more about the genomes of dead ancestors. The more of a person's progeny you sequence, the larger a portion of the ancestor's DNA you can figure out. Someday, I imagine you might get sequenced and get told, "You are approximately 123,000th in line for the throne of England."
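The "more progeny, more of the ancestor" point can be put in numbers for the simplest case. Each child carries a random half of a parent's genome, so any given piece of the parent is missing from all of k children with probability one half to the power k. The sketch below only covers direct children, and it ignores the practical problem of telling which half of each child came from which parent; deeper generations contribute less per person.

```python
def fraction_recoverable(num_children):
    """Expected fraction of a parent's genome carried by at least one of
    num_children sequenced children, each inheriting a random half."""
    return 1 - 0.5 ** num_children


for k in (1, 2, 4, 8):
    print(k, fraction_recoverable(k))
```

The gains tail off quickly: the first few children recover most of the ancestor, and each additional one fills in half of whatever is still missing.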