The privacy risks of genetic genealogy (23andMe part 2)


Last week, I wrote about interesting experiences finding cousins who were already friends via genetic testing. 23andMe's new "Relative Finder" product identifies the other people in their database of about 35,000 to whom you are related, guessing how close. Surprisingly, 2 of the 4 relatives I made contact with were already friends of mine, but not known to be relatives.

Many people are very excited about the potential for services like Relative Finder to take the lid off the field of genealogy. Some people care deeply about genealogy (most notably the Mormons) and others wonder what the fuss is. Genetic genealogy offers the potential to finally link all the family trees built by the enthusiasts and to provably test already known or suspected relationships. As such, the big genealogy web sites are all getting involved, and the Family Tree DNA company, which previously did mostly worthless haplogroup studies (and more useful haplotype scans,) is opening up a paired-chromosome scan service for $250 -- half the price of 23andMe's top-end scan. (There is some genealogical value to the deeper clade Y studies FTDNA does, but the Mitochondrial and 12-marker Y studies show far less than people believe about living relatives. I have a followup post about haplogroups and haplotypes in genealogy.) Note that in March 2010, 23andMe is offering a scan for just $199.

The cost of this is going to keep decreasing and soon will be sub-$100. At the same time, the cost of full sequencing is falling by a factor of 10 every year (!) and many suspect it may reach the $100 price point within just a few years. (Genechip sequencing only finds the SNPs, while a full sequencing reads every letter (allele) of your genome, and perhaps in the future your epigenome.)

Discovery of relatives through genetics has one big surprising twist to it. You are participating in it whether you sign up or not. That's because your relatives may be participating in it, and as it gets cheaper, your relatives will almost certainly be doing so. You might be the last person on the planet to accept sequencing but it won't matter.

What this means is that every deep, dark family secret that people thought was long buried is going to come out in around 5 years or so. Right now, users of genetic services are already discovering secrets in the immediate family, in particular that "daddy is not my daddy." 23andMe and the others even warn pretty strongly that you might discover this, no matter how sure you are that it isn't true.

It's true quite often. A study in the 40s, cited by Jared Diamond in "The Third Chimpanzee," suggested that at least 10%, and probably more, of the babies leaving hospitals were not related to the man on the birth certificate. This politically incorrect result was buried. The rate may have gone down since then thanks to birth control. Other studies dispute this number and claim a rate of around 2-3%, but varying widely among populations, at least among fathers who were confident of their parentage. There are also cases where the raising father is fully aware he is not the sire, but the child is not.

There are also 120,000 adoptions per year in the USA, about 2.5% of all births. I don't know how many of these are kept secret from the child into adulthood. In many cases with adoptions from single mothers, the gene-father may be entirely unaware.

There are also sperm donations known to mother and donor but not the child, but for whom the sperm donor and recipient are promised complete anonymity from one another. In such situations, those knowing they have such a secret at first might decide to avoid participating in genetic sharing. However, this doesn't stop their relatives from doing it, and because the secret might not be known to the relatives, it is very hard for the secret-holder to ask their relatives not to participate.

Thus the problem. The child of a sperm-donor may enter a genetic database, and be told she is related to a cousin, or even aunt/uncle/grandparent. This cousin will turn out to be a relative of their gene-father. If the child knows she is from a sperm donation, it will be obvious to her that she has found the sperm-father's family. For a child that doesn't know, she will eventually figure things out.

The same things apply to adoptions, infidelities and the children of rape. It has already been applied in the criminal justice system, where people have been placed at a crime scene because they left DNA which was matched not with them, but with a relative who had once been arrested.

Now when both parties want to be found, this is often great. In fact, in my own family, two "wild oats" who were children put up for adoption by my grandfather and an uncle, found the family to positive results. I even helped a cousin adopted by another uncle find her birth-mother to positive results. So lots of good will happen. But some families and marriages will also be torn asunder.

In a way, it's time for a public service announcement to go out.

Do you have a family secret regarding somebody who doesn't know who their genetic parent is?

Wondering when it might be time to break the news? Better do it sooner, rather than later.

My own experience with my newfound cousin Jonathan Zittrain showed how surprises can be learned even from those quite aware of their genetic relationships. After I was identified as Jonathan's cousin, I made contact with another cousin who turned out to be his mother. That makes perfect sense, until you learn that I share almost twice as much DNA with Jonathan as I do with his mother, and there are different segments I share with her that I don't share with him (as expected.) Barring an error in the data, this means I am related to both of Jonathan's parents, and at least by the averages, probably more closely related to his father. His father is deceased, so not so easy to gene-sequence.
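The arithmetic behind that inference can be sketched in a few lines of Python. A child inherits about half of each parent's DNA, so somebody related only through the mother would expect to share roughly half as much with the child as with her. The sketch below simply inverts that expectation. The function name and the centimorgan figures are hypothetical illustrations, not the actual 23andMe numbers, and real inheritance varies a good deal around the 50% average.

```python
def implied_share_with_father(share_with_child: float,
                              share_with_mother: float) -> float:
    """Estimate DNA shared with the untested father, using averages only.

    Assumes the child carries, in expectation, half of whatever each
    parent shares with the distant relative:
        share_with_child ~= 0.5 * share_with_mother + 0.5 * share_with_father
    Solving for the father gives 2 * child - mother.
    """
    estimate = 2.0 * share_with_child - share_with_mother
    # A negative estimate just means the child inherited less than average.
    return max(estimate, 0.0)


# Hypothetical amounts of shared DNA, in centimorgans.
mother = 20.0
child = 40.0  # "almost twice as much" as with the mother
print(implied_share_with_father(child, mother))
```

With those made-up figures the estimate for the father comes out at three times the mother's share, which is why the averages point toward the paternal side.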

Because our connection is almost surely through my 2 great-grandparents from Vitebsk who would presumably be siblings or cousins of his parents' ancestors, it either means each of his parents is related to each of the two great-grandparents (or perhaps their 4 parents if the relationship is more distant) or more probably his two parents are both related to the same common ancestor, which is to say his parents are cousins at the 2nd to 4th cousin level. Now it turns out this is ordinary and not really anything to be concerned about, and Jonathan had no problem with me telling this story, but there are people who might freak out on learning their parents might be cousins.

(Aside from the fact that such inbreeding was very common in the close-knit Ashkenazi community, an Icelandic study showed that in fact 3rd cousins who marry have more successful grandchildren than any other type of marriage, including 1st, 2nd, 4th and all other cousins and totally unrelated people.)

While you might not think it is too world-shaking to learn this sort of thing, the important point was that Jonathan and his mother learned it not because of their own sequencing but because of my ability as a more distant relative to compare with both of them. As such, relative finding will result in the discovery, through 3rd parties, of many other things, some of them potentially more embarrassing.

At first, Relative Finder was "opt in" so you could not even be contacted unless you opted in, but you showed up tantalizingly in the list. After the beta, they allowed you to initiate contact with any relative, though you did not learn any more about them unless they responded. This is riskier than they imagine. Consider receiving a mail like this:

Hello to my new 2nd cousin. I am curious to learn about you because we have our own family tree well mapped out and I know all my 2nd cousins. I am wondering how you fit in? Is it possible you are adopted?

For somebody who didn't opt in, an e-mail like this (featuring the name and profile of the sender, as it usually does), even if never responded to, could be quite shocking. It may be necessary to make even being able to read such messages an opt-in act, where all you learn is that "some relatives" wish to contact you, without telling you more until you opt in and agree to a warning about the risk of revealed secrets.

This turns out to be not much of a burden. Most genealogy-enthusiasts on 23andMe were quite disappointed when the Relative Finder went out of Beta, because the response rate from people who did not opt in turns out to be very, very low. In fact, it makes sense to at least offer an "eager to meet" flag that people can put on their account so that those seeking to contact relatives can cherry pick the ones who are likely to respond. At least that might cut down on the contacts done to more distant relatives that the real fanatics seem to do in large numbers -- they are almost spam when you consider how many relatives the system finds.

Relative Finder does disclose a little data in advance of contact, namely sex, country and one or two "haplogroups." They use a couple of hundred haplogroups, with widely varying distributions, but for a male, two haplogroups might identify you to one in 40,000 -- which is to say, uniquely in the 23andMe database (except for your brothers.) Because of that I was able to spot, for example, somebody who had a high probability of being one of the family who donated full access to their DNA under pseudonyms so that users could see what the product did with a full family of scans. (Later, the pseudonyms were unmasked with permission from the parties, but this is beside the point.)

The business end

As the number of companies in this business grows, there will be a network effect and a bit of the "Highlander" problem -- there can be only one. That's because when it comes to genetic comparison, you want to compare against the largest database. Until whole-genome becomes cheap, there will be some other competitive differences but eventually it will all be down to database size.

We don't know who will lead in size, but it won't be very good to start up in a lower position. One thing that startups will do is offer to let people transfer their data from another company and get a free or cheap account. One company, for example, did a special offering free accounts to 23andMe members. They could do this because 23andMe, along with DecodeMe and Navigenics, let you download your scan results. There will be contention over this. The industry leader may want to stop letting people download, but companies that don't let you download should, quite rightly, lose business because they effectively lock you in. How much they will lose is another question. A savvy consumer might pass such a company by but others might accept lock-in in exchange for a lower price.

On the other hand, with the cost of sequencing dropping all the time, the lock-in will not last for long.

The smaller companies will realize that they must pool resources to compete. This is not very easy to do while following their privacy policies. There are algorithms, known as cryptographic blinding, which can allow parties to perform operations (such as comparisons) without knowing what data they are working on. However, some companies will probably elect to be lazier and just pass around the data, or pass it temporarily to a 3rd party comparing service.
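To give a flavor of how a blinded comparison can work, here is a toy Python sketch of one well-known approach, Diffie-Hellman-style commutative blinding for private set intersection. It is an illustration under loud assumptions, not what any testing company actually does: the "segment" identifiers and function names are hypothetical, real relative matching compares long runs of DNA rather than set membership, and the modulus is far too small to be secure.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1); far too small for real use.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a number in 1..P-1 via SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret_key() -> int:
    """Pick a random exponent coprime to P-1, so that distinct inputs
    stay distinct after blinding."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def blinded_overlap(items_a, items_b) -> int:
    """Count items held by both parties; only blinded values are exchanged."""
    key_a, key_b = new_secret_key(), new_secret_key()

    # Company A blinds its items with its own key and sends them to B.
    a_once = {pow(hash_to_group(x), key_a, P) for x in items_a}
    # Company B adds its key on top of A's values and returns them.
    a_twice = {pow(v, key_b, P) for v in a_once}
    # Company B also sends its own items, blinded once with its key.
    b_once = {pow(hash_to_group(y), key_b, P) for y in items_b}
    # Company A adds its key; exponentiation commutes, so shared items collide.
    b_twice = {pow(v, key_a, P) for v in b_once}

    return len(a_twice & b_twice)


# Hypothetical identifiers standing in for each company's data.
company_a = {"seg1", "seg2", "seg3"}
company_b = {"seg2", "seg3", "seg9"}
print(blinded_overlap(company_a, company_b))
```

The trick is that raising to one key and then the other gives the same result in either order, so matching items end up with identical doubly blinded values even though neither side ever sees the other's raw identifiers.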

Either way, the likely result is all the small companies joining a big pool to get larger than the biggest company, which then will be forced to decide whether to fight or join. One way or another there will be a big pool.

Social networking

I'm sure all the companies have thought about how they might apply relative matching with social networking. While I still believe it was unusual to have a 3rd and 4th cousin in my social network, it's not that unlikely to have many more distant cousins. When it comes to 10th cousins, I figure you have a couple hundred million of them, and so many will be known to you. As such, expect to see a Facebook application which tells you how related you are to all your Facebook friends.
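The "couple hundred million" figure is a back-of-envelope estimate, and a rough Python sketch shows where a number of that size can come from. The model is an assumption for illustration only: every couple has the same number of children who go on to reproduce, and no ancestor appears twice in the tree (real trees overlap heavily that far back, so this overcounts). The function name and the family sizes tried are hypothetical.

```python
def nth_cousins(n: int, children_per_couple: float) -> float:
    """Rough count of n-th cousins under a uniform family-size model.

    n-th cousins share a pair of ancestors n + 1 generations back, and
    you have 2**n such ancestral couples. In each one, a single child is
    your own ancestor; each of the other (children_per_couple - 1)
    siblings founds a line that multiplies by children_per_couple for
    n more generations.
    """
    couples = 2 ** n
    sibling_lines = children_per_couple - 1
    descendants_per_line = children_per_couple ** n
    return couples * sibling_lines * descendants_per_line


for family_size in (2, 3, 3.2):
    print(family_size, round(nth_cousins(10, family_size)))
```

Under this model two children per couple gives only about a million 10th cousins, three gives about 120 million, and 3.2 gives roughly 250 million, so the estimate is very sensitive to the assumed family size.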

As full sequencing becomes cheap, and genealogy trees are brought in, expect algorithms to even start telling you how you are related, naming the ancestors. With the sequencing of large numbers of people, it will be possible to learn more and more about the genomes of dead ancestors. The more of a person's progeny you sequence, the larger a portion of the ancestor's DNA you can figure out. Someday, I imagine you might get sequenced and get told, "You are approximately 123,000th in line for the throne of England."


"..or quite probably his two parents are both related to the same common ancestor, which is to say his parents are cousins at the 2nd to 4th cousin level."

There doesn't seem to be a rationale for the use of "probable". "possible" yes, but it seems just as likely that JZ's parents aren't related at all in the relatively recent generations (that is, of course, before he volunteered that info). It wasn't uncommon for several marriages to occur between unrelated families. And considering how aggressive 23andme's algorithm is for inbred populations, there are no relations that can be inferred with any probability estimate for this time frame (4 generations or sooner).

Now, if you have a segment strand of DNA that matches both strands of JZ's (that is, not just a half-identical match), then you know that JZ's parents are related. But that situation isn't what is stated here, and I haven't heard whether any testing company would report such an occurrence.

Note that JZ did not offer anything to suggest his parents are related; he has no knowledge of it and all we know is that they both appear related to me. The reason I said "probably" is that it was quite common for people to marry cousins (particularly in the 2nd and above range) because they lived in the same region, and people didn't get around nearly as much in those days. What we do know of his family tree is that they are not aware of any Belarusian ancestry, just Russian. For his mother to be descended from one pair of my g-g-grandparents (there are only 2 pairs of them involved here) and his father to be descended from the other pair requires two migrations to the same town, which I judged as less likely than one migration which split and rejoined in them. However, there is not enough to judge, so I should correct it to say it is perhaps more probable.

Now also probable here is that the algorithms are incorrect, and the common ancestors are further back. Had he known ancestors from the same region around Vitebsk, it would have increased the probability of 2 disjoint lines converging at him in my analysis.

At this point, anyone who donates sperm should count on being tracked down eventually. In twenty years it will simply be far too trivial for the genetic testing and social networking to be done for it to not happen sooner or later.

In not too long there's going to be the bigger issue of whether you're allowed to do genetic testing on other people. It wouldn't be hard to swipe a coworker's coffee and learn a whole lot more about them.

And after this gets more press (A TV series right now on one of the networks is doing that) all sperm donors, adoption donors and cuckolders will know it is not anonymous.

The issue is there is about 80 years worth of genetic secrets held by people who were often promised privacy -- sometimes right in the law -- and who planned their lives around it.

In fact, in many cases what may be most disturbing to the child is not the fact of their genetic origin, but the fact that it was hidden from them. That their parents decided to raise them with a lie.

Yeah, it's the lying which is a real problem. People are often traumatized when they find out that they were an accident, even though that fact has no bearing on their genetic origins and little on their later life. I think the real reason is that it's a proxy for 'I wish you were never born', a sentiment which is often felt but never spoken.

Well, in the pre-birth-control days there wasn't nearly so much family planning. I think most people would come to readily accept being an accident once they are much older but it would be bothersome before that.

I'm not that bothered by people who want to cover up adoption or sperm donation, or infidelity until later in life. All children, when they go through the rebellion stage, wonder how these people could be their parents. And I can see how adopters would not want to have the genetic parents competing in the child's mind while they are growing up. It probably should be revealed later though, and now it will.

My recollection is that typically when parents sequence their children it is the parent that runs the account on services like 23andme, so the child won't learn something until they are old enough to do it themselves. So the secrets can be protected (from the child) until they are a bit older.

But not too old. Just as kids can buy drugs, alcohol, porn and everything else they are not supposed to buy, kids who doubt their parents will be readily able to buy genetic testing for themselves and their parents, from hair follicles etc. For just doing paternity it will cost a small amount of money, and in fact the kids may well have lab equipment at home or at school on which they can do it.

Interesting to read how affordable sequencing is becoming. I have elderly parents (aged 93 and 86). Should I ask them to send me locks of their hair so I can sequence them now or in the near future? Perhaps they would have to send me blood? (This won't go down well).

Is it possible that offspring will be medically disadvantaged in the near future if they don't have access to their parents' DNA?

Actually 23andMe just put a special on their ancestry version on the Oprah show of $199, but every year it is going to get cheaper -- and better.

Right now the testing services either use a lot of saliva or a cheek swab, so you can't do them trivially on the deceased. However, there are lots of labs which do sequence other tissues, including those of the dead, even the long dead -- though the error rate increases with time. With hair, a lock is not what you want, you want the roots of the hair follicle.

There is an interesting article in the NY Times today where the genetic cause was tracked down by doing a complete DNA sequencing. In one case they mentioned that the problem wouldn't have happened without genes from both parents.

If a complete sequencing becomes cheap enough, wouldn't prudent people do a sequencing before they marry? What responsible parent would choose to risk burdening their child with a congenital defect? If it becomes cheap enough how long before it becomes mandatory? The same logic would apply to sperm bank donations. As fertility rates drop all over the world, people are going to be more averse to playing the genetic lottery with their children.

I read an old Robert Heinlein novel called "Beyond This Horizon" where such genetic counseling was the norm, but the point was not to avoid sickly children, but to have optimal children. I remember thinking that if such a thing becomes possible, I hope it will be cheap, because if it is only available to the wealthy, then the social consequences could be unpleasant.

I have to presume you have not seen Gattaca, one of the finer SF movies ever made, which is about this question and beyond.

Many futurists and SF writers have pondered this, and things beyond it, like gene selecting the best children (rather than just making embryos with the old random method and only allowing the best to grow.)

My view is that by the time we're at that last stage, so much other strange stuff will be going on that this won't seem strange. It's a flaw in Gattaca of sorts that they have a world very much like ours in other ways. By that point we'll have tons of organisms-to-order and even probably synthetic life, as well as lots of gene modified humans. It will of course be possible not only to make a child which is "the best of both parents" but also to add the good stuff from the world's leading minds and athletes with the best of the parents.

But all this, I suspect, will be insignificant next to what the AIs are doing -- but that's another story.

However, in the nearer term we will certainly see what you describe, sequencing before breeding. 23andMe already offers it in a fairly weak way, it will tell you some very simple attribute probabilities for your potential child on things like eye colour, athletic type and a few diseases.

"It will of course be possible not only to make a child which is “the best of both parents” but also to add the good stuff from the world’s leading minds and athletes with the best of the parents."

Considering that the main advantage of these traits (particularly athletic ability) is that not everyone has them, then if it becomes easy to have them, then some of the motivation for having them is gone. I think this vision of the future is about as credible as that of The Jetsons.

Fred Pohl entitled his autobiography THE WAY THE FUTURE WAS.

The main advantage of athletic ability is that not everybody has it? I can't say I agree with that (not having too much of it myself.) Yes, much of what the world's greatest athletes do only comes from intense training that most people have no interest in doing, but not all of it.

The more interesting issue (also talked about in many novels) is beauty. There are objective standards of it, and people want to gene-shape their kids to meet them, to be tall and symmetrical etc. And they will also go for more subjective and cultural ideas of beauty, including strange ones. In addition to the more obviously useful things like health, strength, endurance, weight maintenance, longevity and intelligence.

(Brad aside), has anyone considered the consequences of a convergence of genetic data from 23andMe, et al. with say the face recognition technology of MyHeritage, et al?

To what degree would it be possible to associate a particular shape, sizing or placement of nose, eyes and mouth, or the ratios between them, to specific markers?

Though genetic influence in facial characteristics is fairly obvious over 1 or 2 generations, it would be interesting to see how many generations this persists for before being "washed out" by genetic noise.

Does legislation exist, or is it required, to forever prevent such data associations? How does one prevent a US-based 23andMe from sharing information with Israel-based MyHeritage?

I suppose this is no different from correlating finger-prints, retina scans, hand-print analysis, etc. to you-name-it.

I have no privacy concerns. Most people's DNA, their genes, their SNPs, their mortality are practically identical. Why be concerned over minor differences, whether you have an A where someone has a C? I personally don't like 23andMe. It is not very well run, but the raw data they gave me has been of some assistance to me in generating a Promethease Report, posting it to Dr. Doug McDonald, using it on deCODEme and in some forums. My interest is in ancestry, not the short term ancestry in hundreds of years, but in thousands of years. I know my ancestry and origins back to the time surnames started in Europe. Unfortunately 23andMe and deCODEme just limit themselves to disease risks (not interested), short term ancestry (boring) and unimportant things like mitochondrial haplogroups and Y chromosome haplogroups. Really both companies offer very little except raw data, and silly prognostications on your health.

I don't really believe in adoptions, at least the way adoptions are handled in most countries, which is secret and furtive. I believe everyone should know their ancestry or at least know they are not the biological children of their parents. Subterfuge and dissembling are not a good basis for family relationships, and the truth will out, one day. On 23andMe there is a man with a Jewish surname who states on his profile he found out he was adopted at 47 years old, probably the result of his adoptive parents' deaths, and that everything he was told was a lie. Now that is very sad, and would undermine his whole concept of who and what he is. Better be open and save the angst.

I am not the least interested in the RF feature of 23andMe. I have so many relatives already, don't want any more. Luckily I have 34 RF cousins. I am not Jewish, and not Anglo. The feature is more farce than anything else, mostly does not work and causes frustration to people who do want to know more about their genealogy. I know enough about my genealogy already.
