Orwell could answer the cell phone driving question

From time to time I come up with ideas that are interesting but that I can't advocate because they have overly negative consequences in other areas, like privacy. Nonetheless, they are worth talking about because we might find better ways to do them.

There is some controversy today over whether driving while talking on a cell phone is dangerous and should be banned or restricted to handsfree mode. It occurs to me that the data to answer that question is out there. Most cars today have a computer, and it records things like the time that airbags deploy, or even in some cases when your speed suddenly drops. (If not, it certainly could.) Your cell phone and your cell company know when you're on the phone. Your phone knows if you are using the handsfree, though the company doesn't. Your phone and cell company also know (but usually don't record) when you're driving and suddenly stop moving for an extended period.

In other words, something with access to all that data (and a time delta for the car's clock) could quickly answer the question of what cell phone behaviours are more likely to cause accidents. It would get a few errors (such as if the driver borrows their passenger's phone) but would be remarkably comprehensive in providing an answer.

But to gather this data involves way too many scary things. We don't really want our cars or phone companies recording data which can be used against us. They could record things like if we speed, and where we go that we don't want others to know about, and who we're talking to at the time, and much more.

In our quest for learning from private data, we have often sought anonymization technologies that can somehow collect the data and disassociate it from the source. That turns out to be very hard to do, often near impossible, and the infrastructure built for this sort of collection can almost always be trivially repurposed for non-anonymous use; now all that is needed is to flick a switch.

Now I do expect that soon we will see, after a serious car accident, attempts to get at this data on a case by case basis. The insurance companies will ask for cell phone records at the time of the accident, or data from the phone itself. We're already going to lose that privacy once there is an accident, though at least case by case invasions don't scale. Messy problem.

Comments

I like the idea, but I'm not sure that the savings would be worth the costs. The difficulty is that it is very hard to match the time of the accident to the cell phone records. The car's event recorder is not accurately time synchronized. The cost of equipping event recorders with time synch for gathering accident/cell relationships is too high for just research. If there was some significant monetary value, that could change this. But it needs to be more than just a change in fault assignment because insurers cover both sides.

For example, I've always liked the idea of enforcing seat belt laws with a mandatory $10K extra deductible on health payments for the person not wearing the seat belt. They get to pay some of the extra costs resulting from their decision not to wear a seat belt. This is probably not all the extra costs, but if the level is too draconian the public will not accept a change like this. The insurers would become interested in cell records if there was an extra deductible paid by the driver in the event that they were contributing to the accident by talking on the phone at the time. Then there might be enough incentive.

As for your concerns on anonymous data, I routinely see effective anonymization. The problem is not technologies. The problem is taking the proper approach. Most people want to gather and mine a vast pool of anonymous data. That does not work. That is relatively easy to break. Good anonymization starts by determining what is the minimum data that must be collected to answer the question being asked, and removing everything else. This is the approach typically used in clinical drug trials, where the requirements for effective double blind analysis coincide well with the needs of anonymization. Breaking the anonymization of drug trial information cannot be done by a simple flip of a switch. It takes several steps.

The clinical trials also design in a simple method for breaking anonymization when appropriate, which reduces the pressure to defeat the anonymization. If a trial investigator happens to notice an unrelated disease indication, there is an established series of trusted third parties, none of whom have complete knowledge, for getting a notification back to the correct patient and their doctor. There is less pressure to gather excess data because this does not benefit the drug sponsor, investigators, patients, or regulators. So privacy violations tend to be accidental, misunderstandings of procedures, or mistakes in determining the minimum data requirements for the study.

Matching the times is feasible because the cell phone's time is generally accurate, and you can read the time from the car's clock for some time after the accident. Modern clocks drift seconds per month at most. When you read the time of the airbag event or similar, you can also read the car's current time and compare it to accurate time.
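A minimal sketch of that clock correction, assuming the car's clock error at read-out is essentially the same as it was at the moment of the crash (reasonable if drift is seconds per month). The function name and the timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime

def corrected_event_time(event_on_car_clock, car_clock_now, reference_now):
    """Shift an event time logged by the car by the car clock's current error."""
    # Positive offset means the car's clock is running behind accurate time.
    offset = reference_now - car_clock_now
    return event_on_car_clock + offset

# Hypothetical readings taken a few days after a crash.
event = datetime(2006, 9, 1, 17, 42, 10)     # airbag event as logged by the car
car_now = datetime(2006, 9, 4, 10, 0, 0)     # car clock at read-out
true_now = datetime(2006, 9, 4, 10, 3, 25)   # accurate time at read-out

print(corrected_event_time(event, car_now, true_now))
```

The corrected timestamp is what would then be compared against the call start and end times in the phone records.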

Our world is becoming much too connected to anonymize. Every bit of data is connected to others and it leads through chains to fully identified data. Had we wanted to we could have identified every single user in the AOL anonymized clickstreams, not just the few who searched for their own names.

Anyway, as I noted, the car's data isn't even needed. The cell phone company now knows where you are, how fast you are moving, and when you stop moving, should it dare to record that.

I'm interested in your reaction to the privacy flaws in this approach (assuming technology permits).

Hypothesis: Cell phone usage changes accident rates based on whether the phone is in use at the time of the accident. (With subcategorization by severity and kind of roadway, 3 categories of each).

Field data gathering:

At every accident where it is possible to collect the correct time and location of the accident, do so. This should cover a moderately large but uniform area. For example, the entire SF Bay area or LA county. Do this without regard to severity, location, or indicated cell phone use. Categorize each accident by severity (low, medium, high) and by kind of roadway (freeway, major, minor).

Once per month, provide every covering cell carrier with a list of locations, times, and categories (1-10). The tenth category is created by going to highway locations where there is no accident. Pick times and locations with statistics similar to accident occurrences. Based on the roll of dice, make a cell phone call or not. Record the location and time. The categories given to the carriers are randomized each month. The carriers are to respond with ten numbers: the number of times that a cell phone from one of their subscribers was in use at a location and time in each category.

The analysis re-bins the categories. The non-accident category is used to determine the accuracy of the carrier's time and location analysis, because the correct number that should be returned is known. It can also be used to assess the extent to which other cell phone usage by nearby automobiles will distort the data.

The monthly numbers can be used together with other statistics on likelihood of cell phone use while driving to determine the impact on accident rate.
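The label shuffling and re-binning described above could look roughly like the sketch below. All names here are hypothetical, and the control check assumes the simplest case, where the carrier's count for the control category is compared directly with the number of test calls the field team actually made.

```python
import random

SEVERITIES = ("low", "medium", "high")
ROADWAYS = ("freeway", "major", "minor")
CONTROL = ("control", "no accident")

def monthly_label_map(rng):
    """Randomly assign labels 1-10 to the nine accident categories plus the control."""
    categories = [(s, r) for s in SEVERITIES for r in ROADWAYS] + [CONTROL]
    labels = list(range(1, 11))
    rng.shuffle(labels)
    return dict(zip(labels, categories))

def rebin(label_map, carrier_counts):
    """Translate a carrier's ten per-label counts back to the true categories."""
    return {label_map[label]: count for label, count in carrier_counts.items()}

def control_ratio(rebinned, test_calls_made):
    """Reported in-use count for the control category over the known number of test calls.

    A value above 1 hints at other nearby phones being counted; below 1, at missed calls.
    """
    return rebinned[CONTROL] / test_calls_made

# Hypothetical month: the carrier only ever sees the numeric labels.
label_map = monthly_label_map(random.Random(0))
carrier_counts = {label: 0 for label in label_map}   # placeholder response
rebinned = rebin(label_map, carrier_counts)
```

Because the map is reshuffled each month and kept by the researchers, the carrier cannot tell which label is the control or which is a high-severity freeway accident.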

I see no financial motivation for the carrier to do this, so they will demand payment. It might be a large payment if they do not normally keep time and location records. I don't see a motivation for the insurers to make this payment. Police and government are not likely to have budget for this. But perhaps you could find someone to cover the costs.

I think that this would sufficiently anonymize the data, and I'm interested in your analysis. This is an example of what you need to do to create a test plan based on the question to be answered, rather than create a massive database and subsequent data mining.

The problem is that this mandates the data be recorded -- more specific locations of phones that are in use, for example. Right now the cell companies log your calls and their times, and I presume they log traffic levels on their cell towers. I don't know if they even log what cell towers you used, but I don't believe so and I don't believe they keep a more detailed track log of their E911 positioning on you. (With many phones, they only get that when the phone is making a call, which is good.)

Doing this would require they record much more data, and in turn would mean that data could be found by others and used against you. People do not want a complete log recorded of where they have traveled. Nor do they want their cars recording such things.

Now such recordings are always a two-edged sword. Sometimes they can prove you are innocent, but sometimes your own car or cell company would be turning against you. People don't want technology that can betray them.

Every cell tower is in use, so just knowing how many people were using a tower near the accident at a given time is of no value.

I'm not sure there's any controversy about this at all amongst the scientific community. Drivers on cell phones have been reliably associated with car accidents, and there's a fair bit of evidence that using a hands-free device doesn't help, which suggests that the challenge is attentional, not manual.

The intellectual challenge of scientific research is to come up with experimental designs that allow you to test hypotheses while minimizing expense, time, and risk to the participants. Researchers have come up with several ways of measuring the risk of cell phone distractions to drivers. A quick look at Google Scholar gets you papers on driving safety and cell phones from the Journal of Experimental Psychology: Applied (Strayer, Drews & Johnston, 2003), another from Accident Analysis & Prevention (Matthews, Legg & Charlton, 2003), Human Factors (Strayer & Drews, 2004) and a piece that comes closest to answering your question directly, in BMJ [British Medical Journal]:

Role of mobile phones in motor vehicle crashes resulting in hospital attendance: a case-crossover study (McEvoy, Stevenson, McCartt, Woodward, Haworth, Palamara & Cercarelli, 2005).

The findings are available at http://www.bmj.com/cgi/content/abstract/331/7514/428, but I can summarize here. They looked at 456 drivers who owned or used cell phones and had been injured in accidents seriously enough to require a hospital visit. Researchers checked to see who used their cell phone within ten minutes before their accident, and compared that to their cell phone use around trips at the same time of day in the previous week. "Driver's use of a mobile phone up to 10 minutes before a crash was associated with a fourfold increased likelihood of crashing (odds ratio 4.1, 95% confidence interval 2.2 to 7.7, P < 0.001)." Or, as the researchers concluded, "When drivers use a mobile phone there is an increased likelihood of a crash resulting in injury. Using a hands-free phone is not any safer."

A fourfold increase in the risk of crashing, with a P < 0.001, is weighty evidence.
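As a rough consistency check on those quoted figures, the sketch below backs out the standard error of the log odds ratio from the confidence interval and recomputes the p-value. It assumes the interval is symmetric on the log scale (estimate plus or minus 1.96 standard errors) and uses a normal approximation; the paper's own method may differ in detail.

```python
import math

# Figures as quoted from the BMJ abstract.
odds_ratio, ci_low, ci_high = 4.1, 2.2, 7.7

# Assumption: the 95% interval is log(OR) +/- 1.96 * SE on the log scale.
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
z = math.log(odds_ratio) / se_log_or

# Two-sided p-value under a normal approximation.
p_two_sided = math.erfc(z / math.sqrt(2))

# If the interval is log-symmetric, the geometric mean of the bounds
# should land close to the point estimate.
midpoint = math.sqrt(ci_low * ci_high)

print(f"SE(log OR) = {se_log_or:.3f}, z = {z:.2f}, p = {p_two_sided:.1e}")
print(f"geometric midpoint of CI = {midpoint:.2f}")
```

Under these assumptions z comes out a little above 4, which is in line with the reported P < 0.001, and the geometric midpoint of the interval sits close to the quoted odds ratio.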

As Linda pointed out, there is little value in doing this study, but I'll explain why it is a bit more anonymous than you think. First, the database is spread across multiple vendors, never all in one place. Second, it is transient. Data can be discarded after a month or two. Third, it need only track a location history, with no relationship to any subscriber identification. There is still a privacy risk, but not much greater than the existing privacy risk. Assuming that it can be done at all (which I still question), this has the organizational benefit that it would be policy to not collect identifying information when gathering location. At present, the policy is open.

I do have doubts about its feasibility. Most of the electronics that I have seen use RAKE transceivers. These do not internally track distance or angle information. They only track phase shift and frequency shift correlation numbers. The correlation numbers need never leave the transceiver. Pulling these numbers out to get more detailed information than cell tower is not fast or easy. They are rapidly changing, and converting them into angle and distance is subject to a lot of ambiguity. I could see it being reasonable if I had already identified a particular target to track, e.g., e911, but not as a general practice. Easiest is tapping into GPS equipped cell phones.

History is littered with stories of people who thought they anonymised data and later discovered something they did not think of. The only surefire method is not collecting. That doesn't mean there aren't better ways to do it than others, of course.

Accident locations and police reports are public records, so you can't keep them away from those with the carrier database.

Of course, another thing I didn't point out is that the carriers do not want solid evidence that talking on cell phones, even with a headset, is dangerous!
