Use GPS Maps to improve map databases, but protect privacy

Mapping programs and fancy GPS units come with map databases that will, among other things, plot routes for you and estimate the time to travel them. That’s great, but they are often wrong in a number of ways. Sometimes the streets are wrong (missing, or really just a trail, etc.), and the travel times are only rough estimates.

Yet all the information is there, being collected constantly by every car that drives the roads with a GPS. Aggregating this data will tell you which roads are real, which roads might be missing, which are one-way, and where freeway entrances and exits really are.

And it will also tell you real-world speeds at various times and dates, at rush hour or otherwise. It could even give a range of speeds, so you can know the speeds for faster and slower drivers and get a really good estimate of your own likely speed on a given road at a given time. That’s after removing the anomalies (like people stopping for coffee), of course.
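As a rough illustration of the speed idea, here is a minimal Python sketch. It assumes the logs have already been matched to road segments and reduced to (segment, hour, speed) samples; the function name `speed_profile`, the tuple layout, and the "slower than a quarter of the median" anomaly rule are all made up for this example, not taken from any real mapping product.

```python
from collections import defaultdict
from statistics import median, quantiles


def speed_profile(samples, min_fraction=0.25):
    """Summarize observed speeds per road segment and hour of day.

    samples: iterable of (segment_id, hour_of_day, speed_kmh) tuples
             (a hypothetical, already map-matched format).
    Returns {(segment_id, hour): (slow, typical, fast)} using quartiles.
    """
    buckets = defaultdict(list)
    for segment_id, hour, speed in samples:
        buckets[(segment_id, hour)].append(speed)

    profile = {}
    for key, speeds in buckets.items():
        mid = median(speeds)
        # Drop anomalies: traversals far slower than the median,
        # such as someone who stopped for coffee mid-segment.
        kept = [s for s in speeds if s >= min_fraction * mid]
        if len(kept) < 4:
            continue  # too few samples to say anything useful
        slow, typical, fast = quantiles(kept, n=4)
        profile[key] = (slow, typical, fast)
    return profile
```

The three numbers per bucket are the "range of speeds": a slower driver can plan around the first, a faster driver around the third. A real system would want a smarter anomaly filter than one fixed threshold.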

Rental cars with GPSs are collecting this all the time (sometimes for nefarious uses, like charging whopping fees for brief trips out of state). Technically, this data can be had.

But here’s the bad part: there is a potential for giant privacy troubles unless this is done very well, and some of it may be impossible to do without a privacy risk. After all, until you upload the data, there is clearly a log of your travels sitting there to be used against you. Only a system with rapid upload (and which discards data that gets old, even if it’s not uploaded) would not create a large risk of something coming back to haunt you.
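To make the "rapid upload, discard old data" idea concrete, here is a small Python sketch of an on-device buffer. Everything in it is an assumption for illustration: the class name `TrackBuffer`, the five-minute default, and the idea that some other code calls `drain_for_upload()` whenever a connection is available.

```python
import time
from collections import deque


class TrackBuffer:
    """Holds recent GPS points and forgets anything older than max_age_s."""

    def __init__(self, max_age_s=300.0, clock=time.monotonic):
        self.max_age_s = max_age_s
        self._clock = clock  # injectable so the expiry logic can be tested
        self._points = deque()  # entries are (recorded_at, lat, lon)

    def _expire(self):
        # Discard old points whether or not they were ever uploaded.
        cutoff = self._clock() - self.max_age_s
        while self._points and self._points[0][0] < cutoff:
            self._points.popleft()

    def record(self, lat, lon):
        self._expire()
        self._points.append((self._clock(), lat, lon))

    def drain_for_upload(self):
        """Return the still-fresh points and wipe the local copy."""
        self._expire()
        batch = [(lat, lon) for _, lat, lon in self._points]
        self._points.clear()
        return batch
```

The point of the design is that the device never holds more than a few minutes of history, so there is little sitting around to be used against you. It says nothing about what happens to the data after upload, which is the harder problem.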

The data would have to be anonymized, of course, and that’s harder than it sounds. After all, your GPS logs say a lot about you even without your name. Most would identify where you live, though that can be mitigated to a degree by breaking them up into anonymized fragments. Likewise they’ll identify where you work or shop or who you visit, all of which could be traced back to you.
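Here is one way the fragment idea might look, as a hedged Python sketch. The function name `fragment_trace`, the fragment length, and the choice to trim a fixed number of points from each end of a trip (where home or work is most likely to show) are assumptions for illustration. This mitigates the problem to a degree; it does not make a trace anonymous.

```python
import secrets


def fragment_trace(points, fragment_len=20, trim=10):
    """Split one trip into short, unlinkable-looking fragments.

    points: list of (lat, lon) tuples in time order.
    trim:   number of points dropped from each end of the trip,
            since trips tend to start and end somewhere identifying.
    """
    core = points[trim:len(points) - trim]
    fragments = []
    for start in range(0, len(core), fragment_len):
        chunk = core[start:start + fragment_len]
        if len(chunk) >= 2:  # a single point carries no road or speed info
            # Each fragment gets a fresh random ID, so nothing in the
            # upload ties two fragments to the same trip or driver.
            fragments.append({"id": secrets.token_hex(8), "points": chunk})
    return fragments
```

Someone with enough fragments could still try to stitch them back together where one ends and the next begins, which is part of why this is harder than it sounds.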

So here’s the Solve This aspect of this problem. Getting good data would be really handy. So how do we do it without creating a surveillance nightmare?