What if teams were forced to contribute robocar incident data?


As teams around the world attempt to build safe robocar systems, one key asset has stood out as a big differentiator -- experience. For a company to be willing to certify its vehicle as safe, it needs experience with all the strange circumstances that the vehicle might encounter driving the roads.

Right now, with over 5 million miles on the road and 5 billion in simulation, Waymo has more experience than everybody else put together. Since they also have more money, it's likely to stay that way. The main threat is that companies who operate or control large fleets of cars -- like Uber, Lyft, Tesla and other carmakers -- can collect more sensor data on ordinary human driving or autopilot driving. That's not the same as experience, and the sensor suites in those cars are at present vastly simpler, but there is a chance to get a lot more of it.

There are different types of experience. Highways are simple. Urban streets are complex, and they are much more complex in some parts of the world (like India) than in others (like Arizona). Everybody has lots more experience to gain.

The value of sharing

Experience covers all sorts of road situations, but of particular interest are dangerous situations, especially ones that caused an "incident." An incident is anything risky, from a brief lane departure all the way to an impact.

From the standpoint of the entire industry, and the public interest, we want all cars to be as safe as possible. This could be improved if every team had access to the data on dangerous situations, and in particular actual accidents.

The proposed 2016 NHTSA regulations, which I roundly criticised, contained a useful proposal -- that all teams would have to share all data about any serious safety incident. Vendors pushed back on this, and it vanished from the newer regulations.

Back in 2010, I proposed the creation of a global open source simulator. It was my hope that with such a simulator, contributors from around the world would encode "scenarios" that would be useful for virtually testing robocars. Teams, academics, and open source contributors from around the world would model the situations, until we had a library of every road situation anybody ever thought of or saw. And if teams shared data, the library would also include anything seen by a car that drove the roads with sensors.

If this existed, then whenever anybody said, "Hey, I just thought of something, what would a car do if it encountered X?" the chances are that we would already know the answer.

This has not come to be, though Baidu has released its Apollo simulator as open source, and a few others have followed.

Why they might not want to share

At first examination, everybody would benefit from this big shared testing library. But after a team has spent tens or hundreds of millions of dollars driving to build up their own library, they have strong commercial incentives not to share. It is, as noted, a big competitive advantage to have all that experience. Investors did not put in that money to yield that edge to all the competitors.

The government could force them to share. It could say, "If you want to test and learn on public roads, you have a duty to share things that would improve the safety of everybody, including your competitors." The moral argument is not hard.

But what if compelled sharing means there is less incentive for people to build up the testing library? If a 5-person startup gets access to all the experience that another company used its money and head start to build, we might ask not only whether that is fair, but whether it would discourage companies from all the expensive driving they are doing. Google decided to get into this 10 years ago, and that's part of why they are so far ahead. Should the law erase that?

Paying for entries

We might reduce this factor a bit if companies that submitted real safety scenarios to the database got paid for it, and companies that wanted to use the situations had to pay to use them. Since there are many companies, each company's cost to use the test suite would be less than the total paid out to build it, perhaps much less. A low price is needed because if such a test suite existed, it would be effectively compulsory to use it. If your car ever had an accident which would have been prevented had you paid for the test suite, it would cost you a lot in court, if nothing else.

It could even become fairly lucrative to drive and generate incidents for the test suite. So lucrative that there would have to be somebody who judges the quality of the scenarios to decide what should be paid for them. The focus would be on real world incidents, well recorded. There would be much more value for truly novel incidents and the most dangerous ones. Bringing in a recording of yet another jaywalker very similar to 100 others would not get you much.
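As a rough illustration of how such a judge might price submissions, here is a minimal sketch. The formula, the function name `scenario_payout`, and all of the numbers are assumptions invented for this example, not part of any real scheme. It pays more for severe incidents and divides the reward by the number of similar scenarios already in the library.

```python
def scenario_payout(base: float, severity: float, similar_count: int) -> float:
    """Hypothetical pricing rule: reward severity, discount repeats.

    base          -- payout for a maximally severe, never-before-seen incident
    severity      -- 0.0 (harmless) to 1.0 (serious impact), set by the judge
    similar_count -- how many similar scenarios the library already holds
    """
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must be between 0 and 1")
    if similar_count < 0:
        raise ValueError("similar_count cannot be negative")
    return base * severity / (1 + similar_count)


# Made-up figures: a novel incident versus one with 100 similar entries on file.
novel = scenario_payout(base=10_000, severity=0.5, similar_count=0)
repeat = scenario_payout(base=10_000, severity=0.5, similar_count=100)
```

Under this made-up rule, the jaywalker recording that resembles 100 others earns less than a hundredth of what the first one did. A real judge would need a much better notion of "similar" than a simple count.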

There is still value in the less dangerous situations. Companies would not be compelled to share them, but some might find it financially worthwhile. Ideally, we reach a point where selling scenarios to the database pays for the driving required to gather them. The company doing the driving does not have to hand over everything, and so they can improve their proprietary databases effectively for free or at a profit, which is a win.

Reporting an incident

Every car has a different sensor configuration. This means that raw sensor logs are most useful only to the company that generated them. To be useful for others, these must be converted into a simulator model of the situation. This is work, but should get easier as tools are created to do it.

In addition, since most companies use a reasonable-resolution LIDAR (though they mount them in different places), other companies could also make use of the raw sensor logs, but doing so would be their responsibility.

Unfortunately some companies can rightly complain that they are building their own custom proprietary sensors, and that those sensors are even secret in their design, their function and even their existence. Companies will have the right to decline to disclose that data until such time as their new sensor is made public. They can still generate the simulator model without revealing the details of the sensor, though they may have to reveal its range.
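To make the idea of a sensor-independent report concrete, here is a minimal sketch of what a shared scenario record might look like. Every class and field name here is hypothetical, and a real simulator format would carry far more (road geometry, signals, weather). The point is only that the record describes what happened in the world, not how any sensor saw it, with the detection range as the one optional sensor-related field.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Waypoint:
    t: float  # seconds since the start of the scenario
    x: float  # metres east in a local frame (assumed convention)
    y: float  # metres north in a local frame (assumed convention)


@dataclass
class Actor:
    actor_id: str
    kind: str  # e.g. "car", "pedestrian", "cyclist"
    path: List[Waypoint]


@dataclass
class Scenario:
    description: str
    actors: List[Actor]
    # Optional: a team with a secret sensor may disclose only its range.
    max_detection_range_m: Optional[float] = None


def to_json(scenario: Scenario) -> str:
    """Serialize the scenario; no raw sensor data is included."""
    return json.dumps(asdict(scenario), sort_keys=True)


# Illustrative record with invented values.
example = Scenario(
    description="Pedestrian steps out between parked cars",
    actors=[
        Actor("ped-1", "pedestrian", [Waypoint(0.0, 5.0, 2.0), Waypoint(1.5, 5.0, 0.0)]),
        Actor("ego", "car", [Waypoint(0.0, -20.0, 0.0), Waypoint(1.5, -5.0, 0.0)]),
    ],
    max_detection_range_m=120.0,
)
```

Calling `to_json(example)` yields a plain JSON string that any team could load into its own simulator tooling, whatever sensors it uses.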


Teams are driving all around the world. The law can force sharing only within an individual country, though an international treaty could alter that. There is some risk that a team might decide to simply do all or most testing in countries that don't force it to share data. In addition, some companies have private test tracks. It is harder for the law to force sharing of data from those, since the argument that "you learned this on the public roads" does not apply. As such, if the companies truly wish to fight this rather than cooperate, they might be able to do that.


Is it useful to analyze/train from Russian dash-cam recordings of crashes?

For example, here they have dozens of them posted daily: ru-chp.livejournal.com

You could certainly use those to try to build simulator scenarios, though sometimes they don't really have a lot of information you would want, views from the side, what other cars were doing etc. Since almost no team is trying to drive from a crappy single camera, it would be difficult to test a car on the pure video from the dashcam, though you could learn some things from it, like how well your video tools decode that video and classify things in it.

But yes, what I am talking about is a world where all sorts of people are taking the time to generate good data, put it in simulator model form, and can even make money doing it, so that we get a giant library for all cars to use.
