No, you don't need to drive a billion miles to test a robocar

Earlier I noted that Nidhi Kalra of RAND spoke at the AVS about RAND's research suggesting that proving robocar safety purely through road testing is an almost impossible task, because it would take hundreds of millions to a billion miles of driving to prove that a robocar is 10% better than human drivers.

(If the car is 10x better than humans, it doesn't take that long, but that's not where the first cars will be.)

This study has often been cited as saying that it's next to impossible to test robocars. The authors don't say that -- their claim is that road testing alone will not be enough, and will take too long to really work -- but commenters and the press have taken it further, to the belief that we'll never be able to test these vehicles at all.

The mistake is that while it could take a billion miles to prove a vehicle is 10% safer than human drivers, that is not the goal. Rather, the goal is to decide that it is unlikely to be much worse than that level. It may seem like "better than X" and "not worse than X" are the same thing, but they are not. The difference is where you give the benefit of the doubt.
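
To make the asymmetry concrete, here is a rough sketch of the two calculations under a simple Poisson crash model, in Python. The benchmark crash rate, confidence level and power below are my own illustrative assumptions, not the figures used in the RAND paper.

    # Rough sketch of "not much worse than X" vs. "provably 10% better than X"
    # under a simple Poisson crash model. All inputs are illustrative assumptions.
    import math

    BENCHMARK_RATE = 1 / 1_000_000   # assumed human benchmark: 1 crash per million miles
    CONFIDENCE = 0.95

    # (a) "Unlikely to be much worse": with zero crashes observed, the exact Poisson
    # 95% upper bound on the crash rate drops below the benchmark after only:
    miles_to_bound = -math.log(1 - CONFIDENCE) / BENCHMARK_RATE
    print(f"Zero-crash miles to bound the rate at the benchmark: {miles_to_bound / 1e6:.0f}M")

    # (b) "Provably 10% better": now the rate must be measured finely enough to
    # resolve a 10% difference, which takes roughly ((z_alpha + z_beta) / 0.10)^2
    # observed crashes -- hundreds of events, hence hundreds of millions of miles.
    z_alpha, z_beta = 1.645, 1.282   # 95% confidence, 90% power (assumed)
    crashes_needed = ((z_alpha + z_beta) / 0.10) ** 2
    miles_to_prove = crashes_needed / (0.9 * BENCHMARK_RATE)
    print(f"Miles to prove a 10% improvement: {miles_to_prove / 1e6:.0f}M")

With these assumed inputs, bounding the risk takes about 3 million clean miles, while proving the 10% improvement takes on the order of a billion miles.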

Consider how we deal with new drivers. We give them a very basic test and hand them a licence. We presume, because they are human teens, that they will have a safety record similar to other human teens. Such a record is worse than the level for experienced drivers, and in fact one could argue it's not at all safe enough, but we know of no way to turn people into experienced drivers without going through the risky phase.

If a human driver starts showing evidence of poor skills or judgment -- lots of tickets, and in particular multiple accidents -- we pull their licence. It actually takes a really bad record for that to happen. By my calculations, the average human takes around 20 years to have an accident that gets reported to insurance, and 40-50 years to have one that gets reported to police. (Most people never have an injury accident, and a large fraction never have any reported or claimed accident.)
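
The arithmetic behind those figures is simple. The annual mileage and per-mile crash frequencies below are illustrative assumptions that roughly reproduce them, not official statistics.

    # Back-of-the-envelope check of the "years per accident" estimates above.
    # Annual mileage and per-mile crash frequencies are illustrative assumptions.
    MILES_PER_YEAR = 13_000              # assumed average annual mileage
    MILES_PER_INSURANCE_CLAIM = 250_000  # assumed: one insurance-reported crash per ~250k miles
    MILES_PER_POLICE_REPORT = 550_000    # assumed: one police-reported crash per ~550k miles

    print(MILES_PER_INSURANCE_CLAIM / MILES_PER_YEAR)  # ~19 years per insurance-reported crash
    print(MILES_PER_POLICE_REPORT / MILES_PER_YEAR)    # ~42 years per police-reported crash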

In other words, any teen who has an accident is already demonstrating a high probability that they are a far worse than average driver. Yet we let them stay on the road. We even let most people have two accidents, even though the evidence is now very strong that they're pretty bad. We do this because we are reluctant to pull a licence and deprive somebody of economical travel.

(As an interesting aside, in today's world of Uber at $2/mile and future robotaxis under 50 cents/mile, it is becoming far less of a burden to strip somebody of a licence. While I don't predict we will ban human driving by drivers with good records, we could see the day come before too long when two accidents or multiple tickets mean loss of licence, and eventually when even one negligent accident is enough. While that would no longer ruin your ability to get around, it might be frightening enough to seriously improve human road safety. I do believe that before long a single DUI will result in immediate licence revocation. Today in California you lose it for 4 years for your 4th DUI!)

We could not possibly test humans to the standard proposed in the RAND paper. That standard is about proving a level of safety; a better standard is about progressively demonstrating lower risk. The longer a car is on the road without incident, the lower the chance that something bad will happen, even if you can't declare a formally provable level of risk. On the other hand, vehicles that have incidents early on may just be unlucky, but they had better demonstrate that, or go back to the track or safety-driver operation, if they want to be on the road again.
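
To sketch why a clean record keeps getting more persuasive, consider how unlikely a long incident-free run would be for a car that was actually well below the goal. The safety goal here is an illustrative assumption.

    # Sketch: if a car were actually much worse than the safety goal, how likely
    # is it to drive N miles with zero incidents? (Simple Poisson model; the goal
    # rate is an illustrative assumption.)
    import math

    GOAL_RATE = 1 / 1_000_000   # assumed safety goal: 1 incident per million miles

    def p_clean(miles: float, times_worse: float) -> float:
        """Probability of zero incidents in `miles` if the true rate is times_worse * goal."""
        return math.exp(-times_worse * GOAL_RATE * miles)

    for miles in (250_000, 1_000_000, 3_000_000):
        print(f"{miles:>9,} clean miles: "
              f"P(zero incidents | 3x worse than goal) = {p_clean(miles, 3.0):.2f}")

A car three times worse than the goal has close to even odds of getting through its first 250,000 miles cleanly, but almost no chance of getting through three million.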

A more likely approach is to start with everything that can be learned with safety-driver supervised road testing and all other types of testing -- off-road and simulator. Those methods can convince you it's safe to deploy on a trial basis without safety drivers, but with careful tracking. If you do so, and you have incidents, you then pull the vehicle if the rate of incidents suggests it does not meet the safety goal. If it is way worse than the safety goal, you pull it very quickly. If it is at the safety goal, you never pull it.
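
One way to picture that tracking is as a simple statistical trip-wire: ask how surprising the observed incident count would be if the fleet exactly met the safety goal, and pull the vehicles when it becomes too surprising. The goal rate and threshold below are assumptions, and a real program would use something more sophisticated, but it captures the "pull quickly if way worse, never pull if at the goal" behaviour.

    # Minimal sketch of "track carefully and pull if too high": treat incidents
    # as Poisson and test the observed count against the safety goal.
    import math

    GOAL_RATE = 1 / 1_000_000   # assumed goal: 1 incident per million miles
    PULL_THRESHOLD = 0.05       # pull if the count is this improbable under the goal

    def p_at_least(k: int, expected: float) -> float:
        """P(K >= k) for a Poisson variable with the given expected count."""
        return 1.0 - sum(math.exp(-expected) * expected**i / math.factorial(i)
                         for i in range(k))

    def should_pull(miles: float, incidents: int) -> bool:
        expected = GOAL_RATE * miles
        return p_at_least(incidents, expected) < PULL_THRESHOLD

    print(should_pull(500_000, 3))    # far above goal -> True (pull quickly)
    print(should_pull(5_000_000, 5))  # exactly at goal -> False (keep operating, keep tracking)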

You never will get to the number of miles needed to fully prove a safety record this way, because long before you do, you will have new revisions of the software. Those new revisions will have been heavily regression tested and tested in sim, but each will be slightly different. The revisions are of course very similar to their predecessors, so the testing of the predecessor matters, but until you have truly massive fleets, there never will be a fixed version that gets tested for hundreds of millions of miles.

As I see it, the current common plan is as follows:

  1. Test on test tracks until performance is good enough for supervised live road testing.
  2. Test on live roads with safety drivers monitoring them until performance is even better.
  3. At the same time, test in sim through a million different safe and dangerous situations and every variation of every problem you can think of.
  4. Allow onto the road for use by customers, but track carefully.
  5. If the accident rate shows signs of being too high, fix the problems and limit operational areas, going back to safety driving until the fix is shown to work.

In the last step, an important idea is that of limiting operational areas. If you operate a fleet of self-driving cars as a mobility service, or have sold a million cars to customers, you can't just announce one morning, "We have discovered a safety issue, all cars are now disabled." Strand customers like that even once and you will never get any more.

Instead, there are a few options companies will try. If the safety problem is one which only occurs in certain situations (for example, certain types of roads or driving situations), then the cars might be ordered simply to avoid those situations. Alternate solutions (such as human-driven taxis) would serve customers needing a ride through those situations.

For owned cars, the vendor might also do a "recall." With a recall, the owners are informed of the problem and told to come in for an update to fix it. From that point on, it is the owner's responsibility if they ignore the warning and don't come in for the update. Of course, the problem here is that pushing an update to cars is not hard; it's writing and testing the update that's difficult, so the real question is what to do during the interim period. It may be that customers are told they can drive manually, or in another vehicle, in the problem area, and if they insist on using self-driving there, it is again on them. That works for owners but less so for a vendor-operated fleet.

Another choice is to just take the risk, if it's small, but lawyers and juries have been very unkind to companies that made such decisions. Society may have to decide how to handle the trade-off between a viable business (one that does not shut down completely every time a flaw is found) and a safe business that avoids exposing customers to risks it knows about.

An open source minder?

One approach that might help smaller startup companies get to road testing would be the creation of a basic "minder" which can watch a car and help the safety driver. The minder would prevent accidental lane departure, and have automatic emergency braking to prevent rear-end impacts. In other words, it's not a lot more sophisticated than the ADAS tools in many cars. It could be a commercial product, but we're also close to the level where open source code will be capable of this.

A young team, hoping to test a car on the road, could install the minder as a semi-independent system, and then drive with safety drivers. The minder would be tested so that we know "minder plus safety driver" is "safe enough," making it hard for the experimental system to do worse. The experimental tool would have the ability to explicitly demand (in advance) an override from the minder, telling it, "I'm changing lanes now, don't keep me in this one." Teams would know to be extra careful with their code in this mode (and perhaps would make the safety drivers more alert in these phases as well).
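
To make the override idea concrete, here is a hypothetical sketch of what the minder's interface to the experimental system could look like. The names and structure are mine, not any existing ADAS or open source API.

    # Hypothetical sketch of a minder interface: the experimental planner must ask
    # in advance, and for a limited time, before the minder stands down a constraint.
    from dataclasses import dataclass
    from enum import Enum, auto
    import time

    class Constraint(Enum):
        KEEP_LANE = auto()        # lane-departure prevention
        EMERGENCY_BRAKE = auto()  # automatic emergency braking

    @dataclass
    class Override:
        constraint: Constraint
        expires_at: float         # overrides are explicit, in advance, and time-limited

    class Minder:
        """Semi-independent safety layer that enforces constraints unless explicitly waived."""

        def __init__(self) -> None:
            self._overrides = []

        def request_override(self, constraint: Constraint, duration_s: float) -> None:
            # e.g. "I'm changing lanes now, don't keep me in this one."
            self._overrides.append(Override(constraint, time.time() + duration_s))

        def is_active(self, constraint: Constraint) -> bool:
            now = time.time()
            self._overrides = [o for o in self._overrides if o.expires_at > now]
            return not any(o.constraint == constraint for o in self._overrides)

    # The experimental system requests a brief lane-keeping waiver before a lane change:
    minder = Minder()
    minder.request_override(Constraint.KEEP_LANE, duration_s=5.0)
    assert not minder.is_active(Constraint.KEEP_LANE)    # lane-keeping waived for 5 seconds
    assert minder.is_active(Constraint.EMERGENCY_BRAKE)  # AEB was never waived, stays on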

The NTSB review of the Tesla crash noted that the combination of "Autopilot" plus "mostly alert drivers" plus "a small number of reckless Autopilot users" was, as a system, safer than plain drivers. We're certainly at the level where "minder" plus "reasonably trained safety driver" would also be that safe. That means new teams could be out testing right away.

Comments

Of course you are correct that with sufficient modeling, a little data can provide a lot of confidence. This is more true with simpler systems. The world of driving is pretty complicated, though, so getting a lot of confidence will require some combination of decent amounts of data plus a sophisticated model. The more data you have, the less you have to rely on the educated guesswork that models enable. Non-expert regulators ought to prefer data, since it relies less on trusting models they aren't equipped to evaluate themselves. The alternative is effectively handing the decision to experts and hoping they have no bias.

Of course Tesla may be able to get enough data to not have to rely on complex models, which makes all of this moot.
