Designing a metric to measure robocar safety -- what does insurance teach?

The most challenging problem for robocars today is proving they are safe. Yes, making them safe is very important, but the companies making them will only let them on the roads if that level of safety can be demonstrated.

You can't prove safety unless you can measure it. Even though we still don't fully understand everything about testing, it is time to start thinking of preliminary ways to do that. California requires all parties testing in the state to release disengagement reports, and while the reports are very interesting, they are not useful for comparison because each company has different rules about disengagement. In addition, the disengagement rate will depend greatly on the types of roads and traffic situations the car encounters.

(Waymo also just released a new safety report describing 4 million miles of real world testing and 2.7 billion in simulator.)

All cars operating (except the Waymo minivans in Phoenix) have a "safety driver" behind the wheel who is told to grab the wheel and intervene if they think an unsafe situation is present. They also do it if the system announces it has had a failure. I have taken Google's safety driver training; safety drivers go through advanced driver training and are told to intervene if anything looks dangerous. They do not wish to endanger the public, so the drivers don't think, "Hmm, the car seems to not be slowing for those kids in the crosswalk. Let's wait and see what it does!" Instead, they intervene and the whole situation is played back in simulation later to see what would have happened without the intervention. If the simulator says there would have been a "contact," that is recorded as a highly serious incident. Perversely, forcing teams to report all disengagements pushes companies to discourage rather than encourage safety driver paranoia, which we don't want.

Possible metric #1 -- Miles/km per "contact"

This leads to one metric: "miles per contact incident." In theory this could compare to a human being's miles per contact, which is, according to Google, around 100,000. (The insurance industry gets a report every 250K miles, the police every 500K.) The problem, of course, is that it matters a lot what type of miles they are. Highway miles are much easier than complex city street miles, which would be harder in Boston than Phoenix, and harder still in Mumbai.

There are also many types of serious events to consider -- in rough order of severity:

  • Contact with a vulnerable road user
  • High speed contact with another vehicle
  • High speed contact with a road feature
  • Lower speed contact with another vehicle
  • Lower speed contact with a road feature
  • Too close approach to a vulnerable road user
  • Leaving your right-of-way (including failure to yield)
  • Too close approach to another vehicle or road feature
  • Drifting out of lane briefly by accident
  • Vehicle code violations

Humans, of course, get too close to things, violate ROW and drift out of lanes all the time, almost always without causing an accident. Whatever our metric, we would like to be able to apply it to humans and see how different classes of people score. Thanks to large naturalistic driving studies there is already data on how often humans engage in distracting activities or otherwise fail to pay attention, and how often it leads to accidents.

Disengagements for detected software or hardware faults are a difficult thing to quantify. These are in the California reports, but not all of them would be contact events. Well designed systems, in fact, expect software and hardware faults, even very serious ones, and try to "fail safe" or ideally fail operational. While we might want to know how many software faults are happening, if they are handled, they may make a misleading metric. A very successful system might have a software fault every week but have backups ready to handle it. Humans, after all, zone out and look away from the road all the time, but we feel OK being driven by them, and even better if forward collision avoidance systems are there to "fail safe" when they do.

What is a mile?

Our metric is also complicated because not all miles are the same. Waymo is in Phoenix and Silicon Valley for a reason. These are easier cities to drive in. Highway miles are even easier. A defined metric will need mixed buckets of miles that reflect typical patterns of human use, or typical patterns of use in a robotaxi service.
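One way to picture the "mixed buckets" idea is the sketch below. It reweights per-bucket contact rates by a target usage mix instead of pooling all miles together. The bucket names, mileages, contact counts and weights are all made up for illustration, and `Bucket` and `miles_per_contact` are hypothetical names, not anything a team or regulator actually uses.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    miles: float    # test miles logged in this kind of driving
    contacts: int   # contact incidents (real, or found in simulator replay)
    weight: float   # share of this kind of driving in the target usage mix


def miles_per_contact(buckets):
    """Miles per contact after reweighting each bucket to the target mix."""
    total_weight = sum(b.weight for b in buckets)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    rate = 0.0  # expected contacts per mile under the target mix
    for b in buckets:
        if b.miles <= 0:
            raise ValueError(f"no test miles in bucket {b.name!r}")
        rate += (b.weight / total_weight) * (b.contacts / b.miles)
    return float("inf") if rate == 0 else 1.0 / rate


# Made-up numbers: lots of easy highway miles, fewer hard city miles.
example = [
    Bucket("highway", miles=80_000, contacts=0, weight=0.5),
    Bucket("city", miles=20_000, contacts=1, weight=0.5),
]
pooled = sum(b.miles for b in example) / sum(b.contacts for b in example)
print(f"pooled:     {pooled:,.0f} miles per contact")
print(f"reweighted: {miles_per_contact(example):,.0f} miles per contact")
```

With these invented numbers, pooling all miles suggests 100,000 miles per contact, while a 50/50 highway/city mix gives about 40,000. The easy miles flatter the pooled figure, which is why the mix needs to be part of the metric's definition.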

Possible metric #2 -- Incident points (or dollars) per mile/km

We might build a system of "points" for all the serious events. Something that would be a fatality, like hitting a pedestrian or driving off the road, would account for a super high point count. Lower speed contacts with other vehicles which usually just bend fenders would have far fewer points. Events which are a concern but would not have caused a contact would get a small number of points.
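A points system of this kind could be sketched roughly as follows. The event names echo the list earlier in the article, but every point value here is a placeholder chosen only to show the shape of the calculation; the article does not propose specific values, and `EVENT_POINTS` and `points_per_mile` are hypothetical names.

```python
# Placeholder point values, for illustration only. A real scheme would
# need values agreed on by developers, insurers and other experts.
EVENT_POINTS = {
    "vulnerable_user_contact": 10_000,    # likely injury or fatality
    "high_speed_vehicle_contact": 2_000,
    "low_speed_vehicle_contact": 50,      # usually just bent fenders
    "close_approach": 1,                  # a concern, but no contact
}


def points_per_mile(event_counts, miles):
    """Total incident points divided by miles driven (lower is better)."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    total = sum(EVENT_POINTS[event] * count
                for event, count in event_counts.items())
    return total / miles


# Example: two fender-benders and 100 close approaches in 100,000 miles.
score = points_per_mile(
    {"low_speed_vehicle_contact": 2, "close_approach": 100},
    miles=100_000,
)
print(f"{score:.4f} points per mile")
```

Unknown event names raise a `KeyError` rather than silently scoring zero, which seems the safer default for a safety metric.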

Dollars and insurance as metrics

Because the insurance industry has so much data about it, dollars might be the best way to create our "points." While that involves putting a dollar value on injuries and lives, this is a well practiced task in the insurance industry and other places -- see the Value of Life.

For a typical human driven car in the USA, you pay an average of about 6 cents/mile for insurance. Ideally, liability and collision insurance should be priced purely by the mile, and some places claim to have pay-as-you-drive (PAYD) but in reality it's not quite there yet. Nonetheless, paying $600 for liability/collision on a car that goes 10,000 miles per year amounts to around that number.

While the insurance industry could play a role in calculating these numbers, they can't do it all, because it is the team that is testing the cars that has the best handle on these events and probabilities. Large players -- be they automakers, companies like Waymo, Uber or Apple, or even large fleet operators -- would be highly motivated to self-insure. Even when they do self-insure, they will still need to do this calculation.

Today, insurance companies do this calculation for humans, and they price based on zip code, driving record and many other factors, in part to get a sense of the danger on the types of roads the cars will drive. Human driving patterns will differ from robocars, and for many years, robocars will drive only on a subset of roads that they have maps and safety testing for.

With the development of such a metric, a team might be able to say, "We have our safety cost down to 3 cents/mile in the San Francisco area." This could be compared to other cars and even to humans. The cost would be real because the car would only drive in the areas the safety was measured.

To truly work out the cost of an incident in simulator, you need a good full physics simulator which can calculate crash severity. Fortunately, the regular car industry has already worked hard on that in order to improve crash testing, and this will improve with time.

Could governments or standards bodies play a role?

Regular readers will know I am wary of much early government involvement. However, if governments are going to demand things like disengagement reports, they might play a role in defining how to calculate them. The primary legwork should be done in concert with developers and other experts, but then the government could approve the codification of the work. At the same time, it must be understood that this is a very dynamic field, with new inventions and new thinking arising on a monthly basis. As such, no system should expect to last very long, as much as we might like that.

Eventually, a metric like this might become part of a self-certification requirement for car developers. It could also be created by a consortium of developers and other experts, like insurers. I say "like insurers" because, with self-insurance being so likely, I am not sure today's insurers will do as much with robocars as they hope. There is precedent, however, in how organizations like the Insurance Institute for Highway Safety work. They promote car safety and publish data on cars. You've used their crash ratings. They are an independent scientific lab funded by the insurance industry.

A body like this might work out cost values to attach to certain events, and rules for deciding the cost of any failure, and help companies come into compliance with a system as they build their testing and do their certifications.

Problems to be resolved

The insurance industry works hard to reduce claim costs. Because of this, some people estimate that the true cost of accidents is up to 4 times higher than the cost insurers pay. Insurers don't pay for congestion, medical costs of at-fault parties, pain and suffering and many other factors. Even so, because this is just a metric meant for comparison, it's very useful to be able to compare to the cost rates of humans.

There is a temptation as well to invert the number, to miles per dollar, or perhaps "miles per (human life equivalent)." At the DoT's number of $9.4M per human life, 6 cents per mile is 156M miles per life. This would make the number easier to understand, and the lesson about miles-per-gallon vs. liters/100km tells us the public likes metrics where higher is better. We would be comparing systems by saying, "This one gets 1 billion miles per life, while the other gets 500 million, compared to people who get 150M and drunks who get 50M." If using lives is too creepy, an alternative might be "miles per cost of totalling the average car" -- about 500,000 miles for human drivers.
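The arithmetic behind these inverted numbers is simple enough to sketch. The $600 premium, 10,000 annual miles and $9.4M figure come from the article; the function names are hypothetical, and the 3 cents/mile robocar is just the illustrative figure used earlier, not a measured result.

```python
DOT_VALUE_OF_LIFE_USD = 9_400_000  # DoT figure quoted in the article


def cost_per_mile(annual_premium_usd, annual_miles):
    """Insurance cost per mile, treating the premium as purely per-mile."""
    if annual_miles <= 0:
        raise ValueError("annual_miles must be positive")
    return annual_premium_usd / annual_miles


def miles_per_unit(cost_per_mile_usd, unit_cost_usd):
    """Invert a cost-per-mile figure into miles per 'unit' of loss."""
    if cost_per_mile_usd <= 0:
        raise ValueError("cost_per_mile_usd must be positive")
    return unit_cost_usd / cost_per_mile_usd


human = cost_per_mile(600, 10_000)  # about $0.06 per mile
human_miles_per_life = miles_per_unit(human, DOT_VALUE_OF_LIFE_USD)
robocar_miles_per_life = miles_per_unit(0.03, DOT_VALUE_OF_LIFE_USD)

print(f"human:   {human_miles_per_life / 1e6:.0f}M miles per life")
print(f"robocar: {robocar_miles_per_life / 1e6:.0f}M miles per life")
```

Run the other way, 500,000 miles at 6 cents/mile implies a totalled-car cost of about $30,000. That is an inference from the article's own numbers, not a sourced figure.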

(This needs to be refined more. The insurance cost per mile is built from all the things that might have a cost while driving a car, which ranges all the way from little dents to horrible fatalities. Humans don't just drive 150M miles and then kill a person, they actually travel longer than that because all sorts of smaller things happen along the way.)

Most of today's teams, who have not even run for more than 10,000 miles on today's roads, would publish very bad numbers under this metric. You would need to run for several hundred thousand miles with zero to few contact incidents in order to get a good number. On the other hand, since so much of the cost of insurance is put into human lives, it might be that the number will appear artificially low for teams that have not run far enough. For example, while the typical human might drive 500,000 miles in a lifetime, a good chunk of their insurance price will be for the 1 in 400 chance they will cause a fatality. Obviously the vast majority of people never cause fatalities or even serious accidents -- we should hope so -- but the cost of their small risk of that is priced into their insurance.
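Two rough calculations sit behind this paragraph, sketched below. The first is a standard Poisson "zero events observed" bound (a technique brought in here, not one the article names): it estimates how many incident-free miles are needed before a team can claim a given miles-per-contact figure with some confidence. The second estimates how much of a 6 cents/mile premium the fatality risk might account for, assuming purely for illustration that a fatality is costed at the DoT's $9.4M; real insurance payouts may differ. Function names are hypothetical.

```python
import math


def clean_miles_needed(target_miles_per_contact, confidence=0.95):
    """Incident-free miles needed so that, under a Poisson model, the
    upper confidence bound on the contact rate is 1 per target miles."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if target_miles_per_contact <= 0:
        raise ValueError("target_miles_per_contact must be positive")
    # P(no contacts in N miles) = exp(-rate * N); solve for N at the bound.
    return -math.log(1.0 - confidence) * target_miles_per_contact


def fatality_share_of_premium(lifetime_miles, fatality_chance,
                              cost_per_fatality_usd, premium_per_mile_usd):
    """Fraction of the per-mile premium explained by fatality risk alone."""
    expected_per_mile = fatality_chance * cost_per_fatality_usd / lifetime_miles
    return expected_per_mile / premium_per_mile_usd


# Matching the human figure of about 100,000 miles per contact:
needed = clean_miles_needed(100_000)
print(f"about {needed:,.0f} incident-free miles needed")

# Article's figures: 500,000 lifetime miles, 1-in-400 fatality chance.
share = fatality_share_of_premium(500_000, 1 / 400, 9_400_000, 0.06)
print(f"fatality risk is about {share:.0%} of the premium")
```

Under these assumptions, matching human drivers takes roughly 300,000 clean miles, in line with "several hundred thousand." The fatality risk alone comes to about 4.7 of the 6 cents, which shows how a team with few miles and no fatalities could post a flattering number.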

Testing in simulator

A very useful tool for all teams is testing vehicles in simulator. Waymo reported above that they have done 2.7 billion miles in simulator, and this number is going to keep going up fast. It's worth noting that while you will spend some number of "boring" miles of ordinary driving in simulator, most operation will involve something unusual happening. This is the opposite of real world driving. The problem, of course, is that simulator testing only confirms you handle the chosen and scripted/parameterized situations.

For most of the simulation situations, the only score a car should ever get in simulator is perfect, because any time you find your software doing the wrong thing in simulator, you fix the problem and run the test again until it does the right thing. Simulation can also include "accident inevitable" situations where you test how well the system does at reducing severity. Again, teams will run these tests again and again until they do the best at this.

From time to time, an agency could create a series of entirely new scenarios in simulation, and each car could test against these to produce a score. But this works only once -- after they know about a situation they will "design to the test" and generate a perfect or optimum result.

Comments

My name is Michael DeKort. I am a former systems engineer, engineering and program manager for Lockheed Martin. I worked in aircraft simulation, the Aegis Weapon System, NORAD and on C4ISR for DHS. I also worked in Commercial IT. I received the IEEE Barus Ethics Award for whistleblowing regarding the DHS Deepwater program post 9/11 - http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4468728

I am also a member of the SAE On-Road Autonomous Driving Validation & Verification Task Force - (SAE asked me to join the group because of my POV and my background)

I contacted you regarding your involvement in autonomous vehicles. There may be some significant issues in the industry you are not aware of regarding how an AV should be created and brought to market, specifically the use of proper aerospace-level simulation vs. public shadow driving and L2+/L3 vehicles. (One clear example that the shift has started is Waymo's recent massive paradigm shift.) I am also concerned that most if not all of the AV makers and OEMs are using the wrong simulation approach. Both of these issues will make it impossible to get anywhere near L4 and create unnecessary casualties when they drive thousands of accident scenarios thousands of times each for AI and testing. (In many cases the sim issues will cause false positives which will not be caught until real-world tragedies occur.)

I have provided links to a couple articles of mine that explain my positions on all of this in detail with source references below. Please let me know if you would like to discuss this further.

Autonomous Levels 4 and 5 will never be reached without Simulation vs Public Shadow Driving for AI
https://www.linkedin.com/pulse/autonomous-levels-4-5-never-reached-without-michael-dekort

Autonomous Vehicle Testing – Where is the Due Diligence?
https://www.linkedin.com/pulse/autonomous-vehicle-testing-where-due-diligence-michael-dekort/

Corner or Edge Cases are not Most Complex or Accident Scenarios
https://www.linkedin.com/pulse/corner-edge-cases-most-complex-accident-scenarios-michael-dekort/

The Dangers of Inferior Simulation for Autonomous Vehicles
https://www.linkedin.com/pulse/dangers-inferior-simulation-autonomous-vehicles-michael-dekort/

Michael DeKort

There is often discussion about what lessons should be imported from aviation into cars. It is of course complex, and not all of them can be imported. But lots of people are doing extensive work on simulation and have been for a while. Last night I attended a presentation from yet another company trying to sell these services to vendors.

My point in this article is not to deny the value of simulation -- as far as I know I was the first person to make calls for extensive simulation many years ago -- but rather that it is difficult to measure your safety in sim, at least in comparison to humans who only do real-world driving (3 trillion miles per year in the USA, almost 2 light years a year around the world).

You can do measurements in sim but only if you have a huge quantity of entirely new scenarios on a regular basis for people to test against.
