California publishes robocar "intervention" reports -- Google/Waymo so far ahead it's ridiculous

California published its summary of all the reports submitted by vendors testing robocars in the state. You can read the individual reports -- and they are interesting -- but several other outlets have created summaries of the reports, calculating things like the number of interventions per mile.

On these numbers, Google's lead is extreme. Of over 600,000 autonomous miles driven by the various teams, Google/Waymo accounted for 97% of them -- in other words, 30 times as much as everybody else put together. Beyond that, their rate of miles between disengagements (around 5,000 -- a 4x improvement over 2015) is one or two orders of magnitude better than the others', and in fact most of the others have so few miles that you can't even produce a meaningful number. Only Cruise, Nissan and Delphi can claim enough miles to really tell.

Tesla is a notable entry. In 2015 they reported driving zero miles, and in 2016 they reported a very small number of miles with tons of disengagements from software failures (one every 3 miles). That's because Tesla's autopilot is not a robocar system, and so miles driven by it are not counted. Tesla's numbers must come from small scale tests of a more experimental vehicle. This is very much not in line with Tesla's claim that it will release full autonomy features for their cars fairly soon, and that they already have all the hardware needed for that to happen.
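
The rate calculation itself is trivial -- total autonomous miles divided by disengagements. A quick Python sketch, with illustrative placeholder figures chosen to match the rough ratios above rather than the exact filings:

    # Rough sketch of how the per-mile figures fall out of the filings.
    # The numbers are illustrative placeholders, not the exact DMV values.
    reports = {
        "Waymo/Google": {"miles": 630_000, "disengagements": 125},
        "Tesla":        {"miles": 550,     "disengagements": 180},
    }

    for name, r in reports.items():
        rate = r["miles"] / r["disengagements"]
        print(f"{name}: {rate:,.1f} autonomous miles per disengagement")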

Unfortunately you can't easily compare these numbers:

  • Some companies are doing most of their testing on test tracks, and they do not need to report what happens there.
  • Companies have taken different interpretations of what needs to be reported. Most of Cruise's disengagements are listed as "planned," though in theory those should not be listed in these reports -- yet they also don't list the unplanned ones which should be there.
  • Delphi lists real causes and Nissan is very detailed as well. Others are less so.
  • Many teams test outside California, or even do most of their testing elsewhere. Waymo/Google actually tests a great deal outside California, making their numbers even bigger.
  • Cars drive all sorts of different roads. Urban streets with pedestrians are much harder than highway miles. The reports do list something about conditions but it takes a lot to compare apples to apples. (Apple is not one of the companies filing a report, BTW.)

One complication is that safety drivers are typically told to disengage if they have any doubts. What counts as a "doubt," and how to handle it, thus varies from driver to driver and company to company.

Google has said their approach is to test any disengagement in simulator, to find out what probably would have happened if the driver had not disengaged. If there would have been a "contact" (accident), then Google considers that a real incident, and those are much rarer than the disengagements reported here. Many of the disengagements occur when the system detects faults in software or sensors. There, we do indeed have a problem, but as with human beings who zone out, not all such failures will cause accidents or even safety issues. You want to get rid of all of them, to be sure, but if you are trying to compare the safety of the systems to humans, it's not easy to do.
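
A hypothetical sketch of that triage step, just to make the bookkeeping concrete -- the simulator call here is a made-up placeholder, not anything Google has published:

    # Hypothetical sketch of the simulator triage described above: replay
    # each disengagement without the takeover and count only those that
    # would have led to contact. `simulate_without_takeover` is a
    # placeholder, not any published Google/Waymo API.
    def count_contact_events(disengagements, simulate_without_takeover):
        contacts = 0
        for event in disengagements:
            outcome = simulate_without_takeover(event)  # e.g. "contact" or "no_contact"
            if outcome == "contact":
                contacts += 1
        return contacts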

It's hard to figure out a good way to get comparable numbers from all teams. The new federal guidelines, while mostly terrible, contain an interesting rule that teams must provide their sensor logs for any incident. This will allow independent parties to compare incidents in a meaningful way, and possibly even run them all in simulator at some level.

It would be worthwhile for every team to be required to report incidents that would have caused accidents. That requires a good simulator, however, and it's hard for the law to demand this of everybody.

Comments

Yes, it's really hard to get any sense of what's going on from these numbers. Or maybe I'm just being jaded, since at least this is real data with some kind of regularization to it.

But on to the point: if Google is clocking 5,000 miles between disengagements now, where do you think they want to be before they release a version of this to the public? As you point out, it's hard to compare this disengagement criterion to those that would cause real-world issues, but if it were equivalent, I'm guessing they'd want that number to be more like 1 per 100k miles or 1 per million miles. Your thoughts?

And once again, hard (let's just say impossible) to compare, but if they continue to get 4x better per year, then 100k or 1M is still a long way off. Of course it's possible that disengagements that matter are already 100k apart. And what I'm wondering is whether it's possible to guess if they will roll something out this year, or if we're talking 2020 for some small test effort on a private campus with no real civilian participation...
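
A quick back-of-the-envelope sketch of that projection, taking the ~5,000-mile figure and the 4x/year improvement rate from the discussion above as given:

    import math

    # Back-of-the-envelope projection: if miles-per-disengagement starts
    # around 5,000 and improves 4x per year (both figures from the
    # discussion above, not from any filing), how long until 100k or 1M?
    start, growth = 5_000, 4
    for target in (100_000, 1_000_000):
        years = math.log(target / start, growth)
        print(f"{target:,} miles/disengagement: about {years:.1f} years at {growth}x per year")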

The interesting thing there, of course, is that developments in machine sensing - at least the kind that are driven by deep learning - are advancing at an astonishing rate. Google had a mature translation product that was eclipsed in a span of months by a deep learning competitor which outperforms it by margins that are ridiculous for the difficult language pairs. Arguably the kind of sensing that goes into cars is advancing even more quickly, which leads to the interesting observation that even someone who is starting out behind today might be catapulted forward by general developments. And of course, training data plays a big part in that and on that front Tesla is going to be 3 orders of magnitude ahead of everyone else by the end of this year.

Earlier, Google reported going as long as 83,000 miles between virtual accidents, though that was their longest interval at the time. They would be better now, and perhaps their median interval is that good or more.

Nobody has decided what the target is, but I think that 500K miles would surely be good enough, and probably less.

Neural networks are getting better, but there is no question that the undisputed king -- again by a very large margin -- of neural network R&D is Google. (And when you say their translation engine was replaced by a competitor, it was one Google owned.) Google is now just a cousin company to Waymo, but I presume it will have access to Google's technology.

Training data is an interesting question. Generic training data is plentiful. What's more interesting is unusual situations. NHTSA is insisting all companies make all their data from unusual situations public, so in time everybody will have access to the same data -- or at least all the data gleaned in the USA -- and the question is what they do with it.

I suspect the Google team will not be ok with an airbag deployment every 500k miles, if that's what constitutes an interesting disengagement event. Maybe they will surprise me.

Data from "unusual" situations has special value to human analysis, and to certain types of algorithms which are not yet used in production control systems. For basic NN training, unusual stuff needs to be represented in the dataset in approximate proportion to its representation in actual use - so you need thousands or millions of "usual" examples for each "unusual" one. With more sophisticated training techniques, which are under development but still immature, you can dilute that, but generally working systems will require representations of all the situations they encounter, not just the ones that a human analyst might find interesting. So you need a huge number of examples of things going right for every example of things going wrong, because that's representative of the distribution of real situations that must be managed.
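
A tiny sketch of that proportionality argument, with purely hypothetical rates:

    # Sketch of the proportionality point above: if "unusual" events must
    # appear in the training data at roughly their natural frequency, the
    # dataset size is driven by how many unusual examples you need.
    # Both figures below are hypothetical, purely for illustration.
    rare_fraction = 1 / 100_000        # one unusual frame per 100,000 ordinary ones
    rare_examples_needed = 1_000       # unusual examples you want represented

    total_examples = rare_examples_needed / rare_fraction
    print(f"Total examples needed: {total_examples:,.0f}")
    print(f"Ordinary examples carried along: {total_examples - rare_examples_needed:,.0f}")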

To date, the straightforward way to get a statistically useful cross-section of all the situations, both usual and unusual, has been to accumulate a large database of training data. There are ways to make smaller datasets go further, but all other things being equal there's still no substitute for having a lot of data, with no upper bound on 'a lot'. 10 million is better than 1 million. A billion is better than 100 million, a trillion is better than 100 billion, and so forth. Any player who has a multi-order-of-magnitude advantage is going to be tough to overcome.

Of course, if your software runs on heuristics or more conventional statistical techniques and you do lots of simulation, you can make better use of smaller datasets, but your system is probably not going to exhibit the flexibility and robustness that NN techniques bring to the table. Google started their effort long before NN techniques were dominating perception, and even longer before they were being used for planning and other higher level analytical functions. I don't doubt that you can make a system work that way, but it's not going to be as good as one using the best NN techniques. This means that to be competitive with later entrants, a lot of their pre-NN functionality would have to be displaced by NN techniques, and that presents a dilemma to teams making that choice.

You have a team that has developed a heuristic system, knows it well, and has accumulated significant program experience with it. If a radically different but potentially superior technique arrives, it's going to be hard to get people up to speed and then make a fair evaluation of the alternate technique. You might have to change the team quite a bit. Teams are generally reluctant to start over with a new skillset that nobody yet understands well. And even when you commit to the new skills, deciding where to draw the lines and how to proceed generally does not go smoothly.

Generally those changes won't be made until well after the new technique has been demonstrated to be superior in some parallel effort or competing program, and then retrofitting it into the main branch of the project is going to present integration issues. Preserving the stuff that still has value while abandoning the stuff that has been superseded is a hard balance, and you are almost always going to make the wrong decision in the face of a competing technology that keeps growing and outperforming the systems you have already made big investments in. This is a central reason why companies rarely - essentially never - continue to dominate their industry after a paradigm change.

Of course it can be done, and the Google guys have excellent management in this area. If anyone can do it it's probably them, but it's by no means a sure thing.

There's one other area where lots of data is almost indispensable and that's in making the statistical case that a particular system is safe. It's possible to extrapolate a small data set into conclusions about a larger dataset if you have a good model of a system but I doubt that anyone is going to suggest that they have a good model of everything that can happen to a car in the real world. Musk has suggested that a billion miles is the kind of data you need to have in order to make a solid statistical argument that a self driving system is better than a human in the real world. That gives you maybe 1000 avoided airbag deployments and the opportunity to avoid around 10 fatalities so in that sense it seems like a good number. Anything much less than that and you can't be confident that it is better than a person over the full range of common driving circumstances.

Tesla should have 2 billion miles on their HW2 platform by the end of 2017 and another 3.7 billion on their HW1 platform so it's conceivable that they could have the miles to make a good case if they have a system that works. But I'm not expecting anyone else to have even 1% of that. So it raises the interesting question of what kind of hand waving might be involved in getting regulators to agree to deploy a system that is, in the strictest sense, unproven. And is any conservative entity, in particular Google, going to be willing to do that? Currently I'm expecting them to wait until 2020 but I'm excited to think they will come up with a clever way to get into the field before events overtake them.
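
To make the arithmetic behind those figures concrete, here's a minimal sketch. The baselines are assumptions consistent with the comment above (roughly one fatal crash per 100 million miles for human drivers, and one airbag deployment per million miles), not official statistics:

    # Sketch of the statistical argument in the comment above. Baselines
    # are assumptions: roughly one fatal crash per 100 million miles for
    # human drivers, and (purely illustratively) one airbag deployment
    # per million miles.
    human_fatal_rate = 1 / 100_000_000   # per mile (assumed)
    airbag_rate = 1 / 1_000_000          # per mile (illustrative)
    miles = 1_000_000_000                # the "billion miles" figure

    print(f"Expected fatalities at human rates over {miles:,} miles: {miles * human_fatal_rate:.0f}")
    print(f"Expected airbag deployments over {miles:,} miles: {miles * airbag_rate:,.0f}")

    # If the fleet logs those miles with zero fatalities, the "rule of
    # three" gives an approximate 95% upper bound on its true rate.
    upper_bound_rate = 3 / miles
    print(f"95% upper bound with zero fatalities: one per {1 / upper_bound_rate:,.0f} miles")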

Correct, the unusual situation data is most valuable for making a test suite, but it does also offer an ability to create training sets for "what to do in this sort of situation" which is useful. It's one thing to get a neural network to figure out what lanes look like and how to stay in one. Quite another to train one to do the right thing in a dangerous situation.

But getting large piles of ordinary miles is not hard for anybody who operates a lot of cars, or whose customers operate a lot of cars. Car companies, Uber, Lyft and many other fleet operators can gather it if they can get sensors on the cars.

To show safety, you want to show it both over a large dataset and in lots of unusual and dangerous situations. Yet more proof that the car stays in its lane in ordinary traffic becomes boring after a while. The more real the situations, the better, though artificially generated situations still have value. Indeed, for unsupervised learning, you would put the car through a wide variety of dangerous situations until it figures out what gets through and what does not. But again, the closer to real, the better.

I am curious: what does everyone think the cost of the technology in the Google car is, with 90% to 100% assurance? If I were to look at disengagements in relation to dollars spent, I would be inclined to think a whole different conversation occurs. In that calculation, hardware is not necessarily more important than software or technique. Does anyone know how the Google car performs on the Finnish or Swedish winter test tracks? Does anyone know what the actual processing platform embedded in the Google car is, and what it costs? Has Google's technology been tested with braking and steering actuators from more than one source? What is the cost to repair and re-calibrate the Google system in the event of a collision? How much more expensive is Google's physical platform to incorporate into a standard automobile form factor, versus its current roof design? How to design this technology for scale is still a question I do not know whether the industry agrees on. I have heard that an automotive-grade LIDAR sensor below $250 will not be in production platforms until 2025. Today is February 2017. Has anyone tested LIDAR performance in an arena dense with LIDAR-equipped vehicles, where many LIDAR signals permeate the same space? Has anyone noted the latest DARPA research and assessment of LIDAR in dense urban canyons? Why does a parameter such as disengagements not necessarily overwhelm my inquisitive nature?

Almost nobody is treating cost as relevant. Those who are -- namely Tesla -- are making an error. This is computer and electronics hardware. Made in high volumes it almost always becomes very inexpensive. There is nothing that says this would not be true of lidars and radars and other such sensors. Even LWIR lidars, which can't use silicon, would be cheap if made in quantity.

However, today, it is not important that it be cheap. It is important it be safe, and whoever is the first to get to that level of safety takes the lead, almost regardless of cost.

Tesla cares about cost because they want to sell a car with autopilot today. That means they have to use parts available in large quantities at low cost today. As there are no lidars that meet that criterion, they don't use LIDAR. Others do not plan to ship for a few years, and so don't worry about the cost of their research sensors.

I'd like to address the assertion that "there is nothing that says this would not be true of lidars".

Comparing optical data and sensing equipment to silicon in terms of its ability to radically decline in cost over time is a mistaken association. The cost learning curves of technologies vary enormously, and very few processes have the learning curve that silicon data processing demonstrates. There's overwhelming evidence that optics technologies do not demonstrate silicon-like learning curves. Note that I am not referring to lithographically produced linear response devices like camera sensors, but to optical lenses, waveguides, mounting systems, and especially nonlinear optical sources like lasers and other NLO beam manipulation devices.

Camera lenses, for example, have been around as consumer offerings since long before computers entered that realm, but they've never seen a million-fold improvement in cost performance over a span of a few decades despite enormous investments in optical manufacturing, dramatic growth in manufacturing volumes, and extraordinarily accurate scientific models WRT their underlying physical properties.

For a direct comparison, you might want to take note of how the price performance improvement curve in networking interconnects degraded after the transition from copper based interconnects over to fiber optic interconnects. In the earlier period performance was driven by improvements in silicon and enjoyed several orders of magnitude of improvement in a couple of decades, but as the channel bandwidth properties of copper became problematic, a transition to optical media necessitated the adoption of electro-optical transducers and optical interconnection systems, at which point improvement in link cost performance slowed to a relative crawl. This despite continued high rates of investment. Even today, decades after we hit the wall on copper bandwidth for wired ethernet interconnections, and despite enormous increases in manufacturing volume, the optical successors have not managed to make it out of the enterprise datacenter due to their inability to scale costs.

Nonlinear optical devices, as much as they have improved over time, continue to be hard to manufacture, with maddeningly high variation and associated low yields except for the most undemanding applications. Efficient coupling of microscopic electro-optical devices to the real world via lenses and waveguides continues to present manufacturing tolerance issues that have yet to be addressed by any highly scalable manufacturing process. That could change any day, but a lot of resources have been poured into this problem for at least the last 20 years and there is as yet no good candidate. At this point there is no expectation that high bandwidth optical links will ever become a consumer technology.

And so, while there is good reason to believe the price performance of data processing systems associated with self-driving vehicle technologies can be expected to continue its dramatic improvement in the coming years, I find any similar expectation for lidar technologies to be misplaced. Of course they will improve, but improvements in processing are very likely to be dramatically greater. In my 16 years in the business of designing optical systems I watched numerous efforts, both at startups and at industrial giants, march in with the conviction that they were going to bring radical improvements to the field. I mean, how hard can it be, right? We have this great demo, and this lovely plan; we just have to get it into the factory. Give us a year, maybe two, just to work out the details. Except it never worked out that way, and almost all of those efforts are gone, mostly with nothing to show for their labors aside from gaping holes in checkbooks.

I believe your suggestion that high duty cycle fleet vehicles are less cost sensitive is likely correct. Fleet operators will not care about an extra few thousand dollars for a sensor if it provides significant benefits in terms of safety or utility. But is lidar ever going to provide the utility to offset the cost? It's possible, but it's far from certain. Lidar will keep getting better, but then so will radar, and multispectral imaging, and thermal imaging, and commercial versions of those systems are much less expensive than lidar is today. As the computational ability to process sensor feeds grows by leaps and bounds, the utility window for difficult and expensive sensors will shrink. In the end there will be a multitude of very cheap sensors backed by enormous processing power. In those fully mature systems lidar will only be present if you can make it for a price comparable to a camera, which is unlikely to happen. I say that because the camera based systems will be so good that lidar's incremental safety advantage won't matter.

So will lidar be useful for a window of time before vehicle vision systems become adequate? That rests on the conjecture that a system employing lidar would be able to achieve a given desirable level of safety at a lower operating cost than a comparably safe non-lidar system. There was a time, prior to 2012, when the world had no expectation that camera based systems would be able to do the job in the foreseeable future, so using expensive and difficult sensors was the only option. But now the expectation has shifted, at least in the opinions of people who have not already made a large investment in those expensive and difficult options, to expecting cameras to be able to do the job in the foreseeable future. And unless that expectation turns out to be wrong, there's not going to be a big role for lidar. Tesla is moving forward on this updated expectation. Other developers are largely still in the belt-and-suspenders world of trying all the sensors in parallel to see what works, and that's appropriate for them, because aside from Tesla, nobody is fabricating hardware for a commercial system in volume. All the others are just development platforms.

It sounds like you believe Tesla is wrong, and so you must expect that Tesla will fail. Is that correct?

Yes, nothing matches what's happened (and keeps happening, though more slowly now) in silicon.

Still, most of the cost of a $75,000 Velodyne unit just comes from the fact they are hand-made in small quantities. Volume manufacturing and DFM and the dropping cost of the silicon portions (the sensors and the processing) are enough to make the cost reasonable. Making the cost dirt cheap (like Quanergy's $100 goal) involves innovation, but it's being done mostly on silicon.

The contrast between cameras and lidar involves two optical devices. More optical stuff in the lidar, but the camera still is an optical device, with a lens. Cheap because cameras are made literally by the billions.

1.5 micron lidar does not use silicon for the sensor, and that keeps it expensive, but even germanium gets cheap if you are ordering many millions. Again, just from volume mfg, not from any Moore's law. Moore's law is mostly helping the other electronics.

Is Tesla wrong? What I say is that Tesla is making a bet that seems wrong. It might pay off. It might be that computer vision gets good enough much faster than currently predicted. But it is unlikely that it will beat computer vision plus lidar. The real truth is that if CV gets that good, the right answer will still be CV+Lidar, but it will be a cheaper, less capable lidar which is there to supplement and give superhuman sensing abilities to the whole system.

In fact, it would be fair to say that the debate today is "Lots of CV with a cheap LIDAR and radar" vs "Higher end LIDAR with decent amounts of cheap CV and radar." Tesla's bet for now is "All CV and radar." And there is debate about how much more radar can do.

(Should also throw in maps in the equation, and there are those who also want to throw in V2x.)

The answer, I feel, continues to be, "Use everything that gets you to safety faster, with cost a lesser factor." The superhuman abilities of lidar and radar are silly to throw away just for cost.

1) Does anyone know how the Google car performs on the Finnish or Swedish winter test tracks?
2) Does anyone know what the actual processing platform that is embedded into the Google car is?
3) Has Google's technology been tested with various braking and steering actuators from more than one source?
4) What is the cost to repair and re-calibrate the Google system in the event of a collision?
5) How much more expensive is Google's physical platform to incorporate into a standard automobile form factor, versus its current roof design?
6) Has anyone tested LIDAR performance in a heavily saturated and dense LIDAR vehicle arena where LIDAR signals are permeating the arena in density?
7) Has anyone noted the latest DARPA research and assessment for LIDAR in dense urban canyons?

In case you did not notice, cost was thrown up as an easy hit for those ignoring the dirty details. The above questions are what an industry executive confronts.

  1. Probably not, I doubt the cars have been to Finland or Sweden.
  2. Yes
  3. They have built on four vehicle platforms: a Prius, a Lexus, a custom design and a Chrysler minivan, so certainly they have tested with the actuators in all of those.
  4. That would vary greatly based on the type of accident. Calibration tends to be a software thing and not to cost much money.
  5. Why on earth would you want to? The early adopters want their cars to stand out and be different.
  6. Yes. LIDAR interference is very minimal.
  7. I have not seen it; point us to it. I don't anticipate a problem there -- what would be the source of one?

Brad, interested in this comment "If there would have been a “contact” (accident) then Google considers that a real incident, and those are more rare than is reported here."

Reading the Google DMV report from 2015, under the section explaining driver initiated disengagements related to safe operation of the vehicle:
"Through this process [simulation] we can determine the events that have safety significance and should receive prompt and thorough attention from our engineers in resolving them. In the reporting period, there were 69 events across our fleet in which safe operation of the vehicle required disengagement by the driver."

I interpreted this to mean that Google had tested each event where the driver chose to disengage and then reported only the events which were deemed safety critical after that analysis. Do you think that's incorrect?

There is no information that all 69 events would have led to "contact," but it is clearer that they all had some safety implication.

Understand that Google and other companies train drivers to grab the wheel if there is any doubt, which means they should be grabbing the wheel many times when there was no actual problem, just a correctly paranoid safety driver. You would not want to judge the car's safety record based on how skittish the safety drivers are.

I am just guessing, but I would imagine that it would still be a safety issue if the car was late in recognizing an obstacle, but not so late as to hit it, for example.
