NTSB holds hearing on Uber Fatality -- pedestrian out of crosswalk issue didn't play a role.

Robert Sumwalt, NTSB chairman

NTSB is holding its live hearings on the Uber robocar fatality. I have detailed live coverage (found in comment #1) which I am updating. There are lots of new details, including the fact that what was previously reported -- that Uber's car could not identify pedestrians outside of crosswalks -- turns out to be wrong and played no role in the accident. We also learn that the victim was highly intoxicated, and more.

Read my live coverage and updates in my Forbes article at NTSB hearing blames humans, software and policy

Comments

Firstly, a significant portion of the article is dedicated to an explanation of the shoddy reporting on Uber’s ‘failure to do jaywalker detection.’ Even the author admits to shoddy reporting, despite the key facts having been available for a year. This outcome and admission underscore how ‘former experts’ like the author are trying to continue to make money and build personal legacy off this lucrative (yet so far largely fruitless) field by positioning themselves as semantic translators for systems that the engineers themselves have difficulty explaining. The public needs the facts and independent review, if not for truth then for the preservation of public safety.

Facts available since 2018 (BI article):
* Lidar saw her 6 seconds before impact and recognized her as potentially motile.
* Uber ATG struggled with false detections such as trees, and braking for these phantoms made the ride too jerky for the Dara/Softbank demo.
* Email evidence shows Meyhofer et al crippled the car’s braking ability, despite internal concerns about safety.
* The Dara/Softbank demo would mean over $10m to Meyhofer and big payouts to engineers.

The crash happened because the software was crap and Uber harbored a culture of greed over safety. Yes, the safety driver failed too, but putting the driver in that situation was as negligent as Uber executives allowing into their network criminals with not just priors but on-the-job assaults and crimes.

“all tested vehicles, while generally better than Uber’s, have flaws which would lead to a crash with a negligent safety driver, and to blame those flaws would be to blame the idea of testing this way at all”

Yes!! It is *unquestionably unsafe and negligent* to drive at 45 mph with a forecasting and planning system that can’t predict pedestrians at those speeds!! This testing was done brazenly, without regard for the public, and the NTSB has failed us.

The author here fails us even more, effectively trying to protect brazenly unsafe testing that resulted in a fatality.

https://www.businessinsider.com/sources-describe-questionable-decisions-and-dysfunction-inside-ubers-self-driving-unit-before-one-of-its-cars-killed-a-pedestrian-2018-10

https://eandt.theiet.org/content/articles/2019/09/uber-allegedly-discourages-staff-from-disclosing-crime-reports-to-police/

The three causes identified by the NTSB were pretty obvious very early on. I'd like to see more focus on the abysmal design flaws in both the product and the testing process (Uber shouldn't have been doing that type of testing on public roadways at all in its abysmal state). But the NTSB isn't the organization to do that. Let the private organizations work on best practices and let the states figure out what to do with companies like Uber. Arizona has thrown them out (with regard to Uber ATG), right? Other states should ban them as well. I'm fine with self-regulation. But self-regulation comes along with consequences when your failure to self-regulate kills someone.

NTSB has a pattern of looking into everything. They have to because in their world, saying, "We didn't really look into all those things because they were not what made this crash interesting" is not an answer. What if they're wrong?

NTSB did rule that Uber has fixed the things they identified, which will make it much harder for other states to throw them out.

In the end, two things will come from this tragedy, and yes, they were known before the investigation. Namely, have gaze monitoring on your safety drivers, and get more safety-oriented in designing your testing procedures in general.

What I don't know is how many teams have done that. Right after the crash -- literally a week after -- some academics released a gaze monitoring tool open source. It may take a little time to toss in a camera but one hopes that most have done this by now.

I don't think it has to be that fancy. Its very presence will stop things like fiddling with a phone, because you know you will get caught. But it will also do things like saying, "You seem to be getting fatigued, time to take a break."
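(A minimal sketch of what such a not-fancy monitor could look like, assuming a gaze tracker and an alert chime exist as separate components; the API names and thresholds here are made up for illustration, not taken from any real system.)

```python
# Hypothetical sketch of a simple driver-monitoring policy, assuming an external
# gaze tracker that reports whether the driver's eyes are on the road.
import time

OFF_ROAD_ALERT_SECONDS = 2.0   # assumed threshold before an "eyes on road" alert
FATIGUE_BREAK_MINUTES = 60     # assumed shift length before a break reminder

def monitor_driver(gaze_tracker, alerter):
    shift_start = time.monotonic()
    off_road_since = None
    while True:
        eyes_on_road = gaze_tracker.eyes_on_road()   # assumed boolean API
        now = time.monotonic()
        if eyes_on_road:
            off_road_since = None
        else:
            off_road_since = off_road_since or now
            if now - off_road_since > OFF_ROAD_ALERT_SECONDS:
                alerter.chime("Eyes on the road, please.")
        if (now - shift_start) / 60 > FATIGUE_BREAK_MINUTES:
            alerter.chime("You seem to be getting fatigued, time to take a break.")
            shift_start = now                        # reset after the reminder
        time.sleep(0.1)
```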

As I wrote, I think Tesla's numbers suggest a solo safety driver can be OK with this (Tesla doesn't have this but has wheel torque warnings) and while two is better, it's not vital.

Uber's software was crappy for something that had received so much work. But everybody's software was that crappy once.

I'm not sure why other states can't throw them out. If you kill someone by driving recklessly in one state, can't all the other states revoke your license there too?

Uber's actions (not just the safety driver's actions, but the decision to put the car on the road in the first place) were, in my opinion, reckless. Perhaps all the other companies were just as reckless at one time. I doubt it, but maybe. The other companies didn't kill someone due to their recklessness, though. Maybe they were lucky, but that's the way driving regulations tend to work. It's easy to get a license, and if you get lucky, you can drive recklessly for quite some time and get away with it. But if your reckless driving kills someone, you're probably going to lose your license. And when you lose your license in one state, you're probably going to lose it in all the others too.

If the only two things that come from this tragedy are to have better safety drivers and monitor them better, it is a travesty. The design of the software was fundamentally flawed. Furthermore, it was so bad at recognition and prediction that it must have been known ahead of time that this sort of incident was inevitable.

I know, you think safety driver testing is the only way to improve recognition and prediction. I completely disagree. Shadow testing is just as effective at this. It's only once you think you have a finished product with regard to recognition and prediction that you need to do testing with safety drivers. Uber was obviously not at that level yet. Perhaps no one is, though Waymo seems to be there in limited driving situations, like where they're testing with no humans in the car.

You say Uber has fixed the problem. Apparently Uber doesn't even agree with that, as they're still not doing the kinds of extensive testing they were doing before. (So in that sense I guess they did fix the problem. They're not testing that way any more. Are they doing shadow driving? I don't know.)

I don't say it's the only way to improve recognition and prediction. I do think it's the most effective way, but the bigger reason is to encounter more and more unusual situations and test how the car handles them in the real world, not in shadow or simulation.

I don't know Uber has fixed the problem, but they convinced the NTSB they fixed or are fixing the problems NTSB pointed out.

But the NTSB did not say, and I agree, that having a flawed perception system is a blocking factor here. I don't understand the suggestion that you can only road test once you no longer need to road test.

I don't understand the suggestion that you can only road test once you no longer need to road test either.

Testing is what you do when you think you've finished developing, at least in some definable circumstances, though. It's not testing if you know you're going to fail. It's not testing if the only variable is whether or not someone is going to cross the road in front of you.

Furthermore, shadow testing is testing. You can tell just as much about how good you are at recognition and prediction by shadow testing. And if you were going to have two humans in the car while safety driver testing, instead you can have the second human looking at the "tape" and scoring the car on how good its recognition and prediction is.

Beyond that, what do you gain from safety driver testing? If the car does something dangerous, then you get the knowledge that it would do something dangerous. But presumably you aren't trying to let the car do something dangerous. Or are you? It doesn't make sense.

I should change my criterion though. The car should be in charge when you think it's safer for the car to be in charge. Uber wasn't there yet. Maybe it is now. Multiple companies are there in limited situations. Even then it's a cost to society if the driving wouldn't be done anyway. That's likely how Google got a lot of its pre-safety-driver data: recording data while it was on the roads with a human driver doing mapping anyway. Uber skipped that step, and it was deadly. Odd that they wouldn't utilize the obvious source for data collection: their vast network of rideshare drivers. They'd have to provide modifications or even provide whole cars to select Uber drivers, but it'd be more cost effective than hiring safety drivers.

One of the implicit corollaries to the above paragraph is that you build features for testing that you don't plan to use in production. Driver gaze tracking might be one. Automatic emergency braking (with a simple system separate from the autonomous one) might be one. Notifications to the driver to take over whenever safety-critical things are unclear (like when a detected car changes into a bike and then into an unknown object with only seconds before impact) should absolutely be one.
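(A rough illustration of that last feature, with made-up names and thresholds; nothing here is claimed to be Uber's actual logic.)

```python
# Hypothetical test-fleet feature: alert the safety driver when a tracked
# object's classification keeps flipping and time to impact is short.
def should_alert_driver(track, time_to_impact_s,
                        max_label_changes=2, alert_horizon_s=5.0):
    """track.label_history is the sequence of class labels assigned so far,
    e.g. ['vehicle', 'bicycle', 'unknown']."""
    changes = sum(1 for a, b in zip(track.label_history, track.label_history[1:])
                  if a != b)
    return changes >= max_label_changes and time_to_impact_s <= alert_horizon_s
```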

The first two of those are already things Uber has discovered, which is good, and the NTSB did recognize it. I didn't see any mention of the latter.

To build a level 4 car, you should first build a good level 2 car. I know, you'll fight against that, but it's true.

(It's also clear, I think, that Uber was using lidar as a crutch. Their computer vision sucked, and their sucky computer vision was most likely the primary technical cause of the crash. Lidar hid that problem. With that said, they could have used lidar much more effectively, as a safety backup system rather than as the primary recognition and prediction system. They could have alerted the driver to take over whenever the lidar detected something that the cameras couldn't categorize within x seconds before potential impact, for instance.)

You can test perception with shadow testing, yes, though it makes the problem harder, because it is much harder to tell when it made an error. You can compare what the system decided to what the test driver did and see very gross things, but not subtle things which can just be the differences in driving style between the test driver and the system. So you make it harder and more expensive and slower -- and it's already, as can be told from the many billions spent, pretty hard and expensive and slow.

And you don't get to test the full system response. If the vehicle evades due to perception errors (or correct perception), then we get to test how it perceives the new scene and the new situation, which was not encountered in the shadow testing. You can do this in sim, but it's not the same.

BTW, independent AEB is something I know was considered early on -- I proposed it -- and it turns out that the main reason Uber did not do it was that their radar and Volvo's were on the same band.

I do think Uber could use their rideshare fleet; I said that years ago.

As for building level 2 first, I agree you end up doing that, but not deliberately. What I have said is that the technologies you get from having level 2 as an end goal are a distraction from a real robocar. In building a real robocar, you will get a level 2 car out of it, but not built the way anybody builds a level 2 car.

The role of the vision vs. the lidar is not yet revealed, though I have some hints. The problem was in their object persistence, triggered by poor quality perception. Had they had good object persistence, the perception errors would not have been enough to crash on their own.
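(To illustrate what good object persistence means here, a toy sketch: the track keeps its position history, and hence its velocity estimate, across relabeling, rather than starting over each time the classification changes. This is an assumption-laden illustration, not the actual Uber ATG design.)

```python
# Toy track that preserves history across reclassification, so a velocity
# estimate survives even when the class label flips.
class Track:
    def __init__(self, track_id):
        self.track_id = track_id
        self.positions = []          # [(t, x, y), ...] kept across label changes
        self.label = None

    def update(self, t, x, y, label):
        self.positions.append((t, x, y))
        self.label = label           # relabel, but do NOT reset the history

    def velocity(self):
        if len(self.positions) < 2:
            return None              # a brand-new track has no motion estimate
        (t0, x0, y0), (t1, x1, y1) = self.positions[-2], self.positions[-1]
        dt = t1 - t0
        return ((x1 - x0) / dt, (y1 - y0) / dt) if dt > 0 else None
```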

Surely these companies are not detecting errors in perception solely by whether or not the vehicle does something so dangerous that a safety driver is forced to take over. That simplistic approach can work for games, but it's too dangerous and too time consuming for building a self-driving vehicle.

There are numerous ways to detect errors in perception, both with human review and automatically. Both human and automatic review can be live or on logs. As one example, when the classification of an object is constantly changing, that's a perception error. As another, when you make a drastic prediction error, often you can roll back the logs and see that a perception error caused it.
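(For example, a rough offline sketch of the first idea: scan the logs for tracks whose classification keeps flipping and flag them for review. The log format and thresholds are assumptions for illustration.)

```python
# Offline scan of perception logs for "flapping" classifications, a cheap
# heuristic for finding candidate perception errors to review.
from collections import defaultdict

def find_flapping_tracks(log_records, min_changes=3):
    """log_records: iterable of dicts like
    {'t': 12.3, 'track_id': 17, 'label': 'bicycle'} in time order."""
    labels_by_track = defaultdict(list)
    for rec in log_records:
        labels_by_track[rec['track_id']].append(rec['label'])
    flagged = {}
    for track_id, labels in labels_by_track.items():
        changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
        if changes >= min_changes:
            flagged[track_id] = changes
    return flagged   # {track_id: number_of_label_changes} for human review
```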

Yes, you can also compare what the test driver did to what the car would have done. A simplistic comparison wouldn't be very useful, but over time you can develop a sophisticated system that understands the expected deviations vs. the deviations that are caused by perception errors.

You can do all of these things at once by building a neural network that detects errors in perception at one point in the logs based on what happens later in the logs (both the sensor logs and the logs of the driver inputs).
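(A toy sketch of that hindsight-labeling idea, assuming you already have code that extracts features from the live perception output at time t and derives a correctness label from what the later logs show; PyTorch is used here only for illustration.)

```python
# Tiny classifier trained on hindsight labels: features from the live output
# at time t, labels from replaying what the logs show happened afterwards.
import torch
import torch.nn as nn

class PerceptionErrorDetector(nn.Module):
    def __init__(self, n_features=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 1))   # logit: probability this frame holds a perception error

    def forward(self, x):
        return self.net(x)

def train_step(model, optimizer, features, hindsight_labels):
    """features: (batch, n_features) from live perception at time t.
    hindsight_labels: (batch, 1), 1.0 where later log data (smoothed
    trajectories, driver takeovers, etc.) shows the live output was wrong."""
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(
        model(features), hindsight_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```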

While building a level 4 car, you also build a lot of other things that don't go into the level 4 car itself. One of them, probably, is a level 2 car.

Uber should use their rideshare fleet. And they should ditch lidar in order to use much much more of that fleet. (They can always add lidar back into the final product.) And they should ditch safety driver testing until they have a much better prototype. Which, fortunately, they already seem to have mostly done, although I fear this awful NTSB report might embolden them to prematurely ramp it up again.

You detect errors many ways. You will start off the road, as you say, until you get decent quality. Even so, in the early years of a project, it seems that going out and driving produces pretty frequent disengagements -- you have seen the numbers in the California reports -- frequent enough to keep the software teams busy just fixing the tickets caused by the disengagements that come from software and hardware issues and not safety driver caution.

So yes, during a time, you are finding problems at such a rate that it can be your prime source of trouble tickets. Today, there are better techniques, including simulation, to also generate more trouble tickets more easily, but there is a limit to them.

I am not sure too many would agree with you that "you can always add LIDAR back into the final product." If you do this test driving without LIDAR, you are going to get lots of events where there was a perception error but it's clear that error would not have happened with a LIDAR. You are going to spend a great deal of time trying to identify those (and of course a huge amount of time if you fix them) when you don't need to if you're going to have a LIDAR. LIDAR doesn't make mistakes on the distance of targets. It doesn't make errors due to varying lighting or it being night. It doesn't "not see" an object just because it doesn't match something in the neural networks. Its errors are around not classifying well enough because of its low resolution, or missing very small things that only return a few points. They are quite different.

I think it's pretty clear that you can always add lidar later. If nothing else, you can add it as a backup system, as a form of AEB.

Your argument seems to be that this is wasteful, because without lidar you will have to fix flaws in the vision system that you otherwise wouldn't have had to fix.

I actually agree with you that avoiding lidar during development has a cost (in that more miles of training data are needed if you don't have lidar). On the flip side, without lidar you can get that training data much more cheaply. It's still unclear how the two balance out, but as time goes on without any lidar-using team releasing a production quality level 4 robocar that can drive in most places where people drive (and unless they're being very quiet about it, not even being anywhere close), I think it becomes more clear that the cost of lidar is higher than the cost of no-lidar (in terms of building a level 4 robocar that can drive in most places where people drive). I think almost all of the teams working on building a self-driving car are grossly underestimating, by orders of magnitude, the number of miles of training they're going to need.

Yes, you detect errors in many ways. I don't think Uber's bottleneck was in the error detection area. I think their bottleneck was (and is) in the error fixing area.

Simulation is fairly useless unless you have a good simulator. I doubt Uber had (or has) a very good simulator. If they did, they'd have a good robocar. As I'm sure you know, it's relatively easy to train a car to perform extremely well in a simulator.

You need lots of miles of data to build a good simulator. Perhaps it's easier to see that these miles can be ones where a human does the driving?

You act like fixing "flaws" you don't have to is a good thing. It's a misdirection of energy and resources.

For example, radar tells you how fast things are moving. Thus once you fuse a target, you don't need to have a vision system that can tell how fast things are moving. That your vision system makes mistakes in estimating speed is no longer a "flaw" because you are not intending for it to do that.

With LIDAR, you have fully reliable information on how far away a target is. Because you can merge the pixels of LIDAR and camera, now you can use the camera for the thing its higher resolution is good at, things like classifying targets, figuring out the body language or facial expressions of pedestrians. That you may not have reliable distance figures is no longer a flaw, and thus the wrong place to put energy.
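(A simplified sketch of that kind of merging: project lidar points into the camera image so each camera detection can borrow a reliable lidar range, while the camera handles classification. Calibration is assumed known; as the reply below notes, real pipelines also have to cope with differing resolutions and frame timing.)

```python
# Project lidar points into a camera image and attach a lidar range to a
# camera detection. Purely illustrative; calibration matrices assumed given.
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """points_lidar: (N, 3) xyz in the lidar frame.
    T_cam_from_lidar: (4, 4) extrinsic transform; K: (3, 3) camera intrinsics."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]         # keep points in front of the camera
    pix = (K @ pts_cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]               # (u, v) pixel coordinates
    return pix, pts_cam[:, 2]                    # pixel coords and depth per point

def range_for_detection(bbox, pix, depths):
    """bbox: (u_min, v_min, u_max, v_max) from a camera detector; returns the
    median lidar depth of points falling inside the box, or None."""
    u_min, v_min, u_max, v_max = bbox
    inside = ((pix[:, 0] >= u_min) & (pix[:, 0] <= u_max) &
              (pix[:, 1] >= v_min) & (pix[:, 1] <= v_max))
    return float(np.median(depths[inside])) if inside.any() else None
```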

Fixing flaws you don't "have to" might be a good thing. But I admitted it can be a waste. The question is whether it's more of a waste than lidar. I doubt it.

(Incidentally, I don't think you can merge data from lidar with pixels, at a different resolution and different number of frames per second, from cameras, as perfectly as you imply. That process in itself is a lot of work and introduces additional errors. Merging data from cameras with data from lidar loses information, and potentially loses critical information.)

“While generally two safety drivers are better than one…”

I doubt this, because at times each driver might assume the other is acting, and they are highly likely to collaborate in defeating any safety mechanisms that are enforced.

But nobody should take my, OR YOUR assumptions without testing them. That's not how to do Safety!

The primary safety driver is behind the wheel and is to watch the road as much as any full time driver. The second driver tends to monitor the software and do anything that needs to be done without eyes on the road. However, they don't do that 100% of the time, so they are often looking at the road. Having no access to the pedals, their role there would be to shout "watch out." There is thus little chance one might assume the other is acting.

The other role is to enforce discipline. You won't watch a video if you have somebody in the car with you. Or if, somehow, you did, it would be much safer. For example, I could see the main driver saying, "I am going to take my coat off, watch the road," and that would still be pretty safe, though not ideal.

Contrary to Mr. Vroom, I would judge Brad Templeton's two recent Forbes reports to be probably the best reading on the NTSB investigation that I have seen.

About the jaywalking classification, there were some wordings in the report that even left board member Jennifer Homendy confused about what the Uber ATG ADS could and could not detect, or what it would presume after assigning a label to the object. You can watch her interviewing staff member Ensar Becic about this issue at 1:31:58. She returns to the subject once more at 1:46:50, still unsatisfied on what to make of the explanations. Brad Templeton's first report was clear, I think, that the reclassifications followed by history resets were the bigger issue.

The meeting: http://ntsb.windrosemedia.com/11052019/

About jaywalking, where I live, as a driver you are not expected to yield to them. Has Uber ATG now reprogrammed its ADS to yield to jaywalkers? If yes, I think that could backfire.

One detail, Brad T wrote: ”It was also reported that pedestrian crossings at this location are rare (though were more common that night due to a concert) and there is no recollection of another pedestrian incident in this location.”

The concert was not on the night of the accident in March, but in June, when the NTSB was conducting a study that found 66 pedestrians and 12 bikes passing during 24 hours. This is talked about at 1:27:34 in the meeting.

The lack of a discussion around the cameras is really the big remaining mystery!

Will fix the point about the concert, but the reason I mis-heard is that there was also a party down by the river on the night of the incident, where she got high, and she was leaving it, so there might have been extra pedestrian traffic that night. Nobody really recorded that.

I don't think it's right that there was not much pedestrian traffic there. Governments don't put up a "no pedestrians" sign in a random spot where people rarely cross anyway.

"Whenever there's a concert or a party by the river" is not rare. In fact it's likely frequent.

I think "you are not expected to yield to jaywalkers" was the defense used by the murderer in the Charlottesville car attack.

You are not required to yield to jaywalkers, but you have a duty to avoid them if it is practical to do so. The victim crossed in violation of the law. The Uber hit her in violation of the law, but in the latter case, police decided not to prosecute, with the judgment that "she came out of nowhere" -- even though later it became clear that this was not the case. Of course the victim was not ticketed for her violation of the law.

You are expected to yield to jaywalkers who are in the middle of a roadway and trying to cross.

Police don't prosecute.

Police often do not issue a citation when there is a fatal crash, because issuing a citation can interfere with a successful prosecution (for a more serious crime such as manslaughter) by the office that does prosecutions.

The police issued a report recommending that Vasquez be charged with manslaughter.

Been a while since I looked at the law, but I don't believe it is a requirement to yield, but a perhaps subtly different requirement to always take steps to avoid an accident if you can do so. This is not just true of jaywalkers but of cars in your ROW. If somebody cuts in front of you, you are not allowed to say, "ha, ha, I have ROW I am going to hit him." That does not mean you have the duty to yield in the sense of not having ROW.

Yes, I mean the state's attorney, which was handed this case, decided not to prosecute.

I'm not talking about a statute. You are expected to yield. Period.

It's misleading to say the "state's attorney" decided not to prosecute. The Maricopa County attorney recused itself because it was biased in favor of Uber. It then went to the Yavapai County attorney, who declined to prosecute Uber but referred Vasquez back to Maricopa County.

Maricopa County hasn't acted on that referral, and I believe the statute of limitations might have run out. So Vasquez got lucky. If it wasn't for the confusion over what to do when a self-driving car kills someone, and possibly if it wasn't for the Maricopa County attorney not wanting to drag Uber's name further through the mud than necessary, she would have surely been prosecuted or forced to make a plea deal.

Her best defense against manslaughter would have been that Uber had not trained her properly, and she thought the car could just drive itself. I'm not sure if it would have succeeded.

By the way, the NTSB found that Vasquez was the primary at-fault party. Then Uber. Then Herzberg. I'd have swapped Uber and Vasquez. Uber was the primary at-fault party, in my opinion. Then Vasquez, then Herzberg. Then Arizona for really poor design of that intersection (non-intersection?). It looks like an intersection, but then there's a sign saying not to cross there.

(You know what common law is, right? The law is not just statutes. That's one reason why it's an extremely difficult AI problem in itself just to describe the law in computer code.)

Sorry, yes, it was the Yavapai County prosecutors. I recalled they had taken the case out of Maricopa and for some reason thought it had gone to the state. In any event, my main point remains: there was no prosecution, and it seems unlikely at this point. I read the section of the Arizona vehicle code on this back at the time; it's a different section than the one about right-of-way.

Yes, there was no prosecution, and it seems unlikely at this point.

The vehicle code is completely irrelevant to all of this. Arizona doesn't have a statute in the vehicle code for vehicular homicide. A driver who kills someone by driving recklessly in Arizona is charged with regular old manslaughter.
