How does a robocar see a pedestrian and how might Uber have gone wrong?

How does a robocar see and avoid hitting a pedestrian? There are a lot of different ways. Some are very common, some are used only by certain teams. To understand what the Uber car was supposed to do, it can help to look at them. I write this without specific knowledge of what techniques Uber uses.

In particular, I want to examine what could go wrong at any of these points, and what is not likely to go wrong.

The usual pipeline looks something like this:

  1. Localization (no indication of failure)
  2. Sensing (With LIDAR, radar, cameras, etc.)
  3. Sensor fusion (also takes place later in the chain)
  4. Classification (preliminary)
  5. Link to objects previously known, determine velocity
  6. Model future paths of all obstacles. Improve classification
  7. Detect possible incursions
  8. Plan a path forward
  9. Execute plan
  10. Send commands to car controls

Localization

First, the car needs to know where it is on its map. This is a continuous process, and it involves the sensing system. However, in this case the vehicle drove properly in its lane, so there is no sign of failure here.

Sensing

I could write a lot about sensing here. All the sensors have different attributes. All of them should have detected the pedestrian fairly early, though radar has some limitations.

  • LIDAR is extremely reliable. The LIDAR would have sensed her, starting at least 90m out.
  • Radar has trouble with objects that are not moving along its axis, i.e. towards or away from the car. It also may only know roughly which lane the pedestrian is in. But a good radar will have seen her. It might not report her horizontal motion well, or might not report that she was in the planned lane until about 2 seconds before impact.
  • Cameras would have seen her. Motion sensing in the vision systems should have picked her up. Stereo might not have determined her distance until 3 seconds before impact. Computer vision is a less reliable system, but when it works it should have seen her at the full range of the cameras.

Reader mgschwan generated a model of the Velodyne LIDAR view of this scene from 35m out. That's roughly the minimum distance at which you need to detect her if you want 0.5 seconds to observe and react, plus 25m to stop. The pedestrian is very distinct, and she would have been detected from much further away than that.
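
To sanity-check those numbers, here is a small back-of-the-envelope calculation. The speed (roughly 40 mph) and the deceleration figure are my assumptions, not reported values.

    # Rough stopping-distance check (assumed numbers, not official figures).
    speed_mph = 40                        # assumed vehicle speed
    speed = speed_mph * 0.44704           # about 17.9 m/s
    reaction_time = 0.5                   # seconds to observe and decide
    deceleration = 6.4                    # m/s^2, hard braking on dry pavement (assumed)

    reaction_distance = speed * reaction_time            # distance covered before braking starts
    braking_distance = speed ** 2 / (2 * deceleration)   # v^2 / (2a)
    print(f"react: {reaction_distance:.0f} m, brake: {braking_distance:.0f} m, "
          f"total: {reaction_distance + braking_distance:.0f} m")    # about 9 + 25 = 34 m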

Sensor failure

Sensor failure is unlikely, and more to the point, it should have been detected if it took place. Almost all forms of sensor failure are obvious and would trigger an alarm to the safety driver immediately. The combination of sensor failure and lack of alarm seems unlikely.

If a sensor was deliberately off for experimental purposes, the other sensors still should have worked fine.

Sensor experiments

While it's entirely a rumour, the question was asked, "What if Uber was doing experiments driving without use of the very expensive LIDAR?" Many teams are hoping to build systems without it. While this makes a detection failure more likely, the other sensors still should have been easily able to handle this. Still, computer vision is both the most powerful tool and the least reliable, and so it could be a source of the error if the car tried to rely only on computer vision at night.

Sensor fusion

Sensor fusion is the effort by the perception system to combine data from the various sensors, matching up data about the same object seen by different sensors. You want to decide that a given cloud of dots in the LIDAR is the same object as a particular radar ping or a set of pixels in a camera image. Sometimes this is done on raw sensor data, but more often some preliminary understanding of each sensor's data is used to help fuse them. The ideal output is that for every obstacle you have a segment of the point cloud, a visual image, and a radar ping (which tells you velocity and range immediately).
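
A very common simple technique is nearest-neighbour association: project every sensor's detections into one coordinate frame and pair up whatever lands close together. The sketch below is only an illustration of that idea, with invented positions and a greedy match far cruder than anything a production tracker would use.

    import math

    # Hypothetical detections already projected into the car's frame (x, y in metres).
    lidar_objects = {"L1": (52.0, 3.1), "L2": (18.4, -1.0)}
    radar_objects = {"R1": (51.2, 2.8), "R2": (80.0, 0.2)}

    def associate(a, b, gate=2.0):
        """Greedily pair detections from two sensors that lie within `gate` metres."""
        pairs, used = [], set()
        for name_a, pa in a.items():
            best, best_d = None, gate
            for name_b, pb in b.items():
                if name_b not in used:
                    d = math.dist(pa, pb)
                    if d < best_d:
                        best, best_d = name_b, d
            if best is not None:
                used.add(best)
                pairs.append((name_a, best, round(best_d, 2)))
        return pairs

    # L1 fuses with R1; L2 and R2 stay single-sensor detections, but they are still obstacles.
    print(associate(lidar_objects, radar_objects))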

Sensor fusion fails fairly often; you can't always match up the results. However, if any one sensor has a clear indication of an obstacle, the obstacle is still there -- you just don't know as much about it, and you might see it twice.

Classification

The main goal of the perception system is to figure out what the objects are. Here vision systems shine, since they get a lot of pixels, and modern neural networks are good at this job, though not perfect. You can also get classifications from LIDAR points (which are more coarse), and certain objects even have distinctive radar signatures, particularly legs in regular motion (walking, pedaling). Radar objects moving very fast are almost surely vehicles.
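
Real classifiers are trained neural networks, but a crude way to illustrate classification from LIDAR alone is a size heuristic on each cluster's bounding box. The thresholds below are my own guesses, purely to show why a person pushing a bicycle sideways-on is an awkward case.

    def rough_class(length_m, width_m, height_m):
        """Very crude size-based guess at what a LIDAR cluster is (illustrative thresholds only)."""
        if height_m < 0.4:
            return "low object / debris"
        if length_m > 3.0:
            return "vehicle"
        if length_m > 1.4 and height_m > 1.0:
            return "cyclist"                # bike-length object with something upright on it
        if length_m < 1.0 and 1.2 < height_m < 2.2:
            return "pedestrian"
        return "unknown human-sized object"

    # A pedestrian walking a bike across the road spans a bike's length but stands upright:
    print(rough_class(length_m=1.8, width_m=0.7, height_m=1.7))    # -> "cyclist" (arguably wrong)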

How it can go wrong: Objects will often not be classified correctly. In rough order of usefulness, acceptable classifications would include:

  1. Pedestrian walking a bike horizontally
  2. Pedestrian
  3. Unknown human-sized object
  4. Cyclist

Erroneous classifications such as motorcycle or car are not impossible, but rare.

If she were classified as a cyclist, that might lead the system to make mistakes, particularly if an even more serious error were made and she was classed as a cyclist moving with traffic, i.e. somebody in the left lane, possibly thinking of turning left. This would be a pretty large error to make, because the bicycle is clearly orthogonal to the road, and she's also not riding it.

Temporal tracking

Many things improve once you can connect an object to its location in prior scans. With LIDAR, there is a new scan every 100ms (10 per second). Cameras tend to run anywhere from 10 to 15 frames per second. Radar scans are faster. In any event, once you have a few locations, you can plot a course -- a velocity vector. This also helps you learn what something is, and even lets you see patterns of motion (like legs moving or wheels spinning).
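
As a toy illustration, suppose the tracker has already matched the same LIDAR cluster across four 100ms scans (the positions are invented); a simple difference then gives the velocity vector.

    # Positions of one tracked object in the car's frame (x forward, y left, in metres),
    # one per 100ms LIDAR scan. Invented numbers for a pedestrian walking laterally.
    track = [(40.0, 5.60), (40.0, 5.46), (40.0, 5.32), (40.0, 5.18)]
    dt = 0.1    # seconds between scans

    (x0, y0), (x1, y1) = track[0], track[-1]
    elapsed = dt * (len(track) - 1)
    vx, vy = (x1 - x0) / elapsed, (y1 - y0) / elapsed
    print(f"velocity: {vx:.1f} m/s forward, {vy:.1f} m/s lateral")   # 0.0 forward, -1.4 lateral (walking pace)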

Usually this should not fail on a slow moving obstacle like a pedestrian which has no other objects near it. It should match easily with the earlier scans, and the path will be known. This can get harder in a big messy road full of obstacles.

Model paths

Once we have the past path of an object, if it's moving, we can try to predict its future path: both a most likely path (it continues on its current course) and a cone of potential paths. Knowing what something is helps. A car can't suddenly go sideways, but a pedestrian can. You can now make predictions about where things are likely to be in the very near future.
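
A minimal sketch of such a prediction: extrapolate at constant velocity, and attach a radius of uncertainty that grows with time. The growth rate here is an invented number; a real system would choose it based on what the object is (large for pedestrians, who can turn abruptly, small for cars).

    def predict(pos, vel, horizon=3.0, step=0.5, spread_rate=0.7):
        """Constant-velocity prediction with an uncertainty radius that widens over time.
        spread_rate (m/s) is how fast the cone opens -- an invented, class-dependent value."""
        (x, y), (vx, vy) = pos, vel
        cone, t = [], step
        while t <= horizon:
            cone.append((t, x + vx * t, y + vy * t, spread_rate * t))   # (time, x, y, radius)
            t += step
        return cone

    # Pedestrian 40m ahead, 5.2m to the left, walking toward the car's lane at 1.4 m/s (invented).
    for t, x, y, r in predict(pos=(40.0, 5.2), vel=(0.0, -1.4)):
        print(f"t={t:.1f}s  expected ({x:.1f}, {y:.1f})  within about {r:.1f} m")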

In this case, she took a very predictable path and it's unlikely the prediction would be bad.

(Iterate on all this)

It must be noted that it is a grand simplification to suggest that all these things happen in a specific order. Learning more about what objects are or how they are moving helps with sensor fusion. Classification helps with modeling paths. Tracking speed helps identify obstacles. In reality these various steps are mixed and combined. In particular most modern approaches look at motion through time extensively to help understand the world, as humans and animals also all do.

Detect possible incursions

Once you have a path, or a cone of possible paths, you can consider whether your planned path and the expected paths of the obstacle might intersect -- in other words, very bad news. You will be concerned even about entry into your lane or close to it, especially with a vulnerable road user.
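
Given such a cone, the incursion check itself can be quite simple: does any predicted position, padded by its uncertainty radius, overlap the strip of road the car will sweep through at about the same time? A toy version, with invented geometry and margins:

    def earliest_conflict(cone, car_speed, lane_half_width=1.5, length_margin=5.0):
        """Return the earliest predicted time at which the obstacle's uncertainty cone
        overlaps the corridor the car will occupy. All margins are invented for illustration."""
        for t, x, y, radius in cone:
            car_x = car_speed * t                            # where the car will be at time t
            in_lane = abs(y) - radius <= lane_half_width     # cone reaches the car's lane
            nearby = abs(x - car_x) < length_margin          # and does so near the car itself
            if in_lane and nearby:
                return t
        return None

    # Using the pedestrian cone from the earlier sketch and an assumed ~40 mph (17.9 m/s):
    cone = [(0.5 * k, 40.0, 5.2 - 1.4 * 0.5 * k, 0.7 * 0.5 * k) for k in range(1, 7)]
    print(earliest_conflict(cone, car_speed=17.9))           # conflict forecast about 2 seconds out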

In this case, once again, it is very basic. Her most likely path (straight forward) led her to collision with the vehicle on its planned path. Even if the planned path was to enter the right turn lane, she still clearly intersects it. Hard to get wrong.

Plan a path

Normally the vehicle is following a simple path: "Continue at this speed in this lane." Once a potential collision is forecast, this plan will change immediately. The first, most basic change will be "brake hard enough to avoid collision." In some situations, more specific options like braking and swerving should be considered. (The physics of tires says you generally want to brake first, then release the brakes and turn at the same time if you have to swerve. Many drivers mistakenly keep the brakes pressed during their swerve and ask too much of the tires.)
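
The replanning decision then boils down to: how hard would I have to brake to stop short of the conflict point, and is that within what the tires can deliver? A toy version, with assumed numbers:

    def plan_response(speed, distance_to_conflict, max_decel=7.0):
        """Choose a simple evasive plan. max_decel of about 7 m/s^2 is a rough dry-pavement
        limit; all the numbers here are assumptions for illustration."""
        required = speed ** 2 / (2 * distance_to_conflict)    # deceleration needed to stop in time
        if required <= max_decel:
            return f"brake at {required:.1f} m/s^2"
        return "brake at the limit, then release and swerve"  # per the tire-physics note above

    print(plan_response(speed=17.9, distance_to_conflict=35.0))   # about 4.6 m/s^2 -- an easy stop
    print(plan_response(speed=17.9, distance_to_conflict=15.0))   # too close to brake alone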

Execute the plan

The new plan will be sent to a smaller computer controlling the drive-by-wire (DBW) aspects of the vehicle. In some vehicles, this may mean using the vehicle's own existing DBW features over its network bus, the CAN bus. In many cases, however, the DBW computer controls the brakes, throttle and steering motor through direct connections to those circuits or pedals.
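
For illustration only, here is roughly what issuing a brake command over a CAN bus might look like using the python-can library. The arbitration ID and the payload encoding are entirely made up; every vehicle's DBW interface defines its own, and this is not a description of Uber's system.

    import can    # the python-can library

    BRAKE_CMD_ID = 0x2F1      # made-up arbitration ID, not any real vehicle's
    brake_fraction = 0.8      # request 80% of maximum braking (invented encoding)

    bus = can.interface.Bus(channel="can0", bustype="socketcan")
    payload = bytes([int(brake_fraction * 255)]) + bytes(7)      # pad to an 8-byte frame
    msg = can.Message(arbitration_id=BRAKE_CMD_ID, data=payload, is_extended_id=False)
    bus.send(msg)             # if this frame never reaches the brake controller, nothing slows the car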

Failure of the DBW system is something that would also trigger a warning.

Trigger the brake fluid pump motor to apply pressure to the brakes

Finally an electrical signal will go down a wire to release the throttle, activate the brake pump or turn the steering motor. It is possible that these physical wires could have gotten loose and the signal did not go through. However, this would have to have just happened, since it would have been detected the last time the vehicle needed to brake, back in the town center no more than a minute or so before.

Some teams use a special DBW system that physically pushes the brake pedal. While this could fail, again it should not do so without detection.

Diagnostics within the code

Whatever went wrong, there was almost surely also a failure in the large amount of software every robocar has that checks that everything is operating correctly. Modern safety-critical software will have more than half its lines devoted to such checking and testing. Every major action and operation will be preceded by checks that its inputs make sense, and that its outputs are within reason. If those tests fail, the system will reset certain operations and inform the safety driver.

This is also true for every major hardware component. If the LIDAR stops sending point clouds, that should not go unnoticed. If the radar gets no pings, or the camera goes black or very noisy, all these things should trigger alarms. If the car tries to apply the brakes and it does not slow down, that should trigger an alarm.
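
A toy sketch of one such check: stamp every sensor stream, and raise an alarm the moment any stream goes stale. The sensor names and thresholds here are invented.

    import time

    MAX_AGE = {"lidar": 0.3, "radar": 0.2, "camera": 0.2}    # seconds; invented thresholds

    def check_health(last_message_time, now=None):
        """Return alarm strings for any sensor whose latest data is too old."""
        now = time.monotonic() if now is None else now
        alarms = []
        for sensor, limit in MAX_AGE.items():
            age = now - last_message_time.get(sensor, float("-inf"))
            if age > limit:
                alarms.append(f"ALARM: no {sensor} data for {age:.2f}s -- alert the safety driver")
        return alarms

    # Example where the radar stream has stalled for two seconds:
    print(check_health({"lidar": 99.9, "radar": 98.0, "camera": 99.95}, now=100.0))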

It does not appear that any alarm was triggered. That is, in itself, the second worst error, after whatever it was that failed in the chain.

Other unlikely things and notes

If the safety driver accidentally kicked the vehicle out of auto-drive mode, it would just start coasting and slowing down on the uphill. But normally it would also make some audible signal that it left auto-drive mode.

If all systems lost power, the car should again have switched to coasting. Since the car was on a slight uphill grade, this should have been obvious as well. And again, there should be alarms, unless the alarms were not on their own power.

In general the presence of a walking pedestrian in the middle of a high-speed, divided road should be cause for immediate caution, even if it is not predicted that they might enter the planned path of the vehicle. This should trigger slowing, and possibly even a "stay sharp" alert to the safety driver. It is unknown if Uber does this in their system.

Uber some months ago changed their sensor configuration from a large array of different extra LIDARs to mainly the single Velodyne on top. This makes the vehicle blind in LIDAR in some areas very close to the car. Those blind areas have no bearing on this incident. (It is possible that the extra sensors had redundant coverage of the forward area, in which case they could have helped if it is somehow the case that the Velodyne did not see the pedestrian.)

The general desire in all robocar systems is to expect and tolerate failures. Things should fail safe, or ideally fail operational. That means that even if the main parts of the system suffer catastrophic failure, other redundant components can at least sound alerts and slow the car down quickly or ideally get it safely to the side of the road.

What are the leading contenders?

Nothing looks really probable. So my thoughts include:

  • Testing with camera only which suffered a perception failure due to errors in computer vision
  • A plain old stupid bug, triggered by something unusual, which also remained undetected. The trigger must be unusual for this not to show in regression tests, simulators and other road testing
  • Undetected failure of communications to the brakes and throttle
  • Classification of the pedestrian as a cyclist riding north (i.e. not across the road)

Comments

"Classification of the pedestrian as a cyclist riding north (ie. not across the road)"

I've thought about this one, but wouldn't the car have at least slowed down in this case? Moreover, Arizona statute §28-735 requires a driver to leave at least three feet of space when passing a cyclist. I don't think classification as a cyclist riding north would have been enough to cause the crash.

One thing I've considered is whether it's possible that the car classified her as something which can be ignored, though I'm not sure what that would be. Something along the lines of rain or dust or smoke, but none of those things seem likely.

"Undetected failure of communications to the brakes and throttle"

How can that be undetected? Wouldn't the car know that it's not slowing down (and therefore detect the problem very quickly)?

"A plain old stupid bug, triggered by something unusual, which also remained undetected."

Maybe. But it would have to be a bug in a system without redundancies. Or simultaneous failure of multiple systems.

"Testing with camera only which suffered a perception failure due to errors in computer vision"

This one seems the most likely, except for the fact that I assume the "safety driver" would have been made aware of this and therefore have been extra vigilant. But maybe not.

If Uber really was testing with camera only, at night, with only one human in the car, add that to the long list of reasons why they really really messed up big time.

I think it would be odd to classify her as a cyclist riding north in the left lane. Very odd. But I list it because of one thing. If they classed her as another vehicle rather than a pedestrian, then no, you don't always slow down because of that. You should, for a cyclist, but you probably don't. Here, a cyclist (using the bike lane) might reasonably (but riskily) decide on a clear road to head over to the left turn lanes to make a left turn. If the Uber thought that was what it saw, and furthermore it was itself heading the other way to make a right turn, it might decide, "I don't need to slow for that." Though as the sensors saw her move into the Uber's lane, that decision should have changed.

I'm trying to think of reasons why a vehicle would not brake at all, and there are not many.

If "you don't always slow down because of that" is a reference to me personally, as a human driver, I (and I would hope the vast majority of human drivers) would *certainly* slow down if there were a cyclist with no rear lights riding at night and taking the lane, especially if I could barely see them. Maybe it's too much to ask of a computer right now to realize how odd and dangerous that situation is, but once we switch back to considering computer drivers, and not human drivers, I think what people actually do, rather than what autonomous vehicles should do, becomes irrelevant.

Moreover, how do you make this mistake with LIDAR? Within less than a second the LIDAR should realize that there's no significant forward movement to the unknown object in the left lane, shouldn't it?

If the LIDAR was turned on and working, it should have seen something of a significant size moving from the left lane to the right lane. Unless that unknown object was classified as something safe to drive through without slowing down (dust? smoke?) the car should have tried to slow down.

I wish Uber would at least clarify whether or not the LIDAR was intentionally turned off. I can't even think of a legal reason why they wouldn't want to admit or deny that right now, as it's something that will eventually become public information.

We are grasping at straws trying to figure out why a high end system would fail in this way. Some of that speculation imagines the LIDAR is off for experimentation.

But the other point is exactly what I am making. Robocars still don't know all the human dynamics of the road, they don't know that a bicycle doing that would be an odd thing. (LIDAR doesn't see lights on a bike of course but sensor fusion should join the camera image and LIDAR points.)

At an intersection like that, it is the case that a bike will sometimes cross the lanes to get to the left turn lanes to turn left -- though usually not when a car is coming up from behind, without first confirming that the car sees you and is slowing for you. This is the sort of dynamic that maybe the car doesn't understand as well as a person does. There has to be something here.

I guess I just don't understand. The woman was moving from left to right, from the left side of the left lane to the right side of the right lane. I don't see how the system could confuse that with a bicycle which has already successfully moved from the bike lane on the right to the left-hand side of the left lane (more than 3 feet away from your vehicle so you don't even have to slow down).

It would be a perception error. But perception errors are not impossible. However, the key thing about this is that the car would now be judging the "cyclist" as something it doesn't need to slow for. If it perceives her as a pedestrian, it should slow. If it perceives her as a pedestrian who is going to walk into the car's lane, it should stop.

Would it slow for a pedestrian in the left lane who has crossed from the bike lane to the left lane?

The far more important part is the location, and I don't see how the car could see the woman but not know which lane she was in.

As I understand it, the car didn't brake *at all*. If it did brake, just not until the woman entered the right lane, and it was too late, I could see whether she was a cyclist or a pedestrian making a difference. (She seems to have entered the right lane at least 2 seconds before impact, though, which should have been enough time to stop.)

Is a perception error which puts her in the wrong lane and moving in the wrong direction (but doesn't affect the car's ability to maintain its lane) possible? If so, does it even matter whether it saw her as a pedestrian or a cyclist?

Again, I add that I don't know when the "safety driver" took over. It's possible that the "safety driver" didn't have her feet near enough to the brake to hit it right away. I would have thought that Uber would not have said that the vehicle was in autonomous mode if it left autonomous mode a full 2 seconds before impact, though.

And there is obviously some major flaw somewhere in the Uber system, but yes, I would expect a car to slow if it saw a pedestrian on the road anywhere. My presumption is that the only vulnerable road user it would not slow for is a cyclist it thinks is going away.

The police have said the car did not brake, but it is possible it did and they (and the safety driver) did not know it, though it should have been noticed as a hard brake.

As to what errors are possible, obviously some strange error happened. The best we can do is say that some are very unlikely. Uber has not said much at all so far.

"Testing with camera only which suffered a perception failure due to errors in computer vision"
This was my first thought also as soon as I read that the LIDAR might have been disabled. Sounds very reckless if that is the case, and strange that they would simply disable the LIDAR instead of using it as part of a backup system.

"A plain old stupid bug"
Possible I suppose.

"Undetected failure of communications to the brakes and throttle"
And it could also have been a mechanical failure with the brake system.

"Classification of the pedestrian as a cyclist riding north"
I don't really understand how it could be classification error either. Wouldn't the car conclude it should brake if there is a slow moving object directly in front of the vehicle no matter what it is classified as?

As with a failure of the DBW system, this sort of failure is unlikely because it would have had to have just happened: the car would have used the brakes less than a minute before in the downtown area it came from. It's possible -- but really, losing the brakes just at the moment you encounter a pedestrian walking a bike across the road at an illegal crossing? What are the odds?

I agree that a mechanical failure happening at exactly that moment seems very unlikely, but since we are grasping at straws, as you put it, and this was a very rare type of accident for a robot car (it has only happened once, if I am correct?), it could have been caused by an unlikely failure mode.

Anthony says:
"One thing I've considered is whether it's possible that the car classified her as something which can be ignored, though I'm not sure what that would be. Something along the lines of rain or dust or smoke, but none of those things seem likely."

How about a tree? If you tilt the 3D simulation down to the POV of the vehicle, is it possible the upright person and bicycle might have been classified as a distant stationary tree that the vehicle was passing (of which there were many on both sides of the road)?

As I understand it, LIDAR measures the distances, and the vehicle has access to the entire 3D model. Hopefully someone else will clarify and/or correct me on that, though.

To LIDAR, this would not happen. Nor to a stereo camera (Uber has a lot of cameras so I presume they have stereo) once closer than 40m depending on the baseline. Motion parallax -- seeing the pedestrian move against the background image because of the car's motion -- should see the distance too unless she is walking at the exact speed to compensate for that. The only thing that gets confused as to distance is computer vision based classification, which is not perfect but should not confuse a person with a tree.
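
For reference, the 40m figure follows from basic stereo geometry: range error grows with the square of distance. Here is a quick calculation with assumed camera parameters -- the baseline, focal length and matching error are my guesses, not Uber's hardware.

    # Stereo range resolution with assumed parameters.
    baseline = 0.30           # metres between the two cameras (assumed)
    focal_px = 1400           # focal length in pixels (assumed)
    disparity_error = 0.5     # pixels of matching error (assumed)

    for z in (10, 20, 40, 80):                                           # range in metres
        disparity = focal_px * baseline / z                              # d = f * B / Z
        depth_error = z ** 2 * disparity_error / (focal_px * baseline)   # dZ ~ Z^2 * dd / (f * B)
        print(f"{z:3d} m: disparity {disparity:5.1f} px, range uncertainty about {depth_error:.1f} m")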

If you aren't aiming for perfection, you can use heuristics. For example, you can assume that if you can see the car ahead of you safely drive through the space, then the space is very likely to be clear of any hazards, and any lidar/radar returns must be something inconsequential, such as plastic bags blowing in the wind. Using such heuristics will require the safety driver to take over on the rare occasion when something hazardous does encroach on the lane between passing cars, oblivious to your oncoming headlights. But the safety driver is already reportedly taking over every 13 miles or something like that, so such a heuristic wouldn't materially increase the workload on the safety driver. I have no idea whether Uber uses such heuristics or not; this is just speculation on what is possible.

While I think the lidar-off theory is going to be correct (especially since I'd have expected Uber to quickly squash it as a theory if it had not been the case), the other thing that seems 'unusual' is the strong light-to-dark transition at the edge of the street light's coverage. I know early vision systems had a lot of trouble with shadows and overhangs and other things that could be mistaken for giant holes and barriers. I wonder if there was some heuristic at play that said "I know this transition looks scary, but it's fine and suppress the normal warnings you might emit here" and it was too aggressive.

Can we assume that the car is always aware of the number of lanes? Is it possible that the car's system could misidentify an empty lane as not being part of the road? If so might the car's system reasonably decide to ignore a slow moving pedestrian that it concludes (incorrectly) is not on the road? Just wondering.

Each car is different; however, most advanced cars have a detailed map of everywhere they drive, and they certainly know the number of lanes and where all the markers are.

There are efforts by people who think they want to drive without a map, figuring the map out as they go. The fact that you might make mistakes is one reason most teams don't try this, and a mistake about how many lanes there are would be a pretty bad one; a system that made it is not one you would want to deploy.

However, in this case the woman was in the Uber's lane. Even if the Uber braked 400ms after she entered the lane, it still would have slowed a lot and possibly not killed her.
