How does a robocar see a pedestrian and how might Uber have gone wrong?
How does a robocar see and avoid hitting a pedestrian? There are a lot of different ways. Some are very common, some are used only by certain teams. To understand what the Uber car was supposed to do, it can help to look at them. I write this without specific knowledge of what techniques Uber uses.
In particular, I want to examine what could go wrong at any of these points, and what is not likely to go wrong.
The usual pipeline looks something like this:
- Localization (no indication of failure)
- Sensing (With LIDAR, radar, cameras, etc.)
- Sensor fusion (also takes place later in the chain)
- Classification (preliminary)
- Link to objects previously known, determine velocity
- Model future paths of all obstacles. Improve classification
- Detect possible incursions
- Plan a path forward
- Execute plan
- Send commands to car controls
Ideally, the car wants to know where it is on its map. This is a continuous process, and involves the sensing system. However, in this case the vehicle drove properly in its lane so there is no sign of failure here.
I could write a lot about sensing here. All the sensors have different attributes. All of them should have detected the pedestrian fairly early, though radar has some limitations.
- LIDAR is extremely reliable. The LIDAR would have sensed her, starting at least 90m out.
- Radar has trouble with objects that are not moving along the axis towards or away from the car. It also may only know what lane the pedestrian is in. But a good radar will have seen her. It might not report her horizontal motion well, and might not report that she is in the planned lane until about 2 seconds before impact.
- Cameras would have seen her. Motion sensing in the vision systems should have seen her. Stereo might not have detected her distance until 3 seconds before impact. Computer vision is a less reliable system but when it works should have seen her at the full range of the cameras.
Reader mgschwan generated a model of the Velodyne LIDAR view of this scene from 35m. That's the minimum distance if you want 0.5 seconds to observe and react, and 25m to stop. The pedestrian is very distinct at that range, and would have been detectable from much further away than that.
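The 35m figure can be checked with simple arithmetic: distance covered while reacting, plus distance covered while braking. Here is a minimal sketch. The speed (17.5 m/s, roughly 39 mph) and the deceleration (6 m/s²) are my own illustrative assumptions, not numbers from any report, and the function name is hypothetical.

```python
def stopping_distance(speed_mps, reaction_s, decel_mps2):
    """Distance covered while reacting plus distance covered while braking."""
    reaction_distance = speed_mps * reaction_s
    # Constant-deceleration braking distance: v^2 / (2a)
    braking_distance = speed_mps ** 2 / (2 * decel_mps2)
    return reaction_distance + braking_distance


# Assumed values, not from the text: about 39 mph and firm braking on a dry road.
ASSUMED_SPEED_MPS = 17.5
ASSUMED_DECEL_MPS2 = 6.0

total = stopping_distance(ASSUMED_SPEED_MPS, 0.5, ASSUMED_DECEL_MPS2)
print(f"{total:.1f} m")
```

With those assumed inputs the total comes out at a little over 34m, which is in line with the 35m used for the model.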
Sensor failure is unlikely, and more to the point, it should have been detected if it took place. Almost all forms of sensor failure are obvious and would trigger an alarm to the safety driver immediately. The combination of sensor failure and lack of alarm seems unlikely.
If a sensor was deliberately off for experimental purposes, the other sensors still should have worked fine.
While it's entirely a rumour, the question was asked, "What if Uber was doing experiments driving without use of the very expensive LIDAR?" Many teams are hoping to build systems without it. While this makes a detection failure more likely, the other sensors still should have been easily able to handle this. Still, computer vision is both the most powerful tool and the least reliable, and so it could be a source of the error if the car tried to rely only on computer vision at night.
Sensor fusion is the effort by the perception system to combine data from various sensors to try and match data about the same object from different sensors. So you want to decide that the cloud of dots in the LIDAR is the same as a given radar ping or a set of pixels in a camera. Sometimes you do this on raw sensor data, but more often you like to do some understanding of that data to help fuse them. The ideal output is that for every obstacle you have a segment of the point cloud, a visual image, and a radar ping (telling you velocity and range immediately.)
Sensor fusion fails often; you can't always match up results. However, if any one sensor has a clear indication of an obstacle, the obstacle is still there. You just don't know as much about it, and you might see it twice.
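To make the point concrete, here is a toy sketch of fusion by greedy nearest-neighbour matching. It is not any team's real method. The `Detection` type, the `fuse` function and the 1.5m matching gate are all hypothetical names and values chosen for illustration. The property that matters is that a detection which matches nothing is kept as its own obstacle rather than thrown away.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    sensor: str  # "lidar", "radar" or "camera"
    x: float     # metres ahead of the car
    y: float     # metres to the left of the car


def fuse(detections, gate=1.5):
    """Group detections from different sensors that lie within `gate` metres.

    A detection that matches no existing group becomes a new obstacle,
    so a failed match loses information but never loses the obstacle.
    """
    obstacles = []  # each obstacle is a list of Detection objects
    for det in detections:
        best, best_dist = None, gate
        for group in obstacles:
            # One sensor should not contribute twice to the same obstacle.
            if any(d.sensor == det.sensor for d in group):
                continue
            cx = sum(d.x for d in group) / len(group)
            cy = sum(d.y for d in group) / len(group)
            dist = hypot(det.x - cx, det.y - cy)
            if dist <= best_dist:
                best, best_dist = group, dist
        if best is None:
            obstacles.append([det])
        else:
            best.append(det)
    return obstacles


scene = [
    Detection("lidar", 40.0, 3.0),
    Detection("radar", 40.5, 2.6),   # close to the LIDAR cluster: fused
    Detection("camera", 40.0, 6.0),  # 3m off: not matched, kept separately
]
for obstacle in fuse(scene):
    print([d.sensor for d in obstacle])
```

In this made-up scene the camera detection fails to match, so the same pedestrian shows up as two obstacles. That is the "see it twice" case: untidy, but the car still knows something is there.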
The main goal of the perception system is to figure out what the objects are. Here, vision systems shine since they get a lot of pixels, and modern neural networks are good at this job, but not perfect. You can also get classifications from LIDAR points (which are more coarse) and even certain radar objects have special radar signatures, particularly legs in regular motion (walking, cycling.) Radar objects going very fast are almost surely vehicles.
How it can go wrong: Objects will often not be classified correctly. In order, usable classifications could include:
- Pedestrian walking a bike horizontally
- Unknown human-sized object
Erroneous classifications such as motorcycle or car are not impossible, but rare.
If she were classified as a cyclist, that might lead the system to make mistakes, particularly if an even more serious error were made and she was classed as a cyclist moving with traffic, i.e. somebody in the left lane, possibly thinking of turning left. This is a pretty large error to make because the bicycle is clearly orthogonal to the road, and she's also not riding it.
Many things improve once you can connect an object to its prior location in prior scans. With LIDAR, there is a new scan every 100ms (10 per second.) Cameras tend to run anywhere from 10 to 15 per second. Radar scans are faster. In any event, once you have a few locations, you can plot a course, a velocity vector. This will also help you learn what something is, and even see patterns of motion (like legs moving or wheels spinning.)
Usually this should not fail on a slow moving obstacle like a pedestrian which has no other objects near it. It should match easily with the earlier scans, and the path will be known. This can get harder in a big messy road full of obstacles.
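As a rough illustration of how a few matched positions become a velocity vector, here is a minimal sketch that assumes positions are already in a fixed map frame and arrive at the 100ms LIDAR scan period. The function name and the sample track are hypothetical, and real trackers use filtering rather than a simple difference.

```python
def estimate_velocity(positions, scan_period=0.1):
    """Average velocity (vx, vy) in m/s over a list of (x, y) map positions.

    `positions` holds one entry per scan, oldest first, for a single
    object that has already been matched across scans.
    """
    if len(positions) < 2:
        raise ValueError("need at least two scans to estimate velocity")
    x0, y0 = positions[0]
    x1, y1 = positions[-1]
    elapsed = scan_period * (len(positions) - 1)
    return ((x1 - x0) / elapsed, (y1 - y0) / elapsed)


# Made-up track: an object crossing the road at walking pace.
track = [(40.0, 5.00), (40.0, 4.86), (40.0, 4.72), (40.0, 4.58)]
vx, vy = estimate_velocity(track)
print(f"vx={vx:.2f} m/s, vy={vy:.2f} m/s")
```

Four scans, less than half a second of data, are enough to show steady sideways motion of about 1.4 m/s in this example.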
Once we have the past path of an object, if it's moving, we can try to predict its future path. That means both a most likely path (it continues on its current course) and a cone of potential paths. Knowing what something is helps. A car can't suddenly go sideways, but a pedestrian can. You can now make predictions on where things are likely to be in the very near future.
In this case, she took a very predictable path and it's unlikely the prediction would be bad.
(Iterate on all this)
It must be noted that it is a grand simplification to suggest that all these things happen in a specific order. Learning more about what objects are or how they are moving helps with sensor fusion. Classification helps with modeling paths. Tracking speed helps identify obstacles. In reality these various steps are mixed and combined. In particular most modern approaches look at motion through time extensively to help understand the world, as humans and animals also all do.
Detect possible incursions
Once you have a path or a cone of possible paths, you can consider if your planned path and the expected paths of the obstacle might intersect -- in other words, very bad news. You will be concerned about even entry into your lane or near to it, especially with a vulnerable road user.
In this case, once again, it is very basic. Her most likely path (straight forward) led her to collision with the vehicle on its planned path. Even if the planned path was to enter the right turn lane, she still clearly intersects it. Hard to get wrong.
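Here is a minimal sketch of that check, using only the most likely path (constant velocity) for both the car and the obstacle, and ignoring the cone of other possible paths. The function name, the 2m clearance, the 5 second horizon and the sample numbers are all assumptions for illustration.

```python
from math import hypot


def first_conflict_time(car_pos, car_vel, obj_pos, obj_vel,
                        horizon=5.0, step=0.1, clearance=2.0):
    """Return the first time (seconds) the two predicted paths come within
    `clearance` metres of each other, or None if they never do within
    `horizon` seconds. Both paths are straight-line, constant velocity."""
    steps = int(round(horizon / step))
    for i in range(steps + 1):
        t = i * step
        cx = car_pos[0] + car_vel[0] * t
        cy = car_pos[1] + car_vel[1] * t
        ox = obj_pos[0] + obj_vel[0] * t
        oy = obj_pos[1] + obj_vel[1] * t
        if hypot(cx - ox, cy - oy) < clearance:
            return t
    return None


# Made-up numbers: car at 17 m/s, pedestrian 51m ahead and 4.2m to the
# left, walking across at 1.4 m/s.
t_hit = first_conflict_time((0.0, 0.0), (17.0, 0.0), (51.0, 4.2), (0.0, -1.4))
print(t_hit)
```

With these numbers the paths conflict a little under 3 seconds out. The same pedestrian standing still 10m to the side would return `None`, though as a vulnerable road user near the lane she should still cause concern.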
Plan a path
Normally the vehicle is following a simple path: "Continue at this speed in this lane." Once a potential collision is forecast, this plan will change immediately. The first basic change will be "brake hard enough to avoid collision." In some situations, more specific options like braking and swerving should be considered. (The physics of tires says you generally always want to brake first, then release the brakes and turn at the same time if you have to swerve. Many drivers mistakenly keep the brakes pressed during their swerve and ask too much of the tires.)
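"Brake hard enough to avoid collision" comes down to one formula: to stop within a distance d from speed v you need a deceleration of v²/(2d). Below is a minimal sketch of a planner choice built on that. The thresholds (3 m/s² as comfortable, 7 m/s² as the most the tires can give) and the function names are assumptions of mine, not values from any real system.

```python
def required_deceleration(speed_mps, distance_m):
    """Constant deceleration (m/s^2) needed to stop within distance_m."""
    if distance_m <= 0:
        return float("inf")
    return speed_mps ** 2 / (2 * distance_m)


def choose_action(speed_mps, distance_m, comfortable=3.0, maximum=7.0):
    """Pick a response from the deceleration needed (assumed thresholds)."""
    need = required_deceleration(speed_mps, distance_m)
    if need <= comfortable:
        return "brake gently"
    if need <= maximum:
        return "brake hard"
    # Braking alone cannot stop in time: brake first, then release and swerve.
    return "brake hard, then release and swerve"


for distance in (50.0, 25.0, 15.0):
    print(distance, choose_action(17.0, distance))
```

At an assumed 17 m/s, the three distances give gentle braking, hard braking, and brake-then-swerve in turn. The earlier the collision is forecast, the easier the plan.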
Execute the plan
The new plan will be sent to a smaller computer controlling the drive-by-wire (DBW) aspects of the vehicle. In some vehicles, this may mean use of the vehicle's own existing DBW features over its own network bus, called the CAN bus. In many cases however, the DBW computer may control brakes, throttle and steering motor by direct connections to those circuits or pedals.
Failure of the DBW system is something that would also trigger a warning.
Trigger the brake fluid pump motor to apply pressure to the brakes
Finally an electrical signal will go down a wire to release the throttle, activate the brake pump or turn the steering motor. It is possible that these physical wires could have gotten loose and the signal did not go through. However, this would have to have just happened, since it would have been detected the last time the vehicle needed to brake, back in the town center no more than a minute or so before.
Some teams use a special DBW system that physically pushes the brake pedal. While this could fail, again it should not do so without detection.
Diagnostics within the code
There was almost surely an error in the large body of software that every robocar has to check that everything is operating correctly. Modern critical software will have more than half its lines devoted to testing it. Every major action and operation will be preceded by checks that its inputs make sense, and that its outputs are within reason. If those tests fail, an alert will reset certain operations and inform the safety driver.
This is also true for every major hardware component. If the LIDAR stops sending point clouds, this should not go without notice. If the radar gets no pings, if the camera goes black or very noisy, all these things should trigger alarms. If the car tries to apply the brakes and it does not slow down, that should trigger an alarm.
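A minimal sketch of two such checks follows: one flags any sensor that has gone quiet for too long, the other flags brakes that were commanded but produced no slowing. The function names, timeouts and tolerance are hypothetical; a real system would have far more checks than this.

```python
def stale_sensors(last_seen, now, max_age):
    """Names of sensors whose last message is older than their allowed age.

    last_seen and max_age map sensor name -> seconds.
    """
    return sorted(name for name, stamp in last_seen.items()
                  if now - stamp > max_age[name])


def brake_fault(commanded_decel, measured_decel, tolerance=1.0):
    """True if braking was commanded but the car is not slowing as asked."""
    return commanded_decel > 0 and measured_decel < commanded_decel - tolerance


# Made-up timestamps: the LIDAR has been silent for 0.4s against a 0.3s limit.
last_seen = {"lidar": 10.00, "radar": 10.35, "camera": 10.38}
max_age = {"lidar": 0.3, "radar": 0.2, "camera": 0.2}

alarms = stale_sensors(last_seen, now=10.4, max_age=max_age)
if brake_fault(commanded_decel=6.0, measured_decel=0.2):
    alarms.append("brakes")
print(alarms)
```

Any non-empty list here should sound an alert to the safety driver.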
It does not appear that any alarm was triggered. That in itself is the second worst error, after whatever it was that failed in the chain.
Other unlikely things and notes
If the safety driver accidentally kicked the vehicle out of auto-drive mode, it would just start coasting and slowing down on the uphill. But normally it would also make some audible signal that it left auto-drive mode.
If all systems lost power, the car should again have switched to coasting. Since the car was on a slight uphill grade, this should have been obvious as well. And again, there should be alarms, unless the alarms were not on their own power.
In general the presence of a walking pedestrian in the middle of a high-speed, divided road should be cause for immediate caution, even if it is not predicted that they might enter the planned path of the vehicle. This should trigger slowing, and possibly even a "stay sharp" alert to the safety driver. It is unknown if Uber does this in their system.
Some months ago Uber changed their sensor configuration from a large array of different extra LIDARs to mainly the Velodyne on top. This makes the vehicle blind in LIDAR in some areas very close to the car. Those blind areas have no bearing on this incident. (It is possible that the extra sensors had redundant coverage of the forward area, in which case they could have helped if it is somehow the case that the Velodyne did not see the pedestrian.)
The general desire in all robocar systems is to expect and tolerate failures. Things should fail safe, or ideally fail operational. That means that even if the main parts of the system suffer catastrophic failure, other redundant components can at least sound alerts and slow the car down quickly or ideally get it safely to the side of the road.
What are the leading contenders?
Nothing looks really probable. So my thoughts include:
- Testing with camera only which suffered a perception failure due to errors in computer vision
- A plain old stupid bug, triggered by something unusual, which also remained undetected. The trigger must be unusual for this not to show in regression tests, simulators and other road testing
- Undetected failure of communications to the brakes and throttle
- Classification of the pedestrian as a cyclist riding north (i.e. not across the road)