UMich team works on perception and localization using cameras

Some new results from the NGV Team at the University of Michigan describe different approaches for perception (detecting obstacles on the road) and localization (figuring out precisely where you are). Ford helped fund some of the research, so it issued press releases and the work got some media stories. Here's a look at what they propose.

Many hope to be able to solve robotics (and thus car) problems with just cameras. While LIDAR is going to become cheap, it is not yet, and cameras are much cheaper. I outline many of the trade-offs between the systems in my article on cameras vs. lasers. Everybody hopes for a computer vision breakthrough to make vision systems reliable enough for safe operation.

The Michigan lab's approach is a specialized machine vision one. They map the road in advance, in 3D and visible light, using a mapping car equipped with lots of expensive LIDAR and other sensors. From that they build a 3D representation of the road similar to what you need for a video game engine, and with the use of GPUs they can then render a 2D image of what a camera should see from any given point.
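To make that rendering step concrete, here is a minimal sketch in Python, assuming the prior map is stored as a colored 3D point cloud and using a basic pinhole projection with a z-buffer. The real system uses a full GPU rendering pipeline; every name and parameter here is illustrative, not the team's actual code.

```python
import numpy as np

def render_synthetic_view(map_points, map_colors, pose_R, pose_t, K, width, height):
    """Project a colored 3D prior map into the 2D image a camera at the
    candidate pose should see.

    map_points : (N, 3) world-frame points from the LIDAR survey
    map_colors : (N,)   grayscale intensity recorded for each point
    pose_R     : (3, 3) rotation of the candidate camera pose
    pose_t     : (3,)   translation of the candidate camera pose
    K          : (3, 3) camera intrinsic matrix
    """
    # Transform map points into the candidate camera frame.
    cam_pts = (pose_R @ map_points.T).T + pose_t

    # Discard points behind (or almost at) the camera.
    in_front = cam_pts[:, 2] > 0.1
    cam_pts, colors = cam_pts[in_front], map_colors[in_front]

    # Pinhole projection onto the image plane.
    uv = (K @ cam_pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    # Rasterize with a z-buffer so nearer points win.
    image = np.zeros((height, width))
    depth = np.full((height, width), np.inf)
    for (u, v), z, c in zip(uv.astype(int), cam_pts[:, 2], colors):
        if 0 <= u < width and 0 <= v < height and z < depth[v, u]:
            depth[v, u] = z
            image[v, u] = c
    return image
```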

The car goes out into the world and its actual camera delivers a 2D frame of what it sees. Their system then compares that frame with generated 2D images of what the camera should see until it finds the closest match. Effectively, it's like you looking out a window, then going into a video game and wandering around looking for a place that matches the view out that window -- once you find it, you know where the window is.

Of course it is not really "wandering"; they develop efficient search algorithms to quickly find the location that looks most like the real-world image. We've all seen video game images and know they only approximate the real world, so nothing will be an exact match. But if the system is good enough, there will be a "most similar" match that also agrees with what other sensors, like your GPS and your odometer/dead-reckoning system, tell you about where you probably are.
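As a rough illustration of that search, here is a hedged sketch that grid-searches candidate poses around the GPS/dead-reckoning prior and scores each rendered view against the live frame. I use zero-mean normalized cross-correlation as the similarity score for simplicity; the team's actual metric and search strategy are presumably more lighting-robust and far more efficient than this brute-force loop.

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation between two equal-size images."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

def localize(live_frame, prior_pose, render_fn,
             search_m=2.0, step_m=0.25, search_deg=4.0, step_deg=1.0):
    """Return the candidate pose whose rendered view best matches the camera.

    prior_pose : (x, y, yaw) estimate from GPS and dead reckoning
    render_fn  : function mapping a pose to a synthetic image of the map
    """
    x0, y0, yaw0 = prior_pose
    best_score, best_pose = -np.inf, prior_pose
    for dx in np.arange(-search_m, search_m + 1e-9, step_m):
        for dy in np.arange(-search_m, search_m + 1e-9, step_m):
            for dyaw in np.deg2rad(np.arange(-search_deg, search_deg + 1e-9, step_deg)):
                pose = (x0 + dx, y0 + dy, yaw0 + dyaw)
                score = zncc(live_frame, render_fn(pose))
                if score > best_score:
                    best_score, best_pose = score, pose
    return best_pose, best_score
```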

Localization with cameras has been done before; this is a new approach taking advantage of new generations of GPUs, so it's interesting. The big challenge is simulating the lighting, because the real world is full of varied lighting, high dynamic range, and shadows. The human visual system has no problem understanding a stripe on the road as it passes through the shadow of a tree, but computer systems have a pretty tough time with that. Sun shadows can be modeled well with GPUs, but shadows from moving things like tree limbs cannot be simulated, nor can the shadows of other vehicles and road users. At night, light and shadow come from car headlights and urban lighting. The team is optimistic about how well they will handle these problems.

The much larger challenge is object perception. Once you have a simulation of what the camera should see, you can notice when things are present that are not in the prediction -- like another car or pedestrian, or a new road sign. (Right now their system is mostly looking at the ground.) Once you identify the new region, you can attempt to classify it using computer vision techniques, and also by watching how it moves against the expected background.
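In sketch form, the perception side might start with something as simple as differencing the live frame against the predicted one. The threshold here is an illustrative assumption, not the team's method:

```python
import numpy as np

def detect_unexpected(live_frame, predicted_frame, threshold=0.25):
    """Flag pixels where the camera disagrees strongly with the prior map."""
    diff = np.abs(live_frame - predicted_frame)
    return diff > threshold  # True where something unmapped may be present

# Flagged regions can then be grouped into blobs, classified with computer
# vision techniques, and tracked across frames -- motion against the expected
# static background is further evidence of a real road user rather than a
# lighting artifact.
```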

This is where it gets challenging, because the bar is very high. To be used for driving, it must effectively always work. Even if you miss only one pedestrian in a million, you have a real problem, because a billion drivers encounter billions of pedestrians every day. This is why people love LIDAR -- if something sufficiently large (other than a mirror or sheet of glass) is sufficiently close to you, you're going to get laser returns from it, and not from what's behind it. It delivers the reliability number that is needed. The challenge for vision systems is to meet that reliability goal.
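The back-of-envelope arithmetic makes the point. The encounter counts below are illustrative assumptions, not measured figures:

```python
miss_rate = 1e-6                    # miss 1 pedestrian in a million
encounters_per_driver_per_day = 10  # assumed pedestrian encounters per driver
drivers = 1e9                       # on the order of a billion drivers

missed_per_day = miss_rate * encounters_per_driver_per_day * drivers
print(missed_per_day)  # 10000.0 -- ten thousand missed detections every day
```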

This work is interesting because it does a lot without relying on AI "computer vision" techniques. It is not trying to look at a picture and recognize a person. Humans can look at 2D pictures with bizarre lighting and tell you not just what the things in the picture are, but often how far away they are and what they are doing. While we can be fooled by a 2D image, once you have a moving, dynamic world, humans are generally reliable enough at spotting other things on the road. (Though of course, with 1.2 million dead each year, and probably 50 million or more accidents -- the majority because somebody was "not looking" -- we are far from perfect.)

Some day, computer vision will be as good at recognizing and understanding the world as people are -- and in fact surpass us. There are tasks (like identifying traffic signs in photos) where it already does. For those not willing to wait for that day, new techniques in perception that don't require full object understanding are always interesting.

I should also point out that while lowering cost is of course a worthwhile goal, it is a false goal at this time. Today, maximal safety is the overriding goal, and as such, nobody will actually release a vehicle to consumers without LIDAR just to save its estimated 2017 cost, which will be sub-$500. Only later, when cameras get so good that they completely replace LIDAR's safety capabilities for less money, would people release such a system to save cost. On the other hand, improving cameras to be used together with LIDAR is a real goal: superior safety, not lower cost.

Comments

This technology looks very promising. Like you, I think that LIDAR is probably essential, but this kind of camera-based feedback can't hurt as additional corroborating information. On that theme, I wonder if other things humans do not see might be useful, like infrared. It could help spot things that are likely to be in motion and that you especially do not want to hit, like people and engines. And on the theme of corroborating visuals, could stationary cameras mounted on poles (they're already present on many) tell the car, from a third-person perspective, what the scene is like, and therefore help confirm what it thinks it knows from the first person? Thanks again for your great blog.

But they are not a useful solution, because you either need them or you don't. If you need them, then you can't drive anywhere people haven't installed the external sensors, and that's a deal-breaker -- you would wait centuries for them to get everywhere.

If you don't need them, then, well, why use them?

The exception to this would be special, dangerous places. For example, we put mirrors up on tight corners for human drivers; they are a form of external sensor. So if you need the external sensor only in very special places, and you can arrange to get one put in all those places, you might use it.

Infrared is a bit expensive today, but I think it's interesting. However, it is still, by and large, a vision system. And sadly, the background often reaches human body temperature in some areas, so you can no longer easily pull out the humans. At night it's great, particularly for deer, but if it's not 100% reliable it is not too useful.

I don't agree with your logic on this one. It does not have to be a simple "you either need them or you don't" argument. I would ask the question: "does it pass a cost/benefit assessment?"
Take your example of infrared not being too useful if it is not 100% effective. If, while driving at night, an infrared sensor picked up a large heat signature behind light vegetation, such as a deer, it could alert the driving algorithms that there was some percent chance of danger, and the vehicle could then reduce its speed by some percentage as a precaution. That is just what a human does when dealing with uncertainties.
If the deer suddenly ran across in front of the car, the reduced speed could prevent a collision. If not, the inconvenience of slowing down is very minor. (A rough sketch of this heuristic follows below.)
It would take a very brave person to estimate how cheap technology can become when mass-produced. If extra sensors provide a large enough benefit for the cost outlay, then they may be worthwhile additions. A cheap but rarely needed sensor may only have to prevent a tiny number of accidents to be worthwhile. In the end, the big picture on accidents is a percentage game.
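As a toy version of the slowdown heuristic above (all numbers are illustrative assumptions):

```python
def precautionary_speed(current_speed_kph, hazard_probability, max_reduction=0.5):
    """Scale a precautionary slowdown by the estimated chance of danger."""
    p = min(max(hazard_probability, 0.0), 1.0)
    return current_speed_kph * (1.0 - max_reduction * p)

# e.g. a 40% chance the IR blob is a deer cuts 80 km/h to 64 km/h
print(precautionary_speed(80.0, 0.4))
```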

The problem is that "safe enough" is already a pretty high bar. Indeed, you want to go as far beyond it as you can, but right now people are struggling with the cost of just getting to "safe enough." Thermal sensors are actually quite expensive, especially at high resolution.

For the external sensor, the question is: are you safe enough without it? The answer has to be yes, because you won't have it everywhere -- only the sensors built into your car are everywhere. Once you are safe enough, how much safer can you become with a sensor that adds some safety some of the time? Well, if "safe enough" is already very high, then only a tiny bit safer. That makes it hard to justify the cost if it's expensive. If it's not expensive, of course you will use it.

As noted, if it's a special area where you are not safe enough without an external sensor, you could consider trying to get one in, but such areas had better be very few!
