Tesla also dismisses high-precision maps, again in disagreement with most teams. Here's why.

Example HD map from Navteq/Here with all the texture of the road and positions of objects in the environment.

After dissing LIDAR, Elon Musk has also declared that high-precision maps, as used by most robocar teams, are a "really bad idea." He wants his cars to drive with only a modest map. That's cheaper and, if you can do it, lets you go more places, but most teams feel it is not the fastest path to safety, and even foolish: a car that can drive without a map is a car that can make a map, and memory of what a road looked like before is a useful thing, as long as you are flexible enough to deal with it when it's changed.

See my new Forbes.com essay on the issues in mapping, linked in comment #1: Elon Musk declares precision maps a "really bad idea"

Comments

There are quite a few comments in your article that make me wonder if you've actually done work with Neural Nets.

For instance:
"Fortunately, if your map is detailed, it is immediately apparent, thanks to the perfect memory of computers, that the road has changed."

This doesn't make any sense. Neural Nets do not have perfect memory. The very basis of operation is to compress incoming data into the weights between nodes.

Additionally, a neural network has to ingest an HD Map exactly how a human ingests a paper map. It needs to "look" at it, and compare it with the sensor readings coming through, to try to see if it can align the map with sensor data. This is not analogous to human memory, or how a human would internally represent a region in their brain.

We would need to also, like humans, store the map in the neural network, not as an HD Map, but as NN matrices.

As the NN drives, it needs to constantly compare the sensor readings with the map. But a difference doesn't immediately mean that the road has changed; it could be errant sensor data. The NN has to constantly evaluate the data coming from the sensors, and it has to do this with or without an HD Map. Is this dirty data from the sensor? Is it a reflection off a mirror? Or is that a person trying to walk across the street? Is that stationary object I see a pole or a person?

At every time slice the NN needs to make predictive decisions. The HD Map doesn't help that much, since now the NN also has to compare the sensor data with the map, and then decide whether the map is correct, the road has changed, or the sensor is outputting errant data. The HD Map doesn't exist as part of the NN, so all of this has a computational cost.
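To make concrete the kind of extra per-frame work I mean, here's a toy sketch (invented names and thresholds, not anyone's real stack) of matching each frame's detections against the objects listed in an HD Map, on top of the labeling work the NN already has to do:

```python
import math

def match_detections_to_map(detections, map_objects, max_dist=1.0):
    """Toy per-frame association: for every detected object, find the nearest
    map object and decide whether it is 'in the map', 'new', or in conflict.
    detections and map_objects are lists of (x, y, label) tuples."""
    results = []
    for det_x, det_y, det_label in detections:
        nearest, nearest_dist = None, float("inf")
        for map_x, map_y, map_label in map_objects:
            d = math.hypot(det_x - map_x, det_y - map_y)
            if d < nearest_dist:
                nearest, nearest_dist = (map_x, map_y, map_label), d
        if nearest is None or nearest_dist > max_dist:
            results.append((det_label, "not in map"))      # changed world, or sensor noise?
        elif nearest[2] == det_label:
            results.append((det_label, "confirmed by map"))
        else:
            results.append((det_label, "label disagrees with map"))
    return results
```

Even this naive version is work that has to happen on every frame, on top of detection itself.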

The NN still needs to label everything it sees, but it now ALSO needs to spend computational resources matching the things it sees against the HD Map. To speed this up, a great deal of pre-processing, or even another NN, would need to be trained to ingest the HD Map at speed and feed the driving NN; but at all times, the driving NN needs to be able to ingest the sensor input, with or without the map.

Is that a person? Is that a bike? A car? The speed that is required to make this assessment isn't helped if the NN has to consult a map, see if the object appears on the map, and then try to align the map with reality. Humans don't consult pictures of every intersection to identify light posts because doing so would be costly and slow. The data is stored in the network itself.

If we're going to replicate human driving, with the speed that humans do it at, we're going to need to store the "map" inside the network itself, and not save it as a huge, superfluously detailed, "HD Map."

At the end of the day, the primary source of input is sensor readings. Once we have processing power above and beyond what humans possess, then we can afford to spend additional clock cycles, comparing reality with maps, updating them on the fly, and sharing them between cars. But we don't have human-level processing power available. We're struggling with a tiny sliver of that. We need to be extremely judicious with our computational budget. I suspect that the benefits of HD Maps are overstated, and are coming from a very human-centric understanding of the world, or rather how we believe we work.

Localization on the map is not typically done with neural networks. Rather, you have two images of the world: one in your map, and one from your sensors. You distill them down to a more manageable size and then look for the difference. First you attempt to find where in the map (you know your rough position from GPS and many other things) most looks like your sensor data. That tells you where you are. Then you see just how closely it matches. If there are regions where it does not match, the world has changed. You could use neural nets for some of this, but generally that's not the right approach, though the searching problem (looking through your database of images to find the match) has some similarity to neural net convolutions.
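Roughly, the matching step looks something like this toy sketch (tiny occupancy grids and a brute-force search around the GPS guess, purely illustrative, not any team's actual code):

```python
import numpy as np

def localize(map_patch, scan, search_radius=5):
    """Toy correlation-based localization: slide the live scan over the map
    around the GPS-predicted position and keep the best-scoring offset.
    map_patch and scan are small 2D occupancy grids (1 = something there);
    map_patch is padded by search_radius on every side."""
    best_score, best_offset = -1.0, (0, 0)
    h, w = scan.shape
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            window = map_patch[search_radius + dy : search_radius + dy + h,
                               search_radius + dx : search_radius + dx + w]
            score = np.sum(window == scan) / scan.size  # fraction of cells that agree
            if score > best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset, best_score

# After localizing, cells that still disagree at the best offset are the
# candidate "world has changed" regions, to be examined further.
```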

For most teams, the neural network does not drive. It does things like segment and understand the road and recognize obstacles. Then, while some teams attempt to have a neural network make driving decisions from that, many do not.

Generally, the use of neural nets for path planning when fed directly by sensor data into the network is by those who are trying to drive without a map.

Interestingly, you make a lot of strong claims, but I am unclear on what basis you believe they can be made with such conviction.

At this point in time, there are no LVL5 driving AIs. Another way of saying this is that NOBODY has solved any of this.

"You could use neural nets for some of this, but generally that's not the right approach, though the searching problem (looking through your database of images to find the match) has some similarity to neural net convolutions."

Since nobody has solved this, claiming that something is not the right approach seems foolhardy and sets you up for a fall.

My point was that working with an HD Map requires additional computational resources. Google might have opted not to use NNs to process this (I referred to this as pre-processing in my OP, but that's probably poorly labeled on my part), but I suspect the optimal solution is still an NN, given the job of aligning imperfect sensor data with the HD Map: sensors never produce exactly the same output, and changes in lighting, angles, reflections, and shadows make for a very difficult search space. If you're not using an NN for this, then you're going to have a lot of inflexibility; weather, lighting conditions, and object placements are all going to pose matching difficulty and would result in lots of false positives/negatives for changes.

You handwave this all away, saying "if there are regions where it does not match, the world has changed." If the system is not built to be extremely robust and flexible, all that will happen is that every single time your car hits the road, EVERY SINGLE FRAME reports "the world has changed," rendering it useless.
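Here's a minimal sketch of the kind of robustness I mean (hypothetical names and thresholds): you can't act on a single-frame mismatch, you have to accumulate evidence over many frames before declaring that the world changed:

```python
def update_change_evidence(evidence, mismatched_cells, all_cells, threshold=5):
    """Toy persistence filter: a map cell is only declared 'changed' after it
    has disagreed with the live sensor view for several consecutive frames,
    so single-frame glitches (rain, reflections, occlusion) do not trigger it.
    evidence maps cell id -> consecutive-mismatch count."""
    changed = []
    for cell in all_cells:
        evidence[cell] = evidence.get(cell, 0) + 1 if cell in mismatched_cells else 0
        if evidence[cell] >= threshold:
            changed.append(cell)
    return changed
```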

In hindsight, that's maybe why Google is limiting their experiment to such a small area. Are they storing hundreds of photos of every area in different weather, lighting, and seasonal conditions? If that's what they are doing, it is completely untenable and wholly unscalable.

Can you tell me which teams you know of are using neural networks to localize on their maps? It seems like a pretty odd way to do it (and very computationally expensive, even if it did provide an advantage). Please point me to the teams. There was an experimental research paper last year on localizing from Google Street View images with neural networks, but it seems this would be hugely expensive, and it was not as accurate as existing localization algorithms.

https://www.google.com/search?q=localization+using+neural+networks

There are literally dozens of teams attempting it and writing papers about it.

At the O'Reilly AI Conference last month, there was a panel where it was being attempted.

You've been exposed to one working solution and assume it's the only/best way it can be done, even though the state of the art is moving insanely fast as processing power continues to make itself available.

I get it, Google uses HD Maps, they're one of the leaders in AI. You've worked with them. That's fine, they might even have the best implementation of autonomous driving so far (the top of a pile of failed implementations). But I only take issue with your conviction that "THIS IS THE WAY IT MUST BE DONE, THIS IS THE BEST WAY" because.... ??? Because a bunch of people have failed so far, and that's how they all did it?

Or is this merely a result of how a salesman who deals in marketing his own knowledge must present himself to make his sales pitch?

Can you link me to the paper that shows that NNs are slower, worse, and more processing intensive than other better solutions? I haven't been able to find it.

Please avoid this sort of personal, antagonistic discussion style if you want to participate here or get responses.

Nobody is saying it's impossible to localize with neural networks, and in fact I pointed to papers I had seen on that. You entered here with a brash claim based on the assumption that detecting the difference between your map and your live view is very hard because that's hard for neural networks, but it is not the case that everybody localizes with neural networks. It's a minority, to the best of my knowledge.

The search you suggested is not a good one (I had done it before) because it mostly finds research on localization using fuzzier inputs than LIDAR or machine vision. Better to search for specific mentions of localization for cars on roads. Which of the major teams do you claim are localizing on their maps using neural networks?

And once again, pick a different discussion style NOW.

Is the goal of using the HD Map merely to localize? Is the entire point to say "I am here?" or does it go further and identify objects in the scene?

That is a light post (not a human), that is a mailbox (not a bike), that is a fire hydrant (not a child)?

My understanding is that the reason why you would need HD Maps, is because of all that extra data, not merely to determine current location. And I'd assume that the data in the scene is all labeled.

To localize is to just say "ok, using this image, this is where I am (down to the centimeter)," and if that's what we're talking about, the kind of data needed is pretty rudimentary, far less than an HD Map.

My understanding is that the HD Map provides additional functionality, like allowing the computer to assign probabilities to the objects in the scene. "This thing I see on LIDAR has a high probability of being a human because it's not on the HD Map, and therefore has a high probability of walking into the intersection once the light changes" kind of thing.

What algos are used to extract the objects from the sensor data and compare them with the HD Map? AFAIK, one of the best tools for that job is an NN, and everything I keep googling indicates that most are using NNs for this kind of identification. If it's just to localize, you don't need an HD Map; can't you make do with much less data and simpler algos? If you're going to take advantage of an HD Map to assist object identification and provide additional metadata, the computational load increases, as do the algorithmic requirements. What am I missing?

Once we get to the point where we're using the HD Map for object identification, we run into the problems of unreliable sensor data, and stale data. It's raining, and phantom objects are appearing all throughout the lidar data, how do I quickly sift through all of them to identify what is real and what isn't? How do I know that it's the LIDAR that's giving me bad data, and not because there is construction happening and the HD Map is stale?

At any given point in time, what I "see" might not match the HD Map for multiple reasons, not necessarily just because reality changed, but because my sensors got blinded by the sun, or it's raining, or it's foggy. If we're forced to navigate in a world where you have to make decisions on unreliable sensor data, and HD Maps can go stale, how much value is the HD Map providing?

If you have a map, you need to localize on it; any kind of map, really. So naturally your map is used in localization. But that is not its only function. It assists in the things you say, and more: understanding objects, identifying objects that are similar to and in the same location as those in the map, noting objects that are not in the map, noting objects that were in the map but are no longer there, noting the location of traffic lights and road signs and their meanings, knowing where the lines are, knowing what's there. As I point out, it means knowing that the lines of an off-ramp gore are in fact an off-ramp gore and not a newly appearing lane, and knowing the shape of the road and the location of potholes and bumps which may be hard to perceive with sensors.

Maps have a very wide-ranging set of values. And yes, of course, people use NNs extensively in perception and classification, and you can use maps to assist that, or you can use them at a higher level -- the NN says there is a stop sign at (x,y,z) and the map also says this -- very high confidence. The NN sees green traffic lights at A, B and C and a red light at D, and the map says to expect lights in those locations, but the one at D is a turn signal only for the left lane, or whatever else it might be. The road has this shape, so if you see a puddle 3m long at this location, it means the water is 10cm deep. One can go on and on with what maps can tell you. Which is why people want them.
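As a very rough illustration of that higher-level use (toy numbers and class names, not how any particular team actually weights this), the map acts as a prior that the perception output gets combined with:

```python
# Hypothetical set of object classes that a map would normally contain.
MAPPED_CLASSES = {"stop_sign", "traffic_light", "pole", "lane_line"}

def fused_confidence(label, nn_confidence, object_in_map, boost=0.3, penalty=0.15):
    """Toy fusion of a perception score with a map prior. A static object the
    map also expects at that location gets a confidence boost; a static object
    the map says should not be there gets a smaller penalty (the world may
    have changed). Mobile objects (people, cars) never appear in maps, so the
    prior is not applied to them."""
    if label not in MAPPED_CLASSES:
        return nn_confidence
    if object_in_map:
        return min(1.0, nn_confidence + boost)
    return max(0.0, nn_confidence - penalty)

# e.g. the NN reports a stop sign at (x, y, z) with 0.7 confidence and the
# map also lists a stop sign there:
print(fused_confidence("stop_sign", 0.7, object_in_map=True))  # -> 1.0
```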

NNs don't drive. Some early stage R&D projects are exploring NNs that can handle very basic driving. NVIDIA showed an end-to-end system a while back. Waymo has their mid-to-mid ChauffeurNet project:
https://arxiv.org/pdf/1812.03079.pdf

These are interesting projects, but the driving they do is almost laughably crude compared to real self-driving cars in operation today. Real self-driving cars use NNs to provide information to hand-built driving software. Over time the NNs improve and sections of hand-built code can sometimes be eliminated. Waymo's famous example from the early days is that NNs trained to recognize pedestrians reduced their pedestrian detection error rate by something like 95% almost overnight. Another example is Tesla training their image recognition NN to estimate distance as well as object type.

As NNs grow to provide more information with higher accuracy, people talk about NNs "taking over" parts of the driving code. But a total takeover is nowhere in sight.
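To illustrate the division of labor (a purely hypothetical sketch, not any company's code): the NNs produce labeled detections, and hand-built rules consume them:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "pedestrian", "car", "stop_sign"
    distance_m: float  # estimated distance along our path

def plan_speed(detections, current_speed, speed_limit):
    """Hand-built rule layer consuming NN output: slow or stop for things the
    perception networks report in the path. Purely illustrative thresholds."""
    for det in detections:
        if det.label == "pedestrian" and det.distance_m < 30:
            return 0.0                       # stop for a nearby pedestrian
        if det.label == "stop_sign" and det.distance_m < 50:
            return min(current_speed, 10.0)  # begin slowing for the sign
    return min(speed_limit, current_speed + 1.0)  # otherwise ease up to the limit

# The NN side would fill in `detections`; here we fake its output:
print(plan_speed([Detection("pedestrian", 22.0)], current_speed=40.0, speed_limit=50.0))  # -> 0.0
```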

Not only is it nowhere in sight, it's a terrible idea. Even humans don't use solely a NN to drive. We use rules that are taught to us too.

A more interesting question is whether or not an NN will be used to make some driving decisions. They should be, I think. Simple, relatively safe decisions, like when to switch lanes, would possibly benefit from an NN. The decisions would be recommendations, so maybe that doesn't even qualify as a decision.

NNs are good for inductive reasoning. Not as much for deductive reasoning. Driving well requires both.

I think the closest we'll ever come to having a NN make all driving decisions would be if we had a NN that read the drivers manual and then programmed itself.

NNs are great where you can learn statistical patterns from lots of data. They are being used to make predictions about other cars and what they are doing. They are watching human patterns to figure out how to adjust speed and change lanes and steer more smoothly the way humans like. I think they can have value in how you interact with pedestrians in intersections (constrained by some hard rules as well) and for picking which lane to travel in, when to change lanes to get to an exit, avoiding bad drivers etc. Lots of driving decisions that NNs can contribute to.

Tesla talked briefly about the idea of looking at an intersection, with all its lanes feeding in and out, its right turn ramps, islands etc. While there's a lot of variation, there is a reasonable and finite set of possible intersection types. Tesla can watch human drivers drive every type, moving from every lane to every other lane, and learn human ways to do it. That could be handy for training driving.

The other area of interest is driving to break the rules of the road, which humans do all the time. A company will not want to program a car to break the law. But it could let a customer drive for a long time to train a network to drive as they do.

Humans DO SOLELY use an NN to drive.

You even said yourself:
"We use rules that are taught to us too"

The word TAUGHT describes when we take data, in this case "the rules," and encode it in the NN.

Humans are just one big NN, or more specifically, many little NN's connected together by other NNs.

If humans can do inductive and deductive reasoning, by definition NNs can do both. WE ARE NNs. Period. End of story.

Our artificial NNs aren't as good yet. I have my own ideas as to why, and I'm sure they're wrong, as are those of many other AI scientists, but make no mistake: we are NNs, so whatever we can do, eventually an artificial NN can do.

I think that's overly reductionist. You present no evidence for the assertion that humans are no more than NNs, and I'd say it is trivially untrue due to, among other things, the role that biochemistry plays in human decision-making. But moreover, I think it misses the point. Using a neural network to encode things that can be more efficiently and accurately encoded in other ways would be stupid. It is, frankly, never going to happen.

What?

Our brains are "biochemistry" AND they are neural networks. I am utterly baffled by your use of that word, since you seem to believe that biochemistry is somehow something entirely different.

I grant that I have presented no evidence that humans are no more than NNs. I'm not sure what would constitute "more," but we do know that the human brain is made up of neurons that do most of the heavy lifting. The role of glia and astrocytes is an exciting new area, and might help us build better networks once we understand how they contribute to the wiring, but at the end of the day, it's a neural network. If your "more" is something akin to quantum processing, scientists have already largely killed that eventuality. The role of all the neurotransmitters, and why we have many different kinds, is still a ripe area for understanding, but again, still just part of the network. Our artificial neurons don't quite operate the same way, and effectively have only "one" neurotransmitter, so there is definitely opportunity to improve the design, assuming evolution's use of many neurotransmitters is to improve learning (rather than, perhaps, to do the heavy lifting of controlling the physical construction of a brain). But there is no evidence so far that our brain (ALL brains) is anything more than a neural network, a wildly complex, wonderfully efficient one.

The design of artificial neural networks was modeled on the human neuron. And it WORKED. It simply worked. Once we had enough processing power to run them, these artificial neural networks started learning, started processing data in ways that we had never been able to achieve before.

All that's left is improving them: making them deeper and wider, properly wiring them up, segmenting them into logical groups, improving their efficiency, possibly coming up with newer designs that increase the rate of learning.

There is still much for us to learn from biology. It spent billions of years refining and improving the biological brain. We've spent 30 years trying to replicate it.

Brains are more than just neural networks, but more importantly, humans are more than just brains.

It's interesting that you say that artificial neural networks WORKED right after saying that all the autonomous driving teams have failed. Artificial neural networks have done a lot. But they can't do everything that humans can do, and they never will, because humans are more than just neural networks.

https://www.google.com/amp/s/www.theverge.com/platform/amp/2019/5/23/18637358/cruise-gm-self-driving-unprotected-left-turn

"Cruise has developed an algorithm that can figure out how far it can “creep” into the middle of many intersections before trying to make a left turn." It's interesting to think about how they do that. Maybe not a neural network, but it definitely sounds like machine learning. And how far to creep out is a driving decision.

And there are many examples like this, going back to the earliest days of NNs. They are very useful tools for this sort of thing.

My PhD research project is motivated by exactly this need: reducing the dependency on a very detailed map for self-driving localization. Among the reasons, HD maps require constant updates, and performing that task can be, among many other things, costly and time-consuming. It is very interesting news for me that Tesla is going in this direction, and I think it is going to pay off!

Is it possible that Tesla is using their fleet of cars to build maps on the fly (x3), with the next car adding more info to the map (x2), and then, after enough samples and cars driving down the road over and over again, they achieve (x1)? I think they've forgone relying on a third party to provide the map because they have millions of cars on the road. They are the map maker.

That is definitely what they do, but the "map" they build this way is not a detailed map with the exact position of things, rather a more abstract representation, at least at the bulk of locations. There is speculation that they are storing more about particular locations with problems, such as certain intersections.
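A crude sketch of how fleet observations could be merged into such a map (purely illustrative, not Tesla's actual pipeline) looks like voting across passes:

```python
from collections import defaultdict

def merge_pass(map_counts, observations):
    """Each car's pass over a road votes for what it saw at each (rounded)
    location; map_counts maps (location, feature) -> number of passes that
    reported it."""
    for location, feature in observations:
        map_counts[(location, feature)] += 1

def promoted_features(map_counts, min_votes=3):
    """Only features confirmed by enough independent passes make it into the
    shared map, filtering out one-off sensor noise and transient objects."""
    return [key for key, votes in map_counts.items() if votes >= min_votes]

# Usage: three cars report a stop sign at the same rounded location
map_counts = defaultdict(int)
for _ in range(3):
    merge_pass(map_counts, [((123, 456), "stop_sign")])
print(promoted_features(map_counts))  # -> [((123, 456), 'stop_sign')]
```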
