Computational photography will turn the photo world upside-down
The camera industry is about to come crashing down thanks to the rise of computational photography.
Many have predicted this for some time, and even wondered why it hasn't happened yet. Most people already take most of their photos with their cell phones, but at this point, if you want to do serious photography -- in spite of what it says on giant Apple billboards -- you carry a dedicated camera, and the more you want from that camera, the bigger the lens on the front of it.
That's because of some basic physics. For a given sensor, a bigger lens aperture gathers more light for each pixel. That means less noise, more ability to get enough light in dark situations, faster shutter speeds for moving subjects and more.
For serious photographers, it also means making artistic use of what some might consider a defect of larger lenses -- only a narrow range of distances is in focus. "Shallow depth of field" lets photographers isolate and highlight their subjects, and give depth and dimensionality to photos that need it.
So why is it all about to change?
Traditional photography has always been about capturing a single frame. A frozen moment in time. The more light you gather, the better you can do that. But that's not the way the eye works. Our eyes are constantly scanning a dynamic scene in real time, assembling our image of the world in our brains. We combine information captured at different times to get more out of a scene than our eyes as cameras can extract in a single "frame" (if they had frames.)
Computational photography adds smart digital algorithms not just to single frames, but to quickly shot sequences of them, or frames from multiple different lenses. It uses those to learn more about the image than any one frame or lens could pull out.
What it does is very interesting, but one key consequence is that it lets you get better results with lesser lenses and sensors. What used to take a big lens and a big sensor gets done almost as well by smaller, cheaper, lighter ones. Ones we can have in our phones, or in some cases, dedicated computational pocket cameras.
Even the most serious photographers take pictures with their phones, or with tiny point-and-shoot cameras. Photographers have a saying: "the best camera is the one you have with you." Light and pocketable matters, because it gets you the photos you would never have captured while lugging a 20lb camera bag.
Up to now, however, there have been a lot of types of photos that are seriously compromised when shot on a phone or small camera. And those who want the best won't accept that. The problem for the camera industry is not that their big cameras won't continue to be better at some things. The problem is the list of things they are necessary for is going to shrink. The more that list shrinks, the harder and harder it becomes to justify carrying the big, heavy, expensive camera. Those cameras get relegated only to particular special photo shoots.
One such target was the total solar eclipse of Aug 21. For that you want an extreme close-up (the eclipsed sun and its corona span only a few degrees of sky), and you face the highest-contrast and most dramatic natural phenomenon in existence. Everybody agrees that even the best cameras can't fully capture it -- but to get anything, you need that serious camera.
Other things that are a long way from being solved without a big camera are low light situations with a changing scene of moving targets. Especially if they aren't that close to you. Professional sports photography. Wildlife photography. Very dim rooms with moving people. Shots from moving vehicles or in high wind. And for now, expensive photography like weddings or advertising shoots.
The trick about computational photography is to use time to compensate for the poor size of your lens and sensor. If your scene isn't moving much, you can combine a series of quick images to get most of the information a big lens would have captured in one shot. You can digitally stabilize the images so that even though the camera is not being held still, you can align the frames together. You can even see what parts of the frame are stable and which are changing, and track different components and improve them independently.
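The align-and-stack idea can be sketched in a few lines. This is a toy model, not any real camera's pipeline: a one-dimensional "scanline" scene with a few bright points, invented shifts standing in for hand shake, and invented noise levels. Alignment here is done by finding the peak of a circular cross-correlation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical static scene: one scanline with a few bright points
# (think stars), so each noisy frame still has distinct features.
scene = np.zeros(256)
scene[[40, 90, 150, 200]] = [1.0, 0.8, 1.2, 0.9]

def shoot(shift, noise=0.1):
    """One handheld frame: the scene shifted by camera shake, plus
    sensor noise (circular shifts keep this toy example simple)."""
    return np.roll(scene, shift) + rng.normal(0, noise, scene.size)

frames = [shoot(s) for s in (-3, 0, 2, 5, -1, 4, 1, -4)]

# "Digital stabilization": estimate each frame's shift relative to the
# first frame from the peak of their circular cross-correlation, then
# undo it so the frames line up.
ref = frames[0]
aligned = []
for f in frames:
    corr = np.real(np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(ref))))
    aligned.append(np.roll(f, -int(np.argmax(corr))))

# Averaging the 8 aligned frames cuts the noise roughly by sqrt(8).
stack = np.mean(aligned, axis=0)

truth = np.roll(scene, -3)  # the scene as the reference frame saw it
single_err = np.sqrt(np.mean((frames[0] - truth) ** 2))
stack_err = np.sqrt(np.mean((stack - truth) ** 2))
```

Real systems do this in two dimensions, with sub-pixel and rotational alignment and with moving-object rejection, but the principle is the same: time buys you the light the small lens didn't.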
In a way, this is what the human brain does with the fairly wimpy camera in our eyes. The eye's sensor is sharp only in the middle, and its lens is not so large, but the brain combines signals from two cameras over time to give you what you think is a sharp view of the world.
What can you do integrating over time and lenses?
- You can gather more total light without blurring, to capture better in low light.
- You can go through a collection of frames to make use of only the least blurry or distorted.
- You can factor out the noise that is the bane of small sensors to make it as if you had much bigger ones.
- You can take pictures at different focal depths to develop a 3D map of the image, and create a shallower depth of field. You also gain the ability to choose what to focus on after the fact, similar to capturing the whole light field.
- Alternately, you can get infinite depth of field if you want it, even from a faster lens.
- You can take advantage of small or deliberate wobble in the system to generate "sub pixel" resolution. If your camera moves half a pixel between two frames, you can actually resolve things in between the pixels. You can also improve colour accuracy this way.
- You can stitch together overlapping photos to generate a wider angle of view without needing a physical wide lens or lens change.
- You can capture at different exposures to gain vastly greater dynamic range than any single frame can have.
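The last item, exposure bracketing, is easy to see in a toy model. Everything below is invented for illustration: a made-up list of scene radiances spanning a 6000:1 range, and a "sensor" that just scales with exposure time, clips at full well, and quantizes to 8 bits.

```python
import numpy as np

# Hypothetical scene radiances, from deep shadow to bright sky --
# a range no single 8-bit frame can hold.
radiance = np.array([0.02, 0.05, 0.4, 1.0, 8.0, 40.0, 120.0])

def expose(r, t):
    """Toy sensor: signal scales with exposure time t, clips at 1.0
    (full well), and quantizes to 8 bits."""
    return np.round(np.clip(r * t, 0.0, 1.0) * 255) / 255

# Bracket three exposures: long for the shadows, short for the sky.
shots = [(t, expose(radiance, t)) for t in (1.0, 1.0 / 8, 1.0 / 128)]

# Merge: for each pixel, use the longest exposure that did not clip,
# scaled back to scene radiance.
merged = None
for t, img in sorted(shots):            # shortest exposure first
    estimate = img / t
    if merged is None:
        merged = estimate               # fallback: the shortest shot
    else:
        merged = np.where(img < 0.99, estimate, merged)
```

The merged result recovers both the 0.02 shadow and the 120.0 highlight to within quantization error, while the single long exposure clips everything bright to the same value.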
This is not everything, but it's a lot. As noted above, the hardest challenge is what to do if things in the scene are moving. If they are, you won't capture frames you can trivially combine. You can get smart, and see what's moving and what's not, but you'll have less quality on the things that are changing than on the background. Sometimes that's perfectly fine -- you even want moving things to blur to show their motion -- but other times they are your prime subject. But unless the motion is dramatic, tools will continue to get better and better at it.
One way to avoid problems on moving objects is to have multiple lenses and sensors take the same picture. There, things don't move but there will be small amounts of parallax on close objects -- but the computation is getting good at dealing with that. In fact, one can even use pictures from different lenses spaced apart to generate "synthetic aperture" and get a gain in resolution.
Consider the static landscape. By panning a lens around, you'll be able to get as wide a field of view as you like. By moving it slowly you'll be able to eliminate noise and get sub-pixel resolution. Shoot for a few seconds with a cheap lens and capture what an expensive view camera used to get.
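The sub-pixel trick sounds like magic, but the arithmetic is simple. Here is a sketch under invented assumptions: a one-dimensional scene, and a coarse sensor whose pixels each average two adjacent fine samples. Move the camera half a pixel between two frames and interleave them.

```python
import numpy as np

# Hypothetical fine-grained scene (one scanline, 200 fine samples).
fine = np.sin(np.linspace(0, 16 * np.pi, 200))

def capture(shift):
    """Coarse sensor: each pixel averages two adjacent fine samples,
    starting `shift` fine samples into the scene."""
    s = fine[shift:shift + 198]
    return (s[0::2] + s[1::2]) / 2.0

frame_a = capture(0)   # 99 coarse pixels
frame_b = capture(1)   # camera wobbled by half a coarse pixel

# Interleave the two frames: now we have samples of the (lightly
# blurred) scene at every fine position -- double the resolution of
# either frame alone.
super_res = np.empty(frame_a.size + frame_b.size)
super_res[0::2] = frame_a
super_res[1::2] = frame_b
```

Either frame alone has 99 samples; the interleaved result has 198, matching the scene as seen through the pixel blur at every fine position. Real "drizzle" and handheld super-resolution pipelines estimate the sub-pixel shifts rather than knowing them, but this is the gain they are after.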
Already the latest iPhone offers a very good "portrait" mode that simulates shallow depth of field, and customers are loving it. Some startups such as Light are promising a camera that is not much bigger than a cell phone which features 16 small cameras (at different focal lengths.) They claim it will provide large-camera quality, as well as a quality zoom range and resolution to match larger, heavier and more expensive cameras. If they don't, somebody will, and it will get better.
This creates the conundrum. While there will always be some shots only a big camera can do well, what happens as that list becomes smaller and smaller? How do you justify not just spending many thousands of dollars on the big camera, but more importantly the burden of carrying it around? What if only one shot a week needs that camera? What if only one shot a month needs it?
In particular, we'll also see a world where you can rent a big camera for that rare trip that needs it. It might be delivered in 30 minutes with one of our Starship Delivery Robots. As the need goes down, the quantity of large cameras made goes down, and that makes them more expensive rather than cheaper -- possibly even too costly to be worth manufacturing. There will be less variety of cameras and lenses made as demand drops.
The tripod -- another mainstay of quality photography -- also drops in value. Oh, there will be shots where you'll want it, but a growing list of times you won't, and it's heavy and expensive.
Worse, their competitor is the cell phone. Most people get a new cell phone every 2 years and there is a new cell phone generation every year. Cell phone people think like computer people, and they work at that pace of innovation. More and more gets left to software where the pace is even faster. Generally, anything that gets into the cell phone out-competes what's outside it. The volumes and market demand are high, and the push for innovation is immense.
Finally, even those who cling to their big cameras will still buy the smaller computational ones. They will do that partly because the big camera makers, not understanding the disruption, will be slower at adopting computational techniques -- they will have them, but not in the same way unless they get very wise. And partly because today, almost all serious big-camera owners also own a point and shoot and a cell phone, and use them at different times. Already owning the new devices, they will answer the question "should I carry the camera bag?" in the affirmative less and less.
Interfaces and future phones
One commenter wondered about the fact that shooting with a phone isn't as nice as with a camera, which I agree with. The phone UI is different, but I think in time it can become superior. In addition, people will start selling "serious photographer grips" for phones which allow them to be held firmly, have buttons and dials for those who like them, and even contain a viewfinder. The fact that phones are general purpose computers with open platforms gives them a big advantage over today's cameras which are mostly closed systems. I would venture one could also design a nice grip for a phone which would fold down to be very small and light compared to any camera.
People already make add-on lenses for phones to offer telephoto, macro, wide angle and fish-eye for VR. Add-on lenses usually mean poor optical quality, but that's not a necessary truth, and things could be even better if phone cameras were designed to facilitate this. Then the serious photographer might find that the phone plus an add-on tele shrinks even further the list of shots the phone can't take -- the ones that demand the heavy bag.
Beyond extraction to invention
What I describe here is software that pulls out hidden image information by combining multiple images. We will also go farther than that, and have software that invents realistic additional detail using neural networks and other tools. Already some work is being done on neural network based image sharpening. A human analogy again explains it. You can show an artist a low resolution image of a face, and they can produce a very realistic looking (though not necessarily accurate) high resolution image of that face, using what their brain knows about how faces look -- that an eyebrow which may have been just a few pixels is really made of individual hairs. In the future, even blurry images will be able to look sharp -- and satisfying to the photographers -- even if not fully accurate. As the AI gets better, however, the results will actually be decently accurate as well as looking accurate. In particular, if the camera has access to high resolution images of the subjects of the photo (as it will for your friends and family) and even of the scenes behind them, it will be able to fill in real information, adjusted to match the lighting of the shot you wanted.
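The "fill in real information from images you already have" idea can be illustrated without any neural network at all. Below is a deliberately tiny sketch -- all the blocks and values are invented -- that uses a nearest-neighbor lookup: a library of known high-resolution detail, and a blurry capture whose content happens to exist in that library (the friends-and-family case), reconstructed by substituting the best-matching known detail.

```python
import numpy as np

# Toy "library" of high-res detail the system has seen before
# (standing in for the high-res photos of your friends and scenes).
blocks = [np.array(b, float) for b in
          ([0, 0, 1, 1], [1, 1, 0, 0], [0, 2, 2, 0], [3, 1, 1, 3])]

def down(x):
    return (x[0::2] + x[1::2]) / 2.0   # crude 2x downsampling

# For each known high-res block, record what it looks like blurred.
library = [(down(b), b) for b in blocks]

# A new low-res capture whose content exists in the library,
# arranged in a different order (blocks 2, 0, 3, 1).
test_hi = np.concatenate([blocks[i] for i in (2, 0, 3, 1)])
test_lo = down(test_hi)

# "Invent" detail: for each low-res patch, substitute the high-res
# block whose downsampled version matches it best.
pieces = []
for j in range(0, test_lo.size, 2):
    patch = test_lo[j:j + 2]
    _, best = min(library, key=lambda kv: np.sum((kv[0] - patch) ** 2))
    pieces.append(best)
recon = np.concatenate(pieces)
```

In this contrived case the reconstruction is exact, because the true detail was in the library. Neural approaches generalize the same move: instead of looking up literal patches, they learn a statistical model of what plausible detail looks like, which is why the results look right before they are right.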