How to do a low bandwidth, retinal resolution video call

Not everybody loves video calls, but there are times when they are great. I like them with family, and I try to insist on them when negotiating, because body language is important. So I've watched as we've increased the quality and ease of use.

The ultimate goals would be "retinal" resolution -- where the resolution surpasses your eye -- along with high dynamic range, stereo, light field, telepresence mobility and VR/AR with headset image removal. Eventually we'll be able to make a video call or telepresence experience so good it's a little hard to tell from actually being there. This will affect how much we fly for business meetings, travel inside towns, life for bedridden and low mobility people and more.

Here's a proposal for how to provide that very high or retinal resolution without needing hundreds of megabits of high quality bandwidth.

Many people have observed that the human eye is high resolution only in the center of attention, the part of the retina known as the fovea centralis. If you make a display that's sharp where a person is looking, and blurry out at the edges, the eye won't notice -- until, of course, it quickly moves to another section of the image and you see the tunnel vision.

Decades ago, people designing flight simulators combined "gaze tracking," where you spot in real time where a person is looking, with the foveal concept, so that the simulator only rendered the scene in high resolution where the pilot's eyes were pointed. In those days in particular, rendering a whole immersive scene at high resolution wasn't possible. Even today it's a bit expensive. The trick is you have to be fast -- when the eye darts to a new location, you have to render it at high-res within milliseconds, or we notice. Of course, to an outside viewer, such a system looks crazy, and with today's technology, it's still challenging to make it work.

With a video call, it's even more challenging. If a person moves their eyes (or in AR/VR their head) and you need to get a high resolution stream of the new point of attention, it can take a long time -- perhaps hundreds of milliseconds -- to send that signal to the remote camera, have it adjust the feed, and then get that new feed back to you. There is no way the user will not see their new target as blurry for way too long. While it would still be workable, it will not be comfortable or seem real. For VR video conferencing it's even an issue for people turning their head. For now, to get a high resolution remote VR experience would require sending probably a half-sphere of full resolution video. The delay is probably tolerable if the person wants to turn their head enough to look behind them.

One opposite approach being taken for low bandwidth video is the use of "avatars" -- animated cartoons of the other speaker which are driven by motion capture on the other end. You've seen characters in movies like Sméagol, the blue Na'vi of the movie Avatar and perhaps the young Jeff Bridges (acted by old Jeff Bridges) in Tron: Legacy. Cartoon avatars are preferred because of what we call the Uncanny Valley -- people notice flaws in attempts at total realism and just ignore them in cartoonish renderings. But we are now able to do moderately decent realistic renderings, and this is slowly improving.

My thought is to combine foveal video with animated avatars for brief moments after saccades and then gently blend them towards the true image when it arrives. Here's how.

  1. The remote camera will send video with increasing resolution towards the foveal attention point. It will also be scanning the entire scene and making a capture of all motion of the face and body, probably with the use of 3D scanning techniques like time-of-flight or structured light. It will also be, in background bandwidth, updating the static model of the people in the scene and the room.
  2. Upon a saccade, the viewer's display will immediately (within milliseconds) combine the blurry image of the new target with the motion capture data, along with the face model data received, and render a generated view of the new target. It will transmit the new target to the remote.
  3. The remote, when receiving the new target, will now switch the primary video stream to a foveal density video of it.
  4. When the new video stream starts arriving, the viewer's display will attempt to blend them, creating a plausible transition between the rendered scene and the real scene, gradually correcting any differences between them until the video is 100% real.
  5. In addition, both systems will be making predictions about what the likely target of next attention is. We tend to focus our eyes on certain places, notably the mouth and eyes, so there are some places that are more likely to be looked at next. Some portion of the spare bandwidth would be allocated to also sending those at higher resolution -- either full resolution if possible, or with better resolution to improve the quality of the animated rendering.
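The foveal falloff in step 1 and the blend in step 4 can be sketched in a few lines. This is only an illustration under my own assumptions: the 2 degree fovea, the inverse falloff with angular distance, and the linear cross-fade are placeholder choices, and all function names are hypothetical.

```python
def foveal_scale(distance_deg: float, fovea_deg: float = 2.0) -> float:
    """Relative resolution to send at a given angular distance from the gaze point.

    Assumption: full resolution inside the fovea, then an inverse falloff.
    """
    if distance_deg <= fovea_deg:
        return 1.0
    return fovea_deg / distance_deg


def blend_weight(elapsed_ms: float, blend_ms: float) -> float:
    """Share of the real video to show, ramping linearly from 0 to 1."""
    if blend_ms <= 0:
        return 1.0
    return max(0.0, min(1.0, elapsed_ms / blend_ms))


def blend_pixel(rendered: float, real: float, weight: float) -> float:
    """Cross-fade one pixel value from the generated view to the real video."""
    return (1.0 - weight) * rendered + weight * real
```

The `blend_ms` parameter is the trade-off knob: a longer blend makes corrections less jarring but keeps the generated image on screen longer.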

The animated rendering will, today, both be slightly wrong, and also suffer from the uncanny valley problem. My hope is that if this is short lived enough, it will be less noticeable, or not be that bothersome. It will be possible to trade off how long it takes to blend the generated video over to the real video. The longer you take, the less jarring any error correction will be, but the longer the image is "uncanny."

There are 100 million photoreceptors in the whole eye, but only about a million nerve fibers going out. It would still be expensive to deliver this full resolution in the attention spot and most likely next spots, but it's much less bandwidth than sending the whole scene. Even if full resolution is not delivered, much better resolution can be offered.

Stereo and simulated 3D

You can also do this in stereo to provide 3D. Another interesting approach was done at CMU called pseudo 3D. I recommend you check out the video. This system captures the background and moves the flat head against it as the viewer moves their head. The result looks surprisingly good.

Digitizing your papers, literally, for the future, with 4K video

I have so much paper that I've been on a slow quest to scan things. So I have high speed scanners and other tools, but it remains a great deal of work to get it done, especially reliably enough that you would throw away the scanned papers. I have done around 10 posts on digitizing and gathered them under that tag.

Recently, I was asked by a friend who could not figure out what to do with the papers of a deceased parent. Scanning them on your own or in scanning shops is time consuming and expensive, so a new thought came to me.

Set up a scanning table by mounting a camera that shoots 4K video looking down on the table. I have tripods that have an arm that extends out but there are many ways to mount it. Light the table brightly, and bring your papers. Then start the 4K video and start slapping the pages down (or pulling them off) as fast as you can.

There is no software today that can turn that video into a well scanned document. But there will be. Truth is, we could write it today, but nobody has. If you scan this way, you're making the bet that somebody will. Even if nobody does, you can still go into the video and find any page and pull it out by hand, it will just be a lot of work, and you would only do this for single pages, not for whole documents. You are literally saving the document "for the future" because you are depending on future technology to easily extract it.
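As a hint of what the first step of such software might look like, here is a minimal sketch that picks one frame for each moment a page lies still on the table. It assumes you already have a per-frame motion score (for example, the mean pixel difference from the previous frame); the function name and thresholds are hypothetical, and real software would still need to crop, deskew and clean up each page.

```python
def pick_page_frames(motion, still_threshold=1.0, min_still_frames=3):
    """Return one frame index per run of still frames.

    motion[i] is an assumed per-frame motion score, e.g. the mean absolute
    pixel difference between frame i and frame i - 1.
    """
    picks = []
    run_start = None
    # A sentinel with infinite motion closes a still run at the end of the video.
    for i, score in enumerate(list(motion) + [float("inf")]):
        if score < still_threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_still_frames:
                picks.append((run_start + i - 1) // 2)  # middle of the still run
            run_start = None
    return picks
```

Short still runs are ignored on purpose, since a hand pausing over the table for a frame or two is not a page.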


Car NAS for semi-offsite backup

Everybody should have off-site backup of their files. For most people, the biggest threat is fire, but here in California, the most likely disaster you will encounter is an earthquake. Only a small fraction of houses will burn down, but everybody will experience the big earthquake that is sure to come in the next few decades. Of course, fortunately only a modest number of houses will collapse, but many computers will be knocked off desks or have things fall on them.

To deal with this, I've been keeping a copy of my data in my car -- encrypted of course. I park in my driveway, so nothing will fall on the car in a quake, and only a very large fire would have risk of spreading to the car, though it's certainly possible.

The two other options are network backup and truly remote backup. Network backup is great, but doesn't work for people who have many terabytes of storage. I came back from my latest trip with 300 GB of new photos, and that would take a very long time to upload if I wanted network storage. In addition, many TB of network storage is somewhat expensive. Truly remote storage is great, but the logistics of visiting it regularly, bringing back disks for update and then taking them back again are too much for household and small business backup. In fact, even being diligent about going down to the car to get out the disk and update it is difficult.

A possible answer -- a wireless backup box stored in the car. Today, there are many low-cost Linux-based NAS boxes and they mostly run on 12 volts. So you could easily make a box that goes into the car, plugs into power (many cars now have 12v jacks in the trunk or other access to that power), wakes up every so often to see if it is on the home Wi-Fi, and triggers a backup sync, ideally in the night.
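The wake-up check could be as simple as the sketch below, run from cron or a timer on the box. The network name, paths and night-time window are made-up placeholders, and it assumes a Linux box with the `iwgetid` tool (from wireless-tools) and `rsync` installed. Encryption of the backup disk is not shown.

```python
import subprocess

HOME_SSID = "home-network"            # hypothetical: your home Wi-Fi name
SOURCE = "backup@homeserver:/data/"   # hypothetical rsync source
DEST = "/mnt/backup/"                 # hypothetical mount point of the disk in the car


def current_ssid(run=subprocess.run):
    """Return the SSID we are connected to, or '' if none or the tool is missing."""
    try:
        result = run(["iwgetid", "-r"], capture_output=True, text=True)
    except OSError:
        return ""
    return result.stdout.strip()


def should_sync(ssid, hour, start_hour=1, end_hour=5):
    """Sync only when parked at home and inside the night-time window."""
    return ssid == HOME_SSID and start_hour <= hour < end_hour


def sync(run=subprocess.run):
    """Pull changes from the home server in archive mode; True on success."""
    return run(["rsync", "-a", SOURCE, DEST]).returncode == 0
```

A scheduler would call `should_sync(current_ssid(), hour)` with the current hour and, if it returns true, call `sync()` and then let the box go back to sleep to spare the car battery.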

Where's my fast, smart, overhead scanner?

Back in 2008, I proposed the idea of a scanner club which would share high-end scanning equipment to rid houses of the glut of paper. It's a harder problem than it sounds. I bought a high-end Fujitsu office scanner (original price $5K, but I paid a lot less) and it's done some things for me, but it's still way too hard to use on general scanning problems.


Cranes, and rooftops, should be decorated

Look at the skyline of any growing city, and what do you see, but a sea of construction cranes. The theory is that each crane will go away and be replaced by an architecturally interesting or pleasing building, but the cycle continues and there are always cranes.

ESticks -- a standardized quick-swap battery proposal

You've probably noticed that with many of our portable devices, especially phones and tablets, a large fraction of the size and weight are the battery. Battery technology keeps improving, and costs go down, and there are dreams of fancy new chemistries and even ultracapacitors, but this has become a dominant issue.

Every device seems to have a different battery. Industrial designers work very hard on the design of their devices, and they don't want to be constrained by having to standardize the battery space. In many devices, they are even giving up the replaceable battery in the interests of good design. The existing standard battery sizes, such as the AA, AAA and even the AAAA and other less common sizes are just not suitable for a lot of our devices, and while cylindrical form factors make the most sense for many cell designs they don't fit well in the design of small devices.

So what's holding back a new generation of standardization in batteries? Is it the factors named above, the fact that tech is changing rapidly, or something else?

I would propose a small, thin modular battery that I would call the EStick, for energy stick. The smaller EStick sizes would be thin enough for cell phones. The goal would be to have more than one EStick, or at least more than one battery, in a typical device. Because of the packaging and connections, that would mean a modest reduction in battery capacity -- normally a horrible idea -- but some of the advantages might make it worth it.

Quick swap

There are several reasons to have multiple sticks or batteries in a device. In particular, you want the ability to quickly and easily swap at least one stick while the device is still operating, though it might switch to a lower power mode during the swap. The stick slot would have a spring loaded snap, as is common in many devices like cameras, though there may be desire for a door in addition.

Swapping presents the issue that not all the cells are at the same charge level and voltage. This is generally a bad thing, but modern voltage control electronics has reached the level where this should be possible with smaller and smaller electronics. It is possible with some devices to simply use one stick at a time, as long as that provides enough current. This uses up the battery lifetime faster, and means less capacity, but is simpler.

The quick hot swap offers the potential for indefinite battery life. In particular, it means that very small devices, such as wearable computers (watches, glasses and the like) could run a long time. They might run only 3-4 hours on a single stick, but a user could keep a supply of sticks in a pocket or bag to get arbitrary lifetime. Tiny devices that nobody would ever use because "that would only last 2 hours" could become practical.

While 2 or more sticks would be best for swap, a single stick and an internal battery or capacitor, combined with a sleep mode that can survive for 20-30 seconds without a battery could be OK.
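The simple "one stick at a time" approach could be sketched like this. The policy of draining the least-charged stick first (so that an empty stick is available to swap while the other is still full) is my assumption for illustration, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stick:
    slot: int
    charge_mwh: float  # remaining energy; 0 means empty


def active_stick(sticks, removed_slot=None):
    """Pick the stick to draw from: the least-charged non-empty stick still in place."""
    usable = [s for s in sticks if s.charge_mwh > 0 and s.slot != removed_slot]
    if not usable:
        return None  # nothing to draw from: rely on the internal reserve or sleep
    return min(usable, key=lambda s: s.charge_mwh)
```

During a hot swap, the device would call `active_stick` with `removed_slot` set to the slot being opened and drop to a lower power mode if the remaining stick cannot supply enough current.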

Meter to show speakers when they are losing the audience

Any speaker or lecturer is familiar with a modern phenomenon. A large fraction of your audience is using their tablet, phone or laptop, doing email or surfing the web rather than paying attention to you. Some of them are taking notes, but it's a minority. And it seems we're not going to stop this; even speakers do it when attending the talks of others.

A foveal digital camera sensor

Earlier I wrote about desires for the next generation of DSLR camera and a number of readers wrote back that they wanted to be able to swap the sensor in their camera, most notably so they could put in a B&W sensor with no colour filter mask on it. This would give you better B&W photos and triple your light gathering ability, though for now only astronomers are keen enough on this to justify filterless cameras.

Where will 3-D cameras like Kinect lead?

This year, I bought Microsoft Kinect cameras for the nephews and niece. At first they will mostly play energetic Xbox games with them, but my hope is they will start to play with the things coming from the Kinect hacking community -- the videos of the top hacks are quite interesting. At first, MS wanted to lock down the Kinect and threaten the open source developers who reverse engineered the protocol and released drivers. Now Microsoft has official open drivers.

Building a house organizing robot with image search

There are many fields that people expect robotics to change in the consumer space. I write regularly about transportation, and many feel that robots to assist the elderly will be the other big field. The first successful consumer robot (outside of entertainment) was the Roomba, a house cleaning robot. So I've often wondered about how far we are from a robot that can tidy up the house. People got excited when a PR2 robot was programmed to fold towels.

This is a hard problem because it seems such a robot needs to do general object recognition and manipulation, something we're pretty far from doing. Special purpose household chore robots, like the Roomba, might appear first. (A gutter cleaner is already on the market.)

Recently I was pondering what we might do with a robot that is able to pick up objects gently, but isn't that good at recognizing them. Such a robot might not identify the objects, but it could photograph them, and put them in bins. The members of the household could then go to their computers and see a visual catalog of all the things that have been put away, and an indicator of where it was put. This would make it easy to find objects.

The catalog could trivially be sorted by when the items were put away, which might well make it easy to browse for something put away recently. But the fact that we can't do general object recognition does not mean we can't do a lot of useful things with photographs and sensor readings (including precise weight and other factors) beyond that. One could certainly search by colour, by general size and shape, and by weight and other characteristics like rigidity. The item could be photographed in a 360 view by being spun on a table or in the grasping arm, or with a rotating camera. It could also be laser-scanned or 3D photographed with new cheap 3D camera techniques.

When looking for a specific object, one could find it by drawing a sketch of the object -- software is already able to find photos that are similar to a sketch. But more is possible. Typing in the name of what you're looking for could bring up the results of a web image search on that string, and you could find a photo of a similar object, and then ask the object search engine to find photos of objects that are similar. While ideally the object was photographed from all angles, there are already many comparison algorithms that survive scaling and rotation to match up objects.

The result would be a fairly workable search engine for the objects of your life that were picked up by the robot. I suspect that you could quickly find your item and learn just exactly where it was.
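To make the idea concrete, here is a minimal sketch of such a catalog with search by weight and colour, falling back to "most recently put away" when no colour is given. The record fields, the 20% weight tolerance and the use of a single average colour per object are all simplifying assumptions. Sketch and image similarity search would need real computer vision on top of this.

```python
from dataclasses import dataclass


@dataclass
class Item:
    photo: str        # path to the photo the robot took (hypothetical)
    bin_label: str    # where the robot put the object
    weight_g: float
    rgb: tuple        # average colour of the object
    stored_at: float  # timestamp when it was put away


def colour_distance(a, b):
    """Euclidean distance between two RGB colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search(catalog, rgb=None, weight_g=None, weight_tol=0.2, limit=5):
    """Filter by approximate weight, then rank by colour or by recency."""
    candidates = list(catalog)
    if weight_g is not None:
        candidates = [i for i in candidates
                      if abs(i.weight_g - weight_g) <= weight_tol * weight_g]
    if rgb is not None:
        candidates.sort(key=lambda i: colour_distance(i.rgb, rgb))
    else:
        candidates.sort(key=lambda i: i.stored_at, reverse=True)
    return candidates[:limit]
```

For example, `search(catalog, rgb=(255, 0, 0), weight_g=100)` would list reddish objects of roughly 100 grams, each with the bin where it was put.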

Certain types of objects could be recognized by the robot, such as books, papers and magazines. For those, bar-codes could be read, or printing could be scanned with OCR. Books might be shelved at random in the library but be easily found. Papers might be hard to manipulate but could at least be stacked, possibly with small divider sheets inserted between them with numbers on them, so that you could look for the top page of any collected group of papers and be told, "it's under divider 20 in the stack of papers."

Video windows that simulate 3-D

I'm waiting for the right price point on a good >24" monitor with a narrow bezel to drop low enough that I can buy 4 or 5 of them to make a panoramic display wall without the gaps being too large.

Negative copier for digital camera

As digital cameras have developed enough resolution to work as scanners, such as in the scanning table proposal I wrote about earlier, some people are also using them to digitize slides. You can purchase what is called a "slide copier" which is just a simple lens and holder which goes in front of the camera to take pictures of slides. These have existed for a long time as they were used to duplicate slides in film days.


Scanning table for old digital cameras

I have several sheetfed scanners. They are great in many ways -- though not nearly as automatic as they could be -- but they are expensive and have their limitations when it comes to real-world documents, which are often not in pristine shape.


RV daisy chain power grid

After every RV trip (I'm back from Burning Man) I think of more I want RVs to do. This year, as we have for many years, we built a power distribution system with a master generator rather than having each RV run its own noisy, smelly and inefficient generator. However, this is expensive and a lot of work for a small group; for a larger group it is cheap, but still a lot of work.

There's been a revolution in small generator design of late thanks to the declining cost of inverters and other power conversion. A modern quality generator feeds the output of its windings to circuits to step up and step down the voltage to produce the required power. The output power is cleaner and more stable, and the generator is spun at different RPMs based on the power load, making it quieter and more efficient. With many models, you can also combine the internal output of two generators to produce a higher power generator.

RVs have come with expensive old-style generators that are quieter than cheap ones, and which produce better power, but today they are moving to inverter generators. With an inverter generator, it's also possible to draw on the RV batteries for power surges (such as starting an AC or microwave) beyond what the generator can do.

I'm interested in the potential for smarter power, so what I would like to see is a way for a group of RVs with new generation power systems to plug together. In this way, they could all make use of the power in the other vehicles, and in most cases only a fraction of the generators would need to be running to provide power to all. (For example, at night, only one generator could power a whole cluster. In the day, with ACs running, several would need to run, but it would be very unlikely to have to run all, or even 75% of them.)
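A rough way to see how few generators a cluster needs is sketched below. It assumes identical generators and a fixed headroom for surges. Both are simplifications of mine, and the wattages in the usage note are made-up figures.

```python
import math


def generators_needed(load_watts, generator_watts, headroom=0.2):
    """Fewest identical generators to run while keeping spare headroom for surges."""
    if load_watts <= 0:
        return 0
    usable = generator_watts * (1.0 - headroom)
    return math.ceil(load_watts / usable)
```

With hypothetical 2000 watt generators, a night-time load of 1200 watts for the whole cluster gives `generators_needed(1200, 2000)` of 1, while a daytime load of 9000 watts with air conditioners running gives 6.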

RV water tank should have UV disinfector

RVs all have a fresh water tank. When you rent one, they will often tell you not to drink that water. That's because the tanks are being filled up in all sorts of random places, out of the control of the rental company, and while it's probably safe, they don't want to promise it, nor disinfect the tank every rental.

I recently got a small "pen" which you put in a cup of water and it shines a UV light for 30 seconds to kill any nasties in the water. While I have not tried to test it on infected water, I presume that it works.

Use the battery to power AC startup surge in an RV

Many RVs come with generators, and the air conditioner is the item that demands it be a high power generator. The generator needs to be big enough to run the AC, and in theory let you do other things like microwave when you run it. It also has to be big enough to handle the surge that the AC motor takes when the AC starts up.


Electronic panorama head with rotation sensor

In my quest for the ideal panorama head, I have recently written up some design notes and reviews. I found that the automatic head I tried, the beta version of the Gigapan, turned out to be too slow for my tastes. I can shoot by hand much more quickly.

The Kitchen of the Future

In the early days of microprocessors, people selling home computers tried to come up with reasons to have them in the home. The real reason you got one was hobby computing, but the companies wanted to push other purposes. A famous one was use in the kitchen. The computer could store your recipe file, and wonder of wonders, could change the amounts of the ingredients based on how many servings you wanted to make.

This never caught on, but computers have come a long way. But still, I mostly see nonsense applications promoted. For example, boosters of RFID tell us that our fridges will be able to track when things went in the fridge, and when it's time to buy more milk. We should give up huge amounts of privacy to figure out when to order more milk?

With that track record, I should stay away from the area, but let me propose some interesting approaches in the kitchen.

The cooking area should have a screen, of course. Screens are already in the kitchen to watch TV. While you could (and would) put digital recipes up on the screen, I imagine going further, and having TV cooking shows, where you watch a chef prepare a dish. You would be able to pause, rewind and do everything that digital video does, but the show would also come along with encoded instructions tagged to points in the video. When the recipe calls for cooking for 5 minutes, the computer would start appropriate timers.
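The instructions tagged to points in the video could be as simple as a list of cues, each with a timestamp and a timer length. The sketch below shows one hypothetical way the player could notice which cues were crossed since it last checked; the field names and the sample recipe steps are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    video_time_s: float  # position in the show where the instruction occurs
    label: str           # what the chef just told you to do
    timer_s: int         # how long the kitchen timer should run


def cues_reached(cues, previous_s, current_s):
    """Cues whose timestamp was crossed as playback moved from previous_s to current_s."""
    return [c for c in cues if previous_s < c.video_time_s <= current_s]
```

The player would call `cues_reached` on each playback tick and start a timer of `timer_s` seconds for every cue returned. Because only forward crossings count, rewinding does not by itself trigger anything.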

The computer should have a speech interface, and a good one, allowing you to call out for timers, and to name ingredients and temperatures. More on that later.

The first thing I would like to see is smart, digital wireless scales in a lot of places. A general one on the counter of course, but quite possibly also built into the rack above the burner which holds the pot. You can get scales built into spoons and scoops now, and they could be bluetooth.


4-segment tripod where bottom segment screws in

I have tripods with both 3 segments and 4 segments. A 4-segment tripod has 3 clamps per leg, which means 9 of them to open and close in extending and collapsing the tripod. That's a pain. Enough of one that you sometimes find yourself asking whether a shot is worth setting up the tripod. But even 3 segment tripods are only a bit better.

Extensible sockets for wrench set

Ok, this is something I have to believe somebody else has thought of, but I haven't seen it, so I thought I would ask readers if they have, and if not, to put it forward.

Everybody has a socket wrench set. The wrench heads tend to come with a square hole in the top, typically 1/2" or 3/8" square, into which the square drive from the ratchet inserts. There are sometimes spring-locks to keep it in place.


