I’ve been interested in videoconferencing for some time, both in what it does well and in what it doesn’t. Of late, many have believed that quality makes a big difference, and HD systems, such as very expensive ones from Cisco, have been selling on that notion.
A couple of years ago Skype added what they call HQ calling — 640 x 480 at up to 30fps. That’s the resolution of standard broadcast TV, though due to heavy compression it never looks quite that good. But it is good and is well worth it, especially at Skype’s price: free, though you are well advised to get a higher end webcam, which they initially insisted on.
So there was some excitement about the new round of 720p HD webcams that are coming out this year, with support for them in Skype, though only on the Windows version. This new generation of cams has video compression hardware in the webcam. Real time compression of 1280x720 video requires a lot of CPU, so this is a very good idea. In theory almost any PC can send HD from such a webcam with minimal CPU usage. Even the “HQ” 640x480 video requires a fair bit of CPU, and initially Skype insisted on a dual core system if you wanted to send it. Receiving 720p takes far less CPU, but still enough that Skype refuses to do it on slower computers, such as a 1.6 GHz Atom netbook. Such netbooks are able to play stored 720p videos, but Skype judges them unsuitable for playing this. On the other hand, modern video chips (such as all Nvidia 8xxx and above) contain hardware for decoding H.264 video and can play this form of video readily, but Skype does not support that.
The other problem is bandwidth. 720p takes a lot of it, especially when it must be generated in real time. Skype says that you need 1.2 megabits for HD, and in fact you are much better off if you have 2 or more. On a LAN, it will use about 2.5 megabits. Unfortunately, most DSL customers don’t have a megabit of upstream and can’t get it. In the 90s, ISPs and telcos decided that most people would download far more than they uploaded, and designed DSL to have limited upload in order to get more download. The latest cable systems using DOCSIS 3 are also asymmetric but offer as much as 10 megabits if you pay for it, and 2 megabits upstream to the base customers. HD video calling may push more people into cable as their ISP.
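To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. The 1.2 megabit figure is Skype's; the 30fps frame rate for HD and the 12 bits per pixel (raw YUV 4:2:0) are my own assumptions for illustration, and the function names are made up.

```python
# Rough arithmetic: how much compression does 720p calling imply?
# Assumptions (not from Skype): 30 frames/s and 12 bits per pixel (raw YUV 4:2:0).

FPS = 30              # assumed frame rate
BITS_PER_PIXEL = 12   # assumed raw pixel format


def raw_bitrate_mbps(width, height, fps=FPS, bpp=BITS_PER_PIXEL):
    """Uncompressed video bitrate in megabits per second."""
    return width * height * fps * bpp / 1_000_000


def compression_ratio(width, height, target_mbps):
    """How many times smaller than raw video the stream has to be."""
    return raw_bitrate_mbps(width, height) / target_mbps


def can_send_hd(upstream_mbps, required_mbps=1.2):
    """True if the upstream link meets the stated 1.2 megabit minimum."""
    return upstream_mbps >= required_mbps


if __name__ == "__main__":
    print(f"raw 720p: {raw_bitrate_mbps(1280, 720):.1f} Mbit/s")
    print(f"ratio at 1.2 Mbit/s: {compression_ratio(1280, 720, 1.2):.0f}:1")
    print(f"768 kbit/s upstream is enough: {can_send_hd(0.768)}")
```

Under those assumptions the encoder has to squeeze the picture down by a factor of a few hundred, which is part of why the result never looks like camcorder footage.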
The Freetalk Everyman HD camera
The Freetalk brand, made by a company called In Store Solutions, is the Skype house brand. The Everyman HD works only with Skype for Windows; there are not even any drivers with it to work with other software that might want to use the compression hardware. The Everyman HD was offered at the surprisingly low price of $50 (now $70), and later this year a “conference room” version of the camera, featuring a wider angle and 4 microphones, is coming out at $140. FaceVsion has the N1, a conference style phone available for about $120, which I have not tried.
The Everyman HD shows a lot of promise, but there is a long way to go. Because it has no OS drivers, setup is as trivial as it can be for a webcam. Plug it in. Skype sees it and is ready to go.
The system (Skype and camera) is very demanding. In trying different systems I found I could only receive video on a fairly fast (Core 2 Duo 8400) box. A Thinkpad with a 2 GHz Core Duo T2500 would receive the HD video, but it would not work properly, spending almost all the time frozen. This machine is above the minimum spec, so it may be a bug. The software would not attempt to send HD to an Atom netbook (or to other OSs like Mac and Linux, which need a newer Skype).
For no technical reason I can fathom, it was also picky on the sending side. For example, even though it takes less CPU to send the in-camera-compressed video, the Atom netbook was never considered able to send it. Skype’s specifications say this, but it makes little sense. While the computer may not be able to receive the video, there is no reason to block it from sending.
As it did with HQ, Skype begins each call at a lower resolution. It then starts bumping the bandwidth to see if you have over a megabit. Once it discovers that, it will attempt the switch to HD. This seems to take it around a minute. When it makes the switch, the video changes from 4:3 to 16:9 widescreen, which the sender can see in their own “my video” window. On most receiving computers, Skype has so much stuff in their windows now that if you have not made the video full-screen, you won’t see more resolution. The new Skype 5 seems to have only two sizes for video — small, zoomed and fullscreen. The older version would resize to your window and had an “actual size” mode.
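The ramp-up behaviour can be sketched as a tiny state machine. This is only my guess at the logic from watching it from the outside, not Skype's actual algorithm: the one-sample-per-second probing, the 60 second window and the class and method names are all assumptions.

```python
from dataclasses import dataclass

HD_THRESHOLD_MBPS = 1.2   # Skype's stated minimum for HD
PROBE_SECONDS = 60        # assumed: the switch seems to take "around a minute"


@dataclass
class CallState:
    """Toy model of a call that starts in low resolution and earns HD."""
    mode: str = "low"       # "low" or "hd"
    good_seconds: int = 0   # consecutive seconds at or above the threshold

    def on_sample(self, measured_mbps: float) -> str:
        """Feed one per-second bandwidth sample; return the resulting mode."""
        if measured_mbps >= HD_THRESHOLD_MBPS:
            self.good_seconds += 1
            if self.good_seconds >= PROBE_SECONDS:
                self.mode = "hd"
        else:
            # Any shortfall resets the probe and drops out of HD.
            self.good_seconds = 0
            self.mode = "low"
        return self.mode


if __name__ == "__main__":
    call = CallState()
    for second in range(1, 62):
        mode = call.on_sample(2.0)
        if second in (1, 59, 60):
            print(second, mode)
```

In this model a single bad sample costs a full minute of low resolution before HD comes back, which is the kind of conservatism the real system shows.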
How good is it? Well, good, but a bit disappointing. Better than 640x480, but nothing like watching 720p from a decent camcorder. The fault of this may lie with the webcam lens itself. I’ve found that webcams tend to have pretty poor lenses. They are not particularly sharp, and they often have a non-flat field, so that one side of the field may be sharper than the other. In almost all cases the center is quite a bit sharper than the corners, as is typical of cheap lenses. The lens on my Everyman HD is not particularly bad but it’s not that good either. In addition, the compression is, I suspect, deliberately blurring the image to be able to make do with less bandwidth, as it does with the HQ mode. I found that my face would never quite be sharp, though it seemed to do better when the autofocus went further back.
The camera is autofocus, which is good, but has issues. If you move so that your head is not in the center of the now wider field, the camera will autofocus on the background. It also hunts, which is disconcerting though you can lock it with a bit of tweaking.
With 2.5 megabits on the LAN, however, there was a clear difference and one that is worthwhile. It just was not what you would expect based on your 720p experiences on camcorders. For example, in a conference room, trying to read text on the whiteboards, I found I could make it out slightly better in 720p than in the HQ-640 mode, but only slightly.
Widescreen in videoconferencing
This camera has a moderately narrow field of view. The way most people sit at their monitor, their head will fill the height of the field. With this camera, it’s better to sit back a bit from the computer for a number of reasons. When you have more resolution, it is sometimes best to use it to give the other person more context rather than increasing the detail on your face. Let them see you as more than a talking head.
While widescreen makes sense for movies, I don’t think it’s the right aspect ratio for 1 on 1 video calls. In some ways I think a 4:3 960x720 might have done just as well and taken a bit less bandwidth. Or it might even make sense for some people to use a tall widescreen, to show a person’s head and upper body. (For that you would want a different mount on the camera and of course software designed to handle it.)
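As a quick check on the "bit less bandwidth" idea, here is the pixel arithmetic in Python. It only compares pixel counts; how bitrate actually scales with frame size depends on the encoder and the scene, so treat the ratio as a rough upper bound on the saving rather than a prediction.

```python
def pixels(width, height):
    """Number of pixels in one frame."""
    return width * height


wide = pixels(1280, 720)        # 16:9 frame used by the HD mode
four_three = pixels(960, 720)   # hypothetical 4:3 frame at the same height
ratio = four_three / wide       # fraction of the pixels the 4:3 frame carries

if __name__ == "__main__":
    print(wide, four_three, ratio)
```

The 4:3 frame carries three quarters of the pixels while keeping the same vertical detail on a single face.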
On the other hand, widescreen is just what you want if there are two people, or a conference table with several people. I tried the Everyman HD at the end of a conference table and it isn’t quite a wide enough camera to do that properly if you have more than a few in the meeting. The more expensive conference version presumably will do that.
As noted, the system only goes HD after about a minute of seeing you have a megabit. Unfortunately, when it encounters a problem — such as might occur if you or somebody else starts making heavy use of your connection with a download, or a brief wireless noise period — it will decide to drop the connection back down to lower resolution. In my tests, we were unable to get it to restore to HD, even when bandwidth was plentiful. Stopping the video and restarting it would restore to HD after the minute delay.
I received a report from Jim Courtney that he has seen it restore to HD but it seems hard to do, and the system is too conservative about it.
I would recommend the software keep a database of bandwidth histories for pairs of IP networks. As it learns the typical bandwidth between you and another endpoint, and if it learns it is high, it should be more aggressive at starting HD or going back to it. Indeed, for very reliable connections it should start in HD.
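A minimal sketch of what such a database might look like, in Python. Everything here is hypothetical: grouping addresses by /24 network, smoothing with an exponential moving average, and the particular thresholds are my assumptions, and the addresses in the demo are documentation-only examples.

```python
import ipaddress


class BandwidthHistory:
    """Remembers typical bandwidth between pairs of IP networks (a sketch)."""

    def __init__(self, prefix_len=24, alpha=0.3):
        self.prefix_len = prefix_len   # assumed grouping: one entry per /24
        self.alpha = alpha             # weight given to the newest measurement
        self._avg = {}                 # (net_a, net_b) -> smoothed Mbit/s

    def _key(self, ip_a, ip_b):
        nets = [
            ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)
            for ip in (ip_a, ip_b)
        ]
        # Sort so that (A, B) and (B, A) share one history entry.
        return tuple(sorted(nets, key=str))

    def record(self, ip_a, ip_b, measured_mbps):
        """Fold a new measurement into the running average for this pair."""
        key = self._key(ip_a, ip_b)
        old = self._avg.get(key)
        if old is None:
            self._avg[key] = measured_mbps
        else:
            self._avg[key] = self.alpha * measured_mbps + (1 - self.alpha) * old

    def typical(self, ip_a, ip_b):
        """Smoothed bandwidth for this pair, or None if never seen."""
        return self._avg.get(self._key(ip_a, ip_b))

    def probe_seconds(self, ip_a, ip_b, hd_mbps=1.2, default=60):
        """How long to probe before trying HD, given what we have learned."""
        typical = self.typical(ip_a, ip_b)
        if typical is None:
            return default           # unknown pair: stay conservative
        if typical >= 2 * hd_mbps:
            return 0                 # very reliable pair: start in HD
        if typical >= hd_mbps:
            return default // 4      # usually fine: probe only briefly
        return default


if __name__ == "__main__":
    history = BandwidthHistory()
    history.record("192.0.2.10", "198.51.100.7", 2.5)
    print(history.probe_seconds("192.0.2.99", "198.51.100.200"))
```

The same lookup could be used after a drop back to low resolution, so that a pair with a good track record gets bumped back to HD quickly instead of waiting out the full probe.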
What does videoconferencing need?
High resolution is indeed a boon for videoconferencing, especially if you want to conference with several people in the same room. Then a wider angle with widescreen is a big plus.
I’m also a big proponent of something Skype and most other systems have never done, which is having a pan motor on the camera and giving the other person control. My Logitech Orbit has such a pan motor, but nobody has ever put in support for it in videoconferencing software. (On Linux there is software to put up a little web page that lets another party control it.) No matter what your resolution, the ability to “turn your head” is a big part of telepresence. When you are in a 1 on 1 conference, it is not an issue, as the sole participant makes sure they are readily visible in the view. But in a 1 to many experience, from boardroom to external caller, the camera is not near anybody and stays fixed. Only the caller really cares.
So even with HD I would like to see more cameras like the Orbit with a little rotation servo, and I would like to see support in Skype and other videocall software for moving it. Tilt is also nice but pan is the main thing you want. New experiments with telepresence robots have shown that their ability to both pan their camera and move their “head” around to address people makes a big difference in how real the experience feels.
Some other factors remain important in such conferencing. Good audio remains vital, and in particular full duplex audio. Skype has always made good audio a focus, and their echo cancellation is very good but it’s still not full duplex. It’s why I recommend that in spite of the good echo cancellation, single callers still use a headset or headphones. This is not so easy in a conference room (though I have sometimes used a headphone splitter for a 2 person call), and alas that is where you need it most, as you can’t have the true dynamic of a quick back and forth conversation by speakerphone. I will be very curious to see how Skype does with the new 4 microphone “beam” cameras.
Latency is the other issue. Sadly, since video compression takes time, it affects latency. This is another reason why hardware compression and a fat pipe will be necessary. Fortunately the hardware compression is getting cheaper and the fat pipes more common.
It would be really nice if these cameras could still do hardware compression in 640x480 mode. After all, even though most CPUs can do that, it’s better if the camera does. Though it’s possible the hardware chips can’t get the bandwidth down where it should be.
I like the mount on the Freetalk, which seems able to sit or clamp with CRTs, desktop panel monitors and thin laptop screens.
Sadly, Skype’s new 5 person video conferencing does not support HD, or even HQ video. I have yet to try this, but as I said above, I have often found that a common situation is to have a meeting room with several people, and a number of solo people calling in. You don’t need much resolution for the talking heads calling in: even 160x120 can be enough for them, and 320x240 is certainly fine. But for the boardroom, having 320x240 means the heads become closer to blobs. I want multi-party conferencing to be able to designate a “main” party, which would be either a meeting room or the principal speaker, and try to give as much resolution and bandwidth to the video from that party as is available, at the expense of the others, even providing HD. For extra credit, the system could note when one person has been speaking for a while and switch them to the featured position.
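Here is a rough Python sketch of the kind of allocation I have in mind. Only the 1.2 megabit figure for HD comes from Skype; the 0.5 megabit threshold for 640x480, the 0.1 megabit budget per thumbnail and all of the names are assumptions picked for illustration.

```python
# Resolution tiers for the featured party: (minimum Mbit/s, resolution).
# Only the 1.2 Mbit/s HD figure is Skype's; the 0.5 threshold is assumed.
TIERS = [(1.2, "1280x720"), (0.5, "640x480"), (0.0, "320x240")]


def pick_resolution(mbps):
    """Best resolution the featured party can get at this bitrate."""
    for minimum, resolution in TIERS:
        if mbps >= minimum:
            return resolution
    return TIERS[-1][1]


def allocate(total_mbps, parties, main, thumb_mbps=0.1):
    """Give every non-main party a small thumbnail stream, the rest to main.

    Returns {party: (resolution, mbps)}.
    """
    others = [p for p in parties if p != main]
    reserved = thumb_mbps * len(others)
    main_mbps = max(total_mbps - reserved, 0.0)
    plan = {p: ("160x120", thumb_mbps) for p in others}
    plan[main] = (pick_resolution(main_mbps), main_mbps)
    return plan


if __name__ == "__main__":
    plan = allocate(2.0, ["boardroom", "alice", "bob"], main="boardroom")
    for party, (resolution, mbps) in plan.items():
        print(f"{party}: {resolution} at {mbps:.1f} Mbit/s")
```

Switching the featured position to whoever has been speaking for a while would then just be a matter of calling `allocate` again with a different `main`.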
Naturally I would like to see this on Linux and Mac quickly too.