A Skype Webcam Mother's Day Brunch
A brunch was planned for my mother's house on Sunday, but being 2,500 miles distant, I decided to try to attend by videoconference. Recently Skype has started supporting what it calls a "high quality" videoconference, which is 640x480 at 24 to 30 frames per second. At its base, that's a very good resolution, slightly better than broadcast TV.
This requires fairly modern hardware, which my mother doesn't have. It needs a dual-core processor to be able to compress the video in real time, and a decently fast processor to decompress it. It wants 384 kbit/s of upstream bandwidth, ideally more, which she has in theory but not always in practice. It demands Windows XP. And, artificially, it demands one of three of Logitech's newest and most expensive webcams: the Orbit AF, the Quickcam Pro for Notebooks or the Pro 9000 for desktops. These are the same camera in 3 packages -- I took the Orbit AF, which also includes a pan/tilt motor.
Skype's decision to only work with these 3 cameras presumably came from a large kickback from Logitech. Admittedly these are very nice webcams. They are true-HD webcams that can natively capture at 1600x1200. They are sharp and better in low light than most webcams, and they come with a decent built-in microphone that appears as a USB audio device -- also good. But they aren't the only cameras capable of a good 640x480 image; many of Logitech's older high-end webcams can produce one too. They retail for $100 or more, but via eBay sellers I got the Orbit AF for about $75 shipped and the Pro for Notebooks shipped quickly within Canada for $63. Some versions of Skype allow you to hack the config file to tell it to do 640x480 with other quality cameras. That is easy enough for me, but I felt it was not something to push on the relatives quite yet. On the Mac, the config hack is your only choice.
Testing on my own LAN, the image is impressive when bandwidth is no object. It is indeed comparable to broadcast TV. That's 4 times the pixels and twice the framerate of former high-end video calls, and 16 times the pixels of what most people are used to. And the framerate is important for making the call look much more natural than older 10fps-level calls. Across the open internet, it's not as good. I have 768 kbits of upstream, but most people don't, and as a result the image is blurry and the system has trouble sustaining 24fps. Skype tries to adjust quality as bandwidth comes and goes. Unfortunately, the main thing it does after initial tests is decrease quality during periods of lower bandwidth (which may be due to wireless packet loss or other apps on the network). As such it will sometimes decide to drop you to 320x240 pixels, and I never saw it bump you back up -- you have to stop and restart the video, or even end and restart the call.
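The pixel ratios above can be checked with a little arithmetic. This is only a sketch: the 320x240 and 160x120 figures are my assumptions for "former high-end" and "what most people are used to", chosen because they match the 4x and 16x ratios, and the bits-per-pixel figure simply divides Skype's stated 384 kbit/s minimum over the frame, ignoring audio and protocol overhead.

```python
def pixels(width, height):
    """Number of pixels in one frame."""
    return width * height

hq = pixels(640, 480)        # Skype "high quality" mode
former = pixels(320, 240)    # assumed former high-end resolution
typical = pixels(160, 120)   # assumed resolution most people are used to

ratio_former = hq // former    # how many times more pixels than 320x240
ratio_typical = hq // typical  # how many times more pixels than 160x120

def bits_per_pixel(kbit_per_s, width, height, fps):
    """Average bits available per pixel per frame at a given bitrate."""
    return kbit_per_s * 1000 / (pixels(width, height) * fps)

# Budget at the 384 kbit/s minimum, 640x480, 24 frames per second.
budget = bits_per_pixel(384, 640, 480, 24)

print(ratio_former, ratio_typical, round(budget, 3))  # prints: 4 16 0.052
```

At roughly a twentieth of a bit per pixel, it is not surprising that the picture turns blurry as soon as the link falls short of the minimum.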
On Windows XP, getting started was pretty easy. On my sister-in-law's Vista-based dual-core notebook, however, it was a nightmare. Her installs crashed and put her machine in an unstable state. Even once we learned that on Vista you must click "install as administrator" to avoid a confusing numeric error message, it was still unstable. (I am all in favour of the requirement to be administrator to install. I am not in favour of Vista-tested software failing to handle that issue more gracefully, nor of it making the machine totally unstable with blue screens of death.) In the end, after much frustration, we installed on a nephew's XP-based laptop, which worked but may not have had as much CPU available.
On my end I used both a Core Duo laptop (which seemed fine) and an E8400 Core 2 Duo (one of Intel's fastest desktop chips), which seemed to have plenty of CPU. On Linux (with a Core 2 Duo E6600) the CPU is fine, but Skype on Linux is many versions behind and only just got video. It refuses to do more than 320x240 at 15fps as far as I can tell.
Echo cancellation has become nothing short of superb, at least with one side on a speakerphone, particularly if you get some distance between the speakers and the mic. However, since headsets are still always better than speakerphone, on our end I set up dual headphones, and used the mic in the camera as a shared mic. It would improve things even more if it were easy to use two headsets with 2 microphones and mix their audio. You can do that in a couple of ways, none easy enough to be worth my time:
- Have two mic inputs (for example, due to an extra USB mic input, or a USB headset) and a software mixer that will mix the two and feed it to Skype
- Have an amplified mic, and use the computer's "line in" with it, and "mic in" with the regular mic, and mix.
- Skype might include a mixer when it sees two audio sources itself to help support this. It is common for fancy conference speakerphones to mix multiple microphones. You also need to do some processing to remove the other person's voice from each mic, which you can do best with two channels.
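The mixing step itself is simple; the hard parts are capturing from two devices and routing the result into Skype. As a rough sketch of the software mixer idea, assuming each mic delivers signed 16-bit PCM samples at the same sample rate (the function name and gain parameters are my own, and this does no echo removal):

```python
import array

def mix_pcm16(a, b, gain_a=0.5, gain_b=0.5):
    """Mix two sequences of signed 16-bit samples into one array.

    The shorter input is padded with silence, and the weighted sum is
    clipped to the 16-bit range so loud passages do not wrap around.
    """
    length = max(len(a), len(b))
    mixed = array.array("h")
    for i in range(length):
        sample_a = a[i] if i < len(a) else 0
        sample_b = b[i] if i < len(b) else 0
        value = int(round(sample_a * gain_a + sample_b * gain_b))
        mixed.append(max(-32768, min(32767, value)))
    return mixed

# Two short made-up sample buffers standing in for the two microphones.
mic_one = [1000, -2000, 30000]
mic_two = [3000, 2000]
print(list(mix_pcm16(mic_one, mic_two)))  # prints: [2000, 0, 15000]
```

Equal gains of 0.5 avoid clipping at the cost of volume; a real mixer would more likely use full gain with a limiter.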
However, for the large group, this is not practical even if it were possible, so speakerphone it is, and that generally worked great, though you can't get the truly full duplex sense-of-being-there audio that fancier tools can deliver.
Being able to go wireless on a laptop was very 21st century, in that we could pick up the laptop and go walking, for a tour around the house. At the other end, they could take "me" from room to room as people moved. However, there is a risk with wireless of going through patches of poorer signal and even disconnecting. Skype might want to have a "mobile wireless mode" that doesn't end calls just because of these sorts of troubles.
I didn't check to see if we were going through a supernode. We both had UPnP-based NAT boxes, so I presume not. On the next test I will do some deliberate NAT port forwarding to ensure that no supernodes are used, since we did indeed get the quality kicked down a notch on a couple of occasions.
I never quite attained the "sitting at the table" dynamic. That's a personal thing we just need to get used to. There is a temptation either to ignore the person on the screen, or to feel you must talk to them all the time since this is a "call." Perhaps with time people can learn to treat the remote participant more like a local. Audio may be an important key to that, even more than video -- the ability to easily talk at the same time and have it all come through properly.
The children were completely fascinated with Logitech's "Avatar" and "Face accessories" tools in the camera driver, which find your mouth, nose, eyes and eyebrows, and let you add things to your face or animate a complete avatar. They kept wanting to see them again. It's an irony of course that some of the avatar stuff was developed for low-bandwidth conferencing, on the idea that the transmitter would distill what it saw down to very low-bandwidth motion capture info, and the avatar would be animated at the other end. However, in this case the avatar was just very compressible video.
In the kitchen, I had them connect me to the small HDTV. Being life-size did seem to make a difference in how people responded to me. Big screens also mean the camera can be put near the far-away screen to produce a lower angle between the camera and the face, reducing the distracting "you're not looking at me" aspect of all videoconferences. (Some fancy videoconferencing systems are trying to see if they can use two cameras to artificially re-map the eyes for proper eye contact, or find a way to put sensors behind semi-transparent screens.)
Some of Logitech's older cameras are now very cheap on eBay. These have 1.3 megapixel resolution -- still HD -- so I am getting several and will experiment with the config hacks. I could probably make a simple program people could run that edits their Skype config for them, with no knowledge of XML text editing needed.
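Such a program could be quite short. Here is a minimal sketch using Python's standard XML library. The element names (Lib, Video, CaptureWidth, CaptureHeight, Fps) and the layout of the file are assumptions for illustration only and should be checked against a real Skype config before trusting any of it. The function name is mine as well.

```python
import xml.etree.ElementTree as ET

def force_vga_capture(config_xml, width=640, height=480, fps=30):
    """Return config XML with the video capture settings set.

    NOTE: the Lib/Video/Capture* element names are hypothetical
    placeholders, not a confirmed Skype config schema.
    """
    root = ET.fromstring(config_xml)
    lib = root.find("Lib")
    if lib is None:
        lib = ET.SubElement(root, "Lib")
    video = lib.find("Video")
    if video is None:
        video = ET.SubElement(lib, "Video")
    settings = (("CaptureWidth", width), ("CaptureHeight", height), ("Fps", fps))
    for tag, value in settings:
        node = video.find(tag)
        if node is None:
            node = ET.SubElement(video, tag)  # add the setting if missing
        node.text = str(value)                # otherwise overwrite it
    return ET.tostring(root, encoding="unicode")

sample = "<config><Lib><Video><Fps>15</Fps></Video></Lib></config>"
print(force_vga_capture(sample))
```

A real tool would also need to find the config file for the logged-in Skype user, back it up, and make sure Skype is not running while it is edited.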
Skype made its mark through having a very easy path to installing and getting running. Their next step should be to produce an all-in-one "Get a video call station going" package, which involves:
- The camera, of course.
- A program that downloads the latest Skype and the latest video camera drivers, if needed, and installs it all with one click.
- Very minimal install questions. For example, perhaps a pre-created Skype ID for those who don't yet have one.
- Don't even bother with the other Skype config (microphone level checks) until later if you can. One click install.
- When doing that, have Skype check all audio sources for you and figure which is best, then let you change it later.
- After the mic is determined, play sound out the speakers and confirm it with the mics, no need to ask the user in most cases.
I'm interested in such a "drop and load" approach because another useful tool would be getting videocalls to people in nursing homes and hospitals with minimal fuss, as well as to grandparents living alone who aren't very computer savvy.
Another thing that might be useful is support for webcams which generate compressed video directly in hardware. They could generate both H.264 (for recording) and other packet-loss robust codecs for video calls. While the downside of this is you can't readily change or improve the hardware codecs, it would mean the video calls could take place without requiring much CPU for encoding, thus saving the CPU for decoding. Much older computers could do high quality calls.
I think this would be particularly good for the next level, which is going to 720 line HD. This will be practical as people start having a megabit or more of upstream, at least to some degree.
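A back-of-envelope check on that "megabit or more" figure: scale Skype's 384 kbit/s minimum for 640x480 by the increase in pixel count. This assumes the same bits per pixel and the same framerate, which is a crude simplification since codecs do not scale linearly, so treat it as an estimate only.

```python
def scaled_bitrate(base_kbit, base_res, target_res):
    """Scale a bitrate in proportion to the number of pixels per frame."""
    base_w, base_h = base_res
    target_w, target_h = target_res
    return base_kbit * (target_w * target_h) / (base_w * base_h)

# 384 kbit/s at 640x480, scaled up to widescreen 720-line HD (1280x720).
hd_kbit = scaled_bitrate(384, (640, 480), (1280, 720))
print(hd_kbit)  # prints: 1152.0
```

1280x720 has three times the pixels of 640x480, so the estimate lands a little over a megabit of upstream.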
We should also start moving to widescreen webcams. Most screens today are going widescreen, and even high-resolution 4:3 screens can readily display the 1280 pixels across of widescreen 720 line HD. A widescreen webcam is what you want for any group call. And for a single person call, just turn it on its side for a nice portrait-aspect view.