The failure of the pan-tilt camera in video calls
This year, we stayed with Kathryn's family for the holidays, so I attended dinner in my own mother's home via Skype. Once again, the technology was frustrating. And it need not be.
There were many things that could be better. Those of us who Skype regularly forget that there is still hassle for those not used to it. Setting up good videoconferencing is still work. As I have found is always the case in a group-to-solos videoconference, the group folks do not care nearly as much about the conference as the remote solos, so a fundamental rule of design here is that if the remotes can do something, they should be the ones doing it, since they care the most. If there is to be UI, leave the UI to the remotes (who are sitting at computers and care) and not to the meeting room locals. Many systems get this exactly backwards -- they imagine the meeting room is the "master" and thus has the complex UI.
In this family setting, however, the clearest problem for me is that no camera can show the whole room. It's like sitting at the table unable to move your head, with blinders on. You can't really be part of the group. You also have to be away from the table so everybody there can see you, since screens are only visible over a limited viewing angle.
One clear answer to this is the pan/tilt camera, which is to say a webcam with servo motors that allow it to look around. This technology is very cheap -- you'll find pan/tilt IP security cameras online for $30 or less, and there are even some low priced Chinese made pan/tilt webcams out there -- I just picked another up for $20. I also have the Logitech Orbit AF. This was once a top of the line HD webcam, and still is very good, but Logitech no longer makes it. Logitech also makes the BCC950 -- a $200 conference room pan/tilt webcam which has extremely good HD quality and a built-in hardware compressor for 1080p video that is superb with Skype. We have one of these, and it advertises "remote control" but in fact all that means is there is an infrared remote the people in the room can use to steer the camera. In our meetings, nobody ever uses this remote for the reason I specify above -- the people in the room aren't the motivated ones.
This is compounded by the fact that the old method -- audio conference speakerphones -- has a reasonably well understood UI. Dial the conference bridge, enter a code, and let the remotes handle their own calling in. Anything more complex than that gets pushback -- no matter how much better it is.
Sadly, in spite of the Orbit AF being one of Skype's first officially blessed cameras, none of the consumer videoconferencing systems offers a means for a remote party to control the camera. The motorized functions are marketed for use in "face tracking" where the camera follows you as you move around the room -- a feature almost nobody turns on -- or for traditional "webcam" or security camera use where you have one-way video streaming. In many cases, there are only Windows drivers to control the pan/tilt function, and Mac users are left out. (Linux users of course have worked out how to do it but Linux is only sparsely used for videoconferencing.)
Getting some remote control going
To make this work you need something seamless, which means integrated with your video tool, be it Skype, Hangout, gotomeeting or any of the others. It needs to be something where the remotes, as part of their window, get a camera control tool. There's really little need to worry about contention between the different remotes -- they all have the same goal and will quickly appoint who is to steer the camera. A simple anti-contention tool that gives exclusive control for a few seconds after a press would stop any fights steering it back and forth.
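The anti-contention idea is simple enough to sketch in a few lines. What follows is only an illustration, not code from any real conferencing tool: the class name `SteeringLock`, the method `try_steer` and the 3-second hold are all assumptions made up for the example. Whoever presses a steering button gets exclusive control for a few seconds, and each further press by the same person renews it.

```python
import time


class SteeringLock:
    """Gives one remote exclusive camera control for a short time after each press.

    Hypothetical sketch: the names and the default hold time are assumptions.
    """

    def __init__(self, hold_seconds=3.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.holder = None          # remote who currently holds control
        self.expires_at = 0.0       # clock value at which the hold lapses

    def try_steer(self, user):
        """Return True if `user` may steer right now, False if someone else holds control."""
        now = self.clock()
        held_by_other = (
            self.holder is not None
            and self.holder != user
            and now < self.expires_at
        )
        if held_by_other:
            return False
        # Grant (or renew) control for this user.
        self.holder = user
        self.expires_at = now + self.hold_seconds
        return True
```

A conference tool would call `try_steer` on every pan/tilt press and simply ignore presses that return False, perhaps showing the losing remote who currently has the camera.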
There are some solutions, but they are too cumbersome for regular use. One small developer has a program called Telerobo for the Logitech Orbit. This program is a Windows binary, and requires the conference computer be on Windows and the single remote operator's computer also run Windows. The former requirement makes some sense as there are no official drivers other than for Windows. The program would be more useful if it used a web control rather than a remote binary client -- in my environment I am often on video calls where none of the remotes run Windows.
For the lovely BCC950 camera, they don't even provide much in the way of drivers for Windows. Another coder built a cloud remote for the BCC950, but it's just a hack, not a finished product, and is again Windows only.
It's a kludge, but another way you can sometimes get remote control is to bring up the pan/tilt/zoom control on the local PC, and then share that computer's screen with one of the remotes using a tool like VNC. Then the remote user can click onto the conference room PC and steer the camera. Obviously this is non-trivial to set up and maintain. Some of the cameras, meant for use as streaming security cams, also offer internal web interfaces that can, with work, be exported to an outsider (especially over a VPN), allowing remote steering. Of course the Linux drivers for the Orbit AF have this ability, but again it's all more setup than you can expect from casual conference room users.
A final possible kludge is to use an "infrared over IP" box (used for home automation) to permit remote control of a camera like the BCC950 with its IR remote control.
Nothing approaches the interface that should be, which is plug and play. There should be a standard API on all the OSs for pan/tilt/zoom and all the conference tools should support it. The remote users should get a new control on their screens when in calls to such devices. For security reasons, you may want to have the people with the PTZ camera confirm permission to steer the camera, either permanently (in a conference room) or case-by-case in your home. (I will admit to having done video conferences with a nice shirt and just my underwear when at home, and would not want people to aim the camera down without permission. Admit it, you've done this too.)
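The permission step could be as small as the sketch below. No such standard API exists today, so everything in it is hypothetical: the names `SteerPolicy` and `SteerPermission`, and the `ask_local` callback that stands in for whatever prompt the people at the camera would see. It shows the two cases described above: a conference room that always allows steering, and a home setup that asks once per remote in each call.

```python
from enum import Enum


class SteerPolicy(Enum):
    ALWAYS_ALLOW = "always"     # e.g. a conference room that has opted in permanently
    ASK_EACH_CALL = "ask"       # e.g. at home, confirm case by case


class SteerPermission:
    """Decides whether a remote may steer the local PTZ camera (hypothetical sketch)."""

    def __init__(self, policy, ask_local):
        self.policy = policy
        # ask_local(remote_name) -> bool is an assumed callback that prompts
        # the people sitting with the camera and returns their answer.
        self.ask_local = ask_local
        self.approved = set()   # remotes already approved during this call

    def may_steer(self, remote_name):
        if self.policy is SteerPolicy.ALWAYS_ALLOW:
            return True
        if remote_name in self.approved:
            return True
        if self.ask_local(remote_name):
            self.approved.add(remote_name)
            return True
        return False

    def end_call(self):
        """Forget per-call approvals so the next call asks again."""
        self.approved.clear()
```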
Of course there are high end (multi-thousand dollar) video room setups that have nice expensive pan/tilt cameras, though even those see rare use. One nice one (that I am sure cost a fortune) for classrooms has a microphone and button at every desk in a classroom. Push the button and your audio is live and the camera zooms on you. But I'm not talking about what you can do at the high end.
Another interesting approach is the use of multiple cameras in the meeting room. I have done this a few times with tools like Hangout and premium Skype which allow multi-way video calls. The typical meeting room is loaded with cameras that can join the conference -- there is one in every laptop, and one in most smartphones too. You must of course ensure that only one device in the room does audio. (It would be a nice trick if the conference tools noticed two devices in the same room based on audio echo and identical source LAN and auto-muted newcomers. Or better still, regularly switch the microphone audio to whichever device is hearing the speaker best.)
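Here is a rough sketch of how the same-room check might work, under strong simplifying assumptions: both devices' microphone samples are already available as plain lists of numbers at the same sample rate, "identical source LAN" is approximated by comparing the public IP address each device connects from, and "audio echo" is approximated by a peak in the normalized cross-correlation over a small range of delays. The function names, the 0.8 threshold and the lag range are all invented for illustration. A real tool would need proper echo detection.

```python
import math


def _correlation(x, y):
    """Normalized (Pearson) correlation of two sample lists, truncated to equal length."""
    n = min(len(x), len(y))
    if n == 0:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    den = math.sqrt(
        sum((p - mean_x) ** 2 for p in x) * sum((q - mean_y) ** 2 for q in y)
    )
    return num / den if den else 0.0


def peak_correlation(a, b, max_lag=50):
    """Best correlation between a and b when either one is delayed by up to max_lag samples."""
    best = 0.0
    for lag in range(max_lag + 1):
        best = max(best, _correlation(a[lag:], b), _correlation(a, b[lag:]))
    return best


def should_auto_mute(newcomer_ip, existing_ip, newcomer_audio, existing_audio,
                     threshold=0.8, max_lag=50):
    """Guess whether a newly joined device shares a room with one already in the call."""
    if newcomer_ip != existing_ip:
        return False
    return peak_correlation(newcomer_audio, existing_audio, max_lag) >= threshold
```

The same correlation score, compared against how loud each device's signal is, could also feed the "switch to whichever device hears the speaker best" idea, though that is not shown here.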
You can use multi-camera in several ways. Most simply, you can just make sure you now cover the whole room, and let the remotes switch which camera they want to watch. It can also be very nice to combine a steered/zooming camera with a wide-view camera. The wide-view camera, mounted high on the wall, gives a sense of the whole room, and then the steered camera can do a close-up of whoever is speaking.
Better, but hard to do, is to have all the laptops in the room joined with the conference, but not transmitting. You then would like to set it up so that anybody in the room who is in front of a laptop can click to become the video and audio sender (or one of the video senders -- the wide view should still be available.) With permission, it would even be good if one of the remotes could do this switching. Automatic switching is not super practical because people might be sitting next to a laptop but not in its camera view.
Yet another multi-camera approach would be to build conference room webcams with a panoramic view, simply by mounting 3 webcams together. This could be done manually -- webcams are very cheap -- or eventually vendors could sell a camera that has 3 or more cameras in it. While a seamless blend would be sweet, it's not really needed. The simple ability to switch quickly between the cameras by the remote viewer would be nice, and faster than using the pan motor. Of course, if bandwidth is available, it would be nice to just see all the cameras and get the wide view. A lower-bandwidth alternative would be to send the selected camera at full bandwidth and send very blurry "peripheral vision" quality video from the other cameras. Here, if a camera were fixed, it could use stereo audio to identify where the speaker is and switch to the best camera automatically -- but you can still let the remotes turn that off or supersede it if it's not working well.
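Both the "peripheral vision" bandwidth split and the stereo-audio camera choice are easy to sketch. This is a toy version under stated assumptions, not how any shipping product does it: the function names are invented, the 100 kbps peripheral rate is an arbitrary placeholder, cameras are assumed to be listed left to right, and speaker position is guessed purely from the loudness balance between the left and right microphone channels.

```python
import math


def allocate_bandwidth(camera_ids, selected, total_kbps, peripheral_kbps=100):
    """Give every unselected camera a small fixed rate and the selected one the rest."""
    if selected not in camera_ids:
        raise ValueError("selected camera is not in the list")
    others = [c for c in camera_ids if c != selected]
    remaining = total_kbps - peripheral_kbps * len(others)
    if remaining <= 0:
        raise ValueError("not enough bandwidth for the peripheral streams")
    rates = {c: peripheral_kbps for c in others}
    rates[selected] = remaining
    return rates


def _rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def pick_camera_by_stereo(left, right, n_cameras):
    """Map left/right loudness balance to a camera index (0 = leftmost).

    Returns None when both channels are silent, so the caller can keep
    the current camera or defer to a remote's manual choice.
    """
    l, r = _rms(left), _rms(right)
    if l + r == 0:
        return None
    balance = (r - l) / (r + l)              # -1 = all left, +1 = all right
    index = int((balance + 1) / 2 * n_cameras)
    return min(index, n_cameras - 1)
```

For example, `allocate_bandwidth(["left", "center", "right"], "center", 2000)` gives the center camera 1800 kbps and the two side cameras 100 kbps each. A remote's manual selection would simply override whatever `pick_camera_by_stereo` suggests.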
I have 4 different friends who have started telepresence robot companies, all at very different price points. If I have 4 friends who have done this, expect a huge raft of companies to be out with products this year. These are the ultimate expression of giving control to the solo remote, who now can steer not just their view, but also their screen, and even move around in flat buildings. The better ones are pretty nice experiences, though it's still an open question of whether the world will embrace this.
Robots are expensive, but the hardware for Pan/Tilt or multi-camera is cheap and available. The failure has been in getting the programming in order and doing the user interface right. Is it that people just don't care about this, or are we just waiting for the iPhone moment, when somebody makes it seamless enough that people realize they wanted it?