Google Meet and others up the video meeting game, what's next?
New in Meet
Recently, Google showed off some new features for Google Meet. The key new feature, with the odd name of "companion mode," addresses a major problem of meetings that combine a central meeting room with multiple people and a variety of people outside "calling in."
Such meetings, the most common use of corporate video meeting tools, have always treated the remote attendees as second-class members of the meeting, almost as badly as audio conference systems did. I have always railed against that, because today every person in the meeting room tends to have a phone or laptop, with a microphone and camera, right in front of them. But instead there is usually a meeting room system with a speakerphone, a big screen monitor, and a camera showing the whole room.
That sucks. The audio always has issues. The view of the room shows small people, and you often can't see some of them, except in rare systems where remotes can point the camera. But if people in the meeting room join the conference on their own devices, it doesn't work, because the tools don't understand this: with microphones and speakers live in the same room, the audio fails entirely in a storm of echo and feedback.
Companion mode presumably solves this. Obviously it doesn't send your audio to anybody in the same room as you, and it may do some smart audio filtering to send only the audio from the microphone closest to each speaker. (Or maybe it just mutes all personal audio and relies only on the main room system, something you could do before if you worked at it -- but ideally it uses the microphones in the laptops for best quality, and even encourages headsets.)
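A minimal sketch of how such routing might work. This is an illustrative model, not Google's actual implementation: each participant declares which physical room they are in, and the client simply never plays audio from someone sharing that room.

```python
# Hypothetical companion-mode audio routing: participants who declare the
# same physical room never hear each other's streams, which prevents the
# echo/feedback loop the text describes.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Participant:
    name: str
    room: Optional[str]  # physical room id, or None for a solo remote

def audible_streams(listener: Participant, others: List[Participant]) -> List[str]:
    """Names of participants whose audio this listener should receive:
    everyone except themselves and anyone sharing their physical room."""
    return [p.name for p in others
            if p is not listener
            and (listener.room is None or p.room != listener.room)]

# Alice and Bob share the boardroom; Carol is remote.
alice, bob = Participant("Alice", "boardroom"), Participant("Bob", "boardroom")
carol = Participant("Carol", None)
everyone = [alice, bob, carol]

print(audible_streams(alice, everyone))  # ['Carol'] -- Bob's stream is suppressed
print(audible_streams(carol, everyone))  # ['Alice', 'Bob']
```

A real system would additionally pick, among the co-located microphones, the one closest to whoever is speaking; that selection is omitted here.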
Done right, this makes the participants more equal, though of course those in the physical room can communicate non-verbally and get a better feeling of what is going on in that room.
Oddly, we haven't needed this function for the past year, but we'll need it again soon.
Raft of new functions
It's coming late, but the pandemic year has generated a lot of useful new functions in meeting tools. As yet, they are not all in one tool, but in time they should be.
AI-based noise cancellation
Google's paid version of Meet now has AI-based noise removal, and it's impressive. You can be on speakerphone and it will remove the sounds of keyboards, dogs barking, airplanes flying by, and people in other rooms. That's good, though in a way it could be bad, because headsets are always superior -- not just for background noise and sound quality, but for allowing truly full-duplex communication, which makes for much more natural dialogue in a group of three or more. With groups, you often get situations where two people both want to jump in for the next sentence, and they step over each other due to latency and echo cancellation. You can't fix the latency, but you can fix the rest.
Better layouts for speakers and audience
Most tools give you two options -- make the speaker (or, rarely, speakers) fill the screen, or show a gallery of all attendees at the same size. Neither is right. You want to be able to highlight speakers and slides but still have a sense of the audience, and the speakers themselves definitely should see the audience. Many tools have a mode where the speaker or slides take up most of the screen while a few of the audience are visible in a bar along the top or side. That's better, and Meet now seems to offer such a mode. However, it does not yet take the superior step of merging speaker and slides (Zoom and Teams can do this) or of merging the audience (which Teams now does).
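As a rough sketch of what such a "speaker plus filmstrip" layout computes -- the 20% strip height and the pixel dimensions below are made-up numbers, not any tool's actual values:

```python
# Sketch of a speaker-plus-filmstrip layout: one large tile for the active
# speaker (or slides), and a strip of small audience tiles along the top.
def filmstrip_layout(width, height, n_audience, strip_frac=0.2):
    """Return (speaker_rect, [audience_rects]), each as (x, y, w, h)."""
    strip_h = int(height * strip_frac)
    # Speaker tile fills everything below the audience strip.
    speaker = (0, strip_h, width, height - strip_h)
    # Audience tiles split the strip evenly.
    tile_w = width // max(n_audience, 1)
    audience = [(i * tile_w, 0, tile_w, strip_h) for i in range(n_audience)]
    return speaker, audience

speaker, audience = filmstrip_layout(1280, 720, 5)
print(speaker)        # (0, 144, 1280, 576)
print(audience[0])    # (0, 0, 256, 144)
```

In practice a client would also cap the number of visible audience tiles and rotate who appears in the strip.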
Use background replacement for more than just fun
Zoom popularized changing your background, and while many enjoy having fun and not having to show others what their room looks like, the truth is that the AI-based tools tend to wobble and go very strange from time to time, making them distracting. A real greenscreen makes a good virtual background, but only a few users have one. Simply blurring the background is better and not so distracting; blurring plus a logo would probably suffice for most.
But there is much more that can be done, and the most obvious mode is to merge speakers and slides, as is done in Zoom's advanced screen share and in Teams. Both of these currently work only with PowerPoint, but ideally they should let you use any window or screen of your computer as the background. Because they shrink the speaker into a corner, the flaws in virtual greenscreening are not nearly as distracting.
Only Teams has moved to step two: using this technology to display the audience by packing them together to look like they are in a classroom or other shared space. You can fit more people into less screen real estate with less distraction, letting people get a sense of the room, which is vital to a good group dynamic.
Use portrait mode for most group conferences
Not yet done is understanding that while 1:1 conferencing should use landscape mode, any time you are displaying multiple people it makes sense to use portrait mode.
Being able to detect where the person is allows easy cropping of the landscape camera image down to portrait, both for displays of the group and particularly for displaying a panel of two or three speakers, or speakers not merged with slides. Every TV producer showing a few panelists uses portrait framing -- it's the way to go. (In some cases you would keep a person in landscape mode: if multiple people share the device, if they are showing more than just a talking head, or if they click a button to say they can't be cropped to portrait. Users should see the portrait crop box overlaid on their own view, so they are clear about what is happening.) If a portrait ratio of 7.5:9 is used, two portrait videos fit in the space of one landscape tile, and thus you can mix and match.
The funny thing is that cell phone users often show up in portrait mode already, and for people on computers it's mostly annoying to see them pillarboxed in a landscape tile with bars on the sides. Instead, this should be turned to advantage. People not sending video, who show just an icon or name, should definitely be made portrait -- or, in the case of just a name, given only a tiny section of their own.
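The cropping geometry is simple to sketch. Assuming a hypothetical face detector that supplies the face's horizontal center, a client could cut a full-height 7.5:9 window out of the 16:9 frame, clamped so it never leaves the image:

```python
# Sketch of cropping a landscape frame to a 7.5:9 portrait tile centered
# on a detected face. The face detector itself is assumed; we take only
# its reported horizontal center, face_cx, in frame pixels.
def portrait_crop(frame_w, frame_h, face_cx, ratio=(7.5, 9)):
    """Return (x, y, w, h) of a full-height portrait crop centered on face_cx."""
    crop_h = frame_h
    crop_w = int(crop_h * ratio[0] / ratio[1])
    x = int(face_cx - crop_w / 2)
    x = max(0, min(x, frame_w - crop_w))  # clamp crop to the frame edges
    return (x, 0, crop_w, crop_h)

# 1080p landscape frame, face center right of the middle:
print(portrait_crop(1920, 1080, 1200))  # (750, 0, 900, 1080)
# Face near the right edge -- the crop clamps rather than leaving the frame:
print(portrait_crop(1920, 1080, 1900))  # (1020, 0, 900, 1080)
```

At 1080p the crop is 900 by 1080, so two such portrait tiles sit side by side inside a 1920-wide landscape tile with a 120-pixel gutter left over, matching the mix-and-match idea above.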
Simulated 3D by having background rendered by the viewer
When people put in a virtual background, it's usually done by the sender. Instead, it should be done by the viewer. That is to say, the sender would isolate the person from the background and send only the video of that, and in addition send the still or video for the background. Then the receiver would composite the person on the offered background or a different one.
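A toy sketch of that receiver-side compositing. Images are reduced here to flat lists of grayscale pixels so the example has no dependencies; a real client would do this per-frame, in color, on the GPU:

```python
# Receiver-side compositing: the sender ships a cut-out person (pixels plus
# a per-pixel alpha mask) and a separate background; the viewer blends them.
def composite(person, alpha, background):
    """Alpha-blend person over background. alpha is 0.0..1.0 per pixel."""
    return [a * p + (1 - a) * b
            for p, a, b in zip(person, alpha, background)]

person     = [200, 200, 200, 200]   # bright foreground pixels
alpha      = [1.0, 0.5, 0.0, 1.0]   # 1 = person, 0 = fully transparent
background = [10, 10, 10, 10]       # dark background pixels

print(composite(person, alpha, background))  # [200.0, 105.0, 10.0, 200.0]
```

Because the blend happens at the viewer, the viewer can substitute any background at all -- including one shared by every participant so the whole meeting appears to sit in the same space.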
An interesting trick worthy of exploration would be to simulate 3D by having the foreground person move relative to the background as the viewer moves their head, which is easily tracked by the viewer's camera. This can produce a surprisingly good 3D effect with no glasses or other gear, but it remains to be seen whether it's a real positive or just a gimmick. It would be particularly interesting for "audience" modes where you array people in rows.
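The parallax itself is just a per-layer horizontal offset driven by the tracked head position; the gain values below are invented for illustration:

```python
# Head-tracked parallax sketch: shift the cut-out foreground more than the
# background as the viewer's head moves, so the layers read as different
# depths. Gains are illustrative, not tuned values.
def parallax_offsets(head_dx, fg_gain=0.3, bg_gain=0.05):
    """Return horizontal pixel offsets (foreground, background) for a head
    displacement head_dx, measured in pixels from the camera's center."""
    # Layers shift opposite to head motion, foreground most, like looking
    # past a nearby object at a distant wall.
    return (-head_dx * fg_gain, -head_dx * bg_gain)

# Viewer moves 100 px to the right of camera center:
fg, bg = parallax_offsets(100)
print(fg, bg)  # -30.0 -5.0  (foreground slides farther, implying depth)
```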
Of course if stereo 3D displays are also used, it could be combined with that. Also of interest are projects like Google's Project Starline that involves a dedicated 1-on-1 meeting station with a fancy light field display to create the illusion that somebody is really sitting across from you. This is a great thing, but not something that can be done with existing hardware or even moderate extra hardware (like extra cameras.)
Tools to push headsets
As noted, in group meetings, headsets greatly improve the audio dynamic and make the conversation more natural. Tools should know if the user is using a headset (they can tell just from the echo they hear back in the microphone) and change automatic default behaviours, as configured by the host.
In particular, meetings could be set to refuse participation without a headset, or, more commonly, to allow people with headsets to remain unmuted while pushing those without headsets to use push-to-talk, or automatically muting them once they stop talking and others begin.
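A sketch of such a host policy. The policy names here are invented for illustration, and headset detection itself (e.g. from the measured echo) is taken as given:

```python
# Hypothetical host policy that treats headset and speaker users differently.
def mic_policy(has_headset, host_policy="soft"):
    """Return the mic mode for a participant.
    'strict' -> no headset means no mic at all;
    'soft'   -> speaker users get push-to-talk instead."""
    if has_headset:
        return "open"                # stay unmuted, full duplex
    if host_policy == "strict":
        return "blocked"             # refuse participation without a headset
    return "push_to_talk"            # speaker users must hold a key to talk

print(mic_policy(True))              # open
print(mic_policy(False))             # push_to_talk
print(mic_policy(False, "strict"))   # blocked
```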
Gaze correction
Nvidia has demonstrated interesting tools that can alter where a person's eyes, and even head, appear to be pointing. There is a strong risk of the uncanny valley here, but if it can be done well, it has interesting uses.
The first use is to make good eye contact, which many people don't get depending on how their camera is set up.
A second use is to avoid eye contact when it should not happen. Many believe that one reason for "Zoom fatigue" is that everybody seems to be looking at you, all the time. To fix this, you want the system to watch everybody's eyes, see who they are looking at, and show their video with eye contact only to that person. To everybody else, they should be shown as not looking at you, possibly turning their head as well as moving their eyes, ideally toward the actual person they are looking at. This would create a more natural around-the-table dynamic.
This would also make it more clear when people were not paying attention to the speaker and doing email, just as this is obvious in a meeting room.
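The routing this implies can be sketched simply: each sender reports a gaze target, and the system decides which rendering of that sender each viewer receives. The two-variant model below (eye contact vs. averted) is an assumption for illustration; a real system would synthesize a range of head angles:

```python
# Per-viewer gaze routing sketch: only the person being looked at receives
# the eye-contact rendering; everyone else sees an averted-gaze rendering.
def variant_for_viewer(sender, viewer, gaze_target):
    """Pick which rendering of `sender` to show to `viewer`."""
    if viewer == gaze_target:
        return f"{sender}:eye_contact"
    return f"{sender}:averted"

# Alice is looking at Bob:
print(variant_for_viewer("alice", "bob", "bob"))     # alice:eye_contact
print(variant_for_viewer("alice", "carol", "bob"))   # alice:averted
```

Note the same routing makes inattention visible: a sender whose gaze target is none of the participants (email, another window) shows eye contact to nobody.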
Another way to do this would be to have two or more cameras, which is fairly inexpensive today. The video of a person looking at camera 1 and not at camera 2 would have no uncanny valley, except during the transition. Two cameras could also generate true stereo for those with a 3D display, and three or more cameras could provide everything. This may seem expensive, but it isn't, and bandwidth is cheaper today too. It could be done by adding a standalone second camera, or better, by buying a "camera bar" instead of a single webcam. Since 99% of the reason we buy webcams, or put them in laptops, is to do video calls, it's not a big push to start making camera bars just for doing advanced video calls.
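A sketch of how a client might pick between the cameras, assuming gaze tracking reports where on the screen the user is looking and the cameras' screen positions are known (both are assumptions for illustration):

```python
# Multi-camera selection sketch: switch to whichever camera the user's gaze
# is closest to, so "looking toward a camera" is real footage rather than
# synthesized gaze correction.
def pick_camera(gaze_x, camera_positions):
    """Return the index of the camera nearest the user's gaze (screen x)."""
    return min(range(len(camera_positions)),
               key=lambda i: abs(camera_positions[i] - gaze_x))

cams = [200, 1720]              # one camera near each top corner of a 1920px screen
print(pick_camera(300, cams))   # 0 -- looking left, use the left camera
print(pick_camera(1600, cams))  # 1 -- looking right, use the right camera
```

A real implementation would add hysteresis so the feed doesn't flicker between cameras when the gaze sits near the midpoint.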