There are many tools now being used to replace physical conferences and meetings -- not just Zoom. And no one system is complete, or even best-of-breed in all the various functions it provides. It's time for these tools to develop a way to interoperate, so people can build an event by mixing and matching tools, while attendees flow smoothly between them without needing to create different accounts, re-authenticate, or face a steep learning curve.
Internet economics, technology and issues
Nick Denton was a sleazebag. I knew that within one minute of meeting him, as he described the new web site he was planning, called "Valleywag." He was proud he had learned the name of Larry Page's girlfriend and he could break that story, as if who Larry was dating was worthy news of some kind.
A new service called Red Carpet was announced, which will offer first-run movies in the homes of the very wealthy. You need a $15,000 DRM box and movie rentals are $1,500 to $3,000 per rental. That price is not a typo.
So I wrote an article pondering why that is, and why this could not be done at a price that ordinary people could afford, similar to the price of going to the movies.
Most of the world was wowed by the Google Duplex demo, where their system was able to cold-call a hairdresser and make an appointment with her, with the hairdresser unaware she was talking to an AI. The system included human speech mannerisms and the ability to respond to the random phrases the hairdresser threw back.
A huge opportunity awaits a young social media company that is poised to take advantage of the fall of Facebook (and Twitter). Is somebody out there ready to carry the ball and make it happen? It probably has to be somebody who already has most of this built, or even operating.
You, by definition, read blog posts. But the era of lots of individual personal web sites seems to be on the wane. It used to be everybody had a "home page" and many had one that updated frequently (a blog) but I, and many other bloggers, have noticed a change of late. It can be seen in the "referer" summaries you get from your web server that show who is making popular links to your site.
I have many things to discuss on the problem of "fake news" (which is to say, deliberately constructed false reports aimed to be spread to deceive) and the way it spreads through social media. This hot topic, seen as one of the largest threats to democracy to ever arise -- especially when combined with automated microtargeting of political propaganda -- is causing people to clamour for solutions.
Having written yesterday about a decision to sell Bitcoin, I want to re-examine two posts I made in the past which are now even more apropos.
My sell decision was (at least temporarily) wise, as the price quickly dropped to $9,300. That doesn't mean it will never see $11K again. It is a speculative asset with no fundamentals behind it to help judge the right price range.
E-mail is facing a decline. This is something I lament, and I plan to write more about that general problem, but today I want to point out something that is true but usually not recognized: E-mail today is often secure in transit, and we can make better use of that and improve it.
The right way to secure any messaging service is end-to-end. That means that only the endpoints -- i.e. your mail client -- have the keys and encrypt or decrypt the message. If the crypto works, it is impossible for anybody along the path, including the operators of the mail servers as well as the pipes, to decode anything but the target address of your message.
We could have built an end-to-end secure E-mail system. I even proposed just how to do it over a decade ago and I still think we should do what I proposed and more. But we didn't.
Along the way, though, we have mostly secured the individual links an E-mail follows. Most mail servers use encrypted SMTP over TLS when exchanging mail. The major web-mail programs like Gmail use encrypted HTTPS web sessions for reading it. The IMAP and POP servers generally support encrypted connections with clients. My own server supports only IMAPS and never IMAP or POP, and there are others like that.
What this means is that if I send a message to you on Gmail, while my SMTP proxy and Google can read that message, nobody tapping the wire can. Governments and possibly attackers can get into those servers and read that E-mail, but it's not an easy thing to do. This is not perfect, but it's actually pretty useful, and could be more useful.
Not everybody loves video calls, but there are times when they are great. I like them with family, and I try to insist on them when negotiating, because body language is important. So I've watched as we've increased the quality and ease of use.
The ultimate goals would be "retinal" resolution -- where the resolution surpasses your eye -- along with high dynamic range, stereo, light field, telepresence mobility and VR/AR with headset image removal. Eventually we'll be able to make a video call or telepresence experience so good it's a little hard to tell from actually being there. This will affect how much we fly for business meetings, travel inside towns, life for bedridden and low mobility people and more.
Here's a proposal for how to provide that very high or retinal resolution without needing hundreds of megabits of high quality bandwidth.
Many people have observed that the human eye is high resolution only in the center of attention, served by the fovea centralis. If you make a display that's sharp where a person is looking, and blurry out at the edges, the eye won't notice -- until, of course, it quickly moves to another section of the image and sees the tunnel vision.
Decades ago, people designing flight simulators combined "gaze tracking" -- spotting in real time where a person is looking -- with the foveal concept, so that the simulator rendered the scene in high resolution only where the pilot's eyes were. In those days in particular, rendering a whole immersive scene at high resolution wasn't possible. Even today it's a bit expensive. The trick is you have to be fast -- when the eye darts to a new location, you have to render it at high resolution within milliseconds, or we notice. Of course, to an outside viewer, such a system looks crazy, and with today's technology, it's still challenging to make it work.
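The foveal idea above can be sketched as a simple budget rule: render at full resolution inside a small radius around the gaze point, then let resolution fall off with angular distance, roughly the way visual acuity does. This is a minimal illustration with made-up parameter values, not any real renderer's formula:

```python
import math

def tile_resolution_scale(gaze_deg, tile_deg, fovea_radius_deg=2.0, floor=0.1):
    """Fraction of full resolution at which to render a tile, based on its
    angular distance (eccentricity) from the current gaze direction.
    Inside the foveal radius, render at full resolution; outside it, let
    resolution fall off roughly as 1/eccentricity, down to a floor.
    All parameter values here are illustrative assumptions."""
    ecc = math.hypot(tile_deg[0] - gaze_deg[0], tile_deg[1] - gaze_deg[1])
    if ecc <= fovea_radius_deg:
        return 1.0
    return max(floor, fovea_radius_deg / ecc)
```

A real system would re-evaluate this for every tile within a few milliseconds of each eye movement, which is exactly the speed requirement described above.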
With a video call, it's even more challenging. If a person moves their eyes (or in AR/VR their head) and you need a high resolution stream of the new point of attention, it can take a long time -- perhaps hundreds of milliseconds -- to send that signal to the remote camera, have it adjust the feed, and then get the new feed back to you. The user will inevitably see their new target as blurry for far too long. While it would still be workable, it will not be comfortable or seem real. For VR video conferencing it's an issue even when people just turn their heads. For now, a high resolution remote VR experience would probably require sending a half-sphere of full resolution video; then the delay only arises when the person turns their head far enough to look behind them, which is probably tolerable.
An opposite approach being taken for low bandwidth video is the use of "avatars" -- animated cartoons of the other speaker which are driven by motion capture on the other end. You've seen such characters in movies: Sméagol, the blue Na'vi of the movie Avatar and perhaps the young Jeff Bridges (acted by old Jeff Bridges) in Tron: Legacy. Cartoon avatars are preferred because of what we call the Uncanny Valley -- people notice flaws in attempts at total realism but forgive them in cartoonish renderings. But we are now able to do moderately decent realistic renderings, and this is slowly improving.
My thought is to combine foveal video with animated avatars for brief moments after saccades and then gently blend them towards the true image when it arrives. Here's how.
- The remote camera will send video with increasing resolution towards the foveal attention point. It will also be scanning the entire scene and making a capture of all motion of the face and body, probably with the use of 3D scanning techniques like time-of-flight or structured light. It will also be, in background bandwidth, updating the static model of the people in the scene and the room.
- Upon a saccade, the viewer's display will immediately (within milliseconds) combine the blurry image of the new target with the motion capture data, along with the face model data received, and render a generated view of the new target. It will transmit the new target to the remote.
- The remote, when receiving the new target, will now switch the primary video stream to a foveal density video of it.
- When the new video stream starts arriving, the viewer's display will attempt to blend them, creating a plausible transition between the rendered scene and the real scene, gradually correcting any differences between them until the video is 100% real.
- In addition, both systems will be making predictions about what the likely target of next attention is. We tend to focus our eyes on certain places, notably the mouth and eyes, so there are some places that are more likely to be looked at next. Some portion of the spare bandwidth would be allocated to also sending those at higher resolution -- either full resolution if possible, or with better resolution to improve the quality of the animated rendering.
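The viewer-side part of the steps above amounts to a small state machine: show the locally rendered avatar the instant the eye moves, then cross-fade to the real foveal stream once it arrives. Here is a minimal sketch of that logic; the class and its timing parameters are hypothetical, not from any real system:

```python
class FovealBlender:
    """Viewer-side blending after a saccade: immediately show the rendered
    (avatar) view of the new gaze target, then ramp in the real foveal
    video once the remote switches its stream. Timing is illustrative."""

    def __init__(self, blend_seconds=0.3):
        self.blend_seconds = blend_seconds  # how long the cross-fade takes
        self.stream_arrival = None          # when real foveal video arrived

    def on_saccade(self, now):
        # Eye darted to a new target: fall back to the rendered view at once
        # and forget any previous real stream.
        self.stream_arrival = None

    def on_foveal_stream(self, now):
        # The remote camera has re-aimed its high-res stream at the new target.
        self.stream_arrival = now

    def real_fraction(self, now):
        """Weight of real video in the displayed blend: 0.0 means all
        rendered avatar, 1.0 means all real video."""
        if self.stream_arrival is None:
            return 0.0
        return min(1.0, (now - self.stream_arrival) / self.blend_seconds)
```

Stretching `blend_seconds` is exactly the trade-off discussed below: a longer blend hides correction errors but prolongs the uncanny period.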
The animated rendering will, today, both be slightly wrong, and also suffer from the uncanny valley problem. My hope is that if this is short lived enough, it will be less noticeable, or not be that bothersome. It will be possible to trade off how long it takes to blend the generated video over to the real video. The longer you take, the less jarring any error correction will be, but the longer the image is "uncanny."
While there are 100 million photoreceptors in the whole eye, there are only about a million nerve fibers going out. It would still be expensive to deliver full resolution in the attention spot and the most likely next spots, but it's much less bandwidth than sending the whole scene. Even if full resolution is not delivered, much better resolution can be offered.
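A quick back-of-envelope calculation shows the scale of the savings. The numbers here are illustrative assumptions (roughly 60 pixels per degree as "retinal" density, a 10-degree foveal patch, and a coarse peripheral background), not measurements:

```python
def pixels(width_deg, height_deg, px_per_deg):
    """Pixel count for a patch of the visual field at a given density."""
    return width_deg * height_deg * px_per_deg ** 2

# Full half-sphere at retinal density vs. a foveated budget: a small
# full-density patch plus a low-density (6 px/deg) peripheral background.
full = pixels(180, 180, 60)                       # 116,640,000 pixels
foveated = pixels(10, 10, 60) + pixels(180, 180, 6)
ratio = full / foveated                           # roughly 75x less
```

Even with generous allowances for the predicted next-attention spots, the foveated budget stays well over an order of magnitude below the full scene.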
Stereo and simulated 3D
You can also do this in stereo to provide 3D. Another interesting approach was done at CMU called pseudo 3D. I recommend you check out the video. This system captures the background and moves the flat head against it as the viewer moves their head. The result looks surprisingly good.
Here's the situation: You're in a place with no bandwidth, or limited bandwidth. It's exactly the place where you need to download an app, because the good apps, at least, can do more things locally and make less use of the network. But you can't get to the app store. The archetype of this situation is being on a plane with wifi and video offerings over the wifi. You get on board, you connect, and it tells you that you needed to download the app before you took off and lost your connection.
I believe we have the potential to eliminate a major fraction of traffic congestion in the near future, using technology that exists today which will be cheap in the future. The method has been outlined by myself and others in the past, but here I offer an alternate way to explain it which may help crystallize it in people's minds.
Today many people drive almost all the time guided by their smartphone, using navigation apps like Google Maps, Apple Maps or Waze (now owned by Google). Many have come to drive as though they were a robot under the command of the app, trusting and obeying it at every turn. These apps are even causing controversy, because in the hunt for the quickest trip, they often find creative routes that bypass congested major roads via local streets that used to be lightly used.
Put simply, the answer to traffic congestion might be, "What if you, by law, had to obey your navigation app at rush hour?" To be more specific, what if the cities and towns that own the streets handed out reservations for routes on those streets to you via those apps, and your navigation app directed you down them? And what if the cities made sure there were never more cars put on a piece of road than it had capacity to handle? (The city would not literally run Waze, it would hand out route reservations to it, and Waze would still do the UI and be a private company.)
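The reservation idea above can be sketched as a simple booking table: each road segment has a capacity per time slot, and a route is granted only if every segment it crosses still has room in its slot. This is a toy illustration -- the class, the uniform capacity, and the segment names are all hypothetical:

```python
from collections import defaultdict

class ReservationBook:
    """Toy city-side ledger for route reservations: grant a route only if
    every (segment, time_slot) it uses is under capacity. A real system
    would have per-segment capacities and continuous time."""

    def __init__(self, capacity):
        self.capacity = capacity        # cars per segment per time slot
        self.booked = defaultdict(int)  # (segment, slot) -> reserved cars

    def reserve(self, route):
        """route: list of (segment, time_slot) legs. All-or-nothing grant."""
        if any(self.booked[leg] >= self.capacity for leg in route):
            return False                # some leg is full: app must reroute
        for leg in route:
            self.booked[leg] += 1
        return True
```

The key property is the refusal: once a stretch of road is booked to capacity for a slot, later requests are pushed onto other routes or times, which is exactly how the scheme keeps roads below their jam threshold.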
The value is huge. Estimates suggest congestion costs around 160 billion dollars per year in the USA, including 3 billion gallons of fuel and 42 hours of time for every driver. Roughly quadruple that for the world.
Road metering actually works
This approach would exploit one principle in road management that's been most effective in reducing congestion, namely road metering. The majority of traffic congestion is caused, no surprise, by excess traffic -- more cars trying to use a stretch of road than it has the capacity to handle. There are other things that cause congestion -- accidents, gridlock and irrational driver behaviour, but even these only cause traffic jams when the road is near or over capacity.
Today, in many cities, highway metering is keeping the highways flowing far better than they used to. When highways stall, the metering lights stop cars from entering the freeway as fast as they want. You get frustrated waiting at the metering light but the reward is you eventually get on a freeway that's not as badly overloaded.
Another type of metering is called congestion pricing. Pioneered in Singapore, these systems place a toll on driving in the most congested areas, typically the downtown cores at rush hour. They are also used in London, Milan, Stockholm and some smaller towns, but have not caught on in most other places for political reasons. Congestion charging can easily be viewed as allocating the roads to the rich when they were paid for by everybody's taxes.
A third successful metering system is the High-occupancy toll lane. HOT lanes take carpool lanes that are being underutilized, and let drivers pay a market-based price to use them solo. The price is set to bring in just enough solo drivers to avoid wasting the spare capacity of the lane without overloading it. Taking those solo drivers out of the other lanes improves their flow as well. While not every city will admit it, carpool lanes themselves have not been a success. 90% of the carpools in them are families or others who would have carpooled anyway. The 10% "induced" carpools are great, but if the carpool lane only runs at 50% capacity, it ends up causing more congestion than it saves. HOT is a metering system that fixes that problem.
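The HOT-lane pricing described above is a feedback loop: nudge the toll up when the lane runs over its target occupancy, down when capacity is going to waste. A minimal sketch of one step of that loop, with illustrative numbers not taken from any real system:

```python
def adjust_toll(toll, occupancy, target=0.9, step=0.25, floor=0.50):
    """One iteration of a HOT-lane pricing loop (illustrative values).
    occupancy: current fraction of the lane's capacity in use.
    Raises the toll when demand overshoots the target, lowers it (down to
    a floor) when the lane is underused, so spare capacity gets sold."""
    if occupancy > target:
        return toll + step
    if occupancy < target:
        return max(floor, toll - step)
    return toll
```

Run a few minutes apart, a rule like this converges toward the price that admits just enough solo drivers to fill the lane without jamming it, which is the stated goal of HOT pricing.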
HBO released a new version of "Westworld" based on the old movie about a robot-based western theme park. The show hasn't excited me yet -- it repeats many of the old tropes on robots/AI becoming aware -- but I'm interested in the same thing the original talked about -- simulated experiences for entertainment.
The new show misses what's changed since the original. I think it's more likely they will build a world like this with a combination of VR, AI and specialty remotely controlled actuators rather than with independent self-contained robots.
One can understand the appeal of presenting the simulation in a mostly real environment. But the advantages of the VR experience are many. In particular, with the top-quality, retinal resolution light-field VR we hope to see in the future, the big advantage is you don't need to make the physical things look real. You will have synthetic bodies, but they only have to feel right, and only just where you touch them. They don't have to look right. In particular, they can have cables coming out of them connecting them to external computing and power. You don't see the cables, nor the other manipulators that are keeping the cables out of your way (even briefly unplugging them) as you and they move.
This is important to get data to the devices -- they are not robots as their control logic is elsewhere, though we will call them robots -- but even more important for power. Perhaps the most science fictional thing about most TV robots is that they can run for days on internal power. That's actually very hard.
The VR has to be much better than we have today, but it's not as much of a leap as the robots in the show. It needs to be at full retinal resolution (though only in the spot your eyes are looking) and it needs to be able to simulate the "light field" which means making the light from different distances converge correctly so you focus your eyes at those distances. It has to be lightweight enough that you forget you have it on. It has to have an amazing frame-rate and accuracy, and we are years from that. It would be nice if it were also untethered, but the option is also open for a tether which is suspended from the ceiling and constantly moved by manipulators so you never feel its weight or encounter it with your arms. (That might include short disconnections.) However, a tracking laser combined with wireless power could also do the trick to give us full bandwidth and full power without weight.
It's probably not possible to let you touch the area around your eyes and not feel a headset, but add a little SF magic and it might be reduced to feeling like a pair of glasses.
The advantages of this are huge:
- You don't have to make anything look realistic, you just need to be able to render that in VR.
- You don't even have to build things that nobody will touch, or go to, including most backgrounds and scenery.
- You don't even need to keep rooms around, if you can quickly have machines put in the props when needed before a player enters the room.
- In many cases, instead of some physical objects, a very fast manipulator might be able to quickly place in your way textures and surfaces you are about to touch. For example, imagine if, instead of a wall, a machine with a few squares of wall surface quickly holds one out anywhere you're about to touch. Instead of a door there is just a robot arm holding a handle that moves as you push and turn it.
- Proven tricks in VR can get people to turn around without realizing it, letting you create vast virtual spaces in small physical ones. The spaces will be designed to match what the technology can do, of course.
- You will also control the audio and cancel sounds, so your behind-the-scenes manipulations don't need to be fully silent.
- You do it all with central computers, you don't try to fit it all inside a robot.
- You can change it all up any time.
In some cases, you need the player to "play along" and remember not to do things that would break the illusion. Don't try to run into that wall or swing from that light fixture. Most people would play along.
For a lot more money, you might some day be able to do something more like Westworld. That has its advantages too:
- Of course, the player is not wearing any gear, which will improve the reality of the experience. They can touch their faces and ears.
- Superb rendering and matching are not needed, nor the light field or anything else. You just need your robots to get past the uncanny valley.
- You can use real settings (like a remote landscape for a western) though you may have a few anachronisms. (Planes flying overhead, houses in the distance.)
- The same transmitted power and laser tricks could work for the robots, but transmitting enough power to power a horse is a great deal more than enough to power a headset. All this must be kept fully hidden.
The latter experience will be made too, but it will be more static and cost a lot more money.
Yes, there will be sex
Warning: We're going to get a bit squicky here for some folks.
Westworld is on HBO, so of course there is sex, though mostly just a more advanced vision of the classic sex robot idea. I think that VR will change sex much sooner. In fact, there is already a small VR porn industry, and even some primitive haptic devices which tie into what's going on in the porn. I have not tried them but do not imagine them to be very sophisticated as yet, but that will change. Indeed, it will change to the point where porn of this sort becomes a substitute for prostitution, with some strong advantages over the real thing (including, of course, the questions of legality and exploitation of humans.)