Technology

The failure of the pan-tilt camera in video calls

This year, we stayed with Kathryn’s family for the holidays, so I attended dinner in my own mother’s home via Skype. Once again, the technology was frustrating. And it need not be.

There were many things that could be better. Those of us who Skype regularly forget how much hassle it still is for those not used to it. Setting up a good videoconferencing setup is still work. As I have found is always the case in a group-to-solos videoconference, the people in the room care far less about the conference than the remote solos do, so a fundamental rule of design here is that if the remotes can do something, they should be the ones doing it, since they care the most. If there is to be UI, leave the UI to the remotes (who are sitting at computers and care) and not to the meeting room locals. Many systems get this exactly backwards — they imagine the meeting room is the “master” and thus give it the complex UI.

In this family setting, however, the clearest problem for me is that no camera can show the whole room. It’s like sitting at the table unable to move your head, with blinders on. You can’t really be part of the group. You also have to be away from the table so everybody there can see you, since screens are only visible over a limited viewing angle.

One clear answer to this is the pan/tilt camera, which is to say a webcam with servo motors that allow it to look around. This technology is very cheap — you’ll find pan/tilt IP security cameras online for $30 or less, and there are even some low-priced Chinese-made pan/tilt webcams out there — I just picked another up for $20. I also have the Logitech Orbit AF. This was once a top-of-the-line HD webcam, and still is very good, but Logitech no longer makes it. Logitech also makes the BCC950 — a $200 conference room pan/tilt webcam which has extremely good HD quality and a built-in hardware compressor for 1080p video that is superb with Skype. We have one of these, and it advertises “remote control,” but in fact all that means is there is an infrared remote the people in the room can use to steer the camera. In our meetings, nobody ever uses this remote for the reason I specify above — the people in the room aren’t the motivated ones.

This is compounded by the fact that the old method — audio conference speakerphones — has a reasonably well-understood UI. Dial the conference bridge, enter a code, and let the remotes handle their own calling in. Anything more complex than that gets pushback — no matter how much better it is.

The RV of the future

Over the years, particularly after Burning Man, I’ve written posts about how RVs can be improved. This year I did not use a regular RV but rather a pop-up camping trailer. However, I thought it was a good time to summarize a variety of the features I think should be in every RV of the future.

Smart Power

We keep talking about smart power and smart grids but power is expensive and complex when camping, and RVs are a great place for new technologies to develop.

To begin with, an RV power system should integrate the deep cycle house batteries, a special generator/inverter system, smart appliances and even the main truck engine where possible.

Today the best small generators are inverter based. Rather than generating AC directly from a 1,800 rpm motor and alternator, they have a variable speed engine and produce the AC via an inverter. These are smaller, more efficient, lighter and quieter than older generators, and produce cleaner power. Today they are more expensive, but not more expensive than most RV generators. RV generators are usually sized at 3,600 to 4,000 watts in ordinary RVs — that size dictated by the spike of starting up the air conditioner compressor when something else, like the microwave, is running.

An inverter based generator combined with the RV’s battery bank doesn’t have to be that large. It can draw power for the surge of starting a motor from the battery. The ability to sustain 2,000 watts is probably enough, with a few other tricks. Indeed, it can provide a lot of power even with the generator off, though the generator should auto-start if the AC is to be used, or the microwave will be used for a long time.

By adding a data network, one can be much more efficient with power. For example, the microwave could just turn off briefly when the thermostat wants to start the AC’s compressor, or even the fans. The microwave could also know if it’s been told to cook for 30 seconds (no need to run generator) or 10 minutes (might want to start it.) It could also start the generator in advance of cooling need.

If the master computer has access to weather data, it could even decide what future power needs for heating fans and air conditioning will be, and run the generator appropriately. With a GPS database, it could even know the quiet times of the campsite it’s in and respect them.
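
Here is a minimal sketch of how such a power coordinator might behave, with invented device interfaces and thresholds (nothing here is a real RV API):

    # Sketch of an RV power coordinator: sheds deferrable loads when a
    # big motor is about to start, and auto-starts the generator for
    # sustained heavy draw. Device names and thresholds are illustrative.

    SURGE_RESERVE_W = 1500      # battery headroom kept for compressor starts
    GENERATOR_START_W = 1800    # sustained load that justifies the generator

    class PowerManager:
        def __init__(self, battery, generator, loads):
            self.battery = battery        # reports available surge watts
            self.generator = generator    # can be started/stopped on demand
            self.loads = loads            # smart appliances on the data network

        def request_motor_start(self, watts_needed):
            # e.g. the AC compressor announces it wants to start
            if self.battery.surge_available() < watts_needed + SURGE_RESERVE_W:
                for load in self.loads:
                    if load.deferrable:
                        load.pause()      # microwave turns off for a few seconds
            return True

        def request_sustained_load(self, watts, duration_s):
            # e.g. the microwave told to cook for 10 minutes
            if watts > GENERATOR_START_W or duration_s > 300:
                self.generator.start()
            # short jobs (30 seconds of microwave) just run off the battery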

A modern RV should have all-LED lighting. Power use is so low on those that the lights become a blip in power planning. Only the microwave, AC and furnace fan would make a difference. The same is true of today’s TVs, laptops and media players, which all draw very few watts.

A smart power system could even help when plugging into shore power, particularly a standard 15 amp circuit. Such circuits are not enough to start many ACs, or to run the AC with anything else. With surge backup from the battery, an RV could plug into an ordinary outlet and act almost as if it had a high-power connection.

To go further, for group camping, RVs should have the ability to form an ad-hoc power grid. This same ability is already desired in the off-grid world, so it need not be developed just for RVs. RVs able to take all sorts of input power could also eventually get smart power from RV campsites. After negotiation, a campsite might offer 500V DC at 12 amps instead of 115V AC, allowing the largest dual-AC RVs to plug into small wires.

Augmented Reality as documentation and the "context" button

I’ve been a little skeptical of many augmented reality apps I’ve seen, feeling they were mostly gimmick and not actually useful.

I’m impressed by this new one from Audi, where you point your phone (iPhone only, unfortunately) at a feature on your car and get documentation on it — an interesting answer to car user manuals as thick as the glove compartment, and to the complex UIs they describe.

Like so many apps, however, this one will suffer the general problem of the amount of time it takes to fumble for your phone, unlock it, invoke an app, and then let the app do its magic. Of course fumbling for the manual and looking up a button in the index takes time too.

I’ve advocated for a while that phones become more aware of their location, not just in the GPS sense, but in the sense of “I’m in my car” and know what apps to make very easy to access, and even streamline their use. This can include allowing these apps to be right on the lock screen — there’s no reason to need to unlock the phone to use an app like this one. In fact, all the apps you use frequently in your car that don’t reveal personal info should be on the lock screen when you get near the car, and some others just behind it. The device can know it is in the car via the bluetooth in the car. (That bluetooth can even tell you if you’re in another car of a different make, if you have a database mapping MAC addresses to car models.)

Bluetooth transmitters are now so cheap, and with BT Low Energy they can last a year on a watch battery, that one of the more compelling “Internet of Things” applications — itself often a gimmick term — is to scatter these devices around the world to give our phones this accurate sense of place.

Some of this philosophy is expressed in Google Now, a product that goes the right way on many of these issues. Indeed, the Google Now cards are one of the more useful aspects of Glass, which is otherwise inherently limited in its user interface, making it harder to ask Glass things than to ask a phone or desktop.

The car app has some wrinkles of course. Since you don’t always have an iPhone (or may not have your phone even if you own an iPhone) you still need the thick manual, though perhaps it can be in the trunk. And I will wager that some situations, like odd lighting, may make it not as fast as in the video.

By and large, pointing your phone at QR codes to learn more has not caught on super well, in part again because it takes time to get most phones to the point where they are scanning the code. Gesture interfaces can help there, but you can only remember and parse a limited number of gestures, so many applications call out to be the special one. Still, a special shake could mean “look around you in all the ways you can to figure out if there is something in this location, time or camera view that I might want you to process.” Constant looking eats batteries, which is why you need such a shake.

I’ve proposed that even though phones have slowly been losing all their physical buttons, I would put this back as a physical button I call the “context” button. “Figure out the local context, and offer me the things that might be particularly important in this context.” This would offer many things:

  • Standing in front of a restaurant or shop, the reviews, web site or app of the shop
  • In the car, all the things you like in the car, such as maps/nav, the manual etc.
  • In front of a meeting room, the schedule for that room and ability to book it
  • At a tourist attraction, info on it.
  • In a hotel, either the ability to book a room, or if you have a room, hotel services

There are many contexts, but you can usually sort them so that the most local and the most rare come first. So if you are in a big place you are frequently, such as the office complex you work at, the general functions for your company would not be high on the list unless you manually bumped them.
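
A tiny sketch of that ranking rule, with made-up context records; the fields and numbers are purely illustrative:

    # Sketch: rank candidate contexts so the most local and the most
    # rarely-visited come first, per the rule above.

    def rank_contexts(contexts):
        # radius_m: how tightly the context is tied to this exact spot
        # visits_per_month: how often the user is in this context
        return sorted(contexts,
                      key=lambda c: (c["radius_m"], c["visits_per_month"]))

    contexts = [
        {"name": "office complex",  "radius_m": 300, "visits_per_month": 22},
        {"name": "meeting room 4B", "radius_m": 10,  "visits_per_month": 2},
        {"name": "espresso cart",   "radius_m": 15,  "visits_per_month": 1},
    ]
    for c in rank_contexts(contexts):
        print(c["name"])   # meeting room and espresso cart beat the big office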

Of course, one goal is that car UIs will become simpler and self-documenting as cars get screens. Buttons will still do the main functions you do all the time — and which people already understand — but screens will do the more obscure things you might need to look up in the manual, and document them as they go. You obviously shouldn’t be doing anything that needs a manual lookup while driving anyway.

There is probably a trend that the devices in our lives with lots of buttons and complex controls and modes, like home electronics, cars and some appliances, will move to having screens in their UIs and thus not need the augmented reality.

RAID, backyard backup and the future of backup

Had my second RAID failure last week. In the end, things were OK, but the reality is that many RAID implementations are much more fragile than they should be. Write failures on a drive caused the system to hang. A hard reset caused the RAID to be marked dirty, which meant it would not boot until falsely marked clean (and a few other hoops), leaving it with some minor filesystem damage that was repairable. Still, I believe that a proper RAID-like system should have as its maxim that the user is never worse off because they built a RAID than if they had not done so. This is not true today, both due to the fragility of systems, and the issues I have outlined before with deliberately replacing a disk in a RAID, where it does not make use of the still-good but aging old disk when rebuilding the replacement.

A few years ago I outlined a plan for disks to come as two-packs for easy, automatic RAID because disks are so cheap that everybody should be doing it. The two-pack would have two SATA ports on it, but if you only plugged in one, it would look like a single disk, and be a RAID-1 inside. If you gave it a special command, it could look like other things, including a RAID-0, or two drives, or a JBOD concatenation. If you plugged into the second port it would look like two disks, with the RAID done elsewhere.

I still want this, but RAID is not enough. It doesn’t save you from file deletion, or destruction of the entire system. The obvious future trend is network backup, which is both backup and offsite. The continuing issue with network backup is that some people (most notably photographers and videographers) generate huge amounts of data. I can come back from a weekend with 16GB of new photos, and that’s a long slog over DSL with limited upstream. To work well, network backup also needs to understand all databases, as a common database file might be gigabytes and change every time there is a minor update to a database record. (Some block-level incrementalism can work here if the database is not directly understood.)
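
A sketch of what that block-level incrementalism might look like, assuming the backup tool keeps a list of per-block hashes from the previous run:

    # Sketch of block-level incremental backup for a big file (like a
    # database) the tool doesn't understand: hash fixed-size blocks and
    # ship only the ones that changed since last time.
    import hashlib

    BLOCK = 1 << 20  # 1 MiB blocks; the size is an arbitrary choice

    def changed_blocks(path, old_hashes):
        """Return (new_hashes, [(index, data), ...]) for changed blocks."""
        new_hashes, changed = [], []
        with open(path, "rb") as f:
            i = 0
            while True:
                data = f.read(BLOCK)
                if not data:
                    break
                h = hashlib.sha256(data).hexdigest()
                new_hashes.append(h)
                if i >= len(old_hashes) or old_hashes[i] != h:
                    changed.append((i, data))   # only these cross the DSL line
                i += 1
        return new_hashes, changed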

Network backup is also something that should be automatic. There are already peer-to-peer network backups that make use of the disks of friends or strangers (encrypted, of course), but it would be nice if this could “just happen” on any freshly installed computer unless you turn it off. The user must keep the key stored somewhere safe, which is not zero-UI, though those who only want to handle file deletion and rollback can get away without it.

Another option that might be interesting would be the outdoor NAS. Many people now like to use NAS boxes over gigabit networks. This is not as fast as SATA with a flash drive, or RAID, or even modern spinning disk, but it’s fast enough for many applications.

An interesting approach would be a NAS designed to be placed outdoors, away from the house, such as in the back corner of a yard, so that it would survive a fire or earthquake. The box would be waterproof and modestly fireproof, but ideally it is located somewhere a fire is unlikely to reach. It could either be powered by power-over-ethernet or could have its own power and even use WIFI (in which case it is only suitable for backup, not as a live NAS.)

This semi-offsite backup would be fast and cheap (network storage tends to be much more expensive than local drives.) It would be encrypted, of course, so that nobody can steal your data. Encryption would be done in the clients, not the NAS, so even somebody who taps the outside wire would get nothing.
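
As one illustration of that client-side encryption, here is a sketch using the Python “cryptography” package’s Fernet recipe; `nas_write` is a hypothetical stand-in for whatever actually ships bytes to the NAS:

    # Sketch: encrypt in the client before anything touches the backyard
    # NAS, so a stolen box or a tapped outside wire yields only ciphertext.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()     # must be kept safe elsewhere, e.g. printed
    f = Fernet(key)

    def backup_file(path, nas_write):
        with open(path, "rb") as fh:
            ciphertext = f.encrypt(fh.read())
        nas_write(path, ciphertext)  # hypothetical transport to the NAS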

This semi-offsite backup could be used in combination with network backup. Large files and new files would be immediately sent to the backyard backup. The most important files could then go to network backup, or all of them, just much more slowly.

A backyard backup could also be shared by neighbours, especially on wifi, which might make it quite cost effective. Due to encryption, nobody could access their neighbour’s data.

If neighbours are going to cooperate, this can also be built by just sharing servers or NAS boxes in 2 or more houses. This provides decent protection and avoids having to be outside, but there is the risk that some fires burn down multiple houses depending on the configuration.

A backyard backup would be so fast that many would reverse what I said above, and have no need for RAID. Files would be mirrored to the backyard backup within seconds or minutes. RAID would only be needed for those who need to have systems that won’t even burp in a disk failure (which is a rare need in the home) or which must not lose even a few minutes of data.

The laptop in the tablet world

I have owned a laptop for decades, and I’ve always gone for the “small and light” laptop class because, as a desktop user, my laptop is only for travel, and ease of carrying is thus very important. Of course, once I arrive I envy the larger screens, better keyboards and other features of the bigger laptops people carry, but I have generally been happy with the decision.

Others have gone for “desktop replacement” laptops which are powerful, big and heavy. Those folks don’t have a desktop, at most they plug their laptop into an external monitor and other peripherals at home. The laptop is a bitch to carry but of course all files come with it.

Today, the tablet is changing that equation. I now find that when I am going into a situation where I want a minimal device that’s easy to carry, the tablet is the answer, and even better the tablet and bluetooth keyboard. I even carry a keyboard that’s a fair bit larger than the tablet, but still very light compared to a laptop. When I am in a meeting, or sitting attending an event, I am not going to do the things I need the laptop for. Well, not as much, anyway. On the airplane, the tablet is usually quite satisfactory — in fact better when in coach, though technically the keyboard is not allowed on a plane. (My tablet can plug in a USB keyboard if needed.)

Planes are a particular problem. It’s not safe to check LCD screens in your luggage, so any laptop screen has to come aboard with you, and this is a pain if the computer is heavy.

With the tablet dealing with the “I want small and light” situations, what is the right laptop answer?

One obvious solution is the “convertible tablet” computers being offered by various vendors. These are laptops where the screen is a tablet that can be removed. These tend to be Windows devices, and somewhat expensive, but the approximate direction is correct.

Another option would be to break the laptop up into 3 or more components:

  • The tablet, running your favourite tablet OS
  • A keyboard, of your choice, which can be carried easily with the tablet for typing-based applications. Able to hold the tablet and connect to it in a way permitted on the plane. Touchpad or connection for mouse.
  • A “block,” whose form factor is now quite variable, with the other stuff.

Documentary on the first six programmers funded and going into production

I am pleased to report that a documentary on the first software developers, the 6 women who were hired to program the ENIAC — the first electronic computer — has after many years received a funding grant sufficient to produce it.

Back in the 90s, my close friend Kathy Kleiman was researching computer history and came upon photos of the ENIAC and wondered who the unnamed women in the photos were. At first, she was told they were models hired to decorate the computer, but further investigation revealed they were the ones programming it.

The six women were professional computers, which was a job title early in the century — people with math degrees hired to perform calculations, in particular ballistic firing tables for the war. Because of the war, skilled women got these jobs, and the best of the team were asked to write software to get the machine to do the tables they were doing by hand. They were given no instruction, just the wiring diagrams and other machine designs, and created the first software applications, including inventing things like the first sort routine and many other things fundamental to our profession.

Because nobody knew the history of these founders of our profession, Kathy sought them out, and was able to record video interviews with 4 of them. These interviews have languished in the can for many years, and alas, all 6 of them are now deceased. I’ve been trying to help for many years, but in a fortuitous lunch, I was able to make the introductions necessary to arrange funding through the efforts and support of my friends Megan Smith, Anne Wojcicki and Lucy Southworth.

Kathy got to make the announcement at Google I/O in a special session about female techmakers featuring an array of accomplished women in technology. She showed a small section of the movie’s trailer. Her section can be seen 9 minutes into the video, and the programmers at 11:30. (Megan accidentally called me Brad Feldman, but I forgive her :-)

Software development is perhaps the most important new profession of the 20th century — and there were many — and the story of the six unsung founders of that profession will finally be presented to a large audience. I’ll announce when the documentary is released.

We need a security standard for USB and other plug-in devices

Studies have shown that if you leave USB sticks on the ground outside an office building, 60% of them will get picked up and plugged into a computer in the building. If you put the company logo on the sticks, closer to 90% of them will get picked up and plugged in.

USB sticks, as you probably know, can pretend to be CD-ROMs, and that means on many Windows systems the computer will execute an “autorun” binary on the stick, giving it control of your machine. (And many people run as administrator.) While other systems may not do this, almost every system allows a USB stick to pretend to be a keyboard, and as a keyboard it can easily take full control of your machine, waiting if need be until the machine is idle so you won’t see it happen. Plugging malicious sticks into computers is how Stuxnet took over Iranian centrifuges, and yet we all do this.

I wish we could trust unknown USB and bluetooth devices, but we can’t, not when they can be keyboards and pointing devices and drives we might run code from.

New OS generations have to create a trust framework for plug-in hardware, which includes USB and firewire and to a lesser degree even eSata.

When we plug in any device that might have power over the machine, the system should ask us if we wish to trust it, and how much. By default, we would give minimum trust to drives, and no trust to pointing devices, keyboards and the like. CD-ROMs would not get the ability to autorun, though that ability could be granted by those willing to take the risk, poor choice though it is.

Once we grant the trust, the devices should be able to store a provided key. After that, the device can then use this key to authenticate itself and regain that trust when plugged in again. Going forward all devices should do this.
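
One plausible shape for that handshake is a simple HMAC challenge-response; the `device` methods here are hypothetical stand-ins for whatever the real protocol would define:

    # Sketch of the re-trust handshake: at first approval the host stores
    # a random key on the device; later the device proves it still holds
    # that key, so the key itself never crosses the wire again.
    import hashlib, hmac, os

    def provision(device):
        key = os.urandom(32)
        device.store_key(key)          # written to the device at first trust
        return key                     # host remembers it, indexed by device

    def verify(device, host_key):
        challenge = os.urandom(16)
        response = device.respond(challenge)   # device: HMAC(key, challenge)
        expected = hmac.new(host_key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(response, expected)

Because only a fresh challenge and its HMAC cross the wire, a listener on the bus learns nothing it can replay later.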

The problem is they currently don’t, and people won’t accept obsoleting all their devices. Fortunately devices that look like writable drives can just have a token placed on the drive. This token would change every time, making it hard to clone.

Some devices can be given a unique identifier, or a semi-unique one. For devices that have any form of serial number, this can be remembered and the trust level associated with it. Most devices at least have a lot of identifiers related to the make and model of device. Trusting this would mean that once you trusted a keyboard, any keyboard of the same make and model would also be trusted. This is not super-secure but prevents generic attacks — attacks would have to be directly aimed at you. To avoid a device trying to pretend to be every type of keyboard until one is accepted, the attempted connection of too many devices without a trust confirmation should lock out the port until a confirmation is given.

The protocol for verification should be simple so it can be placed on an inexpensive chip that can be mass produced. In particular, the industry would mass produce small USB pass-through authentication devices that should cost no more than $1. These devices could be stuck on the plugs of old devices to make it possible for them to authenticate. They could look like hubs, or be truly pass-through.

All of this would make USB attacks harder. In the other direction, I believe as I have written before that there is value in creating classes of untrusted or less trusted hardware. For example, an untrusted USB drive might be marked so that executable code can’t be loaded from it, only classes of files and archives that are well understood by the OS. And an untrusted keyboard would only be allowed to type in boxes that say they will accept input from an untrusted keyboard. You could write the text of emails with the untrusted keyboard, but not enter URLs into the URL bar or passwords into password boxes. (Browser forms would have to indicate that an untrusted keyboard could be used.) In all cases, a mini text-editor would be available for use with the untrusted keyboard, from where one could cut and paste using a trusted device into other boxes.
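
A sketch of the per-field policy this implies; the field names and trust levels are invented for illustration:

    # Sketch: fields declare whether they accept untrusted-keyboard input;
    # the input layer drops keystrokes aimed anywhere more sensitive.
    TRUSTED, UNTRUSTED = "trusted", "untrusted"

    FIELD_POLICY = {
        "email_body":   UNTRUSTED,   # fine to type with an unknown keyboard
        "url_bar":      TRUSTED,     # only trusted devices may type here
        "password_box": TRUSTED,
    }

    def accept_keystroke(field, device_trust):
        policy = FIELD_POLICY.get(field, TRUSTED)   # default to strict
        return policy == UNTRUSTED or device_trust == TRUSTED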

A computer that as yet has no trusted devices of a given class would have to trust the first one plugged in, i.e. if you have a new computer that’s never had a keyboard, it has to trust its first keyboard unless there is another way to confirm trust when that first keyboard is plugged in. Fortunately, mobile devices all have built-in input hardware that can be trusted at manufacture, avoiding this issue. If a computer has lost all its input devices and needs a new one, you could either trust implicitly, or provide a pairing code to type on the new keyboard (which would not work for a mouse) to show you are really there. But this is only a risk on systems that normally have no input device at all.

For an even stronger level of trust, we might want to be able to encrypt the data going through. This stops the insertion of malicious hubs or other MITM intercepts that might try to log keystrokes or other data. Encryption may not be practical in low power devices that need to be drives and send data very fast, but it would be fine for all low speed devices.

Of course, we should not trust our networks, even our home networks. Laptops and mobile devices constantly roam outside the home network where they are not protected, and then come back inside able to attack if trusted. However, some security designers know this and design for this.

Yes, this adds some extra UI the first time you plug something in. But that’s hopefully rare and this is a big gaping hole in the security of most of our devices, because people are always plugging in USB drives, dongles and more.

Time for the post office to let anybody print postage with no minimums

For some time, the US Postal Service has allowed people to generate barcoded postage. You can do that on the expensive forms of mail such as priority mail and express mail, but if you want to do it on ordinary mail, like 1st class mail or parcel post, you need an account with a postage meter style provider, and these accounts typically include a monthly charge of $10/month or more. For an office, that’s no big deal, and cheaper than the postage meters that most offices used to buy — and the pricing model is based on them to some extent, even though now there is no hardware needed. But for an ordinary household, $120/year is far more than they are going to spend on postage.

There is one major exception I know of — if you buy something via PayPal, they allow you to print a regular postage shipping label with electronic postage. This is nice and convenient, but no good for sending ordinary letters and other small items.

I think the USPS is shooting itself in the foot by not letting people just buy postage online with no monthly fee. The old stamp system is OK for regular letters, and indeed they finally changed things so that old first class stamps still work after price increases, but for anything else you have to keep lots of stamps in supply, and you often waste postage or make a trip to a mailing office. This discourages people from using the post office, and will only hasten its demise. Make it trivial to mail things and people will mail more.

It could be a web-printed mailing label like the one you can use for priority mail, and most software vendors would quickly support such a system. If people wanted, they could even buy “stamps,” which would be collections of electronic postage in various denominations that programs could use, so there is no need to handle a transaction each time. Address label printers would all quickly do postage too.

Of course the official suppliers like Endicia and stamps.com would fight this completely. They love being official suppliers and charging large fees. They have more lobbying power than ordinary mailers. So the post office is going to quietly slip away into that good night, instead of taking advantage of the fact that it’s the one delivery company that comes to my door every day (for both pickup and delivery) and all the efficiencies that provides.

Modify E-book reader designs for digital signs

One of the useful attributes of electronic paper (such as E-Ink) is that it doesn’t take any power to retain an image, it only takes power to change the image. This is good for long-lasting E-readers, and digital signs are one of the other key applications of electronic paper, though today they are sold with a focus on the retail market.

Earlier, I wrote about concepts for a fourth screen which is an always-on wall computer that has effectively no user interface — its purpose is to show you stuff that is probably of interest to you based on time of day and who is looking at the screen. That proposal requires that the display be located where there is power, but there are many locations where wiring in permanent power is not a readily available option.

The typical e-book reader has all the hardware needed to act as a very low-power digital wall display. Such a display would have electronic paper and wifi. It would only wake up very rarely to briefly check, over the wifi (or better still bluetooth) if there is new data to display, in which case it would download it and display it. During these updates, it might also check to see if there is a new updating schedule.

You can do better than wifi, which usually requires a process of associating with an access point, getting an IP address, and then making queries. Bluetooth can connect with lower power. Even better would be a chip which is able to listen constantly at very low power for a special radio pulse (“wake on pulse”) from a powered transmitter, and then power on the rest of the system for data transfer. The panel could be put anywhere, and then a pulse generator would be put somewhere nearby that has power and is close enough to wake up the panel. (It might be something that plugs into a wall outlet and even does networking over the power lines.) This would allow the valuable ability to push information to the panel.
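
The panel’s firmware loop might look something like this sketch, where `radio` and `panel` are hypothetical drivers and the interval is illustrative:

    # Sketch of the panel's duty cycle: sleep almost always, wake briefly
    # to ask if there's a new image or schedule, and redraw only on change
    # (electronic paper costs power only when the image changes).
    import time

    CHECK_INTERVAL_S = 15 * 60   # the downloaded schedule can tune this

    def run(radio, panel):
        interval = CHECK_INTERVAL_S
        while True:
            radio.power_on()
            update = radio.fetch_update()       # None if nothing new
            radio.power_off()
            if update:
                panel.draw(update.image)        # the only power-hungry step
                interval = update.next_interval or interval
            time.sleep(interval)                # deep sleep on real hardware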

The panel’s battery would of course die in time, so there would need to be a battery-swap ability, or failing that a way to charge it with a temporary extension cord or a battery-powered charger, or by taking the panel off the wall.

An immediate market for these would be the doors of meeting rooms, so that they can show the schedule for the meeting room. Many hotels and convention centers have screens to do this now, but due to the need for power and other integration, these tend to be quite expensive, while ebook readers are now in the $100 range.

But they would also be useful around the home for 4th screen applications, displaying useful info. They could also be put near fridges or stoves to display recipes and family information. Obviously if you can put in a powered LCD display, that’s going to be able to do more, but without the power constraint more people might use one. They do need to be lit by external light, of course, but they are also visible in bright sun in a way that LCDs are not. And a product like this might well start eating into the retail digital signage market — anybody know what the price points are these days in that market?

Car users frustrated with their tech

The latest JD Power survey on car satisfaction has a new complaint that is now the second most annoying item to new car owners: problems with the voice recognition system in their hands-free interface. This is not too surprising, since voice recognition, especially in cars, is often dreadful. It also reveals that most new tech has lots of UI problems — not every product is the iPod, lauded from the start for its UI.

But one interesting realization in the study is that users have become frustrated at having too many devices with too many UIs. Their car (which now has a touchpad and lots of computer features) uses a different UI from their phone and computer and tablet and whatever. Even if the car has a superb UI, the problem is that it is different, something new to learn and remember.

One might fix this by having the same platform, be it iOS or Android, on several of the devices, but that’s a tall order. Car vendors do not want to make a phone on one platform and tick off people used to the other platform.

The answer lies in something the car makers don’t like: Don’t put much of their own smarts in the car at all, and expect the user to slot their own mobile phone or tablet into the car. This might be done with something like Nokia’s “Terminal Mode” where the car’s screen and buttons can be taken over by the phone, or by not having a screen in the car at all, just a standard mounting place.

Some time ago I wrote that cars should stop coming with included radios, as was the case 30 years ago, and let the slot in the dashboard where the radio and electronics go become a center for innovation. In particular, innovation at the speed of consumer and mobile devices, not innovation at the speed of car companies. But there are too many pressures stopping this from happening. Car companies get to charge a lot for fancy radio and electronics systems in the cars, and they like this. And they like the control over the whole experience. But as they get more complaints they may realize that it’s not the right thing for them to be building. Especially not when the car (and the in-dash system) lasts 10 to 15 years, while most consumer electronic devices are obsolete in 1-2 years.

There aren’t that many makes of cars, nor so many mobile platforms, so making custom apps for the car and the mobile platform isn’t that hard. In fact, I would expect you would see lots of competing aftermarket ones if they opened up the market to it. And open source ones too, built by fans of the particular cars.

Terminal mode or a standard mounting port for mobile phones in cars?

It’s very common to use mobile phones for driving activities today. Many people even put cell phone holders in their cars when they want to use the phones as navigation systems as well as make calls over bluetooth. There’s even evidence that dashboard mounting reduces the distracted-driving phenomenon associated with phones in cars.

Nokia and others are pushing one alternative for the cars that have dashboard screens. This is called “Terminal Mode” and is a protocol so the phone can make use of the display, buttons and touchscreens in the car. Putting the smarts in the phone and making the dash be the dumb peripheral is the right idea, since people upgrade phones frequently and cars not nearly so much. The terminal mode interface can be wireless so the phone does not have to be plugged in, though of course most people like to recharge phones while driving.

Terminal mode will be great if it comes, but it would be good to also push for a standard port on dashboards for mounting mobile phones. Today, most mobile phone holders either stick to the windshield with a suction cup, or clamp onto the vents of the air conditioner. A small port or perhaps flip out lever arm would be handy if standardized on dashboards. The lever arm would offer a standard interface for connecting a specific holder for the specific device. In addition, the port would offer USB wiring so that the holder could offer it to the phone. This would offer power at the very least but could also do data for terminal mode and some interfacing with other elements of the car, including the stereo system, or the onboard-diagnostics bus. Access to other screens in the back (for playing video) and to superior antennas might make sense. While many phones use their USB port to be a peripheral to a PC, some have “USB to go” which allows a device to be either master or peripheral, allowing more interesting functions.

Even with terminal mode, there could be value in having two screens, and more buttons, though of course apps would have to be developed to understand that. However, one simple thing is that a phone could run two apps at once on two screens (or even two apps at once on the larger screen of the car) which would actually be pretty handy.

Watson, game 2

Not much new to report after the second game of the Watson Jeopardy Challenge. I’ve added a few updates to yesterday’s post on Watson and the result was as expected, though Watson struggled a lot more in this game than in the prior round, deciding not to answer many questions due to low confidence and making a few mistakes. In a few cases it was saved by not buzzing fast enough even though it had over 50% confidence, as it would have answered slightly wrong.

Some quick updates from yesterday you will also find in the comments:

  • Toronto’s 2nd busiest airport, the small Island airport, has the official but rarely used name of Billy Bishop. Bishop was one of the top flying aces of WWI, not WWII. Watson’s answer is still not fully explained, but that it made mistakes like this is not surprising. That it made so few is surprising.
  • You can buzz in as soon as Trebek stops speaking. If you buzz early, you can’t buzz again for 0.2 seconds. Watson gets an electronic signal when it is time to buzz, and then physically presses the button. The humans get a light, but they don’t bother looking at it, they try timing when Trebek will finish. I think this is a serious advantage for Watson.
  • This IBM Blog Post gives the details on the technical interface between Watson and the game.
  • Watson may have seemed confident with its large bet of $17,973. But in fact the bet was fixed in advance:
    • Had Jennings bet his whole purse (and got it right) he would have ended up with $41,200.
    • If Watson had lost his bet of 17,973, he would have ended up with $41,201 and bare victory.
    • Both got it right, and Jennings bet low, so it ended up being $77,147 to $24,000.
    • Jennings’ low bet was wise as it assured him of 2nd place and a $300K purse instead of $200K. Knowing he could not beat Watson unless Watson bet stupidly, he did the right thing.
    • Jennings still could have bet more and kept 2nd, but there was no value to it; the purse is always $300K.
    • If Watson had wanted to 2nd guess, it might have realized Jennings would do this and bet appropriately but that’s not something you can do more than once.
    • As you might expect, the team put a bunch of thought into the betting algorithm, as that is one thing computers can do perfectly sometimes (see the sketch after this list). I’ve often seen Jeopardy players lose from bad betting.
  • It still sure seemed like a program sponsored by IBM. But I think it would have been nice if the PI of DeepQA was allowed up on stage for the handshake.
  • I do wish they had programmed a bit of sense of humour into Watson. Fake, but fun.
  • Amusingly Watson got a category about computer keyboards and didn’t understand it.
  • Unlike the human players who will hit the buzzer before they have formed the answer in their minds, in hope that they know it, Watson does not hit unless it has computed a high confidence answer.
  • Watson would have bombed on visual or audio clues. The show has a rule allowing those to be removed from the game for a disabled player, and those rules were applied!
  • A few of the questions had some interesting ironies based on what was going on. I wonder if that was deliberate or not. To be fair, I would think the question-writers would not be told what contest they were writing for.
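
For the curious, here is a sketch of the leader’s classic Final Jeopardy arithmetic, using the scores implied by the numbers above: bet as much as you can while still winning even if you miss and the runner-up doubles up.

    def max_safe_wager(leader, second):
        # Worst case: you miss, the runner-up bets everything and is right.
        runner_up_max = 2 * second
        return max(0, leader - runner_up_max - 1)

    # Scores implied above: Watson ~$59,174, Jennings ~$20,600 before Final.
    print(max_safe_wager(59174, 20600))   # -> 17973, Watson's actual wager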

Watson, come here, I want you

The computer science world is abuzz, along with the game show world, over the showdown between IBM’s “Watson” question-answering system and the best human players ever to play the game Jeopardy. The first game has been shown, with a crushing victory by Watson (in spite of a tie after the first half of the game.)

Tomorrow’s outcome is not in doubt. IBM would not have declared itself ready for the contest without being confident it would win, and they wouldn’t be putting all the advertising out about the contest if they had lost. What’s interesting is how they did it and what else they will be able to do with it.

Dealing with a general question has long been one of the hard problems in AI research. Watson isn’t quite there yet but it’s managed a great deal with a combination of algorithmic parsing and understanding combined with machine learning based on prior Jeopardy games. That’s a must because Jeopardy “answers” (clues) are often written in obfuscated styles, with puns and many idioms, exactly the sorts of things most natural language systems have had a very hard time with.

Watson’s problem is almost all understanding the question. Looking up obscure facts is not nearly so hard if you have a copy of Wikipedia and other databases on hand, particularly one parsed with other state-of-the-art natural language systems, which is what I presume they have. In fact, one would predict that Watson would do the best on the hardest $2,000 questions because these are usually hard because they refer to obscure knowledge, not because it is harder to understand the question. I expect that an evaluation of its results may show that its performance on hard questions is not much worse than on easy ones. (The main thing that would make easy questions easier would be the large number of articles in its database confirming the answer, and presumably boosting its confidence in its answer.) However, my intuition may be wrong here, in that most of Watson’s problems came on the high-value questions.

Its confidence is important. If it does not feel confident it doesn’t buzz in. And it has a serious advantage at buzzing in, since you can’t buzz in right away on this game, and if you’re an encyclopedia like the two human champions and Watson, buzzing in is a large part of the game. In fact, a fairer game, which Watson might not do as well at, would involve randomly choosing which of the players who buzz in in the first few tenths of a second gets to answer the question, eliminating any reaction-time advantage. Watson gets the questions as text, which is also a bit unfair, unless it is given them one word at a time at human reading speed. It could do OCR on the screen, but chances are it would read faster than the humans. Its confidence numbers and results are extremely impressive. One reason it doesn’t buzz in is that even with 3,000 cores it takes 2-6 seconds to answer a question.
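
As a sketch, the buzz decision described here reduces to something like this (the 50% threshold is the figure mentioned later in this post):

    # Sketch: buzz only if the top candidate clears the confidence bar
    # and the multi-second computation finished inside the buzzer window.
    CONFIDENCE_THRESHOLD = 0.50

    def should_buzz(candidates, compute_time_s, window_s):
        best = max(c["confidence"] for c in candidates)
        return compute_time_s <= window_s and best >= CONFIDENCE_THRESHOLD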

Indeed a totally fair contest would not have a buzzing-in time competition at all, and would just allow all players who buzz in to answer and get or lose points based on their answer. (Answers would need to be in parallel.)

Watson’s coders know by now that they probably should have coded it to receive wrong answers from other contestants. In one instance it repeated a wrong answer, and in another case it said “What is Leg?” after Jennings had incorrectly answered “What is missing an arm?” in a question about an Olympic athlete. The host declared that right, but the judges reversed that saying that it would be right if a human who was following up the wrong answer said it, but was a wrong answer without that context. This was edited out. Also edited out were 4 crashes by Watson that made the game take 4 hours instead of 30 minutes.

It did not happen in what aired so far, but in the trials, another error I saw Watson make was declining to answer a request to be more specific on an answer. Watson was programmed to give minimalist answers, which often the host will accept as correct, so why take a risk. If the host doesn’t think you said enough he asks for a more specific answer. Watson sometimes said “I can be no more specific.” From a pure gameplay standpoint, that’s like saying, “I admit I am wrong.” For points, one should say the best longer phrase containing the one-word answer, because it just might be right. Though it has a larger chance of looking really stupid — see below for thoughts on that.

The shows also contain total love-fest pieces about IBM which make me amazed that IBM is not listed as a sponsor for the shows, other than perhaps in the name “The IBM Challenge.” I am sure Jeopardy is getting great ratings (just having their two champs back would do that on its own but this will be even more) but I have to wonder if any other money is flowing.

Being an idiot savant

Watson doesn’t really understand the Jeopardy clues, at least not as a human does. Like so many AI breakthroughs, this result comes from figuring out another way to attack the problem different from the method humans use. As a result, Watson sometimes puts out answers that are nonsense “idiot” answers from a human perspective. They cut back a lot on this by only having it answer when it has 50% confidence or higher, and in fact for most of its answers it has very impressive confidence numbers. But sometimes it gives such an answer. To the consternation of the Watson team, it did this on the Final Jeopardy clue, where it answered “Toronto” in the category “U.S. Cities.”

My phone should know when I start a trip

Every day I get into my car and drive somewhere. My mobile phone has a lot of useful apps for travel, including maps with traffic and a lot more. And I am usually calling them up.

I believe that my phone should notice when I am driving off from somewhere, or about to, and automatically do some things for me. Of course, it could notice this if it ran the GPS all the time, but that’s expensive from a power standpoint, so there are other ways to identify this:

  • If the car has bluetooth, the phone usually associates with the car. That’s a dead giveaway, and can at least be a clue to start looking at the GPS.
  • Most of my haunts have wireless, and the phone associates with the wireless at my house and all the places I work. So it can notice when it disassociates and again start checking the GPS. To get smart, it might even notice the MAC addresses of wireless networks it can’t see inside the house, but which it does see outside or along my usual routes.
  • Of course moving out to the car involves jostling and walking in certain directions (it has a compass.)

Once it thinks it might be in the car, it should go to a mode where my “in the car” apps are easy to get to, in particular the live map of the location with the traffic displayed, or the screen for the nav system. Android has a “car mode” that tries to make it easy to access these apps, and it should enter that mode.

It should also now track me for a while to figure out which way I am going. Depending on which way I head and the time of day, it can probably guess which of my common routes I am going to take. For regular commuters, this should be a no-brainer. This is where I want it to be really smart: instead of me having to call up the traffic, it should see that I am heading towards a given highway, and then check to see if there are traffic jams along my regular routes. If it sees one, then it should beep to signal that, and if I turn it on, I should see that traffic jam. This way if I don’t hear it beep, I can feel comfortable that there is light traffic along the route I am taking. (Or that if there is traffic, it’s not traffic I can avoid with alternate routes.)

This is the way I want location based apps to work. I don’t want to have to transmit my location constantly to the cloud, and have the cloud figure out what to do at any given location. That’s privacy invading and uses up power and bandwidth. Instead the phone should have a daemon that detects location “events” that have been programmed into it, and then triggers programs when those events occur. Events include entering and leaving my house or places I work, driving certain roads and so on.
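
A sketch of such a daemon’s core: a locally stored table of triggers and a dispatcher, so no position data ever has to leave the phone. All identifiers below are made up:

    # Sketch of an on-device "location events" daemon. Triggers are
    # registered locally (wifi/bluetooth IDs, geofences); matching a
    # signal fires a named action without consulting the cloud.
    events = [
        {"type": "bt_connect",    "id": "AA:BB:CC:DD:EE:FF", "action": "enter_car_mode"},
        {"type": "wifi_lost",     "id": "home-ssid",         "action": "maybe_driving"},
        {"type": "geofence_exit", "id": "home",              "action": "check_traffic"},
    ]

    def on_signal(signal_type, signal_id, dispatch):
        for e in events:
            if e["type"] == signal_type and e["id"] == signal_id:
                dispatch(e["action"])    # e.g. start GPS, prefetch traffic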

And yes, for tools like shopkick, they can even be entering stores I have registered. And as I blogged at the very beginning of this blog many years ago, we can even have an event for when we enter a store with a bad reputation. The phone can download a database of places and wireless and Bluetooth MACs that should trigger events, and as such the network doesn’t need to know my exact location to make things happen. But most importantly, I don’t want to have to know to ask if there is something important near me, I want the right important things to tell me when I get near them.

Where will 3-D cameras like Kinect lead?

This year, I bought Microsoft Kinect cameras for the nephews and niece. At first they will mostly play energetic X-box games with them but my hope is they will start to play with the things coming from the Kinect hacking community — the videos of the top hacks are quite interesting. At first, MS wanted to lock down the Kinect and threaten the open source developers who reverse engineered the protocol and released drivers. Now Microsoft has official open drivers.

This camera produces a VGA colour video image combined with a Z (depth) value for each pixel. This makes it trivial to isolate objects in the view (like people and their hands and faces), and splitting foreground from background is easy. The camera is $150 today (when even a simple one-line LIDAR cost a fortune not long ago) and no doubt cameras like it will be cheap $30 consumer items in a few years’ time. As I understand it, the Kinect works using a mixture of triangulation — the sensor being in a different place from the emitter — combined with structured light (sending out arrays of dots and seeing how they are bent by the objects they hit.) An earlier report that it used time-of-flight is disputed, and the simpler approach implies it will get cheaper fast. Right now it doesn’t do close up or very distant, however. While projection takes power, meaning it won’t be available full time in mobile devices, it could still show up eventually in phones for short-duration 3-D measurement.

I agree with those that think that something big is coming from this. Obviously in games, but also perhaps in these other areas.

Gestural interfaces and the car

While people have already made “Minority Report” interfaces with the Kinect, studies show these are not very good for desktop computer use — your arms get tired and are not super accurate. They are good for places where your interaction with the computer will be short, or where using a keyboard is not practical.

One place that might make sense is in the car, at least before the robocar. Fiddling with the secondary controls in a car (such as the radio, phone, climate system or navigation) is always a pain and you’re really not supposed to look at your hands as you hunt for the buttons. But taking one hand off the wheel is OK. This can work as long as you don’t have to look at a screen for visual feedback, which is often the case with navigation systems. Feedback could come by audio or a heads up display. Speech is also popular here but it could be combined with gestures.

A Gestural interface for the TV could also be nice — a remote control you can’t ever misplace. It would be easy to remember gestures for basic functions like volume and channel change and arrow keys (or mouse) in menus. More complex functions (like naming shows etc.) are best left to speech. Again speech and gestures should be combined in many cases, particularly when you have a risk that an accidental gesture or sound could issue a command you don’t like.

I also expect gestures to possibly control what I am calling the “4th screen” — namely an always-on wall display computer. (The first 3 screens are Computer, TV and mobile.) I expect most homes to eventually have a display that constantly shows useful information (as well as digital photos and TV) and you need a quick and unambiguous way to control it. Swiping is easy with gesture control so being able to just swipe between various screens (Time/weather, transit arrivals, traffic, pending emails, headlines) might be nice. Again in all cases the trick is not being fooled by accidental gestures while still making the gestures simple and easy.

In other areas of the car, things like assisted or automated parking, though not that hard to do today, become easier and cheaper.

Small scale robotics

I expect an explosion in hobby and home robotics based on these cameras. Forget about Roombas that bump into walls, finally cheap robots will be able to see. They may not identify what they see precisely, though the 3D will help, but they won’t miss objects and will have a much easier time doing things like picking them up or avoiding them. LIDARs have been common in expensive robots for some time, but having it cheap will generate new consumer applications.

Mobile

There will be some gestural controls for phones, particularly when they are used in cars. I expect things to be more limited here, with big apps to come in games. However, history shows that most of the new sensors added to mobile devices cause an explosion of innovation so there will be plenty not yet thought of. 3-D maps of areas (particularly when range is longer which requires power) can also be used as a means of very accurate position detection. The static objects of a space are often unique and let you figure out where you are to high precision — this is how the Google robocars drive.

Security & facial recognition

3-D will probably become the norm in the security camera business. It also helps with facial recognition in many ways (both by isolating the face and allowing its shape to play a role) and with recognition of other things like gait, body shape and animals. Face recognition might become common at ATMs or security doors, and be used when logging onto a computer. It also makes “presence” detection reliable, allowing computers to see how and where people are in a room and even a bit of what they are doing, without having to do object recognition. (Though as the Kinect hacks demonstrate, these cameras help object recognition as well.)

Face recognition is still error-prone of course, so its security uses will be limited at first, but it will get better at telling people apart.

Virtual worlds & video calls

While some might view this as gaming, we should also see these cameras heavily used in augmented reality and virtual world applications. It makes it easy to insert virtual objects into a view of the physical world and have a good sense of what’s in front and what’s behind. In video calling, the ability to tell the person from the background allows better compression, as well as blanking of the background for privacy. Effectively you get a “green screen” without the need for a green screen.
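
A sketch of that depth-keyed matting with numpy, assuming a colour frame and an aligned per-pixel depth map in millimetres (roughly what the Kinect provides); the distance band is illustrative:

    # Sketch: keep only pixels in the person-distance band and blank the
    # rest, giving a "green screen" effect with no green screen.
    import numpy as np

    def matte_person(color, depth, near_mm=400, far_mm=1500):
        """color: HxWx3 array; depth: HxW array of millimetres."""
        mask = (depth > near_mm) & (depth < far_mm)   # person-distance band
        out = np.zeros_like(color)
        out[mask] = color[mask]                       # background stays black
        return out

The same mask also tells the video codec which pixels matter, which is where the compression win comes from.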

You can also do cool 3-D effects by getting an easy and cheap measurement of where the viewer’s head is. Moving a 3-D viewpoint in a generated or semi-generated world as the viewer moves her head creates a fun 3-D effect without glasses and now it will be cheap. (It only works for one viewer, though.) Likewise in video calls you can drop the other party into a different background and have them move within it in 3-D.

With multiple cameras it is also possible to build a more complete 3-D model of an entire scene, with textures to paint on it. Any natural scene can suddenly become something you can fly around.

Amateur video production

Some of the above effects are already showing up on YouTube. Soon everybody will be able to do it. The Kinect’s firmware already does “skeleton” detection, to map out the position of the limbs of a person in the view of the camera. That’s good for games but also allows motion capture for animation on the cheap. It also allows interesting live effects, distorting the body or making light sabres glow. Expect people in their own homes to be making their own Avatar-like movies, at least on a smaller scale.

These cameras will become so popular we may need to start worrying about interference by their structured light. These are apps I thought of in just a few minutes. I am sure there will be tons more. If you have something cool to imagine, put it in the comments.

Happy Seasons to all! and a Merry New Year.

I'm loving the Shweeb concept

There was a bit of a stir when Google last week announced that one of the winners of their 10^100 contest would be Shweeb, a pedal-powered monorail from New Zealand that has elements of PRT. Google will invest $1M in Shweeb to help them build a small system, and if it makes any money on the investment, that will go into transportation related charities.

While I had a preference that Google fund a virtual world for developing and racing robocars, I have come to love a number of elements about Shweeb, though it’s not robocars and the PRT community seems not to think it’s PRT. I think it is PRT, in that it’s personal, public and, according to the company, relatively rapid through the use of offline stations and non-stop point-to-point trips. PRT is an idea from the sixties that makes sense, but its backers have tried for almost 50 years to get transit planners to believe in it and build it. A micro-PRT has opened as a Heathrow parking shuttle, but in general transit administrators simply aren’t early adopters. They don’t innovate.

What impresses me about Shweeb is its tremendous simplicity. While it’s unlikely to replace our cars or transit systems, it is simple enough that it can actually be built. Once built, it can serve as a testbed for many of PRT’s concepts, and go through incremental improvements.

Using the phone as its own mouse, and trusting the keyboard

I’ve written a bunch about my desire to be able to connect an untrusted input device to my computer or phone so that we could get hotels and other locations to offer both connections to the HDTVs in the rooms for monitors and a usable keyboard. This would let one travel with small devices like netbooks, tablet computers and smart phones yet still use them for serious typing and UI work while in the hotel or guest area.

I’ve proposed that the connection from device to monitor be wireless. This would make it not very good for full-screen video, but it would be fine for web surfing, email and the like. It would also allow us to use the phone as its own mouse, either by putting a deliberate mouse-style sensor on the back, or by using the camera on the back of the phone as a reader of the surface. (A number of interesting experiments have shown this is quite doable if the camera can focus close and an LED lights up the surface.) This provides a mouse which is more inherently trustable, and buttons on the phone (or on its touchscreen) can be the mouse buttons. This doesn’t work for tablets and netbooks — for them you must bring your own mini-mouse or use the device as a touchpad. I am still a fan of the “trackpoint” nubbins, which can also make very small but usable mice.
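For the camera-as-mouse idea, a minimal sketch, assuming OpenCV and a camera that can focus on the surface: dense optical flow between consecutive frames gives the surface motion to report as mouse deltas.

```python
# Minimal sketch (assumes OpenCV and a camera focused on the surface):
# estimate how the surface moved between consecutive frames and report
# that as relative mouse motion.
import cv2
import numpy as np

def mouse_delta(prev_gray: np.ndarray, gray: np.ndarray) -> tuple:
    """Median optical-flow vector between two greyscale frames."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    dx = float(np.median(flow[..., 0]))
    dy = float(np.median(flow[..., 1]))
    return dx, dy  # feed to the host as relative mouse movement
```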

The keyboard issue is still tough. While it would seem a wired connection is more secure, not all devices will be capable of such a connection, while almost all will do bluetooth. Wired USB connections can pretend to be all sorts of devices, including CD-ROM drives with autorun discs in them. However, I propose the creation of a new bluetooth HID profile for untrusted keyboards.

When connecting to an untrusted keyboard, the system would need to identify any privileged or dangerous operations. If such operations (like software downloads, destructive commands, etc.) come from the keyboard, the system would insist on confirmation from the main device’s touchscreen or keyboard. So while you would be able to type on the keyboard to fill text boxes or write documents and emails, other things would be better done with the mouse, or they would require a confirmation on the screen. It turns out this is how many people use computers these days anyway. We command-line people would feel a bit burdened, but could create shells that are good at spotting commands that might need confirmation.
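A minimal sketch of such a screening shell; the list of dangerous commands is illustrative only:

```python
# Minimal sketch (the command list is illustrative): text typed on an
# untrusted keyboard is screened, and anything that looks privileged
# must be confirmed on the trusted device's own screen before it runs.
import shlex
import subprocess

DANGEROUS = {"rm", "sudo", "dd", "mkfs", "chmod", "shutdown"}

def needs_confirmation(command_line: str) -> bool:
    words = shlex.split(command_line)
    return bool(words) and words[0] in DANGEROUS

def run_from_untrusted_keyboard(command_line: str, confirm_on_device) -> bool:
    """confirm_on_device() prompts on the device's own trusted screen."""
    if needs_confirmation(command_line) and not confirm_on_device(command_line):
        return False  # user rejected the risky command
    subprocess.run(shlex.split(command_line))
    return True
```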

An open source licence for FOSS platforms only

Here’s a suggestion that will surely rankle some in the free software/GPL community, but which might be of good benefit to the overall success of such systems.

What I propose is a GPL-like licence under which source code could be published, but which effectively forbids one thing: work to make it run on proprietary operating systems, in particular Windows and MacOS.

The goal would be to allow the developers of popular programs for Windows, in particular, to release their code and allow the FOSS community to generate free versions which can run on Linux, *BSD and the like. Such companies would do this after deciding that there isn’t enough market on those platforms to justify a commercial venture in the area. Rather than, as Richard Stallman would say, “hoarding” their code, they could release it in this fashion. However, they would not fear they were doing much damage to their market on Windows. They would have to accept that they were disclosing their source code to their competitors and customers, and some companies fear that and will never do this. But some would, and in fact some already have, even without extra licence protection.

An alternate step would be to release it specifically so the community can make sure the program runs under WINE, the Windows API compatibility layer for Linux and others. Many Windows programs already run under WINE, but almost all of them have little quirks and problems. If the programs are really popular, the WINE team patches WINE to deal with them, but it would be much nicer if the real program just got better behaved. In this case, the licence would have some rather unusual terms, in that people would have to produce versions and EXEs that run only under WINE — they would not run on native Windows. They could do this by inserting calls to check if they are running on WINE and aborting, or they could do something more complex like make use of some specific APIs added to WINE that are not found in Windows. Of course, coders could readily remove these changes and make binaries that run on Windows natively, but coders can also just pirate the raw Windows binaries — both would be violations of copyright, and the latter is probably easier to do.
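As a minimal sketch of the simplest such check: WINE’s ntdll exports a wine_get_version function that genuine Windows does not, so a program can refuse to start when the export is missing. In Python it might look like this:

```python
# Minimal sketch: WINE's ntdll.dll exports wine_get_version(), which
# genuine Windows ntdll.dll does not, so the program can refuse to
# start when the export is missing. (Trivially removed, as noted --
# this is a licence marker, not copy protection.)
import ctypes
import sys

def running_under_wine() -> bool:
    try:
        ntdll = ctypes.WinDLL("ntdll")
        ntdll.wine_get_version.restype = ctypes.c_char_p
        return ntdll.wine_get_version() is not None
    except (OSError, AttributeError):
        return False  # no ntdll, or no WINE export: not running on WINE

if not running_under_wine():
    sys.exit("This build is licensed to run only under WINE.")
```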

A BIOS and OS designed for very fast booting (and aborting)

We all know how annoying it is that today’s much faster computers take such a long time to boot, and OS developers are working on speeding it up. Some time ago I proposed a defragmenter that would notice which blocks were read, and in what order, at boot time, and lay them out contiguously on the disk. I was told that experiments with this had not had much success, but more recently I read reports that the latest Linux distributions will boot as much as 3 times faster on solid state disks as on rotating ones. There are some SSDs with performance that high (and higher), but typical ones range more in the 120 MB/second rate — better than 80 MB/second hard drives, but getting more of their win from the complete lack of seek latency.

However, today I want to consider something which is a large portion of the boot time, namely the power-on-self-test or “POST.” This is what the BIOS does before it gets ready to load the real OS. It’s important, but on many systems is quite slow.

I propose an effort to make the POST multitask with the loading of the real OS. Particularly on dual-core systems, this could be done by having one core do the POST and other BIOS work (after testing all the cores, of course) while the other cores are used by the OS for loading. There are ways to do all this with one core, which I will discuss below, but this approach is simple and almost all new computers have multiple cores.

Of course, the OS has to know it’s not supposed to touch certain hardware until after the BIOS is done initializing and testing it. And so, there should be BIOS APIs to allow the OS to ask about this and get events as BIOS operations conclude. The OS, until given ownership of the screen, would output its status updates via a BIOS call. In fact, it would do that with all hardware, though the screen, keyboard and primary hard disk are the main items. When I say the OS, I actually mean both the bootloader that loads the OS and the OS itself once it is handed control.

Next, the BIOS should, as soon as it has identified that the primary boot hard disks are ready, begin transferring data from the boot area into RAM. Almost all machines have far more RAM than they need to boot, and so pre-loading all blocks needed for boot into a cache, in optimal order, will mean that by the time the OS kernel takes over, many of the disk blocks it wants to read will already be sitting in RAM. Ideally, as I noted, the blocks should have been pre-stored in contiguous zones on the disk by an algorithm that watched prior boots to see what was accessed and when.

Indeed, if there are multiple drives, the pre-loader could be configured to read from all of them, in a striping approach. Done properly, a freshly booted computer, once the drives had spun up, would start reading the few hundred megabytes of files it needs to boot from multiple drives into RAM. All of this should be doable in just a few seconds on RAID-style machines, where 3 striped disks can deliver 200 MB/second or more of read performance. But even on a single drive, it should be quick. It would begin with the OS kernel and boot files, then pre-cache all the pages from files used in typical boots. For any new files, only a few seeks would be required after this is done.
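A minimal sketch of the userspace half of this idea, assuming a trace file recorded on a prior boot (the path is hypothetical, and the call is POSIX-only):

```python
# Minimal sketch (the trace file location is hypothetical): replay the
# list of files a previous boot was observed to read, asking the kernel
# to pull each one into the page cache before it is demanded.
import os

BOOT_TRACE = "/var/lib/bootcache/file-order.txt"  # assumed trace file

def preload(trace_path: str) -> None:
    with open(trace_path) as trace:
        for line in trace:
            path = line.strip()
            if not path:
                continue
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # file removed since the trace was recorded
            try:
                size = os.fstat(fd).st_size
                # POSIX-only: hint that the whole file is wanted soon.
                os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
            finally:
                os.close(fd)

if __name__ == "__main__":
    preload(BOOT_TRACE)
```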

Towards frameless (clockless) video

Recently I wrote about the desire to provide power in every sort of cable, in particular the video cable. And while we’ll be using the existing video cables (VGA and DVI/HDMI) for some time to come, I think it’s time to investigate new thinking in sending video to monitors. The video cable has generally been the highest bandwidth cable going out of a computer, though the fairly rare 10 gigabit ethernet is around the speed of HDMI 1.3 and DisplayPort, and 100 gigabit ethernet will be faster yet.

Even though digital video methods are so fast, the standard DVI cable is not able to drive my 4 megapixel monitor — this requires dual-link DVI, which, as the name suggests, runs 2 sets of DVI wires (in the same cable and plug) to double the bandwidth. The expensive 8 megapixel monitors need two dual-link DVI cables.
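A back-of-envelope check of why, assuming a 60 Hz refresh (single-link DVI’s pixel clock tops out at 165 MHz):

```python
# Back-of-envelope check (assumes 60 Hz refresh): a 2560x1600 panel
# needs more pixels per second than single-link DVI's 165 Mpixel/s
# ceiling, hence the dual-link requirement.
pixels_per_second = 2560 * 1600 * 60          # ~245.8 million pixels/s
single_link_limit = 165_000_000               # single-link DVI pixel clock
print(pixels_per_second / single_link_limit)  # ~1.49x -> dual link needed
```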

We want enough bandwidth to completely redraw the screen at a suitable refresh rate (100 Hz) if we can get it. But we should consider how to get that, and what to do when we can’t — because our equipment is older than our display, because it is too small to have the right connector, or because it must send the data over a medium that can’t deliver the bandwidth (like wireless, or long wires).

Today all video is delivered in “frames” which are an update of the complete display. This was the only way to do things with analog rasterized (scan line) displays. Earlier displays actually were vector based, and the computer sent the display a series of lines (start at x,y then draw to w,z) to draw to make the images. There was still a non-fixed refresh interval as the phosphors would persist for a limited time and any line had to be redrawn again quickly. However, the background of the display was never drawn — you only sent what was white.

Today, the world has changed. Displays are made of pixels but they all have, or can cheaply add, a “frame buffer” — memory containing the current image. Refresh of pixels that are not changing need not be done on any particular schedule. We usually want to be able to change only some pixels very quickly. Even in video we only rarely change all the pixels at once.

This approach to sending video was common in the early remote terminals that worked over ethernet, such as X windows. In X, the program sends more complex commands to draw things on the screen, rather than sending a complete frame 60 times every second. X can be very efficient when sending things like text, as the letters themselves are sent, not the bitmaps. There are also a number of protocols used to map screens over much slower networks, like the internet. The VNC protocol is popular — it works with frames but calculates the difference and only transmits what changes on a fairly basic level.
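A minimal sketch of that VNC-style approach, treating frames as numpy arrays: diff the new frame against the last one and ship only the bounding rectangle of whatever changed.

```python
# Minimal sketch of VNC-style delta updates: compare the new frame to
# the previous one and send only the bounding rectangle of the changed
# pixels, plus those pixels.
import numpy as np

def dirty_rect(prev: np.ndarray, new: np.ndarray):
    """Return (x, y, w, h) of the changed region, or None if identical."""
    changed = np.any(prev != new, axis=2)  # HxW boolean mask
    ys, xs = np.nonzero(changed)
    if xs.size == 0:
        return None
    x0, x1 = xs.min(), xs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1
    return int(x0), int(y0), int(x1 - x0), int(y1 - y0)

def encode_update(prev: np.ndarray, new: np.ndarray):
    rect = dirty_rect(prev, new)
    if rect is None:
        return None  # nothing changed; send nothing
    x, y, w, h = rect
    return rect, new[y:y + h, x:x + w].copy()
```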

We’re also changing how we generate video. Only video captured by cameras has an inherent frame rate any more. Computer screens, and even computer generated animation are expressed as a series of changes and movements of screen objects though sometimes they are rendered to frames for historical reasons. Finally, many applications, notably games, do not even work in terms of pixels any more, but express what they want to display as polygons. Even videos are now all delivered compressed by compressors that break up the scene into rarely updated backgrounds, new draws of changing objects and moves and transformations of existing ones.

So I propose two distinct things:

  1. A unification of our high speed data protocols so that all of them (external disks, SAN, high speed networking, peripheral connectors such as USB and video) benefit together from improvements, and one family of cables can support all of them.
  2. A new protocol for displays which, in addition to being able to send frames, sends video as changes to segments of the screen, with timestamps as to when they should happen.

The case for approach #2 is obvious. You can have an old-school timed-frame protocol within a more complex protocol able to work with subsets of the screen. The main issue is how much complexity you want to demand in the protocol. You can’t demand too much or you make the equipment too expensive to make and too hard to modify. Indeed, you want to be able to support many different levels, but not insist on support for all of them. Levels can include:

  • Full frames (ie. what we do today)
  • Rastered updates to specific rectangles, with ability to scale them.
  • More arbitrary shapes (alpha) and ability to move the shapes with any timebase
  • VNC level abilities
  • X windows level abilities
  • Graphics card (polygon) level abilities
  • In the unlikely extreme, the abilities of high-level languages like Display PostScript.

I’m not sure the last layers are good to standardize in hardware, but let’s consider the first few levels. When I bought my 4 megapixel (2560x1600) monitor, it was annoying to learn that none of my computers could actually display on it, even at a low frame rate. Technically, single DVI has the bandwidth to do it at 30 Hz, but this is not a desirable option if it’s all you ever get. While I did indeed want a card able to make full use of the monitor, the reality is that 99.9% of what I do on it could be done over single-link DVI bandwidth with just the ability to update and move rectangles, or to do so at a slower speed. The whole screen is never completely replaced in a situation where waiting 1/30th of a second would be an issue. But the ability to paint a small window at 120 Hz, on displays that can do this, might well be very handy. Adoption of a system like this would allow even a device with a very slow output (such as USB 2 at 480 megabits/second) to use all the resolution for typical desktop activities. While you might think that video would be impossible over such a slow bus, if the rectangles could scale, a 480 megabit bus could still do things like playing DVDs. While I do not suggest every monitor be able to decode our latest video compression schemes in hardware, the ability to use the post-compression primitives (drawing subsections and doing basic transforms on them) might well be enough to feed quite a bit of video through a small cable.

One could even imagine wireless video protocols for devices like cell phones. One could connect a cell phone to an HDTV screen (as found in a hotel) and have it reasonably use the entire screen, even though it would not have the gigabit bandwidths needed to display 1080p framed video.

Sending in changes to a screen with timestamps of when they should change also allows the potential for super-smooth movement on screens that have very low latency display elements. For example, commands to the display might involve describing a foreground object, and updating just that object hundreds of times a second. Very fast displays would show those updates and present completely smooth motion. Slower displays would discard the intermediate steps (or just ask that they not be sent.) Animations could also be sent as instructions to move (and perhaps rotate) a rectangle and do it as smoothly as possible from A to B. This would allow the display to decide what rate this should be done at. (Though I think the display and video generator should work together on this in most cases.)

Note that this approach also delivers something I asked for in 2004 — that it be easy to make any LCD act as a wireless digital picture frame.

It should be noted that HDMI supports a small amount of power (5 volts at 50 mA), and in newer forms both it and DisplayPort have stopped acting like digitized versions of analog signals and started acting more like highly specialized digital buses. Too bad they didn’t go all the way.

Protocol

As noted, it is key that the basic levels be simple, to promote universal adoption. As such, the elements in such a protocol would start simple. All commands could specify a time at which they are to be executed, if that is not immediately.

  • Paint line or rectangle with specified values, or gradient fill.
  • Move object, and move entire screen
  • Adjust brightness of rectangle (fade)
  • Load pre-buffered rectangle. (Fonts, standard shapes, quick transitions)
  • Display pre-buffered rectangle

However, lessons learned from other protocols might expand this list slightly.
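To make this concrete, here is a minimal sketch of what such commands might look like on the wire; the opcodes, field sizes and layout are purely illustrative, not a proposed standard:

```python
# Minimal sketch of a possible wire format (opcodes, field sizes and
# layout are hypothetical): a fixed header of opcode, execute-at
# timestamp (microseconds, 0 = immediately) and a target rectangle,
# followed by an optional payload.
import struct

HEADER = struct.Struct("<BQHHHH")  # opcode, timestamp, x, y, w, h

OP_PAINT_RECT = 1  # payload: w*h RGB pixels
OP_MOVE_RECT  = 2  # payload: destination x, y as two uint16s
OP_FADE_RECT  = 3  # payload: one uint8 brightness factor

def encode(opcode: int, x: int, y: int, w: int, h: int,
           payload: bytes = b"", when_us: int = 0) -> bytes:
    return HEADER.pack(opcode, when_us, x, y, w, h) + payload

def decode(buf: bytes):
    opcode, when_us, x, y, w, h = HEADER.unpack_from(buf)
    return opcode, when_us, (x, y, w, h), buf[HEADER.size:]

# Example: move a 64x32 rectangle at (100, 200) to (160, 200), 5 ms out.
msg = encode(OP_MOVE_RECT, 100, 200, 64, 32,
             payload=struct.pack("<HH", 160, 200), when_us=5_000)
```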

One connector?

This, in theory, allows the creation of a single connector (or compatible family of connectors) for lots of data and lots of power. It can’t be just one connector, though, because some devices need very small connectors which can’t handle the number of wires others need, or deliver the amount of power some devices need. Most devices would probably get by with a single data wire, and ideally technology would keep pushing how much data can go down that wire, but any design should allow for simply increasing the number of wires when more bandwidth than a single wire can carry is needed. (Presumably a year later, the same device would be able to use a single wire as the bandwidth increases.) We may, of course, not be able to figure out how to do connectors for tomorrow’s high bandwidth single wires, so you also want a way to design an upwards-compatible connector with blank spaces — or expansion ability — for the pins of the future, which might well be optical.

There is also a security implication to all of this. While a single wire that brings you power, a link to a video monitor, LAN and local peripherals would be fabulous, caution is required. You don’t want to plug into a video projector in a new conference room and have it pretend to be a keyboard that takes over your computer. As this is a problem with USB in general, it is worth solving regardless. One approach would be to have every device use a unique ID (possibly a certified ID) so that you can declare trust for all your devices at home, and perhaps everything plugged into your home hubs, but be notified when a never-seen device that needs trust (like a keyboard or drive) is connected.

Having different connectors helps this problem a little: if you plug an ethernet cable into the dedicated ethernet jack, it is clear what you are doing, and that you probably want to trust the LAN you are connecting to. The implicit LAN coming down a universal cable needs a more explicit approval.
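A minimal sketch of that trust model, assuming each device presents a stable (ideally certified) ID; the store location and class names are hypothetical:

```python
# Minimal sketch (file location and class names are hypothetical):
# remember which device IDs the user has approved, and require an
# explicit on-screen confirmation before a never-seen device may act
# in a class that can inject input or carry data.
import json
from pathlib import Path

TRUST_DB = Path.home() / ".device_trust.json"  # assumed store
NEEDS_TRUST = {"keyboard", "mouse", "storage"}

def load_trusted() -> set:
    return set(json.loads(TRUST_DB.read_text())) if TRUST_DB.exists() else set()

def allow_device(device_id: str, device_class: str, confirm) -> bool:
    """confirm() asks the user on the computer's own trusted screen."""
    trusted = load_trusted()
    if device_id in trusted or device_class not in NEEDS_TRUST:
        return True
    if confirm(f"Trust new {device_class} {device_id!r}?"):
        trusted.add(device_id)
        TRUST_DB.write_text(json.dumps(sorted(trusted)))
        return True
    return False
```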

Final rough description

Here’s a more refined rough set of features of the universal connector:

  • Shielded twisted pair with ability to have varying lengths of shield to add more pins or different types of pins.
  • Asserts briefly a low voltage on pin 1, highly current limited, to power negotiator circuit in unpowered devices
  • Negotiator circuits work out actual power transfer, at what voltages and currents and on what pins, and initial data negotiation about what pins, what protocols and what data rates.
  • If no response is given to negotiation (ie. no negotiator circuit) then measure resistance on various pins and provide specified power based on that, but abort if current goes too high initially.
  • Full power is applied, unpowered devices boot up and perform more advanced negotiation of what data goes on what pins.
  • When full data handshake is obtained, negotiate further functions (hubs, video, network, storage, peripherals etc.)
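A minimal sketch of that sequence as a simple state machine; the state and event names are just one way of labelling the steps above:

```python
# Minimal sketch of the negotiation sequence above as a state machine
# (state and event names are illustrative).
from enum import Enum, auto

class Stage(Enum):
    PROBE = auto()           # current-limited voltage on pin 1
    NEGOTIATE = auto()       # negotiator circuits agree power and pins
    LEGACY_SENSE = auto()    # no negotiator: infer power from resistance
    FULL_POWER = auto()      # apply agreed power; device boots
    DATA_HANDSHAKE = auto()  # agree pins, protocols and data rates
    FUNCTIONS = auto()       # negotiate hubs, video, network, storage...
    FAULT = auto()           # e.g. overcurrent during legacy sensing

TRANSITIONS = {
    (Stage.PROBE, "negotiator_replied"): Stage.NEGOTIATE,
    (Stage.PROBE, "no_reply"): Stage.LEGACY_SENSE,
    (Stage.NEGOTIATE, "power_agreed"): Stage.FULL_POWER,
    (Stage.LEGACY_SENSE, "resistance_ok"): Stage.FULL_POWER,
    (Stage.LEGACY_SENSE, "overcurrent"): Stage.FAULT,
    (Stage.FULL_POWER, "device_booted"): Stage.DATA_HANDSHAKE,
    (Stage.DATA_HANDSHAKE, "rates_agreed"): Stage.FUNCTIONS,
}

def next_stage(stage: Stage, event: str) -> Stage:
    return TRANSITIONS.get((stage, event), Stage.FAULT)
```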