Selling ads on URLs

Recently, Lauren Weinstein posted a query about ways to bring a certain type of commentary on web sites to the web. In particular, he’s interested in letting people who are the subject of attack web sites, some of whom have even won court judgments against those sites, inform searchers of the dispute through annotations that show up next to search engine results.

I’m not sure this is a good idea, for a number of reasons. I like the idea of being able to see third-party commentary on web sites (as Third Voice and others have tried to provide) and suspect the browser is a better place for it than the search engine. I don’t like putting any duty upon people who simply link to web sites (which is what search engines do) just because the sites they link to are bad. They may want to provide extra information about what they link to as a service to users, but that’s up to them, and should remain so unless they are a monopoly.

In addition, putting messages with an agenda next to search results is what search engines do for a living. However, therein may lie the answer.

HTTP headers to indicate side-effects of forms

You’ve all seen it many times: you hit the ‘back’ button and the browser tells you it has to resubmit a form, which may be dangerous, in order to go back. A lot of the blame, I presume, lies with pages not setting suitable cache TTLs on pages served by forms, but I think we could be providing more information here, even alongside accurate cache headers.

I suggest that when responding to a form POST, the HTTP response should be able to indicate how safe it is to re-post the form, based on what side-effects (other than returning a web page) posting the form had. Some forms are totally safe to re-POST, and for those the browser need not ask the user about it, instead treating them more like it treats a GET.

(Truth be told, the browser should not really treat GET and POST differently; my proposed header would be a better way to handle both of them.)

The page could report that the side effects are major (like completing a purchase, or launching an ICBM) and thus that re-posting should be strongly warned against. The best way to do this would be a string, contained in the header or in the HTML, so the browser can say, for example, “This requires resubmitting the form, which will …” followed by the page’s description.

This is, as noted, independent of whether the results will be the same, which is what the cache is for. A form that loads a webcam has no side effects, but returns a different result every time, so it should not be cached.

We could also add some information on the request, telling the server that the form has been re-posted from saved values rather than explicit user input. It could then decide what to do. This becomes important when the user re-posts without having received a full response from the server, due to an interruption or reload. That way the server knows this happened and can possibly get a pointer to the prior attempt.

In addition, I would not mind if the back-button query about form re-posting offered me the ability to just see the expired cached material, since I may not want the delay of a re-post.

With this strategy in mind, it also becomes easier to create the deep bookmarks I wrote of earlier, with less chance for error.

Some possible levels of side-effects could be None, Minor, Major and Forbidden. The tag could also appear as an HTML attribute on the form itself, but then it could not reveal things that can only be calculated after posting, such as certain side effects.
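As a sketch of how a browser might consume such a header, here is some illustrative Python. The header name (“Side-Effects”), its levels, and the cautious default are my assumptions, not an existing standard:

```python
# Sketch of a browser consuming the proposed header. The header name,
# its levels, and the default are assumptions, not an existing standard.

LEVELS = ("none", "minor", "major", "forbidden")

def should_prompt_on_repost(headers):
    """Decide whether to warn before re-POSTing a form, based on the
    hypothetical Side-Effects header of the prior response."""
    level = headers.get("Side-Effects", "major").split(";")[0].strip().lower()
    if level not in LEVELS:
        level = "major"        # unknown values get the cautious default
    return level != "none"     # "none" means re-POST silently, like a GET
```

A server would then emit something like `Side-Effects: major` along with a human-readable string for the warning dialog.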

Selection of search engine by text in search box

Most browsers now have a search box in the toolbar, which is great, and like most people’s, mine defaults to Google. I can change the engine with a drop-down menu to other places, like Amazon, Wikipedia, IMDB, eBay, Yahoo and the like. But that switch is a change of the default rather than a temporary change, and I don’t want that; I want it to snap back to Google.

However, I’ve decided I want something even more. I’ll make a plea to somebody who knows how to write Firefox add-ons to make a plug-in so I can choose my search engine with some text in the query I type. In other words, if I go to the box (which defaults to Google) I could type “w: foobar” to search Wikipedia, and “e: foobar” to search eBay, and so on. Google in fact uses a keyword-and-colon syntax to trigger special searches, though it tends not to use single letters. If this bothers people, something else like a slash could be used. While it would not be needed, “g: foobar” would search on Google, so “g: w: foobar” would let you search for “w: foobar” on Google. The actual syntax of the prefix string is something the user could set, or it could be offered in the XML with which search engine entries are specified.
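A sketch of the prefix parsing such a plug-in would do, in Python. The one-letter prefix table and the URL templates are illustrative assumptions:

```python
# Sketch of the prefix parsing the plug-in would do. The prefix table
# and URL templates are illustrative assumptions.

ENGINES = {
    "w": "http://en.wikipedia.org/wiki/Special:Search?search={q}",
    "e": "http://search.ebay.com/{q}",
    "g": "http://www.google.com/search?q={q}",
}
DEFAULT = "g"

def parse_query(text):
    """Split "w: foobar" into (engine, query). Only one prefix is
    consumed, so "g: w: foobar" searches Google for "w: foobar"."""
    head, sep, rest = text.partition(": ")
    if sep and head in ENGINES:
        return head, rest
    return DEFAULT, text
```

Anything without a recognized prefix falls through to the default engine, so ordinary queries behave exactly as before.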

Why is this the right answer? It’s no accident that Google uses this; they know. Whatever your thoughts on the merits of command-line interfaces and GUIs, things often get worse when you try to mix them. Once you have me typing on the keyboard, I should be able to do everything from the keyboard. I should not be forced to move back and forth between keyboard and pointing device if I care to learn the keyboard interface. You can have the GUI for people who don’t remember, but don’t make it the only route.

What’s odd is that you can do this from the location bar but not the search bar. In Firefox, go to any search engine and right-click on its search box. Select “Add a Keyword for this Search”; this lets you create a magic bookmark, which you can stuff anywhere, whose real purpose is not to be a bookmark but to give you a keyword that turns your URL box into a keyword-driven search box.
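For illustration, the bookmark this creates is just a URL template in which %s stands for whatever follows the keyword; something like the following (the exact search URL depends on the engine):

```
Keyword:  w
Location: http://en.wikipedia.org/wiki/Special:Search?search=%s
```

Typing “w foobar” in the location bar then runs the Wikipedia search for “foobar”.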

You don’t really even need the search box, which makes me wonder why they did it this way.

Interview with me on Web 2.0 and privacy (and a French/German documentary)

While I was at Tim O’Reilly’s Web 2.0 Expo, I did an interview with an online publication called Web Pro News. I personally prefer written text to video blogging, but for those who like to see video, you can check out:

Video Interview on Privacy and Web 2.0

The video quality is pretty good, if not the lighting.

The main focus was to remind people that as we return to timesharing, which is to say, move our data from desktop applications to web based applications, we must be aware that putting our private data in the hands of 3rd parties gives it less constitutional protection. We’re effectively erasing the 4th Amendment.

I also hint at an essay I am preparing on the evils of user-controlled identity management software, and give my usual rant about thinking about how you would design software if you were living in China or Saudi Arabia.

I was also interviewed some time ago about Google and other issues by a French/German channel. That’s a 90-minute program entitled Faut-il avoir peur de Google ? (Should we fear Google?). It’s also available in German. It was up for free when I watched it, but it may now require payment. (I only appear for a few minutes, with my voice dubbed over.)

When I was interviewed for this I offered to, with some help, speak in French. I am told I have a pretty decent accent, though I no longer have the vocabulary to speak conversationally in French. I thought it would be interesting if they helped me translate and then I spoke my words in French (perhaps even dubbing myself later if need be.) They were not interested since they also had to do German.

Another video interview by a young French documentarian producing a show called Mix-Age Beta can be found here. The lighting isn’t good, but this time it’s in English. It’s done under the palm tree in my back yard.

Where's a good shared calendar?

I really wish I could find a good calendaring tool. I’ve seen many of the features I want scattered among various tools, though some are nowhere to be found. I thought it would be good to itemize some of them. I’m mostly interested in *nix; I know that on Windows, MS Outlook is the most common choice, with Exchange for sharing.

Renting out eBay feedback to first-time sellers

An eBay reputation is important if you’re going to sell there. Research shows it adds a decent amount to the price, and it’s very difficult to sell at all with just a few feedbacks. Usually new sellers will buy a few items first to build up decent feedback. Because savvy buyers insist on feedback earned as a seller, that’s harder, and sometimes sellers will sell bogus items just to collect seller feedback. eBay has considered offering a feedback score based on the dollar volume of positive and negative transactions but has not yet done this. Some plugins will do that.

One thing I recommend to low-feedback sellers is to offer to reverse the “normal” payment system. If the seller has little feedback and the buyer has much better feedback, the seller should send the item without payment, and the buyer should pay on receipt. Many people find this foreign, but in fact it makes perfect sense. In real stores you don’t pay until you get the item, and many big-reputation merchants allow payment on credit for known buyers. Another idea is to offer to pay for escrow. This costs money, but will make it back in higher sale prices.

However, here’s a new idea. Allow high-reputation sellers to “lease out” feedback, effectively acting as a co-signer. This means they vouch for the brand new seller. If the new seller gets a negative feedback on the transaction, it goes on both the new seller’s feedback and the guarantor’s. Positive feedback goes on the seller and possibly into a special bucket on the guarantor’s. The guarantor would also get to be involved in any disputes.

It seems risky, and because of that, guarantors would only do this for people they trusted well, or who paid them a juicy bond, which is the whole point of the idea. Guarantors would probably use bonds to issue refunds to badly treated customers to avoid a negative, though you would want to be careful about blackmail risks. The breakdown of true and as-guarantor negatives might be visible on a guarantor if you look deep, but the idea is that the guarantor should be strongly motivated to keep the new seller in line.
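The propagation rule above can be sketched in a few lines of Python; the counters and the separate “guaranteed positive” bucket are invented for illustration:

```python
# Sketch of co-signed feedback under this proposal. The counters and the
# separate "guaranteed positive" bucket are invented for illustration.

def record_feedback(seller, guarantor, positive):
    """Apply one transaction's feedback to a new seller and the guarantor."""
    if positive:
        seller["positive"] += 1
        guarantor["guaranteed_positive"] += 1  # special bucket, not core score
    else:
        seller["negative"] += 1
        guarantor["negative"] += 1             # negatives hit the co-signer too
```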

With lendable reputation, new sellers could start pleasing customers and competing from day one.

Why do most online discussion packages suck so badly?

Yesterday I attended the online community session of Web2Open, a barcamp-like meeting going on within Tim O’Reilly’s Web 2.0 Expo. (The Expo has a huge number of attendees, it’s doing very well.)

I put forward a number of questions I’ve been considering for later posts, but one I want to raise here is this: where has the innovation been in online discussion software? Why are most message boards and blog comment systems so hard to use?

I know this is true because huge numbers of people are still using USENET, and not just for downloading binaries. USENET hasn’t seen much technical innovation since the 80s. As such, it’s aging, but it shouldn’t simply be aging; it should have been superseded long ago. We’ve gone through a period of tremendous online innovation in the last few decades, unlike any in history. Other old systems, like the Well, continue to exist and even keep paying customers in spite of minimal innovation. This is like gopher beating Firefox, or a CD Walkman being superior in some ways to an iPod. It’s crazy. (The users aren’t crazy; the fact that their choice is right is what’s crazy.)

Transit clock for local shops and cafes

In many cities, the transit systems have GPS data on the vehicles to allow exact prediction of when trains and buses will arrive at stops. This is quite handy if you live near a transit line, and people are working on better mobile interfaces for them, but it's still a lot harder to use them at a remote location.

It would be nice to have a small internet appliance for shops, cafes and other hangouts that are short walks from transit stops. The appliance would be programmed with the typical walking time to the stop, and of course which stop to track. It would then display, on a small screen, when a vehicle was coming, how much time you had before you could walk easily and make it, and how long before you would have to run to catch the train or bus.

Failing live GPS data, it could just work from schedules. It might make a low-key but audible noise as well. It need not have its own screen; if the place has a TV already, it could do an overlay on that, though flat-panel screens are now only about $100.
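The appliance’s core logic is simple. A Python sketch, where the thresholds (and the assumption that running takes about 60% of walking time) are mine:

```python
# Sketch of the appliance's decision: given the predicted arrival time and
# the configured walk to the stop, what should the display say? The
# run_factor (running takes ~60% of walking time) is an assumption.

def status(seconds_to_arrival, walk_seconds, run_factor=0.6):
    """Classify whether a patron leaving now can still catch the vehicle."""
    if seconds_to_arrival >= walk_seconds:
        return "walk"      # leave now at a stroll and make it
    if seconds_to_arrival >= walk_seconds * run_factor:
        return "run"       # still makeable at a run
    return "missed"        # wait for the next one
```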

Some transit lines have placed expensive outdoor "next bus" signs at their stops and shelters for these systems, which is great, but it might make more sense to put an appliance like this behind a local shop window, where it doesn't need to be outdoor-rated, and pay the shopowner or local homeowner.

To turn this into a moneymaker, it could be combined with a system to sell transit tickets (presumably through the cash register.) This is a win for the transit system, since transit lines without controlled stations waste a lot of time as the driver collects change and tickets as people get on. People with a pre-paid, pre-timestamped ticket can get on quickly and don't need a transfer. This even works for systems with distance based pricing. I have often wondered why you don't see more selling of transit tickets at the shops around stops in order to save this delay. SF Muni went to "proof of purchase" instead of driver collected tickets so they could put ticket machines at busy stops to save the driver time, but they aren't everywhere.

For a cafe, it's a nice thing to do for customers, and even makes them more willing to stay, safe in the knowledge they can get their vehicle efficiently. A taxi-summoning function could also be added (press a button on the box to call a taxi) which could, in theory, also predict when the taxi will arrive since many of them have GPS networks now.

An airliner mesh network over the oceans

A friend (Larry P.) once suggested to me that you could build a rural mobile phone network much more cheaply than the Iridium network by putting nodes in all the airliners flying over the country. The airliners have power, and have line of sight to ground stations and to a circle of about 200 miles radius around them. That’s pretty big (125,000 square miles), and in fact most locations will be within sight of an airliner most of the time. Indeed, the airlines already want high-speed data links to their planes to sell to the passengers, and relaying to people on the ground makes sense. It would not be an always-on network, but that’s OK for many users. Phones would be able to warn about outages with plenty of advance notice to handle conversations, and indeed, based on live computerized data from the air traffic control system, phones could even display a list of the times they would be connected.

I was thinking more about this in the context of InMarSat, which provides satellite services to ships and planes in the deep ocean. It uses geosynchronous satellites and auto-aiming dishes, but is quite expensive. Few people launch satellites to have footprints over the ocean.

Airliners fly so often these days that they are often spaced just 40 miles apart along the oceanic routes. It should be possible with modern technology to produce a mesh network that transmits data from plane to plane using line of sight. Two planes at 30,000 feet should in theory be able to keep line of sight when up to about 400 nautical miles apart. The planes could provide data and voice service for passengers at a reasonable price, and could also relay for ships at sea and even remote locations.
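The 400-mile figure can be sanity-checked with the standard spherical-Earth radio-horizon approximation d = sqrt(2Rh); a Python sketch (ignoring atmospheric refraction, which in practice extends the range somewhat):

```python
# Radio horizon for two aircraft, using d = sqrt(2*R*h) over a spherical
# Earth (no refraction). Checks the ~400 nm line-of-sight figure.

from math import sqrt

EARTH_RADIUS_M = 6_371_000.0
FT = 0.3048       # metres per foot
NM = 1852.0       # metres per nautical mile

def horizon_nm(alt_ft):
    """Distance to the radio horizon, in nautical miles."""
    return sqrt(2 * EARTH_RADIUS_M * alt_ft * FT) / NM

def max_los_nm(alt1_ft, alt2_ft):
    """Maximum separation at which two aircraft still have line of sight."""
    return horizon_nm(alt1_ft) + horizon_nm(alt2_ft)
```

Two airliners at 30,000 feet work out to roughly 370 nautical miles of separation, close to the figure above.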

One could also use lower bands that carry further, since there is no spectrum competition over the open ocean, but I suspect planes don’t spend much time more than 400 miles from any other airliner (or 200 miles from any land station). In the high bands, many megabits of data bandwidth are available, and since spectrum allocation is in theory not an issue when out of sight of land, even hundreds of megabits would be possible. (We would of course not transmit on any band actually in use out there, and could even build a cognitive radio system that detects other users and avoids those bands.) An airline could offer just this service, or at a higher price switch to satellite in the few dead zones, which again it should be able to predict with some accuracy. Aiming should be easy, since aircraft all transmit their GPS coordinates regularly on transponder frequencies and could also do so in the data network. In fact, you would know where a new mesh partner would approach, and where to point, before you could ever detect it with an omnidirectional antenna. And people could be given enough bandwidth for real internet, including voice. (Though that still means they should perhaps go to a phone lounge for long conversations.)

Of course, I often find transoceanic flights one of the rare times I get work done without the distraction of the internet, so this could also be a terrible idea.

Some technical notes: Jim Thompson points out that Doppler effects make this particularly challenging, which is an issue. I believe that since we know the exact velocity vectors of our own and the other aircraft, and we have many bands at our disposal, this should be a tractable problem.
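To see the scale of the problem, a sketch of the first-order Doppler shift; the carrier frequency and closing speed are illustrative numbers:

```python
# First-order Doppler shift between two mesh aircraft. Since each plane
# broadcasts GPS position and velocity, the closing speed (and hence the
# shift) can be computed and pre-compensated. Numbers are illustrative.

C = 299_792_458.0  # speed of light, m/s

def doppler_shift_hz(carrier_hz, closing_speed_ms):
    """Doppler shift for a given closing speed (positive = approaching)."""
    return carrier_hz * closing_speed_ms / C
```

Two airliners closing head-on at 500 knots each (about 514 m/s combined) see a shift on the order of 9 kHz at a 5 GHz carrier: large for a narrow channel, but entirely predictable from the transponder data.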

It's OK, the internet will scale fine

I’ve been seeing a lot of press lately worrying that the internet won’t be able to handle the coming video revolution, that as more and more people try to get their TV via the internet, it will soon reach a traffic volume we don’t have capacity to handle. (Some of this came from a Google TV exec’s European talk, though Google has backtracked a bit on that.)

I don’t actually believe that, even given the premise behind the statement, which is traditional centralized download from sites like YouTube or MovieLink. I think we already have the dark fiber and other technology in place, with terabits over fiber in the lab, to make this happen.

However, the real thing that they’re missing is that we don’t have to have that much capacity. I’m on the board of Bittorrent Inc., which was created to commercialize the P2P file transfer technology developed by its founder, and Monday we’re launching a video store based on that technology. But in spite of the commercial interest I may have in this question, my answer remains the same.

The internet was meant to be a P2P network. Today, however, most people download more than they upload, and have a connection which reflects this. But even with the reduced upload capacity of home broadband, there is still plenty of otherwise unused upstream sitting there ready. That’s what BitTorrent and some other P2P technologies do: they take this upstream bandwidth, which was not being used before, and use it to feed a desired file to other people wishing to download it. It’s a trade; others feed you and you feed them. It allows a user with an ordinary connection to publish a giant file where this would otherwise be impossible.

Yes, as the best technology for publishing large files on the cheap, it does get used by people wanting to infringe copyrights, but that’s because it’s the best, not because it inherently infringes. It also has a long history of working well for legitimate purposes; it is one of the primary means of publishing new Linux distros today, and will be distributing major Hollywood studio movies starting Feb 26.

Right now the clients connect with whomever they can, but they favour other clients that send them lots of data. That creates a bias towards clients to which there is a good connection. While I don’t set the tech roadmap for the company, I expect that over time the protocol will become aware of network topology, so that it does an even better job of mostly peering with network neighbours: customers of the same ISP, or students at the same school, for example. There is tons of bandwidth available on the internal networks of ISPs, and it’s cheap to provide there. More than enough for everybody to have a few megabits for a few hours a day to get their HDTV. In the future, an ideal network cloud would send each file just once over any external backbone link, or at most once every few days, becoming almost as efficient as multicasting.
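The kind of topology-aware peer preference speculated about here might look like this in Python; the tiering (same site, then same ISP, then anyone) and the peer record fields are my assumptions:

```python
# Sketch of topology-aware peer preference. The tiering (same site, then
# same ISP, then anyone) and the peer record fields are assumptions.

def rank_peers(peers, my_isp, my_site):
    """Order candidate peers: same-site first, then same-ISP; within a
    tier, prefer peers that have uploaded more to us."""
    def score(peer):
        tier = 0 if peer["site"] == my_site else (1 if peer["isp"] == my_isp else 2)
        return (tier, -peer["sent_to_us"])
    return sorted(peers, key=score)
```

A real client would combine this with the existing tit-for-tat preference rather than replace it.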

(Indeed, we could also make great strides if we were to finally get multicasting deployed, as it does a great job of distributing the popular material that still makes up most of the traffic.)

So no, we’re not going to run out. Yes, a central site trying to broadcast the Academy Awards to 50 million homes won’t work. And in fact, for cases like that, radio broadcasting and cable (or multicasting) continue to make the most sense. But if we turn up the upstream, there is more than enough bandwidth to go around within every local ISP network. Right now most people buy ADSL, but it’s not out of the question that devices in this area might become soft-switchable as to how much bandwidth goes up and how much down, so that if upstream is needed, it can be had on demand. It doesn’t really matter to the ISP; in fact, since most users don’t normally use their upstream, the ISP has wasted capacity out to the network unless it also does hosting to make up for it.

There are some exceptions to this. In wireless ISP networks there is no separate upstream and downstream, and that’s also true on some ethernets. For wireless users, it’s better to have a central cache just send the data, or to use multicasting. But for wired users it’s all two-way, and if the upstream isn’t used, it just sits there when it could be sending data to another customer on the same DSLAM.

So let’s not get too scared. And check out the early version of BitTorrent’s new entertainment store and do a rental download (sadly only with Windows XP-based DRM, sigh; I hope for the day we can convince the studios not to insist on this) of multiple Oscar winner “Little Miss Sunshine” and many others.

When should a password be strong?

If you’re like me, you select special unique passwords for the sites that count, such as banks, and you use a fairly simple password for things like accounts on blogs and message boards where you’re not particularly scared if somebody learns the password. (You had better not be scared, since most of these sites store your password in the clear so they can mail it to you, which means they learn your standard account/password pair and could pretend to be you on all the sites where you duplicate it.) There are tools that will generate a different password for every site you visit, and of course most browsers will remember a complete suite of passwords for you, but neither of these works well when roaming to an internet cafe or a friend’s house.

However, every so often you’ll get a site that demands you use a “strong” password, requiring it to be a certain length, to have digits or punctuation, spaces and mixed case, or some subset of rules like these. This of course screws you up if the site is an unimportant one where you want to use your easy-to-remember password; you must generate a variant of it that meets their rules and then remember it. These are usually sites where you can’t imagine why you want to create an account in the first place, such as stores you will shop at once, or blogs you will comment on once, and so on.

Strong passwords make a lot of sense in certain situations, but it seems some people don’t understand why. You need a strong password in case it is possible or desirable for an attacker to mount a “dictionary” attack on your account. This means they try thousands, or even millions, of passwords until they hit the one that works. If you use a dictionary word, they can try the most common words in the dictionary and learn your password.
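The arithmetic behind this is worth seeing. A Python sketch, with an assumed guess rate of a billion per second:

```python
# The arithmetic of dictionary vs. random passwords. The guess rate is an
# illustrative assumption.

from math import log2

def entropy_bits(alphabet_size, length):
    """Bits of entropy for a truly random password over the alphabet."""
    return length * log2(alphabet_size)

def crack_seconds(bits, guesses_per_second=1e9):
    """Expected time to search half the space at the given guess rate."""
    return (2 ** bits / 2) / guesses_per_second
```

A random 10-character mixed-case-plus-digits password gives about 60 bits, over a decade even at that rate; a password drawn from a 100,000-word dictionary falls in well under a millisecond.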

Social networking sites -- accept you won't be the only one, and start interoperating.

So many social networking sites (LinkedIn, Orkut, Friendster, Tribe, Myspace etc.) seem bent on being islands. But there can’t be just one player in this space, not even one in each niche, and when you join a new one it’s like starting all over again. I routinely get invitations to join new social applications, and I just ignore them. It’s not worth the effort.

At some point, two or more of the medium-sized ones should realize that the way to beat #1 is to find a way to join forces: to make it possible on service A to tie to a friend on service B, and to get almost all the benefits you would have if both people were on the same service. Then you can pick a home service, and link to people on their home services.

This is a tall order, especially while protecting highly private information. It is not enough to simply define a file format, like the FOAF format, for transporting data from one service to another. At best that’s likely only to get you the intersection of features of all the services using the format, and an aging intersection at that.

How to do this while preserving the business models and uniqueness of the services is challenging. For example, some services want to charge you for distant contacts or certain types of searches of your social network. And what do you do when a friend-of-a-friend (FoF) chain involves the first friend being on service B and the FoF being on service C?

Truth is, we all belong to many social networks. They won’t all be in one system, ever.

You can’t just have routine sharing. This is private information, we don’t want spammers or marketers harvesting it.

The interchange format will have to be very dynamic. That means that as soon as one service supports a new feature, it should be possible for the format to start supporting it right away, without a committee having to bless a new standard. That means different people will do the same thing in different ways, and that has to be reconciled nicely in the future, not before we start using it.

Of course, at the same time I remain curious about just what they hope for us to do with these social networks. So far I have mostly seen them as a source of entertainment. Real life-altering experiences are rare. Some are using them for business networking and job hunting. Mailing FoFs didn’t really work out; it quickly became more spam than anything. Searching a network (the ideal app for Google’s Orkut) has not yet been done well.

Perhaps the right answer is to keep the networks simple and then let the applications build on top of them, independent of how the networks themselves are implemented. This means, however, a way to give an individual application access to your social network and — this is tricky — the social networks of your friends. Perhaps what we need is a platform, implemented by many, upon which social applications can then be built by many. However, each one will need to ask for access, which might encourage applications to group together to ask as a group. The platform providers should provide few applications. In effect, even browsing your network is not an application the provider should offer, as that has to travel over many providers.

Once some smaller networks figure this out, the larger ones will have to join or fall. Because I don’t want to have to keep joining different networks, but I will join new applications based on my network.

The giant security hole in auto-updating software

It’s more and more common today to see software that can easily or automatically update itself to a new version. Sometimes the user must confirm the update; in some cases it is fully automatic, or manual but non-optional (i.e. the old version won’t work any more). This seems like a valuable feature for fixing security problems as well as bugs.

But rarely do we talk about what a giant hole this is in general computer security. On most computers, programs you run have access to a great deal of the machine, and in the case of Windows, often all of it. Many of these applications are used by millions and in some cases even hundreds of millions of users.

When you install software on almost any machine, you’re trusting the software, the company that made it, and the channel by which you got it, at the time you install. When you have auto-updating software, you’re trusting them on an ongoing basis. It’s really like leaving a copy of the keys to your office at the software vendor, hoping they won’t do anything bad with them, and hoping that nobody untrusted will get at those keys and do something bad with them.
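One standard mitigation, sketched below, is for the updater to accept only updates matching a digest published out of band, so that compromising the download server alone is not enough. Real systems use full public-key signatures; a pinned content hash keeps this Python sketch self-contained:

```python
# Mitigation sketch: accept only an update whose digest matches one
# published out of band. Real systems use public-key signatures; a pinned
# hash keeps this sketch stdlib-only.

import hashlib

def verify_update(update_bytes, expected_sha256_hex):
    """Return True only if the download matches the pinned digest."""
    return hashlib.sha256(update_bytes).hexdigest() == expected_sha256_hex
```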

Online shopping -- set when you need to get it.

I was seduced by Google’s bribe of $20 per $50-or-greater order to try their new Checkout service, and did some Christmas shopping. The store, being based in Southern California, normally takes only 1 or 2 days by UPS ground to get things to me. So ordering last weekend should have been low-risk for items that are “in stock and ship in 1-2 days.” Yes, they cover their asses by putting a longer upper bound on the shipping time, but generally that’s the ship time for people on the other coast.

I got a mail via Google (part of their privacy protection) saying the items had been shipped on Tuesday, so all was well. Unfortunately, I didn’t go and immediately check the tracking info. The new Google Checkout interface makes that harder to do; normally you can just go to the account page on most online stores and follow links directly to tracking. Here the interface requires you to cut and paste order numbers, and it’s buggy, reporting incorrect shipper names.

Unfortunately, it’s becoming common for online stores to keep stock in different warehouses around the country. Some items I ordered, it turns out, while shipped quickly, were shipped from far away. They’ll arrive after Christmas. So now I have to go out and buy the items at stores, or different items in some cases, at higher prices, without the seductive $20 discount, and I then need to arrange the return of the ordered items after they get here. And I’ll probably be out not only the money I paid for shipping (had I wanted them after Christmas I would of course have selected the free saver shipping option) but presumably return shipping as well.

A very unsatisfactory shopping experience.

How could this have been improved (other than by getting the items to me?)

  1. When they e-mail you about shipment, include a tracking link and also the shipper’s expected delivery day. UPS and FedEx both give that, and even with the USPS you can provide decent estimates.
  2. Let me specify in the order, “I need this by Dec 23.” They might be able to say right then and there that “This item is in stock far away. You need to specify air shipping to do that.”
  3. Failing that, they could, when they finally get ready to ship it, look at what the arrival date will be, and, if you’ve set a drop-dead date, cancel the shipment if it won’t get to you on time. Yes, they lose a sale but they avoid a very disappointed customer.
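Item 3 is just a date comparison at ship time. A Python sketch, with illustrative dates and a carrier transit estimate assumed to be available:

```python
# Sketch of item 3: at ship time, compare the carrier's transit estimate
# against the customer's drop-dead date. Dates are illustrative.

from datetime import date, timedelta

def should_ship(ship_date, transit_days, need_by):
    """Ship only if the estimated arrival makes the deadline; otherwise
    cancel, losing the sale but not the customer's goodwill."""
    return ship_date + timedelta(days=transit_days) <= need_by
```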

This does not just apply around Christmas. I often go on trips, and know I won’t be home on certain days. I may want to delay delivery of items around such days.

As I blogged earlier, it would also simplify things a lot if you could use the tracking interface of UPS, FedEx and the rest to reject or divert shipments in transit. If I could say “return to sender” via the web for a shipment I know is a waste of time, the vendor wins, I win, and the shipping company can probably set a price for this where it wins too. The recipient saves a lot of hassle, and the vendor can be assured the item has not been opened and can quickly restock it as new merchandise. With a manual return they have to inspect, and even worry about people who re-shrinkwrap returns to cheat them.

Another issue will no doubt come up: the Google discount was $20 off orders of $50 or more. If I return only some of the items, will they want to charge me the $20? In that case, you might find yourself in a situation where returning an item worth less than $20 would cost you money! Here I need to return the entire order except one $5 item I tossed on, so it won’t be an issue.

Jolly December to all. (Jolly December is my proposal for the Pastafarian year-end holiday greeting, a good salvo in the war on Christmas. If they’re going to invent a war on Christmas, might as well have one.)

Towards a Zero User Interface backup system

I’ve spoken before about ZUI (Zero User Interface) and how often it’s the right interface.

One important system that often has too complex a UI is backup. Because of that, backups often don’t get done. In particular offsite backups, which are the only way to deal with fire and similar catastrophe.

Here’s a rough design for a ZUI offsite backup. The only UI at a basic level is just installing and enabling it — and choosing a good password (that’s not quite zero UI but it’s pretty limited.)

Once enabled, the backup system will query a central server to start looking for backup buddies. It will be particularly interested in buddies on your same LAN (though it will not consider them offsite.) It will also look for buddies on the same ISP or otherwise close by, network-topology wise. For potential buddies, it will introduce the two of you and let you do bandwidth tests to measure your bandwidth.

At night, the tool would wait for your machine and network to go quiet, and likewise the buddy’s machines. It would then do incremental backups over the network. These would be encrypted with secure keys. Those secure keys would in turn be stored on your own machine (in the clear) and on a central server (encrypted by your password.)
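As a rough illustration of that key-storage scheme, here is a toy sketch of wrapping the backup key with a password-derived pad so the central server never holds it in the clear. All names are hypothetical, and a real system would use an authenticated cipher such as AES-GCM rather than this bare XOR construction:

```python
import hashlib
import os

def wrap_key(backup_key: bytes, password: str, salt: bytes) -> bytes:
    # Derive a pad from the password and XOR it over the backup key.
    # Toy construction for illustration only.
    pad = hashlib.pbkdf2_hmac('sha256', password.encode(), salt,
                              200_000, dklen=len(backup_key))
    return bytes(a ^ b for a, b in zip(backup_key, pad))

# Unwrapping is the same operation, since XOR is its own inverse.
unwrap_key = wrap_key

salt = os.urandom(16)
key = os.urandom(32)                      # the per-user backup key
stored = wrap_key(key, "hunter2", salt)   # what the central server holds
assert unwrap_key(stored, "hunter2", salt) == key
assert stored != key                      # server never sees the clear key
```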

The backup would be clever. It would identify files on your system which are common around the network — ie. files of the OS and installed software packages — and know it doesn’t have to back them up directly, it just has to record their presence and the fact that they exist in many places. It only has to transfer your own created files.
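That common-file trick amounts to checking each file’s digest against a catalog of widely replicated files. A minimal sketch, assuming the central server supplies that catalog (a hypothetical interface):

```python
import hashlib

def classify(data: bytes, common_hashes: set) -> str:
    """Decide whether a file must be shipped to a buddy or merely
    recorded as present. 'common_hashes' is a catalog of digests of
    widely replicated OS/package files, assumed to come from the
    central server."""
    digest = hashlib.sha256(data).hexdigest()
    return "record-only" if digest in common_hashes else "transfer"

os_file = b"#!/bin/sh\necho standard package file\n"
my_file = b"Dear diary, today I designed a backup system..."
catalog = {hashlib.sha256(os_file).hexdigest()}

print(classify(os_file, catalog))  # → record-only: many copies exist
print(classify(my_file, catalog))  # → transfer: unique personal data
```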

Your backups are sent, compressed, to two or more different buddies each. Regular checks are done to see if the buddy is still around. If a buddy leaves the net, the tool quickly finds other buddies to store data on. Alas, some files, like video, images and music, are already compressed, so this means twice as much storage is needed for backup as the files took — though only for your own generated files. And because you must store data for the buddies just as they are storing for you, you need a disk roughly 3 times bigger than your own data requires. But disk is getting very cheap.

(Another alternative is RAID-5 style. In RAID-5 style, you distribute each file across 3 or more buddies using the RAID-5 parity scheme, so that any one buddy can vanish and you can still recover the file. This means you may be able to get away with much less excess disk space. There are also redundant storage algorithms that let you tolerate the loss of 2 or even 3 of a larger pool of storers, at a much more modest cost than using double the space.)
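The RAID-5 idea reduces to simple XOR parity: store each chunk with a different buddy plus one parity chunk, and any single lost chunk can be rebuilt from the survivors. A minimal sketch:

```python
def make_parity(chunks):
    """XOR parity over equal-length chunks, RAID-5 style."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_chunks, parity):
    # XORing the survivors with the parity yields the missing chunk.
    return make_parity(list(surviving_chunks) + [parity])

# Split a file across three buddies; a fourth buddy holds the parity.
chunks = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(chunks)

# Buddy 2 vanishes; recover its chunk from the other two plus parity.
assert rebuild([chunks[0], chunks[2]], parity) == chunks[1]
```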

All this is, as noted, automatic. You don’t have to do anything to make it happen, and if it’s good at spotting quiet times on the system and network, you don’t even notice it’s happening, except a lot more of your disk is used up storing data for others.

It is the automated nature that is so important. There have been other proposals along these lines, such as MNET and some commercial network backup apps, but never an app you just install, do quick setup and then forget about until you need to restore a file. Only such an app will truly get used and work for the user.

Restore of individual files (if your system is still alive) is easy. You have the keys on file, and can pull your file from the buddies and decrypt it with the keys.

Loss of a local disk is more work, but if you have multiple computers in the household, the keys could be stored on other computers on the same LAN (alas this does require UI to approve this) and then you can go to another computer to get the keys to rebuild the lost disk. Indeed, using local computers as buddies is a good idea due to speed, but they don’t provide offsite backup. It would make sense for the system, at the cost of more disk space, to do both same-LAN backup and offsite. Same-LAN for hardware failures, offsite for building-burns-down failures.

In the event of a building-burns-down failure, you would have to go to the central server, and decrypt your keys with that password. Then you can get your keys and find your buddies and restore your files. Restore would not be ZUI, because we need no motivation to do a restore. It is doing regular backups we lack motivation for.

Of course, many people have huge files on disk. This is particularly true if you do things like record video with MythTV or make giant photographs, as I do. These may be too large for backup over the internet.

In this case, the right thing to do is to backup the smaller files first, and have some UI. This UI would warn the user about this, and suggest options. One option is to not back up things like recorded video. Another is to rely only on local backup if it’s available. Finally, the system should offer a manual backup of the large files, where you connect a removable disk (USB disk for example) and transfer the largest files to it. It is up to you to take that offsite on a regular basis if you can.

However, while this has a UI and physical tasks to do, if you don’t do it it’s not the end of the world. Indeed, your large files may get backed up, slowly, if there’s enough bandwidth.

Generic internet appliances

Normally I’m a general-purpose computing guy. I like that the computer that runs my TV with MythTV is a general purpose computer that does far more than a Tivo ever would. My main computer is normally on and ready for me to do a thousand things.

But there is value in specialty internet appliances, especially ones that can be very low power and small. But it doesn’t make sense to have a ton of those either.

I propose a generic internet appliance box. It would be based on the same small single-board computers which run linux that you find in the typical home router and many other small network appliances. It would ideally be so useful that it would be sold in vast quantities, either in its generic form or with minor repurposings.

Here’s what would be in level 1 of the box:

  • A small, single-board linux computer with low power processor such as the ARM
  • Similar RAM and flash to today’s small boxes, enough to run a modest linux.
  • WiFi radio, usually to be a client — but presumably adaptable to make access points (in which case you need ethernet ports, so perhaps not.)
  • USB port
  • Infrared port for remote control or IR keyboard (optionally a USB add-on)

Optional features would include:

  • Audio output with low-fi speaker
  • Small LCD panel
  • DVI output for flat panel display
  • 3 or 4 buttons arranged next to the LCD panel

The USB port on the basic unit provides a handy way to configure the box. On a full PC, write a thumb drive with the needed configuration (in particular WiFi encryption keys) and then move the thumb drive to the unit. Thumb drives can also provide a complete filesystem or extra software, or can contain photo slide shows in the version with the video output. Thumb drives could in fact contain entire applications, so you insert one and it copies the app to the box’s flash to give it a personality.
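The thumb-drive configuration could be as simple as a plain text file the box parses at boot. A sketch, with a hypothetical “appliance.conf” layout:

```python
import configparser

# A hypothetical 'appliance.conf' the user writes to the thumb drive
# on a full PC; the box reads it at boot to join the WiFi network and
# pick its personality.
CONF = """
[wifi]
ssid = HomeNet
key = 0123456789abcdef
mode = client

[app]
personality = mp3-player
"""

cfg = configparser.ConfigParser()
cfg.read_string(CONF)   # on the box: cfg.read('/mnt/usb/appliance.conf')
print(cfg['wifi']['ssid'])        # → HomeNet
print(cfg['app']['personality'])  # → mp3-player
```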

Here are some useful applications:

  • In many towns, you can see over the internet when a bus or train will arrive at your stop. Program the appliance with your stop and how long it takes to walk there after a warning. Press a button when you want to leave, and the box announces a countdown over the speaker so you leave just in time to meet the transit.
  • Email notifier
  • MP3 output to stereo or digital speakers
  • File server (USB connect to external drives — may require full ethernet.)
  • VOIP phone system speakerphone/ringer/announcer
  • Printer server for USB printers
  • Household controller interface (X10, thermostat control, etc.)
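The transit-countdown application in the list above is just arithmetic on the feed’s predicted arrival time. A sketch, with hypothetical parameters:

```python
from datetime import datetime, timedelta

def departure_time(bus_arrival, walk_minutes, warning_minutes=2):
    """When to announce 'leave now': the bus arrival minus the walk to
    the stop, minus a small safety margin."""
    return bus_arrival - timedelta(minutes=walk_minutes + warning_minutes)

# The transit feed says the bus reaches my stop at 8:30; it's a
# 7-minute walk, plus a 2-minute margin.
bus = datetime(2006, 12, 20, 8, 30)
leave = departure_time(bus, walk_minutes=7)
print(leave.strftime("%H:%M"))  # → 08:21
```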

Slap one on the back of a cheap flat panel display mounted on the wall, connected with a video cable. Now offer a vast array of applications such as:

  • Slide show
  • Security video (low-res unless there is an mpeg decoder in the box.)
  • Weather/News/Traffic updates
  • With an infrared keyboard, be a complete terminal to other computer apps and a minimal web browser.

There are many more applications people can dream up. The idea is that one cheap box can do all these things, and since it could be made in serious quantities, it could end up cheaper than the slightly more specialized boxes, which themselves retail for well under $50 today. Indeed today’s USB printer servers turn out to be pretty close to this box.

The goal is to get these out and let people dream up the applications.

In Edmonton

I'm in Edmonton. Turns out to be the farthest north I've been on land (53 degrees 37 minutes at the peak) after another turn through the Icefields Parkway, surely one of the most scenic drives on the planet. My 4th time along it, though this time it was a whiteout. Speaking tomorrow at the CIPS ICE conference on privacy, nanotechnology and the future at 10:15.

Idea of the day. I joined Fairmont Hotels President's Club while at the Chateau Lake Louise because it gave me free internet. When I got to the Fairmont Jasper Lodge my laptop just worked with no login, and I was really impressed -- I figured they had recorded my MAC address as belonging to a member of their club, and were going to let me use it with no login. Alas, no, the Jasper lodge internet (only in main lobby) was free for all. But wouldn't that be great if all hotels did that? Do any of the paid wireless roaming networks do this? (I guess they might be afraid of MAC cloning.) It would also allow, with a simple interface, a way for devices like Wifi SIP phones to use networks that otherwise require a login.

Of course, as we all know, the more expensive the hotel, the more likely the internet is not only not included, it's way overpriced. At least Fairmont gave one way around this. Of course I gave them a unique E-mail address created just for them, so if they spam me I can quickly disable them. But once again I, like most of us, find myself giving up privacy for a few hotel perks.

Wire-crawling robot that lays optical fiber

In thinking about how to reduce the cost of bringing fiber to everybody (particularly for block-area-networks built by neighbours) I have started wondering if we could build a robot that is able to traverse utility poles by crawling along wires — either power, phone or cable-TV wires. The robot would unspool fiber optic cable behind it and deploy wire-ties to keep it attached. Human beings would still have to eventually climb the poles and install taps or junctions and secure these items, but their job would be much easier.

Robots that can crawl along cables already exist. The hard part is traversing the poles. Now it turns out finding live electric wires is something that’s very easy for a robot to do. They stick out like a live wire in the EM spectrum. The poles of course have insulators, junctions, tie-downs and other obstacles. Crossing them may be hard in certain cases (in which case a human would have to help, either by tele-operation, or by climbing the pole.) It may be possible to have a very small robot that is able to follow the current (it’s easy to tell the lines to the houses from the main lines) and cross a pole like a bug and then, once safely on the other side, pull the larger robot across with a small tether. Again, it won’t always work, but if you can get it to work enough of the time, you can install fiber with far less time and labour than the manual approach. Fiber of course can be tied to power lines because it is a non-conductive material, though it’s even better if you can run it along phone or cable lines.

Not that any of these companies will want to give permission to competitors. And you want to pull multiple fibers, not so much for the bandwidth — we can do terabits in a single fiber if we want to — but for the backup when one fiber breaks.

If the robots get good enough, they could even string fiber into rural areas, following long chains of power or phone lines with just a single human assistant. Of course overhead wires are going to be more prone to breakage, but with these robots, repairs could be fast and cheap.

There are already robots out there which can crawl storm sewers to install fiber, and that’s a good alternative too. Indeed, a robot that can even crawl real sewage lines to put in fiber which comes out your household stack is not out of the question, if it’s in a strong enough casing.

Time for RSS and the aggregators to understand small changes

Over 15 years ago I proposed that USENET support the concept of “replacing” an article (which would mean updating it in place, so people who had already read it would not see it again) in addition to superseding an article, which presents the article as new to those who read the original, while those who hadn’t read it see only the new version, not both. Never did get that into the standard, but now it’s time to beg for it in USENET’s successor, RSS and cousins.

I’m tired of the fact that my blog reader offers only two choices — see no updates to articles, or see the articles as new when they are updated. Often the updates are trivial — even things like fixing typos — and I should not see them again. Sometimes they are serious additions or even corrections, and people who read the old one should see them.

Because feed readers aren’t smart about this, it not only means annoying minor updates, but also that people are hesitant to make minor corrections because they don’t want to make everybody see the article again.

Clearly, we need a checkbox in updates to say if the update is minor or major. More than a checkbox, the composition software should be able to look at the update, and guess a good default. If you add a whole paragraph, it’s major. If you change the spelling of a word, it’s minor. In addition to providing a good guess for the author, it can also store in the RSS feed a tag attempting to quantify the change in terms of how many words were changed. This way feed readers can be told, “Show me only if the author manually marked the change as major, or if it’s more than 20 words” or whatever the user likes.
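The guessing logic could be as simple as counting changed words with a standard diff. A sketch in Python, using the 20-word threshold from above:

```python
import difflib

def changed_words(old: str, new: str) -> int:
    """Count words inserted, deleted or replaced between two revisions."""
    sm = difflib.SequenceMatcher(None, old.split(), new.split())
    return sum(max(i2 - i1, j2 - j1)
               for op, i1, i2, j1, j2 in sm.get_opcodes()
               if op != 'equal')

def classify(old: str, new: str, threshold: int = 20) -> str:
    """Suggest a default for the minor/major checkbox, and give the
    feed reader a word count to filter on."""
    return 'major' if changed_words(old, new) > threshold else 'minor'

old = "Wikis have had the idae of a minor change checkbox for a while."
new = "Wikis have had the idea of a minor change checkbox for a while."
print(classify(old, new))  # fixing one typo → minor
```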

Wikis have had the idea of a minor change checkbox for a while; it’s time for blogs to have it too.

Of course, perhaps better would be a specific type of update or new post that preserves thread structure, so that a post with an update is a child of a parent. That means it is seen with the parent by those who have not yet seen the parent, but as an update on its own for those who did see it. For those who skipped the parent (if we know they skipped) the update also need not be shown.

RSS aggregator to pull threads from multiple intertwined blogs

It’s common in the blogosphere for bloggers to comment on the posts of other bloggers. Sometimes blogs show trackbacks to let you see those comments with a posting. (I turned this off due to trackback spam.) In some cases we effectively get a thread, as might appear in a message board/email/USENET, but the individual components of the thread are all on the individual blogs.

So now we need an RSS aggregator to rebuild these posts into a thread one can see and navigate. It’s a little more complex than threading in USENET, because messages can have more than one parent (ie. link to more than one post) and may not link directly at all. In addition, timestamps only give partial clues as to position in a thread since many people read from aggregators and may not have read a message that was posted an hour ago in their “thread.”
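Reconstructing such a thread amounts to inverting the link graph among known feeds, allowing a post to hang under more than one parent. A minimal sketch with hypothetical post URLs:

```python
# A hypothetical post store: each entry maps a post URL to the URLs of
# other blog posts it links to (its "parents" in the thread).
posts = {
    "alice.example/1": [],
    "bob.example/7":   ["alice.example/1"],
    "carol.example/3": ["alice.example/1", "bob.example/7"],
}

def children(posts):
    """Invert the link graph: parent URL -> list of replies. Unlike
    USENET, a post may appear under several parents."""
    kids = {url: [] for url in posts}
    for url, parents in posts.items():
        for p in parents:
            if p in kids:          # only thread within known feeds
                kids[p].append(url)
    return kids

def print_thread(url, kids, depth=0):
    print("  " * depth + url)
    for child in kids[url]:
        print_thread(child, kids, depth + 1)

print_thread("alice.example/1", children(posts))
```

Note that Carol’s post shows up under both Alice’s and Bob’s, which is exactly the multi-parent wrinkle that plain USENET-style threading can’t represent.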

At a minimum, existing aggregators (like bloglines) could spot sub-threads existing entirely among your subscribed feeds, and present those postings to you. You could also define feeds which are unsubscribed but which you wish to see or be informed of postings from in the event of a thread. (Or you might have a block-list of feeds you don’t want to see contributions from.) They could just have a little link saying, “There’s a thread including posts from other blogs on this message” which you could expand, and that would mark those items as read when you came to the other blog.

Blog search tools, like Technorati, could also spot these threads, and present a typical thread interface for perusing them. Both readers and bloggers would be interested in knowing how deep the threads go.
