Yahoo announced that in a few days it will shut down the Alta Vista web site. This has prompted a few posts on the history of internet search, to which I will add an anecdote.
The first internet search engine predated the “web” and was called Archie. Archie (an archive search) was basic by today’s standards. The main protocol for getting files on the internet in those days was FTP. Many sites ran an open FTP server, which you could connect to and download files from. If you had files or software to share with people, you put them up on an FTP server, in particular one that allowed anonymous login to get public files. The Archie team (from Montreal) built a tool to visit all the open servers, read their indexes and generate a database. You could then search that database and get a pointer to all the places where you could get a file. It was hugely popular for the day.
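To make the Archie idea concrete, here is a minimal sketch of that kind of index. It is not Archie's actual code or data format: the host names, file paths and function names are all invented for illustration, and the listings are hard-coded rather than fetched over FTP.

```python
from collections import defaultdict

# Hypothetical listings: host -> file paths, as if read from each
# anonymous FTP server's directory index. Hosts and paths are invented.
LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/docs/rfc959.txt"],
    "ftp.example.org": ["/mirrors/gnu/emacs-18.59.tar.Z", "/pub/games/nethack.tar.Z"],
}

def build_index(listings):
    """Map each lowercased file name to every (host, path) that offers it."""
    index = defaultdict(list)
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1].lower()
            index[name].append((host, path))
    return index

def search(index, term):
    """Return sorted (host, path) pairs whose file name contains the term."""
    term = term.lower()
    hits = []
    for name, locations in index.items():
        if term in name:
            hits.extend(locations)
    return sorted(hits)

index = build_index(LISTINGS)
for host, path in search(index, "emacs"):
    print(f"{host}:{path}")
```

The search only looks at file names, not file contents, which is roughly the limitation described above: you had to know what the file was called.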
(You will probably note that this is almost exactly the way Napster worked, the only difference being that Napster was a bit more sophisticated and people used it to share files that were copyrighted. FTP servers had copyrighted material, but mostly they had open source software and documents.)
Around the same time, a lot of folks were building full-text search engines for use on large collections of documents. You could find these on private databases around the world, and Brewster Kahle developed the WAIS protocol to provide a standardized interface to text search and to his own text search tools.
Not long after the web started to grow, Fuzzy Mauldin at CMU made Lycos, which was a full-text search engine applied to documents gathered from the web. The ability to search the web generated much attention, and a few other competing spiders and search engines appeared. Everybody had a favourite. (To add to my long list of missed opportunities, in April of ’95 I wrote a few notes to Fuzzy looking to get his spider index so we could sort web pages based on how many incoming links they had. Nothing ever came of that, but as you may know, that concept later had some value. :-) And I also turned down a $4M offer from Lycos to buy ClariNet (which would have turned into $40M when their stock shot up in the bubble. Sigh.)
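For the curious, the incoming-link idea from that aside can be sketched in a few lines. This is plain counting over an invented link graph, under the assumption that a spider hands you each page's outgoing links; it is not what any real engine later shipped, and all the URLs and function names are hypothetical.

```python
from collections import Counter

# Hypothetical crawl output: page -> pages it links to. URLs are invented.
LINKS = {
    "a.example/home": ["b.example/faq", "c.example/jokes"],
    "b.example/faq": ["c.example/jokes"],
    "c.example/jokes": ["a.example/home"],
    "d.example/misc": ["c.example/jokes", "c.example/jokes"],
}

def incoming_link_counts(links):
    """Count distinct pages linking to each target, ignoring self-links."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # a page's repeated links count once
            if target != source:
                counts[target] += 1
    return counts

def rank_by_incoming(links):
    """Pages sorted by incoming-link count, highest first, ties by name."""
    counts = incoming_link_counts(links)
    pages = set(links) | set(counts)
    return sorted(pages, key=lambda p: (-counts[p], p))

print(rank_by_incoming(LINKS))
```

Counting each linking page only once is a design choice made here so that one page cannot inflate a target by repeating the same link.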
In 1995, for many people that favourite changed to Alta Vista, a new search engine from Digital Equipment Corp. DEC was a huge name at the time, the biggest name in minicomputers, and it was just losing the Unix crown to Sun. The team at DEC put a lot of computing power into Alta Vista, which gave it two useful attributes. First, it spidered a lot more pages, and thus was more likely to find stuff. Second, it was fast compared to most of the other engines. In a precursor to other rapid turnarounds in the internet business, you could switch your favourite search engine in a heartbeat, and many did. DEC eventually justified the money it was spending on all that fancy hardware (there was no revenue for search in those days) by saying it showed off just how powerful DEC’s computers with big address spaces were. Indeed the limits of Alta Vista were the limits of the architecture, using the 64-bit Alpha to address 130 GB of RAM and 500 GB of disk, which was huge for the day.
On Alta Vista’s home page, they gave you a sample query to type in the search box, to show you how to use it. That query was:
kayak sailing “san juan islands”
Indeed, if you typed that, you got a nice array of pages which talked about kayaking up in the San Juan Islands, tour operators, etc., which was just what you wanted to get from a query.
My devious mind wondered, “what if I put up a page on my own web site with this as the title?” I created the Kayak Sailing “San Juan Islands” home page on the rec.humor.funny site, which was already a very popular site in those days. (Indeed it was around 1995 that RHF fell behind Yahoo as the most widely read thing on the internet, but that refers to the USENET group, not the page.)
You will note as you look at the page that it contains the words in the title and headers, and repeats them many times in invisible comments. In those days the search engines ranked pages higher simply based on where the words appeared and how many times they were repeated. So I gave it a whirl. This was an early attempt at what is now called “black hat search engine optimization” though I was doing it for fun, rather than nefarious gain.
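Here is a toy version of that kind of ranking, to show why stuffing worked. The weights, page contents and function names are made up for illustration, and the scorer ignores the phrase quotes in the query; I have no idea what formula Alta Vista really used.

```python
import re

# Hypothetical weights: a hit in the title counts more than one in the body.
TITLE_WEIGHT = 5
BODY_WEIGHT = 1

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page, query):
    """Sum weighted occurrences of each query word in the title and body."""
    terms = set(tokenize(query))
    title_hits = sum(1 for w in tokenize(page["title"]) if w in terms)
    body_hits = sum(1 for w in tokenize(page["body"]) if w in terms)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits

QUERY = 'kayak sailing "san juan islands"'

# An honest page about kayaking versus a page stuffed with the query words.
honest = {
    "title": "Sea kayak tours",
    "body": "Guided kayak trips through the San Juan Islands each summer.",
}
stuffed = {
    "title": "Kayak Sailing San Juan Islands",
    "body": "Jokes. " + "kayak sailing san juan islands " * 20,
}

print(score(honest, QUERY), score(stuffed, QUERY))
```

Under these made-up weights the stuffed page wins by a wide margin, because every repetition adds to the score and nothing checks whether the page is about kayaking at all.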
The results didn’t change, though. Alta Vista relied on huge computer power, but the index was only rebuilt by hand, so it would be a month or more before Alta Vista recalculated it. Then one day I went to type in the query and bingo: there was my page on the first page of search results. It sat alongside the pages of a dozen other people who had tried the same thing, a few articles about Alta Vista that gave the example query, and copies of its search page, which of course contained that string.
More to the point, not a single item on the results page was about actual kayaking! The sample query was ruined, though the results were quite amusing. Not long after, Alta Vista changed the example to Pizza “deep dish” Chicago and of course I added it to my page as well. Not long after that, AV switched to showing different examples from a rotating and changing collection so people could not play this game any more.
Alta Vista ruled search in spite of efforts from Infoseek, Inktomi/Hotbot and others. But as we all know, a few years later Google was born at Stanford, and it proved again how quickly people could switch to a new favourite search engine. Google lives under that fear (but with great success) to this day. And Google’s dominance turned SEO into a giant industry.