
Before there were search engines, there were directories. The biggest and best-known was Yahoo. On the first graphical browser (Mosaic), it looked like this:
The directory idea made sense because the Web is laid out like the file directory on your computer. There is a “domain” hosting a “location” or a “site,” and everything at that site lives along a path of /something/something/something. Geeks call those somethings directories too, and the whole string a path.
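The parallel is visible if you take a URL apart. Here is a minimal sketch in Python using the standard library's `urlparse`; the URL itself is made up for illustration:

```python
from urllib.parse import urlparse

# A hypothetical URL, split the way the directory model describes it
url = "https://searls.com/writings/web/history.html"

parsed = urlparse(url)
print(parsed.netloc)  # the "domain": searls.com
print(parsed.path)    # the "path": /writings/web/history.html

# The "string of somethings" between the slashes:
segments = [s for s in parsed.path.split("/") if s]
print(segments)       # ['writings', 'web', 'history.html']
```

Each segment maps onto what a file system would call a directory, which is why the library metaphor came so naturally.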
While this design is boundlessly flexible, it also suggests durability, if not permanence, because it’s good to find stuff where it rightly goes and find it in the same place every time you return to it.
That was what Yahoo assumed in the early days of the Web—as did everyone who bought a domain name. I’ve had searls.com since 1995. Dave Winer (father of outlining and progenitor of much else) has had Scripting.com for even longer (and has a lot more at that domain).
But we don’t own domain names. We rent them. And the World Wide Web isn’t a library. It’s a whiteboard with stuff written on it. Some of that stuff is located on directory paths. A lot more is coughed up by database systems on an as-needed basis.
The Yahoo directory failed. In its place came search engines, which don’t catalog the Web like a library might. They index it. That means they send crawlers down the Web’s directory paths, recording everything they see into a searchable index. I explain here how that works and where this went:
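The difference between a catalog and an index is worth making concrete. A catalog organizes pages by topic; an index maps every word to the pages that contain it, so a query is a lookup rather than a walk through a hierarchy. Here is a toy inverted index in Python; the page URLs and text are invented for the example:

```python
from collections import defaultdict

# Toy pages a crawler might have fetched (URLs and text are made up)
pages = {
    "example.com/a": "the web is a haystack",
    "example.com/b": "search engines index the web",
}

# Build an inverted index: each word maps to the set of pages containing it
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# A query becomes a direct lookup, not a browse through categories
print(sorted(index["web"]))       # both pages contain "web"
print(sorted(index["haystack"]))  # only one page does
```

Real search engines add ranking, freshness signals, and scale on top of this, but the basic move is the same: record everything the crawler sees, then answer queries from the index rather than the Web itself.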
The Web is a haystack.
This isn’t what Tim Berners-Lee had in mind when he invented the Web. Nor is it what Jerry Yang and David Filo had in mind when they invented Jerry and David’s Guide to the World Wide Web, which later became Yahoo. Jerry and David’s model for the Web was a library, and Yahoo was to be the first catalog for it. This made sense, given the prevailing conceptual frames for the Web at the time: real estate and publishing.
Both of those frames are still with us today. We frame the Web as real estate when we speak of “sites” with “locations” in “domains” with “addresses” you can “visit” and “browse”—then shift to publishing when we speak of “files” and “pages” that we “author,” “edit,” “post,” “publish,” “syndicate” and store in “folders” within a “directory.” Both frames suggest durability if not permanence. Again, kind of like a library.
But once we added personal movement (“surf,” “browse”) and a vehicle for it (the browser), the Web became a World Wide Free-for-all. Literally. Anyone could publish, change and remove whatever they pleased, whenever they pleased. The same went for organizations of every kind, all over the world. And everyone with a browser could find their way to and through all of those spaces and places, and enjoy whatever “content” publishers chose to put there. Thus the Web grew into billions of sites, pages, images, databases, videos, and other stuff, with most of it changing constantly.
The result was a heaving heap of fuck-all.*
Back in 2005, I wrote in Linux Journal about a split between the “static” Web that was like a library (with its “locations,” “sites,” and “domains” you could “visit” and “browse”), and the “live” Web of blogs and posts. Then social media came along, and the live branch of the Web outgrew the static Web’s trunk.
Last week came news that a leak revealed lots of interesting poop about how Google ranks search results. Here are two things I don’t need those leaked documents to tell me:
- Google favors the present over the past, and the current over the archival.
- Google no longer indexes, or ranks, very old Web pages.
I speak from experience here, because I have some old pages on the Web that Google no longer indexes, so searches don’t find them. I also have Easter eggs on a couple of those pages: words that exist in no language but made those pages easy to find when I did keyword searches for them. Now I get “No results found for _____.” (I won’t reveal the word because I want to keep testing Google.)
Countless publications have also come and gone on the Web without leaving a trace. Upside was a gigantic tech industry publication from the Nineties through the dotcom boom. Not a trace of it remains. As far as I know, nothing remains of Fast Company’s early issues either.
But hey, God bless the Internet Archive. Here’s a piece I wrote for PC Magazine in December 1982 about a PC application that taught card counting in blackjack:

As the evanescence of “content” increases, so does the importance of archives.
So maybe stop reading here and start reading here. We have a lot of work to do.