The Web is a haystack.
This isn’t what Tim Berners-Lee had in mind when he invented the Web. Nor is it what Jerry Yang and David Filo had in mind when they invented Jerry and David’s Guide to the World Wide Web, which later became Yahoo. Jerry and David’s model for the Web was a library, and Yahoo was to be the first catalog for it. This made sense, given the prevailing conceptual frames for the Web at the time: real estate and publishing.
Both of those are still with us today. We frame the Web as real estate when we speak of “sites” with “locations” in “domains” with “addresses” you can “visit” and “browse”—then shift to publishing when we speak of “files” and “pages,” that we “author,” “edit,” “post,” “publish,” “syndicate” and store in “folders” within a “directory.” Both frames suggest durability, if not permanence. Again, kind of like a library.
But once we added personal movement (“surf,” “browse”) and a vehicle for it (the browser), the Web became a World Wide Free-for-all. Literally. Anyone could publish, change and remove whatever they pleased, whenever they pleased. The same went for organizations of every kind, all over the world. And everyone with a browser could find their way to and through all of those spaces and places, and enjoy whatever “content” publishers chose to put there. Thus the Web grew into billions of sites, pages, images, databases, videos, and other stuff, with most of it changing constantly.
The result was a heaving heap of fuck-all.*
How big is it? According to WorldWebSize.com, Google currently indexes about 41 billion pages, and Bing about 9 billion. They also peaked together at about 68 billion pages in late 2019. The Web is surely larger than that, but that’s the practical limit because search engines are the practical way to find pieces of straw in that thing. Will the haystack be less of one when approached by other search engines, such as the new ad-less (subscription-funded) Neeva? Nope. Search engines do not give the Web a card catalog. They certify its nature as a haystack.
So that’s one practical limit. There are others, but they’re hard to see when the level of optionality on the Web is almost indescribably vast. But we can see a few limits by asking some questions:
- Why do you always have to accept websites’ terms? And why do you have no record of your own of what you accepted, or when‚ or anything?
- Why do you have no way to proffer your own terms, to which websites can agree?
- Why did Do Not Track, which was never more than a polite request not to be tracked off a website, get no respect from 99.x% of the world’s websites? And how the hell did Do Not Track turn into the Tracking Preference Expression at the W2C, where the standard never did get fully baked?
- Why, after Do Not Track failed, did hundreds of millions—or perhaps billions—of people start blocking ads, tracking or both, on the Web, amounting to the biggest boycott in world history? And then why did the advertising world, including nearly all advertisers, their agents, and their dependents in publishing, treat this as a problem rather than a clear and gigantic message from the marketplace?
- Why are the choices presented to you by websites called your choices, when all those choices are provided by them? And why don’t you give them choices?
- Why would Apple’s way of making you private on your phone be to “Ask App Not to Track,” rather than “Tell App Not to Track,” or “Prevent App From Tracking You“?
- Why does the GDPR call people “data subjects” rather than people, or human beings, and then assign the roles “data controller” and “data processor” only to other parties? (Yes, it does say a “data controller” can be a “natural person,” but more as a technicality than as a call for the development of agency on behalf of that person.)
- Why are nearly all of the billion results in a search for GDPR+compliance about how companies can obey the letter of that law while violating its spirit by continuing to track people through the giant loophole you see in every cookie notice?
- Why does the CCPA give you the right to ask to have back personal data others have gathered about you on the Web, rather than forbid its collection in the first place? (Imagine a law that assumes that all farmers’ horses are gone from their barns, but gives those farmers a right to demand horses back from those who took them. It’s kinda like that.)
- Why, 22 years after The Cluetrain Manifesto said, we are not seats or eyeballs or end users or consumers. we are human beings and our reach exceeds your grasp. deal with it. —is that statement (one I helped write!) still not true?
- Why, 9 years after Harvard Business Review Press published The Intention Economy: When Customers Take Charge, has that not happened? (Really, what are you in charge of in the marketplace that isn’t inside companies’ silos and platforms?)
- And, to sum up all the above, why does “free market” on the Web mean your choice of captor?
It’s easy to blame the cookie, which Lou Montulli invented in 1994 as a way for sites to remember their visitors by planting reminder files—cookies—in visitors’ browsers. Cookies also gave visitors a way to remember where they were when they last visited. For sites that require logins, cookies take care of that as well.
What matters, however, is not the cookie. What matters is why the cookie was necessary in the first place: the Web’s architecture. It’s called client-server, and is represented graphically like this:
This architecture was born in the era of centralized mainframes, which “users” accessed through client devices called “dumb terminals”:
On the Web, as it was in the old mainframe world, we clients—mere users—are as subordinate to servers as are calves to cows:
(In fact I’ve been told that client-server was originally a euphemism for “slave-master.” Whether true or not, it makes sense.)
In the client-server paradigm, our agency—our ability to act with effect in the world—is restricted to what servers allow or provide for us. Our choices are what they provide. We are independent only to the degree that we can also be clients to other servers. In this paradigm, a free market is “your choice of captor.”
Want privacy? You have to ask for it. And, if you go to the trouble of doing that—which you have to do separately with every site and service you encounter (each a mainframe of its own)—your client doesn’t keep a record of what you “agreed” to. The server does. Good luck finding whatever it is the server or its third parties remember about that agreement.
Want to control how your data (or data about you) gets processed by the servers of the world? Good luck with that too. Again, Europe’s GDPR says “natural persons” are just “data subjects,” while “data controllers” and “data processors” are roles reserved for servers.
Want a shopping cart of your own to take from site to site? My wife asked for that in 1995. It’s still barely thinkable in 2021. Want a dashboard for your life where you can gather all your expenses, investments, property records, health information, calendars, contacts, and other personal information? She asked for that too, and we still don’t have it, except to the degree that large server operators (e.g. Google, Apple, Microsoft) give us pieces of it, hosted in their clouds, and rigged to keep you captive to their systems.
That’s why we don’t yet have an Internet of Things (IoT), but rather an Apple of Things, a Google of Things, and an Amazon of Things.
Is it possible to do stuff on the Web that isn’t client-server? Perhaps some techies among us can provide examples, but practically speaking, here’s what matters: If it’s not thinkable by the owners of the servers we depend on, it doesn’t get made.
From our position at the bottom of the Web’s haystack, it’s hard to imagine there might be a world where it’s possible for us to have full agency: to not be just users of clients enslaved to as many servers as we deal with every day.
But that world exists. It’s called the Internet, And it can support a helluva lot more than the Web—with many ways to interact other than those possible in through client-server alone.
Digital technology as we know it has only been around for a few decades, and the Internet for maybe half that time. Mobile computers that run apps and presume connectivity everywhere have only been with us for a decade or less. And all of those will be with us for many decades, centuries, or millennia to come. We are not going to stop living digital lives, any more than we are going to stop speaking, writing, or using mathematics. Digital technology and the Internet are granted wishes that won’t go back into the genie’s bottle.
Credit where due: the Web is excellent, but not boundlessly so. It has limits. Thanks to the client-server model, full personal agency is not a grace of life on the Web. Not until we have servers or agents of our own. (Yes, we could have our own servers back in Web1 days—my own Web and email servers lived under my desk and had their own static IP addresses from roughly 1995 until 2003—and a few alpha geeks still do. But since then we’ve mostly needed to live as digital serfs, by the graces of corporate overlords.)
So now it’s time to think and build outside the haystack.
Models for that do exist, and some have been around for a long time. Email is one example. While you can look at your email on the Web, or use a Web-based email service (such as Gmail), email itself is independent of those. My own searls.com email has been at servers in my home, on racks elsewhere, and in a hired cloud. I can move it anywhere I want. You can move yours as well, because the services we hire to host our personal email are substitutable. That’s just one way we can enjoy full agency on the Internet.
- ProjectVRM, which I started as a fellow of the Berkman Klein Center at Harvard in 2006, and which is graciously still hosted (with this blog) by the Center there. Our mailing list currently has more than 550 members. We also meet twice a year with the Internet Identity Workshop, which I co-founded, and still co-organize, with Kaliya Young and Phil Windley, in 2005). Immodestly speaking, IIW is the most leveraged conference I know.
- Customer Commons, where we are currently working on building out what’s called the Byway. Go there and follow along as we work to toward better answers to the questions above than you’ll get from inside the haystack. Customer Commons is a 501(c)3 nonprofit spun out of ProjectVRM.
- The Ostrom Workshop at Indiana University, where Joyce (my wife and fellow founder and board member of Customer Commons) and I are both visiting scholars. It is in that capacity that we are working on the Byway and leading a salon series titled Beyond the Web. Go to that link and sign up to attend. I look forward to seeing and talking with you there.
[Later…] More on the Web as a haystack is in FILE NOT FOUND: A generation that grew up with Google is forcing professors to rethink their lesson plans, by Monica Chin (@mcsquared96) in The Verge, and Students don’t know what files and folders are, professors say, by Jody MacGregor in PC Gamer, which sources Monica’s report.
*I originally had “heaving haystack of fuck-all” here, but some remember it as the more alliterative “heaving heap of fuck-all.” So I decided to swap them. If comments actually worked here†, I’d ask for a vote. But feel free to write me instead, at my first name at my last name dot com.
†Now they do. Thanks for your patience, everybody.