Keep the Internet DRY

Created: 2024-Jan-14

DRY (a common programming acronym meaning “Don’t Repeat Yourself”) can and should be applied to the internet as a whole.

It’s a great practice if you know it, but also one that takes practice to be great at, ya know? Avoiding useless or detrimental repetition is an easy-to-understand premise, but worth exploring, as the devil is in the details.

The Problem Space

We are living in an era of mass information repackaging, perpetuated by people seeking easy streams of passive income (among other things), and enabled by state-of-the-art tools designed to capture attention and drag that attention across an ad-riddled path. This doesn’t sound great, but there’s nothing malicious in it; it’s just a problem in need of solving. We do not suffer from a lack of valuable information, but from a signal problem, something like noise or a positive feedback loop. In this environment of abundant information, the way we present our own information becomes that much more important.

How does the attention we give a presentation differ if we think the presentation contains no information that’s new to us?

The Anti-Problem

Novel presentation is art. Novel presentation is functional. Even if the information doesn’t change, the way we present something to an audience can trigger new insight in them. Even if we had the tools to merge every bit of repeated information on the web, would we be able to do so with the care to preserve novel presentations and the benefits they bring?

In software, enacting DRY evokes related questions: when does repetition need to be collapsed for simplicity? How do we recognize scenarios that look like repetition but in which merging is detrimental because the concepts are actually novel?
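To make those two questions concrete, here is a minimal sketch in Python. Every name and rule in it is hypothetical, invented for illustration: the first case is repetition worth collapsing, and the second is code that looks repeated but probably shouldn’t be merged.

```python
# Hypothetical example: every name and rule here is invented for illustration.

# Case 1: true repetition. Two call sites cleaned titles the same way for the
# same reason, so the logic collapses into one helper.
def clean_title(raw: str) -> str:
    """Trim the ends and collapse runs of whitespace into single spaces."""
    return " ".join(raw.split())


# Case 2: lookalike repetition. These two checks read the same today, but they
# answer to different rules (account policy vs. a marketing campaign) and may
# diverge tomorrow. Merging them would couple concepts that only look alike.
def is_valid_username(text: str) -> bool:
    return 3 <= len(text) <= 20 and text.isalnum()


def is_valid_coupon_code(text: str) -> bool:
    return 3 <= len(text) <= 20 and text.isalnum()
```

The two validators are character-for-character identical, which is exactly what makes the call hard: the deciding factor is whether they change for the same reasons, not whether they currently match.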

Out of Scope

Some replication is of course useful for making information more durable across the network, but there’s a big difference between mirroring information for durability’s sake and repackaging information for personal gain. If we assume that most information is already mirrored via sites like the Internet Archive, then we can say we don’t have to worry about replicating information. We can simply focus on what we want to add to the human body of work.

Beyond intentional repackaging, we also unintentionally repackage information. That issue is related, but is different enough that we’ll skip it. For now, it suffices to consider the question, “Once we become aware of prior art, how do we handle it?” Have you ever begun a programming task where you created a new class only to realize that the concept was already captured under a synonymous class name? Not a great feeling, but usually one with a clear solution.

Looking back to The Anti-Problem, we need to acknowledge the uncomfortable fact that we currently have a difficult time accurately ascertaining whether information is needlessly repeated, or whether its presentation differs in ways that add value. I’m as yet unaware of tools that might make this easy, and even if such a tool did exist, the distributed ownership of the internet would make an internet-wide cleanup impractical for regular folks or turn regular folks into tyrants. The most reasonable conclusion seems to be that even if we choose to keep our little corner of the internet DRY, we cannot effectively worry about the DRYness of others. Now the question becomes “If the rest of the internet isn’t going to adhere to any sort of DRYness, why should I?”

Incentives

A lot of the benefits of keeping code DRY also apply to keeping the internet DRY. While we have some tools to influence others in software teams (such as peer review), in both scenarios we are ultimately dealing with people who aren’t guaranteed to adhere to DRY. In this scenario, we can sink to the lowest common denominator, or we can build a key differentiator. For DRY to be a key differentiator, it’s important to understand how it differentiates us in a good way:

- A greater percentage of our writing is novel. Anecdotally, this leads to a greater “interest per word” on the part of the reader as it reduces the risk they see something they already know and tune out or start skimming.
- Giving attribution where it’s relevant frames our ideas as part of a network of related ideas, which increases reader confidence. It also shows readers we care enough to read about the topic we’re writing about.
- We have to write less for the same impact!
- If the source changes, we’re up to date.

Caveats

Looking at the list of Incentives, of course the last two are also risks.

If we love to write, DRY means writing less! More seriously, if we use writing as a learning tool, sticking too strictly to DRY might mean we skip opportunities to absorb concepts as we reword them.

If we’re relying on the source and it is removed, or is modified in a way that matters for our usage, then our design has backfired. We need to be aware of how the source is likely to change, assess the risk, and plan accordingly. Some strategies to mitigate this risk without violating DRY are to summarize, or to use keywords helpful for finding the same or similar sources via search engine.
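As a rough sketch of that mitigation, a reference could carry its own fallback: a brief summary plus search terms that still work if the link dies. The `Reference` class, its fields, and the example URL below are all hypothetical, and whether a link is alive is passed in by hand rather than checked over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """A link plus the fallbacks that survive if the link dies (hypothetical)."""

    url: str
    summary: str  # brief, in our own words; not a copy of the source
    search_terms: list[str] = field(default_factory=list)

    def render(self, link_alive: bool = True) -> str:
        # Prefer the link; fall back to search terms when it is gone.
        if link_alive:
            return f"{self.summary} ({self.url})"
        terms = ", ".join(self.search_terms)
        return f"{self.summary} (link unavailable; search for: {terms})"


ref = Reference(
    url="https://example.com/dry-principle",  # placeholder URL
    summary="DRY asks that each piece of knowledge live in one place",
    search_terms=["DRY principle", "single source of truth"],
)
print(ref.render())
print(ref.render(link_alive=False))
```

The summary stays short on purpose: it should be enough to keep our own argument intact, not a replacement for the source.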

We could also borrow from performant system design, where thoughtfully replicating information to key locations (caching, for example) can vastly improve a system’s performance. In the case of hosting content, “performance” might mean how well we hold the reader’s attention through a particular thought, versus risking them drilling down a rabbit hole of linked information. This might be a good reason to use a footnote link in some places rather than an inline link.

DRY Internet

Instead of DRY, maybe DRI (Don’t Repeat the Internet). We should set the expectation that our role is to link existing information where possible, and rely on brief summary and relevant search terms for scenarios in which links die or are for whatever reason impractical. We cannot unilaterally decide to apply DRY to the internet; we can only do our part to keep the internet DRY, and we have a personal incentive to do so.

All those words just to conclude what we already knew: wiki-style linking in articles is where it’s at. Have I spent all this time rehashing Jimmy Wales or the founder of an earlier wiki? Better yet, in a distant past did Tim Berners-Lee take a similar train of thought on his way to inventing HTTP? If he did, I hope to find it and link it here, and I hope my presentation has still managed some novel value.