How the Internet Fails Science - And How We Can Fix It for Good

Today, nearly every piece of knowledge ends up online. But here’s the irony: the Internet, as it exists today, is a terrible place to store our scientific record.

The Internet was never designed to preserve information indefinitely. It was built to deliver content in the moment rather than maintain it for decades or centuries. And that’s a big problem when your career, or humanity’s collective knowledge, depends on old links staying reliable.

The Twin Threats: Link Rot and Content Drift

If you’ve ever clicked a link in a research paper only to find a 404 page, you’ve met link rot. Less apparent, but equally dangerous, is content drift: when the link works but the content has changed from what the author originally cited.

This is not rare. A study by Martin Klein and Herbert Van de Sompel found that 75% of the URLs referenced in scientific articles pointed to content that had changed within three years. The older the citation, the worse it gets. Links from the 1990s? Almost certainly gone.

This undermines science’s foundations. It makes reproducing research harder, verifying claims more time-consuming, and leaves the door wide open for misinformation.
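To make the two failure modes concrete, here is a minimal Python sketch. It assumes the author recorded a SHA-256 fingerprint of the page at citation time, which is not something ordinary citations do today. The names `fingerprint` and `classify_reference` are made up for illustration.

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the content (illustrative choice of hash)."""
    return hashlib.sha256(content).hexdigest()


def classify_reference(cited_hash: str, current_content: Optional[bytes]) -> str:
    """Classify a cited link.

    current_content is None when the URL no longer resolves (e.g. a 404).
    """
    if current_content is None:
        return "link rot"
    if fingerprint(current_content) != cited_hash:
        return "content drift"
    return "intact"


# Hypothetical page content at the time the author cited it.
original = b"Table 2: mean = 4.2"
cited = fingerprint(original)

print(classify_reference(cited, None))                    # page is gone
print(classify_reference(cited, b"Table 2: mean = 4.7"))  # page was edited
print(classify_reference(cited, original))                # page is unchanged
```

Link rot is easy to spot because the request fails. Content drift only shows up if you kept something to compare against, which is why it so often goes unnoticed.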

Wait, Don’t DOIs Fix This?

Kind of. DOIs (Digital Object Identifiers) were invented to tackle this very problem. They provide digital content with a unique, “persistent” link that’s intended to outlast server shutdowns or website redesigns.

And they mostly work, for now. Over 100 million DOIs have been minted, and they’re baked into the workflows of publishers, databases, and citation tools worldwide.

But here’s the catch: a DOI is only as good as the people maintaining it. If a journal goes out of business or a publisher fails to update its links, that “persistent” identifier might lead to a dead end. Worse, each DOI needs to be manually registered and maintained.

Klein and Balakireva (2020) found that about half of all DOI requests don’t successfully lead to the intended resource. Their study highlights that DOI resolution can vary widely depending on the network you’re using. For example, accessing a DOI from your office computer might yield different results than trying the same link on your phone while traveling.

As we move into an era of FAIR science, where every dataset, code snippet, and lab note needs a unique, findable ID, the scale becomes mind-boggling. Imagine trying to maintain trillions of these one-to-one links. It’s not just inefficient, it’s unsustainable.

Why the Internet’s Architecture Is the Real Culprit

At the heart of all this is a basic design flaw: the web is based on location addressing. Every link asks, “Where is this file stored?” However, storage locations are fragile - servers fail, websites undergo redesigns, and organizations close down.

What we need is content addressing: a way to say, “Give me exactly this file no matter where it lives.” That’s how you stop both link rot and content drift.
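As a rough illustration of content addressing, the sketch below uses a toy in-memory store in which the address of a file is simply the SHA-256 hash of its bytes. The `ContentStore` class is hypothetical and far simpler than any real network, but it shows why the same address works on any host and why a swapped file is detectable.

```python
import hashlib


class ContentStore:
    """A toy content-addressed store: the address IS the hash of the bytes."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        content = self._blobs[address]  # raises KeyError if this host lacks it
        # Anyone can re-hash what they received and check it against the address.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("content does not match its address")
        return content


paper = b"Methods: we sampled 120 plots..."

# Two independent hosts (say, a library and a university) store the same file.
mirror_a = ContentStore()
mirror_b = ContentStore()
address_a = mirror_a.put(paper)
address_b = mirror_b.put(paper)

print(address_a == address_b)            # same bytes, same address on any host
print(mirror_b.get(address_a) == paper)  # either host can serve the request
```

A location address tells you where to look and nothing about what you will find. Here the address says nothing about where the file lives, but it lets you verify that whatever comes back is the file that was cited.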

Enter dPIDs: Persistence Without the Pitfalls

This is where decentralized persistent identifiers, or dPIDs, come in.

dPIDs flip the script. Instead of pointing to a specific location, they point to a fingerprint - a unique hash of the content itself. Change the content, and you get a different fingerprint. It’s like an incorruptible DNA test for files.

Here’s how this works in practice:

Files get unique fingerprints: Any change, even a single pixel, creates a new ID.
They live on a decentralized network: Systems like IPFS store multiple copies on peer-to-peer nodes. No single server means no single point of failure.
They’re versioned by design: dPIDs log every change, with timestamps and signatures, so you can see exactly how a paper or dataset evolved.
They’re future-proof: DOIs don’t need to be scrapped; they can plug right in. A DOI can resolve to a dPID-backed folder system.
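The "versioned by design" idea can be sketched as a hash-linked log, where every entry records the content fingerprint, a timestamp, and the hash of the previous entry. This is an illustrative assumption about how such a log could be built, not the actual dPID data model. Signatures are left out, and `Version` and `VersionLog` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    content_hash: str        # fingerprint of the file at this version
    timestamp: str           # supplied by the caller, e.g. an ISO 8601 string
    previous: Optional[str]  # entry hash of the prior version, None for the first

    def entry_hash(self) -> str:
        payload = json.dumps(
            {
                "content_hash": self.content_hash,
                "timestamp": self.timestamp,
                "previous": self.previous,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class VersionLog:
    def __init__(self) -> None:
        self.versions: List[Version] = []

    def publish(self, content: bytes, timestamp: str) -> Version:
        previous = self.versions[-1].entry_hash() if self.versions else None
        version = Version(hashlib.sha256(content).hexdigest(), timestamp, previous)
        self.versions.append(version)
        return version

    def verify(self) -> bool:
        """Check that every entry points at the entry recorded before it."""
        expected = None
        for version in self.versions:
            if version.previous != expected:
                return False
            expected = version.entry_hash()
        return True


log = VersionLog()
v1 = log.publish(b"draft with 100 samples", "2024-01-10T09:00:00Z")
v2 = log.publish(b"revision with 120 samples", "2024-03-02T14:30:00Z")
print(v1.content_hash != v2.content_hash)  # a changed file gets a new fingerprint
print(log.verify())                        # the history is internally consistent
```

Because each entry commits to the one before it, quietly rewriting an old version breaks every later link in the chain. A real system would add signatures on top of this so readers can also tell who published each version.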

Beyond Links: Building Trust, Transparency, and Scale

A decentralized, content-addressed system does more than keep old links alive. It helps tackle some of the biggest headaches in science today:

Trustworthy provenance: No more guesswork about whether data was tweaked. dPIDs maintain a transparent and verifiable record of changes.
Reproducibility: Editors, peer reviewers, and future researchers can see not just a final paper, but the whole trail of drafts, data, and code.
Less gatekeeping: Libraries, universities, and researchers can host content themselves - no vendor lock-in, no single point of control.
Cheaper FAIR data: Trillions of unique PIDs become manageable because you don’t need to maintain fragile URL mappings manually.

And with the rise of AI and fake research, this extra layer of traceability could be the difference between trust and chaos.

What’s Under the Hood

If you’re the technically curious type, here’s a peek behind the scenes:

IPFS (InterPlanetary File System) is a peer-to-peer network that stores content by its fingerprint.
IPLD keeps related files linked - so data, code, and manuscripts stay connected.
DIDs (Decentralized Identifiers) enable individuals and organizations to control their own identities within this system.
Blockchain and smart contracts provide tamper-proof records of changes.
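To give a feel for how hash-linking keeps data, code, and manuscripts connected, here is a simplified sketch of a manifest that refers to each file by its fingerprint and is then fingerprinted itself. Real IPFS and IPLD use their own identifier and encoding formats, so plain SHA-256 over JSON is only a stand-in, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Dict


def hash_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file name to the fingerprint of its content."""
    return {name: hash_bytes(content) for name, content in files.items()}


def manifest_id(manifest: Dict[str, str]) -> str:
    """Fingerprint the manifest itself, using a canonical (sorted) encoding."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hash_bytes(canonical)


# A hypothetical research object: manuscript, data, and code bundled together.
files_v1 = {
    "manuscript.pdf": b"%PDF- placeholder bytes",
    "data.csv": b"plot,biomass\n1,4.2\n",
    "analysis.py": b"print('fit model')\n",
}
manifest_v1 = build_manifest(files_v1)
root_v1 = manifest_id(manifest_v1)

# Correct one value in the dataset and rebuild.
files_v2 = dict(files_v1, **{"data.csv": b"plot,biomass\n1,4.7\n"})
manifest_v2 = build_manifest(files_v2)
root_v2 = manifest_id(manifest_v2)

print(root_v1 != root_v2)                                           # bundle ID changed
print(manifest_v1["manuscript.pdf"] == manifest_v2["manuscript.pdf"])  # untouched file keeps its ID
```

One identifier for the whole bundle pins down every file inside it, while unchanged files keep their own fingerprints across versions and never need to be stored twice.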

It’s all open-source and free to use - no paywalls, no proprietary traps.

The Good News: We’re Not Starting from Scratch

This idea isn’t just theoretical. There’s already a growing community, including the dPID Working Group, pushing this vision forward and building on strong foundations, such as ORCID (for researcher identity) and ROR (for research organizations). Together, they form a backbone that’s open, interoperable, and resilient.

Science builds on what came before. If we let the links rot and the data drift away, we’re not just losing information, we’re eroding trust, wasting money, and slowing progress. And that’s too high a price to pay.

With dPIDs and decentralized storage, we have a clear and achievable path to address this issue. The technology exists. The community exists. Now, it’s a question of adoption and collective action.

The Internet we have isn’t good enough for the science we need. But with tools like dPIDs, decentralized storage, and platforms like DeSci Publish, we can close that gap. Let’s build a lasting scientific record that is resilient, transparent, and open to all.
