Bonum Certa Men Certa

Archiving Web Sites to Ensure They Last Decades, Not Years, Outliving or Outlasting Various Disruptive Events

Video download link | md5sum b29da11a5ae25c7597c459e8e4c320b2



Summary: Today we upload 15 years' worth of blog posts to the Internet Archive (IA), or close to 32,000 stories along with Daily Links; we suggest that other sites do the same in order to tackle 'Internet rot' and preserve information (otherwise there's room for obscene revisionism)

THE INTERNET won't stay around forever. The Soviets, back in the old days, tried to develop something similar to it. The Internet will probably survive the next decade or two, but fifty years is a stretch; as for the World Wide Web, it has already devolved into a transport layer for JavaScript and DRM, having been rendered bloated and malicious in practice (albeit not in theory; one can still produce elegant Web sites).



Earlier this year we moved to Gemini and more than a year ago we adopted IPFS, which is used to circulate daily bulletins and IRC logs in a decentralised fashion. Our IRC channels all became self-hosted (in our network) earlier this year -- an ambition that we've had for years but didn't get around to until Freenode collapsed.

Archiving a Web site isn't the same as format changes and protocol changes. It's also not about making more copies, especially if those copies are as vulnerable to censorship as one another. Here in this site we have some public domain (PD) works that are of relevance to us and can be accessed in gemini://. Most of the works, however, use a Creative Commons licence. We are not a curation site per se, but it helps to keep copies of historical material, such as antitrust material demonstrating Microsoft's crimes (as tactics barely change over time). Well, by Internet standards we have enjoyed a long span of 15 years (articles and daily links) and we remain active on the daily basis. The same is true for Tux Machines, which turns 18 this coming summer, so a lot of the material we have here is no longer available anywhere else, except the Internet Archive (IA).

A few years ago we started making site archives in IA and we also recommended the site to people, dubbing it the most important site on the Web. It's no eternal site however; as an associate of ours explains, "the IA is very important but it will succumb as the WWW is phased out in favor of obfuscated, proprietary JavaScript."

IA can barely cope with (e.g. spider/index/save/navigate) many of the "modern" Web. When you add DRM to the mix (EME), then it's not a "format-shifting'" task as that too becomes an impossibility. Sites need to evolve or perish, which may mean getting off the Web and one day planning for the demise of the Internet as a whole. Like IA, our associate explains, "archive.is is interesting, but it'll die one day. In the long run they will all pass away. In formal archives, one of the initial decisions the institution has to make about any given artifact is that of how long it shall be preserved for. Nothing lasts forever, but there are ways of stretching things out and the duration determines the methods of preservation."

For a site such as ours it makes sense to keep the material available for 50 years, which is maybe how much longer I can live (if I'm lucky).

"Media shifting will obviously be involved," the associate notes, "but at a loss for some items. The plan pre-dates AWA by a great many years."

Last weekend we turned 15. "Already in 15 short years," our associate remarks, "many whole sites are gone. And of the sites that remain, many have lost all their old articles in clumsy reorgs. Of that which is left, some of those have purged documents with "inconvenient" messages or themes... even Groklaw purged its comments. I suppose few to none of the Groklaw comments made it into the Library of Congress archives."

At the time of writing I'm still uploading 205 MB of archives (as shown in the video above). We hope it can inspire other sites to think ahead and do the same. It's not a big task and it's better done before it's "too late"...

Our associate concludes by saying that "many programmers and even engineers are conscientious in erasing anything "old" even important records. Now with electronic media, there is often only a single copy of anything any more and that introduces, obviously, a single point of failure. So in the old days, one could maintain a relevant personal or professional archive. Now those are all centralized and continue to exist only at the whim of participant consensus. Anyone with administrative privileges, can "tidy" up and easily erases the world's last copy of a standard or other evidence or similar material."

We are going to add more material to IA and it can be found here as that piles up along with some material that isn't ours.

Recent Techrights' Posts

"Maybe the Problem is You"
they probably felt like they had no choice because they really needed this Microsoft money
GNU OS, Powered by Hurd
Choice is good, as long as choices exist that respect the users' freedom
European Patent Office (EPO) Reformation Project
It's a stain on the EU's reputation
Slopwatch: Google News and Other Slopfarms
Google News is rewarding sites that misuse LLMs and cheat the Web
Moral Standards From the Masters of Linux
They get hung up on minor language issue and promote this crazy theory that racism will go away if only everyone spoke a little differently (no matter where he or she came from)
 
Microsoft Problems in Palestinian Territory and Israel
Microsoft stock (share price) goes up when market share goes down
Microsoft is getting ready to cause many employees to resign
Having already laid off many workers earlier this month, it now tries another approach
Slave is Not a Bad Word, We Need to Use It Sometimes
Who does such exclusion of words benefit? What sort of expression will be deemed impermissible and subjected to CoC enforcement?
National Day of Action
"This Friday, August 15th, there is an organized, petition-based, protest of Wells Fargo in major cities across the US," Richard Stallman wrote
Our Gemini Editions Now Contain 100,000+ GemText Pages
Our Gemini Editions aren't small, even if Gemini Protocol is still the 'underdog'
The Relations Between the United States and Europe Deteriorate, Should Europe Continue to Rely on American Tech Giants?
The shallow notion that made-in-USA software is fairly safe for Europe to rely to is coming to a standstill
Techrights and Tux Machines Running as Usual During Vacations
No interruptions, maybe temporarily slowdowns
Gemini Links 15/08/2025: ADHD and "Random Weird Things"
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Thursday, August 14, 2025
IRC logs for Thursday, August 14, 2025
"Article 52. PATENTABLE INVENTIONS" in the European Patent Convention
Some time tomorrow we'll have a complete local copy of the EPC
Serial Slopper (SS) Still at It, Still Misusing Plagiarism Tools and Cheatware for Images and Text About "Linux"
All the slopfarms are a very big problem
Reddit Deletes Stuff, But Not for Being False or Misleading
Yet another one of those articles that speak of a man in his 50s as if he's terminally ill
Times of India and India.com Are Clickbait and LLM Slop
Google continues to reward bad actors
The More "Market Share" Microsoft Loses, The Higher the Shares Go
People joke about the same sort of thing in relation to IBM
To OIN, Software Patents Are Not a Problem
Had software patents ceased to exist, OIN too would cease to exist and its staff would be unemployed.
Microsoft's Bankruptcy in Russia is Only the Beginning
Due to politics it mostly makes sense that Windows is being phased out, also in part due to policy changes
Microsoft-Funded Publishers Lied to Us About Vista 10 and Now Advocate Us Owning Nothing
They want you to own nothing, but they also want you to buy a PC on which to become Microsoft's slave and they make it harder if not practically impossible to remove Windows
Articles Promoting and Celebrating Wayland Are LLM Slop
New example (100% slop)
The Register MS, Dominated by American Editors, Says UK Should be Run (Digitally) by Microsoft US
The Register MS is sponsored by American money, run by Americans, and its chief editor is a Microsofter from the US
Gemini Links 14/08/2025: Drought, Climate Experiments, and LLM Slop Considered Detrimental
Links for the day
Links 14/08/2025: Second-hand ThinkPad and Enhanced Surveillance on Chipsets from the United States
Links for the day
Links 14/08/2025: Data Brokers Hiding Opt-Out Pages From Google, "Fight Chat Control"
Links for the day
FSF Infrastructure Under Constant Attack
The disconnect (literally) has had an effect on credibility
Feels Like The Register MS is Trying to Diversify a Bit
If The Register MS goes back to being The Register US (or UK), that will be a nice improvement
Gemini Links 14/08/2025: Reading Journal and LLM Fatigue Revisited
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Wednesday, August 13, 2025
IRC logs for Wednesday, August 13, 2025
Hopping From One Set of Buzzwords to the Next
Rotating hype and vapourware
Currys PCWorld Hates GNU/Linux Even Though It Runs the World
If more and more people choose to remove Windows, then Currys PCWorld will feel the financial impact of its dumb policies
Internet Relay Chat and Gemini Protocol Help Us Relive the Net of the Dial-Up Era
The kids were alright
The Register MS Takes More Money to Boost Slop Hype, This Time From Snyk, a Notorious FUD Source
At some stage or at some point they might even decide to stop doing so
"GPT-5" is Another Microsoft Dead Cat Trying to Bounce
The hype, the momentum (or the inertia) is wearing off
Microsoft Windows Losing Its Grip Near Turkey and Russia
The 'corridor' nations connecting Iran to Europe
Slopwatch: LinuxSecurity, Google News, and Serial Slopper (SS)
The slop, the bad, and the ugly
Links 13/08/2025: The “Incriminating Video” Scam and Corruption in South Korea
Links for the day
Gemini Links 13/08/2025: Movie Memories and Mystery Machine Bus
Links for the day
"AI" Hype or LLM Slop is Not About Efficiency, It's About Lowering Standards
It does not seem like IBM is genuinely committed to the same goals (or commitments) as the original Red Hat
Links 13/08/2025: GitHub Trouble and Openwashing by Microsoft OSI With the Typical Buzzwords
Links for the day
If Free/Libre Software is Adding Trillions in Value to the European Economy, Then the European Commission Must Crush Software Patents
Further to what we wrote yesterday
Microsoft Swallows GitHub Losses
Only Microsoft knows how much money it has already lost on GitHub
Gemini Links 13/08/2025: Climate, Coffee, and Deploying Troops in Washington DC After Pardoning 1,000+ Insurrectionists in Washington DC
Links for the day
The Register MS Lowered MS Focus This Week
We hope The Register recognises its errors and tries to make up for them
Learning Ethics From Jeffrey Epstein's Enabler/Client/Ally, Coca-Cola, and Microsoft Accenture
Whatever merits vocabulary changes initially had are being tainted or obscured by later iterations, which tell us to avoid word like "normal", which apparently offend some people (so they argue)
Personal Attacks From Rust People Serve to Confirm They Have Lost the Argument
"The discussion I find around the net so far has no technical merit and centers around ad hominem"
Physical Meters and Purely Mechanical Meters Aren't Dumb; It's Dumb to Mock or Dismiss Them as Antiquated
I've learned a lot this week, both online and over the telephone
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Tuesday, August 12, 2025
IRC logs for Tuesday, August 12, 2025