Bonum Certa Men Certa

Improving Site Navigation and Discovery

posted by Roy Schestowitz on Dec 12, 2023

An open book

THE site is growing fast and people have a hard time finding much older material. We fully recognise this limitation. It's a real peril, and many sites share it. The problem isn't limited to digital media either: any large body of material accumulates parts that are outdated or unlinked.

In WordPress we used code that checks references in reverse; for any given article, it would (at the bottom) show later (future) articles that link to it. This was very CPU-intensive (at the database level), resulting in pages taking far longer to load. Unless properly cached, it would require scanning about 10 GB of text (or 40,000 blog posts' bodies, not counting drafts/revisions).
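To illustrate why that reverse lookup was so costly, and how a precomputed index avoids rescanning every body on each page view, here is a minimal Python sketch. It is not the code we ran in WordPress: the post fields (`id`, `date`, `body`), the `/post/<id>` link pattern and both function names are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical internal link format; a real site would use its own URL scheme.
LINK_RE = re.compile(r'href="/post/(\d+)"')

def build_backlink_index(posts):
    """Scan every body once; map each target id to the ids of posts linking to it."""
    index = defaultdict(set)
    for post in posts:
        for target in LINK_RE.findall(post["body"]):
            target_id = int(target)
            if target_id != post["id"]:  # ignore self-links
                index[target_id].add(post["id"])
    return index

def later_backlinks(index, posts_by_id, post_id):
    """Return ids of later posts that link to post_id, oldest first."""
    published = posts_by_id[post_id]["date"]
    later = [pid for pid in index.get(post_id, ())
             if posts_by_id[pid]["date"] > published]
    return sorted(later, key=lambda pid: posts_by_id[pid]["date"])

# Made-up sample data (ISO dates compare correctly as strings).
posts = [
    {"id": 1, "date": "2006-11-07", "body": "First post."},
    {"id": 2, "date": "2010-01-01", "body": 'See <a href="/post/1">this</a>.'},
    {"id": 3, "date": "2023-12-12",
     "body": 'Old <a href="/post/1">a</a> and <a href="/post/2">b</a>.'},
]
posts_by_id = {p["id"]: p for p in posts}
backlinks = build_backlink_index(posts)
print(later_backlinks(backlinks, posts_by_id, 1))
```

The expensive full scan happens once, when the index is built (or updated as a post is published); each page view then costs a single dictionary lookup instead of a pass over tens of thousands of bodies.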

We needed to move on. Better sooner than later. Having a server screaming 24/7 to serve requests (a growing proportion of which come from rogue bots) is not a long-term strategy. Running a Web server on a machine with almost 100 CPU cores isn't cheap.

Before her very final post (just over 10 years ago), Pamela Jones of Groklaw wrote about the challenges of preserving old material. She had quit before, then came back, then retired. Fair enough, she wasn't getting any younger, but it was important for her to ensure that the information (debunking lies about the GPL and the origins of Linux) remains accessible for many years to come. Some time later the site was converted into static pages (still hosted at ibiblio.org), but some material, such as old comments, disappeared in the process. Geeklog, the software Groklaw ran on, had its share of limitations, though apparently it's still being maintained.

Anyway, unlike Groklaw we're still going. I'm 41 and in good health. I receive help from many people and we're good to go. Nothing can stop us, even though some extremists are trying. We won't let wackadoodles waste our time. They just validate what we wrote months ago and they try to attack my wife. Misogynists are like that; they love picking on women.

So what next for search? We've long envisioned this site having self-hosted search, not the lousy built-in search our WordPress blog used to have (it's just a database scan, which is notoriously weak at delivering relevant results).

No, we don't want to rely on third parties either. We don't want to hear, "how about Google?" or "why not ClownFlare?" (the latter comes up wherever and whenever there are DDoS attacks).

Any third party means outsourcing. Outsourcing does not solve the issue; it typically creates additional issues, even if they are not visible at first (ClownFlare does not make money yet, so a "big squeeze" is impending, and Google is no longer about search).

Several of our articles this month got over 3,000 views and we do not depend on Google, social control media, Gulag Noise (Google News), "Hacker" "News" etc. We have our loyal readership, i.e. people who come back of their own accord, not because "Google told me to..." (so-called 'search').

Many people don't know this, but way back in 2006 we made a "download site" option available (our database was relatively small back then and a WordPress plugin existed to make a database available sans sensitive things like user accounts). For about a year this whole site was available for download, but the site grew too big and it was no longer feasible to generate the dump on the fly and serve requests. These requests were nightmarish. They caused PHP timeouts and MySQL strain.

So what next for data?

Well, we considered what we can install for self-hosted search, seeing what's available that is Free software and is also more potent than just a database scan (over fields like title and body).
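To show what "more potent than a database scan" can mean in practice, here is a minimal Python sketch of an inverted index with simple ranking. It is only an illustration, not the software we will deploy: the document fields, the `title_weight` value and the function names are assumptions chosen for this example, and a real Free software search engine does far more (stemming, phrase queries, better scoring).

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())

def build_index(docs, title_weight=3):
    """Map each term to {doc_id: weighted count}; title hits count for more."""
    index = defaultdict(Counter)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["title"]):
            index[term][doc_id] += title_weight
        for term in tokenize(doc["body"]):
            index[term][doc_id] += 1
    return index

def search(index, query, limit=10):
    """Return ids of documents containing all query terms, best score first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, Counter()) for term in terms]
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)
    # Sort by descending score, then by id so that ties are deterministic.
    ranked = sorted(matching, key=lambda d: (-sum(p[d] for p in postings), d))
    return ranked[:limit]

# Made-up sample documents.
docs = {
    1: {"title": "Software patents in Europe",
        "body": "The EPO grants software patents."},
    2: {"title": "GNU/Linux market share",
        "body": "Patents are not the topic here, Linux is."},
    3: {"title": "EPO strike", "body": "Staff at the EPO protest."},
}
search_index = build_index(docs)
print(search(search_index, "patents"))
```

Unlike a scan over the title and body fields, a query here touches only the entries for its own terms, and results come back ordered by relevance rather than in whatever order the database happens to return them.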

Search can help, wiki pages can help even more, but ideally we may go back in time and turn the site into a kind of hierarchical 'book' (a big project, but still feasible). It's still being debated in IRC.

I quit my job so that I can devote more of my time to promotion of Software Freedom, abolition of software patents etc.

While we continue to discuss the best way to organise information in this site (suggestions welcome, IRC would work best), we remind readers that we're actively seeking help with server bills. We want to keep going for more than a decade to come, and help from readers enables us to spend more time researching, writing, tidying up existing material (lots of wiki refactoring to come over the Christmas period), and maybe adding a self-hosted search facility.

