Improving Site Navigation and Discovery
The site is growing fast and people have a hard time finding much older material. We fully recognise this limitation. It's a real peril, and many sites have the exact same one. The problem isn't limited to digital media, either: any large volume of material ends up with parts that are outdated or unlinked.
In WordPress we used code that checked references in reverse; for any given article, it would show (at the bottom) later articles that link to it. This was very CPU-intensive (at the database level), resulting in pages taking far longer to load. Unless properly cached, it would require scanning about 10 GB of text (or 40,000 blog posts' bodies, not counting drafts/revisions).
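To illustrate why caching matters here, below is a minimal sketch (not the code we actually ran in WordPress) of building the reverse references once, in a single pass over all posts, instead of scanning every body for each page view. The field names (`url`, `date`, `body`) and the function name are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Naive href extractor; a real site would likely want a proper HTML parser.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']')


def build_backlink_index(posts):
    """Map each post URL to the URLs of later posts that link to it.

    `posts` is a list of dicts with hypothetical keys 'url', 'date'
    (ISO 8601 string, so plain string comparison orders them) and 'body'.
    """
    by_url = {post["url"]: post for post in posts}
    backlinks = defaultdict(list)
    for post in posts:
        # set() so a post linking twice to the same article is listed once
        for target in set(HREF_RE.findall(post["body"])):
            linked = by_url.get(target)
            # Keep only links to known posts that were published earlier
            if linked is not None and linked["date"] < post["date"]:
                backlinks[target].append(post["url"])
    return backlinks
```

With an index like this stored (and updated when a post is published), showing the later articles under a page becomes a single dictionary or table lookup rather than a scan of every body.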
We needed to move on. Better sooner than later. Having a server screaming 24/7 to serve requests (a growing proportion of which come from rogue bots) is not a long-term strategy. Running a Web server on a machine with almost 100 CPU cores isn't cheap.
Before her very final post (just over 10 years ago), Pamela Jones of Groklaw wrote about the challenges of preserving old material. She had quit before, then came back, then retired. Fair enough, she wasn't getting any younger, but it was important for her to ensure the information remained accessible for many years to come (debunking lies about the GPL and origins of Linux). Some time later the site was converted into static pages (still hosted at ibiblio.org), but some material such as old comments disappeared in the process. Geeklog, the software behind the site, had its share of limitations, though apparently it's still being maintained.
Anyway, unlike Groklaw we're still going. I'm 41 and in good health. I receive help from many people and we're good to go. Nothing can stop us, even though some extremists are trying. We won't let wackadoodles waste our time. They just validate what we wrote months ago and they try to attack my wife. Misogynists are like that; they love picking on women.
So what next for search? We've long envisioned this site having self-hosted search, not the lousy WordPress search our blog used to have (it's just a database scan, which is notoriously weak at delivering relevant results).
No, we don't want to rely on third parties either. We don't want to hear, "how about Google?" or "why not ClownFlare?" (which comes up wherever or whenever there are DDoS attacks).
Any third party means outsourcing. Outsourcing does not solve the issue; it typically creates additional issues, even if they are temporarily not visible (ClownFlare does not make money yet, so a "big squeeze" is impending, and Google is not search anymore).
Several of our articles this month got over 3,000 views and we do not depend on Google, social control media, Gulag Noise (Google News), "Hacker" "News" etc. We have our loyal readership, i.e. people who come back not because "Google told me to..." (so-called 'search').
Many people don't know this, but way back in 2006 we made a "download site" option available (our database was relatively small back then and a WordPress plugin existed to make a database available sans sensitive things like user accounts). For about a year this whole site was available for download, but the site grew too big and it was no longer feasible to generate the dump on the fly and serve requests. These requests were nightmarish. They caused PHP timeouts and MySQL strain.
So what next for data?
Well, we have been considering what we could install for self-hosted search, looking at what's available that is Free software and is also more potent than just a database scan (over fields like title and body).
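To give a rough idea of what "more potent than a database scan" can mean, here is a toy sketch of an inverted index with simple ranking. It is not any particular search package and not something we have deployed; the function names, the `title`/`body` fields and the title weight of 3 are all assumptions picked for the illustration.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs, title_weight=3):
    """Build term -> {doc_id: weight}; title hits count more than body hits.

    `docs` maps a document id to a dict with hypothetical keys
    'title' and 'body'.
    """
    index = defaultdict(Counter)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["title"]):
            index[term][doc_id] += title_weight
        for term in tokenize(doc["body"]):
            index[term][doc_id] += 1
    return index


def search(index, query, limit=10):
    """Return ids of documents matching any query term, best score first."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    # Sort by descending score, then by id so that ties are stable
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]
```

The index is built once and only the terms in the query are looked up, so a query never has to read every post body, and results come back ordered by a relevance score instead of whatever order the database returns rows in.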
Search can help, wiki pages can help even more, but ideally we may go back in time and turn the site into a kind of hierarchical 'book' (a big project! Big but still feasible). It's still debated in IRC.
I quit my job so that I can devote more of my time to promotion of Software Freedom, abolition of software patents etc.
While we continue to discuss the best way to organise information in this site (suggestions welcome, IRC would work best) we remind readers that we're actively seeking help with server bills. We want to keep going for more than a decade to come and help from readers enables us to spend more time researching, writing, tidying up existing material (lots of wiki refactoring to come over the Christmas period), maybe adding a self-hosted search facility. █