04.16.20

The Latest GitHub-Free Research: An Introduction or Update

Posted in Free/Libre Software, GNU/Linux, Microsoft at 11:23 pm by Guest Editorial Team

Article by figosdev

The gearscape
Microsoft interjects itself as dependency to control the competition

Summary: An examination of how dependent on Microsoft’s proprietary jail (GitHub) GNU/Linux distros have become

When I started to find old favourites were hosted on GitHub, I became increasingly interested in what was there. The more I looked, the more I found. I’ve tried to get a better idea of the scope of the problem ever since.

Automation (scripting, mostly) can and does assist me in this, but there is no automated way to determine if a project is GitHub-based or not. I’ll check DistroWatch, Wikipedia, frequently the project’s own website, but some of the discoveries require going into the file — particularly when that file is an ISO image.

I started with a list that reached about 100 to 120 popular applications and distros that are based on GitHub. It was a first effort, also the most casual, and when I “audited” the list later on, I found it “relying on GitHub” if someone happens to do development and issues there, and relying on it for hosting are all a problem in my opinion. That’s what I’m looking for: projects that can actually be hurt if Microsoft either pulls the plug or reduces access.

I’m certainly not impressed by their recent invitation to use formerly premium features. GitHub is a trap, and now they’re opening the trap a little wider.

This is an imperfect science, but the more information we have, the better. Some of the research I’m doing builds on what I know, but some of the research is redundant by design, and acts as a check on previous research. The goal is not to get perfect data (simply impossible with the number of people involved) but to get finer resolution and hopefully gain accuracy with each pass and research stage.

The big picture matters as much as the details. Syslinux is more or less necessary in my opinion, and is not GitHub-based. But I found out while writing this that syslinux uses mkdiskimage, a Perl script. Am I going to count this as needing GitHub? Probably not, but I’m going to make note of it here.

It is necessary, during each stage of this research, to choose a methodology. When I was ranking levels of distro dependency on GitHub, naturally I put actually being hosted and developed on GitHub as a key problem, if not the worst.

I’m trying to build a practical, useful idea of where the hope is, and where it isn’t — and each stage is practical in the sense that previous research sheds light on what I’m doing now, and vice-versa. Looking for applications helped me rank dependent distros. Ranking distro dependency helped me find new (and more vital) applications and frameworks. This is the best kind of research, where nearly every bit of it helps in some way.

Ideally, we want all of the bad news we can find, and all of the good news we can find. As for how it’s done, every methodology has pros and cons. Right now I’m trying to find the best way to examine Tiny Core Linux, and I’ve spent days looking at CorePlus in particular, because I’m also trying to migrate to a less GitHub-dependent distro.

Tiny Core was one of the 33 most GitHub-free distros, of the 275 examined in my previous research. It’s also one I’m familiar with; I was one of the early Tiny Core users after having purchased the book Shingledecker and Andrews did. Their disagreement over how to do things led to Tiny Core, and I think it’s one of the better (beneficial) separations and forks that have happened in Free software. It’s also fun to take apart and explore.

I discovered (or rediscovered, after years of being away) that TC uses .info files to describe packages and .dep files to map package dependencies. Recursion is very helpful for this sort of thing, even if you don’t write recursive functions all the time (I sometimes teach beginner-level coding, and recursion comes up, but not a lot).

This hastily-constructed Python function let me recursively create lists of packages that needed packages that needed packages that needed, for example, libffi.tcz:

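#### license: creative commons cc0 1.0 (public domain) 
#### http://creativecommons.org/publicdomain/zero/1.0/ 
import glob
import sys

# packages to start from, given on the command line; default is libffi.tcz
which = sys.argv[1:]
if not which: which = ["libffi.tcz"]

def wget(ls):
    # collect every package whose .dep file lists one of the packages in ls
    copy = []
    for which in ls:
        # glob.glob stands in here for fig's figarrshell("ls *.dep")
        for each in sorted(glob.glob("*.dep")):
            # open separately from read() so the file can be closed explicitly
            pf = open(each)
            text = pf.read()
            pf.close()
            # normalise CR LF and CR line endings to LF, then split into lines
            p = text.replace(chr(13) + chr(10), chr(10)).replace(chr(13), chr(10)).split(chr(10))
            for ln in p:
                if ln == which and len(ln) > 0:
                    pr = each.replace(".dep", "")
                    if pr not in copy and pr not in ls:
                        copy += [pr]
                        print(pr)
    # recurse on the newly found packages until a pass finds nothing new
    if copy:
        wget(copy)

wget(which)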

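I also found that open().read() behaves slightly differently in CPython (GitHub) and PyPy (foss.heptapod.net), although figarropen() does work with PyPy as a drop-in replacement for Python 2. What’s different is when the file gets closed, and therefore how many files end up open at once. I always assumed (without evidence to the contrary) that open().read() closes the file, as “with” does in Python. Strictly speaking it never closes the file itself: CPython’s reference counting closes the abandoned file object almost immediately, while PyPy leaves it open until its garbage collector gets around to it.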

In PyPy, it got to about 1,200 open files (I was opening 2,321 files but assumed they would be closed when read) before complaining that too many were open. I simply opened the file separately from the read() method, so I could run close() after read(). This would be a good change to add to fig as well, so it works better with PyPy. Even fig (most recent update, 2017) stands to benefit from this research.
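A minimal sketch of the fix described above, written for Python 3 (an assumption; the script in this article targets Python 2). The helper name read_lines is hypothetical. The “with” statement closes the file as soon as the block ends on both CPython and PyPy, so only one file is open at a time no matter how many are read.

```python
def read_lines(path):
    """Return the lines of a text file, normalising CR LF and CR to LF."""
    # 'with' closes the file when the block exits, without waiting for
    # the garbage collector, so open files never pile up under PyPy.
    # newline="" turns off Python's own newline translation so the
    # replacements below see the raw line endings.
    with open(path, newline="") as handle:
        text = handle.read()
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
```

Calling read_lines() in a loop over thousands of .dep files should then stay well under the open-file limit.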

But the methodology I’m using is to look at the 2,321 packages in TC 11.x, of which I’ve already dealt with 1,200 — and find out in what ways Tiny Core “needs” GitHub, or what could be removed to make it less dependent.

I think we could strip out most of the dependencies — not likely all — but which ones? And where are they? That’s the goal of this stage of research, to find out what is what and which is which.

I’ve already got a list of how many packages need each package, ranked from most to least. Tiny Core is 16 MB, CorePlus is 206 MB. I know the biggest difference; CorePlus includes a lot more packages. But I’ve also used Tiny Core more than CorePlus, I know a lot of its limitations, and I’m largely focused on the packages themselves.

Most or all of these packages are technically “optional” (many of them need several others, but that’s more or less so with packages in practically every distro) and of course, if you have a gtk3 app, for example, it needs gtk3 for reasons that should be obvious. Be assured that I am no fan of doing dependencies to excess, but some dependencies have to be considered reasonable.

I don’t know enough about libffi to judge its technical merits or lack thereof. I discovered it through this research (I write scripts mostly, not C or C++ and I don’t use ctypes, though I know what it is) because it came up as the “most-needed optional package” in all of this. The fact that the most-needed “optional” package is based on GitHub (since May 2016) is quite relevant: 739 (32%) of the 2,321 TC packages need it!

Of those 739 libffi-needing packages, 712 of them need glib2 as well (that’s GNOME lib, not GNU libc) and ALL of the glib2-based packages need libffi, so perhaps we have GNOME to thank once again. As much as I love blaming them for things, I am unconvinced they’re really to blame this time, though the fact remains that if you were to get rid of glib2 then you would also get rid of all but 27 libffi-based packages. You would also get rid of gtk I think, but I’m not suggesting we do that. We build the practical decisions on top of the hard data. I’m more immediately interested in the data, but the time for decision-making will come.
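The counting and the subset check described above can be sketched roughly as follows. This is an illustration in Python 3, not the script used for this research: the function names dependents and rank are hypothetical, and the toy data is invented (it is not real Tiny Core dependency data). A set of already-found packages keeps the walk from looping if two packages ever list each other.

```python
def dependents(deps, target):
    """Return every package that needs target, directly or indirectly.

    deps maps a package name to the list of names in its .dep file.
    """
    found = set()
    pending = [target]
    while pending:
        current = pending.pop()
        for package, needs in deps.items():
            if current in needs and package not in found:
                found.add(package)
                pending.append(package)
    return found


def rank(deps):
    """Return (package, number of dependents) pairs, most-needed first."""
    names = set(deps)
    for needs in deps.values():
        names.update(needs)
    counts = {name: len(dependents(deps, name)) for name in names}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration only.
toy = {
    "glib2.tcz": ["libffi.tcz"],
    "gtk3.tcz": ["glib2.tcz"],
    "app.tcz": ["gtk3.tcz"],
    "other.tcz": ["libffi.tcz"],
}
need_libffi = dependents(toy, "libffi.tcz")
need_glib2 = dependents(toy, "glib2.tcz")
print(need_glib2 <= need_libffi)         # is every glib2 dependent a libffi dependent?
print(sorted(need_libffi - need_glib2))  # packages that need libffi but not glib2
```

On the real data, the second set is the one with 27 members; note that it includes glib2 itself, which needs libffi without depending on itself.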

These are far from isolated examples; indeed, the current methodology is being built on examples like this, which makes them an ideal illustration. Libffi is GitHub, glib2 is Gitlab-based (at any rate, not GitHub) though glib2 does pull in libffi.

We want to show:

* As many of these relationships as possible
* With some method of ranking priority, so we know what to check
* In some way that shows possible outcomes based on various possible decisions

So the imperfect methodology used here tries to do all of that.

First, I created a folder called libffi. Libffi is GitHub, so if we want to get rid of GitHub entirely, we would have to get rid of every package that needs libffi. I don’t expect that to happen, but I do want to provide as much information on that as necessary so people can at least determine / rate / track / plan / troubleshoot how practical it is for Free software to be independent of a monopolistic company that has always wanted to enslave, tax, control and own it.

I moved all 739 (32%) of the TC packages which need libffi.tcz to the libffi folder. This seems almost too simple to be useful, but we keep going on our mission of discovery.

The next step was to create a folder for glib2. That wasn’t because glib2 depends on libffi; I didn’t know that yet. I learned it because glib2.tcz is next on the list (712 packages need it) and none of them were in the pool: they were all in the libffi folder. Now we’re learning something.

So instead of creating a glib2 folder, I created a libffi/glib2 folder. Most (in fact, all) of the glib2-based packages need libffi, so I just put them under libffi, but now we know which libffi packages do need glib2 and which don’t. Our little hierarchy shows all of that data for us to base decisions on.

And I’m well aware that this system isn’t going to stay perfect or neat as it moves forward; it isn’t designed to be. My only goal here is to outdo (build on) the previous GitHub-free research, so that new strategies can be gleaned from this. I learned plenty from examining 275 distros; now I want to examine this one in detail. It just happens to be very good for the purpose: this will tell us about a lot more than just Tiny Core. It already tells us something about libffi and (most likely) about gtk applications. Some of these things will turn out to be specific to Tiny Core. Many will apply broadly. That could be another research project.

Moving forward, I’ll share the current hierarchy as it is created. It’s not perfectly self-documenting, so I’ll document what’s there so far (this part is the “introduction” referred to in the title.)

core/libffi: the packages that need libffi.tcz

core/libffi/gitlab: major (next/high on the list) gitlab-based rather than GitHub-based packages that need libffi.tcz

core/libffi/gitlab/glib2: glib2 is gitlab-based; the packages in this folder need both glib2.tcz and libffi.tcz

core/selfhost: (next/high on the list) self-hosted (not GitHub or gitlab) packages

core/selfhost/liblzma: liblzma.tcz is next on the list, self-hosted (part of xz utils) and here are the packages that need it.

The list, of course, is incomplete; that’s acceptable for this methodology. We already have a full list of required packages for each package. That’s how we know there are 689 packages that need liblzma (making it #3 on the list, after glib2): we counted them!

For the purpose of this research, it’s (only sometimes arbitrarily) more relevant that a package needs a more-needed package than a less-needed package. So instead of the 689 packages we already have a list of, the liblzma sub-hierarchy simply holds the ones “remaining” after “larger” hierarchies are counted. Including subfolders, core/selfhost/liblzma contains 97 packages; the others are needed also by “bigger” “more important” (debatable, hence “sometimes arbitrary”) packages that we already sorted.
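The “remaining” rule above amounts to a greedy pass: take the most-needed package first, file everything that needs it, then move to the next package and file only what is left. A rough Python 3 sketch, with a hypothetical function name and invented toy data (the real hierarchy also nests some folders, such as glib2 under libffi, which this flat version ignores):

```python
def sort_into_buckets(needed_by):
    """File each package under the most-needed package that it depends on.

    needed_by maps a package name to the set of packages that need it.
    Returns a dict of bucket name -> sorted list of packages filed there.
    """
    # Most-needed first; ties are broken by name so the result is stable.
    order = sorted(needed_by, key=lambda name: (-len(needed_by[name]), name))
    placed = set()
    buckets = {}
    for name in order:
        # Only packages not already claimed by a bigger bucket remain.
        remaining = needed_by[name] - placed
        buckets[name] = sorted(remaining)
        placed |= remaining
    return buckets


# Toy data, invented for illustration only.
toy = {
    "libffi.tcz": {"a.tcz", "b.tcz", "c.tcz"},
    "liblzma.tcz": {"b.tcz", "d.tcz"},
}
print(sort_into_buckets(toy))
```

Here b.tcz needs both packages but is filed only under libffi.tcz, which is how a package needed by 689 others can end up with a folder of 97.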

We could simply get all the data on every package. By “we”, I mean people I don’t know who haven’t signed up to do anything about this. Since I’m doing this, I’m using this imperfect system to discover new areas of focus — I probably can’t make myself do detailed research on EVERY one of the 2,321 packages, so I’m using this system to discover the “most important” problems, based on a methodology that has both pros and cons.

I wouldn’t bother with this if I didn’t think this approach would shed additional light on the bigger picture. The numbers alone do not excite me, though the picture I’m trying to uncover, quantify and find the “shape” of, is of interest.

This methodology “reveals” that liblzma is important, and it helps me decide where to pay more attention. I’ve learned other things too: that liblzma is part of xz utils, which I didn’t know, and that xz utils was started by Slackware enthusiasts, which is both cool and maybe says something else (something fundamental) about liblzma. You decide what it means to you. This hierarchical system serves as a score and, for me, a curriculum.

Which leads to the next concept — noting projects that use GitHub within the hierarchy (where I can spot them or find new ones.)

core/selfhost/liblzma/github: are projects based on GitHub that need liblzma.tcz. It’s liblzma that is self-hosted, not the projects in this subfolder; otherwise this path would contradict itself.

People who do databases might correctly assume that an RDB would be a better way to organise this data. They would probably be technically correct (I don’t normally use databases) but they would be missing the point that we don’t actually have the data yet. The only reason core/selfhost/liblzma/github exists is because core/selfhost/liblzma “told me” to take some time to look for GitHub-based things that needed liblzma.

Will I find them all now? Probably not, but this system informed me to take time on this. We keep trying to shape the data based on relevance, not unlike the earliest versions of the (once fairly straightforward, but also easily-gamed if you want website prominence) PageRank algorithm.

Incidentally, core/selfhost/liblzma/github includes squashfs-tools (a feature very important to Tiny Core and also to most Live distros, as noted in my previous article) but because it came up again here, I had another look. As of this writing, GitHub is only being used for squashfs-tools that create the file systems; the development of squashfs support for the kernel is still on kernel.org.

In practical terms, this suggests a hypothetical, completely GitHub-free distro could be used to create a tool that reads and converts .sfs files to some other compressed filesystem, though a GitHub-free tool could not (at this time) be made to produce new .sfs images — only convert them to something else.

I figured that if I wanted to focus on fixing this GitHub dependency, I could simply make a live distro that uses a file img formatted with ext3 instead. How to do compression on the fly could be a separate issue, but I know that alternatives exist.

core/selfhost/ncursesw: means that ncursesw is self-hosted, and this is where the packages that need ncurses.tcz go. Originally there were 665 of them, though now we have 156 remaining due to “more important” packages grabbing those in the hierarchy.

core/selfhost/ncursesw/github: packages moved from ../ to ./ which are based on GitHub, or 25 of 156 packages including: python.tcz (CPython, GitHub), urwid.tcz, tmux (sorry Roy,) inxi.tcz, htop.tcz, freebasic.tcz and vim.tcz.

I think Vim is one of those few things where I’m never sure whether it’s really relying on GitHub or not. Since I’d hate for Microsoft to end the editor wars with a cure far worse than the disease, I hope someone can give me some truly authoritative evidence that Vim is, in fact, GitHub-free. Another thing I found out while doing this is that ncurses is maintained by the same person who maintains the Lynx browser and Vile. Vile appears to be GitHub-free, but this research will help determine the validity of that statement. (Vile is not packaged in TC 11.x.)

core/github: was created at this point, for the 43 packages that are actually listed in the .info file as being GitHub-based, minus at least one (pax-utils.tcz, as Gentoo’s GitHub is a mirror.)
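A first pass at finding such packages can be automated. The sketch below is Python 3 and rests on assumptions: that the .info files are plain text sitting in one folder, and that a GitHub-hosted project mentions “github” somewhere in its .info file. The function name github_listed is hypothetical. A hit is only a lead, since the link may point at a mirror (as with pax-utils), so each one still needs checking by hand.

```python
import glob
import os


def github_listed(info_dir):
    """Return names of .info files in info_dir whose text mentions GitHub."""
    hits = []
    for path in sorted(glob.glob(os.path.join(info_dir, "*.info"))):
        # errors="replace" keeps odd bytes from stopping the scan.
        with open(path, errors="replace") as handle:
            if "github" in handle.read().lower():
                hits.append(os.path.basename(path))
    return hits
```

Projects whose .info file points at their own website but which are developed on GitHub will not show up this way.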

core/selfhost/libXau-and-libXdmcp: is related to X and these two packages had identical lists, except for libXau-dev.tcz and libXdmcp-dev.tcz, respectively.

core/selfhost/libXau-and-libXdmcp/github contains 12 packages, including wbar.tcz, i3.tcz, aterm.tcz (AfterStep is GitHub-based) and fltk-1.3.tcz.

core/libxcb: was created, and should probably be moved to selfhost, though it has no packages anyway because libxcb-dev.tcz is already in core/selfhost/libXau-and-libXdmcp.

core/libX11: was created and should probably be moved to selfhost, though there’s nothing in it.

core/bzip2-lib: has 48 files, including a bunch of Perl-related packages in core/bzip2-lib/github — Perl is GitHub-based.

With over 1,200 files in these folders, more than 50% of the packages in TC are now sorted into the hierarchy. Some ties to GitHub remain undiscovered, though this process has helped find and rank new ones that are obviously important in some way.

I’m still interested in moving further down the list; the next is libXext.tcz and there are 585 packages that need it. If we try to discover how many of those 585 packages remain…

for p in $(cat ../libXext.tcz.dp) ; do ls ../$p 2> /dev/null ; done | cat -n

…Nope. Nothing there that isn’t already in the hierarchy. libXext.tcz.dep is the file that TC provides, which shows a single level of dependency; libXext.tcz.dp is the file that the Python code in this article created for libXext.tcz, which shows all the packages that need it.
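For anyone who would rather stay in Python, the same check can be sketched like this (Python 3; the function name still_in_pool is hypothetical, and it assumes the .dp file holds one package filename per line, which is what the script in this article prints):

```python
import os


def still_in_pool(dp_file, pool_dir):
    """Return packages listed in dp_file that still exist in pool_dir."""
    with open(dp_file) as handle:
        names = [line.strip() for line in handle if line.strip()]
    # A package that has been moved into the hierarchy is no longer
    # in the pool folder, so only the unsorted ones are returned.
    return [name for name in names
            if os.path.exists(os.path.join(pool_dir, name))]
```

Run from inside a folder of the hierarchy, still_in_pool("../libXext.tcz.dp", "..") would correspond to the shell loop; an empty list means everything that needs the package is already sorted.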

We can use this to create a graph of diminishing returns on this research. Days into this, we’ve confirmed that TinyCore is indeed one of the least-GitHub-dependent distros, but we’ve also identified some of the more important ways in which it is still dependent indirectly on GitHub.

I thought about making that graph, but since it’s likely to be typical and not reveal anything that isn’t obvious, I’m just going to watch a movie, eat some eggs and maybe think about getting back to this research. I’m sure it sounds terribly boring, but I continue to learn more about this subject as I explore it.

When this started, I hadn’t even thought to start with the most needed packages. The first thing I wanted to know was how many packages pulled in mono.tcz or Perl or Python. Mono is not only GitHub-based, it’s one of the worst dependencies you can have. Fortunately, the only packages that pull in mono.tcz are gtk-sharp-dev.tcz, gtk-sharp.tcz, mono-dev.tcz and mono-locale.tcz. I’m only guessing that wine-mono.tcz assumes mono.tcz is installed.

If you’re trying to figure out how we can be GitHub-free in the future, I can probably save you some work — and if you have information that could be useful, by all means, let us know. With luck, this is going to help round out the wiki pages a bit as well.

Long live Stallman, and happy hacking.

Licence: Creative Commons CC0 1.0 (public domain)
