11.08.10

Commentary: StatCounter ‘Global’ Statistics

Posted in GNU/Linux at 8:10 pm by Dr. Roy Schestowitz

Summary: How StatCounter turns 4-5% of the world’s population into 25% and reduces the world’s largest Internet population (China) to just 2.46%, then claims to be measuring global market share (other surveys do the same thing)

AL submits: “Thank you for all your hard work in bringing us news through Techrights. I am reading it daily and find lots of interesting information.

“I read one of the comments from Mad Hatter in which he was talking about the Wikipedia article on OS market share. I went to check it out and found that they use 1% for Linux (globally), based on the research by StatCounter Global. I was interested to see how this group gathers its statistical data. If you go to their FAQ section, they talk about sample size per country/region, and there is a link to the full list of all countries. As they state themselves, their pool is 16.3 billion hits. Quite large, I would say. But there is something interesting: the biggest group (region) is the United States, with 3,965,972,279 hits. That is almost 25% of the total pool. Now, my days of statistical studies are long gone, but I still remember that in order to have an accurate result you cannot over-represent one group. The result will obviously be skewed. We have one country that contributes almost 25% to the result compared to the rest of the world. Since StatCounter states that it selects randomly, it is very likely that much of the hit data is taken from the USA. Do you know, for example, what the share of hits from China is? 2.46%! In fact, looking at the whole list, you can see that from Korea on down the share is less than 1%! That includes countries like Poland, Greece, Japan, Russia, Switzerland etc.

“I know some can say that there are many more computers sold in the USA than in other countries (can’t be true). But market share is more complex. If we had 95% (for example) Linux presence on desktops in China, it would hardly influence the StatCounter data with a representation of only 2.46%. Do you see what I mean? There are of course many more problems with that. What kind of websites is StatCounter using to get hits? If we put a hit counter on a website that uses Silverlight, I don’t think we will get many hits from Linux desktops, right? And even if the websites are getting hits from the same number of Linux desktops and other desktops, what will happen? StatCounter will randomly select hits from the global pool, and as data from the USA is more likely to get selected, it will greatly skew the result and Linux will always be under-represented. Let’s say you have two crates, one with 10 pears and one with 250 tomatoes + 150 pears, and you draw five times, but 3 times from the first crate and 2 times from the second. You will have selected more pears than tomatoes, even though there are 250 tomatoes and 150+10=160 pears. Is this a reliable representation?”
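The crate example can be checked with a few lines of arithmetic. This is only a minimal sketch, assuming that every draw inside a crate is uniformly random; the variable and function names are made up for the illustration and are not part of anyone's methodology.

```python
from fractions import Fraction

# Crate contents and draw counts exactly as given in the example above.
crates = {
    "first": {"pear": 10, "tomato": 0},
    "second": {"pear": 150, "tomato": 250},
}
draws = {"first": 3, "second": 2}


def fraction_of(crate, fruit):
    """Fraction of a crate's contents that is the given fruit."""
    return Fraction(crate[fruit], sum(crate.values()))


# Expected number of each fruit among the five draws
# (assumption: uniform random draws within each crate).
expected_pears = sum(draws[n] * fraction_of(crates[n], "pear") for n in crates)
expected_tomatoes = sum(draws[n] * fraction_of(crates[n], "tomato") for n in crates)

# Share of pears in the sample versus share of pears across both crates.
sample_pear_share = expected_pears / sum(draws.values())
all_pears = sum(c["pear"] for c in crates.values())
all_fruit = sum(sum(c.values()) for c in crates.values())
true_pear_share = Fraction(all_pears, all_fruit)

print("expected pears drawn:   ", float(expected_pears))
print("expected tomatoes drawn:", float(expected_tomatoes))
print("pear share in sample:   ", float(sample_pear_share))
print("pear share overall:     ", round(float(true_pear_share), 3))
```

Under that assumption the five draws contain 3.75 pears and 1.25 tomatoes on average, so pears make up 75% of the sample although they are only 160 of the 410 pieces of fruit (about 39%).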

8 Comments

  1. StatCounterGlobalStats said,

    November 9, 2010 at 3:59 am

    Hi there,

    I’ve just stumbled across your article and hope to clear up some confusion.

    Our StatCounter Global Stats measure various market share and other stats for all countries across the globe… hence the name.

    Our methodology is very simple and we’ve purposefully kept it that way. Specifically, our stats are based on more than 15 billion hits per month to our 3 million+ member sites. We’re not aware of any other publicly available service providing market share stats that has a bigger sample size on which they base their information.

    You’re absolutely correct about us NOT weighting our data. We do not impose artificial weightings on our stats and this is a conscious and deliberate decision. Weighting stats means that the stats are only as good as the weighting methodology used. If the weighting data is inaccurate or out of date, then it renders the data completely incorrect. For these reasons again, we choose NOT to weight our data in any way and instead we report it as we record – other commentators can, however, weight the data as they wish. All our work is shared under a Creative Commons Attribution-Share Alike License (http://creativecommons.org/licenses/by-sa/3.0/) for this specific purpose – so please feel free to download our data and apply whatever weights you see fit.

    StatCounter Global Stats came about because we decided to publicly share interesting trends that we were monitoring in-house. We aim to make our stats and methodology as clear as possible and appreciate all comments, queries and suggestions. If you have any questions for us, please don’t hesitate to contact us via our feedback form (http://gs.statcounter.com/feedback) or by direct email.
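To make concrete what applying weights would involve, here is a rough sketch of re-weighting per-region shares. Only the US hit count, the 16.3 billion total and China's 2.46% come from the post above; the per-region Linux shares and the alternative weights are HYPOTHETICAL numbers chosen purely for illustration, and the function name is invented.

```python
def weighted_share(share_by_region, weight_by_region):
    """Average per-region shares using the given region weights."""
    total_weight = sum(weight_by_region.values())
    return sum(
        share_by_region[region] * weight_by_region[region]
        for region in share_by_region
    ) / total_weight


# Weights implied by raw hit counts (figures quoted in the post).
us_hits = 3_965_972_279
total_hits = 16.3e9
us_weight = us_hits / total_hits
china_weight = 0.0246
hit_weights = {
    "US": us_weight,
    "China": china_weight,
    "rest": 1 - us_weight - china_weight,
}

# HYPOTHETICAL alternative weights, e.g. each region's share of Internet users.
user_weights = {"US": 0.12, "China": 0.22, "rest": 0.66}

# HYPOTHETICAL Linux desktop share (in percent) within each region.
linux_share = {"US": 1.0, "China": 5.0, "rest": 2.0}

by_hits = weighted_share(linux_share, hit_weights)
by_users = weighted_share(linux_share, user_weights)

print(f"Linux share weighted by hits:  {by_hits:.2f}%")
print(f"Linux share weighted by users: {by_users:.2f}%")
```

With these made-up inputs the same per-region figures give roughly 1.83% when weighted by hits and 2.54% when weighted by the alternative weights, which shows how much the headline number depends on the choice of weights.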

    Dr. Roy Schestowitz Reply:

    Hi,

    A few quick points:

    1. The size of the sample does not matter at all. Other companies like NetApps also brag about the number of UIPs, but this number is meaningless unless distributed correctly (nature of the sites sampled, geography, etc.)
    2. How does the data account for dynamic IPs, proxies/squid, and the imbalanced use of the Web browser depending on the user (e.g. # of page requests; this can be correlated to operating system and browser, e.g. does it support tabs? What is the connection speed?)?
    3. How are zombie PCs and other ‘junk’ traffic removed from the dataset?

    There are many other challenges/deficiencies, but it’s commendable when the data and methods (preferably code) are made publicly available for independent audits, provided of course they don’t violate privacy rules (which is a hard problem when browsers are specified very precisely with locational information too). That’s why such surveys cannot reach privacy-conscious sites, many of which appeal to civil rights-aware users (many of whom would use GNU/Linux), and that’s just one example. I wrote an article on the subject 3 years ago:

    http://itmanagement.earthweb.com/osrc/article.php/3687616/Can-Linux-Adoption-Ever-be-Accurately-Gauged.htm

  2. Matsi said,

    November 9, 2010 at 1:12 pm

    Here are the statcounter results for Finland

    http://gs.statcounter.com/#os-FI-monthly-200910-201010
    (showing about 2.5-2.7% for Linux)
    …and here are the results of my homepage (non-geek, 95% non-computer, mostly politics, social discussion…)

    Windows      2785   83.18%
    Linux         302    9.02%
    Macintosh     185    5.53%
    Unknown        73    2.18%
    FreeBSD         2    0.06%
    CPM             1    0.03%

    (visitors: Finland 94.4%, Sweden 0.9%, USA 1.1%, others 3.4%)

    No big differences between clicks, unique visitors, unique sessions…

    There are several Finns telling the same story: Linux has about 7-10% market share on their homepages, blog sites…

    So one thing is sure: Linux has a much bigger market share in Finland than StatCounter reports. My guess is 8% +/- 1%. There is no doubt that the situation is much the same in other regions. Linux is some 3 times bigger than StatCounter claims.
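The percentages in the table, and the sampling noise around the Linux figure, can be reproduced from the raw counts. This is a sketch only: it assumes each counted row is an independent visit and uses the simple normal approximation for a proportion. It says nothing about whether this one site's audience is representative of Finland.

```python
import math

# Counts per operating system as listed in the comment above.
counts = {
    "Windows": 2785,
    "Linux": 302,
    "Macintosh": 185,
    "Unknown": 73,
    "FreeBSD": 2,
    "CPM": 1,
}
total = sum(counts.values())

# Recompute the percentage column from the counts.
for name, count in counts.items():
    print(f"{name:<10} {count:>5} {100 * count / total:6.2f}%")

# Normal-approximation 95% interval for the Linux proportion
# (assumption: counts behave like independent random draws).
p = counts["Linux"] / total
standard_error = math.sqrt(p * (1 - p) / total)
margin = 1.96 * standard_error

print(f"Linux: {100 * p:.2f}% +/- {100 * margin:.2f} percentage points")
```

On these counts the margin comes out at a little under one percentage point, so sampling noise alone would not explain the gap between about 9% on this site and the 2.5-2.7% reported for Finland; the difference would have to come from which sites and visitors are being measured.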

    Dr. Roy Schestowitz Reply:

    StatCounter needs to tell which actual sites in Finland it is sampling from. These are not randomly selected Finns.

    StatCounterGlobalStats Reply:

    There is still considerable confusion here!

    Our methodology is here:
    http://gs.statcounter.com/faq#methodology

    We are NOT sampling websites from Finland.

    Geo is determined via IP address based on location of visitor NOT location of website. Nothing is randomly selected either – we publish everything we track.

    In other words our stats for Finland are based on all hits we track (approx 60 million per month) from Finland (i.e. IP addresses in Finland) to all our 3 million plus member sites.

    If anyone has further questions, please do submit them to us directly – we’re more than happy to deal with any and all queries.

    Thanks!

    Dr. Roy Schestowitz Reply:

    That does not change my point. What are those 3m+ sites? There are far more sites than that on the Web (IIRC, over 100m domains registered).

    What is the geographical distribution of these 3m+ sites? What proportion of them is Finnish for example? What proportion is Chinese or Brazilian?

    Danielh Reply:

    If I get this right, you are sampling 3 million sites out of the 220 million sites available?

    I really hope those sites are very spread out in target groups etc., because otherwise there seems to be a great margin for error. Do they include any of the bigger sites like Google, Facebook, Slashdot, Youtube, QQ, Baidu, Blogger, Twitter etc., or is it just smaller sites?

    Dr. Roy Schestowitz Reply:

    To be fair to StatCounter, it is an exceptionally hard problem to solve because of its massive scale. I just hope they make all of their experimental data and methods public. In academia we can hardly even publish a paper without this most fundamental requirement, not to mention rigorous scrutiny (no statistician would accept StatCounter’s charts without a challenge).
