Summary: Microsoft’s long battle against character encoding standards such as Unicode, which bridge the gap for communication between people, not just applications
HALF A decade ago we spent a lot of time here promoting open standards — the grooves for connectivity between applications, operating systems, and pertinent pieces of code. Without standards, there is little collaboration because the cost of connecting separate pieces of software is quite high.
“But to Microsoft consistency was an evil threat; it threatened its monopoly.”

Assuming that collaboration is the key to rapid advancement and innovation — reusing knowledge, pooling human resources, and so on — standards are important everywhere we look, e.g. in electrical wiring, plumbing, energy, automobiles and so on. Encoding of characters is not everyone’s field of expertise; it is a low-level area of computing, akin to assembly code and little/big endianness. But the principles of standards stay the same across fields, and standards are almost always beneficial. I have wasted many hours of my life trying to overcome issues associated with Microsoft’s broken character encodings. It was a long time ago that people came to appreciate the value of consistency in some areas (not to be confused with monoculture or monopoly). But to Microsoft consistency was an evil threat; it threatened its monopoly. The Scientist published a piece called “Standards Needed” not too long ago and Linux Journal praised Unicode, which helps bridge character encoding barriers. Thanks to Unicode, many of us out there can access and render pages in almost any language, even rare languages (and even if we cannot understand them). The Register, however, thought it would be productive to bash Unicode. And look who wrote the piece: a Windowshead. What a surprise! █
Related/contextual items from the news:
Let’s give credit where credit’s due: Unicode is a brilliant invention that makes life easier for millions—even billions—of people on our planet. At the same time, dealing with Unicode, as well as the various encoding systems that preceded it, can be an incredibly painful and frustrating experience. I’ve been dealing with some Unicode-related frustrations of my own in recent days, so I thought this might be a good time to revisit a topic that every modern software developer, and especially every Web developer, should understand.
In the beginning – well, not in the very beginning, obviously, because that would require a proper discussion of issues such as parity and error correction and Hamming distances; and the famous quarrel between the brothers ASCII, ISCII, VISCII and YUSCII; and how in the 1980s if you tried to send a £ sign to a strange printer that you had not previously befriended (for example, by buying it a lovely new ribbon) your chances of success were negligible; and, and…
But you are a busy and important person.
So in the beginning that began in the limited world of late MS-DOS and early Windows programming, O best beloved, there were these things called “code pages”.
To the idle anglophone Windows programmer (i.e. me) code pages were something horrible and fussy that one hoped to get away with ignoring. I was dimly aware that, to process strings in some of the squigglier foreign languages, it was necessary to switch code page and sometimes, blimey, use two bytes per character instead of just one. It was bad enough that They couldn’t decide how many characters it took to mark the end of a line.
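The code-page misery described above is easy to demonstrate today, since Python’s codec names (cp437, cp1252, cp932) mirror the old DOS/Windows page numbers. A minimal sketch, using the £ sign from the printer anecdote and a hiragana character as the double-byte example:

```python
# The £ sign has a different byte value under each legacy code page,
# while Unicode gives it one code point (U+00A3) and UTF-8 one byte sequence.
pound = "£"

print(pound.encode("cp437"))   # original IBM PC code page  -> b'\x9c'
print(pound.encode("cp1252"))  # Windows Western code page  -> b'\xa3'
print(pound.encode("utf-8"))   # always the same two bytes  -> b'\xc2\xa3'

# The "two bytes per character" code pages really did exist: cp932
# (Japanese Shift-JIS) encodes most characters as byte pairs.
print("あ".encode("cp932"))    # -> b'\x82\xa0'

# (And the end-of-line quarrel: DOS/Windows marks it with b'\r\n',
# Unix with b'\n' — one character or two, depending on whom you ask.)
```

So the very same byte, 0xA3, is a £ under cp1252 but a different character elsewhere; without knowing the code page, the bytes are ambiguous — which is the problem Unicode set out to end.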
As far as I know, there isn’t a creation myth associated with the unification of the world’s character sets.
For Windows C++ programmers, the manifesto identifies specific techniques to make one’s core code UTF-8 based, including a proto-Boost library designed for the purpose. (Ironically, the first thing you have to do is turn the Unicode switch in the Visual C++ compiler to ‘on’.)
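The manifesto’s core technique — keep strings UTF-8 internally and convert only at the boundary with the wide (UTF-16) Windows APIs — can be sketched outside C++ too. A minimal illustration in Python, with a made-up payload (the conversion pair stands in for what MultiByteToWideChar/WideCharToMultiByte do in the C++ case):

```python
# "UTF-8 everywhere": the application's own strings stay UTF-8...
internal = "naïve café".encode("utf-8")

# ...and are converted to UTF-16-LE only at the moment a wide
# (W-suffixed) Windows API needs them:
wide = internal.decode("utf-8").encode("utf-16-le")

# Anything the API hands back is converted straight back to UTF-8:
roundtrip = wide.decode("utf-16-le").encode("utf-8")
assert roundtrip == internal
```

The point of the design is that only the thin boundary layer ever sees UTF-16; everything else in the program deals in one encoding, which is why the manifesto can coexist with the Visual C++ ‘Unicode’ (wide-character) setting being switched on.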
Next weekend I will be scraping all my Unicode files off my hard disk, taking them to the bottom of the garden, and burning them. As good citizens of the digital world, I urge you all to do the same.