10.02.08

OOXML Leaked: The Stuff ISO Doesn’t Want You to Have (Updated x9)

Posted in Europe, IBM, ISO, Microsoft, Standard at 8:47 pm by Dr. Roy Schestowitz

[Update: Marius has produced this HTML version which is easiest to browse and requires no large-sized downloads. Another reader, Tony Manco, has produced this HTML version (another mirror… and another) of the core of OOXML so that you can access the specs quickly.]

In light of the systematic abuse and the demise of ISO, which IBM loudly protested against [1, 2], we shall no longer let this process remain secretive. We finally have complete copies of the documents that those behind the shenanigans keep behind passwords (unlike ODF, which they attack). These comprise six files, namely:

  1. 1080.pdf
  2. OfficeOpenXML-WordprocessingMLArtBorders.zip
  3. OfficeOpenXML-SpreadsheetMLStyles.zip
  4. OfficeOpenXML-DrawingMLGeometries.zip
  5. OfficeOpenXML-RELAXNG-Strict.zip
  6. OfficeOpenXML-XMLSchema-Strict.zip

[Note: appended at the bottom of this post we now have 1081c, 1082c, and 1083c.]

[Note #2: we now have a mirror listed at the bottom.]

For those who forgot the opposition to ISO’s bad behaviour, here is another new article about IBM’s action.

In a recent announcement IBM said that it would reconsider its membership in the hundreds of bodies that create global standards for everything from software to servers.

Another article says that “IBM Nixes Standards Shenanigans” and further to the exodus in Norway we also have Glyn Moody’s take.

A little while back I noted a provocative call from IBM for standards bodies to do better – a clear reference to the ISO’s handling of OOXML. Here are some other people who are clearly very unhappy with the same: 13 members of the Norwegian technical committee that actually took part in the process.

[...]

This particular saga is only just beginning…

Feel free to pass around (or even ridicule) those ~60 megabytes of lock-in, which Microsoft won’t let you see. They probably still contain many of the known flaws, which remain intact, awaiting (and deserving) scrutiny.

Update (03/10/2008): we’ve just added 1081c, 1082c, and 1083c.

Update #2 (04/10/2008): this Web server sporadically goes down due to heavy load (over 10 GB of traffic today, plus lots of CPU and RAM). We’ve made a mirror available, so please use it instead, if possible.

Update #3 (04/10/2008): we now have an HTML version of the core of OOXML, but please use this mirror (HTML), which should be faster.

Update #4 (04/10/2008): the first mirror was downed by the load (thousands of OOXML pages combined with the Slashdot effect can do that), so here is a second mirror. If it’s down as well, come back later when there’s less hammering on the servers.

Update #5 (04/10/2008): third mirror of the HTML version, just in case.

Update #6 (04/10/2008): here is a mirror of the PDF (1080.pdf).

Update #7 (05/10/2008): here is a much better HTML version of OOXML (1080). We will have another one soon, but it comprises over 11,000 files, so this may put strain on the server.

Update #8 (06/10/2008): now that the load on the server has declined somewhat (tens of gigabytes in days), we decided that it’s safe to upload this graphics-rich HTML version of 1080 (comprising over 11,000 pertinent files).

Update #9 (07/10/2008): due to legal intimidation from ISO or its cronies, we have removed OOXML (also from the mirrors).

208 Comments

  1. Mike Brown said,

    October 3, 2008 at 12:01 am

    5,500+ pages. This is from page 2304:

    “For legacy reasons, an implementation using the 1900 backward compatibility date base system shall treat 1900 as though it was a leap year. [Note: That is, serial value 59 corresponds to February 28, and serial value 61 corresponds to March 1, the next day, allowing the (non-existent) date February 29 to have the serial value 60. end note] A consequence of this is that for dates between January 1 and February 28, WEEKDAY shall return a value for the day immediately prior to the correct day, so that the (non-existent) date February 29, 1900, has a day-of-the-week that immediately follows that of February 28, and immediately precedes that of March 1, 1900.”

    Really, you couldn’t make this stuff up.
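The serial-number rule quoted above can be sketched in Python. This is a minimal illustration under stated assumptions, not the OOXML text itself: the function names are hypothetical, and the 1 = Sunday ... 7 = Saturday numbering is assumed to match WEEKDAY's default return type.

```python
from datetime import date, timedelta
from typing import Optional

def serial_to_date_1900(serial: int) -> Optional[date]:
    """Map a serial in the 1900 backward-compatibility system to a real date.

    Serial 1 is 1900-01-01 and serial 59 is 1900-02-28. Serial 60 stands for
    the non-existent 1900-02-29, so None is returned. From 61 (1900-03-01)
    onward every serial is one higher than a true day count would give.
    """
    if serial < 1:
        raise ValueError("this sketch only handles serials from 1 upward")
    if serial == 60:
        return None  # the phantom leap day
    base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return base + timedelta(days=serial)

def weekday_1900(serial: int) -> int:
    """WEEKDAY-style value (1 = Sunday ... 7 = Saturday) computed as if serial 60
    were a real day, which makes every date before 1900-03-01 one weekday early."""
    return (serial - 1) % 7 + 1

def true_weekday(d: date) -> int:
    """The real weekday on the same 1 = Sunday ... 7 = Saturday scale."""
    # isoweekday(): Monday = 1 ... Sunday = 7
    return d.isoweekday() % 7 + 1
```

Under these assumptions, `weekday_1900(1)` gives 1 (Sunday) even though 1 January 1900 was a Monday, while from serial 61 (1 March 1900) onward both functions agree, which is the one-day skew the quoted passage describes.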

  2. Roy Schestowitz said,

    October 3, 2008 at 2:25 am

    Wonderful. Software bugs (in Microsoft Office) become part of “the standard”: rather than being fixed, they are simply lumped in with the rest of the pile of bugs.

    Maybe OOXML should also explicitly state that 850 * 77.1 = 100,000.

    http://www.downloadsquad.com/2007/09/25/excel-2007-cant-do-math-unless-850-77-1-100-000/
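The linked report concerns Excel 2007 showing 100,000 as the result of =850*77.1. The cause was reportedly a display-formatting bug affecting a handful of values lying within a hair of 65,535 and 65,536, not the multiplication itself. The snippet below only illustrates why 850 * 77.1 is such a value under IEEE 754 double arithmetic; it does not reproduce Excel's formatting bug.

```python
# 77.1 cannot be represented exactly as an IEEE 754 double, so the
# product of 850 and 77.1 lands one unit in the last place below 65535.
product = 850 * 77.1

print(product == 65535)             # False
print(65535 - product == 2 ** -37)  # True: the gap is exactly one ulp at this magnitude
```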

  3. AlexH said,

    October 3, 2008 at 3:19 am

    Actually, bugs are part of the standard if the standard is already out there.

    In the case of spreadsheet data, having an app re-interpret the data as something different is clearly, definitely, and obviously wrong. “Correctness” doesn’t matter if “fixing” it actually breaks user data.

    Come on, there are better criticisms of OOXML than its legacy support….

  4. Darren said,

    October 3, 2008 at 5:12 am

    Hang on, OOXML (ISO/IEC 29500:2008) has NO legacy support, as there are NO apps that currently implement it. That being the case, there should not be any bugs like this left in there. Even MS has no roadmap for when they will support the ISO “standard”.

  5. DanielHedblom said,

    October 3, 2008 at 5:16 am

    @AlexH

    That’s why nobody besides Microsoft wanted this “standard” to go through the fast-track process. It is/was badly broken, unspecified, impossible to implement and really a pure pile of manure.

    The “standard” is just a dump of how one specific implementation of a document format works, bugs and all. That’s so wrong that it’s not even funny.

    That the standard contains bugs and that the only halfway implementation contains piles of bugs is actually the best argument against it there could ever be.

  6. Roy Schestowitz said,

    October 3, 2008 at 5:22 am

    AlexH, what’s with the Microsoft apologism again? Are you again going to take Microsoft’s side with spin?

  7. AlexH said,

    October 3, 2008 at 5:27 am

    @Darren: the file format isn’t, but the data is. This isn’t a file format issue, this is a data issue.

    @Roy: it’s not “apologism”. Free software implements this same bug as well, because it makes spreadsheets actually work. If we broke people’s spreadsheets, that would rightly make them angry.

  8. Roy Schestowitz said,

    October 3, 2008 at 5:33 am

    Issues need to be mended, not reckoned with. I fail to understand your logic.

  9. AlexH said,

    October 3, 2008 at 5:37 am

    @Roy: the point is, you can’t just “mend” this issue. If you change the way the software interprets the formula, the data comes out different, and often different is wrong. You can’t automatically fix up the data because spreadsheets do not have a concept of type, only of formatting.

    OpenDocument 1.2 is going to standardise the exact same bug that you deride OOXML for, and I’m sure Microsoft will somehow catch the blame for that as well. However, it’s just not that simple a problem: you can’t play fast and loose with people’s existing spreadsheets because this is not a file format issue.
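The argument that a spreadsheet stores numbers rather than dates can be made concrete with a toy model. Everything here is hypothetical: `Cell`, the two migration functions and the sample values are illustrative, and the scenario assumes a target application that counts days from 1899-12-31 without the phantom leap day, so date serials from 61 upward would need to be decremented to keep showing the same dates.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cell:
    value: float        # the stored number; there is no separate "date" type
    number_format: str  # a display hint only, e.g. "yyyy-mm-dd" or "General"

def migrate_by_value(cells: List[Cell]) -> List[Cell]:
    """Hypothetical fix: shift every value >= 61 back one day."""
    return [Cell(c.value - 1 if c.value >= 61 else c.value, c.number_format) for c in cells]

def migrate_by_format(cells: List[Cell]) -> List[Cell]:
    """Hypothetical fix: shift only values >= 61 whose format looks like a date."""
    return [
        Cell(c.value - 1 if c.value >= 61 and "yy" in c.number_format else c.value,
             c.number_format)
        for c in cells
    ]

sheet = [
    Cell(40000.0, "yyyy-mm-dd"),  # formatted as a date
    Cell(40000.0, "General"),     # maybe a date produced by arithmetic, maybe not
    Cell(120.0, "General"),       # e.g. a quantity of 120 units
]

by_value = migrate_by_value(sheet)    # the quantity 120 silently becomes 119
by_format = migrate_by_format(sheet)  # the unformatted 40000 is left unshifted
```

Neither heuristic is safe: shifting by value corrupts the quantity, and shifting by format misses any date that was never given a date format. The model cannot tell which numbers are dates because that information is simply not stored.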

  10. Roy Schestowitz said,

    October 3, 2008 at 5:49 am

    I thought you were referring to the calculation bugs and the leap year.

    Anyway, you used similar logic to justify Microsoft’s disobeying of Web standards.

    http://boycottnovell.com/2008/09/13/microsoft-admitted-mono-trap/#comment-24236

  11. AlexH said,

    October 3, 2008 at 5:59 am

    I am referring to the calculation bugs.

    My point is you can’t say “it’s calculating it wrong therefore all existing spreadsheets must be wrong”: many of the people who care will have adjusted for that bug already, and correcting the bug will actually silently wreck existing data.

    And, no, my logic on web standards was completely different. Not least because Microsoft were following web standards, and even though I asked you many times what they should be doing, you had no answer. You like to bash them no matter what they do, which is fine, but trying to pretend like you have a good reason is a sham.

  12. AlexH said,

    October 3, 2008 at 6:00 am

    .. and anyway, if you don’t like it in OOXML, I suggest you get onto office-formula TC at OASIS and ask them to remove it, because they’re putting the same thing into OpenDocument, for the exact same very good reasons.

  13. Roy Schestowitz said,

    October 3, 2008 at 6:22 am

    “Very good reasons”? Deliberately accepting bugs is a good reason? Or is it Microsoft’s unwillingness to get its act together? It’s feet-dragging.

    Same with the Web by the way. Microsoft had almost a decade to fix its problems, but it didn’t until it lost market share.

  14. AlexH said,

    October 3, 2008 at 6:44 am

    I’ve outlined the reason. If you don’t think it’s a good one, that’s your call, but the vendors of office suites disagree with you.

    If we were talking about the ugly text runs that OOXML does, that would be one thing. But we’re not talking about the file format in any way here – we’re talking about user data. That’s totally and utterly different, and I fail to see why you can’t grasp that.

    And since you brought up the web thing again, do you want to outline what action you think Microsoft should have taken? Or are you still pleading the 5th on that?

  15. DanielHedblom said,

    October 3, 2008 at 6:59 am

    Would be nice to be able to moderate away astroturfers like AlexH. Paid shills have no place here.

  16. AlexH said,

    October 3, 2008 at 7:05 am

    @DanielHedblom: please don’t make personal accusations that are known to be untrue.

    The fact that I have a different opinion to other people here doesn’t make me a “shill”, paid or otherwise.

  17. Roy Schestowitz said,

    October 3, 2008 at 7:09 am

    AlexH, I remember many other things you wrote here in the comments about OOXML, including your defense of the actual process.

    How can one be so blind?
    http://boycottnovell.com/ooxml-abuse-index/

  18. Roy Schestowitz said,

    October 3, 2008 at 7:11 am

    “And since you brought up the web thing again, do you want to outline what action you think Microsoft should have taken? Or are you still pleading the 5th on that?”

    That’s like asking how to handle a criminal that expresses remorse. The reasonable thing to do is to jail it.

  19. AlexH said,

    October 3, 2008 at 7:16 am

    Er, no, if you remember, I didn’t defend the process: what I said was that nobody should be surprised by the process. You cannot be shocked that corporates have large sway in bodies that are funded by, er, corporates.

    It has always been the same with ISO, and it will continue to be the same with ISO, because that is what ISO’s members and funders want. People who think ISO is irrelevant simply don’t understand what it does; it has always been this ugly.

  20. AlexH said,

    October 3, 2008 at 7:20 am

    “And since you brought up the web thing again, do you want to outline what action you think Microsoft should have taken? Or are you still pleading the 5th on that?”

    “That’s like asking how to handle a criminal that expresses remorse. The reasonable thing to do is to jail it.”

    No, it’s nothing like that. You’re accusing Microsoft of working against web standards in this case of vendor extensions. I’ve pointed out numerous times that a. it’s in the standard, and b. other standards-compliant browsers do the exact same thing.

    I’m not going to defend Microsoft’s abysmal support for web standards, but in this instance you’re simply wrong.

  21. Roy Schestowitz said,

    October 3, 2008 at 7:25 am

    “…it has always been this ugly.”

    Ha! The classic “they are as evil as us” excuse that Microsoft has mastered (against Apple, Google, IBM, etc). You’re doing it again.

    http://boycottnovell.com/2008/04/05/microsoft-ibm-epa-proxy/

  22. AlexH said,

    October 3, 2008 at 7:40 am

    Yet again you accuse me of making an argument I’m not making.

    I’m not excusing them or defending them, as I keep saying and I wish you’d actually listen.

    I’m pointing out that it’s not a surprise that they act that way based on past history, and that anyone who thought they would behave differently is being naive.

    To put it simplistically, I wouldn’t defend a man who beats his wife, but I wouldn’t be surprised that he beat her tonight if he’s beaten her every night in the past week. (Not that I am equating ISO in any way with domestic violence, which is an extremely serious subject).

    It’s really not that difficult to understand the difference between those two positions, particularly for someone as educated as yourself.

  23. Roy Schestowitz said,

    October 3, 2008 at 7:47 am

    Microsoft has made attempts to mock ISO’s integrity and has also pretended to be foolish. It’s nothing new:

    Microsoft: We were naïve about standards. No, really!

    “Microsoft was also present at IETF meetings around that time, and was enthusiastically gaming the system. I remember one Microsoft attorney with three assistants who were each feeding “audience” questions at the attorney’s direction.

    “Organizations like Sun, which ran a large standards department, were tremendously concerned with Microsoft’s attempts to game the system at the time.

    “Microsoft is no newcomer to the standards business. Protests otherwise on their behalf are insincere.”

    http://technocrat.net/d/2008/6/23/44269

    To say that the system was always dysfunctional is a self-serving stretch. Mind you, it was Redmond’s own press that presented an interview about C++’s standardisation, which required no manipulation.

    Nothing like OOXML (and Microsoft) has ever hit ISO, so let’s not become revisionists.

  24. AlexH said,

    October 3, 2008 at 7:53 am

    Um, Microsoft have been heavily involved in ISO for years, it’s not like they just “hit ISO”.

    I suggest you do some more research on how ISO operate, who funds them, and how they’ve handled software stuff in the past.

  25. Roy Schestowitz said,

    October 3, 2008 at 8:15 am

    Care to educate me? I did do some reading.

    Be specific.

  26. Luc Bollen said,

    October 3, 2008 at 8:19 am

    @AlexH: There was NO user “data with the 1900 bug” in OOXML format at the time MS released the OOXML spec. The existing “data with the 1900 bug” was only in the binary .xls format. It was therefore perfectly possible for MS to avoid the 1900 leap year bug in the OOXML specification.

    A good indication of this is that ODF doesn’t have this bug specified, and OOo is perfectly able to open .xls files and store the data in ODF format. The problem should be handled in the import filter, not in the format specification.

    @Roy: You only published Part 1 of the spec (document N1080). Could you also publish the other parts (documents N1081, N1082 and N1083)?

  27. Roy Schestowitz said,

    October 3, 2008 at 8:32 am

    I’ll upload these too in just a moment. The server is under stress that leads to errors, due to bandwidth (several gigabytes).

    I’ll update the post in a moment.

  28. Andy said,

    October 3, 2008 at 8:52 am

    So were those changes ordered by the BRM properly carried out by the editor? If not, it’s time for defect reports.

  29. AlexH said,

    October 3, 2008 at 9:14 am

    @Roy: sure, look at their history in the C++ standardisation WG, or any number of the other WGs they have deep involvement in. They’re one of the most common vendors.

    @Luc: as I said, this isn’t a format issue, this is a user data issue. Indeed, ODF 1.1 and previous editions didn’t even address this syntax, because it’s application-specific. The problem is that you can’t just “convert” user data when you convert the file format, because spreadsheet data isn’t typed and you can’t know which numbers to adjust.

    So, ODF “doesn’t have this bug” is simply untrue: it left it unspecified, and ODF apps interpret things as they like (= compatible with Excel). ODF 1.2 will standardise this bug as well, so that apps that want to behave “compatibly” can do so.

  30. Roy Schestowitz said,

    October 3, 2008 at 9:24 am

    Please, Alex, do not make attempts to rewrite history.

    http://reddevnews.com/blogs/weblog.aspx?blog=1203

    Speaking of theater, the IT industry got an eyeful when Microsoft admitted that one of its Swedish employees had offered monetary compensation to Microsoft partners in Sweden if they engaged in the proposal process and voted for the OOXML spec. Sweden invalidated its “yes” vote for OOXML and essentially abstained from the final voting.

    No surprise, broader accusations of ballot stuffing — by way of getting dozens of companies to suddenly join the ISO voting bodies of individual nations — abound.

    I asked Bjarne Stroustrup, the creator of the C++ programming language and a guy who has wended his way through the ISO ratification maze a few times himself, if he’s ever seen this kind of chicanery in previous ISO votes.

    “I have never heard of money changing hands in exchange for votes or anything equivalent,” Stroustrup writes back. “I guess every process is vulnerable to political and economic pressures, but I have not personally seen or suspected anything like that in relation to C++.””

  31. Luc Bollen said,

    October 3, 2008 at 9:27 am

    OpenFormula (part of ODF 1.2) doesn’t MANDATE the bug, as ECMA-376 did. From http://wiki.oasis-open.org/office/About_OpenFormula

    “Doesn’t mandate mistakes. Just because one program gets something wrong doesn’t mean that everyone should make the same mistake. The specification is carefully written to not require certain bugs, just because someone has a bug. For example, Excel incorrectly believes that 1900 was a leap year, and at least draft version 1.3 of the Excel specification claims that compatible applications must make the same mistake. Nonsense. Instead, OpenDocument wisely stores dates as dates (not just numbers), and thus does not require that applications have this bug. The Excel specification also requires that applications cannot be more capable than Excel (it doesn’t permit support for dates before 1900). Again, nonsense. In fact, at least one OpenDocument spreadsheet application (OpenOffice.org Calc) can correctly calculate dates and date differences going back to 1583! Similarly, many applications handle complex numbers in a very clumsy way; we’ve devised the specification to make sure that future applications can support better approaches, instead of tying their hands to a technique known to be poor.”

  32. AlexH said,

    October 3, 2008 at 9:28 am

    Roy, don’t accuse me of doing something without quoting where you think I’m doing it.

    My statement was that Microsoft have a long-standing and deep involvement in ISO. That statement is correct, your hand-waving notwithstanding.

  33. AlexH said,

    October 3, 2008 at 9:36 am

    @Luc: if you have an untyped number being used as a date, which is what current data is, what is an app going to do?

    If it doesn’t implement that bug, days are off by one. Great.

    So, yes, it doesn’t mandate, because the default formulas are typed. That’s great for new data. It doesn’t work for imported data, and that’s why they’re also standardising that bug in the specification.

  34. Roy Schestowitz said,

    October 3, 2008 at 9:40 am

    AlexH, I was referring to your attempt to throw out claims of Microsoft/OOXML scandals by painting others as “equally evil”. You do this a lot. So does Microsoft.

  35. Luc Bollen said,

    October 3, 2008 at 9:41 am

    Here is what OpenFormula says about this (normative text):

    “Implementations of formulas in an OpenDocument file shall use the epoch specified in the table-null-date attribute of the <table:calculation-setting> element, and shall support at least the following epoch values: 1899-12-30, 1900-01-01, and 1904-01-01.

    Many applications cannot handle Date values before January 1, 1900. Some applications can handle dates for the years 1900 and on, but include a known defect: they incorrectly presume that 1900 was a leap year (1900 was not a leap year). Applications may reproduce the 1900-as-leap-year bug for compatibility purposes, but should not. Portable documents shall not include date calculations that require the incorrect assumption that 1900 was a leap year. Portable documents shall not assume that negative date values are impossible (many implementations use negative dates to represent dates before the epoch). Portable documents should use the epoch date 1899-12-30 to compensate for serial numbers originating from applications that include a 1900-02-29 leap day in their calculations.”

    I think we are far from “ODF 1.2 will standardise this bug as well”.
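The epoch mechanism in the quoted normative text amounts to treating each serial as a plain count of days after the declared null date. A minimal sketch, assuming that serial 0 corresponds to the null date itself; `NULL_DATES` and `serial_to_date` are hypothetical names, not part of any ODF API.

```python
from datetime import date, timedelta

# The three epoch values the quoted text says implementations must at least support.
NULL_DATES = {
    "1899-12-30": date(1899, 12, 30),
    "1900-01-01": date(1900, 1, 1),
    "1904-01-01": date(1904, 1, 1),
}

def serial_to_date(serial: int, null_date: str = "1899-12-30") -> date:
    """Treat the serial as a plain count of days after the declared null date
    (no phantom 1900-02-29 is inserted)."""
    return NULL_DATES[null_date] + timedelta(days=serial)
```

With these assumptions the same stored number 61 means 1900-03-01 under the 1899-12-30 epoch but 1904-03-02 under 1904-01-01. The two epochs are 1,462 days apart, which is the familiar offset between Excel's 1900 and 1904 (Macintosh) date systems.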

  36. AlexH said,

    October 3, 2008 at 9:43 am

    @Roy: No, I don’t do that “a lot”, and I would thank you to either give me a citation or withdraw another baseless attack.

    I already explained my position to you in very simple terms. I haven’t defended the OOXML “scandals”, nor have I defended ISO or Microsoft.

    So please retract that comment.

  37. AlexH said,

    October 3, 2008 at 9:45 am

    @Luc:

    Well, you already quoted the relevant text:

    “Applications may reproduce the 1900-as-leap-year bug for compatibility purposes, but should not.”

    That standardises the bug, because it puts that behaviour in the standard.

    No-one likes that behaviour, but it is important that it is in the standard, because you cannot convert legacy data correctly without it.

  38. Luc Bollen said,

    October 3, 2008 at 9:54 am

    @AlexH: “That standardises the bug, because it puts that behaviour in the standard.”

    No. The behaviour is not SPECIFIED in the standard. The standard simply acknowledges that applications may implement the bug.

    And it is clear that OpenFormula doesn’t standardise application behaviour, but only data format.

  39. AlexH said,

    October 3, 2008 at 10:06 am

    @Luc: untrue.

    Implementations of formulas in an OpenDocument file shall use the epoch specified in the table-null-date attribute of the <table:calculation-setting> element, and shall support at least the following epoch values: 1899-12-30, 1900-01-01, and 1904-01-01.

    The first epoch takes into account the leap year bug on PCs (and is the default in OOo 3), at the cost of incorrectly importing data referring to the first few months of 1900, and the last epoch is the Mac bug.
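The trade-off described for the 1899-12-30 epoch can be checked mechanically. The sketch below (hypothetical helper names; the Excel-style mapping is written from the 1900 backward-compatibility rule quoted at the top of the thread) compares that epoch with the Excel-style mapping over a range of serials.

```python
from datetime import date, timedelta
from typing import List, Optional

def excel_1900(serial: int) -> Optional[date]:
    """Excel-style 1900 system: serial 1 is 1900-01-01 and serial 60 is the
    non-existent 1900-02-29 (returned here as None)."""
    if serial == 60:
        return None
    base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return base + timedelta(days=serial)

def from_null_date(serial: int, null_date: date = date(1899, 12, 30)) -> date:
    """OpenFormula-style: a plain day count from the null-date epoch."""
    return null_date + timedelta(days=serial)

def mismatched_serials(limit: int) -> List[int]:
    """Serials in 1..limit on which the two interpretations disagree."""
    return [s for s in range(1, limit + 1) if excel_1900(s) != from_null_date(s)]
```

Under these assumptions the mismatches are exactly serials 1 to 60: serials 1 to 59 come out one day early, serial 60 becomes 1900-02-28 instead of the non-existent 29 February, and every serial from 61 onward agrees. That is the cost for early-1900 data mentioned above, confined to January and February.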

  40. Roy Schestowitz said,

    October 3, 2008 at 10:14 am

    AlexH,

    My statement stands. Moreover, not necessarily based on just this discussion in isolation, your claims/insinuation that nothing was amiss is defence of Microsoft, OOXML, and ISO.

  41. AlexH said,

    October 3, 2008 at 10:16 am

    @Roy: do you actually want to quote me something where I said nothing was amiss?

    I think it’s sad that you make idle accusations knowing you have no evidence.

  42. Luc Bollen said,

    October 3, 2008 at 10:22 am

    @AlexH: “at the cost of incorrectly importing data”

    I agree with you: the standardised approach “incorrectly” implements the bug.
    In fact, it recommends a “best effort” approach.

    So I maintain that the bug is not standardised in ODF 1.2, and I’m happy to close our discussion about the “1900 bug” here, as you implicitly recognised that your first statement was wrong.

    However, could you explain what you mean by the “Mac bug” ???

  43. Roy Schestowitz said,

    October 3, 2008 at 10:22 am

    Look, Alex, I’ll be brutally honest. I haven’t the desire or patience to pin down particular examples, but I can very well recall you claiming that BSI did nothing wrong in reversing the vote for no reason… after Alex Brown and other ‘Softies’ intervened, stuffed or whatever else they can do in this secretive process that ended up on the desk of the British courts (lacked funding to be concluded).

    Specifically, you claimed that Microsoft just had more friends than IBM, or something along those lines. You always underplay the abuses, which sometimes leads me to suspect you’re one of these FOSS people who were hired by Microsoft (we have them in the IRC channel).

  44. AlexH said,

    October 3, 2008 at 10:32 am

    @Luc:

    I think you misunderstand. 1.2 very much says that you can implement the bug. The 1899 “best effort” approach means that you can apply that bug to those dates in the small affected range, as the standard says applications may do – that’s the same behaviour as Excel. So my first statement was in fact correct.

    @Roy:

    If you’re not willing to defend accusations, then you shouldn’t make them in the first place. I don’t need to go into the reasons why that is morally wrong. I’m not going to address the rest of your pathetic insinuations, though.

    Just to remind you, what I said about the BSI was that they were perfectly entitled to take the decision that they took, and that the legal challenge would go nowhere. And that’s what happened: it didn’t “lack funding to be concluded”, it fell flat at the first hurdle and no-one was willing to spend more money on a wild-goose chase.

    The point remains that ISO’s members – the nations – can take decisions on any basis they like. We might not like the conclusions that they arrive at, but they’re entitled to make those decisions.

    That’s not a defence of them, it’s a statement of fact. Let me put it in terms you might understand: are many people happy that Bush was elected in the US? And, did the electors in the US have the right to elect him?

    Saying that they had the right to elect him doesn’t mean that whatever happened in Florida was defensible.

  45. Roy Schestowitz said,

    October 3, 2008 at 10:48 am

    By that logic, Standard Norway did nothing wrong, either. Thanks for confirming that scandalous processes or decisions can be accepted based on the ‘merit’ of independent choice, where stuffed rooms, stolen votes and rule-bending are fine. That’s the way I read it anyway, and perhaps you didn’t follow what had happened in BSI.

  46. Luc Bollen said,

    October 3, 2008 at 10:54 am

    @ AlexH

    ODF 1.2 very much says that you can implement the bug, BUT SHOULD NOT.

    If you want to consider this as being a standardisation of the bug, I’m afraid you are as stubborn as Roy, who makes far reaching conclusions from what you’ve said. ;-)

  47. AlexH said,

    October 3, 2008 at 11:01 am

    I don’t know how many times I need to repeat this, but I didn’t make a judgement on whether or not it was right or wrong.

    All I said was that they have the right to make that decision.

    Take Norway for an example, then. They dismissed the technical committee, and made a non-technical decision.

    It wasn’t exactly democracy in action. In that case it seems the org decided that it was more important to bring the standard into ISO than for the standard to be debugged.

    It’s obviously wrong if you think the decision should be made on technical grounds alone.

  48. AlexH said,

    October 3, 2008 at 11:03 am

    @Luc:

    Sure, it says should not. But, it’s still in the standard, so it’s standardised.

    Having buggy behaviour standardised is important. You don’t want to copy it, but you do want to understand it so that when you do things like import spreadsheets, they continue to work and get the right results.

    Most of the ODF apps have implemented all this stuff already anyway, because if you’re not Excel compatible then you’re not usable.

  49. Roy Schestowitz said,

    October 3, 2008 at 11:02 am

    “I don’t know how many times I need to repeat this, but I didn’t make a judgement on whether or not it was right or wrong.”

    That’s just a convenient waiver for you, is it not? Like other techniques that include casting “ODF” as “IBM” or “it’s just as bad/evil as X”.

    I’m not buying it.

  50. Roy Schestowitz said,

    October 3, 2008 at 11:06 am

    “Most of the ODF apps have implemented all this stuff already anyway, because if you’re not Excel compatible then you’re not usable.”

    And again… it sounds like Redmond Kool-Aid. You’re behaving as though it’s better to mimic Microsoft.

  51. Luc Bollen said,

    October 3, 2008 at 11:09 am

    @AlexH

    It’s not standardised, it is documented. Having buggy behaviour DOCUMENTED is important.

  52. AlexH said,

    October 3, 2008 at 11:21 am

    @Roy:

    That’s just a convenient waiver for you, is it not? Like other technique that include casting “ODF” as “IBM” or “it’s just as bad/evil as X”.

    Yet again you make that accusation, yet again it’s absolutely indefensible.

    I’m not going to bother to explain the argument further, because you’re just going to accuse me of that nonsense yet again, and I can’t be bothered. Your style of straw man arguments is boring. Argue the points I make, not the ones I don’t make.

    @Luc: if it goes into a standard, it’s standardised unless it’s in a section marked informative.

    I’m not sure why there is so much back and forth on this; OpenDocument is clear on this issue. This behaviour is allowed and standardised, because it’s a real issue which affects spreadsheet users.

    As I said way up there ^^, there are much better reasons to be against OOXML than the bits which make dealing with legacy data possible.

  53. enquiring minds want to know said,

    October 3, 2008 at 10:09 pm

    and what the hell does this have to do with Novell?

  54. jcwarrio0866 said,

    October 3, 2008 at 11:16 pm

    @AlexH:

    “I’m not sure why there is so much back and forth on this; OpenDocument is clear on this issue. This behaviour is allowed and standardised, because it’s a real issue which affects spreadsheet users.”

    Actually, the behavior you mention is NOT allowed.

    Portable documents SHALL NOT include date calculations that require the incorrect assumption that 1900 was a leap year.

  55. A committee member said,

    October 4, 2008 at 1:26 am

    Hi, I’ve just done a quick comparison (with ‘diff’) of your files to the official files, and found that your OfficeOpenXML-WordprocessingMLArtBorders.zip file is corrupted. It has the right size and unzips OK, but after unzipping, one of the resulting files (balloonsHotAir_bottomRight.png) is empty. That is the only difference after unzipping.

    I’ve done a diff of hexdump outputs, which shows that a block of 65536 consecutive bytes has been zeroed.
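Locating a damaged region like this does not require hexdump output; comparing the two copies byte by byte is enough. A minimal Python sketch follows (the helper name is hypothetical). It reports the span from the first to the last differing byte, so if the original happened to contain zero bytes at the very edges of the zeroed block, the reported span can be slightly narrower than the block itself.

```python
from typing import Optional, Tuple

def diff_span(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return (first, end) offsets bounding every byte where two equal-length
    copies differ, or None if the copies are identical."""
    if len(a) != len(b):
        raise ValueError("copies differ in length; compare sizes first")
    # Scan forward for the first mismatch.
    first = next((i for i in range(len(a)) if a[i] != b[i]), None)
    if first is None:
        return None
    # Scan backward for the last mismatch.
    last = next(i for i in range(len(a) - 1, -1, -1) if a[i] != b[i])
    return first, last + 1
```

For a block of 65,536 zeroed bytes, the returned span would be at most 65,536 bytes wide and would start at or just after the beginning of the damaged region.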

  56. Roy Schestowitz said,

    October 4, 2008 at 2:41 am

    A committee member,

    I’ve just re-uploaded the file. It seems identical to what it was before, at least in terms of size. Since it comes from the source, it can’t have been tampered with.

  57. Pedro Gimeno said,

    October 4, 2008 at 5:32 am

    I can confirm the corruption. unzip -t reveals it. The zipfile with md5 a33bb0c7f11ef63293ee4dfb6dbb986c is corrupt. At offset CF000 starts the 65536-byte zeroed block. I don’t have the original, but by examination of the zipfile directory it seems that the incomplete/missing files are:

    balloonsHotAir_bottomRight.png
    balloonsHotAir_left.png
    balloonsHotAir_right.png
    balloonsHotAir_top.png
    balloonsHotAir_topLeft.png
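
    The diagnosis above (a run of zeroed bytes plus failed integrity checks) can be reproduced without `unzip`. A minimal Python sketch using only the standard library; it assumes the whole archive fits in memory, and the function names are illustrative:

```python
import re
import sys
import zipfile
import zlib
from io import BytesIO


def find_zero_runs(data, min_len=65536):
    """Return (offset, length) for every run of at least min_len consecutive zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def bad_members(data):
    """Read every member in full, as `unzip -t` does, and list those that fail.

    A member fails if its local header is damaged, its data will not
    decompress, or its CRC-32 does not match.
    """
    bad = []
    with zipfile.ZipFile(BytesIO(data)) as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    while member.read(1 << 20):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError):
                bad.append(info.filename)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        blob = f.read()
    for offset, length in find_zero_runs(blob):
        print(f"zeroed block at 0x{offset:X}, {length} bytes")
    print("damaged members:", bad_members(blob))
```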

  58. Roy Schestowitz said,

    October 4, 2008 at 6:04 am

    The original from the Project Editor is identical. I checked to confirm that there was no error in transmission (the nodes), so the original should suffer from the same error.

  59. Dan O'Brian said,

    October 4, 2008 at 7:15 am

    jcwarrio0866: I can tell this is your first time reading a spec. That doesn’t mean the format is not allowed, it’s a warning to implementers that using that format for new documents that desire to be portable should not use it.

  60. A committee member said,

    October 4, 2008 at 8:26 am

    You shouldn’t check for file corruption by checking file sizes only. Your OfficeOpenXML-WordprocessingMLArtBorders.zip is the same size as the non-corrupted one, which I got from my national standardization organization, where I’m a committee member.
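
    The point is easy to demonstrate: a block of zeroed bytes leaves the length unchanged but changes any cryptographic digest. A minimal Python sketch (the byte strings are made-up stand-ins, not the real files); an MD5 like the one quoted above is adequate for spotting accidental corruption, while SHA-256 is the better choice when deliberate tampering is the concern:

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Made-up stand-ins: same length, but the second has its tail zeroed.
original = b"PK\x03\x04" + b"\x5a" * 60
corrupted = b"PK\x03\x04" + b"\x00" * 60

print(len(original) == len(corrupted))  # True: a size check passes
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(corrupted).hexdigest())  # False
```

    To compare two downloads, compare `file_sha256(a) == file_sha256(b)` rather than their sizes.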

  61. Roy Schestowitz said,

    October 4, 2008 at 8:57 am

    “A committee member”,

    That’s a fair point that I agree with. Just to shed light on this, I have no doubt that the files have not been tampered with, because they were obtained directly from the source (twice, even). It is possible that the discrepancy you claim to be aware of occurred somewhere along a different route. I have no explanation for it, I’m afraid.

  62. A committee member said,

    October 4, 2008 at 10:54 am

    @Roy: Thanks a lot for clarifying who gave you the copies of the files. I was afraid that the file corruption might have been a watermark and that someone who leaked you the files might be getting in trouble now. However, I don’t think that the Document Editor is truly the original source for the zip file of graphics. I think he probably received that from Microsoft and he didn’t have any instructions from the BRM to modify it in any way. Maybe someone at the ISO/IEC “ITTF” (which has the task of checking standards for formal correctness) noticed that the file is corrupted and managed to get a corrected file from Microsoft. This might help explain the long delay between the completion of the editing work and the ISO/IEC internal distribution of the document.

    @Pedro: Your list of filenames is exactly correct. My earlier assertion about only one file being affected was wrong.

  63. Michael J said,

    October 4, 2008 at 11:48 am

    Re the argument between Alex and Luc about standards:

    When composing a spec, most writers use the wording from RFC 2119. The word “Shall” is used to indicate a requirement while “Should” indicates a recommendation.

    So the ODF standard quoted probably means to recommend against an application implementing the Excel bug, but not to forbid it. (If the ODF spec’s authors are using RFC 2119[1], they will certainly say so in the spec).

    The quote from the standard *does* say that “Portable documents” “shall not” require the Excel bug, so I would guess that you could say that the ODF spec (as quoted) *permits* applications to maintain the Excel bug, so long as they don’t describe the files as “portable”.

    The OXML[2] spec, however, seems to *require* that apps maintain the Excel bug. That is somewhat different from permitting it.

    So I suggest that the ODF committee’s actions do not act as any justification for the ISO’s in this case.

    But what would I know? I’m just a humble[3] programmer.

    [1] http://www.ietf.org/rfc/rfc2119.txt
    [2] They stopped calling it “OOXML” some time ago.
    [3] http://en.wikipedia.org/wiki/Uriah_Heep_(David_Copperfield)

  64. Jose_X said,

    October 4, 2008 at 2:28 pm

    AlexH, I haven’t used a spreadsheet in a while. After thinking about it for a little bit, I can’t figure something out. If dates don’t have types, then how can they be rendered as dates automatically? It seems you are saying that any particular opening of a spreadsheet will always have the dates off or the numbers off or any combination of these off. By off I mean rendered as numbers when intended as dates or vice-versa.

    I know you can go back and forth between dates and number rendering, but something has to cue in that this is now meant as a date or else there is no reason to have it be rendered as a date upon opening a sheet (conversely, a similar argument could apply for numbers if dates are preferred). I really don’t think when people pass spreadsheets around that numbers and dates are randomly flipped arbitrarily.

    Perhaps there is a type and Microsoft does not want to reveal how it is stored. Maybe there is a type and OO.o gets it right.

    BTW, if dates are typed, then as mentioned above, they can be converted and it would make no sense to keep the broken legacy leap year rules in the format.

    People, as for what ODF says, ODF is not perfect. It does seem from what has been quoted here that 1.2 will allow for the backwards mistake.

    Beware of Microsoft within OASIS. They gain if they can get bad decisions to be standardized because then OOXML cannot be singled out as broken. Expect that and more from them because they really hurt if OOXML is not adopted and found legit by a significant number of users. If the backwards thing doesn’t have a good reason for staying (this would be true IMO if dates are typed), then I would suggest that a bad leap year interpretation not be allowed in the std period.

    We can petition to the TC list. Is this issue something that is worth harassing them over?

  65. Roy Schestowitz said,

    October 4, 2008 at 2:48 pm

    Jose, I received the following message yesterday:


    To all Participants:

    The 90-day period for this discussion list has now ended. A charter has been submitted and can be seen at http://lists.oasis-open.org/archives/tc-announce/200808/msg00009.html. Your participation has been greatly appreciated; we at OASIS hope that all individuals interested in furthering this work will join the technical committee.

    Regards,

    Mary

    ___________________________________________________________

    Mary P McRae

    Director, Technical Committee Administration

    OASIS: Advancing open standards for the information society

    email: mary.mcrae@oasis-open.org

    web: http://www.oasis-open.org

    phone: 1.603.232.9090

    Join us at the OASIS Forum on Security

    30 Sept – 3 Oct, near London

    http://events.oasis-open.org/home/forum/2008


    I’m the only person left in the #OIIC IRC channel (except the channel guard, which is a bot).

    What bothers me is that nobody has really responded to this yet:

    http://www.heise-online.co.uk/open/Is-Microsoft-trying-to-take-control-of-ODF–/news/111649

    We can probably wait patiently to see how ODFers react, but failure to respond would seem fishy.

  66. AlexH said,

    October 4, 2008 at 5:02 pm

    @Jose: they’re not typed. The formatting of numbers (e.g., as dates, currency, etc.) is separate stylistic information.

    You can’t use stylistic information as a cue because a. not everything used in the calculation may be so styled, and b. the calculation may use relative dates.

    @Roy: the OIIC discussion forum was limited to 90 days from the start. It was never, ever going to be an ongoing forum. I’m happy to answer your questions on why that is if you have any.

  67. jcwarrior0866 said,

    October 4, 2008 at 6:38 pm

    @Dan O’Brian:

    Hello Dan. I think you’ve rushed to the conclusion that this is the first time I read a spec. I do not think it’s important to clarify this in particular because this conversation is not about me.

    Let me quote what you mentioned earlier:

    “That doesn’t mean the format is not allowed, it’s a warning to implementers that using that format for new documents that desire to be portable should not use it.”

    Well Dan, I disagree. In no way are the SHALL and SHALL NOT verbal forms recommendations or warnings. They indicate *requirements* instead. Take a look at the OpenFormula spec:

    Within this specification, the key words “shall” and “shall not” (for requirements), “should” and “should not” (for recommendations), “may” and “need not” (for permissions), and “can” and “cannot” (for statements of possibility or capability) are to be interpreted as described in Annex H of [ISO/IEC Directives] (part 2).

    I can also quote here what Annex H of [ISO/IEC Directives] (part 2) says about these verbal forms:

    Verbal form: shall
    Equivalent expressions for use in exceptional cases (see 6.6.1.3): is to, is required to, it is required that, has to, only … is permitted, it is necessary.

    Verbal form: shall not
    Equivalent expressions for use in exceptional cases (see 6.6.1.3): is not allowed [permitted] [acceptable] [permissible], is required to be not, is required that … be not, is not to be.

    Annex H of [ISO/IEC Directives] (part 2) also gives the meaning of SHOULD and SHOULD NOT, but I am not going to put them in my comment.
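
    The verbal forms listed in the OpenFormula excerpt can be applied mechanically. A minimal Python sketch that tags each normative keyword in a sentence with its Annex H category (only the primary forms named in that excerpt are included, not the “equivalent expressions”). Note that it says nothing about *whom* a provision binds, which is what the disagreement over “portable documents” turns on:

```python
import re

# Primary verbal forms from ISO/IEC Directives Part 2, Annex H, as listed in the
# OpenFormula excerpt. Two-word forms come first so "shall not" is not also
# counted as "shall".
VERBAL_FORMS = [
    ("shall not", "requirement"),
    ("shall", "requirement"),
    ("should not", "recommendation"),
    ("should", "recommendation"),
    ("need not", "permission"),
    ("may", "permission"),
    ("cannot", "possibility or capability"),
    ("can", "possibility or capability"),
]


def classify(sentence):
    """Return (verbal form, category) pairs found in a sentence, in table order."""
    text = sentence.lower()
    found = []
    for form, category in VERBAL_FORMS:
        pattern = r"\b" + form.replace(" ", r"\s+") + r"\b"
        if re.search(pattern, text):
            found.append((form, category))
            text = re.sub(pattern, " ", text)  # consume it so shorter forms don't re-match
    return found


print(classify("Portable documents SHALL NOT include date calculations "
               "that require the incorrect assumption that 1900 was a leap year."))
# [('shall not', 'requirement')]
```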

    Best regards.

  68. Johan Krüger-Haglert said,

    October 4, 2008 at 6:49 pm

    Just put it on TPB if it’s not already there. Problem solved.

  69. John Hardin said,

    October 4, 2008 at 6:59 pm

    This sort of thing is what the CORAL distributed cache was created for.

    http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080.pdf
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-WordprocessingMLArtBorders.zip
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-SpreadsheetMLStyles.zip
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-DrawingMLGeometries.zip
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-RELAXNG-Strict.zip
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-XMLSchema-Strict.zip
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/1081c/1081c.htm
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/1082c/1082c.htm
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/1083c/1083c.htm
    http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080-html/

    I don’t understand why people don’t post CORAL links when they *know* they’re going to get slashdotted out of existence…
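
    The Coral links above follow a simple pattern: append `.nyud.net:8080` to the host name and keep the path. A minimal Python sketch of that rewrite, assuming the originals were plain `http://boycottnovell.com/...` URLs; the `coralize` name is illustrative, and the Coral service itself may no longer be running:

```python
from urllib.parse import urlsplit, urlunsplit


def coralize(url, port=8080):
    """Rewrite http://host/path as http://host.nyud.net:<port>/path, the form used above."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("this sketch only handles absolute http:// URLs")
    netloc = f"{parts.hostname}.nyud.net:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(coralize("http://boycottnovell.com/forms/ooxml/1080.pdf"))
# http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080.pdf
```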

  70. Dan O'Brian said,

    October 4, 2008 at 7:01 pm

    You might want to re-read the entire snippet you posted before, because while I agree with the definition of “shall not” that you posted, it doesn’t change the fact that I am correct.

    Portable documents SHALL NOT include date calculations that require the incorrect assumption that 1900 was a leap year.

    Note that it says “Portable”. That says nothing of preexisting/imported documents.

  71. Roy Schestowitz said,

    October 4, 2008 at 7:06 pm

    How hard you try to defend Microsoft bugs, Dan O’Brian. I already know you from many prior comments in this Web site, but it’s good that you show others in this thread who you are.

  72. PaulS said,

    October 4, 2008 at 7:43 pm

    AlexH said:

    “It has always been the same with ISO, and it will continue to be the same with ISO, because that is what ISO’s members and funders want. People who think ISO is irrelevant simply don’t understand what it does; it has always been this ugly. ”

    As someone who has been involved in several standards committees (including some involvement with ISO), I can say that, while members do work to support the interests of the companies they represent, the level of shenanigans in SC-34 is orders of magnitude beyond anything I’ve ever seen or heard of.

  73. Dan O'Brian said,

    October 4, 2008 at 7:45 pm

    Roy: Now you’re trying to play the same game with me as you are with AlexH. I have not defended Microsoft here, I am just telling it like it is.

  74. Roy Schestowitz said,

    October 4, 2008 at 7:47 pm

    I think not. You spin to shelter bias.

  75. standardize this said,

    October 4, 2008 at 7:57 pm

    Some may claim that a metric standard which says imperial units SHOULD NOT be mixed with metric somehow standardizes imperial units as part of that metric system. Those engaging in bigotry of this nature MUST be playing a dishonest and disingenuous semantic game.

  76. Marius said,

    October 4, 2008 at 7:57 pm

    Hi,

    I’ve exported the PDF document in a series of PNG images and created an index for them, so users can access it just like the HTML version you have.

    There are several advantages to this:

    1. the page layout is preserved and the information is easier to follow
    2. readers only download 3-15KB (one image) at a time, not the full 40-60MB
    3. you don’t waste so much bandwidth with that very large document

    The only downside is that readers can’t use copy and paste to extract information but they might as well download the full PDF file then.

    If you wish, the blog readers can use the following link to view the document:

    http://www.definethis.org/temp/ooxml/

    or you can download a copy (http://www.definethis.org/temp/ooxml/1080.rar – ~160MB) and extract it on your server.

    I’ll leave it on my server for a few weeks but I won’t be able to keep it there forever.

  77. Roy Schestowitz said,

    October 4, 2008 at 8:02 pm

    Thanks a lot for this. Someone produced a similar thing several hours ago and I haven’t gotten around to uploading it. I’ll update the post.

  78. Jose_X said,

    October 4, 2008 at 8:50 pm

    >> You can’t use stylistic information as a cue because a. not everything used in the calculation may be so styled, and b. the calculation may use relative dates.

    Thanks AlexH. I’m not trying to make your life difficult, but I still don’t quite follow what you meant by (a) or (b). Could you provide a rough example? It need not be legal syntax but enough to convey the idea.

    Here is the wall before me. Something has to cue in the renderer that we have a number that is a date and not something else. Why isn’t this good enough as the type information?

    If the formatting cue and overall context wasn’t good enough, how would the renderer know to format that specific number specifically as a date without messing up anything else? So we have this specific number precisely being identified as a date. If that information isn’t a type definition, what is?

    It’s not clear to me that I am covering everything, being precise, or even making sense. If I had more experience here, I would be better able to judge. Still, I don’t see it. I may have to dig into the specs to get to the bottom of this (or read something online that is clear and save myself the effort).

  79. Dan O'Brian said,

    October 4, 2008 at 9:58 pm

    Jose_X: It might be best to ask the developers of OpenOffice or Gnumeric, for example.

    I’m not sure if AlexH can provide an example or not, but I would guess that the OOo and Gnumeric developers could.

    If I wanted to know the information you are after, those are the people I’d be asking.

  80. Jose_X said,

    October 4, 2008 at 10:16 pm

    Dan O’Brian,
    If AlexH wants to clarify, s/he can. [It's a he, right?]

    Short of sitting down with ODF or OOXML (no) and putting all the pieces on the table to look at them carefully, I would probably get the fastest insight by directly asking those guys you mentioned.

    Anyway, AlexH had mentioned that typing wasn’t involved. That would explain why you’d want to keep legacy, but I don’t then understand how the proper thing could be rendered from a common old (untyped) number. If typing info is available, then it would make no sense to keep the error in a standard format. That would make the format (even the quasi exception being suggested for ODF) problematic and distasteful without reason.

    I have not looked at this too carefully, or I would say so. That’s why I think a few examples with specifics might quickly clarify things for me. Also, I got interested in the conversation but otherwise am not that motivated right now to follow up on this.

  81. AlexH said,

    October 5, 2008 at 5:03 am

    @Jose:

    I’ll try to explain as best I can. One thing you might want to do is look at the OpenFormula spec, which for the first time does actually include typed information.

    You’re right in that the formatting cue will enable you to see information which is being treated as a date. The problem is that not all that information will be formatted like that.

    But there is no such thing as a ‘date’ in legacy spreadsheets: all you have is numbers which are being treated as an offset from an epoch. Some of those offsets will be “dates”, and some will just be offsets: e.g., what is the number 5? Does it refer to 5th January 1900 (or 4th)? Or are we using that to say “5 days from now”?

    Even worse, many spreadsheet users will calculate things based on references to other spreadsheets – e.g., having a master sales sheet, and then various report sheets. In that instance, you can’t even see the other data unless you’re in the “top” spreadsheet. If you rely on the stored values in the sheet you opened, you have again no idea what those values actually represented on the other sheet.
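
    The “what is the number 5?” question can be made concrete. A minimal Python sketch showing three equally plausible readings of the same stored value; the 30 December 1899 epoch in the second reading is an assumption about how applications that skip the phantom leap day count days, matching the “(or 4th)” alternative above:

```python
from datetime import date, timedelta

value = 5  # the raw number stored in a cell: no type information is attached

# Reading 1: a date in the 1900 system, where serial 1 is 1900-01-01.
as_1900_system_date = date(1899, 12, 31) + timedelta(days=value)

# Reading 2: a date counted from a 1899-12-30 epoch (no phantom 29 February 1900).
as_1899_epoch_date = date(1899, 12, 30) + timedelta(days=value)

# Reading 3: not a date at all, but an offset such as "5 days from now".
as_offset = timedelta(days=value)

print(as_1900_system_date)  # 1900-01-05
print(as_1899_epoch_date)   # 1900-01-04
print(as_offset)            # 5 days, 0:00:00
```

    Nothing in the cell itself says which reading was intended.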

  82. Yfrwlf said,

    October 5, 2008 at 11:20 am

    Through all the ranting I don’t know if I have an accurate picture of the problem that is being argued about, but from what I can gather:

    A program like OOo can interpret an ODF document in one of two ways, it can either read the document via “the buggy way” or “the non-buggy way”, but it can only do one. If the ODF format allows for either way to be used, then the readers like OOo and others could read the document correctly, or incorrectly, and it is technically impossible for these programs to always read the document correctly, all because the document standard hasn’t specified which method it prefers?

    If that’s correct, then of course ODF is a broken format in that regard, however that depends on how broken it is in “the wild”, and you’d think that there would be something you could do to correct it, some way of fixing any older documents simply by having a converter which upconverts them to a newer standard format which does away with the bug entirely without breaking anything for anyone. Formats should tie up any loose ends, whatever it takes in order to allow readers to always read the format correctly. I thought this was the problem with OOXML, as it included certain things which would allow an OOXML document to be interpreted in two different ways, and in order to do it the correct way it required the use of proprietary software that wasn’t available for all platforms/users/etc and was basically controlled. Obviously a controlled standard like that isn’t a true open standard, and obviously an “internally used” or “controlled” standard isn’t a standard.

    Any way, I hope all formats can be made better, but obviously the ISO should never accept proprietary or borked formats as being standards. It’s obvious to anyone who knows Microsoft well that this move was simply to E.E.E. the office document format to prevent competition, when the horrible (bad for them) truth is they are going to have to start competing (good for consumers) without pulling backstabbing unlawful business tactics.

  83. James said,

    October 5, 2008 at 11:27 am

    @AlexH: Why don’t you have a website or blog? I just want to see a bit more about you and your knowledge. PLEASE, show me who you are.

    Maybe you’re the AlexH from http://www.contoso.com?
    http://en.wikipedia.org/wiki/Contoso

    http://center.spoke.com/info/pDJMWq/AlexHankin

    Alex Hankin
    Contoso, Ltd.
    Senior Director
    New York, NY

    Skype: AlexH
    Home IM: Alex@hotmail.com
    Home Email: Alex@hotmail.com
    Work Email: alexhankin@contoso.com
    Work IM: alexhankin@contoso.com

    Telex: 781 234
    Home (208) 555-5656
    Mobile: (775) 551-2345
    Fax: (207) 555-9999
    Direct: (207) 555-1112
    Tel: (207) 555-1000

    Here:

    http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Apt-BR%3Aofficial&hs=VHr&q=microsoft+Alex+Hankin+alexh&btnG=Search

  84. AlexH said,

    October 5, 2008 at 11:27 am

    @Yfrwlf: this isn’t an ODF problem, this is a ‘spreadsheet data’ problem. The file format doesn’t matter, because the data we’re talking about is entered by users.

    Indeed, in ODF 1.1 the formula stuff isn’t specified at all: it was deemed out of scope for the standard.

    The issue is very much “data in the wild” though. If you open a file in an older format, or cut and paste from one, or link to one, or otherwise get the data from elsewhere, then it’s a problem.

  85. Yfrwlf said,

    October 5, 2008 at 12:00 pm

    You’re saying the storage of the data isn’t a problem, that the data is safe and can be read and written correctly to the ODF, but that it’s a problem with the readers like OOo? I don’t see how it’s not either one or the other, either the format has a problem, or the reader does.

    Regardless, all I know is that a properly done document format will be able to account for all data correctly, so that if a program implements the format how it’s supposed to be implemented, all data will be correct. If it’s impossible or difficult for the program to read or write certain kinds of data correctly due to a lack of specification by the format, it’s the format’s job to implement additional standards to allow for correct interpretation, or aid in that process to allow for greater format uptake in the various office programs which exist today.

  86. Fleep said,

    October 5, 2008 at 1:11 pm

    AlexH has a point, and his point doesn’t change the basis of the OOXML criticism. Why are you so upset about him not agreeing with your views?

  87. Jose_X said,

    October 5, 2008 at 2:20 pm

    You have a number out there. Call it a spreadsheet value if you want. Call it a number in some text file. Call it what you want.

    You have some code out there. The code uses a broken algorithm for turning that number into a date.

    Is this a reason to break ODF or any new format?

    No, it is not.

    Just keep the legacy documents as is (eg, keep as is the text file with the “5” on line 27 offset 12).

    If you change formats for that file (eg, to ODF 1.2), the old code is not likely to work anyway. If you change formats for that file, you’ll need new code anyway. Why make a broken format to then have to create new code that is also broken?

    AlexH, if you try to be specific maybe you will be able to convince people here because it just doesn’t make sense that the old mistakes “need” to be carried forward. If so, we’d still be using cavemen data formats and no new code would ever be written (eg, no converters or even new code to replace the old code).

    The main reason I can see to keep things as is is as yet another way to help out Microsoft’s vast investments in this brokenness. If things change, new players will be on a similar footing (wrt to date interpretation) as Microsoft.

    It makes sense to fix past mistakes. In a competitive marketplace, the old garbage instituted by a particular vendor would not carry forward.

  88. Jose_X said,

    October 5, 2008 at 2:21 pm

    Fleep, I don’t see AlexH’s “point”. Could you give your interpretation so that maybe it will make sense to me and to some others?

  89. Jose_X said,

    October 5, 2008 at 2:28 pm

    >> The main reason I can see to keep things as is is as yet another way to help out Microsoft’s vast investments in this brokenness. If things change, new players will be on a similar footing (wrt to date interpretation) as Microsoft.

    Another reason to keep the brokenness would be to allow (eg) Novell to maintain their special advantages if Novell also has a bunch of investments in re-implementations of this brokenness or knows that this brokenness will somehow give them an advantage (eg, if Microsoft stays on top, Novell’s existing income stream might be more likely to stay intact).

  90. Roy Schestowitz said,

    October 5, 2008 at 2:38 pm

    Novell receives access to Microsoft source code, so brokenness is not much of an issue to them. They can just copy (mimic) rather than reverse-engineer quirks, bugs, and changes.

    All the ‘weird’ stuff in OOXML serves Microsoft. The more bizarre the format, the less manageable it is for competitors.

    This conversation got latched onto one particular flaw among much more serious ones, which is a shame. Shouldn’t we discuss what Microsoft put in separate ‘baskets’ and all those Windows-only ‘features’ and ‘loopholes’ of OOXML?

  91. AlexH said,

    October 5, 2008 at 3:10 pm

    @Jose:

    When you’re converting a file format, you have to re-use the existing data, yes?

    What I’m trying to get across to you is that there is no way to tell whether a given number in the old data needs to be changed, because there isn’t enough information to be able to do that. The “fix” is basically to decrement a number by one; but you have no idea which numbers need to be changed.

    Adjusting user data on import is an extremely dodgy practice in general; you have to be absolutely 100% sure you’re getting it right.
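
    The “decrement by one” problem can be shown with two hypothetical cells holding the same number, one meant as a date and one as a quantity. In this minimal Python sketch the adjustment runs upward, because it converts a 1900-system serial below 60 to a 1899-12-30 epoch; the direction depends on which epoch you convert to, but either way a blanket rule cannot tell the two cells apart:

```python
from datetime import date, timedelta

EPOCH_1900_SYSTEM = date(1899, 12, 31)  # serial 1 == 1900-01-01 (valid below serial 60)
EPOCH_1899 = date(1899, 12, 30)         # epoch without the phantom 29 February 1900

# Hypothetical legacy cells: both hold 45 and the file does not say which is a date.
cells = {
    "A1": 45,  # intended as a date: 14 February 1900 in the 1900 system
    "A2": 45,  # intended as a quantity: 45 units sold
}


def naive_import(values):
    """Blanket fix: move every serial below 60 to the 1899-12-30 epoch by adding one."""
    return {ref: v + 1 if v < 60 else v for ref, v in values.items()}


fixed = naive_import(cells)
print(EPOCH_1900_SYSTEM + timedelta(days=cells["A1"]))  # 1900-02-14 (original meaning)
print(EPOCH_1899 + timedelta(days=fixed["A1"]))         # 1900-02-14 (date preserved)
print(fixed["A2"])                                      # 46 (quantity silently changed)
```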

  92. AlexH said,

    October 5, 2008 at 3:15 pm

    @Roy: even if that were true, which it’s not, they publish it as free software so anyone else can look at / copy the functionality.

  93. Roy Schestowitz said,

    October 5, 2008 at 3:31 pm

    It’s true, and there is no reason whatsoever for people to replicate platform-specific behaviour that’s otherwise irrelevant to document data storage.

  94. Jose_X said,

    October 5, 2008 at 3:36 pm

    >> This conversation got latched onto one particular flaw among much more serious ones, which is a shame.

    Let’s mention some more things of interest that are demonstrated well through this simple date example.

    I think this example helps demonstrate that there are many types of data that are interrelated. Eg, the date numerical representation ..is related to.. the type attributes identifying that number as a date convertible using algorithm X ..is related to ….

    Microsoft’s extensive closed source (still ongoing) history and investments mean that the pertinent data for proper interpretation of any other data is spread across the entirety of their product line.

    A format brought up by people working in the open is likely to be much better than something that got cooked up based on this closed stew. When diverse groups openly try to agree on stds, they are led to formats that work well among diverse groups. One such item is that related data should be accounted for somewhere centrally.

    No doubt Microsoft keeps tabs on their data centrally, but they don’t reveal this within the OOXML format they make public. OOXML is a piece to a complicated puzzle. This piece is missing key info for interworking with the rest of Microsoft’s software. The crucial bits of data are scattered all over the place and they are only opening some portions. Of course, they can open up whatever they want and then create new bits that they keep close.

    Don’t expect change from them as long as they have closed source and interlocking monopolies — lack of checks and balances: no real penalty for changing; HUGE existing investments: in a Gordian Knot body of source code, in the Microsoft Way Mindsets, in existing contracts made valuable by their unique position; HUGE business reasons for preserving the existing frameworks and methods: so that powerful business levers don’t disappear, so that they can be (very) cash positive and subsidize businesses they need/want to control but in which they currently aren’t competitive…. The lists go on and on.

    Microsoft can’t afford to be broken up in a way where important bits of the code end up in different companies. That would not only initially lead to chaos, but long term they lose their advantages if they can’t keep closed source the secret info about many product interactions (the source code itself implies some of this secret info) interspersed across these product lines. If you have different companies, who would hold the central knowledge and who would ensure this would stay in sync with the evolving products of the now distinct companies?

    Because of this, the likely result leading up to the breakup would be a reshuffling internally so that one company would get the real goods. This would allow that one company to eventually take over where Microsoft currently sits. UNLESS you prohibited these new companies from building products to service both sides of the interfaces. The problem here is: what constitutes an interface?

    I think that the idea of having an evolving closed source OS API makes no sense from a fair competition point of view. In fact, closed source and competition are incompatible items. Closed source implies monopolies. The OS is simply the most important software component on a device. And software has traditionally been the far more powerful way to implement rapid changes that break interop, assuming interop existed just before the change.

    The only advice I can currently offer generally to users is to avoid closed source.

    And developers that want to produce competitive code should also stick to open source environments and libraries (the assumption is that money would be made other than through the powerful lock-in exclusivity of closed source).

  95. Dan O'Brian said,

    October 5, 2008 at 3:43 pm

    FWIW, the buggy date interpretation code is not a bug in Microsoft’s Excel code, it was deliberately implemented to work around a broken Lotus 1-2-3 bug because Microsoft’s Excel needed to be able to import Lotus 1-2-3 spreadsheets w/o breaking formulas in pre-existing spreadsheets.

    ODF needs the same workaround for the same reason.

  96. Jose_X said,

    October 5, 2008 at 3:55 pm

    AlexH, I think you missed what I was talking about. You keep the old data in the old formats if you want. The new data can be saved to the new formats. The new formats require new code for translating no matter what. It’s a new format! This means that the new code *does* know to use the correct algorithms while the old programs continue to work with the same old assumptions. If you want to interchange the old with the new data, new converters know the old and the new rules.

    Ie, the info needed to know which alg to use is available. Data in old formats use the broken algs and data in new formats use the good alg. And you can use software converters to convert from one format to the other (statically once and for all or dynamically as the various formats are encountered).

    Again, you are not giving examples where this could not be done or would be foolish to try it. Your vague argument generalizes to “we should keep the formats we had back in 1940 so that we don’t have problems moving forward.”

    Yeah, maybe we should have kept the year 2000 bug as well.

    Yfrwlf (as I read the replies) was assuming pretty much this and then adding that the point is to make *specific* which alg to use in the new formats.

    Using the old rules (as OOXML does) is foolish. Leaving it up in the air (ODF 1.2 might do this in part) will just lead to excusable incompatibilities.

    Microsoft needs formats that are underspecified (plus broken in as many ways as possible) in order to allow monopoly backed lock-in secrets to exist in an excusable manner.

    “Hey, the std was not specified precisely so we picked….”

    The excuses are probably primarily meant to keep them safe in court actions from the government.

  97. AlexH said,

    October 5, 2008 at 4:01 pm

    @Jose: I didn’t miss it, I keep trying to explain to you the situation.

    It’s not that the application can’t use the right algorithm. It’s that when you open the old data, you cannot adjust it so that it is “correct”.

    So the “just use software converters” argument simply doesn’t hold: you cannot do it. The spreadsheet doesn’t hold enough information to know which data needs to be corrected and which data doesn’t.

    You can save the data in the new format, with the new algorithms, but it doesn’t help. Unless you can correct the data, it’s wrong. And you can’t correct the data because you don’t know which data is dates, which is date offsets, and which is just numbers.

  98. AlexH said,

    October 5, 2008 at 4:04 pm

    @Jose:

    Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it’s set to 1931-2030 or something, so you know that ‘98’ == ‘1998’, ‘31’ == ‘1931’, etc.

    It’s exactly like that. Unless you know what the “range” is to begin with, you cannot hope to convert the data accurately, because you’re missing enough information.
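    The two-digit-year setting AlexH mentions can be sketched as a 100-year window. This is an illustration, not OpenOffice.org's code; the window start of 1931 is taken from the "1931-2030 or something" above and may not match the application's actual default:

```python
def expand_two_digit_year(yy: int, window_start: int = 1931) -> int:
    """Map a two-digit year into the window [window_start, window_start + 99]."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    century = window_start - window_start % 100  # e.g. 1900 for 1931
    year = century + yy
    if year < window_start:
        year += 100  # wrap into the following century
    return year


print(expand_two_digit_year(98))  # 1998
print(expand_two_digit_year(31))  # 1931
print(expand_two_digit_year(30))  # 2030
```

    The same digits give a different year under a different window (with `window_start=1930`, `30` becomes 1930), so a converter that does not know which window was in effect cannot recover the intended year.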

  99. Roy Schestowitz said,

    October 5, 2008 at 4:33 pm

    Since you can determine what the date should be when handling old files, you can save it properly too. Preserving bugs for compatibility with legacy (and proprietary) formats is the road to another sort of mess, a bigger one. Such a hack is just baggage for debugging and maintenance.

  100. Jose_X said,

    October 5, 2008 at 5:00 pm

    [AlexH, I'll address your very last comment with the '31' example after this reply. I just noticed your example prior to posting, but I don't think it changes the essence of this post. In any case, I'll get to your last comment in a sec.]

    AlexH, we can’t automate something across the board 100.00% when that information is not known in one place 100.00% of the time. The place to address the date issue is where it is known that something is a date. Where this info is not captured in the same file as the data, the conversion can be done with the apps or library calls that interpret the particular numbers as dates or by users doing the work manually when they identify something should be a date but it is not (with the help of existing handy filters ready to come to the rescue).

    In any of these cases, changing to a *new* format with *new* semantics means that something of this nature *has to be done* anyway.

    [In the case of Excel date-formatted items, the process can be automated because that date formatting info is included in the same file as the date integer. However, there are spreadsheets generated in ways where that data need not be maintained. It's these cases we are talking about.]

    If you don’t want to deal with any manual process in a particular case and are willing to keep all the bugs of the past, then you use a legacy app or app mode that works as legacy. But then you can’t convert to the new format that has new semantics from an app that doesn’t have access to the type info. You can’t convert to OOXML, ODF, or anything else that might have new semantics.. unless you want to do some hand tweaking. To convert to a new format with new semantics that were previously kept in an ad hoc way, you have to add a bit of manualness to the process.

    AlexH, can you give any example at all where this would not be a manageable situation? Remember that you can keep the old data in the old formats read by the old applications.

    The ODF should use the proper date semantics. If something in some old file somewhere is not known to be a date based on info inherent in that file, then you can’t save it in ODF anyway where that unknown date maps to a date. You would just get a number. This has nothing to do with the new format. It has to do with the app making the conversion not having access to the missing info. This would not be the case for Excel files where the dates were formatted to look like dates, but it might be the case where a text file is interpreted oddly by some app X. In which case, the conversion should be handled by an upgrade to that app X.

    … [thinking I better repeat some of this again]…

    In other words (darn this is tiring), you can’t take a number that is not known to be a date and make it a date automatically without error. This has nothing to do with the new format. It has everything to do with requiring access to the missing semantic info. If you have access, you can do it. If you don’t have access, you can’t do it no matter what OOXML or ODF says. It would just be a number.

    And once again: in the case of Excel files with numbers formatted as dates, that info IS AVAILABLE* so there is no problem for this common case. [*: it's available subject to the proper reverse engineering of the old MS binary formats.. of course the EU could force Microsoft to reveal this info so if they don't they would be in violation.. in any case, there is no excuse.]

    AlexH, you have not shown a single example, and you are mixing issues. Some might even say you are using FUD to give the impression that the task is unmanageable. If it’s unmanageable, you should be able to give many examples instead of 0 examples. Please give examples, AlexH. [Ed- note comment at top.]

    Dan O’Brian, that Microsoft kept the Lotus bugs doesn’t justify that as a good decision. Let’s give reasons other than to say that X person did Y so therefore Y must be good.

    [I am not trying to be verbally abusive, but I don't like to see anyone defending Microsoft or their ways without reasons that pass muster. If you want to defend Microsoft, come with real reasons or expect to anger a lot of people.]

  101. Jose_X said,

    October 5, 2008 at 5:17 pm

    >> Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it’s set to 1931-2030 or something, so you know that ‘98’ == ‘1998’, ‘31’ == ‘1931’, etc.

    AlexH, this is the sort of thing I mentioned in the last comment. If you know the info (ie, that X is to be interpreted as Y), then you know the info. If you don’t, then you don’t.

    If you do, then you can perform the conversion. If you don’t, then you can’t.

    In any case, if you have a new format with a previously non-existent semantics/type named “date”, then you can’t have things magically appear as dates, no matter the specific semantics/algorithm, unless the converter has info of what were dates in the old formats. If you do, then you can make the conversion. If you don’t for whatever reason, then the mapping will be to a common old integer just as before and not to the new date type.

    In the case you gave, the conversion can be done if that semantic bit can be deduced from the data in the file. Otherwise, the conversion should be done by whatever entity knows that we are dealing with a date. Otherwise, it can’t be done no matter how OOXML or ODF define the new date data type.

    To repeat from earlier comments, if you have an Excel number formatted as a date, then that info can be deduced from the binary files and that formatting knowledge could be designated to map into the new “date” tags of the new format. Here, you knew that the old number is a date that would need to be adjusted to match the semantics of the new tags.

    The exact semantics aren’t the stumbling block so long as they are well-defined (as Yfrwlf mentioned). What is important for designing the new semantics is that the semantics be well-defined and as “sane” as possible. What is important to acquire the ability for old data to be used with the new tags is that the semantics for the old data be fully known. These two issues are distinct. Microsoft themselves cannot make the right number into a date for OOXML unless they already know they have a date. If they don’t, they must keep it as an integer. If they do, then they can convert to the proper definition no matter what the definition is: using the correct alg or using the broken alg.

  102. Jose_X said,

    October 5, 2008 at 5:27 pm

    AlexH, consider finishing off the example you gave of “31” with more details to give examples of a problem that would be solved if we make ODF have the old semantics but would not be solved if ODF were instead to use the fixed algorithm.

  103. Dan O'Brian said,

    October 5, 2008 at 5:44 pm

    From my understanding, both OpenOffice and Microsoft Office interpret the date at display time depending on the user’s configuration settings.

    I don’t have a copy of Microsoft Office handy, so I can’t check their configuration UI, but in OpenOffice you can find the config setting under Tools / Options / OpenOffice.org Calc / Calculate / Date

    Depending on which of the 3 radio buttons you select (12/30/1899 [default], 01/01/1900 (StarCalc 1.0), or 01/01/1904) the spreadsheet interprets the data in a different way.
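    A minimal sketch of what those three settings do to a stored number, assuming each radio button names the date that serial 0 maps to (the dictionary and function names are hypothetical):

```python
from datetime import date, timedelta

# Date assigned to serial 0 under each Calc setting Dan lists (assumed).
EPOCHS = {
    "12/30/1899 (default)": date(1899, 12, 30),
    "01/01/1900 (StarCalc 1.0)": date(1900, 1, 1),
    "01/01/1904": date(1904, 1, 1),
}


def serial_to_date(serial: int, epoch: date) -> date:
    """Interpret a stored day count relative to the chosen epoch."""
    return epoch + timedelta(days=serial)


for name, epoch in EPOCHS.items():
    print(f"{name}: {serial_to_date(39725, epoch)}")
```

    The same serial 39725 comes out as 2008-10-04, 2008-10-06 or 2012-10-05 depending on the setting, and nothing in the number itself says which one was meant.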

    Now, if someone using Calc (using the default setting) imports a spreadsheet with a date that was created in, say, StarCalc 1.0, and then saved to ODF, where the saved ODF document forced the interpretation (as Jose is suggesting can be done) to be the 12/30/1899 epoch, then the data in the spreadsheet could very well be wrong, but the user might not notice it right away.

    I think that’s the problem that AlexH is trying to explain.

  104. Jose_X said,

    October 5, 2008 at 5:48 pm

    Alex, if before, all the information we knew about a particular integer was that it was an integer type, then we map it to the integer type in ODF.

    However, if the converter can deduce more type info, then we might be able to map to a date tag or to some other tag.

    And in these last cases, where we know enough to identify a (eg) date, if we can map to a tag with the broken date semantics, then we can map to the fixed semantics since this entails adjusting the values in a well-defined way. Ie, if we know to map to “date with broken algorithm” then we can map to “date with fixed algorithm” since there is a well-defined mapping to this fixed algorithm.

    However, in other cases, the mappings may not be so nice. In general, we need to create good formats and fix mistakes of the past. If, as customers, we put our data into proprietary closed formats such as what Microsoft offers their customers, then we make a decision that may not be fixable short of knocking down Gates’ door demanding relief.. or knocking down his Window if you want longer lasting relief.

  105. Jose_X said,

    October 5, 2008 at 5:57 pm

    Dan O’Brian,

    No. No. No.

    When we save, we know how to adjust the number so that it maps properly to the canonical form implied by the corrected definition. The saving process knows the user's config info and so can adjust into a canonical form. Then everyone else that reads this does the necessary translations to match their settings.

    If we can save into X-1 then we can similarly save into X by adding +1 at the time of save. The semantics of the ODF file date tag would then let everyone know that we have X and not X-1.

    ODF is tagged. The tags carry semantic information just like binary Excel files do (but in a closed proprietary way).

  106. Jose_X said,

    October 5, 2008 at 6:01 pm

    The bottom line is that if we *don’t know* something is a date, but only know that it is an integer, we map to integer, whether we are mapping to OOXML or ODF or to anything else that has an integer type. If we *do know* something is a date (with the implied broken semantics), then we can map to a date in ODF such that the necessary adjustment is made.

  107. Jose_X said,

    October 5, 2008 at 6:06 pm

    So in the case with meaning 1 or meaning 2 or meaning 3 of a date or whatever other context is necessary for proper interpretation:

    If we know this, we map to ODF correct date tag, adjusting as necessary.

    If we don’t know this extra date context, then we play it safe and keep the integer as an integer.

  108. Dan O'Brian said,

    October 5, 2008 at 6:06 pm

    Jose_X: the problem is, as AlexH has already pointed out, that spreadsheets have historically saved untyped data.

    Dates can be saved as “1/31” (interpreted as January 31 of the current year), “1/50” (January 1st, 1950), or “39725” (the number of days since the configured epoch) and possibly other formats.

    The question is, which epoch is 39725 counting from? And how do we know it’s a date without more context?
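    To make the ambiguity concrete, here is a rough Python sketch of how an importer might classify the three forms Dan lists. The rule "a second part above 31 cannot be a day, so read it as a two-digit year" and the 1930 window start are assumptions for illustration, not any particular spreadsheet's behaviour:

```python
import re
from datetime import date
from typing import Union


def interpret_entry(text: str, current_year: int,
                    window_start: int = 1930) -> Union[date, float]:
    """Guess whether an untyped entry is a date; otherwise parse a number.

    Impossible dates such as "2/30" and non-numeric text raise ValueError.
    """
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})", text.strip())
    if m:
        month, second = int(m.group(1)), int(m.group(2))
        if second <= 31:
            # month/day in the current year, e.g. "1/31"
            return date(current_year, month, second)
        # cannot be a day, so treat it as month/two-digit year, e.g. "1/50"
        century = window_start - window_start % 100
        year = century + second
        if year < window_start:
            year += 100
        return date(year, month, 1)
    # A bare "39725" carries no type information: it stays a number.
    return float(text)


print(interpret_entry("1/31", 2008))   # 2008-01-31
print(interpret_entry("1/50", 2008))   # 1950-01-01
print(interpret_entry("39725", 2008))  # 39725.0
```

    Nothing in the bare number says it is a date, let alone which epoch it counts from; only outside context such as a cell's date format can answer that.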

  109. Dan O'Brian said,

    October 5, 2008 at 6:09 pm

    Jose_X: I think that if all spreadsheets had agreed upon a canonical epoch in the very beginning, we would not have this problem. Unfortunately, that did not happen.

  110. Jose_X said,

    October 5, 2008 at 6:18 pm

    Dan, if it’s an integer and that is all we know, we keep it as an integer. If we can deduce that it is a date with a broken formula and that is its sole role, we map to ODF date tag but fix the integer value as necessary.

    I suggest that, specifically for Excel, cells containing a simple integer and formatted as a date be mapped to ODF dates but with the correct value to match the ODF epoch.
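    Jose's suggestion can be sketched against the ODF cell attributes `office:value-type`, `office:date-value` and `office:value`. This is a rough illustration under several assumptions: the workbook uses the 1900 date system, we already know whether the cell carried a date format, and the phantom serial 60 is kept as a plain number. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def decode_1900_serial(serial: int) -> date:
    """Real calendar date for a 1900-system serial (serial != 60)."""
    base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return base + timedelta(days=serial)


def excel_cell_to_odf(serial: int, has_date_format: bool) -> ET.Element:
    """Build a table:table-cell, typed as a date only when that is known."""
    cell = ET.Element(f"{{{TABLE}}}table-cell")
    if has_date_format and serial >= 1 and serial != 60:
        # Known date: store the corrected calendar date, not the serial.
        cell.set(f"{{{OFFICE}}}value-type", "date")
        cell.set(f"{{{OFFICE}}}date-value",
                 decode_1900_serial(serial).isoformat())
    else:
        # Type unknown (or phantom 1900-02-29): keep the number as is.
        cell.set(f"{{{OFFICE}}}value-type", "float")
        cell.set(f"{{{OFFICE}}}value", str(serial))
    return cell


print(ET.tostring(excel_cell_to_odf(39725, True), encoding="unicode"))
print(ET.tostring(excel_cell_to_odf(39725, False), encoding="unicode"))
```

    Because the known date is written as a calendar date rather than a day count, no later reader of the file needs to reproduce the 1900 leap-year quirk; the unknown case is passed through untouched.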

    In any case, the ODF date tag is there for the future. Existing data can be mapped to ODF integers (or strings or whatever) as they are, while new items entered under a date context can be mapped to the ODF correct formula date.

  111. Dan O'Brian said,

    October 5, 2008 at 6:19 pm

    >> When we save, we know how to adjust the number so that it maps properly to the canonical form implied by the corrected definition. The saving process knows the user's config info and so can adjust into a canonical form.

    No, it doesn’t – that’s the problem. All it knows is that the field looks like a number. It doesn’t necessarily know if it is a date or not.

  112. Jose_X said,

    October 5, 2008 at 6:20 pm

    Dan, not all people agreed to use the same language way back when, yet we live in a world where people using different languages can interact together because of the wonders of translators.

  113. Roy Schestowitz said,

    October 5, 2008 at 6:23 pm

    Given the ability to interpret the date — whether as Excel, Lotus, whatever — you can save it properly for the future. You needn’t carry on bugs from the past.

    As Sutor said, “OOXML is about the past and ODF is the future.”

  114. Dan O'Brian said,

    October 5, 2008 at 6:23 pm

    Jose_X: There are far more knowledgeable people working on OOo, Gnumeric, Excel, etc spreadsheet apps than we are (combined, even, I’m sure), so I’ll leave it up to them to solve (if it can be solved) or not. If they haven’t solved it by now, I’d imagine it’s not as simple as you imagine it to be.

  115. Roy Schestowitz said,

    October 5, 2008 at 6:26 pm

    It’s an old discussion and a solved discussion..
    http://www.robweir.com/blog/2006/10/leap-back.html

  116. Dan O'Brian said,

    October 5, 2008 at 6:26 pm

    Roy: Like I said, I’ll leave it up to more knowledgeable folks to figure out. If they find they need support for different epochs, then they need support for multiple epochs. If they decide they don’t, I trust that my data will continue to work (unless proven otherwise) and that’s all I care about in the end.

  117. Jose_X said,

    October 5, 2008 at 6:29 pm

    >> No, it doesn’t – that’s the problem. All it knows is that the field looks like a number. It doesn’t necessarily know if it is a date or not.

    So then we map to an ODF number and not to an ODF date. Simple. The same would go if we had wanted to use OOXML or any other format that has a date tag. We would not map to its date tag but instead would map to the regular number tag.

    However, in any particular case, the application may know that we are dealing with a date. In which case, it would be able to save to the ODF date with the proper adjustment along the way.

    Say we have an Excel spreadsheet that has a date formatting for a number. Then Excel/OO.o presumably needs to use the broken formula on that number to format it properly. Fine, but what we then do is we save that number as a date but adjusted as necessary when we save to ODF. Then when we read that ODF file later, we use the proper formula on the already adjusted number. If we want to convert back to binary Excel format, we make it a number type again and adjust its value backwards. In either format, we can deal with that *known* date properly. That number was marked for life as a date through its date formatting from the original creation as data within an Excel date formatted cell.

  118. Jose_X said,

    October 5, 2008 at 6:38 pm

    >> If they haven’t solved it by now, I’d imagine it’s not as simple as you imagine it to be.

    In other words, you aren’t willing to provide an opinion on why ODF should be one way or the other.

    Do keep in mind that there are many decisions that are taken by people not based on technical feasibility.

    Dan, if you have a link to where you think competent people are having this discussion, please post it. I think I might want to get in on the act or at least hear the reasons given.

    I provided feedback to the OIIC formation discussion list, but they weren’t interested in covering specifics. I started on that road and was told by Mary McRae (is that how you spell it) that engaging in specifics of that nature was prohibited on that list. The specifics will be carried out in private (though joining up is allowed if you pay the $300).

    I would have no problem giving a particular pov if it would help a public discussion and if I didn’t have to dedicate too many resources to the task beyond the time required to do the contributed postings (the thought process, etc).

  119. Jose_X said,

    October 5, 2008 at 6:52 pm

    >> It’s an old discussion and a solved discussion..
    http://www.robweir.com/blog/2006/10/leap-back.html

    For the record, I’ll quote here from that piece from Rob’s blog:

    >> The “legacy reasons” argument is entirely bogus. Microsoft could have easily have defined the XML format to require correct dates and managed the compatibility issues when loading/saving files in Excel. A file format is not required to be identical to an application’s internal representation.

    >> Here is how I would have done it. Define the OOXML specification to encode dates using serial numbers that respect the Gregorian leap year calculations used by 100% of the nations on the planet. Then, if Microsoft desires to maintain this bug in their product, then have Excel add 1 to every date serial number of 60 or greater when loading, and subtract 1 from every such date when saving an OOXML file. This is not rocket science. In any case, don’t mandate the bug for every other processor of OOXML. And certainly don’t require that every person who wants the correct day of the week in 1900 to perform an extra calculation.
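    Weir's load/save adjustment is small enough to sketch directly. Here "file serials" are assumed to count real days, with 1 meaning 1 January 1900, while "internal serials" follow Excel's numbering with the phantom 29 February 1900 at 60; the function names are ours:

```python
from datetime import date, timedelta


def load_serial(file_serial: int) -> int:
    """Correct file serial -> Excel's internal (buggy) serial."""
    return file_serial + 1 if file_serial >= 60 else file_serial


def save_serial(internal_serial: int) -> int:
    """Excel's internal serial -> correct file serial."""
    if internal_serial == 60:
        raise ValueError("internal serial 60 is the nonexistent 1900-02-29")
    return internal_serial - 1 if internal_serial > 60 else internal_serial


def file_serial_to_date(file_serial: int) -> date:
    """With correct serials a single base date works for every value."""
    return date(1899, 12, 31) + timedelta(days=file_serial)


# Round trip: every correct serial survives a load followed by a save.
assert all(save_serial(load_serial(n)) == n for n in range(1, 50000))
print(file_serial_to_date(save_serial(39725)))  # 2008-10-04
```

    The quirk then lives entirely inside one application's load and save paths, and every other consumer of the file works with ordinary Gregorian day counts.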

    Microsoft’s reasons for keeping things broken exist, but that doesn’t mean ODF should follow their lead. Let Microsoft keep OOXML the laughing stock that it is within tech circles. Let us keep ODF sound. Or I should specify, if Microsoft messes up ISO ODF, OASIS should not follow suit.

    People, open source is the key. ODF and other open standards are secondary. Standards are meant to enhance interop, but when that cannot be achieved, these standards lose their value. And interop among independent third parties within the context of a closed source monopoly dominated market is nonsensical.

  120. Jose_X said,

    October 5, 2008 at 6:55 pm

    I should note that Rob is co-lead of the ODF TC within OASIS and started up an interop effort to complement the main TC. I don’t have to agree 100% with Rob, though I have found I share many of his views, including what I quoted above. And I really don’t think I am alone in agreeing with Rob.

  121. Roy Schestowitz said,

    October 5, 2008 at 7:03 pm

    Alex and Dan are only here to carry a “this site is wrong” banner, so I don’t expect them to agree with Rob. I would expect them to endlessly try to give the impression that the messenger can’t be trusted because they simply don’t like the messages. It’s a dangerous stubbornness.

  122. Dan O'Brian said,

    October 5, 2008 at 7:59 pm

    Rob Weir probably counts as someone more knowledgeable than me, so if he says that it’s not needed, I’ll accept that it’s not needed.

    I was only explaining what I thought AlexH was trying to explain (I admit to knowing very little about the internal workings of spreadsheet applications).

    My position on this subject has always been that I’d leave it up to the experts.

  123. Jose_X said,

    October 5, 2008 at 8:08 pm

    Dan, experts are bought and sold all the time.

    You should pay attention to arguments if you want to avoid being manipulated by the unscrupulous.

  124. Jose_X said,

    October 5, 2008 at 8:10 pm

    Dan, if you don’t want to think something through (it can be difficult at times because it would take a lot of preparation to get up to speed), hire or find someone that you trust would understand and be honest with you about it.

  125. Dan O'Brian said,

    October 5, 2008 at 8:15 pm

    Jose_X: that’s pretty funny coming from you… you refuse to contact the people implementing the specs and Roy refuses to contact anyone ever involved with the processes (e.g. he refuses to contact GNOME developers before accusing GNOME of depending on Mono).

    When I say experts, I mean the experts implementing the Free Software office applications that are very unlikely to have been “bought” and/or other experts that I trust (which in this case is limited to the aforementioned group because I don’t happen to know any proprietary office developers).

  126. Dan O'Brian said,

    October 5, 2008 at 8:20 pm

    In case it wasn’t obvious to you, I use OpenOffice.org, Gnumeric and Abiword – all Free Software office suites. I trust those developers to Do The Right Thing(tm).

    I care little about the file formats and the standards committees because I have far too many other things on my plate (like products I’m responsible for), and, as I said above, I trust the people involved with OOo, Gnumeric, etc to DTRT and make my documents continue to work.

  127. Dan O'Brian said,

    October 5, 2008 at 8:21 pm

    At some point, everyone has to trust other people to do their jobs, otherwise nothing can ever get done because you’d be too busy making sure everyone else was doing their job.

  128. Jose_X said,

    October 5, 2008 at 8:32 pm

    Dan, I believe in FOSS more than in open standards. Because of general repetitive pleas from Groklaw, I decided to participate in the politics of standards setting briefly but came away dissatisfied. In the end, it’s OASIS’ sandbox, and they will do what is in their best interest. It certainly looks like they will do a better job than ECMA and Microsoft dominated groups (note that Microsoft may come to dominate OASIS or any other group in time.. it’s possible). I am willing to let them do their thing. As is, I have found ODF better than OOXML from what bits I have heard.

    And I don’t refuse to contact anyone. Do you have contact info because I don’t. What I refuse to do is waste time. Everyone has to prioritize their time.

    >> When I say experts, I mean the experts implementing the Free Software office applications that are very unlikely to have been “bought” and/or other experts that I trust (which in this case is limited to the aforementioned group because I don’t happen to know any proprietary office developers).

    I should mention that “experts” disagree all the time.

    Also, I should mention that those developing “free” office suites are sometimes (many times perhaps) paid. Their software may be “free software” as defined by the FSF, but that doesn’t mean they work for free.

    Finally, if you do listen to these groups, you probably want to be trying to explain to AlexH why many of these groups don’t like OOXML instead of why some do.

  129. Jose_X said,

    October 5, 2008 at 8:38 pm

    I should also clarify, FOSS projects have their own politics. But with FOSS you can fork if you think differently and are willing to go to the trouble. To some extent you don’t need cooperation from others to get your fork to work. With standards, OTOH, forking is a bigger deal (assuming it would even be acceptable based on OASIS copyrights.. IANAL). The whole point of standards is to get many to agree. An individual standard is a bit of an oxymoron.

  130. Dan O'Brian said,

    October 5, 2008 at 8:52 pm

    The people I know working on OOo are paid, the people I know working on Abiword and Gnumeric are not.

    However, even though the people I know working on OOo are paid, I trust their honesty.

    As far as AlexH, how do we know he doesn’t listen to these groups?

  131. Jose_X said,

    October 5, 2008 at 8:55 pm

    >> At some point, everyone has to trust other people to do their jobs, otherwise nothing can ever get done because you’d be too busy making sure everyone else was doing their job.

    Just in case I was misunderstood, I wasn’t trying to be condescending or sarcastic. I was honestly saying that we should seek advice/help from individuals/groups that we find trustworthy in order to help us manage complexity. Complexity is anything we haven’t yet taken the time to figure out for ourselves. Time is a limited resource. What one day appears to be extremely complex, can later appear to be quite simple. What one can figure out, so can others. But we all have limited time. In my case, I may not be taking the time to try and understand the problem as well as I can or to present it as well as I can, but am so far willing to keep up with contrary arguments. Does someone want other/better examples from me than whatever I may have given?

    As an aside, I am here partially watching the “Brad Pitt Troy movie”. The scene just shown was of the Trojan king right after the Trojans defeated the Greeks who now supposedly will go back home. An argument is made to the king that attacking the Greeks by their ships would be foolish. The king ignores this because his trusted priest person says that the gods think the Greeks will be vanquished in an attack.

    Funny coincidence. We have to trust someone whenever we don’t dive into the details of something. Sometimes it works out and sometimes it doesn’t.

  132. Jose_X said,

    October 5, 2008 at 9:05 pm

    >> As far as AlexH, how do we know he doesn’t listen to these groups?

    Or that he does. Or that I do or don’t.

    Should we attack the Greeks? Whose advice do we take or do we dig into the details?

    Anyway, I don’t worry about misunderstandings if people can/will work to fix them. More upsetting is purposeful deception. As long as we stay away from purposeful deception as much as possible everything should work itself out slowly. We all cheat here and there though. Balance is good. I have seen myself and others go overboard at times. On the surface, I think most people will expect anyone trying to defend Microsoft to come a little bit more prepared than usual, and they will be seen very critically if they don’t do a convincing job.

  133. Jose_X said,

    October 5, 2008 at 9:32 pm

    >> And I don’t refuse to contact anyone. Do you have contact info because I don’t. What I refuse to do is waste time. Everyone has to prioritize their time.

    Let me add.. I don’t think anyone wants to waste time.. ie, others don’t want to waste time with me either. To enter into some discussions, you need to do some homework if possible. That takes time.

    In any case, if anyone has a link to a related public discussion, feel free to post that info here for the benefit of all.

  134. AlexH said,

    October 6, 2008 at 1:23 am

    Good grief, so much comment over such a small issue.

    @Jose:

    You said, “If we don’t know this extra date context, then we play it safe and keep the integer as an integer.”.

    That’s precisely the situation! We don’t have the extra information, so the integer stays as an integer.

    However, the integer is still a buggy offset and is usually off by one (i.e., is x+1 when the real value should be x).

    So the situation is that you have to encode various schemes in order to deal with the buggy data, because you cannot convert it when you upgrade the file format.

    It’s really as simple as that.

  135. Roy Schestowitz said,

    October 6, 2008 at 1:40 am

    I assume that you will continue to disagree no matter the evidence you are presented to refute the argument.

    As I wrote earlier, this discussion was resolved before; Microsoft just didn’t fix its specs though.

  136. AlexH said,

    October 6, 2008 at 1:56 am

    @Roy: your ‘evidence’ in a blog post from 2006 is pretty much trumped by the normative reference to the ODF 1.2 draft.

    If this was so easy, no-one would bother to encode the legacy behaviour into ODF 1.2. However, it’s not that easy, so that behaviour is being put into the standard.

    Coping with legacy data makes ODF actually useful. If we couldn’t convert old data, ODF would be a significantly harder sell. There is a big difference between legacy file formats and legacy data, which people here don’t seem to understand.

  137. Roy Schestowitz said,

    October 6, 2008 at 2:02 am

    http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-25680

    Luc Bollen said,

    October 3, 2008 at 9:27 am

    Openformula (part of ODF 1.2) doesn’t MANDATE the bug, as ECMA376 was doing. From http://wiki.oasis-open.org/office/About_OpenFormula

    “Doesn’t mandate mistakes. Just because one program gets something wrong doesn’t mean that everyone should make the same mistake. The specification is carefully written to not require certain bugs, just because someone has a bug. For example, Excel incorrectly believes that 1900 was a leap year, and at least draft version 1.3 of the Excel specification claims that compatible applications must make the same mistake. Nonsense. Instead, OpenDocument wisely stores dates as dates (not just numbers), and thus does not require that applications have this bug. The Excel specification also requires that applications cannot be more capable than Excel (it doesn’t permit support for dates before 1900). Again, nonsense. In fact, at least one OpenDocument spreadsheet application (OpenOffice.org Calc) can correctly calculate dates and date differences going back to 1583! Similarly, many applications handle complex numbers in a very clumsy way; we’ve devised the specification to make sure that future applications can support better approaches, instead of tying their hands to a technique known to be poor.”

  138. Roy Schestowitz said,

    October 6, 2008 at 2:07 am

    @AlexH: Why don’t you have a website or blog? I just want to see a bit more about you and your knowledge. PLEASE, show me who you are.

    Maybe you’re the AlexH from http://www.contoso.com?
    http://en.wikipedia.org/wiki/Contoso

    http://center.spoke.com/info/pDJMWq/AlexHankin

    Alex Hankin
    Contoso, Ltd.
    Senior Director
    New York, NY

    Skype: AlexH
    Home IM: Alex@hotmail.com
    Home Email: Alex@hotmail.com
    Work Email: alexhankin@contoso.com
    Work IM: alexhankin@contoso.com

    Telex: 781 234
    Home (208) 555-5656
    Mobile: (775) 551-2345
    Fax: (207) 555-9999
    Direct: (207) 555-1112
    Tel: (207) 555-1000

    It’s actually Alex Hudson.

    http://www.alexhudson.com/

  139. AlexH said,

    October 6, 2008 at 2:49 am

    Haha, I missed that :)

    Slightly sad that people will attempt to tie you to Microsoft for expressing opinions which don’t fit with their world view. One thing I respect about Jose is that he always argues on the topic, not ad hominem.

  140. Roy Schestowitz said,

    October 6, 2008 at 2:55 am

    AlexH,

    What raises this suspicion are actual past incidents. Microsoft deserves no trust anymore, as it was caught many times before employing forum shills and such (some examples). It continues to this date.

    BTW, you had not seen that comment because it was only moments ago that I checked what the automated filter had trapped.

  141. AlexH said,

    October 6, 2008 at 3:14 am

    @Roy: sure, and I understand that.

    I just think some people find it very easy to wonder aloud at possible connections as a way of avoiding discussion of actual issues.

  142. Pedro Gimeno said,

    October 6, 2008 at 3:51 am

    I disagree with AlexH when he says it’s impossible to fully support legacy documents through import filters. It would require some support from the ODF spec for it to work, though:

    1. Implement a “Legacy Date” cell format. This cell format interprets a cell’s number as a date, applying the 1900 bug, for display purposes. Cells with a date format in Excel files would be converted to “Legacy Date” when imported.

    2. Implement a “LEGACYDATE()” function that accepts one argument and converts a number into a proper date, taking into account the 1900 bug. Excel formulas with functions that accept dates as arguments would be rewritten so that each date argument is first passed through LEGACYDATE(). For example, WEEKDAY(a3+b3) would become WEEKDAY(LEGACYDATE(a3+b3)).

    Scripts can’t be supported, though. It’s impossible to analyze a script and they would require manual fixing.

    Of course portable documents should never use the legacy cell format or function. Documents intended to be portable should be manually transformed to get rid of the legacy bits.
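Pedro’s LEGACYDATE() idea can be sketched in Python. This is a hypothetical helper, not a function from any spec or product. It assumes Excel’s 1900 date system, where serial 1 is 1 January 1900 and serial 60 stands for the non-existent 29 February 1900, so every serial from 61 onward is one day higher than a correct day count would give.

```python
from datetime import date, timedelta


def legacy_date(serial: int) -> date:
    """Hypothetical LEGACYDATE(): turn a 1900-system serial into a real date."""
    if serial < 1:
        raise ValueError("serials below 1 have no real calendar date")
    if serial == 60:
        # Excel's phantom leap day: 1900 was not a leap year.
        raise ValueError("serial 60 is the non-existent 1900-02-29")
    if serial < 60:
        # Before the phantom day, serial 1 is 1900-01-01.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After it, every serial is one too high, so count from a base one day earlier.
    return date(1899, 12, 30) + timedelta(days=serial)


# Pedro's rewrite WEEKDAY(a3+b3) -> WEEKDAY(LEGACYDATE(a3+b3)), with made-up cell values.
a3, b3 = 25000, 569
real_day = legacy_date(a3 + b3)
print(real_day, real_day.strftime("%A"))  # prints: 1970-01-01 Thursday
```

The stored numbers are left untouched; only the value the formula sees is corrected, as in Pedro’s proposal.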

  143. Pedro Gimeno said,

    October 6, 2008 at 3:55 am

    Forgot to say that the very same logic could apply to the 1904 quirk.
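Applied to the 1904 quirk, the same idea is simpler. In Excel’s 1904 date system (originally the default in Excel for the Macintosh), serial 0 is 1 January 1904 and there is no phantom leap day, so the helper is a plain epoch offset. For any day from 1 March 1900 onward, the two systems differ by exactly 1462 days (four years plus two days). The function and constant names below are hypothetical.

```python
from datetime import date, timedelta

EPOCH_1904 = date(1904, 1, 1)  # serial 0 in the 1904 date system
DAYS_BETWEEN_SYSTEMS = 1462    # 1900-system serial minus 1904-system serial for the same day


def legacy_date_1904(serial: int) -> date:
    """Hypothetical LEGACYDATE1904(): no phantom leap day, just an epoch offset."""
    return EPOCH_1904 + timedelta(days=serial)


def serial_1900_to_1904(serial_1900: int) -> int:
    """Re-express a 1900-system serial in the 1904 system (hypothetical helper)."""
    if serial_1900 < 61:
        raise ValueError("only serials after the phantom 1900-02-29 map by a fixed offset")
    serial_1904 = serial_1900 - DAYS_BETWEEN_SYSTEMS
    if serial_1904 < 0:
        raise ValueError("date falls before the 1904 epoch")
    return serial_1904


print(legacy_date_1904(serial_1900_to_1904(25569)))  # 1970-01-01
```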

  144. AlexH said,

    October 6, 2008 at 4:10 am

    @Pedro:

    It’s a nice idea, but it’s not great for a couple of reasons. A big one is that you’re adding this LEGACYDATE() function into existing formulas, which will confuse users who are expecting the previous formulas.

    That’s actually a huge issue: formulas are basically user interface, and that is one reason why they are so clunky even in OpenDocument 1.2. If we were designing something from scratch right now, I don’t think it would look much like the existing system, but migration is a huge problem.

    The second issue is that you’re dropping all this *LEGACY() stuff into the sheet, but you’re getting the same effect as setting a base epoch sheet-wide.

    So I agree that it could work (although it could fail if anyone has written custom functions to do date manipulation within a spreadsheet), but it’s the same solution as that already proposed in OpenDocument 1.2: you put in place the facility to manage dates with legacy epochs. I would venture that the ODF solution is cleaner; you’re writing the same code (changing date offsets), but putting the function call internally in the spreadsheet code rather than externally in the spreadsheet formula.
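AlexH’s alternative, one sheet-wide base epoch instead of a wrapper in every formula, can be sketched as a single setting consulted whenever a serial is read as a date. The Sheet class and its null_date field are hypothetical illustrations, not ODF 1.2 element names. A base of 30 December 1899 reproduces Excel’s 1900-system dates from 1 March 1900 onward; a base of 1 January 1904 reproduces the 1904 system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Sheet:
    # Hypothetical sheet-wide setting: the calendar date that serial 0 stands for.
    null_date: date = date(1899, 12, 30)

    def serial_to_date(self, serial: int) -> date:
        # Every date-formatted cell and date function in the sheet goes through
        # this one place, so no formula has to be rewritten.
        return self.null_date + timedelta(days=serial)

    def date_to_serial(self, day: date) -> int:
        return (day - self.null_date).days


from_xls_1900 = Sheet()                            # matches 1900-system serials >= 61
from_xls_1904 = Sheet(null_date=date(1904, 1, 1))  # matches 1904-system serials

print(from_xls_1900.serial_to_date(25569))  # 1970-01-01
print(from_xls_1904.serial_to_date(24107))  # 1970-01-01
```

Under the 1899-12-30 base, serials 1 to 59 still come out one day earlier than Excel shows them; that residue of the 1900 bug is something a single offset cannot absorb.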

  145. AlexH said,

    October 6, 2008 at 4:11 am

    Hm, I meant to add that I said you can’t convert the data. Obviously, you can convert formulas which do the calculation to take into account different data, but as I said, that’s effectively the same solution as ODF 1.2 proposes.

    So my point was about numbers in cells, not formula function calls.

  146. Ianp said,

    October 6, 2008 at 4:25 am

    I’ll use OO as the program in this example. If you open an “xls” file in OO, then OO should know it will have the bad date problem. So when you’ve finished working on it, you decide what format to save it in. OO will then act on your choice of format: “If saving in XLS format, save in the bad format; else if saving in ODF format, save in the correct format”.
    If you know the format of the file (from embedded info or the file extension), then you should be able to work out what the integer represents. If you can’t work it out, then that file is “dead” unless you use the original spreadsheet program that created it.
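Ianp’s rule (interpret numbers according to the format they were loaded from, and write them according to the format chosen on save) might look like this. The format names and the load_date/save_date functions are hypothetical. The "xls" branch assumes Excel’s 1900 system with serial 60 as the phantom 29 February 1900; the "ods" branch uses an ISO 8601 date string, which is how ODF date-typed cells store their values (office:date-value).

```python
from datetime import date, timedelta


def load_date(raw, source_format: str) -> date:
    """Interpret a stored value according to the format it was read from."""
    if source_format == "xls":
        serial = int(raw)
        if serial < 1 or serial == 60:
            raise ValueError("no real date for this 1900-system serial")
        base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
        return base + timedelta(days=serial)
    if source_format == "ods":
        return date.fromisoformat(raw)
    raise ValueError(f"unknown format: {source_format}")


def save_date(day: date, target_format: str):
    """Write a real date back out in the convention of the chosen format."""
    if target_format == "xls":
        if day < date(1900, 1, 1):
            raise ValueError("the 1900 system cannot store this date")
        # Re-introduce the phantom leap day so old readers see the same serial.
        base = date(1899, 12, 30) if day >= date(1900, 3, 1) else date(1899, 12, 31)
        return (day - base).days
    if target_format == "ods":
        return day.isoformat()
    raise ValueError(f"unknown format: {target_format}")


day = load_date(25569, "xls")
print(save_date(day, "ods"))  # 1970-01-01
print(save_date(day, "xls"))  # 25569
```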

  147. Jose_X said,

    October 6, 2008 at 6:57 am

    AlexH, I don’t see what is confusing you. Let me put this simply and then we can fill in the exceptions as we come to them.

    Take one:

    You said originally:
    >> The problem is that you can’t just “convert” user data when you convert the file format, because spreadsheet data isn’t typed and you can’t know which numbers to adjust.

    Here is my simplified response:

    The formatting or some other clues give away the intended usage of a number as a date that uses some (possibly broken) algorithm; thus, you can convert this number type value into an ODF date type value, adjusting so as to map the original value into a value that works with the correct algorithm.

    There is no problem. We know we had a date. We know the formulas to use in all cases.

    If no such clues can be found then don’t convert. In other words, don’t convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.

    Again there is no problem. We simply mapped each number to itself and used an ODF number type. If the orig was not intended as a date, we correctly left it alone. If the orig was intended as a date by some other application (since the formatting wasn’t done for user visual purposes), we still preserved that value. There is no problem. New applications would not treat a number as a date because it got saved under the number type not the date type.

    Where is the problem? Also, if you think there is a problem, give an example.

    Take two:

    >> Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it’s set to 1931-2030 or something, so you know that ‘98’ == ‘1998’, ‘31’ == ‘1931’, etc.

    >> It’s exactly like that. Unless you know what the “range” is to begin with, you cannot hope to convert the data accurately, because you’re missing enough information.

    In simple terms here is why this example you gave before is not a counter-example.

    You are *not* missing the type information in this example. The config option is known if you convert using the app that uses that config option (presumably the same app the user would use to open the file anyway or else the user would be screwed anyway, even prior to any conversion).

    The config option is known and the ODF date semantics are also known. The mapping is straightforward.

    So why is there a problem here? We know we have a date. We know all the conversions necessary.
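Jose’s point about the two-digit-year option can be made concrete: once the window is known, expansion is deterministic. This sketch assumes a 100-year window starting at 1931, matching the ‘31’ == ‘1931’ example; the function name is hypothetical and this is not OpenOffice.org’s own code.

```python
def expand_two_digit_year(yy: int, window_start: int = 1931) -> int:
    """Place a two-digit year inside [window_start, window_start + 99]."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    year = window_start - window_start % 100 + yy  # same century as window_start
    if year < window_start:
        year += 100  # e.g. 30 falls before 1931, so it means 2030
    return year


print([expand_two_digit_year(y) for y in (98, 31, 30)])  # [1998, 1931, 2030]
```

Without the window_start value the same input has two plausible answers; with it, there is exactly one.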

  148. AlexH said,

    October 6, 2008 at 7:46 am

    @Jose: but you’re contradicting yourself. You say:

    If no such clues can be found then don’t convert. In other words, don’t convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.

    I’ve told you in many instances you can’t know whether or not a certain number is a date. You can’t say “don’t convert”, because then anything which uses the unconverted values starts spitting out the wrong answer!

  149. Jose_X said,

    October 6, 2008 at 8:15 am

    >> but you’re contradicting yourself

    You didn’t show any contradiction. I think you aren’t understanding what I am saying. Please show the two items that contradict.

    >> I’ve told you in many instances you can’t know whether or not a certain number is a date.

    Of course you can know if a number is being used as a date. One way is if it is formatted as a date.

    [This formatting information is found within the same file as the number in the case of Excel spreadsheets (or so reverse engineering, or special access to Microsoft, has determined, I believe, as I think that is how OO.o interprets Excel files).]

    >> You can’t say “don’t convert”, because then anything which uses the unconverted values starts spitting out the wrong answer!

    What are you talking about? Can you give an example of this nonsensical statement? I must not be understanding you.

    You need to give more context in your replies.

    [I'm waiting any minute now for me or you to start saying "oops, my bad", but it's not happening. This is emboldening me to be more reckless to see if I go too far, but the problem is that you are not giving examples, as, in fact, you did not challenge my rebuttal of your lone example.]

  150. Jose_X said,

    October 6, 2008 at 8:21 am

    Oh, OK, I think I see where you think I was contradicting myself.

    Let me rephrase.

    >> If no such clues can be found then don’t convert. In other words, don’t convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.

    If no such clues can be found for a particular numerical value, then don’t convert that value. In other words, either don’t convert the original file format to ODF, or do convert the original file to ODF but map such a numerical value identically to itself and to an ODF number type.

    Anyway, so what is the problem now, with anything of what I wrote in this reply from which you quote? http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-26078

  151. Roy Schestowitz said,

    October 6, 2008 at 8:37 am

    Alex, I’m really not following you either. If you can open a file, you can interpret the data and then store it properly, bug-free.

  152. AlexH said,

    October 6, 2008 at 8:59 am

    Ok, let’s start at the beginning and see if this helps.

    First, there are no “dates” – that concept doesn’t exist. A spreadsheet stores numbers. Some of those numbers may be formatted as “dates”, where the spreadsheet interprets them as date offsets from an epoch.

    Formatting is no indicator of type. Where a value is used in a calculation, it may or may not be formatted correctly – that’s up to the user.

    Second, if you’re altering data, you have to ensure you have correctly altered *all* instances within a spreadsheet. So, you cannot do a reverse topological sort from “date formatted” cells to try to work out what other cells contain “dates”: there may be data that is in the sheet, but not currently used in a calculation. One obvious example would be the result of a VLOOKUP().

    It’s that simple. You cannot look at a number like “5” and say “oh, that’s actually a date”. There are some clues in a spreadsheet. There are not sufficient clues to know. This is why no vendor automatically converts values; if it was that easy people would do it!

  153. AlexH said,

    October 6, 2008 at 9:04 am

    @Jose: Ok, here’s a simple example in faux CSV for you to convert to non-buggy format:

    "15","0",=A1+B1
    "0","2",=A2+B2
    "0","5",=A3+B3
    "=WEEKDAY(OFFSET(C1,2))"

    A1:B3 are formatted as “date”. Should be easy, no?
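One way to see the difficulty in AlexH’s example: suppose a converter “repairs” every date-formatted cell by shifting 1900-system serials below 60 up by one, so that they count from 30 December 1899. The fix_serial helper is hypothetical. Because A and B are added together in column C, shifting both operands moves each sum by two days, whereas the day that sum originally denoted has moved by only one.

```python
def fix_serial(serial: int) -> int:
    """Hypothetical per-cell repair of a 1900-system serial into a 1899-12-30 base."""
    if serial == 60:
        raise ValueError("serial 60 is the non-existent 1900-02-29")
    return serial + 1 if serial < 60 else serial


# AlexH's sheet: A1:B3 are formatted as dates, column C holds =A+B.
rows = [(15, 0), (0, 2), (0, 5)]

for a, b in rows:
    sum_of_fixed = fix_serial(a) + fix_serial(b)  # what C holds after converting A and B
    fixed_sum = fix_serial(a + b)                 # the same day C held before, in the new base
    print(a, b, sum_of_fixed, fixed_sum)
```

Every date-formatted cell was converted “correctly”, yet the value that WEEKDAY(OFFSET(C1,2)) reads from C3 goes from 5 to 7 rather than to 6.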

  154. Ianp said,

    October 6, 2008 at 9:06 am

    @AlexH “I’ve told you in many instances you can’t know whether or not a certain number is a date. ”

    If you encounter this situation, you’ve got a dead file. If a new spreadsheet application cannot find out whether it’s operating on a number or a date, then the file is unusable, so this argument is meaningless.

  155. Jose_X said,

    October 6, 2008 at 9:13 am

    Consider 3 hypothetical files using the old extension “.old” and how these would be converted into ODF.

    file nodates.old:
    5 3 6

    file hiddendates.old:
    5 3 6

    file obviousdates.old:
    5 3 6 fd232

    The first file is a spreadsheet that has 3 number values. These values have to do with how many oranges, pears, and watermelons we sold last week. There are no dates.

    The second file has a “3” which is a date. It refers to the third day of the week. Ordinarily, we can’t tell this refers to a date (let’s assume we can’t tell in this case unless we ask the author manually, i.e., there is no hint in the .old file that this is a date).

    The third file has a date also, but this information can be deduced because the “fd232” code means that the second field (the “3”) is a date interpreted according to the broken leap-year formula (I made this code up, but let’s flow with it).

    Here is how I am saying that these three files would be handled. An app that understands these .old formats would map…

    .. nodates.old into an ODF file with the numbers not changing values and staying as number types within a table.

    .. hiddendates.old into an ODF file with the numbers also not changing values and staying as number types within a table. Thus hiddendates.odf and nodates.odf may look essentially the same.

    .. obviousdates.old into an ODF file with the “5” and “6” staying put as number types; with the “fd232” turning into whatever formatting code yields the same effect in ODF as in .old; and with the “3” being turned into the right number so that it maps properly when we use the correct date formula, and the type of the data would be date.

    Now, nodates.old -> nodates.odf presented no ambiguities to the app doing the saving. There was nothing much to be done: the data looks exactly as it should.

    hiddendates.old -> hiddendates.odf presented no ambiguities to the app doing the saving. There was nothing much to be done: the data looks exactly as it should. Alex has no problem with this case because I did not translate. The data is the same. No information is lost. No new semantics are implied because I used the ODF number type, which has ordinary number semantics (not date semantics), just as is found in the .old files.

    obviousdates.old -> obviousdates.odf presented no ambiguities to the app doing the saving, because it was able to identify the date and knew the associated algorithm, and it knows the algorithm associated with the ODF date type. New applications know that the ODF date type uses the correct algorithm, so no problem there. Old applications can’t even read ODF, so a helper function would need to be constructed. This helper function knows to convert ODF date values into the values used by .old, as all of the information needed is known: the semantics of the date type for ODF are known, and the semantics for the .old dates are also known (otherwise we would not have translated to ODF in the first place; the “fd232” was assumed to give this information in full).

    Note that if “fd232” did not include everything we needed (e.g., the algorithm, any timezone offsets, etc.), then we would have applied case 2 and simply mapped the number into an identically valued ODF number of the number type.
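Jose’s three hypothetical .old files can be run through a converter sketch. Everything here is invented for illustration, as in the comment: the .old format itself, the “fd232” hint (modelled as a mapping from field index to a date rule), and the OdfCell record, whose value_type values mirror ODF’s float and date cell types. Fields without a complete hint keep their number unchanged; the hinted field becomes a date, and an inverse helper shows that the conversion can be undone for old applications.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class OdfCell:
    value_type: str  # "float" (plain number) or "date"
    value: object    # a number, or an ISO 8601 date string


def broken_1900_to_date(serial: int) -> date:
    # The broken leap-year rule: serial 60 is the phantom 1900-02-29.
    if serial < 1 or serial == 60:
        raise ValueError("no real date for this serial")
    base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return base + timedelta(days=serial)


def date_to_broken_1900(day: date) -> int:
    # Inverse helper for writing back to the old format (dates from 1900-01-01 on).
    base = date(1899, 12, 30) if day >= date(1900, 3, 1) else date(1899, 12, 31)
    return (day - base).days


def convert(fields, hints):
    """fields: numbers from an .old file; hints: {field index: date rule}."""
    cells = []
    for i, value in enumerate(fields):
        if hints.get(i) == "broken-1900":
            cells.append(OdfCell("date", broken_1900_to_date(value).isoformat()))
        else:
            cells.append(OdfCell("float", value))  # identity mapping, no guessing
    return cells


nodates = convert([5, 3, 6], {})
hiddendates = convert([5, 3, 6], {})                   # the date is invisible to us
obviousdates = convert([5, 3, 6], {1: "broken-1900"})  # "fd232": second field is a date
print(obviousdates)
```

Because the hinted mapping has an exact inverse, nothing is lost for the date field, and the unhinted files come out identical to their input.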

  156. Roy Schestowitz said,

    October 6, 2008 at 9:15 am

    Alex, I’ll repeat myself for the who-knows-what time. If your data file contains enough information for an application to interpret the meaning of values (and type), then you can ‘rescue’ this data from the bug. It’s very simple, really!

  157. AlexH said,

    October 6, 2008 at 9:39 am

    You three still don’t get it. There is data on a spreadsheet which isn’t necessarily part of any calculation chain.

    The only way your “conversion” system could work in the face of that is to flag individually each cell which had been “converted”, so that the stuff you couldn’t convert could be later “fixed”. But it’s ugly, and quite rightly no-one does it.

    ODF 1.2 takes the right approach by allowing variable epoch calculation. It’s simple, and it works.

  158. Jose_X said,

    October 6, 2008 at 9:44 am

    Alex, I looked over your example. It takes a little while because I haven’t coded in this for a while.

    >> First, there are no “dates” – that concept doesn’t exist.

    Fine. I accepted this from the start. At least we are on the same page so far. Step 1: check.

    >> Formatting is no indicator of type….

    If it is formatted as a date, I suggested we do assume it is a date; otherwise, this is a bug in the original spreadsheet.

    Sure, a value can double as a date and as a password or something else. These oddball cases should be rooted out. The conversion would presumably be done by someone who has a clue about the specifics of the spreadsheet page. In any case, this odd scenario is likely not common. Also, there is no need to convert. A conservative company would start by not converting anything, or by converting and checking. However, it makes no sense to bind all future time to the bugs of the past on account of a failure to find a simple rule that would apply 100.00% of the time.

    >> … Where a value is used in a calculation, it may or may not be formatted correctly – that’s up to the user.

    Right. Any arbitrary value can be used as a date (but not indicated as such within the same file or through any clues given to the processor converting into ODF) by any arbitrary piece of code, whether that code is called a spreadsheet formula or is a utility application that resides on another file on another computer on another network.

    In the absence of date formatting and any other information the given file type would need to indicate an unambiguous date, we don’t map into the ODF date type. Instead, we map identically into the/an ODF number type.

    >> Second, if you’re altering data, you have to ensure you have correctly altered *all* instances within a spreadsheet. So, you cannot do a reverse topological sort from “date formatted” cells to try to work out what other cells contains “dates”: there may be data that is in the sheet, but not currently used in a calculation. One obvious example would be the result of a VLOOKUP().

    First, let’s point out that these scenarios may apply to some spreadsheet file types but not to others.

    Now, my short answer here is that if we can’t tell for sure, then as stated already, we map the numbers unchanged into ODF number types. This amounts to an identity/null conversion and is no worse than what OOXML demands.

    I may try and break this down more later to analyze Excel files. Worst case, we would have all Excel file numbers map identically into ODF number types. However, the ODF date type is still there for when we know we have a date.

    >> It’s that simple. You cannot look at a number like “5” and say “oh, that’s actually a date”. There are some clues in a spreadsheet. There are not sufficient clues to know. This is why no vendor automatically converts values; if it was that easy people would do it!

    If we don’t know, we don’t adjust the values or map to ODF date types. There is no problem. This simply means we aren’t trying to deduce semantics from the old format to identify candidates for the ODF date type.

    These cases don’t present problems.

    And in the cases where we do have enough info, the converter can know whether an injective mapping is possible (to guarantee that we can find the inverses uniquely, or at least without problems; depending on the particular semantics of the file format, the mapping may not even need to be injective). BTW, X+1 is injective, as is any linear function with a non-zero scale factor (scaling and translating). http://en.wikipedia.org/wiki/Injective_function

    The point though is that we would have to be sure we could undo the “damage” of conversion. If we couldn’t guarantee that, then we would not attempt the mapping into the ODF date type and just stick with the number type.

    Remember that we aren’t just talking about Excel spreadsheets. Any arbitrary file might be mappable into ODF. ODF is a general purpose file format. It makes no sense to cripple it when all scenarios can be handled gracefully. Sure, for Excel files, maybe a crippled ODF would smell just as bad, but we don’t have to accept a smelly ODF format period, as we can do better.

  159. Jose_X said,

    October 6, 2008 at 9:48 am

    Roy, #comment-26098 is not showing up.. did it get filtered? Should I repost?

  160. Roy Schestowitz said,

    October 6, 2008 at 9:55 am

    Jose, it entered the queue for moderation and I’ve just recovered it.

  161. Jose_X said,

    October 6, 2008 at 9:59 am

    >> The only way your “conversion” system could work in the face of that is to flag individually each cell which had been “converted”, so that the stuff you couldn’t convert could be later “fixed”. But it’s ugly, and quite rightly no-one does it.

    The “flag” is automatic. It is called the date type. Only things converted become the date type. Again, it is automatic.

    If a strange bifurcation would be needed to account for all possibilities, then we could just not convert to the date type. The date type implies “date” and nothing else. The/a number type can always be used.

    I’ll quote from the comment that hasn’t shown up yet:
    >> Sure, a value can double as a date and as a password or something else. These oddball cases should be rooted out. The conversion would presumably be done by someone who has a clue about the specifics of the spreadsheet page. In any case, this odd scenario is likely not common. Also, there is no need to convert. A conservative company would start by not converting anything, or by converting and checking. However, it makes no sense to bind all future time to the bugs of the past on account of a failure to find a simple rule that would apply 100.00% of the time.

    What I mean here is that something formatted as a date might also take on a very different role. In this case, changing that value, although correct insofar as the role of the number as a date is concerned, would lead to problems for the number’s alter ego.

    Remember that we can always be conservative and not convert, but this is no reason not to have a correct date type.

    In fact, I don’t see any argument for having an incorrect date type. If the old Excel files don’t have date type information as you say, then why ever would we consider converting into a date type (except to be aggressive)? Hence date types would only be used for new data, in which case what does the legacy argument have to do with anything since legacy means not new.

  162. Shane Coyle said,

    October 6, 2008 at 10:05 am

    Maybe I am confused, but look at it this way.

    Suppose some govt office has a bunch of old spreadsheets which were saved by this buggy Excel version, and even suppose they still have that old 386 and the Excel version running to access them and print to that ancient printer over there, once a year (if ya think it’s not likely, ya haven’t worked for the govt).

    Anyhow, we decide we want to open those files in a shared folder from our shiny workstation with Office 14 or OO3 or whatever. The modern app needs to know how to tell if that file has this known bug, render the information correctly in present use, and also cannot (IMO) change the file itself by ‘repairing the bug’ because it would wreck the file for its native app version, which expects its ‘buggy’ data in order to give the expected result.

    Translating is ugly, but the important thing is the data and getting it right, each time we open it. In terms of file type conversion, different issue because then you know you can safely ignore that version-specific bug and just save the correct information after you translated it in and corrected for the bug.

    My wonder is more, was this bug ever fixed, or was this a case of hiding your sins in a closed source/closed format application?

  163. Jose_X said,

    October 6, 2008 at 10:06 am

    >> ODF 1.2 takes the right approach by allowing variable epoch calculation. It’s simple, and it works.

    Alright. I have not looked at the details. I can accept an attribute that would specify the algorithm to be used for converting the dates/numbers. That is OK.

    I would want sane defaults.

    But despite this, OOXML’s approach of forcing a twisted conversion calculation looks to be a folly. It’s even sadder if we consider that Microsoft’s own past formats did not type dates (Alex stated this for I would otherwise have no clue). Why, if you are only now going to add dates to the repertoire, would you want a crippled date type? All past “dates” would just map to number types as a conservative default anyway.

    Well, I can hypothesize some reasons why Microsoft would do this. I was trying to imply “why would any format based on technical merits want to have a crippled date type on purpose…”

  164. Jose_X said,

    October 6, 2008 at 10:17 am

    >> The modern app needs to know how to tell if that file has this known bug, render the information correctly in present use, and also cannot (IMO) change the file itself by ‘repairing the bug’ because it would wreck the file for its native app version, which expects its ‘buggy’ data in order to give the expected result.

    Modern apps have all the information they need (barring proprietary secrets of course). If the format is X then use meaning A. If the format is Y then use meaning B. This is possible if formats X and Y existed when the app was created/updated.

    You are correct that the old application won’t know. But then why would you convert into ODF in the first place since the old application couldn’t read ODF? Not every user would convert their old files into ODF. If you build a translator from ODF into the old format, that translator can make the adjustments as it knows the semantics it needs before and after. If it wouldn’t know the semantics unambiguously (Alex gives examples where messes can occur if we try to be aggressive converters), then this info would have been known and the conversions would not have taken place in the first place (at least not without user approval).

    None of this, we can see, has any implication to lead us to want to cripple the date type semantics of a new format. We are always safe by converting as is into a number type. Future creations of data as date type should be clean.

    We can only imagine why OOXML would force the crippling upon us.

  165. AlexH said,

    October 6, 2008 at 10:20 am

    @Jose: it doesn’t have a crippled date type. This is purely about the “date as a serial integer” format used by older systems. Opening any older data has this same problem, it’s not a file format issue.

    Seriously, the differences between OOXML and ODF in this area are minimal. Both have a system to deal with older integer encodings. Both have date types which do not feature this bug. Both can cope with different epochs.

  166. Jose_X said,

    October 6, 2008 at 10:37 am

    >> it doesn’t have a crippled date type. This is purely about the “date as a serial integer” format used by older systems. Opening any older data has this same problem, it’s not a file format issue.

    OK, I went back and read this link http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-25680 and the comparison appears to be against the Excel format and not against OOXML.

    This link also mentioned earlier, http://www.robweir.com/blog/2006/10/leap-back.html, does mention OOXML; however, as you already noted, it dates from two years ago.

    >> Seriously, the differences between OOXML and ODF in this area are minimal. Both have a system to deal with older integer encodings. Both have date types which do not feature this bug. Both can cope with different epochs.

    Are you kidding me? So we worked on a non-issue with the current OOXML and ODF? I mean, sure it was a fun mental exercise to an extent.

    Past experiences suggest that just because you say “everything is fine” doesn’t make it so; however, I have no other reason to complain about any specific format with what I have currently verified.

    I also need to get on with some other work.

    PS: Alex, thanks for the examples you eventually gave. It can be annoying to come up with them, but it helps track down where our minds are not meeting. It’s still not completely clear to me where the gap existed, but I have a better idea. Of course, this potentially being a non-issue …. .. Roy, this forum is a great time sink! Thanks. Thanks a lot ;-)

  167. Roy Schestowitz said,

    October 6, 2008 at 10:45 am

    Roy, this forum is a great time sink!

    Well, if it’s any solace, this thread/page has been viewed well over 10,000 times and this server fed almost 50 gigs so far this month (mirrors and CORAL excluded).

  168. AlexH said,

    October 6, 2008 at 10:47 am

    Are you kidding me? So we worked on a non-issue with the current OOXML and ODF?

    Well, I did say at the beginning it wasn’t really a file format issue ;)

    Both OOXML and ODF use the same format for dates – the ISO format – so it only comes down to how to import legacy data. I guess OOXML mandates .xls-compatible defaults, but in practice that’s just stating the bleeding obvious… :)

  169. Ian said,

    October 6, 2008 at 11:50 am

    @Roy

    “If your data file contains enough information for an application to interpret the meaning of values (and type), then you can ‘rescue’ this data from the bug. It’s very simple, really!”

    I’m always nervous with the concept of a “best guess” data conversion. If you have a spreadsheet with 3 rows and five columns, it’s really not a big issue. When you have a 20 MB file with thousands of possible rows, you have to trust the computer to not screw anything up. Best guess data conversion isn’t necessarily a trustworthy process, certainly something I wouldn’t trust. I don’t care if it’s OO.org, Excel, 1-2-3, whatever.

  170. Roy Schestowitz said,

    October 6, 2008 at 12:10 pm

    Ian, I was not suggesting that guessing would be involved?

  171. Jose_X said,

    October 6, 2008 at 12:41 pm

    Ian, yes, as Roy said (I think), sometimes you can be very precise. Alex has been focused on Excel spreadsheets. The ODF date tags could be used as a target from a lot more types of files than Excel spreadsheets, and many of these might make very clear that something is a date (which Excel apparently doesn’t do).

    Your apprehension is shared. Caution applies to any type of automated data manipulation.

  172. Jose_X said,

    October 6, 2008 at 12:46 pm

    In the interest of fairness, I’ll post a link to the current ODF 1.1. I think focusing on ODF is smarter than helping Microsoft hunt down its bugs with OOXML. Sure, it’s more fun to help debug OOXML or find gotchas, but this is not always a desirable exercise if you care about ODF adoption being taken up against the leverages Microsoft will use to help OOXML.

    From http://www.oasis-open.org/specs/

    We have the full standard on a single webpage. http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1-html/OpenDocument-v1.1.html [note, this is a large webpage]

  173. AlexH said,

    October 6, 2008 at 1:04 pm

    @Jose: note that current ODF versions don’t standardise formula stuff at all, so it’s not really relevant to them.

  174. Roy Schestowitz said,

    October 6, 2008 at 1:06 pm

    Well, this page, unlike the discussion, is not about formulas. It’s about formats as a whole.

  175. AlexH said,

    October 6, 2008 at 1:30 pm

    Given the whole premise of this argument is shaky at best (that ISO freely publish things which haven’t been, er, published..), that’s arguable.

    ISO aren’t exactly the place you’d want to go for standards documents. They charge €220 for ISO 26300…

  176. Jose_X said,

    October 6, 2008 at 5:21 pm

    An interesting plugin for OO.o would be a tool to help the user visually add semantics to a document, perhaps as “hints” or perhaps by selecting specific tags and filling in attributes (for say a paragraph or highlighted section of text, for a set of cells, for a set of images or image parts, etc). This would be a way to facilitate mappings from older formats to ODF in a visual manner without having to deal directly with XML. Maybe this is already on the way or exists.

  177. tuomoks said,

    October 6, 2008 at 10:32 pm

    First, I don’t like the way MS handled the ISO process, but that’s life: done and closed, if not forgotten. Let’s do better next time.

    Now, the whole date problem! Or other such problems. Nothing new. I once had to sort out what to do at an insurance company that needed to calculate dates 200 years back and 200 years into the future. There were already 152 (or was it 162?) different programs, procedures, methods, etc. that gave different answers, yet they were already used in many (18,000+) applications and by different databases for calculations! Talk about a nightmare: only two gave correct answers in every case(!), and one of them was SLOW! So I can understand the technical pain, but not the politics. Deal with it; it’s a fact no matter what you think.

    The whole problem as I see it is the current love of (new?) metalanguages. Why not just use SGML and enhance it? Clean, simple, proven, etc.? Much faster by design than anything that came after it? Add LaTeX and simple authentication, authorization, AES encryption, etc. to that, and together they would support any and all requirements we can think of today; even binary or interactive data would be no problem? Not invented here syndrome, once again? Besides, prove to me that metalanguages are better for computing than binary! Maybe for humans, but the computers do the work, and I’m not even sure of the benefits for humans: it is as easy to read hex, octal, whatever as ASCII, which actually covers a very small part of all the needed (human) languages. Try to read NLS-supported meta sometime. Back to interpretation and translation?

    Sorry, I’ve seen these fights too many times (probably?). They have nothing to do with “a better way to solve a problem” and everything to do with who’s in control, who makes the decisions, politics (and money) as usual. Nothing technically difficult, but (still?) understandable: IT is a young field and going through growing pains.

  178. Roy Schestowitz said,

    October 7, 2008 at 2:10 am

    Whose control though? There’s a fight for the perception that ODF is IBM, which is false. Does any company own or control HTML, for example?

  179. tuomoks said,

    October 7, 2008 at 3:30 am

    @Roy – you are right, there are too many misconceptions about who, what, etc. Nobody (not IBM, not Sun, not …) owns ODF; it is a standard. Now, nobody actually owns OOXML either, but it has this small problem: the extensions are not defined the way they should be in a standard. The problem with OOXML is those undefined extensions. They can include patented or otherwise proprietary methods, handling, etc., and once you use them you are stuck! No way out, the DMCA takes care of that; you pay, maybe retroactively and a lot, to one company, and no one else can help you! Or at least you can, with great pain, create an environment for conversions back and forth; fun, if it even works? The governments, institutions, large corporations, etc. follow the rules, and for a reason, so..

  180. Roy Schestowitz said,

    October 7, 2008 at 4:12 am

    OOXML is not Microsoft Office? That’s news to me.

  181. Roy Schestowitz said,

    October 7, 2008 at 4:13 am

    I should add: the fact that the documents presented in this post (OOXML) arrive from Microsoft, well… that speaks volumes. Microsoft is just using ISO as a ‘front’.

  182. tuomoks said,

    October 7, 2008 at 5:11 am

    OOXML (Office Open XML) is the standard Microsoft created, but it is not yet (if ever) implemented in any product, not even in MS Office! Just to clarify, a quote:

    “Microsoft originally developed the specification as a successor to its earlier binary and Office 2003 XML file formats. The specification was later handed over to Ecma International to be developed as the Ecma 376 standard, under the stewardship of Ecma International Technical Committee TC45. Ecma 376 was published in December 2006 and can be freely downloaded from Ecma International.

    An amended version of the format, ISO/IEC DIS 29500 (Draft International Standard 29500), received the necessary votes for approval as an ISO/IEC Standard as the result of a JTC 1 fast tracking standardization process that concluded in April 2008. Next and last step in the standardization process is the final publication as of ISO/IEC IS 29500, Information technology – Office Open XML formats as an international standard.”

    Yes, of course MS used (again?) ISO/whatever to force something but this time it may not work well. Seen the rumors that MS wants to take over ODF? Good for them, bad for people who let it happen – if they do, I hope not!

    Now, having worked in/with small and huge corporations, I can tell you: they think weirdly! Any company, even a small one, can participate, but for some reason they just refuse and take whatever is given? Huge corporations have their own problems: they are often slow to react, and internal fights prevent them from making decisions before it is too late, whatever. So, if MS can pull off the next “coup” and start managing ODF, as seems to have happened with some other OSS projects (amazingly many), good for them; instead of complaining, people should start working if they don’t like it.

    As I have said, I’m not a big MS fan but at least they react! In some small companies I have worked, the price of a VP lunch would have paid one year in standards committee, guess which one they select – they just keep complaining instead of making their own future! A weird world we have!

  183. Jeetje said,

    October 7, 2008 at 8:25 am

    AlexH, there are two ways to solve bugs in data that result from former (faulty) specs:
    1) Carry the same faulty specs over into new specs. We already know what that begets: a 5,000+ page ‘spec’ describing every fault ever made since the inception of the former spec(s), ironically dubbed OOXML.
    2) MAP every faulty spec to a correct spec and specify a mathematically correct algorithm to implement that mapping. The resulting correct spec will be a lot leaner and meaner / easier to implement, the ‘downside’ being that most mappings are not reversible (i.e. it’s a one-way street).

    The benefits of option 2 are manifold:
    a) New implementations of the correct spec aren’t burdened with the obligation to account for all possible faults, hence the resulting software will be small and fast.
    b) Converters from an old, faulty spec to the new, correct spec can be implemented separately, allowing for bulk conversion of old documents into new, cleaned-up documents.

    The biggest downside: The original manufacturer of the faulty software (based on its own faulty specs) is caught with his pants down and may very well lose a lot of business to people who ARE capable of keeping data accessible for decades to come.

    Basically, MS used a process akin to ISO 9000 series certification in the most perverse way possible, asking ISO to confirm that the way data has been handled since the inception of Word, Excel and so on is compliant with the spec they have now drawn up. From a business point of view, ISO had very little choice but to agree that the data is compliant with the specs, whereas from a technical PoV they should have rejected the whole spec as a waste of the trees used to produce the paper it was printed on.

    The right way forward is saying byebye to all the errors MS ever made in storing our data, the only corporation that is able to help us do that is MS itself, and if they don’t help us out quickly they may very well help themselves out of business pretty fast (considering how fast we are approaching a big recession, as MS Office still is a pretty poor value-for-money proposition).

  184. AlexH said,

    October 7, 2008 at 8:31 am

    @Jeetje: I agree with that. It’s just that option 2 isn’t available in this instance, as I have shown many times.

    This isn’t a recent problem, nor is it arguably MS’s problem. If it was so trivially fixable, Microsoft would have done it already – not least in the early days, since that would have caused added incompatibility with Lotus 1-2-3.

  185. Jose_X said,

    October 7, 2008 at 12:07 pm

    [Jeetje] >> 2) MAP every faulty spec to a correct spec and specify a mathematically correct algoritm to implement that mapping. The resulting correct spec will be a lot leaner and meaner / easier to implement, the ‘downside’ being that most mappings are not reversible (i.e. it’s a one-way street).

    [AlexH] >> I agree with that. It’s just that option 2 isn’t available in this instance, as I have shown many times.

    My two cents:

    It is not a problem to create a new item whose mapping from legacy is not well defined in all cases. This just means that legacy stays legacy, but the new can have a good, solid home.

    As one example, in the case of “dates” in formats that don’t have that type, it just means that you keep them as “numbers”, whether in the old format or the new, if you need to be conservative or want maximal flexibility. Where possible, you may migrate to date types. Also, new dates that are created will have their date type as well.

    As far as having many choices, eg, dates based on X or Y alg or reference point, that is a different issue. I like choice. I also like constraining choice for use cases (that’s what types do.. for particular use cases they limit the range of possibilities). So overall, I have no problem if odd date formats exist, but I like to have “profiles” or whatever you want to call it (eg, “portable documents”) where you will find a restricted well-defined environment. Judicious use of limits for well-defined scenarios is a plus.. but you also want an ample toolbox to be able ultimately to handle a great many scenarios.

    This brings up extensions and monopoly leverage. Extensions are good if used for good. They are bad (too few contract constraints) in the hands of someone that can and will abuse it, eg, via the embrace, extend, extinguish strategy.

    The best of both worlds is to recognize that monopolies and perhaps other types of players need special restrictions but the rest of us don’t (at least not yet). Reach monopoly status, and you graduate. The Microsoft clan should have left a long time ago and left Microsoft on cruise .. to one day be overtaken by others. Their existing power reach while still aboard Microsoft is unhealthy for the rest of us.
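
    Jose_X’s conservative approach to dates (keep them as numbers unless you can safely migrate them to a date type) can be sketched roughly as follows. Everything here is a hypothetical illustration: the `LegacyCell` type, the whitelist of format strings and the rule that only whole-day serials from 61 (1900-03-01) upward are migrated are assumptions, not part of ODF, OOXML or any real importer.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Union

# Hypothetical whitelist of display formats treated as unambiguous date hints.
DATE_FORMAT_HINTS = {"yyyy-mm-dd", "dd/mm/yyyy", "mm/dd/yyyy"}

# With this epoch, 1900-system serials from 61 (1900-03-01) onward map correctly.
EPOCH = date(1899, 12, 30)


@dataclass
class LegacyCell:
    value: float
    number_format: Optional[str] = None  # None when the legacy file gives no hint


def migrate_cell(cell: LegacyCell) -> Union[date, float]:
    """Return a typed date only when it is safe; otherwise keep the raw number."""
    is_date_hint = cell.number_format in DATE_FORMAT_HINTS
    is_whole_day = float(cell.value).is_integer()
    if is_date_hint and is_whole_day and cell.value >= 61:
        return EPOCH + timedelta(days=int(cell.value))
    # No clear date hint, a fractional value, or a serial in the
    # ambiguous early-1900 range: stay conservative and keep the number.
    return cell.value


print(migrate_cell(LegacyCell(39448.0, "yyyy-mm-dd")))  # 2008-01-01
print(migrate_cell(LegacyCell(39448.0)))                # 39448.0
```

    Anything the sketch cannot classify stays a plain number, so legacy stays legacy while new or clearly marked values get a proper date type.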

  186. Luc Bollen said,

    October 7, 2008 at 12:35 pm

    @AlexH: I come back to this discussion after a couple of days, and I did not read all the comments made since then. I admire you for being patient enough to continue the discussion.

    I would just like to say that I fully agree with your analysis: the .xls files have not enough information, in some cases, to reliably adjust the data for the 1900 bug.

    We only differ on the semantic analysis of the text contained in the OpenFormula spec: do they *standardise* the “1900 bug” or do they *document* it ?

  187. AlexH said,

    October 7, 2008 at 12:58 pm

    @Luc:

    I’m not sure what difference you see between standardising something and documenting something. At the end of the day, a standard is simply a documented specification for something.

    Does ODF mandate handling the leap-year bug? No; both ODF and OOXML have a specific date type for data which doesn’t suffer this problem. It only applies to importing legacy data.

  188. oliver said,

    October 7, 2008 at 1:28 pm

    Is there a Torrent available of these files? I’d like to have it mirrored locally before all public mirrors are shut down…

  189. e7o.de said,

    October 7, 2008 at 2:06 pm

    Stashing it away: OOXML documentation leaked…

    After the many irregularities that surfaced during the “standardisation” of OOXML, the standard itself has now turned up on the Net. Typically, the copyright club has been brought out, which is why the file can no longer be found in the blog entry:

  190. Jose_X said,

    October 7, 2008 at 3:08 pm

    >> Documentation leaked

    Reminds me of piracy.

    It’s all good for the vendor.

  191. rcfa said,

    October 7, 2008 at 4:20 pm

    Putting bugs into the standard is NOT acceptable.
    A reference to existing user data is not relevant.
    The standard, if it is worthy of the name, should include the document format version as part of the file format.
    Thus any legacy spreadsheets, etc. should have a format version lower than the first format version of the ISO standard. Converting legacy documents should then, where possible, make the required adjustments (dates), or warn the user (calculation issues).
    It’s not acceptable that astronomy and mathematics are redefined for all ages, just because some programmers decades ago weren’t able to think straight.
    There is NO ROOM for legacy bugs in a NEW STANDARD. The removal of these bugs must be part and parcel of the transition from some proprietary format to an international document standard, and the resulting transition will necessarily require similar care as the y2k issue.
    In neither case is it acceptable just to keep doing what was done in the past in order to avoid breaking backwards compatibility.
    People who don’t want these transition pains can stick with the old, proprietary document format.

  192. AlexH said,

    October 7, 2008 at 4:30 pm

    @rcfa: with that attitude, we’d be stuck with .xls forever more.

    Having a transition plan is the only way you can get people to upgrade to new formats like OpenDocument.

    It’s not technically nice, no. But it’s a practical necessity. A new format which people can’t upgrade to is of very little use to people who need to do real work.

  193. Roy Schestowitz said,

    October 7, 2008 at 4:48 pm

    And yet, bugs are being rewarded. Watch what Microsoft did to HTML/CSS… deliberately even.

    “We’re disheartened because Microsoft helped W3C develop the very standards that they’ve failed to implement in their browser. We’re also dismayed to see Microsoft continue adding proprietary extensions to these standards when support for the essentials remains unfinished.”

    –George Olsen, Web Standards Project

  194. AlexH said,

    October 7, 2008 at 4:54 pm

    @Roy: that is a quote dating back about eight years now, though.

    The Web Standards Project has had a Microsoft Task Force for a number of years now, which seems to be having a real effect.

    The web browser market has been changed massively by free software, and Microsoft are not in a position to ignore standards now. And if you want to see fewer places use Silverlight, you should be rooting for better standards support in IE, because without SVG/etc. you don’t have many other options – and it’s only IE behind in that area.

  195. Roy Schestowitz said,

    October 7, 2008 at 4:59 pm

    >> @Roy: that is a quote dating back about eight years now, though.

    Ah! That makes it OK. Let’s just forget all the crime where (age >= 2 years).

  196. Roy Schestowitz said,

    October 7, 2008 at 5:00 pm

    Just to clarify, I don’t compare it to crime in this case, but how often I hear this excuse about age when bringing up heaps of blatant crime!

  197. name required said,

    October 7, 2008 at 5:41 pm

    Download the specification (RAR version) from StealthNet.

    stealthnet://?hash=6AED03BB4BA2B91393BB5E97E5CCA8F49BBF650BD33D7D59D446B4EAA4B10FE2A78528CAA3F48E00EDD075E6A014FD5AC924FDEEB7B4B3CF63ED88860437CE48&name=OOXML-ISO-standard-english_leaked-html-edition_october-2008-1080-boycottnovell.com.rar&size=164435005

    Use StealthNet for your P2P needs, and participate to make it larger and stronger and enrich it with your content.

    Don’t let these war- and money-mongers rule this planet and enslave humanity any further.

  198. RJoe said,

    October 8, 2008 at 3:11 am

    Just my opinion on two or three of the issues involved here…

    First: How can the documentation of an ISO standard be secret? Is there no obligation to publish such a document???

    Second: A new standard should not implement errors of previous applications. The 1900 bug should not even have any effect on the OOXML-formatted data, because we are talking about a calendar date. This should be formatted as yyyy.mm.dd or something like that, not as a count of days from a specific date! If an application wants to be compatible with previous versions, it can rebuild that in its internal data.

    It’s a shame what happened in Norway these days. The officials from ISO don’t have any spine. Otherwise they would have rejected this document from MS.

  199. AlexH said,

    October 8, 2008 at 3:20 am

    @Roy: I’m not saying forget about it, I’m just pointing out that things have changed in the meantime. Anyone reading what you wrote might have been confused and thought it was a recent quote, when that is not the current outlook of the Web Standards Project.

    @RJoe: one of the things ISO has always done is charge money for paper standards. They’ve never been published openly except where another organisation also has their own copy (e.g., OpenDocument). That’s obviously something which ought to change.

    Your point about dates is correct, and you’ve actually pointed out how the modern date type basically works. But as I said previously, it’s not as simple as saying “just convert old data”, because you can’t. This hack will be with us for many years to come.
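
    As a rough illustration of the “modern date type” RJoe describes: in OpenDocument, a spreadsheet cell can declare `office:value-type="date"` and carry the date itself as text in `office:date-value`, rather than as a day count from an epoch. The minimal Python sketch below builds just one such cell element with `xml.etree.ElementTree`; the function name is hypothetical, and a real document would need the surrounding table, row and package structure.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace URIs used by OpenDocument for the office: and table: prefixes.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("table", TABLE_NS)


def odf_date_cell(d: date) -> ET.Element:
    """Build a single ODF table cell carrying a typed calendar date."""
    cell = ET.Element(f"{{{TABLE_NS}}}table-cell")
    cell.set(f"{{{OFFICE_NS}}}value-type", "date")
    # The date is stored as text (YYYY-MM-DD), not as a day count from an epoch.
    cell.set(f"{{{OFFICE_NS}}}date-value", d.isoformat())
    return cell


print(ET.tostring(odf_date_cell(date(2008, 10, 8)), encoding="unicode"))
```

    Because the value is a calendar date rather than a serial number, the leap-year question simply does not arise for data stored this way; it only matters when importing legacy serials.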

  200. Roy Schestowitz said,

    October 8, 2008 at 4:11 am

    RJoe,

    ISO was, in part, stuffed by Microsoft employees, so the decision to let this abomination happen was down to Microsoft, too. This impulsive thing was a response to corruption in the process where people got bullied, bribed, blackmailed. I thought that only the ‘non-finalisation’ of the text was the reason it was not out there. It’s surprising to find that so-called ‘open’ standards are not open even for access (an afterthought and a realisation that came to me only later, so I removed the files).

  201. Jeetje said,

    October 8, 2008 at 6:14 am

    @AlexH, Jose_X: I partly agree, partly disagree with the both of you as far as the ‘the right way forward’ for the year 1900 bug and similar issues is concerned ^^

    I’m on the same page as Jose as far as choice is concerned for using X or Y alg or reference point; however, as Rob Weir showed in his piece regarding the YEARFRAC function (http://www.robweir.com/blog/2008/05/fractured-yearfrac-and-discounted-disc.html), those algorithms and reference points need to be unambiguously defined lest we run the risk of crashing another bank or Mars lander ^^

    And if we have two well-defined algorithms with associated reference points, it’s a trivial exercise to specify a mathematical mapping from the faulty one to the correct one AS LONG AS no single point in the faulty spec’s space maps onto multiple points of the correct space. If the latter case occurs, context will need to be taken into account to try to estimate the correct mapping, and as with all algorithms that take context into account, the best judge of the final result will probably be a human.

    The bigger question though is: how many documents CANNOT be mapped automatically, i.e. need context and maybe human intervention to correct any errors?

    However, SC 34 is still muddying the waters regarding the future spec unifying ISO 29500 and 26300, diluting that process with the simultaneous task of ensuring that the mapping of legacy MS documents to the new format will be relatively painless for MS (i.e. NOT aiming for the best possible unified format for the next couple of decades). A number of countries encompassing a sizable portion of the globe’s population have already stated that their preferred document format is ODF, so if SC 34 doesn’t cut away all legacy fluff from ISO 29500 and strive for unification by the end of 2009, its efforts will become wholly irrelevant. And that would definitely be a shame, as that committee is about the only forum outside MS that is able to draw up mappings from faulty specs to correct specs at all…

    First things first:
    1) A (mathematically correct) unified document format by the end of 2009
    2) see 1
    3) see 1
    4) As soon as 1 has been developed, spawn X workgroups to help out with conversion algorithms from legacy to unified.

  202. AlexH said,

    October 8, 2008 at 6:33 am

    @Jeetje:

    I think you actually raise two different problems. The “leap year” bug is a very specific and quite unique issue, in that it’s basically impossible for software to “fix” spreadsheets. The best approach so far is to put the standard (legacy) epoch back one day into 1899, so that the values are 99% correct without the need for any conversion; only people with spreadsheets that care about days in 1900 will experience problems. That’s sound engineering.

    The other issues, like YEARFRAC, are where OOXML is not soundly specified enough. I think this is just competition in action: one early advantage of OOXML was that it went much deeper than the OpenDocument specification, and this was touted as a benefit. Now, the boot is somewhat on the other foot, because OpenDocument is reaching the same depths but at a greater level of detail.

    We’re sadly still in the same situation of copying what Excel does, but that’s because this is really user interface. Any change here impacts users, not the vendors.
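
    AlexH’s “move the epoch back one day into 1899” approach can be checked with a few lines of arithmetic. This is a minimal Python sketch under that single assumption (epoch 1899-12-30); the function names are hypothetical. It computes serial numbers as plain day counts from the shifted epoch and shows where they agree with the legacy 1900 system.

```python
from datetime import date, timedelta

# Epoch moved back one day into 1899, as described in the comment above.
EPOCH_1899 = date(1899, 12, 30)


def date_to_serial(d: date) -> int:
    """Day count from the 1899-12-30 epoch."""
    return (d - EPOCH_1899).days


def serial_to_date(n: int) -> date:
    """Inverse of date_to_serial."""
    return EPOCH_1899 + timedelta(days=n)


# From 1900-03-01 onward these serials agree with the legacy 1900 system,
# so old numbers need no conversion. Before that date they differ by one day.
for d in (date(1900, 1, 1), date(1900, 2, 28), date(1900, 3, 1)):
    print(d.isoformat(), date_to_serial(d))
```

    Under the legacy system, 1900-01-01 is serial 1 and 1900-02-28 is serial 59, whereas the shifted epoch gives 2 and 60; from 1900-03-01 (serial 61) onward the two agree. That is the trade-off described above: only spreadsheets that care about January and February 1900 are affected.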

  203. Pedro Gimeno said,

    October 8, 2008 at 6:51 am

    @AlexH:

    >> @rcfa: with that attitude, we’d be stuck with .xls forever more.

    Wouldn’t that be .wks instead?

  204. AlexH said,

    October 8, 2008 at 6:56 am

    @Pedro: well, precisely :)

  205. rcfa said,

    October 8, 2008 at 8:26 am

    @AlexH&Pedro: No, we wouldn’t be stuck with .xls/.wks forever. The transition might take a bit longer, and it might be a bit more painful, but we’d end up with fixed software (and spreadsheets are software, too).

    The y2k issue was neither quick nor cheap; it was what you’d call “paying for past sins”. The same needs to happen with these date and calculation bugs. Simply defining a bug as a standard is as ridiculous as redefining the meaning of noon during “summer time” (there’s no such thing, because noon is when the sun is highest, not when a bunch of politicians decide it is).

    The reason I bring up summer time is not accidental: instead of having summer and winter HOURS (as in opening or business hours), the government decides to “cheat” everyone by redefining an astronomical event. It could just as easily mandate that school and government office hours start one hour earlier in summer, and more or less the rest of the economy would follow suit (working parents have to bring kids to school, businesses want to sell to government, etc.)

    That would be the right approach.
    It seems that getting things done right doesn’t count anymore, only slop counts, as long as it “gets done, who gives a f* how it gets done”. And it’s that attitude that creates that sort of mess in the first place.

    If you screw up, you have to pay for it. You can pay now, or a lot more later. The price just goes up the longer you wait.

    So the point of a quick transition is completely lost if the transition doesn’t fix the legacy issues in the process. I would rather see a much slower transition and adoption, but be able to count on there being no dead legacy dogs buried in new documents.

  206. oliver said,

    October 8, 2008 at 12:20 pm

    > This hack will be with us for many years to come.

    So if _that_ is already given – what is the plan to get rid of the hack in the long run? I mean, even if I accept that this hack can’t be fixed _now_, can I at least expect that people are working to completely fix this over the coming years? Or did you actually mean to say “This hack will be with us for as long as Microsoft is in business”?

  207. rcfa said,

    October 8, 2008 at 12:37 pm

    @oliver: what they must mean: “This hack will be with us for as long as nobody has the guts to ratify a standard that’s worth the name standard.”
    Bugs are there to be squashed, not to be elevated to a standard. What’s next, are we going to redefine Pi as the integer 3?

  208. AlexH said,

    October 8, 2008 at 12:52 pm

    @oliver:

    It will die over time as people move to typed spreadsheet formats. At some point, probably in five years or so, the feature will get dropped from the specs, and later the apps will stop supporting it.

    There’s not really a huge amount of point removing stuff from the specification while it’s still in use by users and has to be supported by applications. That’s one reason why HTML5 is a lot more promising than XHTML2: in fact, XHTML2 is almost the case study in why technical perfection does not work.

What Else is New


  1. Gradual Collapse of Microsoft's Extensive (and External) Patent Trolling Operations

    The President of Microsoft Technology Licensing LLC (patent troll) leaves and the founder of Intellectual Ventures, Microsoft's largest peripheral patent troll, joins Sherpa Technology



  2. No End to Battistelli's Witch-hunts Against the Media, Against Staff, and Against Politicians

    Rumours about the fate of people who are (or have been) criticising Battistelli's reign of terror at the EPO



  3. Links 10/1/2017: Synfig 1.2, Kodachi Linux 3.7

    Links for the day



  4. With Help From the US Supreme Court (Key Cases), Patent Trolls Are Going Away

    The demise of patent trolls in the United States, a trend partly attributable to Alice and other Supreme Court decisions, will likely accelerate soon (later this year) as the future of the Eastern District of Texas courts is at stake



  5. Patent Maximalism on Display: Patent Aggressor IBM Celebrated in the Media

    The patent lust at IBM, which is suing if not just shaking down companies using software patents, earns plenty of puff pieces from the corporate media



  6. FFPE-EPO, the EPO Management's Pet/Yellow Union, Helps Union-Busting (Against SUEPO) in Letter to Notorious Vice-President

    In a letter to Elodie Bergot (as CC) and Željko Topić, who faces many criminal investigations, FFPE-EPO ringleaders reveal their allegiance not to EPO staff but to those who perpetually attack the staff



  7. Links 9/1/2017: Civilization VI Coming to GNU/Linux, digiKam 5.4.0 Released

    Links for the day



  8. Links 9/1/2017: Dell’s Latest XPS 13, GPD Pocket With GNU/Linux

    Links for the day



  9. Update on Patent Trolls and Their Enablers: IAM, Fortress, Inventergy, Nokia, MOSAID/Conversant, Microsoft, Intellectual Ventures, Faraday Future, A*STAR, GPNE, AlphaCap Ventures, and TC Heartland

    A potpourri of reports about some of the world’s worst patent trolls and their highly damaging enablers/facilitators, including Microsoft which claims that it “loves Linux” whilst attacking it with patents by proxy



  10. Mark Summerfield: “US Supreme Court Decision in Alice Looks to Have Eliminated About 75% of New Business Method Patents.”

    Some of the patent microcosm, or those who profit from the bureaucracy associated with patents, responds to claims made by Techrights (that software patents are a dying breed in the US)



  11. Eight Wireless Patents Have Just Been Invalidated Under Section 101 (Alice), But Don't Expect the Patent Microcosm to Cover This News

    Firms that are profiting from patents (without actually producing or inventing anything) want us to obsess over and think about the rare and few cases (some very old) where judges deny Alice and honour patents on software



  12. 2017: Latest Year That the Unitary Patent (UPC) is Still Stuck in a Limbo

    The issues associated with the UPC, especially in light of ongoing negotiations of Britain's exit from the EU, remain too big a barrier to any implementation this year (and probably future years too)



  13. Links 7/1/2017: Linux 4.9.1, Wine 2.0 RC4

    Links for the day



  14. India Keeps Rejecting Software Patents in Spite of Pressure From Large Foreign Multinationals

    India's resilience in the face of incredible pressure to allow software patents is essential for the success of India's growing software industry and more effort is needed to thwart corporate colonisation through patents in India itself



  15. Links 6/1/2017: Irssi 1.0.0, KaOS 2017.01 Released

    Links for the day



  16. Watchtroll a Fake News Site in Lobbying Mode and Attack Mode Against Those Who Don't Agree (Even PTAB and Judges)

    A look at some of the latest spin and the latest shaming courtesy of the patent microcosm, which behaves so poorly that one has to wonder if its objective is to alienate everyone



  17. The Productivity Commission Warns Against Patent Maximalism, Which is Where China (SIPO) is Heading Along With EPO

    In defiance of common sense and everything that public officials or academics keep saying (European, Australian, American), China's SIPO and Europe's EPO want us to believe that when it comes to patents it's "the more, the merrier"



  18. Technical Failure of the European Patent Office (EPO) a Growing Cause for Concern

    The problem associated with Battistelli's strategy of increasing so-called 'production' by granting in haste everything on the shelf is quickly being grasped by patent professionals (outside EPO), not just patent examiners (inside EPO)



  19. Links 5/1/2017: Inkscape 0.92, GNU Sed 4.3

    Links for the day



  20. Links 4/1/2017: Cutelyst 1.2.0 and Lumina 1.2 Desktop Released

    Links for the day



  21. Financial Giants Will Attempt to Dominate or Control Bitcoin, Blockchain and Other Disruptive Free Software Using Software Patents

    Free/Open Source software in the currency and trading world promised to emancipate us from the yoke of banking conglomerates, but a gold rush for software patents threatens to jeopardise any meaningful change or progress



  22. New Article From Heise Explains Erosion of Patent Quality at the European Patent Office (EPO)

    To nobody's surprise, the past half a decade saw accelerating demise in quality of European Patents (EPs) and it is the fault of Battistelli's notorious policies



  23. Insensitivity at the EPO’s Management – Part V: Suspension of Salary and Unfair Trials

    One of the lesser-publicised cases of EPO witch-hunting, wherein a member of staff is denied a salary "without any notification"



  24. Links 3/1/2017: Microsoft Imposing TPM2 on Linux, ASUS Bringing Out Android Phones

    Links for the day



  25. Links 2/1/2017: Neptune 4.5.3 Release, Netrunner Desktop 17.01 Released

    Links for the day



  26. Teaser: Corruption Indictments Brought Against Vice-President of the European Patent Office (EPO)

    New trouble for Željko Topić in Strasbourg, making him yet another EPO Vice-President on shaky ground and paving the way to managerial collapse/avalanche at the EPO



  27. 365 Days Later, German Justice Minister Heiko Maas Remains Silent and Thus Complicit in EPO Abuses on German Soil

    The utter lack of participation, involvement or even intervention by German authorities serves to confirm that the government of Germany is very much complicit in the EPO's abuses, by refusing to do anything to stop them



  28. Battistelli's Idea of 'Independent' 'External' 'Social' 'Study' is Something to BUY From Notorious Firm PwC

    The sham that is the so-called 'social' 'study', as explained by the Central Staff Committee last year, well before the results came out



  29. Europe Should Listen to SMEs Regarding the UPC, as Battistelli, Team UPC and the Select Committee Lie About It

    Another example of UPC promotion from within the EPO (a committee dedicated to UPC promotion), in spite of everything we know about opposition to the UPC from small businesses (not the imaginary ones which Team UPC claims to speak 'on behalf' of)



  30. Video: French State Secretary for Digital Economy Speaks Out Against Benoît Battistelli at Battistelli's PR Event

    Uploaded by SUEPO earlier today was a video which shows how last year's party (actually 2015) was spoiled for Battistelli by the French State Secretary for Digital Economy, Axelle Lemaire, echoing the French government's concern about union busting etc. at the EPO (only to be rudely censored by Battistelli's 'media partner')



RSS Feed: subscribe to the RSS feed for regular updates

Site Wiki: You can improve this site by helping to extend the site's content

Site Home: Background about the site and some key features on the front page

IRC Channel: Come and chat with us in real time


Recent Posts