OOXML Leaked: The Stuff ISO Doesn't Want You to Have (Updated x9)

Dr. Roy Schestowitz

2008-10-03 01:47:11 UTC
Modified: 2008-10-07 14:08:01 UTC

[Update: Marius has produced this HTML version which is easiest to browse and requires no large-sized downloads. Another reader, Tony Manco, has produced this HTML version (another mirror... and another) of the core of OOXML so that you can access the specs quickly.]

In light of the systematic abuse and the demise of ISO, which IBM loudly protested against [1, 2], we shall no longer let this process remain secretive. We finally have complete copies of the documents which the shenanigans keep behind passwords (unlike ODF which they attack). This includes 6 files, namely:

[Note: appended at the bottom of this post we now have 1081c, 1082c, and 1083c.]

[Note #2: we now have a mirror listed at the bottom.]

For those who forgot the opposition to ISO's bad behaviour, here is another new article about IBM's action.

In a recent announcement IBM said that it would reconsider its membership in the hundreds of bodies that create global standards for everything from software to servers.

Another article says that "IBM Nixes Standards Shenanigans" and further to the exodus in Norway we also have Glyn Moody's take.

A little while back I noted a provocative call from IBM for standards bodies to do better – a clear reference to the ISO's handling of OOXML. Here are some other people who are clearly very unhappy with the same: 13 members of the Norwegian technical committee that actually took part in the process.

[...]

This particular saga is only just beginning...

Feel free to pass around (or even ridicule) those ~60 megabytes of lock-in, which Microsoft won't let you see. This probably still contains many of the known flaws, which stayed intact, awaiting and even deserving scrutiny.

Update (03/10/2008): we've just added 1081c, 1082c, and 1083c.

Update #2 (04/10/2008): this Web server sporadically goes down due to heavy load (over 10 GB of traffic today, plus lots of CPU and RAM). We've made a mirror available, so please use it instead, if possible.

Update #3 (04/10/2008): we now have an HTML version of the core of OOXML, but please use this mirror (HTML), which should be faster.

Update #4 (04/10/2008): the first mirror was downed by the load (thousands of OOXML pages combined with the Slashdot effect can do that), so here is a second mirror. If it's down as well, come back later when there's less hammering on the servers.

Update #5 (04/10/2008): third mirror of the HTML version, just in case.

Update #6 (04/10/2008): here is a mirror of the PDF (1080.pdf).

Update #7 (05/10/2008): here is a much better HTML version of OOXML (1080). We will have another one soon, but it comprises over 11,000 files, so this may put strain on the server.

Update #8 (06/10/2008): now that the load on the server has declined somewhat (tens of gigabytes in days), we decided that it's safe to upload this graphics-rich HTML version of 1080 (comprising over 11,000 pertinent files).

Update #9 (07/10/2008): due to legal intimidation from ISO or its cronies, we have removed OOXML (also from the mirrors).

Comments

Mike Brown

2008-10-03 05:01:05

5,500+ pages. This is from page 2304:

"For legacy reasons, an implementation using the 1900 backward compatibility date base system shall treat 1900 as though it was a leap year. [Note: That is, serial value 59 corresponds to February 28, and serial value 61 corresponds to March 1, the next day, allowing the (non-existent) date February 29 to have the serial value 60. end note] A consequence of this is that for dates between January 1 and February 28, WEEKDAY shall return a value for the day immediately prior to the correct day, so that the (non-existent) date February 29, 1900, has a day-of-the-week that immediately follows that of February 28, and immediately precedes that of March 1, 1900."

Really, you couldn't make this stuff up.
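
The rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names below are hypothetical, the weekday numbering (1 = Sunday through 7 = Saturday) is assumed, and nothing here is taken from any actual implementation.

```python
from datetime import date, timedelta

def legacy_serial_to_date(serial):
    """Map a serial value in the 1900 date base to a real calendar date.

    Returns None for serial 60, the (non-existent) February 29, 1900.
    Hypothetical helper, for illustration only.
    """
    if serial < 1:
        raise ValueError("serial values start at 1 in this sketch")
    if serial == 60:
        return None
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # Past the fictitious leap day, every serial is one higher than it
    # would be on a correct calendar.
    return date(1899, 12, 30) + timedelta(days=serial)

def legacy_weekday(serial):
    """Weekday counted straight from the serial (assumed 1=Sunday ... 7=Saturday),
    so the fictitious day 60 gets a weekday of its own."""
    return (serial - 1) % 7 + 1

def true_weekday(d):
    """Real weekday of a date in the same 1=Sunday ... 7=Saturday numbering."""
    return d.isoweekday() % 7 + 1

for serial in (1, 59, 60, 61):
    d = legacy_serial_to_date(serial)
    real = true_weekday(d) if d else None
    print(serial, d, legacy_weekday(serial), real)
```

Under these assumptions the two weekday columns disagree by one day for serials 1 to 59 and agree from serial 61 onward, which is the behaviour the quoted paragraph describes.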
Roy Schestowitz

2008-10-03 07:25:18

Wonderful. Software bugs (Microsoft Office) are part of "the standard", which rather than being fixed are just lumped in with the rest of the pile of bugs.

Maybe OOXML should also explicitly state that 850 * 77.1 = 100,000.

http://www.downloadsquad.com/2007/09/25/excel-2007-cant-do-math-unless-850-77-1-100-000/
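
For reference, the arithmetic behind the joke is easy to check. The snippet below is a minimal sketch in Python, not a reproduction of any spreadsheet's internals: it compares exact decimal arithmetic with ordinary binary floating point for the same product.

```python
from decimal import Decimal

# Exact decimal arithmetic on the two factors.
exact = Decimal("850") * Decimal("77.1")

# The same product in IEEE 754 double precision, as most software computes it.
binary = 850 * 77.1

print(exact)                 # the exact product
print(repr(binary))          # what a binary double actually holds
print(abs(binary - 65535))   # size of the rounding error, if any
```

Either way the result sits at or within rounding error of 65,535, nowhere near 100,000.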
AlexH

2008-10-03 08:19:36

Actually, bugs are part of the standard if the standard is already out there.

In the case of spreadsheet data, having an app re-interpret the data as something different is clearly, definitely, and obviously wrong. "Correctness" doesn't matter if "fixing" it actually breaks user data.

Come on, there are better criticisms of OOXML than its legacy support....
Darren

2008-10-03 10:12:32

Hang on, OOXML (ISO/IEC 29500:2008) has NO legacy support as there are NO apps that currently implement it. This being the case, there should not be any bugs left in there like this. Even MS has no roadmap of when they will support the ISO "standard".
DanielHedblom

2008-10-03 10:16:42

@AlexH

That's why nobody besides Microsoft wanted this "standard" to go through the fast track process. It is/was badly broken, unspecified, impossible to implement and really a pure pile of manure.

The "standard" is just a dump of how one specific implementation of a document format works, bugs and all. That's so wrong that it's not even funny.

That the standard contains bugs and that the only halfway implementation contains piles of bugs is actually the best argument against it there could ever be.
Roy Schestowitz

2008-10-03 10:22:35

AlexH, what's with the Microsoft apologism again? Are you again going to take Microsoft's side with spin?
AlexH

2008-10-03 10:27:47

@Darren: the file format isn't, but the data is. This isn't a file format issue, this is a data issue.

@Roy: it's not "apologism". Free software implements this same bug as well, because it makes spreadsheets actually work. If we broke people's spreadsheets, that would rightly make them angry.
Roy Schestowitz

2008-10-03 10:33:46

Issues need to be mended, not reckoned with. I fail to understand your logic.
AlexH

2008-10-03 10:37:42

@Roy: the point is, you can't just "mend" this issue. If you change the way the software interprets the formula, the data comes out different, and often different is wrong. You can't automatically fix up the data because spreadsheets do not have a concept of type, only of formatting.

OpenDocument 1.2 is going to standardise the exact same bug that you deride OOXML for, and I'm sure Microsoft will somehow catch the blame for that as well. However, it's just not that simple a problem: you can't play fast and loose with people's existing spreadsheets because this is not a file format issue.
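
One way to picture the argument above, as a minimal sketch: assume, hypothetically, that each cell holds a bare number plus a display format. The `Cell` class and the `naive_fix` function are invented for illustration and are not taken from any real spreadsheet or file format.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    value: float   # what is stored: just a number
    fmt: str       # how it is displayed: "date" or "general"

def naive_fix(cells):
    """Hypothetical 'repair': shift every date-formatted serial past the
    fictitious 1900-02-29 (serial 60) down by one."""
    return [
        Cell(c.value - 1, c.fmt) if c.fmt == "date" and c.value > 60 else c
        for c in cells
    ]

sheet = [
    Cell(39724, "date"),      # a date, displayed as a date
    Cell(39724, "general"),   # the same date, displayed as a plain number
    Cell(39724, "general"),   # an unrelated quantity that happens to be 39724
]

fixed = naive_fix(sheet)

# The second and third cells are indistinguishable, so the repair cannot
# know which one to adjust; a difference that used to be 0 is now -1.
print(sheet[1] == sheet[2])
print(fixed[0].value - fixed[1].value)
```

In this toy model the display format is the only hint that a number is a date, so any automatic adjustment either misses dates shown as numbers or would have to touch numbers that are not dates.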
Roy Schestowitz

2008-10-03 10:49:17

I thought you were referring to the calculation bugs and the leap year.

Anyway, you used similar logic to justify Microsoft's disobeying of Web standards.

http://boycottnovell.com/2008/09/13/microsoft-admitted-mono-trap/#comment-24236
AlexH

2008-10-03 10:59:52

I am referring to the calculation bugs.

My point is you can't say "it's calculating it wrong therefore all existing spreadsheets must be wrong": many of the people who care will have adjusted for that bug already, and correcting the bug will actually silently wreck existing data.

And, no, my logic on web standards was completely different. Not least because Microsoft were following web standards, and even though I asked you many times what they should be doing, you had no answer. You like to bash them no matter what they do, which is fine, but trying to pretend like you have a good reason is a sham.
AlexH

2008-10-03 11:00:52

.. and anyway, if you don't like it in OOXML, I suggest you get onto office-formula TC at OASIS and ask them to remove it, because they're putting the same thing into OpenDocument, for the exact same very good reasons.
Roy Schestowitz

2008-10-03 11:22:07

"Very good reasons"? Deliberately accepting bugs is a good reason? Or is it Microsoft's unwillingness to get its act together? It's feet-dragging.

Same with the Web by the way. Microsoft had almost a decade to fix its problems, but it didn't until it lost market share.
AlexH

2008-10-03 11:44:10

I've outlined the reason. If you don't think it's a good one, that's your call, but the vendors of office suites disagree with you.

If we were talking about the ugly text runs that OOXML does, that would be one thing. But we're not talking about the file format in any way here - we're talking about user data. That's totally and utterly different, and I fail to see why you can't grasp that.

And since you brought up the web thing again, do you want to outline what action you think Microsoft should have taken? Or are you still pleading the 5th on that?
DanielHedblom

2008-10-03 11:59:18

Would be nice to be able to moderate away astroturfers like AlexH. Paid shills have no place here.
AlexH

2008-10-03 12:05:43

@DanielHedblom: please don't make personal accusations that are known to be untrue.

The fact that I have a different opinion to other people here doesn't make me a "shill", paid or otherwise.
Roy Schestowitz

2008-10-03 12:09:45

AlexH, I remember many other things you wrote here in the comments about OOXML, including your defense of the actual process.

How can one be so blind? http://boycottnovell.com/ooxml-abuse-index/
Roy Schestowitz

2008-10-03 12:11:26

"And since you brought up the web thing again, do you want to outline what action you think Microsoft should have taken? Or are you still pleading the 5th on that?"

That's like asking how to handle a criminal that expresses remorse. The reasonable thing to do is to jail it.
AlexH

2008-10-03 12:16:37

Er, no, if you remember, I didn't defend the process: what I said was that nobody should be surprised by the process. You cannot be shocked that corporates have large sway in bodies that are funded by, er, corporates.

It has always been the same with ISO, and it will continue to be the same with ISO, because that is what ISO's members and funders want. People who think ISO is irrelevant simply don't understand what it does; it has always been this ugly.
AlexH

2008-10-03 12:20:09

"And since you brought up the web thing again, do you want to outline what action you think Microsoft should have taken? Or are you still pleading the 5th on that?"

"That’s like asking how to handle a criminal that expresses remorse. The reasonable thing to do is to jail it."

No, it's nothing like that. You're accusing Microsoft of working against web standards in this case of vendor extensions. I've pointed out numerous times that a. it's in the standard, and b. other standards-compliant browsers do the exact same thing.

I'm not going to defend Microsoft's abysmal support for web standards, but in this instance you're simply wrong.
Roy Schestowitz

2008-10-03 12:25:20

"...it has always been this ugly."

Ha! The classic "they are as evil as us" excuse that Microsoft has mastered (against Apple, Google, IBM, etc). You're doing it again.

http://boycottnovell.com/2008/04/05/microsoft-ibm-epa-proxy/
AlexH

2008-10-03 12:40:51

Yet again you accuse me of making an argument I'm not making.

I'm not excusing them or defending them, as I keep saying and I wish you'd actually listen.

I'm pointing out that it's not a surprise that they act that way based on past history, and that anyone who thought they would behave differently is being naive.

To put it simplistically, I wouldn't defend a man who beats his wife, but I wouldn't be surprised that he beat her tonight if he's beaten her every night in the past week. (Not that I am equating ISO in any way with domestic violence, which is an extremely serious subject).

It's really not that difficult to understand the difference between those two positions, particularly for someone as educated as yourself.
Roy Schestowitz

2008-10-03 12:47:00

Microsoft has made attempts to mock ISO's integrity and also pretended to be foolish. It's nothing new:

Microsoft: We were naïve about standards. No, really!

"Microsoft was also present at IETF meetings around that time, and was enthusiastically gaming the system. I remember one Microsoft attorney with three assistants who were each feeding "audience" questions at the attorney's direction.

"Organizations like Sun, which ran a large standards department, were tremendously concerned with Microsoft's attempts to game the system at the time.

"Microsoft is no newcomer to the standards business. Protests otherwise on their behalf are insincere."

http://technocrat.net/d/2008/6/23/44269

To say that the system was always dysfunctional is a self-serving stretch. Mind you, it was Redmond's own press that presented an interview about C++'s standardisation, which required no manipulation.

Nothing like OOXML (and Microsoft) has ever hit ISO, so let's not become revisionists.
AlexH

2008-10-03 12:53:28

Um, Microsoft have been heavily involved in ISO for years, it's not like they just "hit ISO".

I suggest you do some more research on how ISO operate, who funds them, and how they've handled software stuff in the past.
Roy Schestowitz

2008-10-03 13:15:11

Care to educate me? I did do some reading.

Be specific.
Luc Bollen

2008-10-03 13:19:25

@AlexH: There was NO user "data with the 1900 bug" in OOXML format at the time MS released the OOXML spec. The existing "data with the 1900 bug" was only in the binary .xls format. It was therefore perfectly possible for MS to avoid the 1900 leap year bug in the OOXML specification.

A good indication of this is that ODF doesn't have this bug specified, and OOo is perfectly able to open .xls files and store the data in ODF format. The problem should be handled in the import filter, not in the format specification.

@Roy: You only published Part 1 of the spec (document N1080). Could you also publish the other parts (documents N1081, N1082 and N1083)?
Roy Schestowitz

2008-10-03 13:32:32

I'll upload these too in just a moment. The server is under stress that leads to errors, due to bandwidth (several gigabytes).

I'll update the post in a moment.
Andy

2008-10-03 13:52:01

So were those changes ordered by the BRM properly carried out by the editor? If not, time for defect reports.
AlexH

2008-10-03 14:14:56

@Roy: sure, look at their history in the C++ standardisation WG, or any number of the other WGs they have deep involvement in. They're one of the most common vendors.

@Luc: as I said, this isn't a format issue, this is a user data issue. Indeed, ODF 1.1 and previous editions didn't even address this syntax, because it's application-specific. The problem is that you can't just "convert" user data when you convert the file format, because spreadsheet data isn't typed and you can't know which numbers to adjust.

So, ODF "doesn't have this bug" is simply untrue: it left it unspecified, and ODF apps interpret things as they like (= compatible with Excel). ODF 1.2 will standardise this bug as well, so that apps that want to behave "compatibly" can do so.
Roy Schestowitz

2008-10-03 14:24:46

Please, Alex, do not make attempts to rewrite history.

http://reddevnews.com/blogs/weblog.aspx?blog=1203

"Speaking of theater, the IT industry got an eyeful when Microsoft admitted that one of its Swedish employees had offered monetary compensation to Microsoft partners in Sweden if they engaged in the proposal process and voted for the OOXML spec. Sweden invalidated its "yes" vote for OOXML and essentially abstained from the final voting.

"No surprise, broader accusations of ballot stuffing -- by way of getting dozens of companies to suddenly join the ISO voting bodies of individual nations -- abound.

"I asked Bjarne Stroustrup, the creator of the C++ programming language and a guy who has wended his way through the ISO ratification maze a few times himself, if he's ever seen this kind of chicanery in previous ISO votes.

""I have never heard of money changing hands in exchange for votes or anything equivalent," Stroustrup writes back. "I guess every process is vulnerable to political and economic pressures, but I have not personally seen or suspected anything like that in relation to C++.""
Luc Bollen

2008-10-03 14:27:02

Openformula (part of ODF 1.2) doesn't MANDATE the bug, as ECMA376 was doing. From http://wiki.oasis-open.org/office/About_OpenFormula

"Doesn't mandate mistakes. Just because one program gets something wrong doesn't mean that everyone should make the same mistake. The specification is carefully written to not require certain bugs, just because someone has a bug. For example, Excel incorrectly believes that 1900 was a leap year, and at least draft version 1.3 of the Excel specification claims that compatible applications must make the same mistake. Nonsense. Instead, OpenDocument wisely stores dates as dates (not just numbers), and thus does not require that applications have this bug. The Excel specification also requires that applications cannot be more capable than Excel (it doesn't permit support for dates before 1900). Again, nonsense. In fact, at least one OpenDocument spreadsheet application (OpenOffice.org Calc) can correctly calculate dates and date differences going back to 1583! Similarly, many applications handle complex numbers in a very clumsy way; we've devised the specification to make sure that future applications can support better approaches, instead of tying their hands to a technique known to be poor."
AlexH

2008-10-03 14:28:41

Roy, don't accuse me of doing something without quoting where you think I'm doing it.

My statement was that Microsoft have a long-standing and deep involvement in ISO. That statement is correct, your hand-waving notwithstanding.
AlexH

2008-10-03 14:36:03

@Luc: if you have an untyped number being used as a date, which is what current data is, what is an app going to do?

If it doesn't implement that bug, days are off by one. Great.

So, yes, it doesn't mandate, because the default formulas are typed. That's great for new data. It doesn't work for imported data, and that's why they're also standardising that bug in the specification.
Roy Schestowitz

2008-10-03 14:40:10

AlexH, I was referring to your attempt to throw out claims of Microsoft/OOXML scandals by painting others as "equally evil". You do this a lot. So does Microsoft.
Luc Bollen

2008-10-03 14:41:56

Here is what OpenFormula says about this (normative text):

"Implementations of formulas in an OpenDocument file shall use the epoch specified in the table-null-date attribute of the <table:calculation-setting> element, and shall support at least the following epoch values: 1899-12-30, 1900-01-01, and 1904-01-01.

Many applications cannot handle Date values before January 1, 1900. Some applications can handle dates for the years 1900 and on, but include a known defect: they incorrectly presume that 1900 was a leap year (1900 was not a leap year). Applications may reproduce the 1900-as-leap-year bug for compatibility purposes, but should not. Portable documents shall not include date calculations that require the incorrect assumption that 1900 was a leap year. Portable documents shall not assume that negative date values are impossible (many implementations use negative dates to represent dates before the epoch). Portable documents should use the epoch date 1899-12-30 to compensate for serial numbers originating from applications that include a 1900-02-29 leap day in their calculations."

I think we are far from "ODF 1.2 will standardise this bug as well".
AlexH

2008-10-03 14:43:43

@Roy: No, I don't do that "a lot", and I would thank you to either give me a citation or withdraw another baseless attack.

I already explained my position to you in very simple terms. I haven't defended the OOXML "scandals", nor have I defended ISO or Microsoft.

So please retract that comment.
AlexH

2008-10-03 14:45:35

@Luc:

Well, you already quoted the relevant text:

"Applications may reproduce the 1900-as-leap-year bug for compatibility purposes, but should not."

That standardises the bug, because it puts that behaviour in the standard.

No-one likes that behaviour, but it is important that it is in the standard, because you cannot convert legacy data correctly without it.
Luc Bollen

2008-10-03 14:54:42

@AlexH: "That standardises the bug, because it puts that behaviour in the standard."

No. The behaviour is not SPECIFIED in the standard. The standard simply acknowledges that applications may implement the bug.

And it is clear that OpenFormula doesn't standardise application behaviour, but only data format.
AlexH

2008-10-03 15:06:43

@Luc: untrue.

Implementations of formulas in an OpenDocument file shall use the epoch specified in the table-null-date attribute of the <table:calculation-setting> element, and shall support at least the following epoch values: 1899-12-30, 1900-01-01, and 1904-01-01.

The first epoch takes into account the leap year bug on PCs (and is the default in OOo 3), at the cost of incorrectly importing data referring to the first few months of 1900, and the last epoch is the Mac bug.
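
The trade-off described above can be sketched as follows. This is a minimal illustration, assuming that a serial is a plain day count from the epoch (serial 0 being the epoch date itself); the `serial_to_date` helper is hypothetical and not part of any specification.

```python
from datetime import date, timedelta

# The three epoch values named in the quoted text.
EPOCHS = {
    "1899-12-30": date(1899, 12, 30),
    "1900-01-01": date(1900, 1, 1),
    "1904-01-01": date(1904, 1, 1),
}

def serial_to_date(serial, epoch):
    """Interpret a serial as a day count from the given epoch
    (assumption: serial 0 is the epoch date itself)."""
    return EPOCHS[epoch] + timedelta(days=serial)

# Serials written by an application that counts a 1900-02-29 as serial 60:
# there, 59 was meant as 1900-02-28 and 61 as 1900-03-01.
late = serial_to_date(61, "1899-12-30")    # on or after 1 March 1900
early = serial_to_date(59, "1899-12-30")   # January or February 1900

# Distance between the 1899-12-30 and 1904-01-01 epochs, in days.
gap = (EPOCHS["1904-01-01"] - EPOCHS["1899-12-30"]).days

print(late, early, gap)
```

With the 1899-12-30 epoch, serials after the fictitious leap day land on the intended date without any special case, while serials from the first two months of 1900 come out one day early.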
Roy Schestowitz

2008-10-03 15:14:09

AlexH,

My statement stands. Moreover, not necessarily based on just this discussion in isolation, your claims/insinuation that nothing was amiss is defence of Microsoft, OOXML, and ISO.
AlexH

2008-10-03 15:16:47

@Roy: do you actually want to quote me something where I said nothing was amiss?

I think it's sad that you make idle accusations knowing you have no evidence.
Luc Bollen

2008-10-03 15:22:20

@AlexH: "at the cost of incorrectly importing data"

I agree with you: the standardised approach "incorrectly" implements the bug. In fact, it recommends a "best effort" approach.

So I maintain that the bug is not standardised in ODF 1.2, and I'm happy to close here our discussion about the "1900 bug", as you implicitly recognised you were wrong in your first statement.

However, could you explain what you mean by the "Mac bug" ???
Roy Schestowitz

2008-10-03 15:22:29

Look, Alex, I'll be brutally honest. I haven't the desire or patience to pin down particular examples, but I can very well recall you claiming that BSI did nothing wrong in reversing the vote for no reason... after Alex Brown and other 'Softies' intervened, stuffed or whatever else they can do in this secretive process that ended up on the desk of the British courts (lacked funding to be concluded).

Specifically, you claimed that Microsoft just had more friends than IBM, or something along those lines. You always underplay the abuses, which sometimes leads me to suspecting you're one of these FOSS people who were hired by Microsoft (we have them in the IRC channel).
AlexH

2008-10-03 15:32:20

@Luc:

I think you misunderstand. 1.2 very much says that you can implement the bug. The 1899 "best effort" approach means that you can apply that bug to those dates in the small affected range, as the standard says applications may do - that's the same behaviour as Excel. So my first statement was in fact correct.

@Roy:

If you're not willing to defend accusations, then you shouldn't make them in the first place. I don't need to go into the reasons why that is morally wrong. I'm not going to address the rest of your pathetic insinuations, though.

Just to remind you, what I said about the BSI was that they were perfectly entitled to take the decision that they took, and that the legal challenge would go nowhere. And that's what happened: it didn't "lack funding to be concluded", it fell flat at the first hurdle and no-one was willing to spend more money on a goose chase.

The point remains the ISO's members - the nations - can take decisions on any basis they like. We might not like the conclusions that they arrive at, but they're entitled to make those decisions.

That's not a defence of them, it's a statement of fact. Let me put it in terms you might understand: are many people happy that Bush was elected in the US? And, did the electors in the US have the right to elect him?

Saying that they had the right to elect him doesn't mean that whatever happened in Florida was defensible.
Roy Schestowitz

2008-10-03 15:48:07

By that logic, Standard Norway did nothing wrong, either. Thanks for confirming that scandalous processes or decisions can be accepted based on the 'merit' of independent choice, where stuffed rooms, stolen votes and rule-bending is fine. That's the way I read it anyway and perhaps you didn't follow what had happened in BSI.
Luc Bollen

2008-10-03 15:54:53

@ AlexH

ODF 1.2 very much says that you can implement the bug, BUT SHOULD NOT.

If you want to consider this as being a standardisation of the bug, I'm afraid you are as stubborn as Roy, who makes far reaching conclusions from what you've said. ;-)
AlexH

2008-10-03 16:01:14

I don't know how many times I need to repeat this, but I didn't make a judgement on whether or not it was right or wrong.

All I said was that they have the right to make that decision.

Take Norway for an example, then. They dismissed the technical committee, and made a non-technical decision.

It wasn't exactly democracy in action. In that case it seems the org decided that it was more important to bring the standard into ISO than for the standard to be debugged.

It's obviously wrong if you think the decision should be made on technical grounds alone.
AlexH

2008-10-03 16:03:37

@Luc:

Sure, it says should not. But, it's still in the standard, so it's standardised.

Having buggy behaviour standardised is important. You don't want to copy it, but you do want to understand it so that when you do things like import spreadsheets, they continue to work and get the right results.

Most of the ODF apps have implemented all this stuff already anyway, because if you're not Excel compatible then you're not usable.
Roy Schestowitz

2008-10-03 16:04:51

"I don’t know how many times I need to repeat this, but I didn’t make a judgement on whether or not it was right or wrong."

That's just a convenient waiver for you, is it not? Like other techniques that include casting "ODF" as "IBM" or "it's just as bad/evil as X".

I'm not buying it.
Roy Schestowitz

2008-10-03 16:06:35

"Most of the ODF apps have implemented all this stuff already anyway, because if you’re not Excel compatible then you’re not usable."

And again... it sounds like Redmond Kool-Aid. You're behaving as though it's better to mimic Microsoft.
Luc Bollen

2008-10-03 16:09:03

@AlexH

It's not standardised, it is documented. Having buggy behaviour DOCUMENTED is important.
AlexH

2008-10-03 16:21:17

@Roy:

"That’s just a convenient waiver for you, is it not? Like other techniques that include casting “ODF” as “IBM” or “it’s just as bad/evil as X”."

Yet again you make that accusation, yet again it's absolutely indefensible.

I'm not going to bother to explain the argument further, because you're just going to accuse me of that nonsense yet again, and I can't be bothered. Your style of straw man arguments is boring. Argue the points I make, not the ones I don't make.

@Luc: if it goes into a standard, it's standardised unless it's in a section marked informative.

I'm not sure why there is so much back and forth on this; OpenDocument is clear on this issue. This behaviour is allowed and standardised, because it's a real issue which affects spreadsheet users.

As I said way up there ^^, there are much better reasons to be against OOXML than the bits which make dealing with legacy data possible.
enquiring minds want to know

2008-10-04 03:09:20

and what the hell does this have to do with Novell?
jcwarrio0866

2008-10-04 04:16:26

@AlexH:

"I’m not sure why there is so much back and forth on this; OpenDocument is clear on this issue. This behaviour is allowed and standardised, because it’s a real issue which affects spreadsheet users."

Actually, the behavior you mention is NOT allowed.

Portable documents SHALL NOT include date calculations that require the incorrect assumption that 1900 was a leap year.
A committee member

2008-10-04 06:26:49

Hi, I've just done a quick comparison (with 'diff') of your files to the official files, and found that your OfficeOpenXML-WordprocessingMLArtBorders.zip file is corrupted. It has the right size and unzips ok, but after unzipping one of the resulting files (balloonsHotAir_bottomRight.png) is empty. That is the only difference after unzip.

I've done a diff of hexdump outputs, which shows that a block of 65536 consecutive bytes has been zeroed.
Roy Schestowitz

2008-10-04 07:41:45

A committee member,

I've just re-uploaded the file. It seems identical to what it was before, at least in terms of size. Since it comes from the source, it can't have been tampered with.
Pedro Gimeno

2008-10-04 10:32:59

I can confirm the corruption. unzip -t reveals it. The zipfile with md5 a33bb0c7f11ef63293ee4dfb6dbb986c is corrupt. At offset CF000 starts the 65536-byte zeroed block. I don't have the original, but by examination of the zipfile directory it seems that the incomplete/missing files are:

balloonsHotAir_bottomRight.png balloonsHotAir_left.png balloonsHotAir_right.png balloonsHotAir_top.png balloonsHotAir_topLeft.png
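
The checks described above (an archive test, an MD5 digest, and a search for a zeroed block) can be approximated with Python's standard library. This is a minimal sketch: `zipfile.ZipFile.testzip` and `hashlib.md5` are real standard-library calls, while the three helper names are hypothetical.

```python
import hashlib
import zipfile

def md5_of(path):
    """MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def first_bad_member(path):
    """Name of the first archive member that fails its CRC check, or None.

    Badly damaged compressed data may raise an exception instead of
    being reported by name.
    """
    with zipfile.ZipFile(path) as archive:
        return archive.testzip()

def find_zero_run(path, length=65536):
    """Offset of the first run of at least `length` zero bytes, or -1."""
    with open(path, "rb") as fh:
        data = fh.read()
    return data.find(bytes(length))
```

Called on OfficeOpenXML-WordprocessingMLArtBorders.zip, `md5_of` gives the digest to compare against the one reported above, and `find_zero_run` would be expected to return 0xCF000 if the copy matches that description.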
Roy Schestowitz

2008-10-04 11:04:47

The original from the Project Editor is identical. I checked to confirm that there was no error in transmission (the nodes), so the original should suffer from the same error.
Dan O'Brian

2008-10-04 12:15:03

jcwarrio0866: I can tell this is your first time reading a spec. That doesn't mean the format is not allowed; it's a warning to implementers that new documents that are meant to be portable should not use it.
A committee member

2008-10-04 13:26:46

You shouldn't check for file corruption by checking file sizes only. Your OfficeOpenXML-WordprocessingMLArtBorders.zip is the same size as the non-corrupted one which I got from my national standardization organization where I'm a committee member.
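
A small sketch of why size alone is not enough, using made-up byte strings rather than the real files: two inputs of identical length can still differ in content, and a digest comparison shows it.

```python
import hashlib

# Two made-up payloads of the same length; in the second, a block is zeroed.
good = b"\x89PNG" + b"\x01" * 65536
bad = b"\x89PNG" + b"\x00" * 65536

same_size = len(good) == len(bad)
same_content = hashlib.sha256(good).digest() == hashlib.sha256(bad).digest()

print(same_size, same_content)
```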
Roy Schestowitz

2008-10-04 13:57:26

"A committee member",

That's a fair point that I agree with. Just to shed light on this, I have no doubt that the files have not been tampered with because they were obtained directly from the source (twice even). It is possible that the discrepancy you claim to be aware of occurred somewhere along a different route. I have no explanation for it, I'm afraid.
A committee member

2008-10-04 15:54:02

@Roy: Thanks a lot for clarifying who gave you the copies of the files. I was afraid that the file corruption might have been a watermark and that someone who leaked you the files might be getting in trouble now. However, I don't think that the Document Editor is truly the original source for the zip file of graphics. I think he probably received that from Microsoft and he didn't have any instructions from the BRM to modify that in any way. Maybe someone at the ISO/IEC "ITTF" (which has the task of checking standards for formal correctness) noticed that the file is corrupted and managed to get a corrected file from Microsoft. This might help explain the long delay between completion of the editing work and the ISO/IEC internal distribution of the document.

@Pedro: Your list of filenames is exactly correct. My earlier assertion about only one file being affected was wrong.
Michael J

2008-10-04 16:48:17

Re the argument between Alex and Luc about standards:

When composing a spec, most writers use the wording from RFC 2119. The word "Shall" is used to indicate a requirement while "Should" indicates a recommendation.

So the ODF standard quoted probably means to recommend against an application implementing the Excel bug, but not to forbid it. (If the ODF spec's authors are using RFC 2119[1], they will certainly say so in the spec).

The quote from the standard *does* say that "Portable documents" "shall not" require the Excel bug, so I would guess that you could say that the ODF spec (as quoted) *permits* applications to maintain the Excel bug, so long as they don't describe the files as "portable".

The OXML[2] spec, however, seems to *require* that apps maintain the Excel bug. That is somewhat different from permitting it.

So I suggest that the ODF committee's actions do not act as any justification for the ISO's in this case.

But what would I know? I'm just a humble[3] programmer.

[1] http://www.ietf.org/rfc/rfc2119.txt

[2] They stopped calling it "OOXML" some time ago.

[3] http://en.wikipedia.org/wiki/Uriah_Heep_(David_Copperfield)
Jose_X

2008-10-04 19:28:12

AlexH, I haven't used a spreadsheet in a while. After thinking about it for a little bit, I can't figure something out. If dates don't have types, then how can they be rendered as dates automatically? It seems you are saying that any particular opening of a spreadsheet will always have the dates off or the numbers off or any combination of these off. By off I mean rendered as numbers when intended as dates or vice-versa.

I know you can go back and forth between dates and number rendering, but something has to cue in that this is now meant as a date or else there is no reason to have it be rendered as a date upon opening a sheet (conversely, a similar argument could apply for numbers if dates are preferred). I really don't think when people pass spreadsheets around that numbers and dates are randomly flipped arbitrarily.

Perhaps there is a type and Microsoft does not want to reveal how it is stored. Maybe there is a type and OO.o gets it right.

BTW, if dates are typed, then as mentioned above, they can be converted and it would make no sense to keep the broken legacy leap year rules in the format.

People, as for what ODF says, ODF is not perfect. It does seem from what has been quoted here that 1.2 will allow for the backwards mistake.

Beware of Microsoft within OASIS. They gain if they can get bad decisions to be standardized because then OOXML cannot be singled out as broken. Expect that and more from them because they really hurt if OOXML is not adopted and found legit by a significant number of users. If the backwards thing doesn't have a good reason for staying (this would be true IMO if dates are typed), then I would suggest that a bad leap year interpretation not be allowed in the std period.

We can petition to the TC list. Is this issue something that is worth harassing them over?
Roy Schestowitz

2008-10-04 19:48:34

Jose, I received the following message yesterday:

To all Participants:

The 90-day period for this discussion list has now ended. A charter has been submitted and can be seen at http://lists.oasis-open.org/archives/tc-announce/200808/msg00009.html. Your participation has been greatly appreciated; we at OASIS hope that all individuals interested in furthering this work will join the technical committee.

Regards,

Mary

___________________________________________________________

Mary P McRae

Director, Technical Committee Administration

OASIS: Advancing open standards for the information society

email: mary.mcrae@oasis-open.org

web: www.oasis-open.org

phone: 1.603.232.9090

Join us at the OASIS Forum on Security

30 Sept - 3 Oct, near London

http://events.oasis-open.org/home/forum/2008

I'm the only person left in the #OIIC IRC channel (except the channel guard, which is a bot).

What bothers me is that nobody has really responded to this yet:

http://www.heise-online.co.uk/open/Is-Microsoft-trying-to-take-control-of-ODF--/news/111649

We can probably wait patiently to see how ODFers react, but failure to respond would seem fishy.
AlexH

2008-10-04 22:02:54

@Jose: they're not typed. The formatting of numbers (e.g., as dates, currency, etc.) is separate stylistic information.

You can't use stylistic information as a cue because a. not everything used in the calculation may be so styled, and b. the calculation may use relative dates.

@Roy: the OIIC discussion forum was limited to 90 days from the start. It was never, ever going to be an ongoing forum. I'm happy to answer your questions on why that is if you have any.
jcwarrior0866

2008-10-04 23:38:32

@Dan O'Brian:

Hello Dan. I think you've rushed to the conclusion that this is the first time I read a spec. I do not think it's important to clarify this in particular because this conversation is not about me.

Let me quote what you mentioned earlier:
"That doesn’t mean the format is not allowed, it’s a warning to implementers that using that format for new documents that desire to be portable should not use it."

Well Dan, I disagree. In no way are the SHALL and SHALL NOT verbal forms recommendations or warnings. They indicate a *requirement* instead. Take a look at the OpenFormula spec:

Within this specification, the key words "shall" and "shall not" (for requirements), "should" and "should not" (for recommendations), "may" and “need not” (for permissions), and “can” and “cannot” (for statements of possibility or capability) are to be interpreted as described in Annex H of [ISO/IEC Directives] (part 2).
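The four categories in that passage can be laid out as a small lookup table. This is only a minimal sketch: the `VERBAL_FORMS` table and the `category_of` helper are hypothetical names invented for illustration, and only the wording-to-category pairs come from the quoted text.

```python
# Pairs taken from the quoted OpenFormula wording; the names are illustrative only.
VERBAL_FORMS = {
    "shall": "requirement",
    "shall not": "requirement",
    "should": "recommendation",
    "should not": "recommendation",
    "may": "permission",
    "need not": "permission",
    "can": "possibility or capability",
    "cannot": "possibility or capability",
}


def category_of(verbal_form: str) -> str:
    """Return the category of a verbal form, ignoring case and extra spaces."""
    key = " ".join(verbal_form.lower().split())
    return VERBAL_FORMS[key]


print(category_of("SHALL NOT"))   # requirement
print(category_of("should not"))  # recommendation
```

Under this reading, "shall not" lands in the same bucket as "shall", which is the distinction the rest of this comment relies on.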

I can also quote here what Annex H of [ISO/IEC Directives] (part 2) says about these verbal forms:

Verbal form: shall Equivalent expressions for use in exceptional cases (see 6.6.1.3): is to, is required to, it is required that, has to, only ... is permitted, it is necessary.

Verbal form: shall not Equivalent expressions for use in exceptional cases (see 6.6.1.3): is not allowed [permitted] [acceptable] [permissible], is required to be not, is required that ... be not, is not to be.

Annex H of [ISO/IEC Directives] (part 2) also mentions the meaning of SHOULD and SHOULD NOT, but I am not going to put them in my comment.

Best regards.
Johan Krüger-Haglert

2008-10-04 23:49:14

Just put it on TPB if it's not already there. Problem solved.
John Hardin

2008-10-04 23:59:14

This sort of thing is what the CORAL distributed cache was created for.

http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080.pdf
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-WordprocessingMLArtBorders.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-SpreadsheetMLStyles.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-DrawingMLGeometries.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-RELAXNG-Strict.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-XMLSchema-Strict.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1081c/1081c.htm
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1082c/1082c.htm
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1083c/1083c.htm
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080-html/

I don't understand why people don't post CORAL links when they *know* they're going to get slashdotted out of existence...
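Judging only from the links above, a CORAL link is the original URL with `.nyud.net:8080` appended to the hostname. A minimal sketch of that rewrite follows; the `coralize` helper is a hypothetical name, and the suffix is an assumption read off the links in this comment rather than taken from any CORAL documentation.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the suffix is copied from the links listed in this comment.
CORAL_SUFFIX = ".nyud.net:8080"


def coralize(url: str) -> str:
    """Append the CORAL suffix to the hostname of a plain http URL."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname is None or parts.port is not None:
        raise ValueError("sketch only handles http URLs without an explicit port")
    netloc = parts.hostname + CORAL_SUFFIX
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(coralize("http://boycottnovell.com/forms/ooxml/1080.pdf"))
# http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080.pdf
```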
Dan O'Brian

2008-10-05 00:01:25

You might want to re-read the entire snippet you posted before, because while I agree with the definition of "shall not" that you posted, it doesn't change the fact that I am correct.

Portable documents SHALL NOT include date calculations that require the incorrect assumption that 1900 was a leap year.

Note that it says "Portable". That says nothing of preexisting/imported documents.
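For reference, the "incorrect assumption" in that sentence can be shown in a few lines. This is a sketch only: `naive_is_leap` is a hypothetical, simplified rule used for illustration, not the actual code of any spreadsheet, while `calendar.isleap` is Python's standard Gregorian check.

```python
import calendar


def naive_is_leap(year: int) -> bool:
    """Hypothetical simplified rule: every fourth year is a leap year."""
    return year % 4 == 0


for year in (1900, 1904, 2000):
    print(year, naive_is_leap(year), calendar.isleap(year))
# 1900 True False
# 1904 True True
# 2000 True True
```

The two rules disagree only on century years not divisible by 400, which is why 1900 is the year everyone in this thread keeps coming back to.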
Roy Schestowitz

2008-10-05 00:06:20

How hard you try to defend Microsoft bugs, Dan O'Brian. I already know you from many prior comments in this Web site, but it's good that you show others in this thread who you are.
PaulS

2008-10-05 00:43:43

AlexH said:

"It has always been the same with ISO, and it will continue to be the same with ISO, because that is what ISO’s members and funders want. People who think ISO is irrelevant simply don’t understand what it does; it has always been this ugly. "

As someone who has been involved in several standards committees (including some involvement with ISO), I can say that, while members do work to support the interests of the companies they represent, the level of shenanigans in SC-34 is orders of magnitude beyond anything I've ever seen or heard of.
Dan O'Brian

2008-10-05 00:45:25

Roy: Now you're trying to play the same game with me as you are with AlexH. I have not defended Microsoft here, I am just telling it like it is.
Roy Schestowitz

2008-10-05 00:47:07

I think not. You spin to shelter bias.
standardize this

2008-10-05 00:57:09

Some may claim a metric standard that mentions imperial units SHOULD NOT be mixed with metric somehow standardizes imperial units as a part of that metric system. Those engaging in bigotry of this nature MUST be playing a dishonest and disingenuous semantic game.
Marius

2008-10-05 00:57:40

Hi,

I've exported the PDF document in a series of PNG images and created an index for them, so users can access it just like the HTML version you have.

There are several advantages to this:

1. the page layout is preserved and the information is easier to follow
2. readers only download 3-15KB (one image) at a time, not the full 40-60MB
3. you don't waste so much bandwidth with that very large document

The only downside is that readers can't use copy and paste to extract information but they might as well download the full PDF file then.

If you wish, the blog readers can use the following link to view the document:

http://www.definethis.org/temp/ooxml/

or you can download a copy (http://www.definethis.org/temp/ooxml/1080.rar - ~160MB) and extract it on your server.

I'll leave it on my server for a few weeks but I won't be able to keep it there forever.
Roy Schestowitz

2008-10-05 01:02:16

Thanks a lot for this. Someone produced a similar thing several hours ago and I haven't gotten around to uploading it. I'll update the post.
Jose_X

2008-10-05 01:50:39

>> You can't use stylistic information as a cue because a. not everything used in the calculation may be so styled, and b. the calculation may use relative dates.

Thanks AlexH. I'm not trying to make your life difficult, but I still don't quite follow what you meant by (a) or (b). Could you provide a rough example? It need not be legal syntax but enough to convey the idea.

Here is the wall before me. Something has to cue in the renderer that we have a number that is a date and not something else. Why isn't this good enough as the type information?

If the formatting cue and overall context wasn't good enough, how would the renderer know to format that specific number specifically as a date without messing up anything else? So we have this specific number precisely being identified as a date. If that information isn't a type definition, what is?

It's not clear to me that I am covering everything, being precise, or even making sense. If I had more experience here, I would be better able to judge. Still, I don't see it. I may have to dig into the specs to get to the bottom of this (or read something online that is clear and save myself the effort).
Dan O'Brian

2008-10-05 02:58:14

Jose_X: It might be best to ask the developers of OpenOffice or Gnumeric, for example.

I'm not sure if AlexH can provide an example or not, but I would guess that the OOo and Gnumeric developers could.

If I wanted to know the information you are after, those are the people I'd be asking.
Jose_X

2008-10-05 03:16:33

Dan O'Brian, if AlexH wants to clarify s/he can. [It's a he, right?]

Short of sitting down with ODF or OOXML (no) and putting all the pieces on the table to look at them carefully, I would probably get the fastest insight by directly asking those guys you mentioned.

Anyway, AlexH had mentioned that typing wasn't involved. That would explain why you'd want to keep legacy, but I don't then understand how the proper thing could be rendered from a common old (untyped) number. If typing info is available, then it would make no sense to keep the error in a standard format. That would make the format (even the quasi exception being suggested for ODF) problematic and distasteful without reason.

I have not looked at this too carefully, or I would say so. That's why I think a few examples with specifics might quickly clarify things for me. Also, I got interested in the conversation but otherwise am not that motivated right now to follow up on this.
AlexH

2008-10-05 10:03:39

@Jose:

I'll try to explain as best I can. One thing you might want to do is look at the OpenFormula spec, which for the first time does actually include typed information.

You're right in that the formatting cue will enable you to see information which is being treated as a date. The problem is that not all that information will be formatted like that.

But there is no such thing as a 'date' in legacy spreadsheets: all you have is numbers which are being treated as an offset from an epoch. Some of those offsets will be "dates", and some will just be offsets: e.g., what is the number 5? Does it refer to 5th January 1900 (or 4th)? Or are we using that to say "5 days from now"?
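The "5th January 1900 (or 4th)" ambiguity can be made concrete with a minimal sketch. Both conventions here are assumptions for illustration: the first is the commonly described legacy scheme in which serial 1 is 1900-01-01 and serial 60 is a nonexistent 29 February 1900; the second counts real calendar days from an epoch of 1899-12-30, chosen so that serials after the phantom day line up. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def legacy_serial_to_date(serial: int) -> Optional[date]:
    """Assumed legacy scheme: serial 1 is 1900-01-01, serial 60 is a phantom day."""
    if serial < 1:
        raise ValueError("sketch only handles positive serials")
    if serial == 60:
        return None  # the nonexistent 1900-02-29 has no real calendar date
    if serial > 60:
        serial -= 1  # skip over the phantom day
    return date(1899, 12, 31) + timedelta(days=serial)


def shifted_epoch_serial_to_date(serial: int) -> date:
    """Assumed alternative: real calendar days counted from 1899-12-30."""
    return date(1899, 12, 30) + timedelta(days=serial)


for serial in (5, 59, 61):
    print(serial, legacy_serial_to_date(serial), shifted_epoch_serial_to_date(serial))
# 5 1900-01-05 1900-01-04
# 59 1900-02-28 1900-02-27
# 61 1900-03-01 1900-03-01
```

The same stored number yields different dates before March 1900 and the same date afterwards, and nothing in the number itself says which reading was intended, or whether it was a date at all.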

Even worse, many spreadsheet users will calculate things based on references to other spreadsheets - e.g., having a master sales sheet, and then various report sheets. In that instance, you can't even see the other data unless you're in the "top" spreadsheet. If you rely on the stored values in the sheet you opened, you have again no idea what those values actually represented on the other sheet.
Yfrwlf

2008-10-05 16:20:32

Through all the ranting I don't know if I have an accurate picture of the problem that is being argued about, but from what I can gather:

A program like OOo can interpret an ODF document in one of two ways, it can either read the document via "the buggy way" or "the non-buggy way", but it can only do one. If the ODF format allows for either way to be used, then the readers like OOo and others could read the document correctly, or incorrectly, and it is technically impossible for these programs to always read the document correctly, all because the document standard hasn't specified which method it prefers?

If that's correct, then of course ODF is a broken format in that regard, however that depends on how broken it is in "the wild", and you'd think that there would be something you could do to correct it, some way of fixing any older documents simply by having a converter which upconverts them to a newer standard format which does away with the bug entirely without breaking anything for anyone. Formats should tie up any loose ends, whatever it takes in order to allow readers to always read the format correctly. I thought this was the problem with OOXML, as it included certain things which would allow an OOXML document to be interpreted in two different ways, and in order to do it the correct way it required the use of proprietary software that wasn't available for all platforms/users/etc and was basically controlled. Obviously a controlled standard like that isn't a true open standard, and obviously an "internally used" or "controlled" standard isn't a standard.

Any way, I hope all formats can be made better, but obviously the ISO should never accept proprietary or borked formats as being standards. It's obvious to anyone who knows Microsoft well that this move was simply to E.E.E. the office document format to prevent competition, when the horrible (bad for them) truth is they are going to have to start competing (good for consumers) without pulling backstabbing unlawful business tactics.
James

2008-10-05 16:27:04

@AlexH: Why don't you have a website or blog? I just want to see a bit more about you and your knowledge. PLEASE, show me who you are.

Maybe you're the AlexH from www.contoso.com? http://en.wikipedia.org/wiki/Contoso

http://center.spoke.com/info/pDJMWq/AlexHankin

Alex Hankin Contoso, Ltd. Senior Director New York, NY

Skype: AlexH Home IM: Alex@hotmail.com Home Email: Alex@hotmail.com Work Email: alexhankin@contoso.com Work IM: alexhankin@contoso.com

Telex: 781 234 Home (208) 555-5656 Mobile: (775) 551-2345 Fax: (207) 555-9999 Direct: (207) 555-1112 Tel: (207) 555-1000

Here:

http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Apt-BR%3Aofficial&hs=VHr&q=microsoft+Alex+Hankin+alexh&btnG=Search
AlexH

2008-10-05 16:27:33

@Yfrwlf: this isn't an ODF problem, this is a 'spreadsheet data' problem. The file format doesn't matter, because the data we're talking about is entered by users.

Indeed, in ODF 1.1 the formula stuff isn't specified at all: it was deemed out of scope for the standard.

The issue is very much "data in the wild" though. If you open a file in an older format, or cut and paste from one, or link to one, or otherwise get the data from elsewhere, then it's a problem.
Yfrwlf

2008-10-05 17:00:17

You're saying the storage of the data isn't a problem, that the data is safe and can be read and written correctly to the ODF, but that it's a problem with the readers like OOo? I don't see how it's not either one or the other, either the format has a problem, or the reader does.

Regardless, all I know is that a properly done document format will be able to account for all data correctly, so that if a program implements the format how it's supposed to be implemented, all data will be correct. If it's impossible or difficult for the program to read or write certain kinds of data correctly due to a lack of specification by the format, it's the format's job to implement additional standards to allow for correct interpretation, or aid in that process to allow for greater format uptake in the various office programs which exist today.
Fleep

2008-10-05 18:11:15

AlexH has a point, and his point doesn't change the basis of the OOXML criticism. Why are you so upset about him not agreeing with your views?
Jose_X

2008-10-05 19:20:32

You have a number out there. Call it a spreadsheet value if you want. Call it a number in some text file. Call it what you want.

You have some code out there. The code uses a broken algorithm for turning that number into a date.

Is this a reason to break ODF or any new format?

No, it is not.

Just keep the legacy documents as is (eg, keep as is the text file with the "5" on line 27 offset 12).

If you change formats for that file (eg, to ODF 1.2), the old code is not likely to work anyway. If you change formats for that file, you'll need new code anyway. Why make a broken format to then have to create new code that is also broken?

AlexH, if you try to be specific maybe you will be able to convince people here because it just doesn't make sense that the old mistakes "need" to be carried forward. If so, we'd still be using cavemen data formats and no new code would ever be written (eg, no converters or even new code to replace the old code).

The main reason I can see to keep things as is is as yet another way to help out Microsoft's vast investments in this brokenness. If things change, new players will be on a similar footing (wrt to date interpretation) as Microsoft.

It makes sense to fix past mistakes. In a competitive marketplace, the old garbage instituted by a particular vendor would not carry forward.
Jose_X

2008-10-05 19:21:50

Fleep, I don't see AlexH's "point". Could you give your interpretation so that maybe it will make sense to me and to some others?
Jose_X

2008-10-05 19:28:12

>> The main reason I can see to keep things as is is as yet another way to help out Microsoft's vast investments in this brokenness. If things change, new players will be on a similar footing (wrt to date interpretation) as Microsoft.

Another reason to keep the brokenness would be to allow (eg) Novell to maintain their special advantages if Novell also has a bunch of investments in re-implementations of this brokenness or knows that this brokenness will somehow give them an advantage (eg, if Microsoft stays on top, Novell's existing income stream might be more likely to stay intact).
Roy Schestowitz

2008-10-05 19:38:05

Novell receives access to Microsoft source code, so brokenness is not much of an issue to them. They can just copy (mimic) rather than reverse-engineer quirks, bugs, and changes.

All the 'weird' stuff in OOXML serves Microsoft. The more bizarre the format, the less manageable it is for competitors.

This conversation got latched onto one particular flaw among much more serious ones, which is a shame. Shouldn't we discuss what Microsoft put in separate 'baskets' and all those Windows-only 'features' and 'loopholes' of OOXML?
AlexH

2008-10-05 20:10:45

@Jose:

When you're converting a file format, you have to re-use the existing data, yes?

What I'm trying to get across to you is that there is no way to tell whether a given number in the old data needs to be changed, because there isn't enough information to be able to do that. The "fix" is basically to decrement a number by one; but you have no idea which numbers need to be changed.

Adjusting user data on import is an extremely dodgy practice in general; you have to be absolutely 100% sure you're getting it right.
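A minimal sketch of why a blanket "decrement by one" is risky, under stated assumptions: the three values below are hypothetical, and the `is_date` flags stand for exactly the knowledge a converter normally does not have.

```python
# Hypothetical stored values from a legacy sheet, with no type information attached.
start = 39000   # meant as a date serial
offset = 30     # meant as "30 days later", a plain number
due = 39030     # meant as a date serial, originally start + offset


def shift(value: int, is_date: bool) -> int:
    """Apply the one-day correction only to values known to be dates."""
    return value - 1 if is_date else value


def consistent(values) -> bool:
    """Check that the due date still equals start plus offset."""
    s, o, d = values
    return s + o == d


# With the (normally unavailable) knowledge of which values are dates:
informed = (shift(start, True), shift(offset, False), shift(due, True))
# Treating every number as a date because nothing says otherwise:
blind = (shift(start, True), shift(offset, True), shift(due, True))

print(consistent(informed))  # True
print(consistent(blind))     # False
```

The correction preserves the relationship between the cells only when the converter knows which numbers are dates and which are offsets.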
AlexH

2008-10-05 20:15:45

@Roy: even if that were true, which it's not, they publish it as free software so anyone else can look at / copy the functionality.
Roy Schestowitz

2008-10-05 20:31:43

It's true, and there is no reason whatsoever for people to replicate platform-specific behaviour that's otherwise irrelevant to document data storage.
Jose_X

2008-10-05 20:36:56

>> This conversation got latched onto one particular flaw among much more serious ones, which is a shame.

Let's mention some more things of interest that are demonstrated well through this simple date example.

I think this example helps demonstrate that there are many types of data that are interrelated. Eg, the date numerical representation ..is related to.. the type attributes identifying that number as a date convertible using algorithm X ..is related to ....

Microsoft's extensive closed source (still ongoing) history and investments means that the pertinent data for proper interpretation of any other data is spread across the entirety of their product line.

A format brought up by people working in the open is likely to be much better than something that got cooked up based on this closed stew. When diverse groups openly try to agree on stds, they are led to formats that work well among diverse groups. One such item is that related data should be accounted for somewhere centrally.

No doubt Microsoft keeps tabs on their data centrally, but they don't reveal this within the OOXML format they make public. OOXML is a piece to a complicated puzzle. This piece is missing key info for interworking with the rest of Microsoft's software. The crucial bits of data are scattered all over the place and they are only opening some portions. Of course, they can open up whatever they want and then create new bits that they keep close.

Don't expect change from them as long as they have closed source and interlocking monopolies -- lack of checks and balances: no real penalty for changing; HUGE existing investments: in a Gordian Knot body of source code, in the Microsoft Way Mindsets, in existing contracts made valuable by their unique position; HUGE business reasons for preserving the existing frameworks and methods: so that powerful business levers don't disappear, so that they can be (very) cash positive and subsidize businesses they need/want to control but in which they currently aren't competitive.... The lists go on and on.

Microsoft can't afford to be broken up in a way where important bits of the code end up in different companies. That would not only initially lead to chaos, but long term they lose their advantages if they can't keep closed source the secret info about many product interactions (the source code itself implies some of this secret info) interspersed across these product lines. If you have different companies, who would hold the central knowledge and who would ensure this would stay in sync with the evolving products of the now distinct companies?

Because of this, the likely result leading up to the breakup would be a reshuffling internally so that one company would get the real goods. This would allow that one company to eventually take over where Microsoft currently sits. UNLESS you prohibited these new companies from building products to service both sides of the interfaces. The problem here is that what constitutes an interface?

I think that the idea of having an evolving closed source OS API makes no sense from a fair competition point of view. In fact, closed source and competition are incompatible items. Closed source implies monopolies. The OS is simply the most important software component on a device. And software, traditionally, is the much more powerful way to implement rapid changes that do lead to losing interop, assuming interop existed the second prior to the new change.

The only advice I can currently offer generally to users is to avoid closed source.

And developers that want to produce competitive code should also stick to open source environments and libraries (the assumption is that money would be made other than through the powerful lock-in exclusivity of closed source).
Dan O'Brian

2008-10-05 20:43:01

FWIW, the buggy date interpretation code is not a bug in Microsoft's Excel code; it was deliberately implemented to work around a Lotus 1-2-3 bug because Microsoft's Excel needed to be able to import Lotus 1-2-3 spreadsheets w/o breaking formulas in pre-existing spreadsheets.

ODF needs the same workaround for the same reason.
Jose_X

2008-10-05 20:55:18

AlexH, I think you missed what I was talking about. You keep the old data in the old formats if you want. The new data can be saved to the new formats. The new formats require new code for translating no matter what. It's a new format! This means that the new code *does* know to use the correct algorithms while the old programs continue to work with the same old assumptions. If you want to interchange the old with the new data, new converters know the old and the new rules.

Ie, the info needed to know which alg to use is available. Data in old formats use the broken algs and data in new formats use the good alg. And you can use software converters to convert from one format to the other (statically once and for all or dynamically as the various formats are encountered).

Again, you are not giving examples where this could not be done or would be foolish to try it. Your vague argument generalizes to "we should keep the formats we had back in 1940 so that we don't have problems moving forward."

Yeah, maybe we should have kept the year 2000 bug as well.

Yfrwlf (as I read the replies) was assuming pretty much this and then adding that the point is to make *specific* which alg to use in the new formats.

Using the old rules (as OOXML does) is foolish. Leaving it up in the air (ODF 1.2 might do this in part) will just lead to excusable incompatibilities.

Microsoft needs formats that are underspecified (plus broken in as many ways as possible) in order to allow monopoly backed lock-in secrets to exist in an excusable manner.

"Hey, the std was not specified precisely so we picked...."

The excuses are probably primarily meant to keep them safe in court actions from the government.
AlexH

2008-10-05 21:01:12

@Jose: I didn't miss it, I keep trying to explain to you the situation.

It's not that the application can't use the right algorithm. It's that when you open the old data, you cannot adjust it so that it is "correct".

So the "just use software converters" argument simply doesn't hold: you cannot do it. The spreadsheet doesn't hold enough information to know which data needs to be corrected and which data doesn't.

You can save the data in the new format, with the new algorithms, but it doesn't help. Unless you can correct the data, it's wrong. And you can't correct the data because you don't know which data is dates, which is date offsets, and which is just numbers.
AlexH

2008-10-05 21:04:33

@Jose:

Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it's set to 1931-2030 or something, so you know that '98' == '1998', '31' == '1931', etc.

It's exactly like that. Unless you know what the "range" is to begin with, you cannot hope to convert the data accurately, because you're missing enough information.
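The two-digit year window can be sketched in a few lines. This is illustrative only: `expand_two_digit_year` is a hypothetical helper, and the default window start of 1931 is simply the "1931-2030 or something" figure from this comment, not a verified application default.

```python
def expand_two_digit_year(yy: int, window_start: int = 1931) -> int:
    """Map a two-digit year into the 100-year window beginning at window_start."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    century = window_start - window_start % 100
    year = century + yy
    if year < window_start:
        year += 100  # falls before the window, so it belongs to the next century
    return year


print(expand_two_digit_year(98))                     # 1998
print(expand_two_digit_year(31))                     # 1931
print(expand_two_digit_year(31, window_start=1950))  # 2031
```

The last line is the point of the analogy: the same stored '31' becomes a different year as soon as the window differs, and the stored value alone does not say which window was in force.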
Roy Schestowitz

2008-10-05 21:33:03

Since you can determine what the date should be when handling old files you can save it properly too. Preserving bugs for compatibility with legacy (and proprietary) formats is the road to another sort of mess -- a bigger one. It creates a hack that is nothing but baggage for debugging and maintenance.
Jose_X

2008-10-05 22:00:44

[AlexH, I'll address your very last comment with the '31' example after this reply. I just noticed your example prior to posting, but I don't think it changed the essence of this post. In any case, I'll get to your last comment in a sec.]

AlexH, we can't automate something across the board 100.00% when that information is not known in one place 100.00% of the time. The place to address the date issue is where it is known that something is a date. Where this info is not captured in the same file as the data, the conversion can be done with the apps or library calls that interpret the particular numbers as dates or by users doing the work manually when they identify something should be a date but it is not (with the help of existing handy filters ready to come to the rescue).

In any of these cases, changing to a *new* format with *new* semantics means that something of this nature *has to be done* anyway.

[In the case of Excel date-formatted items, the process can be automated because that date formatting info is included in the same file as the date integer. However, there are spreadsheets generated in ways where that data need not be maintained. It's for these cases that we are talking about.]

If you don't want to deal with any manual process in a particular case and are willing to keep all the bugs of the past then you use a legacy app or app mode that works as legacy. But then you can't convert to the new format that has new semantics from an app that doesn't have access to the type info. You can't convert to OOXML, ODF, or anything else that might have new semantics.. unless you want to do some hand tweaking. To convert to a new format with new semantics that were previously kept in an ad hoc way, you have to add a bit of manualness to the process.

AlexH, can you give any example at all where this would not be a manageable situation? Remember that you can keep the old data in the old formats read by the old applications.

The ODF should use the proper date semantics. If something in some old file somewhere is not known to be a date based on info inherent in that file, then you can't save it in ODF anyway where that unknown date maps to a date. You would just get a number. This has nothing to do with the new format. It has to do with the app making the conversion not having access to the missing info. This would not be the case for Excel files where the dates were formatted to look like dates, but it might be the case where a text file is interpreted oddly by some app X. In which case, the conversion should be handled by an upgrade to that app X.

... [thinking I better repeat some of this again]...

In other words (darn this is tiring), you can't take a number that is not known to be a date and make it a date automatically without error. This has nothing to do with the new format. It has everything to do with requiring access to the missing semantic info. If you have access, you can do it. If you don't have access, you can't do it no matter what OOXML or ODF says. It would just be a number.

And once again: in the case of Excel files with numbers formatted as dates, that info IS AVAILABLE* so there is no problem for this common case. [*: it's available subject to the proper reverse engineering of the old MS binary formats.. of course the EU could force Microsoft to reveal this info so if they don't they would be in violation.. in any case, there is no excuse.]

AlexH, you have not shown a single example, and you are mixing issues. Some might even say you are using FUD to give the impression that the task is unmanageable. If it's unmanageable, you should be able to give many examples instead of 0 examples. Please give examples, AlexH. [Ed- note comment at top.]

Dan O'Brian, that Microsoft kept the Lotus bugs doesn't justify that as a good decision. Let's give reasons other than to say that X person did Y so therefore Y must be good.

[I am not trying to be verbally abusive, but I don't like to see anyone defending Microsoft or their ways without reasons that pass muster. If you want to defend Microsoft, come with real reasons or expect to anger a lot of people.]
Jose_X

2008-10-05 22:17:29

>> Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it’s set to 1931-2030 or something, so you know that '98' == '1998', '31' == '1931', etc.

AlexH, this is the sort of thing I mentioned in the last comment. If you know the info (ie, that X is to be interpreted as Y), then you know the info. If you don't, then you don't.

If you do, then you can perform the conversion. If you don't, then you can't.

In any case, if you have a new format with a previously non-existent semantics/type named "date", then you can't have things magically appear as dates, no matter the specific semantics/algorithm, unless the converter has info of what were dates in the old formats. If you do, then you can make the conversion. If you don't for whatever reason, then the mapping will be to a common old integer just as before and not to the new date type.

In the case you gave, the conversion can be done if that semantic bit can be deduced from the data in the file. Otherwise, the conversion should be done by whatever entity knows that we are dealing with a date. Otherwise, it can't be done no matter how OOXML or ODF define the new date data type.

To repeat from earlier comments, if you have an Excel number formatted as a date, then that info can be deduced from the binary files and that formatting knowledge could be designated to map into the new "date" tags of the new format. Here, you knew that the old number is a date that would need to be adjusted to match the semantics of the new tags.

The exact semantics aren't the stumbling block so long as they are well-defined (as Yfrwlf mentioned). What is important for designing the new semantics is that the semantics be well-defined and as "sane" as possible. What is important to acquire the ability for old data to be used with the new tags is that the semantics for the old data be fully known. These two issues are distinct. Microsoft themselves cannot make the right number into a date for OOXML unless they already know they have a date. If they don't, they must keep it as an integer. If they do, then they can convert to the proper definition no matter what the definition is: using the correct alg or using the broken alg.
Jose_X

2008-10-05 22:27:07

AlexH, consider finishing off the example you gave of "31" with more details to give examples of a problem that would be solved if we make ODF have the old semantics but would not be solved if ODF were instead to use the fixed algorithm.
Dan O'Brian

2008-10-05 22:44:58

From my understanding, both OpenOffice and Microsoft Office interpret the date at display time depending on the user's configuration settings.

I don't have a copy of Microsoft Office handy, so I can't check their configuration UI, but in OpenOffice you can find the config setting under Tools / Options / OpenOffice.org Calc / Calculate / Date

Depending on which of the 3 radio buttons you select (12/30/1899 [default], 01/01/1900 (StarCalc 1.0), or 01/01/1904) the spreadsheet interprets the data in a different way.

Now, if someone using Calc (using the default setting) imports a spreadsheet with a date that was created in, say, StarCalc 1.0, and then saved to ODF, where the saved ODF document forced the interpretation (as Jose is suggesting can be done) to be the 12/30/1899 epoch, then the data in the spreadsheet could very well be wrong, but the user might not notice it right away.

I think that's the problem that AlexH is trying to explain.
Jose_X

2008-10-05 22:48:49

Alex, if before, all the information we knew about a particular integer was that it was an integer type, then we map it to the integer type in ODF.

However, if the converter can deduce more type info, then we might be able to map to a date tag or to some other tag.

And in these last cases, where we know enough to identify a (eg) date, if we can map to a tag with the broken date semantics, then we can map to the fixed semantics since this entails adjusting the values in a well-defined way. Ie, if we know to map to "date with broken algorithm" then we can map to "date with fixed algorithm" since there is a well-defined mapping to this fixed algorithm.

However, in other cases, the mappings may not be so nice. In general, we need to create good formats and fix mistakes of the past. If, as customers, we put our data into proprietary closed formats such as what Microsoft offers their customers, then we make a decision that may not be fixable short of knocking down Gates' door demanding relief.. or knocking down his Window if you want longer lasting relief.
Jose_X

2008-10-05 22:57:45

Dan O'Brian,

No. No. No.

When we save, we know how to adjust the number so that it maps properly to the canonical form implied by the corrected definition. The saving process knows the users config info and so can adjust into a canonical form. Then everyone else that reads this does the necessary translations to match their settings.

If we can save into X-1 then we can similarly save into X by adding +1 at the time of save. The semantics of the ODF file date tag would then let everyone know that we have X and not X-1.

ODF is tagged. The tags carry semantic information just like binary Excel files do (but in a closed proprietary way).
Jose_X

2008-10-05 23:01:21

The bottom line is that if we *don't know* something is a date, but only know that it is an integer, we map to integer, whether we are mapping to OOXML or ODF or to anything else that has an integer type. If we *do know* something is a date (with the implied broken semantics), then we can map to a date in ODF such that the necessary adjustment is made.
Jose_X

2008-10-05 23:06:16

So in the case with meaning 1 or meaning 2 or meaning 3 of a date or whatever other context is necessary for proper interpretation:

If we know this, we map to ODF correct date tag, adjusting as necessary.

If we don't know this extra date context, then we play it safe and keep the integer as an integer.
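[Ed: a minimal sketch of the rule stated above, under stated assumptions. The names `LegacyCell`, `date_epoch` and `convert_cell` are hypothetical and come from no real converter. The sketch assumes the converter has already worked out, from formatting or other clues, whether a cell is a date and which epoch it counts from, and it ignores the 1900 leap-year quirk for simplicity.]

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Union


@dataclass
class LegacyCell:
    """Hypothetical view of one cell read from an old spreadsheet file."""
    value: int                         # raw number stored in the old file
    date_epoch: Optional[date] = None  # set only when the cell is known to be a date


def convert_cell(cell: LegacyCell) -> Union[int, date]:
    """Map a legacy cell to either a plain number or a typed date."""
    if cell.date_epoch is None:
        # No date context is known: play it safe and keep the integer as-is.
        return cell.value
    # The date context is known: emit a real calendar date, so the
    # target format never has to carry the old offset convention.
    return cell.date_epoch + timedelta(days=cell.value)


plain = convert_cell(LegacyCell(39725))
typed = convert_cell(LegacyCell(39725, date_epoch=date(1899, 12, 30)))
```

Here `plain` stays the integer 39725, while `typed` becomes a calendar date; which branch runs depends only on what the converter knows, not on the target format.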
Dan O'Brian

2008-10-05 23:06:40

Jose_X: the problem is, as AlexH has already pointed out, that spreadsheets have historically saved untyped data.

Dates can be saved as "1/31" (interpreted as January 31 of the current year), "1/50" (January 1st, 1950), or "39725" (the number of days since the configured epoch) and possibly other formats.

The question is, which epoch is 39725 counting from? And how do we know it's a date without more context?
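[Ed: to make the question concrete, here is a small sketch that reads the same stored number against the three Calc null-date settings named earlier in the thread. The helper names `serial_to_date` and `interpretations` are assumptions made for illustration only.]

```python
from datetime import date, timedelta

# The three null dates ("epochs") listed earlier in the thread for Calc.
EPOCHS = {
    "1899-12-30 (Calc default)": date(1899, 12, 30),
    "1900-01-01 (StarCalc 1.0)": date(1900, 1, 1),
    "1904-01-01": date(1904, 1, 1),
}


def serial_to_date(serial: int, epoch: date) -> date:
    """Interpret a day count as a calendar date relative to the given epoch."""
    return epoch + timedelta(days=serial)


def interpretations(serial: int) -> dict:
    """Return the calendar date the serial denotes under each known epoch."""
    return {name: serial_to_date(serial, epoch) for name, epoch in EPOCHS.items()}


for name, value in interpretations(39725).items():
    print(name, "->", value.isoformat())
```

Under the default epoch, 39725 is 2008-10-04; the other two settings give different dates, which is why the stored number alone does not settle the matter.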
Dan O'Brian

2008-10-05 23:09:48

Jose_X: I think that if all spreadsheets had agreed upon a canonical epoch in the very beginning, we would not have this problem. Unfortunately, that did not happen.
Jose_X

2008-10-05 23:18:54

Dan, if it's an integer and that is all we know, we keep it as an integer. If we can deduce that it is a date with a broken formula and that is its sole role, we map to ODF date tag but fix the integer value as necessary.

I suggest that, specifically for Excel, cells containing a simple integer and formatted as a date be mapped to ODF dates but with the correct value to match the ODF epoch.

In any case, the ODF date tag is there for the future. Existing data can be mapped to ODF integers (or strings or whatever) as they are, while new items entered under a date context can be mapped to the ODF correct formula date.
Dan O'Brian

2008-10-05 23:19:49

>> When we save, we know how to adjust the number so that it maps properly to the canonical form implied by the corrected definition. The saving process knows the users config info and so can adjust into a canonical form.

No, it doesn't - that's the problem. All it knows is that the field looks like a number. It doesn't necessarily know if it is a date or not.
Jose_X

2008-10-05 23:20:16

Dan, not all people agreed to use the same language way back when, yet we live in a world where people using different languages can interact together because of the wonders of translators.
Roy Schestowitz

2008-10-05 23:23:16

Given the ability to interpret the date -- whether as Excel, Lotus, whatever -- you can save it properly for the future. You needn't carry on bugs from the past.

As Sutor said, "OOXML is about the past and ODF is the future."
Dan O'Brian

2008-10-05 23:23:25

Jose_X: There are far more knowledgeable people working on OOo, Gnumeric, Excel, etc spreadsheet apps than we are (combined, even, I'm sure), so I'll leave it up to them to solve (if it can be solved) or not. If they haven't solved it by now, I'd imagine it's not as simple as you imagine it to be.
Roy Schestowitz

2008-10-05 23:26:18

It's an old discussion and a solved discussion.. http://www.robweir.com/blog/2006/10/leap-back.html
Dan O'Brian

2008-10-05 23:26:19

Roy: Like I said, I'll leave it up to more knowledgeable folks to figure out. If they find they need support for different epochs, then they need support for multiple epochs. If they decide they don't, I trust that my data will continue to work (unless proven otherwise) and that's all I care about in the end.
Jose_X

2008-10-05 23:29:35

>> No, it doesn't - that's the problem. All it knows is that the field looks like a number. It doesn’t necessarily know if it is a date or not.

So then we map to an ODF number and not to an ODF date. Simple. The same would go if we had wanted to use OOXML or any other format that has a date tag. We would not map to its date tag but instead would map to the regular number tag.

However, in any particular case, the application may know that we are dealing with a date. In which case, it would be able to save to the ODF date with the proper adjustment along the way.

Say we have an Excel spreadsheet that has a date formatting for a number. Then Excel/OO.o presumably needs to use the broken formula on that number to format it properly. Fine, but what we then do is we save that number as a date but adjusted as necessary when we save to ODF. Then when we read that ODF file later, we use the proper formula on the already adjusted number. If we want to convert back to binary Excel format, we make it a number type again and adjust its value backwards. In either format, we can deal with that *known* date properly. That number was marked for life as a date through its date formatting from the original creation as data within an Excel date formatted cell.
Jose_X

2008-10-05 23:38:56

>> If they haven't solved it by now, I'd imagine it's not as simple as you imagine it to be.

In other words, you aren't willing to provide an opinion on why ODF should be one way or the other.

Do keep in mind that there are many decisions that are taken by people not based on technical feasibility.

Dan, if you have a link to where you think competent people are having this discussion, please post it. I think I might want to get in on the act or at least hear the reasons given.

I provided feedback to the OIIC formation discussion list, but they weren't interested in covering specifics. I started on that road and was told by Mary McRae (is that how you spell it) that engaging in specifics of that nature was prohibited on that list. The specifics will be carried out in private (though joining up is allowed if you pay the $300).

I would have no problem giving a particular pov if it would help a public discussion and if I didn't have to dedicate too many resources to the task beyond the time required to do the contributed postings (the thought process, etc).
Jose_X

2008-10-05 23:52:24

>> It’s an old discussion and a solved discussion.. http://www.robweir.com/blog/2006/10/leap-back.html

For the record, I'll quote here from that piece from Rob's blog:

>> The “legacy reasons” argument is entirely bogus. Microsoft could have easily have defined the XML format to require correct dates and managed the compatibility issues when loading/saving files in Excel. A file format is not required to be identical to an application's internal representation.

>> Here is how I would have done it. Define the OOXML specification to encode dates using serial numbers that respect the Gregorian leap year calculations used by 100% of the nations on the planet. Then, if Microsoft desires to maintain this bug in their product, then have Excel add 1 to every date serial number of 60 or greater when loading, and subtract 1 from every such date when saving an OOXML file. This is not rocket science. In any case, don't mandate the bug for every other processor of OOXML. And certainly don't require that every person who wants the correct day of the week in 1900 to perform an extra calculation.
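[Ed: a rough sketch of the load/save adjustment the quote describes, not anyone's actual implementation. The function names are hypothetical. It assumes the file stores correct serials with 1 = 1900-01-01, and it assumes a save should refuse internal serial 60, the nonexistent 1900-02-29, since the quote does not say what to do with it.]

```python
from datetime import date, timedelta

DAY_ZERO = date(1899, 12, 31)  # assumption: correct serial 1 is 1900-01-01


def file_to_excel(serial: int) -> int:
    """On load: add 1 to every correct serial of 60 or greater."""
    return serial + 1 if serial >= 60 else serial


def excel_to_file(serial: int) -> int:
    """On save: undo the shift applied on load."""
    if serial == 60:
        # Internal serial 60 is the phantom 1900-02-29; it has no correct form.
        raise ValueError("serial 60 is 1900-02-29, which never existed")
    return serial - 1 if serial > 60 else serial


def file_serial_to_date(serial: int) -> date:
    """Calendar date for a correct (file-level) serial number."""
    return DAY_ZERO + timedelta(days=serial)


# Round trip: what the application holds internally survives save and load.
internal = 39725
stored = excel_to_file(internal)
reloaded = file_to_excel(stored)
```

The point of the design is that only the application carrying the bug pays for it: every other reader of the file sees correct serials and needs no special case.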

Microsoft's reasons for keeping things broken exist, but that doesn't mean ODF should follow their lead. Let Microsoft keep OOXML the laughing stock that it is within tech circles. Let us keep ODF sound. Or I should specify, if Microsoft messes up ISO ODF, OASIS should not follow suit.

People, open source is the key. ODF and other open standards are secondary. Standards are meant to enhance interop, but when that cannot be achieved, these standards lose their value. And interop among independent third parties within the context of a closed source monopoly dominated market is nonsensical.
Jose_X

2008-10-05 23:55:43

I should note that Rob is co-lead of the ODF TC within OASIS and started up an interop effort to complement the main TC. I don't have to agree 100% with Rob, though I have found I share many of his views, including what I quoted above. And I really don't think I am alone in agreeing with Rob.
Roy Schestowitz

2008-10-06 00:03:26

Alex and Dan are only here to carry a "this site is wrong" banner, so I don't expect them to agree with Rob. I would expect them to endlessly try to give the impression that the messenger can't be trusted because they simply don't like the messages. It's a dangerous stubbornness.
Dan O'Brian

2008-10-06 00:59:14

Rob Weir probably counts as someone more knowledgeable than me, so if he says that it's not needed, I'll accept that it's not needed.

I was only explaining what I thought AlexH was trying to explain (I admit to knowing very little about the internal workings of spreadsheet applications).

My position on this subject has always been that I'd leave it up to the experts.
Jose_X

2008-10-06 01:08:40

Dan, experts are bought and sold all the time.

You should pay attention to arguments if you want to avoid being manipulated by the unscrupulous.
Jose_X

2008-10-06 01:10:36

Dan, if you don't want to think something through (it can be difficult at times because it would take a lot of preparation to get up to speed), hire or find someone that you trust would understand and be honest with you about it.
Dan O'Brian

2008-10-06 01:15:38

Jose_X: that's pretty funny coming from you... you refuse to contact the people implementing the specs and Roy refuses to contact anyone ever involved with the processes (e.g. he refuses to contact GNOME developers before accusing GNOME of depending on Mono).

When I say experts, I mean the experts implementing the Free Software office applications that are very unlikely to have been "bought" and/or other experts that I trust (which in this case is limited to the aforementioned group because I don't happen to know any proprietary office developers).
Dan O'Brian

2008-10-06 01:20:17

In case it wasn't obvious to you, I use OpenOffice.org, Gnumeric and Abiword - all Free Software office suites. I trust those developers to Do The Right Thing(tm).

I care little about the file formats and the standards committees because I have far too many other things on my plate (like products I'm responsible for), and, as I said above, I trust the people involved with OOo, Gnumeric, etc to DTRT and make my documents continue to work.
Dan O'Brian

2008-10-06 01:21:47

At some point, everyone has to trust other people to do their jobs, otherwise nothing can ever get done because you'd be too busy making sure everyone else was doing their job.
Jose_X

2008-10-06 01:32:12

Dan, I believe in FOSS more than in open standards. Because of general repetitive pleas from Groklaw, I decided to participate in the politics of standards setting briefly but came away dissatisfied. In the end, it's OASIS' sandbox, and they will do what is in their best interest. It certainly looks like they will do a better job than ECMA and Microsoft dominated groups (note that Microsoft may come to dominate OASIS or any other group in time.. it's possible). I am willing to let them do their thing. As is, I have found ODF better than OOXML from what bits I have heard.

And I don't refuse to contact anyone. Do you have contact info because I don't. What I refuse to do is waste time. Everyone has to prioritize their time.

>> When I say experts, I mean the experts implementing the Free Software office applications that are very unlikely to have been “bought” and/or other experts that I trust (which in this case is limited to the aforementioned group because I don’t happen to know any proprietary office developers).

I should mention that "experts" disagree all the time.

Also, I should mention that those developing "free" office suites are sometimes (many times perhaps) paid. Their software may be "free software" as defined by the FSF, but that doesn't mean they work for free.

Finally, if you do listen to these groups, you probably want to be trying to explain to AlexH why many of these groups don't like OOXML instead of why some do.
Jose_X

2008-10-06 01:38:49

I should also clarify, FOSS projects have their own politics. But with FOSS you can fork if you think differently and are willing to go to the trouble. To some extent you don't need cooperation from others to get your fork to work. With standards, OTOH, forking is a bigger deal (assuming it would even be acceptable based on OASIS copyrights.. IANAL). The whole concept of standards is to get many to agree. An individual standard is a bit of an oxymoron.
Dan O'Brian

2008-10-06 01:52:08

The people I know working on OOo are paid, the people I know working on Abiword and Gnumeric are not.

However, even though the people I know working on OOo are paid, I trust their honesty.

As far as AlexH, how do we know he doesn't listen to these groups?
Jose_X

2008-10-06 01:55:04

>> At some point, everyone has to trust other people to do their jobs, otherwise nothing can ever get done because you’d be too busy making sure everyone else was doing their job.

Just in case I was misunderstood, I wasn't trying to be condescending or sarcastic. I was honestly saying that we should seek advice/help from individuals/groups that we find trustworthy in order to help us manage complexity. Complexity is anything we haven't yet taken the time to figure out for ourselves. Time is a limited resource. What one day appears to be extremely complex, can later appear to be quite simple. What one can figure out, so can others. But we all have limited time. In my case, I may not be taking the time to try and understand the problem as well as I can or to present it as well as I can, but am so far willing to keep up with contrary arguments. Does someone want other/better examples from me than whatever I may have given?

As an aside, I am here partially watching the "Brad Pitt Troy movie". The scene just shown was of the Trojan king right after the Trojans defeated the Greeks who now supposedly will go back home. An argument is made to the king that attacking the Greeks by their ships would be foolish. The king ignores this because his trusted priest person says that the gods think the Greeks will be vanquished in an attack.

Funny coincidence. We have to trust someone whenever we don't dive into the details of something. Sometimes it works out and sometimes it doesn't.
Jose_X

2008-10-06 02:05:47

>> As far as AlexH, how do we know he doesn't listen to these groups?

Or that he does. Or that I do or don't.

Should we attack the Greeks? Whose advice do we take or do we dig into the details?

Anyway, I don't worry about misunderstandings if people can/will work to fix them. More upsetting is purposeful deception. As long as we stay away from purposeful deception as much as possible everything should work itself out slowly. We all cheat here and there though. Balance is good. I have seen myself and others go overboard at times. On the surface, I think most people will expect anyone trying to defend Microsoft to come a little bit more prepared than usual, and they will be seen very critically if they don't do a convincing job.
Jose_X

2008-10-06 02:32:08

>> And I don’t refuse to contact anyone. Do you have contact info because I don’t. What I refuse to do is waste time. Everyone has to prioritize their time.

Let me add.. I don't think anyone wants to waste time.. ie, others don't want to waste time with me either. To enter into some discussions, you need to do some homework if possible. That takes time.

In any case, if anyone has a link to a related public discussion, feel free to post that info here for the benefit of all.
AlexH

2008-10-06 06:23:25

Good grief, so much comment over such a small issue.

@Jose:

You said, "If we don’t know this extra date context, then we play it safe and keep the integer as an integer.".

That's precisely the situation! We don't have the extra information, so the integer stays as an integer.

However, the integer is still a buggy offset and is usually "one off" (i.e., is x+1 when the real value should be x).

So the situation is that you have to encode various schemes in order to deal with the buggy data, because you cannot convert it when you upgrade the file format.

It's really as simple as that.
Roy Schestowitz

2008-10-06 06:40:34

I assume that you will continue to disagree no matter the evidence you are presented to refute the argument.

As I wrote earlier, this discussion was resolved before; Microsoft just didn't fix its specs though.
AlexH

2008-10-06 06:56:28

@Roy: your 'evidence' in a blog post from 2006 is pretty much trumped by the normative reference to the ODF 1.2 draft.

If this was so easy, no-one would bother to encode the legacy behaviour into ODF 1.2. However, it's not that easy, so that behaviour is being put into the standard.

Coping with legacy data makes ODF actually useful. If we couldn't convert old data, ODF would be a significantly harder sell. There is a big difference between legacy file formats and legacy data, which people here don't seem to understand.
Roy Schestowitz

2008-10-06 07:02:58

http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-25680

Luc Bollen said,

October 3, 2008 at 9:27 am

Openformula (part of ODF 1.2) doesn’t MANDATE the bug, as ECMA376 was doing. From http://wiki.oasis-open.org/office/About_OpenFormula

“Doesn’t mandate mistakes. Just because one program gets something wrong doesn’t mean that everyone should make the same mistake. The specification is carefully written to not require certain bugs, just because someone has a bug. For example, Excel incorrectly believes that 1900 was a leap year, and at least draft version 1.3 of the Excel specification claims that compatible applications must make the same mistake. Nonsense. Instead, OpenDocument wisely stores dates as dates (not just numbers), and thus does not require that applications have this bug. The Excel specification also requires that applications cannot be more capable than Excel (it doesn’t permit support for dates before 1900). Again, nonsense. In fact, at least one OpenDocument spreadsheet application (OpenOffice.org Calc) can correctly calculate dates and date differences going back to 1583! Similarly, many applications handle complex numbers in a very clumsy way; we’ve devised the specification to make sure that future applications can support better approaches, instead of tying their hands to a technique known to be poor.”
Roy Schestowitz

2008-10-06 07:07:26

@AlexH: Why you haven’t a website or blog? I just want to see a bit more about you and your knowledge. PLEASE, show me who you are.

Maybe you’re the AlexH from www.contoso.com? http://en.wikipedia.org/wiki/Contoso

http://center.spoke.com/info/pDJMWq/AlexHankin

Alex Hankin Contoso, Ltd. Senior Director New York, NY

Skype: AlexH Home IM: Alex@hotmail.com Home Email: Alex@hotmail.com Work Email: alexhankin@contoso.com Work IM: alexhankin@contoso.com

Telex: 781 234 Home (208) 555-5656 Mobile: (775) 551-2345 Fax: (207) 555-9999 Direct: (207) 555-1112 Tel: (207) 555-1000

It's actually Alex Hudson.

http://www.alexhudson.com/
AlexH

2008-10-06 07:49:28

Haha, I missed that :)

Slightly sad that people will attempt to tie you to Microsoft for expressing opinions which don't fit with their world view. One thing I respect about Jose is that he always argues on the topic, not ad hominem.
Roy Schestowitz

2008-10-06 07:55:25

AlexH,

What raises this suspicion are actual past incidents. Microsoft deserves no trust anymore, as it was caught many times before employing forum shills and such (some examples). It continues to this day.

BTW, you have not seen that comment because it's only moments ago that I checked to see what was trapped by the automated filter.
AlexH

2008-10-06 08:14:11

@Roy: sure, and I understand that.

I just think some people find it very easy to wonder aloud at possible connections as a way of avoiding discussion of actual issues.
Pedro Gimeno

2008-10-06 08:51:41

I disagree with AlexH when he says it's impossible to fully support legacy documents through import filters. It would require some support from the ODF spec for it to work, though:

1. Implement a "Legacy Date" cell format. This cell format interprets a cell's number as a date with the 1900 bug for showing. Cells with date format in Excel files would be converted to "Legacy Date" when imported.

2. Implement a "LEGACYDATE()" function which accepts one argument, which converts a number into a proper date taking into account the 1900 bug. Excel formulas which have functions accepting dates as arguments would be fixed so that each argument that is accepted as a date is first passed through LEGACYDATE(). For example, WEEKDAY(a3+b3) would become WEEKDAY(LEGACYDATE(a3+b3)).

Scripts can't be supported, though. It's impossible to analyze a script and they would require manual fixing.

Of course portable documents should never use the legacy cell format or function. Documents intended to be portable should be manually transformed to get rid of the legacy bits.
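[Ed: a minimal sketch of what the proposed LEGACYDATE() might compute, written as a plain Python function. No such function exists in any spreadsheet application or spec that this thread cites; the name `legacy_date` and the choice to reject serial 60 and serials below 1 are assumptions.]

```python
from datetime import date, timedelta


def legacy_date(serial: int) -> date:
    """Convert a serial that follows the 1900 leap-year bug to a real date."""
    if serial < 1:
        raise ValueError("legacy serials start at 1 (1900-01-01)")
    if serial == 60:
        raise ValueError("serial 60 is 1900-02-29, which never existed")
    if serial < 60:
        # Before the phantom leap day, serial 1 is 1900-01-01.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After it, every serial is one too high, so count from a day earlier.
    return date(1899, 12, 30) + timedelta(days=serial)


# Rough analogue of WEEKDAY(LEGACYDATE(a3+b3)); a3 and b3 are made-up values.
# isoweekday() numbers Monday as 1, unlike the spreadsheet default.
a3, b3 = 39700, 25
weekday = legacy_date(a3 + b3).isoweekday()
```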
Pedro Gimeno

2008-10-06 08:55:38

Forgot to say that the very same logic could apply to the 1904 quirk.
AlexH

2008-10-06 09:10:10

@Pedro:

It's a nice idea, but it's not great for a couple of reasons. A big one is that you're adding this LEGACYDATE() function into existing formulas, which will confuse users who are expecting the previous formulas.

That's actually a huge issue: formulas are basically user interface, which is one reason why they are so clunky even in OpenDocument 1.2. If we were designing something from scratch right now, I don't think it would look much like the existing system, but migration is a huge problem.

The second issue is that you're dropping all this *LEGACY() stuff into the sheet, but you're getting the same effect as setting a base epoch sheet-wide.

So I agree that it could work (although it could fail if anyone has written custom functions to do date manipulation within a spreadsheet), but it's the same solution as that already proposed in OpenDocument 1.2: you put in place the facility to manage dates with legacy epochs. I would venture that the ODF solution is cleaner; you're writing the same code (changing date offsets), but putting the function call internally in the spreadsheet code rather than externally in the spreadsheet formula.
AlexH

2008-10-06 09:11:25

Hm, I meant to add that I said you can't convert the data. Obviously, you can convert formulas which do the calculation to take into account different data, but as I said, that's effectively the same solution as ODF 1.2 proposes.

So my point was about numbers in cells, not formula function calls.
Ianp

2008-10-06 09:25:48

I'll use OO as the program in this example. If you open a "xls" file in OO then OO should know it will have the bad date problem. So when you've finished working on it, you then decide what format to save it in. OO will then make this decision based on your choice of format to save it in, "If save in XLS format, save in bad format else if save in ODF format then save in correct format". If you know the format of the file (embedded info or by file extension), then you should be able to work out what the integer represents. If you can't work it out then that file is "dead" unless you use the original spreadsheet program that created it.
Jose_X

2008-10-06 11:57:03

AlexH, I don't see what is confusing you. Let me put this simply and then we can fill in the exceptions as we come to them.

Take one:

You said originally: >> The problem is that you can’t just “convert” user data when you convert the file format, because spreadsheet data isn’t typed and you can’t know which numbers to adjust.

Here is my simplified response:

The formatting or some other clues give away the intended usage of a number as a date that uses some (possibly broken) algorithm; thus, you can convert this number type value into an ODF date type value, adjusting so as to map the original value into a value that works with the correct algorithm.

There is no problem. We know we had a date. We know the formulas to use in all cases.

If no such clues can be found then don't convert. In other words, don't convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.

Again there is no problem. We simply mapped each number to itself and used an ODF number type. If the orig was not intended as a date, we correctly left it alone. If the orig was intended as a date by some other application (since the formatting wasn't done for user visual purposes), we still preserved that value. There is no problem. New applications would not treat a number as a date because it got saved under the number type not the date type.

Where is the problem? Also, if you think there is a problem, give an example.

Take two:

>> Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it’s set to 1931-2030 or something, so you know that '98' == '1998', '31' == '1931', etc.

>> It’s exactly like that. Unless you know what the “range” is to begin with, you cannot hope to convert the data accurately, because you’re missing enough information.

In simple terms here is why this example you gave before is not a counter-example.

You are *not* missing the type information in this example. The config option is known if you convert using the app that uses that config option (presumably the same app the user would use to open the file anyway or else the user would be screwed anyway, even prior to any conversion).

The config option is known and the ODF date semantics are also known. The mapping is straightforward.

So why is there a problem here? We know we have a date. We know all the conversions necessary.
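[Ed: a small sketch of the two-digit-year mapping under discussion, assuming the window is known to the converter. The function name `expand_two_digit_year` is hypothetical, and the default of 1931 is taken from the thread's own approximate description ("1931-2030 or something"), not from checked documentation.]

```python
def expand_two_digit_year(yy: int, window_start: int = 1931) -> int:
    """Map a two-digit year into the 100-year window starting at window_start."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year in the range 0..99")
    # Start from the century that contains the beginning of the window.
    century = window_start - window_start % 100
    year = century + yy
    # Anything that falls before the window belongs to the next century.
    if year < window_start:
        year += 100
    return year


# The same stored '31' means different years under different settings,
# so the setting must travel with the data or be applied at conversion time.
under_default = expand_two_digit_year(31)
under_other = expand_two_digit_year(31, window_start=1950)
```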
AlexH

2008-10-06 12:46:15

@Jose: but you're contradicting yourself. You say:

If no such clues can be found then don’t convert. In other words, don’t convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.

I've told you in many instances you can't know whether or not a certain number is a date. You can't say "don't convert", because then anything which uses the unconverted values starts spitting out the wrong answer!
Jose_X

2008-10-06 13:15:58

>> but you’re contradicting yourself

You didn't show any contradiction. I think you aren't understanding what I am saying. Please show the two items that contradict.

>> I’ve told you in many instances you can’t know whether or not a certain number is a date.

Of course you can know if a number is being used as a date. One way is if it is formatted as a date.

[In the case of Excel spreadsheets, this formatting information is found within the same file as the number (or so reverse engineering or special access to Microsoft has determined, I believe, as I think that is how OO.o interprets Excel files).]

>> You can’t say “don’t convert”, because then anything which uses the unconverted values starts spitting out the wrong answer!

What are you talking about? Can you give an example for this nonsensical statement? I must not be understanding you.

You need to give more context in your replies.

[I'm waiting any minute now for me or you to start saying "oops, my bad", but it's not happening. This is emboldening me to be more reckless to see if I go too far, but the problem is that you are not giving examples, as, in fact, you did not challenge my rebuttal of your lone example.]
Jose_X

2008-10-06 13:21:39

Oh, OK, I think I see where you think I was contradicting myself.

Let me rephrase.

>> If no such clues can be found then don’t convert. In other words, don’t convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.

If no such clues can be found for a particular numerical value then don’t convert that value. In other words, either don’t convert the original file format to ODF, or do convert the original file to ODF but map such a numerical value identically to itself and to an ODF number type.

Anyway, so what is the problem now, with anything of what I wrote in this reply from which you quote? http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-26078
Roy Schestowitz

2008-10-06 13:37:04

Alex, I'm really not following you either. If you can open a file, you can interpret the data and then store it properly, bug-free.
AlexH

2008-10-06 13:59:04

Ok, let's start at the beginning and see if this helps.

First, there are no "dates" - that concept doesn't exist. A spreadsheet stores numbers. Some of those numbers may be formatted as "dates", where the spreadsheet interprets them as date offsets from an epoch.

Formatting is no indicator of type. Where a value is used in a calculation, it may or may not be formatted correctly - that's up to the user.

Second, if you're altering data, you have to ensure you have correctly altered *all* instances within a spreadsheet. So, you cannot do a reverse topological sort from "date formatted" cells to try to work out what other cells contains "dates": there may be data that is in the sheet, but not currently used in a calculation. One obvious example would be the result of a VLOOKUP().

It's that simple. You cannot look at a number like "5" and say "oh, that's actually a date". There are some clues in a spreadsheet. There are not sufficient clues to know. This is why no vendor automatically converts values; if it was that easy people would do it!
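
As an illustration of "date offsets from an epoch", the sketch below decodes a day serial under the legacy 1900 convention as it is commonly described: serial 1 is displayed as 1900-01-01 and serial 60 as 1900-02-29, a day that never existed. The function name is hypothetical, and this is only one reading of the behaviour under discussion, not any vendor's actual code.

```python
from datetime import date, timedelta


def legacy_1900_serial_to_iso(serial: int) -> str:
    """Return the date a legacy 1900-system serial is displayed as (ISO string)."""
    if serial < 1:
        raise ValueError("serial must be >= 1")
    if serial == 60:
        # The fictitious leap day; no real calendar date corresponds to it.
        return "1900-02-29"
    if serial < 60:
        # Before the fictitious day the count starts from 1899-12-31.
        return (date(1899, 12, 31) + timedelta(days=serial)).isoformat()
    # After it, every serial is one higher than a correct day count would be.
    return (date(1899, 12, 30) + timedelta(days=serial)).isoformat()


print(legacy_1900_serial_to_iso(5))      # 1900-01-05
print(legacy_1900_serial_to_iso(59))     # 1900-02-28
print(legacy_1900_serial_to_iso(61))     # 1900-03-01
print(legacy_1900_serial_to_iso(39727))  # 2008-10-06
```

Note that the function needs to be told the number is a date serial in the first place. Given a bare "5" it has no way to decide that by itself, which is the point being argued.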
AlexH

2008-10-06 14:04:01

@Jose: Ok, here's a simple example in faux CSV for you to convert to non-buggy format:

"15","0",=A1+B1
"0","2",=A2+B2
"0","5",=A3+B3
"=WEEKDAY(OFFSET(C1,2))"

A1:B3 are formatted as "date". Should be easy, no?
Ianp

2008-10-06 14:06:48

@AlexH "I’ve told you in many instances you can’t know whether or not a certain number is a date. "

If you encounter this situation, you've got a dead file. If a new spreadsheet application cannot find out whether it's operating on a number or a date, then the file is unusable, so this argument is meaningless.
Jose_X

2008-10-06 14:13:09

Consider 3 hypothetical files using the old extension ".old" and how these would be converted into ODF.

file nodates.old: 5 3 6

file hiddendates.old: 5 3 6

file obviousdates.old: 5 3 6 fd232

The first file is a spreadsheet that has 3 number values. These values have to do with how many oranges, pears, and watermelons we sold last week. There are no dates.

The second file has a "3" which is a date. It refers to the third day during the week. Ordinarily, we can't tell this refers to a date (let's assume we can't tell in this case unless we ask the author manually -- ie, there is no hint in the .old file that this is a date).

The third file has a date also, but this information can be deduced because the "fd232" code means that the second field (the "3") is a date interpreted according to the broken leap year formula (I made this code up but let's flow with it).

Here is how I am saying that these three files would be handled. An app that understands these .old formats would map...

.. nodates.old into an ODF file with the numbers not changing values and staying as number types within a table.

.. hiddendates.old into an ODF file with the numbers also not changing values and staying as number types within a table. Thus hiddendates.odf and nodates.odf may look essentially the same.

.. obviousdates.old into an ODF file with the "5" and "6" staying put as number types; with the "fd232" turning into whatever formatting code yields the same effect in ODF as in .old; and with the "3" being turned into the right number so that it maps properly when we use the correct date formula, and the type of the data would be date.

Now, nodates.old -> nodates.odf presented no ambiguities to the app doing the saving. There was nothing much to be done: the data looks identical, as it should.

hiddendates.old -> hiddendates.odf presented no ambiguities to the app doing the saving. There was nothing much to be done: the data looks identical, as it should. Alex has no problem with this case because I did not translate. The data is the same. No information is lost. No new semantics are implied because I used the ODF number type which has ordinary number semantics (not date semantics) just as is found in the .old files.

obviousdates.old -> obviousdates.odf presented no ambiguities to the app doing the saving because it was able to identify the date, it knows the associated algorithm, and it knows the algorithm associated with the ODF date type. New applications know that the ODF date type uses the correct algorithm, so no prob there. Old applications can't even read ODF, so a helper function would need to be constructed. This helper function knows how to convert ODF date values into the values used by the .old format, as all of the information needed is known: the semantics of the date type for ODF are known and the semantics for the .old dates are also known (otherwise we would not have translated to ODF in the first place.. but the "fd232" was assumed to give this information in total).

Note that if "fd232" did not include everything we needed (eg, the algorithm, any timezone offsets, etc) then we would have applied case 2 and simply mapped the number into an identical valued ODF number of the number type.
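
The three cases above can be sketched roughly as follows, in Python. Everything here is hypothetical: "fd232" is the made-up code from this comment, the decoder assumes it means "a day serial counted with the broken 1900 leap year rule", and the ("date", ...) / ("number", ...) tuples merely stand in for ODF's date and number types.

```python
from datetime import date, timedelta


def decode_broken_leap_serial(serial):
    """Decode a serial counted with the broken 1900 leap year rule.

    Returns a real calendar date, or None when no real date exists
    (serial 60 is the fictitious 1900-02-29).
    """
    if serial < 1 or serial == 60:
        return None
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


# Hint codes whose meaning the converter fully knows (assumed for this sketch).
KNOWN_DATE_HINTS = {"fd232": decode_broken_leap_serial}


def convert_cell(value, hint=None):
    """Map one legacy value to a (type, value) pair for the new format."""
    decoder = KNOWN_DATE_HINTS.get(hint)
    if decoder is not None:
        decoded = decoder(value)
        if decoded is not None:
            return ("date", decoded.isoformat())
    # No clue, or an incomplete one: identity mapping into the number type.
    return ("number", value)


print([convert_cell(v) for v in (5, 3, 6)])  # nodates.old and hiddendates.old
print(convert_cell(3, "fd232"))              # ('date', '1900-01-03')
print(convert_cell(60, "fd232"))             # ('number', 60)
```

The fallback branch is case 2: whenever the hint is missing, unknown, or does not pin down a real date, the value passes through unchanged as a number.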
Roy Schestowitz

2008-10-06 14:15:56

Alex, I'll repeat myself for the who-knows-what time. If your data file contains enough information for an application to interpret the meaning of values (and type), then you can 'rescue' this data from the bug. It's very simple, really!
AlexH

2008-10-06 14:39:42

You three still don't get it. There is data on a spreadsheet which isn't necessarily part of any calculation chain.

The only way your "conversion" system could work in the face of that is to flag individually each cell which had been "converted", so that the stuff you couldn't convert could be later "fixed". But it's ugly, and quite rightly no-one does it.

ODF 1.2 takes the right approach by allowing variable epoch calculation. It's simple, and it works.
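
A rough sketch of what "variable epoch calculation" could look like, in Python. The mechanism is not spelled out in this comment, so this assumes it amounts to the document declaring which calendar day serial 0 stands for. The name `null_date` and the default of 1899-12-30 are assumptions for illustration, as is the 1904-01-01 alternative.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, null_date: date = date(1899, 12, 30)) -> date:
    """Interpret a day serial relative to a declared epoch (null_date)."""
    return null_date + timedelta(days=serial)


# The same calendar day under two different declared epochs.
print(serial_to_date(39727))                     # 2008-10-06
print(serial_to_date(38265, date(1904, 1, 1)))   # 2008-10-06
```

The stored numbers stay untouched; only the declared epoch changes how they are read.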
Jose_X

2008-10-06 14:44:12

Alex, I looked over your example. It takes a little while because I haven't coded in this for a while.

>> First, there are no “dates” - that concept doesn’t exist.

Fine. I accepted this from the start. At least we are on the same page so far. Step 1: check.

>> Formatting is no indicator of type....

If it is formatted as a date, I suggested we do assume it is a date; otherwise, this is a bug in the original spreadsheet.

Sure, a value can double as a date and as a password or something else. These oddball cases should be rooted out. The conversion would presumably be done by someone that has a clue over the specifics of the spreadsheet page. In any case, this odd scenario likely is not common. Also, there is no need to convert. A conservative company would start by not converting anything or converting and checking. However, it makes no sense to bind all time into the future to use the bugs of the past on account of a failure to find a simple rule that would apply 100.00% of the time.

>> ... Where a value is used in a calculation, it may or may not be formatted correctly - that’s up to the user.

Right. Any arbitrary value can be used as a date (but not indicated as such within the same file or through any clues given to the processor converting into ODF) by any arbitrary piece of code, whether that code is called a spreadsheet formula or is a utility application that resides on another file on another computer on another network.

In the absence of date formatting and any other needed information that would be needed by the given file type to suggest an unambiguous date, we don't map into the ODF date type. Instead, we map identically into the/an ODF number type.

>> Second, if you’re altering data, you have to ensure you have correctly altered *all* instances within a spreadsheet. So, you cannot do a reverse topological sort from “date formatted” cells to try to work out what other cells contains “dates”: there may be data that is in the sheet, but not currently used in a calculation. One obvious example would be the result of a VLOOKUP().

First let's start by pointing out that these scenarios may apply to some spreadsheet file types but not to others.

Now, my short answer here is that if we can't tell for sure, then as stated already, we map the numbers unchanged into ODF number types. This amounts to an identity/null conversion and is no worse than what OOXML demands.

I may try and break this down more later to analyze Excel files. Worst case, we would have all Excel file numbers map identically into ODF number types. However, the ODF date type is still there for when we know we have a date.

>> It’s that simple. You cannot look at a number like "5" and say “oh, that’s actually a date”. There are some clues in a spreadsheet. There are not sufficient clues to know. This is why no vendor automatically converts values; if it was that easy people would do it!

If we don't know, we don't adjust the values or map to ODF date types. There is no problem. This simply means we aren't trying to deduce semantics from the old format to identify candidates for the ODF date type.

These cases don't present problems.

And the cases where we do have enough info means that the converter can know if an injective mapping is possible (to guarantee that we can find the inverses uniquely or at least without problems -- depending on the particular semantics of the file format, the mapping may not even need to be injective). BTW, X+1 is essentially injective as are all linear functions (scaling and translating). http://en.wikipedia.org/wiki/Injective_function

The point though is that we would have to be sure we could undo the "damage" of conversion. If we couldn't guarantee that, then we would not attempt the mapping into the ODF date type and just stick with the number type.
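
Here is a small Python check of the "can we undo it" idea, under the assumption that the adjustment in question is the one for the broken 1900 leap year: legacy serials above 60 are one higher than a correct day count, and serial 60 has no real date. The function names are invented for this sketch.

```python
def legacy_to_corrected(serial: int) -> int:
    """Legacy serial -> days since 1899-12-31 on the real calendar."""
    if serial == 60:
        raise ValueError("serial 60 is the fictitious 1900-02-29")
    return serial if serial < 60 else serial - 1


def corrected_to_legacy(days: int) -> int:
    """Inverse mapping: real day count -> legacy serial."""
    return days if days < 60 else days + 1


serials = [s for s in range(1, 50000) if s != 60]

# Every converted value maps back to exactly the serial it came from.
round_trip_ok = all(corrected_to_legacy(legacy_to_corrected(s)) == s for s in serials)
# No two serials collide after conversion, i.e. the mapping is injective.
injective = len({legacy_to_corrected(s) for s in serials}) == len(serials)

print(round_trip_ok, injective)  # True True
```

The one value with no inverse is serial 60, and that is exactly the kind of case where the converter would decline and keep the plain number type.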

Remember that we aren't just talking about Excel spreadsheets. Any arbitrary file might be mappable into ODF. ODF is a general purpose file format. It makes no sense to cripple it when all scenarios can be handled gracefully. Sure, for Excel files, maybe a crippled ODF would smell just as bad, but we don't have to accept a smelly ODF format period, as we can do better.
Jose_X

2008-10-06 14:48:31

Roy, #comment-26098 is not showing up.. did it get filtered? Should I repost?
Roy Schestowitz

2008-10-06 14:55:11

Jose, it entered the queue for moderation and I've just recovered it.
Jose_X

2008-10-06 14:59:11

>> The only way your “conversion” system could work in the face of that is to flag individually each cell which had been “converted”, so that the stuff you couldn’t convert could be later “fixed”. But it’s ugly, and quite rightly no-one does it.

The "flag" is automatic. It is called the date type. Only things converted become the date type. Again, it is automatic.

If a strange bifurcation would be needed to account for all possibilities, then we could just not convert to the date type. The date type implies "date" and nothing else. The/a number type can always be used.

I'll quote from the comment that hasn't shown up yet, >> Sure, a value can double as a date and as a password or something else. These oddball cases should be rooted out. The conversion would presumably be done by someone that has a clue over the specifics of the spreadsheet page. In any case, this odd scenario likely is not common. Also, there is no need to convert. A conservative company would start by not converting anything or converting and checking. However, it makes no sense to bind all time into the future to use the bugs of the past on account of a failure to find a simple rule that would apply 100.00% of the time.

What I mean here is that something formatted as a date might also take on a very different role. In this case, changing that value, although correct insofar as the role of the number as a date is concerned, would lead to problems for the number's alter ego.

Remember that we can always be conservative and not convert, but this is no reason not to have a correct date type.

In fact, I don't see any argument for having an incorrect date type. If the old Excel files don't have date type information as you say, then why ever would we consider converting into a date type (except to be aggressive)? Hence date types would only be used for new data, in which case what does the legacy argument have to do with anything since legacy means not new.
Shane Coyle

2008-10-06 15:05:40

Maybe I am confused, but look at it this way.

Suppose some govt office has a bunch of old spreadsheets which were saved by this buggy excel version, even suppose they still have that old 386 and the excel version running to access them and print to that ancient printer over there, once a year (if ya think its not likely, ya havent worked for the govt).

Anyhow, we decide we want to open those files in a shared folder from our shiny workstation with Office 14 or OO3 or whatever. The modern app needs to know how to tell if that file has this known bug, render the information correctly in present use, and also cannot (IMO) change the file itself by 'repairing the bug' because it would wreck the file for its native app version, which expects its 'buggy' data in order to give the expected result.

Translating is ugly, but the important thing is the data and getting it right, each time we open it. In terms of file type conversion, different issue because then you know you can safely ignore that version-specific bug and just save the correct information after you translated it in and corrected for the bug.

My wonder is more, was this bug ever fixed, or was this a case of hiding your sins in a closed source/closed format application?
Jose_X

2008-10-06 15:06:16

>> ODF 1.2 takes the right approach by allowing variable epoch calculation. It’s simple, and it works.

Alright. I have not looked at the details. I can accept an attribute that would specify the algorithm to be used for converting the dates/numbers. That is OK.

I would want sane defaults.

But despite this, OOXML's approach of forcing a twisted conversion calculation looks to be a folly. It's even sadder if we consider that Microsoft's own past formats did not type dates (Alex stated this for I would otherwise have no clue). Why, if you are only now going to add dates to the repertoire, would you want a crippled date type? All past "dates" would just map to number types as a conservative default anyway.

Well, I can hypothesize some reasons why Microsoft would do this. I was trying to imply "why would any format based on technical merits want to have a crippled date type on purpose..."
Jose_X

2008-10-06 15:17:08

>> The modern app needs to know how to tell if that file has this known bug, render the information correctly in present use, and also cannot (IMO) change the file itself by ‘repairing the bug’ because it would wreck the file for its native app version, which expects its ‘buggy’ data in order to give the expected result.

Modern apps have all the information they need (barring proprietary secrets of course). If the format is X then use meaning A. If the format is Y then use meaning B. This is possible if formats X and Y existed when the app was created/updated.

You are correct that the old application won't know. But then why would you convert into ODF in the first place since the old application couldn't read ODF? Not every user would convert their old files into ODF. If you build a translator from ODF into the old format, that translator can make the adjustments as it knows the semantics it needs before and after. If it wouldn't know the semantics unambiguously (Alex gives examples where messes can occur if we try to be aggressive converters), then this info would have been known and the conversions would not have taken place in the first place (at least not without user approval).

None of this, we can see, has any implication to lead us to want to cripple the date type semantics of a new format. We are always safe by converting as is into a number type. Future creations of data as date type should be clean.

We can only imagine why OOXML would force the crippling upon us.
AlexH

2008-10-06 15:20:49

@Jose: it doesn't have a crippled date type. This is purely about the "date as a serial integer" format used by older systems. Opening any older data has this same problem, it's not a file format issue.

Seriously, the differences between OOXML and ODF in this area are minimal. Both have a system to deal with older integer encodings. Both have date types which do not feature this bug. Both can cope with different epochs.
Jose_X

2008-10-06 15:37:12

>> it doesn’t have a crippled date type. This is purely about the “date as a serial integer” format used by older systems. Opening any older data has this same problem, it’s not a file format issue.

OK, I went back and read this link http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-25680 and the comparison appears to be against the Excel format and not against OOXML.

This link, also mentioned earlier, http://www.robweir.com/blog/2006/10/leap-back.html does mention OOXML; however, as you already noted, it is dated two years ago.

>> Seriously, the differences between OOXML and ODF in this area are minimal. Both have a system to deal with older integer encodings. Both have date types which do not feature this bug. Both can cope with different epochs.

Are you kidding me? So we worked on a non-issue with the current OOXML and ODF? I mean, sure it was a fun mental exercise to an extent.

Past experiences suggest that just because you say "everything is fine" doesn't make it so; however, I have no other reason to complain about any specific format with what I have currently verified.

I also need to get on with some other work.

PS: Alex, thanks for the examples you eventually gave. It can be annoying to come up with them, but it helps track down where our minds are not meeting. It's still not completely clear to me where the gap existed, but I have a better idea. Of course, this potentially being a non-issue .... .. Roy, this forum is a great time sink! Thanks. Thanks a lot ;-)
Roy Schestowitz

2008-10-06 15:45:55

"Roy, this forum is a great time sink!"

Well, if it's any solace, this thread/page has been viewed well over 10,000 times and this server fed almost 50 gigs so far this month (mirrors and CORAL excluded).
AlexH

2008-10-06 15:47:02

"Are you kidding me? So we worked on a non-issue with the current OOXML and ODF?"

Well, I did say at the beginning it wasn't really a file format issue ;)

Both OOXML and ODF use the same format for dates - the ISO format - so it only comes down to how to import legacy data. I guess OOXML mandates .xls-compatible defaults, but in practice that's just stating the bleeding obvious... :)
Ian

2008-10-06 16:50:02

@Roy

"If your data file contains enough information for an application to interpret the meaning of values (and type), then you can ‘rescue’ this data from the bug. It’s very simple, really!"

I'm always nervous with the concept of a "best guess" data conversion. If you have a spreadsheet with 3 rows and five columns, it's really not a big issue. When you have a 20 MB file with thousands of possible rows, you have to trust the computer to not screw anything up. Best guess data conversion isn't necessarily a trustworthy process, certainly something I wouldn't trust. I don't care if it's OO.org, Excel, 1-2-3, whatever.
Roy Schestowitz

2008-10-06 17:10:19

Ian, I was not suggesting that guessing would be involved?
Jose_X

2008-10-06 17:41:11

Ian, yes, as Roy said (I think), sometimes you can be very precise. Alex has been focused on Excel spreadsheets. The ODF date tags could be used as a target from a lot more types of files than Excel spreadsheets, and many of these might make very clear that something is a date (which Excel apparently doesn't do).

Your apprehension is shared. Caution applies to any type of automated data manipulation.
Jose_X

2008-10-06 17:46:17

In the interest of fairness, I'll post a link to the current ODF 1.1. I think focusing on ODF is smarter than helping Microsoft hunt down its bugs with OOXML. Sure, it's more fun to help debug OOXML or find gotchas, but this is not always a desirable exercise if you care about ODF adoption being taken up against the leverages Microsoft will use to help OOXML.

From http://www.oasis-open.org/specs/

We have the full standard on a single webpage. http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1-html/OpenDocument-v1.1.html [note, this is a large webpage]
AlexH

2008-10-06 18:04:13

@Jose: note that current ODF versions don't standardise formula stuff at all, so it's not really relevant to them.
Roy Schestowitz

2008-10-06 18:06:36

Well, this page, unlike the discussion, is not about formulas. It's about formats as a whole.
AlexH

2008-10-06 18:30:30

Given the whole premise of this argument is shaky at best (that ISO freely publish things which haven't been, er, published..), that's arguable.

ISO aren't exactly the place you'd want to go for standards documents. They charge €220 for ISO 26300...
Jose_X

2008-10-06 22:21:34

An interesting plugin for OO.o would be a tool to help the user visually add semantics to a document, perhaps as "hints" or perhaps by selecting specific tags and filling in attributes (for say a paragraph or highlighted section of text, for a set of cells, for a set of images or image parts, etc). This would be a way to facilitate mappings from older formats to ODF in a visual manner without having to deal directly with XML. Maybe this is already on the way or exists.
tuomoks

2008-10-07 03:32:12

First, I don't like the way MS handled the ISO process but that's life, done and closed if not forgotten.. Let's do better next time.

Now, the whole date problem! Or other such problems. Nothing new - I once had to sort out what to do in an insurance company, calculating dates 200 years back and 200 years to the future - already 152 (or was it 162?) different programs, procedures, methods, etc which gave different answers but already in many (18000+) applications and used by different databases for calculations! Talk about nightmare, actually only two gave correct answers in each case(!) and one of them was SLOW! So - I can understand the technical pain but not the politics - deal with it, it's a fact no matter what you think.

The whole problem as I see it is the current love to (new?) metalanguages. Why not then use SGML and enhance it? Clean, simple, proven, etc? Much faster by design than anything after that? Add LaTeX and simple authentication, authorization, AES encryption, etc to that, together they would support any and all requirements we can think today - even binary or interactive data would be no problem? Not invented here syndrome - once again? Besides - prove to me that metalanguages are better for computing than binary! Maybe for humans but the computers do the work and I'm not even sure of the benefits for humans - it is as easy to read hex, octal, whatever as ASCII which actually covers a very small part of all the needed (human) languages - try to read NLS supported meta sometimes - back to interpretation and translation?

Sorry, seen these fights too many times (probably?) - they have nothing to do with "a better way to solve a problem" but who's controlling, who makes the decisions, politics (and money) as usually. Nothing technically difficult but (still?) understandable, IT is a young field and going through the growing pains.
Roy Schestowitz

2008-10-07 07:10:57

Whose control though? There's a fight for the perception that ODF is IBM, which is false. Does any company own or control HTML, for example?
tuomoks

2008-10-07 08:30:29

@Roy - you are right, too many misconceptions who, what, etc. Nobody (not IBM, not Sun, not ...) owns ODF, it is a standard. Now, nobody actually owns OOXML either but it has this small problem, the extensions are not defined the way they should in a standard. The problem with OOXML are those undefined extensions - they can include patented, whatever proprietary methods, handling, etc - once used you are stuck! No way out, DMCA takes care of that, you pay and maybe retroactively a lot - to one company, no one else can help you! Or at least with great pain create an environment for conversions back and forth - fun if it even works? The governments, institutions, large corporations, etc follow the rules and for a reason so..
Roy Schestowitz

2008-10-07 09:12:37

OOXML is not Microsoft Office? That's news to me.
Roy Schestowitz

2008-10-07 09:13:58

I should add: the fact that the documents presented in this post (OOXML) arrive from Microsoft, well... that speaks volumes. Microsoft is just using ISO as a 'front'.
tuomoks

2008-10-07 10:11:44

OOXML (Office Open XML) is the standard Microsoft created but not yet (if ever) implemented in any product, not even in MS Office! Just to clarify - a copy :

"Microsoft originally developed the specification as a successor to its earlier binary and Office 2003 XML file formats. The specification was later handed over to Ecma International to be developed as the Ecma 376 standard, under the stewardship of Ecma International Technical Committee TC45. Ecma 376 was published in December 2006[9] and can be freely downloaded from Ecma International.

An amended version of the format, ISO/IEC DIS 29500 (Draft International Standard 29500), received the necessary votes for approval as an ISO/IEC Standard as the result of a JTC 1 fast tracking standardization process that concluded in April 2008. Next and last step in the standardization process is the final publication as of ISO/IEC IS 29500, Information technology – Office Open XML formats as an international standard."

Yes, of course MS used (again?) ISO/whatever to force something but this time it may not work well. Seen the rumors that MS wants to take over ODF? Good for them, bad for people who let it happen - if they do, I hope not!

Now, working in/with small and huge corporations I can tell - they think weird! Any, even a small company can participate but for some reason they just refuse and take whatever is given? Huge corporations have their own problems, often slow to react, internal fights which prevent them making decisions before too late, whatever. So, if MS can do the next "coup", start managing ODF as seems with some other OSS projects (amazingly many) - good for them and instead of complaining people should start working if they don't like it.

As I have said, I'm not a big MS fan but at least they react! In some small companies I have worked, the price of a VP lunch would have paid one year in standards committee, guess which one they select - they just keep complaining instead of making their own future! A weird world we have!
Jeetje

2008-10-07 13:25:30

AlexH, there are 2 ways to solve bugs in data that are a result of former (faulty) specs:

1) Carry over the same faulty specs into new specs. We already know what that begets: a 5K+ pages 'spec' describing every fault ever made since the inception of the former spec(s), ironically dubbed OOXML.

2) MAP every faulty spec to a correct spec and specify a mathematically correct algorithm to implement that mapping. The resulting correct spec will be a lot leaner and meaner / easier to implement, the 'downside' being that most mappings are not reversible (i.e. it's a one-way street).

The benefits of option 2 are manifold:

a) New implementations of the correct spec aren't burdened with the obligation to account for all possible faults, hence the resulting software will be small and fast.

b) Converters from an old, faulty spec to the new, correct spec can be implemented separately, allowing for bulk conversion of old documents into new, cleaned up documents.

The biggest downside: The original manufacturer of the faulty software (based on its own faulty specs) is caught with his pants down and may very well lose a lot of business to people who ARE capable of keeping data accessible for decades to come.

Basically, MS used a process akin to ISO 9000 series certification in the most perverse way possible, asking ISO to confirm that the way data has been handled since the inception of Word / Excel and so on is compliant with the spec they have now drawn up. From a business point of view, ISO had very little choice but to agree the data is compliant to specs, whereas from a technical PoV they should have rejected the whole spec as being a waste of the trees used to produce the paper it was printed on.

The right way forward is saying byebye to all the errors MS ever made in storing our data, the only corporation that is able to help us do that is MS itself, and if they don't help us out quickly they may very well help themselves out of business pretty fast (considering how fast we are approaching a big recession, as MS Office still is a pretty poor value-for-money proposition).
AlexH

2008-10-07 13:31:33

@Jeetje: I agree with that. It's just that option 2 isn't available in this instance, as I have shown many times.

This isn't a recent problem, nor is it arguably MS's problem. If it was so trivially fixable, Microsoft would have done it already - not least in the early days, since that would have caused added incompatibility with Lotus 1-2-3.
Jose_X

2008-10-07 17:07:19

[Jeetje] >> 2) MAP every faulty spec to a correct spec and specify a mathematically correct algorithm to implement that mapping. The resulting correct spec will be a lot leaner and meaner / easier to implement, the ‘downside’ being that most mappings are not reversible (i.e. it’s a one-way street). ...

[AlexH] >> I agree with that. It’s just that option 2 isn’t available in this instance, as I have shown many times.

My two cents:

It is not a problem to create a new item whose map from legacy is not well defined in all cases. This just means that legacy stays legacy, but the new can have a new, good, solid home.

As one example, in the case of "dates" in formats that don't have that type, it just means that you keep them as "numbers", whether in the old format or the new, if you need to be conservative or want maximal flexibility. Where possible, you may migrate to date types. Also, new dates that are created will have their date type as well.

As far as having many choices, eg, dates based on X or Y alg or reference point, that is a different issue. I like choice. I also like constraining choice for use cases (that's what types do.. for particular use cases they limit the range of possibilities). So overall, I have no problem if odd date formats exist, but I like to have "profiles" or whatever you want to call it (eg, "portable documents") where you will find a restricted well-defined environment. Judicious use of limits for well-defined scenarios is a plus.. but you also want an ample toolbox to be able ultimately to handle a great many scenarios.

This brings up extensions and monopoly leverage. Extensions are good if used for good. They are bad (too few contract constraints) in the hands of someone that can and will abuse it, eg, via the embrace, extend, extinguish strategy.

The best of both worlds is to recognize that monopolies and perhaps other types of players need special restrictions but the rest of us don't (at least not yet). Reach monopoly status, and you graduate. The Microsoft clan should have left a long time ago and left Microsoft on cruise .. to one day be overtaken by others. Their existing power reach while still aboard Microsoft is unhealthy for the rest of us.
Luc Bollen

2008-10-07 17:35:31

@AlexH: I come back to this discussion after a couple of days, and I did not read all the comments made since then. I admire you for being patient enough to continue the discussion.

I would just like to say that I fully agree with your analysis: the .xls files have not enough information, in some cases, to reliably adjust the data for the 1900 bug.

We only differ on the semantic analysis of the text contained in the OpenFormula spec: do they *standardise* the "1900 bug" or do they *document* it ?
AlexH

2008-10-07 17:58:16

@Luc:

I'm not sure what difference you see between standardising something and documenting something. At the end of the day, a standard is simply a documented specification for something.

Does ODF mandate handling the leap-year bug? No; both ODF and OOXML have a specific date type for data which doesn't suffer this problem. It only applies to importing legacy data.
oliver

2008-10-07 18:28:56

Is there a Torrent available of these files? I'd like to have it mirrored locally before all public mirrors are shut down...
e7o.de

2008-10-07 19:06:09

Stockpiling: OOXML documentation leaked...

After the many irregularities that surfaced during the "standardisation" of OOXML, the standard itself has now turned up on the net as well. Typically, the copyright club gets pulled out, and so the file can no longer be found in the blog entry: ...
Jose_X

2008-10-07 20:08:43

>> Documentation leaked

Reminds me of piracy.

It's all good for the vendor.
rcfa

2008-10-07 21:20:59

Putting bugs into the standard is NOT acceptable. A reference to existing user data is not relevant. The standard, if it's worthy of being called one, should have the revision of the document format version as part of the file format. Thus any legacy spreadsheets, etc. should have a format version smaller than the first format version of the ISO standard. Converting legacy documents should then, where possible, make the required adjustments (date), or warn the user (calculation issues).

It's not acceptable that astronomy and mathematics are redefined for all ages, just because some programmers decades ago weren't able to think straight. There is NO ROOM for legacy bugs in a NEW STANDARD. The removal of these bugs must be part and parcel of the transition from some proprietary format to an international document standard, and the resulting transition will necessarily require similar care to the y2k issue. In neither case is it acceptable just to keep doing what was done in the past in order to avoid breaking backwards compatibility. People who don't want these transition pains can stick with the old, proprietary document format.
AlexH

2008-10-07 21:30:48

@rcfa: with that attitude, we'd be stuck with .xls forever more.

Having a transition plan is the only way you can get people to upgrade to new formats like OpenDocument.

It's not technically nice, no. But it's a practical necessity. A new format which people can't upgrade to is of very little use to people who need to do real work.
Roy Schestowitz

2008-10-07 21:48:37

And yet, bugs are being rewarded. Watch what Microsoft did to HTML/CSS... deliberately even.

“We’re disheartened because Microsoft helped W3C develop the very standards that they’ve failed to implement in their browser. We’re also dismayed to see Microsoft continue adding proprietary extensions to these standards when support for the essentials remains unfinished.”

–George Olsen, Web Standards Project
AlexH

2008-10-07 21:54:27

@Roy: that is a quote dating back about eight years now, though.

The Web Standards Project has had a Microsoft Task Force for a number of years now, which seems to be having a real effect.

The web browser market has been changed massively by free software, and Microsoft are not in a position to ignore standards now. And if you want to see fewer places use Silverlight, you should be rooting for better standards support in IE, because without SVG/etc. you don't have many other options - and it's only IE behind in that area.
Roy Schestowitz

2008-10-07 21:59:10

>> @Roy: that is a quote dating back about eight years now, though.

Ah! That makes it OK. Let's just forget all the crime where (age >= 2 years).
Roy Schestowitz

2008-10-07 22:00:51

Just to clarify, I don't compare it to crime in this case, but how often I hear this excuse about age when bringing up heaps of blatant crime!
name required

2008-10-07 22:41:56

download the specification (rar version) from the stealthnet.

stealthnet://?hash=6AED03BB4BA2B91393BB5E97E5CCA8F49BBF650BD33D7D59D446B4EAA4B10FE2A78528CAA3F48E00EDD075E6A014FD5AC924FDEEB7B4B3CF63ED88860437CE48&name=OOXML-ISO-standard-english_leaked-html-edition_october-2008-1080-boycottnovell.com.rar&size=164435005

use stealthnet for your p2p needs and participate to make it larger and stronger and enrich it with your content.

don't let these war- and money mongers rule this planet and enslave humanity any further.
RJoe

2008-10-08 08:11:21

Just my opinion in two or three cases involved here...

First: How can the documentation of an ISO standard be secret? Is there no obligation to publish such a document???

Second: A new standard should not implement errors of previous applications. The 1900 bug should not even have any effect on the OOXML formatted data, because we are talking about a calendar date. This should be formatted yyyy.mm.dd or something like that, not as days counted from a specific date! If an application wants to be compatible with previous versions, it can rebuild that in its internal data.

It's a shame what happened in Norway these days. The officials from ISO don't have any spine. Otherwise they would have rejected this document from MS.
AlexH

2008-10-08 08:20:59

@Roy: I'm not saying forget about it, I'm just pointing out that things have changed in the meantime. Anyone reading what you wrote might have been confused and thought it was a recent quote, when that is not the current outlook of the Web Standards Project.

@RJoe: one of the things ISO has always done is charge money for paper standards. They've never been published openly except where another organisation also has their own copy (e.g., OpenDocument). That's obviously something which ought to change.

Your point about dates is correct, and you've actually pointed out how the modern date type basically works. But as I said previously, it's not as simple as saying "just convert old data", because you can't. This hack will be with us for many years to come.
Roy Schestowitz

2008-10-08 09:11:18

RJoe,

ISO was, in part, stuffed by Microsoft employees, so the decision to let this abomination happen was down to Microsoft, too. This impulsive thing was a response to corruption in the process where people got bullied, bribed, blackmailed. I thought that only the 'non-finalisation' of the text was the reason it was not out there. It's surprising to find that so-called 'open' standards are not open even for access (an afterthought and a realisation that came to me only later, so I removed the files).
Jeetje

2008-10-08 11:14:25

@AlexH, Jose_X: I partly agree, partly disagree with the both of you as far as the 'the right way forward' for the year 1900 bug and similar issues is concerned ^^

I'm on the same page as Jose as far as choice is concerned for using X or Y alg or reference point, however as Rob Weir showed in his piece regarding the YEARFRAC function (http://www.robweir.com/blog/2008/05/fractured-yearfrac-and-discounted-disc.html), those algorithms and reference points need to be unambiguously defined lest we run the risk of crashing another bank or Mars lander ^^

And if we have two well defined algorithms with associated reference points, it's a trivial exercise to specify a mathematical mapping from the faulty one to the correct one AS LONG AS one point in the faulty spec's space doesn't map onto multiple points of the correct space. If the latter case occurs, context will need to be taken into account to try and estimate the correct mapping, and as with all algorithms taking context into account, the best judge of the final result will probably be a human.
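To make that a bit more concrete, here is a rough Python sketch of such a one-way mapping for the date case. It assumes the legacy 1900 date system numbers 1 January 1900 as day 1 and counts a 29 February 1900 (day 60) that never existed; the function name and the choice to return None for the unmappable point are my own illustration, not anything taken from either spec. It also assumes we already know the number is meant as a date, which is exactly the context a bare number in a cell doesn't give you.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: in the legacy system, serial 1 is 1900-01-01, so day 0 is 1899-12-31.
TRUE_EPOCH = date(1899, 12, 31)
# Assumption: serial 60 is the legacy "29 February 1900", a day that never existed.
PHANTOM_SERIAL = 60


def legacy_serial_to_date(serial: int) -> Optional[date]:
    """Map a legacy day serial to a real calendar date (hypothetical helper).

    Returns None for the one serial that has no correct counterpart,
    so that a human (or the surrounding context) has to decide.
    """
    if serial < 1:
        raise ValueError("legacy serials start at 1")
    if serial == PHANTOM_SERIAL:
        return None  # no real date to map to
    if serial > PHANTOM_SERIAL:
        serial -= 1  # skip over the phantom leap day
    return TRUE_EPOCH + timedelta(days=serial)


print(legacy_serial_to_date(59))     # 1900-02-28
print(legacy_serial_to_date(60))     # None
print(legacy_serial_to_date(61))     # 1900-03-01
```

Note that the mapping is not reversible in any useful sense for day 60, which is the one-way street mentioned earlier.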

The bigger question though is: how many documents CANNOT be mapped automatically, i.e. need context and maybe human intervention to correct any errors?

However, SC 34 is still muddying the waters regarding the future spec unifying ISO 29500 and 26300, diluting that process with the simultaneous task of ensuring the mapping of legacy MS documents to the new format will be relatively painless for MS (i.e. NOT aiming for the best possible unified format for the next coupla decades). Already a number of countries encompassing a sizable portion of the globe's population have stated their preferred document format is ODF, so if SC 34 doesn't cut away all legacy fluff from ISO 29500 and strive for unification by the end of 2009, their efforts will become wholly irrelevant. And that would definitely be a shame, as that committee is about the only forum outside MS that is at all able to draw up mappings from faulty specs to correct specs...

First things first:
1) A (mathematically correct) unified document format by the end of 2009
2) see 1
3) see 1
4) As soon as 1 has been developed, spawn X workgroups to help out with conversion algorithms from legacy to unified.
AlexH

2008-10-08 11:33:17

@Jeetje:

I think you actually raise two different problems. The "leap year" bug is a very specific and quite unique issue, in that it's basically impossible for software to "fix" spreadsheets. The best approach so far is to put the standard (legacy) epoch back one day into 1899, so that the values are 99% correct without the need for any conversion; only people with spreadsheets that care about days in 1900 will experience problems. That's sound engineering.
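As a rough illustration of the epoch trick (a sketch only, with a made-up function name, and assuming the legacy system numbers 1 January 1900 as day 1 and counts a non-existent 29 February 1900 as day 60), in Python:

```python
from datetime import date, timedelta

# Assumption: the shifted epoch is 1899-12-30, one day earlier than day 0
# of the legacy system, which absorbs the phantom leap day.
SHIFTED_EPOCH = date(1899, 12, 30)


def shifted_epoch_date(serial: int) -> date:
    """Read a legacy day serial against the shifted epoch, with no conversion."""
    return SHIFTED_EPOCH + timedelta(days=serial)


# From 1 March 1900 onwards the unmodified serials come out right:
print(shifted_epoch_date(61))     # 1900-03-01
print(shifted_epoch_date(39448))  # 2008-01-01

# Before that, they are one day early compared with what the legacy
# application displayed (it showed serial 1 as 1 January 1900):
print(shifted_epoch_date(1))      # 1899-12-31
```

The stored numbers are never touched; only the reference point changes, which is why nothing has to guess whether a given cell holds a date or a plain number.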

The other issues, like YEARFRAC, are where OOXML is not soundly specified enough. I think this is just competition in action: one early advantage of OOXML was that it went much deeper than the OpenDocument specification, and this was touted as a benefit. Now, the boot is somewhat on the other foot, because OpenDocument is reaching the same depths but at a greater level of detail.

We're sadly still in the same situation of copying what Excel does, but that's because this is really user interface. Any change here impacts users, not the vendors.
Pedro Gimeno

2008-10-08 11:51:33

@AlexH:

>> @rcfa: with that attitude, we’d be stuck with .xls forever more.

Wouldn't that be .wks instead?
AlexH

2008-10-08 11:56:13

@Pedro: well, precisely :)
rcfa

2008-10-08 13:26:22

@AlexH&Pedro: No, we wouldn't be stuck with .xls/.wks forever. The transition might take a bit longer, and it might be a bit more painful, but we'd end up with fixed software (and spreadsheets are software, too).

The y2k issue was neither quick, nor cheap; it was what you'd call "paying for past sins". The same needs to happen with these date and calculation bugs. Just defining a bug as a standard is as ridiculous as redefining the meaning of noon during "summer time" (there's no such time, because noon is when the sun is highest, not when a bunch of politicians decide it to be).

The reason I bring up summer time is no accident: instead of having summer and winter HOURS (as in opening or business hours), the government decides to "cheat" everyone by redefining an astronomical event. They could equally easily mandate that school and government office hours start one hour earlier in summer, and more or less the rest of the economy would follow suit (working parents have to bring kids to school, businesses want to sell to government, etc.)

That would be the right approach. It seems that getting things done right doesn't count anymore, only slop counts, as long as it "gets done, who gives a f* how it gets done". And it's that attitude that creates that sort of mess in the first place.

If you screw up, you have to pay for it. You can pay now, or a lot more later. The price just goes up the longer you wait.

So the point of a quick transition is completely lost if the transition doesn't fix the legacy issues in the process. I'd rather see a much slower transition and adoption, as long as I can count on there being no dead legacy dogs buried in new documents.
oliver

2008-10-08 17:20:59

> This hack will be with us for many years to come.

So if _that_ is already given - what is the plan to get rid of the hack in the long run? I mean, even if I accept that this hack can't be fixed _now_, can I at least expect that people are working to completely fix this over the coming years? Or did you actually mean to say "This hack will be with us for as long as Microsoft is in business"?
rcfa

2008-10-08 17:37:19

@oliver: what they must mean: "This hack will be with us for as long as nobody has the guts to ratify a standard that's worth the name standard." Bugs are there to be squashed, not to be elevated to a standard. What's next, are we going to redefine Pi as the integer 3?
AlexH

2008-10-08 17:52:43

@oliver:

It will die over time as people move to typed spreadsheet formats. At some point, probably in five years or so, the feature will get dropped from the specs, then later the apps will stop supporting it.

There's not really a huge amount of point removing stuff from the specification while it's still in use by users and has to be supported by applications. That's one reason why HTML5 is a lot more promising than XHTML2: in fact, XHTML2 is almost the case study in why technical perfection does not work.
