[Update: Marius has produced this HTML version, which is easiest to browse and requires no large downloads. Another reader, Tony Manco, has produced this HTML version (another mirror... and another) of the core of OOXML so that you can access the specs quickly.]
In light of the systematic abuse and the demise of ISO, which IBM loudly protested against [1, 2], we shall no longer let this process remain secretive. We finally have complete copies of the documents that those behind the shenanigans keep behind passwords (unlike ODF, which they attack). These comprise 6 files, namely:
- 1080.pdf
- OfficeOpenXML-WordprocessingMLArtBorders.zip
- OfficeOpenXML-SpreadsheetMLStyles.zip
- OfficeOpenXML-DrawingMLGeometries.zip
- OfficeOpenXML-RELAXNG-Strict.zip
- OfficeOpenXML-XMLSchema-Strict.zip
[Note: appended at the bottom of this post we now have 1081c, 1082c, and 1083c.]
[Note #2: we now have a mirror listed at the bottom.]
For those who forgot the opposition to ISO's bad behaviour, here is another new article about IBM's action.
In a recent announcement IBM said that it would reconsider its membership in the hundreds of bodies that create global standards for everything from software to servers.
Another article says that "IBM Nixes Standards Shenanigans" and further to the exodus in Norway we also have Glyn Moody's take.
A little while back I noted a provocative call from IBM for standards bodies to do better – a clear reference to the ISO's handling of OOXML. Here are some other people who are clearly very unhappy with the same: 13 members of the Norwegian technical committee that actually took part in the process.
[...]
This particular saga is only just beginning...
Feel free to pass around (or even ridicule) those ~60 megabytes of lock-in, which Microsoft won't let you see. This probably still contains many of the known flaws, which stayed intact, awaiting and even deserving scrutiny.
Update (03/10/2008): we've just added 1081c, 1082c, and 1083c.
Update #2 (04/10/2008): this Web server sporadically goes down due to heavy load (over 10 GB of traffic today, plus lots of CPU and RAM). We've made a mirror available, so please use it instead, if possible.
Update #3 (04/10/2008): we now have an HTML version of the core of OOXML, but please use this mirror (HTML), which should be faster.
Update #4 (04/10/2008): the first mirror was downed by the load (thousands of OOXML pages combined with the Slashdot effect can do that), so here is a second mirror. If it's down as well, come back later when there's less hammering on the servers.
Update #5 (04/10/2008): third mirror of the HTML version, just in case.
Update #6 (04/10/2008): here is a mirror of the PDF (1080.pdf).
Update #7 (05/10/2008): here is a much better HTML version of OOXML (1080). We will have another one soon, but it comprises over 11,000 files, so this may put strain on the server.
Update #8 (06/10/2008): now that the load on the server has declined somewhat (tens of gigabytes in days), we decided that it's safe to upload this graphics-rich HTML version of 1080 (comprising over 11,000 pertinent files).
Update #9 (07/10/2008): due to legal intimidation from ISO or its cronies, we have removed OOXML (also from the mirrors).
Comments
Mike Brown
2008-10-03 05:01:05
"For legacy reasons, an implementation using the 1900 backward compatibility date base system shall treat 1900 as though it was a leap year. [Note: That is, serial value 59 corresponds to February 28, and serial value 61 corresponds to March 1, the next day, allowing the (non-existent) date February 29 to have the serial value 60. end note] A consequence of this is that for dates between January 1 and February 28, WEEKDAY shall return a value for the day immediately prior to the correct day, so that the (non-existent) date February 29, 1900, has a day-of-the-week that immediately follows that of February 28, and immediately precedes that of March 1, 1900."
Really, you couldn't make this stuff up.
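To make the quoted rule concrete, here is a minimal Python sketch of the serial-value mapping it describes. The function names are hypothetical, and the weekday numbering (1 = Sunday through 7 = Saturday) is an assumption chosen for illustration; only the serial values 59, 60 and 61 come from the quoted text.

```python
from datetime import date, timedelta

# Serial value the quoted text assigns to the non-existent February 29, 1900.
PHANTOM_LEAP_DAY = 60

def serial_to_date_1900(serial):
    """Map a 1900-date-system serial value to a real calendar date.

    Returns None for serial 60, which has no real date behind it.
    """
    if serial < 1:
        raise ValueError("serial values start at 1 (January 1, 1900)")
    if serial == PHANTOM_LEAP_DAY:
        return None
    # Serials after the phantom day run one ahead of the true day count.
    offset = serial if serial < PHANTOM_LEAP_DAY else serial - 1
    return date(1899, 12, 31) + timedelta(days=offset)

def legacy_weekday(serial):
    """Weekday as the legacy system counts it (assumed 1 = Sunday ... 7 = Saturday)."""
    return (serial - 1) % 7 + 1

def true_weekday(d):
    """Real weekday of a date on the same 1 = Sunday ... 7 = Saturday scale."""
    return (d.weekday() + 1) % 7 + 1

if __name__ == "__main__":
    for serial in (59, 60, 61):
        print(serial, serial_to_date_1900(serial), legacy_weekday(serial))
```

Under these assumptions, the legacy weekday for serial 59 comes out one day before the real weekday of February 28, 1900, while from serial 61 onwards the two agree, which is the consequence the quoted passage spells out.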
Roy Schestowitz
2008-10-03 07:25:18
Maybe OOXML should also explicitly state that 850 * 77.1 = 100,000.
http://www.downloadsquad.com/2007/09/25/excel-2007-cant-do-math-unless-850-77-1-100-000/
AlexH
2008-10-03 08:19:36
In the case of spreadsheet data, having an app re-interpret the data as something different is clearly, definitely, and obviously wrong. "Correctness" doesn't matter if "fixing" it actually breaks user data.
Come on, there are better criticisms of OOXML than its legacy support....
Darren
2008-10-03 10:12:32
DanielHedblom
2008-10-03 10:16:42
That's why nobody besides Microsoft wanted this "standard" to go through the fast track process. It is/was badly broken, unspecified, impossible to implement and really a pure pile of manure.
The "standard" is just a dump of how one specific implementation of a document format works, bugs and all. That's so wrong that it's not even funny.
That the standard contains bugs and that the only halfway implementation contains piles of bugs is actually the best argument against it there could ever be.
Roy Schestowitz
2008-10-03 10:22:35
AlexH
2008-10-03 10:27:47
@Roy: it's not "apologism". Free software implements this same bug as well, because it makes spreadsheets actually work. If we broke people's spreadsheets, that would rightly make them angry.
Roy Schestowitz
2008-10-03 10:33:46
AlexH
2008-10-03 10:37:42
OpenDocument 1.2 is going to standardise the exact same bug that you deride OOXML for, and I'm sure Microsoft will somehow catch the blame for that as well. However, it's just not that simple a problem: you can't play fast and loose with people's existing spreadsheets because this is not a file format issue.
Roy Schestowitz
2008-10-03 10:49:17
Anyway, you used similar logic to justify Microsoft's disobeying of Web standards.
http://boycottnovell.com/2008/09/13/microsoft-admitted-mono-trap/#comment-24236
AlexH
2008-10-03 10:59:52
My point is you can't say "it's calculating it wrong therefore all existing spreadsheets must be wrong": many of the people who care will have adjusted for that bug already, and correcting the bug will actually silently wreck existing data.
And, no, my logic on web standards was completely different. Not least because Microsoft were following web standards, and even though I asked you many times what they should be doing, you had no answer. You like to bash them no matter what they do, which is fine, but trying to pretend like you have a good reason is a sham.
AlexH
2008-10-03 11:00:52
Roy Schestowitz
2008-10-03 11:22:07
Same with the Web by the way. Microsoft had almost a decade to fix its problems, but it didn't until it lost market share.
AlexH
2008-10-03 11:44:10
If we were talking about the ugly text runs that OOXML does, that would be one thing. But we're not talking about the file format in any way here - we're talking about user data. That's totally and utterly different, and I fail to see why you can't grasp that.
And since you brought up the web thing again, do you want to outline what action you think Microsoft should have taken? Or are you still pleading the 5th on that?
DanielHedblom
2008-10-03 11:59:18
AlexH
2008-10-03 12:05:43
The fact that I have a different opinion to other people here doesn't make me a "shill", paid or otherwise.
Roy Schestowitz
2008-10-03 12:09:45
How can one be so blind? http://boycottnovell.com/ooxml-abuse-index/
Roy Schestowitz
2008-10-03 12:11:26
That's like asking how to handle a criminal that expresses remorse. The reasonable thing to do is to jail it.
AlexH
2008-10-03 12:16:37
It has always been the same with ISO, and it will continue to be the same with ISO, because that is what ISO's members and funders want. People who think ISO is irrelevant simply don't understand what it does; it has always been this ugly.
AlexH
2008-10-03 12:20:09
No, it's nothing like that. You're accusing Microsoft of working against web standards in this case of vendor extensions. I've pointed out numerous times that a. it's in the standard, and b. other standards-compliant browsers do the exact same thing.
I'm not going to defend Microsoft's abysmal support for web standards, but in this instance you're simply wrong.
Roy Schestowitz
2008-10-03 12:25:20
Ha! The classic "they are as evil as us" excuse that Microsoft has mastered (against Apple, Google, IBM, etc). You're doing it again.
http://boycottnovell.com/2008/04/05/microsoft-ibm-epa-proxy/
AlexH
2008-10-03 12:40:51
I'm not excusing them or defending them, as I keep saying and I wish you'd actually listen.
I'm pointing out that it's not a surprise that they act that way based on past history, and that anyone who thought they would behave differently is being naive.
To put it simplistically, I wouldn't defend a man who beats his wife, but I wouldn't be surprised that he beat her tonight if he's beaten her every night in the past week. (Not that I am equating ISO in any way with domestic violence, which is an extremely serious subject).
It's really not that difficult to understand the difference between those two positions, particularly for someone as educated as yourself.
Roy Schestowitz
2008-10-03 12:47:00
Microsoft: We were naïve about standards. No, really!
"Microsoft was also present at IETF meetings around that time, and was enthusiastically gaming the system. I remember one Microsoft attorney with three assistants who were each feeding "audience" questions at the attorney's direction.
"Organizations like Sun, which ran a large standards department, were tremendously concerned with Microsoft's attempts to game the system at the time.
"Microsoft is no newcomer to the standards business. Protests otherwise on their behalf are insincere."
http://technocrat.net/d/2008/6/23/44269
To say that the system was always dysfunctional is a self-serving stretch. Mind you, it was Redmond's own press that presented an interview about C++'s standardisation, which required no manipulation.
Nothing like OOXML (and Microsoft) has ever hit ISO, so let's not become revisionists.
AlexH
2008-10-03 12:53:28
I suggest you do some more research on how ISO operate, who funds them, and how they've handled software stuff in the past.
Roy Schestowitz
2008-10-03 13:15:11
Be specific.
Luc Bollen
2008-10-03 13:19:25
A good indication of this is that ODF doesn't have this bug specified, and OOo is perfectly able to open .xls files and store the data in ODF format. The problem should be handled in the import filter, not in the format specification.
@Roy: You only published Part 1 of the spec (document N1080). Could you also publish the other parts (documents N1081, N1082 and N1083)?
Roy Schestowitz
2008-10-03 13:32:32
I'll update the post in a moment.
Andy
2008-10-03 13:52:01
AlexH
2008-10-03 14:14:56
@Luc: as I said, this isn't a format issue, this is a user data issue. Indeed, ODF 1.1 and previous editions didn't even address this syntax, because it's application-specific. The problem is that you can't just "convert" user data when you convert the file format, because spreadsheet data isn't typed and you can't know which numbers to adjust.
So, ODF "doesn't have this bug" is simply untrue: it left it unspecified, and ODF apps interpret things as they like (= compatible with Excel). ODF 1.2 will standardise this bug as well, so that apps that want to behave "compatibly" can do so.
Roy Schestowitz
2008-10-03 14:24:46
http://reddevnews.com/blogs/weblog.aspx?blog=1203
"Speaking of theater, the IT industry got an eyeful when Microsoft admitted that one of its Swedish employees had offered monetary compensation to Microsoft partners in Sweden if they engaged in the proposal process and voted for the OOXML spec. Sweden invalidated its "yes" vote for OOXML and essentially abstained from the final voting.
"No surprise, broader accusations of ballot stuffing -- by way of getting dozens of companies to suddenly join the ISO voting bodies of individual nations -- abound.
"I asked Bjarne Stroustrup, the creator of the C++ programming language and a guy who has wended his way through the ISO ratification maze a few times himself, if he's ever seen this kind of chicanery in previous ISO votes.
""I have never heard of money changing hands in exchange for votes or anything equivalent," Stroustrup writes back. "I guess every process is vulnerable to political and economic pressures, but I have not personally seen or suspected anything like that in relation to C++.""
Luc Bollen
2008-10-03 14:27:02
"Doesn't mandate mistakes. Just because one program gets something wrong doesn't mean that everyone should make the same mistake. The specification is carefully written to not require certain bugs, just because someone has a bug. For example, Excel incorrectly believes that 1900 was a leap year, and at least draft version 1.3 of the Excel specification claims that compatible applications must make the same mistake. Nonsense. Instead, OpenDocument wisely stores dates as dates (not just numbers), and thus does not require that applications have this bug. The Excel specification also requires that applications cannot be more capable than Excel (it doesn't permit support for dates before 1900). Again, nonsense. In fact, at least one OpenDocument spreadsheet application (OpenOffice.org Calc) can correctly calculate dates and date differences going back to 1583! Similarly, many applications handle complex numbers in a very clumsy way; we've devised the specification to make sure that future applications can support better approaches, instead of tying their hands to a technique known to be poor."
AlexH
2008-10-03 14:28:41
My statement was that Microsoft have a long-standing and deep involvement in ISO. That statement is correct, your hand-waving notwithstanding.
AlexH
2008-10-03 14:36:03
If it doesn't implement that bug, days are off by one. Great.
So, yes, it doesn't mandate, because the default formulas are typed. That's great for new data. It doesn't work for imported data, and that's why they're also standardising that bug in the specification.
Roy Schestowitz
2008-10-03 14:40:10
Luc Bollen
2008-10-03 14:41:56
"Implementations of formulas in an OpenDocument file shall use the epoch specified in the table-null-date attribute of the element, and shall support at least the following epoch values: 1899-12-30, 1900-01-01, and 1904-01-01.
Many applications cannot handle Date values before January 1, 1900. Some applications can handle dates for the years 1900 and on, but include a known defect: they incorrectly presume that 1900 was a leap year (1900 was not a leap year). Applications may reproduce the 1900-as-leap-year bug for compatibility purposes, but should not. Portable documents shall not include date calculations that require the incorrect assumption that 1900 was a leap year. Portable documents shall not assume that negative date values are impossible (many implementations use negative dates to represent dates before the epoch). Portable documents should use the epoch date 1899-12-30 to compensate for serial numbers originating from applications that include a 1900-02-29 leap day in their calculations."
I think we are far from "ODF 1.2 will standardise this bug as well".
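As a rough illustration of the three epoch values named in the quoted OpenFormula text, the following Python sketch converts serial numbers with plain day arithmetic and no phantom leap day. The helper name `serial_to_date` and the `EPOCHS` table are made up for this example.

```python
from datetime import date, timedelta

# The three epoch values listed in the quoted text.
EPOCHS = {
    "1899-12-30": date(1899, 12, 30),
    "1900-01-01": date(1900, 1, 1),
    "1904-01-01": date(1904, 1, 1),
}

def serial_to_date(serial, epoch):
    """Treat the epoch as day 0 and count real calendar days from it."""
    return EPOCHS[epoch] + timedelta(days=serial)

if __name__ == "__main__":
    for serial in (-1, 59, 61):
        print(serial, serial_to_date(serial, "1899-12-30"))
```

With the 1899-12-30 epoch, serial 61 lands on March 1, 1900, the same date the legacy 1900 system gives it, while serial 59 lands on February 27 rather than February 28. Negative serials simply fall before the epoch.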
AlexH
2008-10-03 14:43:43
I already explained my position to you in very simple terms. I haven't defended the OOXML "scandals", nor have I defended ISO or Microsoft.
So please retract that comment.
AlexH
2008-10-03 14:45:35
Well, you already quoted the relevant text:
"Applications may reproduce the 1900-as-leap-year bug for compatibility purposes, but should not."
That standardises the bug, because it puts that behaviour in the standard.
No-one likes that behaviour, but it is important that it is in the standard, because you cannot convert legacy data correctly without it.
Luc Bollen
2008-10-03 14:54:42
No. The behaviour is not SPECIFIED in the standard. The standard simply acknowledges that applications may implement the bug.
And it is clear that OpenFormula doesn't standardise application behaviour, but only data format.
AlexH
2008-10-03 15:06:43
The first epoch takes into account the leap year bug on PCs (and is the default in OOo 3), at the cost of incorrectly importing data referring to the first few months of 1900, and the last epoch is the Mac bug.
Roy Schestowitz
2008-10-03 15:14:09
My statement stands. Moreover, not necessarily based on just this discussion in isolation, your claims/insinuation that nothing was amiss is defence of Microsoft, OOXML, and ISO.
AlexH
2008-10-03 15:16:47
I think it's sad that you make idle accusations knowing you have no evidence.
Luc Bollen
2008-10-03 15:22:20
I agree with you: the standardised approach "incorrectly" implements the bug. In fact, it recommends a "best effort" approach.
So I maintain that the bug is not standardised in ODF 1.2, and I'm happy to close here our discussion about the "1900 bug", as you implicitly recognised you were wrong in your first statement.
However, could you explain what you mean by the "Mac bug" ???
Roy Schestowitz
2008-10-03 15:22:29
Specifically, you claimed that Microsoft just had more friends than IBM, or something along those lines. You always underplay the abuses, which sometimes leads me to suspecting you're one of these FOSS people who were hired by Microsoft (we have them in the IRC channel).
AlexH
2008-10-03 15:32:20
I think you misunderstand. 1.2 very much says that you can implement the bug. The 1899 "best effort" approach means that you can apply that bug to those dates in the small affected range, as the standard says applications may do - that's the same behaviour as Excel. So my first statement was in fact correct.
@Roy:
If you're not willing to defend accusations, then you shouldn't make them in the first place. I don't need to go into the reasons why that is morally wrong. I'm not going to address the rest of your pathetic insinuations, though.
Just to remind you, what I said about the BSI was that they were perfectly entitled to take the decision that they took, and that the legal challenge would go nowhere. And that's what happened: it didn't "lack funding to be concluded", it fell flat at the first hurdle and no-one was willing to spend more money on a goose chase.
The point remains the ISO's members - the nations - can take decisions on any basis they like. We might not like the conclusions that they arrive at, but they're entitled to make those decisions.
That's not a defence of them, it's a statement of fact. Let me put it in terms you might understand: are many people happy that Bush was elected in the US? And, did the electors in the US have the right to elect him?
Saying that they had the right to elect him doesn't mean that whatever happened in Florida was defensible.
Roy Schestowitz
2008-10-03 15:48:07
Luc Bollen
2008-10-03 15:54:53
ODF 1.2 very much says that you can implement the bug, BUT SHOULD NOT.
If you want to consider this as being a standardisation of the bug, I'm afraid you are as stubborn as Roy, who makes far reaching conclusions from what you've said. ;-)
AlexH
2008-10-03 16:01:14
All I said was that they have the right to make that decision.
Take Norway for an example, then. They dismissed the technical committee, and made a non-technical decision.
It wasn't exactly democracy in action. In that case it seems the org decided that it was more important to bring the standard into ISO than for the standard to be debugged.
It's obviously wrong if you think the decision should be made on technical grounds alone.
AlexH
2008-10-03 16:03:37
Sure, it says should not. But, it's still in the standard, so it's standardised.
Having buggy behaviour standardised is important. You don't want to copy it, but you do want to understand it so that when you do things like import spreadsheets, they continue to work and get the right results.
Most of the ODF apps have implemented all this stuff already anyway, because if you're not Excel compatible then you're not usable.
Roy Schestowitz
2008-10-03 16:04:51
That's just a convenient waiver for you, is it not? Like other techniques that include casting "ODF" as "IBM" or "it's just as bad/evil as X".
I'm not buying it.
Roy Schestowitz
2008-10-03 16:06:35
And again... it sounds like Redmond Kool-Aid. You're behaving as though it's better to mimic Microsoft.
Luc Bollen
2008-10-03 16:09:03
It's not standardised, it is documented. Having buggy behaviour DOCUMENTED is important.
AlexH
2008-10-03 16:21:17
Yet again you make that accusation, yet again it's absolutely indefensible.
I'm not going to bother to explain the argument further, because you're just going to accuse me of that nonsense yet again, and I can't be bothered. Your style of straw man arguments is boring. Argue the points I make, not the ones I don't make.
@Luc: if it goes into a standard, it's standardised unless it's in a section marked informative.
I'm not sure why there is so much back and forth on this; OpenDocument is clear on this issue. This behaviour is allowed and standardised, because it's a real issue which affects spreadsheet users.
As I said way up there ^^, there are much better reasons to be against OOXML than the bits which make dealing with legacy data possible.
enquiring minds want to know
2008-10-04 03:09:20
jcwarrio0866
2008-10-04 04:16:26
I’m not sure why there is so much back and forth on this; OpenDocument is clear on this issue. This behaviour is allowed and standardised, because it’s a real issue which affects spreadsheet users.
Actually, the behavior you mention is NOT allowed.
A committee member
2008-10-04 06:26:49
I've done a diff of hexdump outputs, which shows that a block of 65536 consecutive bytes has been zeroed.
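For anyone wanting to repeat this kind of check without eyeballing hexdump output, here is a small Python sketch. It is not the procedure used above, and the function names are hypothetical: it finds the longest run of zero bytes in a suspect copy and reports it only if the reference copy differs in that range.

```python
def longest_zero_run(data: bytes):
    """Return (start, length) of the longest run of consecutive zero bytes."""
    best_start, best_len = 0, 0
    run_start = None
    for i, b in enumerate(data):
        if b == 0:
            if run_start is None:
                run_start = i
            if i - run_start + 1 > best_len:
                best_start, best_len = run_start, i - run_start + 1
        else:
            run_start = None
    return best_start, best_len

def zeroed_block(original: bytes, suspect: bytes):
    """Return (start, length) of a zeroed region in suspect, or None if there is none."""
    start, length = longest_zero_run(suspect)
    if length == 0:
        return None
    # Only report the run if the reference copy holds different data there.
    if original[start:start + length] == suspect[start:start + length]:
        return None
    return start, length

if __name__ == "__main__":
    good = bytes([1]) * 10
    bad = bytes([1]) * 3 + bytes(4) + bytes([1]) * 3
    print(zeroed_block(good, bad))
```

In practice the two byte strings would come from reading both copies of the file in binary mode.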
Roy Schestowitz
2008-10-04 07:41:45
I've just re-uploaded the file. It seems identical to what it was before, at least in terms of size. Since it comes from the source, it can't have been tampered with.
Pedro Gimeno
2008-10-04 10:32:59
balloonsHotAir_bottomRight.png balloonsHotAir_left.png balloonsHotAir_right.png balloonsHotAir_top.png balloonsHotAir_topLeft.png
Roy Schestowitz
2008-10-04 11:04:47
Dan O'Brian
2008-10-04 12:15:03
A committee member
2008-10-04 13:26:46
Roy Schestowitz
2008-10-04 13:57:26
That's a fair point that I agree with. Just to shed light on this, I have no doubt that the files have not been tampered with because they were obtained directly from the source (twice even). It is possible that the discrepancy you claim to be aware of occurred somewhere along a different route. I have no explanation for it, I'm afraid.
A committee member
2008-10-04 15:54:02
@Pedro: Your list of filenames is exactly correct. My earlier assertion about only one file being affected was wrong.
Michael J
2008-10-04 16:48:17
When composing a spec, most writers use the wording from RFC 2119. The word "Shall" is used to indicate a requirement while "Should" indicates a recommendation.
So the ODF standard quoted probably means to recommend against an application implementing the Excel bug, but not to forbid it. (If the ODF spec's authors are using RFC 2119[1], they will certainly say so in the spec).
The quote from the standard *does* say that "Portable documents" "shall not" require the Excel bug, so I would guess that you could say that the ODF spec (as quoted) *permits* applications to maintain the Excel bug, so long as they don't describe the files as "portable".
The OXML[2] spec, however, seems to *require* that apps maintain the Excel bug. That is somewhat different from permitting it.
So I suggest that the ODF committee's actions do not act as any justification for the ISO's in this case.
But what would I know? I'm just a humble[3] programmer.
[1] http://www.ietf.org/rfc/rfc2119.txt
[2] They stopped calling it "OOXML" some time ago.
[3] http://en.wikipedia.org/wiki/Uriah_Heep_(David_Copperfield)
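The shall/should/may distinction can be sketched in a few lines of Python. This is only an illustration: the level labels and the `requirement_levels` helper are invented for the example, and it assumes the sentence uses the keywords in the RFC 2119 sense, which may or may not hold for the ODF text.

```python
import re

# Keyword -> strength label. Labels are this example's own wording.
LEVELS = {
    "must": "requirement",
    "shall": "requirement",
    "must not": "prohibition",
    "shall not": "prohibition",
    "should": "recommendation",
    "should not": "recommendation against",
    "may": "permission",
}

# Negated forms come first so "shall not" is not read as a bare "shall".
KEYWORD = re.compile(
    r"\b(must not|shall not|should not|must|shall|should|may)\b",
    re.IGNORECASE,
)

def requirement_levels(sentence):
    """List the strength of each requirement keyword, in order of appearance."""
    return [LEVELS[match.lower()] for match in KEYWORD.findall(sentence)]

if __name__ == "__main__":
    print(requirement_levels(
        "Applications may reproduce the 1900-as-leap-year bug "
        "for compatibility purposes, but should not."
    ))
```

Run on the two sentences quoted earlier in the thread, it reports a permission followed by a recommendation against for applications, and a flat prohibition for portable documents.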
Jose_X
2008-10-04 19:28:12
I know you can go back and forth between dates and number rendering, but something has to cue in that this is now meant as a date or else there is no reason to have it be rendered as a date upon opening a sheet (conversely, a similar argument could apply for numbers if dates are preferred). I really don't think when people pass spreadsheets around that numbers and dates are randomly flipped arbitrarily.
Perhaps there is a type and Microsoft does not want to reveal how it is stored. Maybe there is a type and OO.o gets it right.
BTW, if dates are typed, then as mentioned above, they can be converted and it would make no sense to keep the broken legacy leap year rules in the format.
People, as for what ODF says, ODF is not perfect. It does seem from what has been quoted here that 1.2 will allow for the backwards mistake.
Beware of Microsoft within OASIS. They gain if they can get bad decisions to be standardized because then OOXML cannot be singled out as broken. Expect that and more from them because they really hurt if OOXML is not adopted and found legit by a significant number of users. If the backwards thing doesn't have a good reason for staying (this would be true IMO if dates are typed), then I would suggest that a bad leap year interpretation not be allowed in the std period.
We can petition to the TC list. Is this issue something that is worth harassing them over?
Roy Schestowitz
2008-10-04 19:48:34
To all Participants:
The 90-day period for this discussion list has now ended. A charter has been submitted and can be seen at http://lists.oasis-open.org/archives/tc-announce/200808/msg00009.html. Your participation has been greatly appreciated; we at OASIS hope that all individuals interested in furthering this work will join the technical committee.
Regards,
Mary
___________________________________________________________
Mary P McRae
Director, Technical Committee Administration
OASIS: Advancing open standards for the information society
email: mary.mcrae@oasis-open.org
web: www.oasis-open.org
phone: 1.603.232.9090
Join us at the OASIS Forum on Security
30 Sept - 3 Oct, near London
http://events.oasis-open.org/home/forum/2008
I'm the only person left in the #OIIC IRC channel (except the channel guard, which is a bot).
What bothers me is that nobody has really responded to this yet:
http://www.heise-online.co.uk/open/Is-Microsoft-trying-to-take-control-of-ODF--/news/111649
We can probably wait patiently to see how ODFers react, but failure to respond would seem fishy.
AlexH
2008-10-04 22:02:54
You can't use stylistic information as a cue because a. not everything used in the calculation may be so styled, and b. the calculation may use relative dates.
@Roy: the OIIC discussion forum was limited to 90 days from the start. It was never, ever going to be an ongoing forum. I'm happy to answer your questions on why that is if you have any.
jcwarrior0866
2008-10-04 23:38:32
Hello Dan. I think you've rushed to the conclusion that this is the first time I read a spec. I do not think it's important to clarify this in particular because this conversation is not about me.
Let me quote what you mentioned earlier:
Well Dan, I disagree. In no way are the SHALL and SHALL NOT verbal forms recommendations or warnings. They indicate *requirement* instead. Take a look at the OpenFormula spec:
I can also bring here what Annex H of [ISO/IEC Directives] (part 2) mentions about these verbal forms:
Verbal form: shall Equivalent expressions for use in exceptional cases (see 6.6.1.3): is to, is required to, it is required that, has to, only ... is permitted, it is necessary.
Verbal form: shall not Equivalent expressions for use in exceptional cases (see 6.6.1.3): is not allowed [permitted] [acceptable] [permissible], is required to be not, is required that ... be not, is not to be.
Annex H of [ISO/IEC Directives] (part 2) also mentions the meaning of SHOULD and SHOULD NOT, but I am not going to put them in my comment.
Best regards.
Johan Krüger-Haglert
2008-10-04 23:49:14
John Hardin
2008-10-04 23:59:14
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080.pdf
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-WordprocessingMLArtBorders.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-SpreadsheetMLStyles.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-DrawingMLGeometries.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-RELAXNG-Strict.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/OfficeOpenXML-XMLSchema-Strict.zip
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1081c/1081c.htm
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1082c/1082c.htm
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1083c/1083c.htm
http://boycottnovell.com.nyud.net:8080/forms/ooxml/1080-html/
I don't understand why people don't post CORAL links when they *know* they're going to get slashdotted out of existence...
Dan O'Brian
2008-10-05 00:01:25
Note that it says "Portable". That says nothing of preexisting/imported documents.
Roy Schestowitz
2008-10-05 00:06:20
PaulS
2008-10-05 00:43:43
"It has always been the same with ISO, and it will continue to be the same with ISO, because that is what ISO’s members and funders want. People who think ISO is irrelevant simply don’t understand what it does; it has always been this ugly. "
As someone who has been involved in several standards committees (including some involvement with ISO), I can say that, while members do work to support the interests of the companies they represent, the level of shenanigans in SC-34 is orders of magnitude beyond anything I've ever seen or heard of.
Dan O'Brian
2008-10-05 00:45:25
Roy Schestowitz
2008-10-05 00:47:07
standardize this
2008-10-05 00:57:09
Marius
2008-10-05 00:57:40
I've exported the PDF document in a series of PNG images and created an index for them, so users can access it just like the HTML version you have.
There are several advantages to this:
1. the page layout is preserved and the information is easier to follow
2. readers only download 3-15KB (one image) at a time, not the full 40-60MB
3. you don't waste so much bandwidth with that very large document
The only downside is that readers can't use copy and paste to extract information but they might as well download the full PDF file then.
If you wish, the blog readers can use the following link to view the document:
http://www.definethis.org/temp/ooxml/
or you can download a copy (http://www.definethis.org/temp/ooxml/1080.rar - ~160MB) and extract it on your server.
I'll leave it on my server for a few weeks but I won't be able to keep it there forever.
Roy Schestowitz
2008-10-05 01:02:16
Jose_X
2008-10-05 01:50:39
Thanks AlexH. I'm not trying to make your life difficult, but I still don't quite follow what you meant by (a) or (b). Could you provide a rough example? It need not be legal syntax but enough to convey the idea.
Here is the wall before me. Something has to cue in the renderer that we have a number that is a date and not something else. Why isn't this good enough as the type information?
If the formatting cue and overall context wasn't good enough, how would the renderer know to format that specific number specifically as a date without messing up anything else? So we have this specific number precisely being identified as a date. If that information isn't a type definition, what is?
It's not clear to me that I am covering everything, being precise, or even making sense. If I had more experience here, I would be better able to judge. Still, I don't see it. I may have to dig into the specs to get to the bottom of this (or read something online that is clear and save myself the effort).
Dan O'Brian
2008-10-05 02:58:14
I'm not sure if AlexH can provide an example or not, but I would guess that the OOo and Gnumeric developers could.
If I wanted to know the information you are after, those are the people I'd be asking.
Jose_X
2008-10-05 03:16:33
Short of sitting down with ODF or OOXML (no) and putting all the pieces on the table to look at them carefully, I would probably get the fastest insight by directly asking those guys you mentioned.
Anyway, AlexH had mentioned that typing wasn't involved. That would explain why you'd want to keep legacy, but I don't then understand how the proper thing could be rendered from a common old (untyped) number. If typing info is available, then it would make no sense to keep the error in a standard format. That would make the format (even the quasi exception being suggested for ODF) problematic and distasteful without reason.
I have not looked at this too carefully, or I would say so. That's why I think a few examples with specifics might quickly clarify things for me. Also, I got interested in the conversation but otherwise am not that motivated right now to follow up on this.
AlexH
2008-10-05 10:03:39
I'll try to explain as best I can. One thing you might want to do is look at the OpenFormula spec, which for the first time does actually include typed information.
You're right in that the formatting cue will enable you to see information which is being treated as a date. The problem is that not all that information will be formatted like that.
But there is no such thing as a 'date' in legacy spreadsheets: all you have is numbers which are being treated as an offset from an epoch. Some of those offsets will be "dates", and some will just be offsets: e.g., what is the number 5? Does it refer to 5th January 1900 (or 4th)? Or are we using that to say "5 days from now"?
Even worse, many spreadsheet users will calculate things based on references to other spreadsheets - e.g., having a master sales sheet, and then various report sheets. In that instance, you can't even see the other data unless you're in the "top" spreadsheet. If you rely on the stored values in the sheet you opened, you have again no idea what those values actually represented on the other sheet.
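A minimal Python sketch of the ambiguity described above. The two epoch conventions (serial 1 is 1 January 1900, or serial 0 is 30 December 1899) and the function names are assumptions chosen for illustration, not taken from any spec.

```python
from datetime import date, timedelta

# Two illustrative conventions for turning a stored number into a date.
EPOCH_DAY_ONE_IS_1900_01_01 = date(1899, 12, 31)   # serial 1 -> 1900-01-01
EPOCH_DAY_ZERO_IS_1899_12_30 = date(1899, 12, 30)  # serial 0 -> 1899-12-30


def as_date(serial: int, epoch: date) -> date:
    """Interpret a bare number as a calendar date counted from `epoch`."""
    return epoch + timedelta(days=serial)


def as_offset(serial: int, start: date) -> date:
    """Interpret the same bare number as 'N days after some other date'."""
    return start + timedelta(days=serial)


stored = 5  # this is all the legacy file contains

print(as_date(stored, EPOCH_DAY_ONE_IS_1900_01_01))   # 1900-01-05
print(as_date(stored, EPOCH_DAY_ZERO_IS_1899_12_30))  # 1900-01-04
print(as_offset(stored, date(2008, 10, 5)))           # 2008-10-10
```

The stored value is identical in all three cases; only information outside the number decides which reading is the intended one.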
Yfrwlf
2008-10-05 16:20:32
A program like OOo can interpret an ODF document in one of two ways, it can either read the document via "the buggy way" or "the non-buggy way", but it can only do one. If the ODF format allows for either way to be used, then the readers like OOo and others could read the document correctly, or incorrectly, and it is technically impossible for these programs to always read the document correctly, all because the document standard hasn't specified which method it prefers?
If that's correct, then of course ODF is a broken format in that regard, however that depends on how broken it is in "the wild", and you'd think that there would be something you could do to correct it, some way of fixing any older documents simply by having a converter which upconverts them to a newer standard format which does away with the bug entirely without breaking anything for anyone. Formats should tie up any loose ends, whatever it takes in order to allow readers to always read the format correctly. I thought this was the problem with OOXML, as it included certain things which would allow an OOXML document to be interpreted in two different ways, and in order to do it the correct way it required the use of proprietary software that wasn't available for all platforms/users/etc and was basically controlled. Obviously a controlled standard like that isn't a true open standard, and obviously an "internally used" or "controlled" standard isn't a standard.
Anyway, I hope all formats can be made better, but obviously the ISO should never accept proprietary or borked formats as being standards. It's obvious to anyone who knows Microsoft well that this move was simply to E.E.E. the office document format to prevent competition, when the horrible (bad for them) truth is they are going to have to start competing (good for consumers) without pulling backstabbing unlawful business tactics.
James
2008-10-05 16:27:04
Maybe you're the AlexH from www.contoso.com? http://en.wikipedia.org/wiki/Contoso
http://center.spoke.com/info/pDJMWq/AlexHankin
Alex Hankin Contoso, Ltd. Senior Director New York, NY
Skype: AlexH Home IM: Alex@hotmail.com Home Email: Alex@hotmail.com Work Email: alexhankin@contoso.com Work IM: alexhankin@contoso.com
Telex: 781 234 Home (208) 555-5656 Mobile: (775) 551-2345 Fax: (207) 555-9999 Direct: (207) 555-1112 Tel: (207) 555-1000
Here:
http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Apt-BR%3Aofficial&hs=VHr&q=microsoft+Alex+Hankin+alexh&btnG=Search
AlexH
2008-10-05 16:27:33
Indeed, in ODF 1.1 the formula stuff isn't specified at all: it was deemed out of scope for the standard.
The issue is very much "data in the wild" though. If you open a file in an older format, or cut and paste from one, or link to one, or otherwise get the data from elsewhere, then it's a problem.
Yfrwlf
2008-10-05 17:00:17
Regardless, all I know is that a properly done document format will be able to account for all data correctly, so that if a program implements the format how it's supposed to be implemented, all data will be correct. If it's impossible or difficult for the program to read or write certain kinds of data correctly due to a lack of specification by the format, it's the format's job to implement additional standards to allow for correct interpretation, or aid in that process to allow for greater format uptake in the various office programs which exist today.
Fleep
2008-10-05 18:11:15
Jose_X
2008-10-05 19:20:32
You have some code out there. The code uses a broken algorithm for turning that number into a date.
Is this a reason to break ODF or any new format?
No, it is not.
Just keep the legacy documents as is (eg, keep as is the text file with the "5" on line 27 offset 12).
If you change formats for that file (eg, to ODF 1.2), the old code is not likely to work anyway. If you change formats for that file, you'll need new code anyway. Why make a broken format to then have to create new code that is also broken?
AlexH, if you try to be specific maybe you will be able to convince people here because it just doesn't make sense that the old mistakes "need" to be carried forward. If so, we'd still be using cavemen data formats and no new code would ever be written (eg, no converters or even new code to replace the old code).
The main reason I can see to keep things as is is as yet another way to help out Microsoft's vast investments in this brokenness. If things change, new players will be on a similar footing (wrt date interpretation) as Microsoft.
It makes sense to fix past mistakes. In a competitive marketplace, the old garbage instituted by a particular vendor would not carry forward.
Jose_X
2008-10-05 19:21:50
Jose_X
2008-10-05 19:28:12
Another reason to keep the brokenness would be to allow (eg) Novell to maintain their special advantages if Novell also has a bunch of investments in re-implementations of this brokenness or knows that this brokenness will somehow give them an advantage (eg, if Microsoft stays on top, Novell's existing income stream might be more likely to stay intact).
Roy Schestowitz
2008-10-05 19:38:05
All the 'weird' stuff in OOXML serves Microsoft. The more bizarre the format, the less manageable it is for competitors.
This conversation got latched onto one particular flaw among much more serious ones, which is a shame. Shouldn't we discuss what Microsoft put in separate 'baskets' and all those Windows-only 'features' and 'loopholes' of OOXML?
AlexH
2008-10-05 20:10:45
When you're converting a file format, you have to re-use the existing data, yes?
What I'm trying to get across to you is that there is no way to tell whether a given number in the old data needs to be changed, because there isn't enough information to be able to do that. The "fix" is basically to decrement a number by one; but you have no idea which numbers need to be changed.
Adjusting user data on import is an extremely dodgy practice in general; you have to be absolutely 100% sure you're getting it right.
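To make the point concrete, here is a small Python sketch. The `LegacyCell` type and `naive_fix` function are hypothetical, and the "decrement by one" rule is only the rough fix described above, not a real converter.

```python
from dataclasses import dataclass


@dataclass
class LegacyCell:
    """A cell as stored in a hypothetical legacy sheet: a bare number, no type."""
    value: int


def naive_fix(cells: list[LegacyCell]) -> list[LegacyCell]:
    """Decrement every number by one, as if every cell held a buggy date."""
    return [LegacyCell(c.value - 1) for c in cells]


# Suppose the author meant the first cell as a date serial and the second as
# a unit count. Nothing stored in either cell records that intent.
sheet = [LegacyCell(39725), LegacyCell(39725)]

print(sheet[0] == sheet[1])                   # True: indistinguishable on disk
print([c.value for c in naive_fix(sheet)])    # [39724, 39724]
```

Applying the fix to every cell corrupts the unit count; applying it to none leaves the date untouched. Choosing between the two needs information the cells do not carry.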
AlexH
2008-10-05 20:15:45
Roy Schestowitz
2008-10-05 20:31:43
Jose_X
2008-10-05 20:36:56
Let's mention some more things of interest that are demonstrated well through this simple date example.
I think this example helps demonstrate that there are many types of data that are interrelated. Eg, the date numerical representation ..is related to.. the type attributes identifying that number as a date convertible using algorithm X ..is related to ....
Microsoft's extensive closed source (still ongoing) history and investments mean that the pertinent data for proper interpretation of any other data is spread across the entirety of their product line.
A format brought up by people working in the open is likely to be much better than something that got cooked up based on this closed stew. When diverse groups openly try to agree on stds, they are led to formats that work well among diverse groups. One such item is that related data should be accounted for somewhere centrally.
No doubt Microsoft keeps tabs on their data centrally, but they don't reveal this within the OOXML format they make public. OOXML is a piece to a complicated puzzle. This piece is missing key info for interworking with the rest of Microsoft's software. The crucial bits of data are scattered all over the place and they are only opening some portions. Of course, they can open up whatever they want and then create new bits that they keep close.
Don't expect change from them as long as they have closed source and interlocking monopolies -- lack of checks and balances: no real penalty for changing; HUGE existing investments: in a Gordian Knot body of source code, in the Microsoft Way Mindsets, in existing contracts made valuable by their unique position; HUGE business reasons for preserving the existing frameworks and methods: so that powerful business levers don't disappear, so that they can be (very) cash positive and subsidize businesses they need/want to control but in which they currently aren't competitive.... The lists go on and on.
Microsoft can't afford to be broken up in a way where important bits of the code end up in different companies. That would not only initially lead to chaos, but long term they lose their advantages if they can't keep closed source the secret info about many product interactions (the source code itself implies some of this secret info) interspersed across these product lines. If you have different companies, who would hold the central knowledge and who would ensure this would stay in sync with the evolving products of the now distinct companies?
Because of this, the likely result leading up to the breakup would be a reshuffling internally so that one company would get the real goods. This would allow that one company to eventually take over where Microsoft currently sits. UNLESS you prohibited these new companies from building products to service both sides of the interfaces. The problem here is: what constitutes an interface?
I think that the idea of having an evolving closed source OS API makes no sense from a fair competition point of view. In fact, closed source and competition are incompatible items. Closed source implies monopolies. The OS is simply the most important software component on a device. And software, traditionally, is the much more powerful way to implement rapid changes that do lead to losing interop, assuming interop existed the second prior to the new change.
The only advice I can currently offer generally to users is to avoid closed source.
And developers that want to produce competitive code should also stick to open source environments and libraries (the assumption is that money would be made other than through the powerful lock-in exclusivity of closed source).
Dan O'Brian
2008-10-05 20:43:01
ODF needs the same workaround for the same reason.
Jose_X
2008-10-05 20:55:18
Ie, the info needed to know which alg to use is available. Data in old formats use the broken algs and data in new formats use the good alg. And you can use software converters to convert from one format to the other (statically once and for all or dynamically as the various formats are encountered).
Again, you are not giving examples where this could not be done or would be foolish to try it. Your vague argument generalizes to "we should keep the formats we had back in 1940 so that we don't have problems moving forward."
Yeah, maybe we should have kept the year 2000 bug as well.
Yfrwlf (as I read the replies) was assuming pretty much this and then adding that the point is to make *specific* which alg to use in the new formats.
Using the old rules (as OOXML does) is foolish. Leaving it up in the air (ODF 1.2 might do this in part) will just lead to excusable incompatibilities.
Microsoft needs formats that are underspecified (plus broken in as many ways as possible) in order to allow monopoly backed lock-in secrets to exist in an excusable manner.
"Hey, the std was not specified precisely so we picked...."
The excuses are probably primarily meant to keep them safe in court actions from the government.
AlexH
2008-10-05 21:01:12
It's not that the application can't use the right algorithm. It's that when you open the old data, you cannot adjust it so that it is "correct".
So the "just use software converters" argument simply doesn't hold: you cannot do it. The spreadsheet doesn't hold enough information to know which data needs to be corrected and which data doesn't.
You can save the data in the new format, with the new algorithms, but it doesn't help. Unless you can correct the data, it's wrong. And you can't correct the data because you don't know which data is dates, which is date offsets, and which is just numbers.
AlexH
2008-10-05 21:04:33
Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it's set to 1931-2030 or something, so you know that '98' == '1998', '31' == '1931', etc.
It's exactly like that. Unless you know what the "range" is to begin with, you cannot hope to convert the data accurately, because you're missing enough information.
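The two-digit-year analogy can be sketched in Python as follows. The function name is hypothetical, and the window start of 1931 is simply the figure quoted above ("or something"), not a verified default.

```python
def expand_two_digit_year(yy: int, window_start: int) -> int:
    """Map a two-digit year onto the 100-year window starting at window_start."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    century = window_start - window_start % 100
    year = century + yy
    if year < window_start:
        year += 100  # falls before the window, so it belongs to the next century
    return year


print(expand_two_digit_year(98, 1931))  # 1998
print(expand_two_digit_year(31, 1931))  # 1931
print(expand_two_digit_year(31, 1950))  # 2031
```

The stored "31" is the same in the last two calls; only the window, which lives outside the data, decides whether it means 1931 or 2031.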
Roy Schestowitz
2008-10-05 21:33:03
Jose_X
2008-10-05 22:00:44
AlexH, we can't automate something across the board 100.00% when that information is not known in one place 100.00% of the time. The place to address the date issue is where it is known that something is a date. Where this info is not captured in the same file as the data, the conversion can be done with the apps or library calls that interpret the particular numbers as dates or by users doing the work manually when they identify something should be a date but it is not (with the help of existing handy filters ready to come to the rescue).
In any of these cases, changing to a *new* format with *new* semantics means that something of this nature *has to be done* anyway.
[In the case of Excel date-formatted items, the process can be automated because that date formatting info is included in the same file as the date integer. However, there are spreadsheets generated in ways where that data need not be maintained. It's for these cases that we are talking about.]
If you don't want to deal with any manual process in a particular case and are willing to keep all the bugs of the past then you use a legacy app or app mode that works as legacy. But then you can't convert to the new format that has new semantics from an app that doesn't have access to the type info. You can't convert to OOXML, ODF, or anything else that might have new semantics.. unless you want to do some hand tweaking. To convert to a new format with new semantics that were previously kept in an ad hoc way, you have to add a bit of manualness to the process.
AlexH, can you give any example at all where this would not be a manageable situation? Remember that you can keep the old data in the old formats read by the old applications.
The ODF should use the proper date semantics. If something in some old file somewhere is not known to be a date based on info inherent in that file, then you can't save it in ODF anyway where that unknown date maps to a date. You would just get a number. This has nothing to do with the new format. It has to do with the app making the conversion not having access to the missing info. This would not be the case for Excel files where the dates were formatted to look like dates, but it might be the case where a text file is interpreted oddly by some app X. In which case, the conversion should be handled by an upgrade to that app X.
... [thinking I better repeat some of this again]...
In other words (darn this is tiring), you can't take a number that is not known to be a date and make it a date automatically without error. This has nothing to do with the new format. It has everything to do with requiring access to the missing semantic info. If you have access, you can do it. If you don't have access, you can't do it no matter what OOXML or ODF says. It would just be a number.
And once again: in the case of Excel files with numbers formatted as dates, that info IS AVAILABLE* so there is no problem for this common case. [*: it's available subject to the proper reverse engineering of the old MS binary formats.. of course the EU could force Microsoft to reveal this info so if they don't they would be in violation.. in any case, there is no excuse.]
AlexH, you have not shown a single example, and you are mixing issues. Some might even say you are using FUD to give the impression that the task is unmanageable. If it's unmanageable, you should be able to give many examples instead of 0 examples. Please give examples, AlexH. [Ed- note comment at top.]
Dan O'Brian, that Microsoft kept the Lotus bugs doesn't justify that as a good decision. Let's give reasons other than to say that X person did Y so therefore Y must be good.
[I am not trying to be verbally abusive, but I don't like to see anyone defending Microsoft or their ways without reasons that pass muster. If you want to defend Microsoft, come with real reasons or expect to anger a lot of people.]
Jose_X
2008-10-05 22:17:29
AlexH, this is the sort of thing I mentioned in the last comment. If you know the info (ie, that X is to be interpreted as Y), then you know the info. If you don't, then you don't.
If you do, then you can perform the conversion. If you don't, then you can't.
In any case, if you have a new format with a previously non-existent semantics/type named "date", then you can't have things magically appear as dates, no matter the specific semantics/algorithm, unless the converter has info of what were dates in the old formats. If you do, then you can make the conversion. If you don't for whatever reason, then the mapping will be to a common old integer just as before and not to the new date type.
In the case you gave, the conversion can be done if that semantic bit can be deduced from the data in the file. Otherwise, the conversion should be done by whatever entity knows that we are dealing with a date. Otherwise, it can't be done no matter how OOXML or ODF define the new date data type.
To repeat from earlier comments, if you have an Excel number formatted as a date, then that info can be deduced from the binary files and that formatting knowledge could be designated to map into the new "date" tags of the new format. Here, you knew that the old number is a date that would need to be adjusted to match the semantics of the new tags.
The exact semantics aren't the stumbling block so long as they are well-defined (as Yfrwlf mentioned). What is important for designing the new semantics is that the semantics be well-defined and as "sane" as possible. What is important to acquire the ability for old data to be used with the new tags is that the semantics for the old data be fully known. These two issues are distinct. Microsoft themselves cannot make the right number into a date for OOXML unless they already know they have a date. If they don't, they must keep it as an integer. If they do, then they can convert to the proper definition no matter what the definition is: using the correct alg or using the broken alg.
Jose_X
2008-10-05 22:27:07
Dan O'Brian
2008-10-05 22:44:58
I don't have a copy of Microsoft Office handy, so I can't check their configuration UI, but in OpenOffice you can find the config setting under Tools / Options / OpenOffice.org Calc / Calculate / Date
Depending on which of the 3 radio buttons you select (12/30/1899 [default], 01/01/1900 (StarCalc 1.0), or 01/01/1904) the spreadsheet interprets the data in a different way.
Now, if someone using Calc (using the default setting) imports a spreadsheet with a date that was created in, say, StarCalc 1.0, and then saved to ODF, where the saved ODF document forced the interpretation (as Jose is suggesting can be done) to be the 12/30/1899 epoch, then the data in the spreadsheet could very well be wrong, but the user might not notice it right away.
I think that's the problem that AlexH is trying to explain.
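A short Python sketch of how the three epoch settings change the meaning of one stored number. It assumes that serial 0 equals the configured epoch itself; the serial 39725 and the `interpret` helper are illustrative only.

```python
from datetime import date, timedelta

# The three epoch choices listed in the Calc option above.
EPOCHS = {
    "1899-12-30 (default)": date(1899, 12, 30),
    "1900-01-01 (StarCalc 1.0)": date(1900, 1, 1),
    "1904-01-01": date(1904, 1, 1),
}


def interpret(serial: int) -> dict[str, date]:
    """Return the calendar date a serial number means under each epoch."""
    return {name: epoch + timedelta(days=serial) for name, epoch in EPOCHS.items()}


for name, day in interpret(39725).items():
    print(f"{name}: {day.isoformat()}")
```

Under these assumptions the same number lands on 2008-10-04, 2008-10-06, or 2012-10-05. The two-day difference between the first two is exactly the kind of error a user might not notice right away.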
Jose_X
2008-10-05 22:48:49
However, if the converter can deduce more type info, then we might be able to map to a date tag or to some other tag.
And in these last cases, where we know enough to identify a (eg) date, if we can map to a tag with the broken date semantics, then we can map to the fixed semantics since this entails adjusting the values in a well-defined way. Ie, if we know to map to "date with broken algorithm" then we can map to "date with fixed algorithm" since there is a well-defined mapping to this fixed algorithm.
However, in other cases, the mappings may not be so nice. In general, we need to create good formats and fix mistakes of the past. If, as customers, we put our data into proprietary closed formats such as what Microsoft offers their customers, then we make a decision that may not be fixable short of knocking down Gates' door demanding relief.. or knocking down his Window if you want longer lasting relief.
Jose_X
2008-10-05 22:57:45
No. No. No.
When we save, we know how to adjust the number so that it maps properly to the canonical form implied by the corrected definition. The saving process knows the users config info and so can adjust into a canonical form. Then everyone else that reads this does the necessary translations to match their settings.
If we can save into X-1 then we can similarly save into X by adding +1 at the time of save. The semantics of the ODF file date tag would then let everyone know that we have X and not X-1.
ODF is tagged. The tags carry semantic information just like binary Excel files do (but in a closed proprietary way).
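A sketch of the save-to-canonical-form idea, in Python. It assumes the cell is already known to be a date and that the writer's epoch is available at save time; the ISO date string is used here only as one possible canonical form, and the function names are hypothetical.

```python
from datetime import date, timedelta


def save_canonical(serial: int, writer_epoch: date) -> str:
    """On save: turn an epoch-relative serial into an epoch-independent ISO date."""
    return (writer_epoch + timedelta(days=serial)).isoformat()


def load_canonical(iso_date: str, reader_epoch: date) -> int:
    """On load: turn the ISO date back into a serial for the reader's own epoch."""
    return (date.fromisoformat(iso_date) - reader_epoch).days


stored = save_canonical(39725, date(1899, 12, 30))
print(stored)                                    # 2008-10-04
print(load_canonical(stored, date(1904, 1, 1)))  # 38263
```

The reader ends up with a different serial but the same calendar day, because the file carries the date itself and not the writer's private offset.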
Jose_X
2008-10-05 23:01:21
Jose_X
2008-10-05 23:06:16
If we know this, we map to ODF correct date tag, adjusting as necessary.
If we don't know this extra date context, then we play it safe and keep the integer as an integer.
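The rule in the two sentences above could look like this in Python. `SourceCell` and its `date_formatted` flag are hypothetical stand-ins for whatever date context the source file provides; the output tuples are not real ODF markup.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Union


@dataclass
class SourceCell:
    value: int
    date_formatted: bool  # hypothetical flag: the source formats this cell as a date


def convert(cell: SourceCell, epoch: date) -> tuple[str, Union[str, int]]:
    """Return ("date", ISO string) when the date context is known, else ("number", value)."""
    if cell.date_formatted:
        return ("date", (epoch + timedelta(days=cell.value)).isoformat())
    # No date context: play it safe and keep the integer as an integer.
    return ("number", cell.value)


epoch = date(1899, 12, 30)
print(convert(SourceCell(39725, True), epoch))   # ('date', '2008-10-04')
print(convert(SourceCell(39725, False), epoch))  # ('number', 39725)
```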
Dan O'Brian
2008-10-05 23:06:40
Dates can be saved as "1/31" (interpreted as January 31 of the current year), "1/50" (January 1st, 1950), or "39725" (the number of days since the configured epoch) and possibly other formats.
The question is, which epoch is 39725 counting from? And how do we know it's a date without more context?
Dan O'Brian
2008-10-05 23:09:48
Jose_X
2008-10-05 23:18:54
I suggest that, specifically for Excel, cells containing a simple integer and formatted as a date be mapped to ODF dates but with the correct value to match the ODF epoch.
In any case, the ODF date tag is there for the future. Existing data can be mapped to ODF integers (or strings or whatever) as they are, while new items entered under a date context can be mapped to the ODF correct formula date.
Dan O'Brian
2008-10-05 23:19:49
No, it doesn't - that's the problem. All it knows is that the field looks like a number. It doesn't necessarily know if it is a date or not.
Jose_X
2008-10-05 23:20:16
Roy Schestowitz
2008-10-05 23:23:16
As Sutor said, "OOXML is about the past and ODF is the future."
Dan O'Brian
2008-10-05 23:23:25
Roy Schestowitz
2008-10-05 23:26:18
Dan O'Brian
2008-10-05 23:26:19
Jose_X
2008-10-05 23:29:35
So then we map to an ODF number and not to an ODF date. Simple. The same would go if we had wanted to use OOXML or any other format that has a date tag. We would not map to its date tag but instead would map to the regular number tag.
However, in any particular case, the application may know that we are dealing with a date. In which case, it would be able to save to the ODF date with the proper adjustment along the way.
Say we have an Excel spreadsheet that has a date formatting for a number. Then Excel/OO.o presumably needs to use the broken formula on that number to format it properly. Fine, but what we then do is we save that number as a date but adjusted as necessary when we save to ODF. Then when we read that ODF file later, we use the proper formula on the already adjusted number. If we want to convert back to binary Excel format, we make it a number type again and adjust its value backwards. In either format, we can deal with that *known* date properly. That number was marked for life as a date through its date formatting from the original creation as data within an Excel date formatted cell.
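The round trip described above can be sketched in Python. This assumes the legacy 1900 numbering in which serial 1 is 1 January 1900 and serial 60 is the non-existent 29 February 1900; the function names are hypothetical, and rejecting serial 60 is a choice made for this sketch.

```python
from datetime import date, timedelta

_BEFORE_FAKE_LEAP_DAY = date(1899, 12, 31)  # base for legacy serials 1..59
_AFTER_FAKE_LEAP_DAY = date(1899, 12, 30)   # base for legacy serials 61 and up


def legacy_serial_to_date(serial: int) -> date:
    """Apply the 'broken formula' to a number known to be a date."""
    if serial == 60:
        raise ValueError("serial 60 is 29 February 1900, which never existed")
    if serial < 1:
        raise ValueError("serial out of range for this sketch")
    base = _BEFORE_FAKE_LEAP_DAY if serial < 60 else _AFTER_FAKE_LEAP_DAY
    return base + timedelta(days=serial)


def date_to_legacy_serial(day: date) -> int:
    """Adjust backwards when converting a real date to the legacy numbering."""
    if day < date(1900, 1, 1):
        raise ValueError("date out of range for this sketch")
    base = _BEFORE_FAKE_LEAP_DAY if day < date(1900, 3, 1) else _AFTER_FAKE_LEAP_DAY
    return (day - base).days


day = legacy_serial_to_date(39725)
print(day.isoformat())             # 2008-10-04
print(date_to_legacy_serial(day))  # 39725
```

Once the value has been turned into a real calendar date, the target format can store it in whatever corrected form it defines, and the legacy serial can still be regenerated on the way back.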
Jose_X
2008-10-05 23:38:56
In other words, you aren't willing to provide an opinion on why ODF should be one way or the other.
Do keep in mind that there are many decisions that are taken by people not based on technical feasibility.
Dan, if you have a link to where you think competent people are having this discussion, please post it. I think I might want to get in on the act or at least hear the reasons given.
I provided feedback to the OIIC formation discussion list, but they weren't interested in covering specifics. I started on that road and was told by Mary McRae (is that how you spell it) that engaging in specifics of that nature was prohibited on that list. The specifics will be carried out in private (though joining up is allowed if you pay the $300).
I would have no problem giving a particular pov if it would help a public discussion and if I didn't have to dedicate too many resources to the task beyond the time required to do the contributed postings (the thought process, etc).
Jose_X
2008-10-05 23:52:24
For the record, I'll quote here from that piece from Rob's blog:
>> The “legacy reasons” argument is entirely bogus. Microsoft could have easily have defined the XML format to require correct dates and managed the compatibility issues when loading/saving files in Excel. A file format is not required to be identical to an application's internal representation.
>> Here is how I would have done it. Define the OOXML specification to encode dates using serial numbers that respect the Gregorian leap year calculations used by 100% of the nations on the planet. Then, if Microsoft desires to maintain this bug in their product, then have Excel add 1 to every date serial number of 60 or greater when loading, and subtract 1 from every such date when saving an OOXML file. This is not rocket science. In any case, don't mandate the bug for every other processor of OOXML. And certainly don't require that every person who wants the correct day of the week in 1900 to perform an extra calculation.
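The load/save adjustment proposed in the quote can be sketched in a few lines of Python. The function names are hypothetical, and treating legacy serial 60 as an error on save is an assumption of this sketch that goes beyond what the quote says.

```python
LEAP_BUG_SERIAL = 60  # in the legacy numbering, serial 60 is the fictitious 29 Feb 1900


def on_load(file_serial: int) -> int:
    """Correct file serial -> legacy in-application serial (add 1 from 60 upward)."""
    return file_serial + 1 if file_serial >= LEAP_BUG_SERIAL else file_serial


def on_save(app_serial: int) -> int:
    """Legacy in-application serial -> correct file serial (inverse of on_load)."""
    if app_serial == LEAP_BUG_SERIAL:
        raise ValueError("legacy serial 60 has no real calendar date")
    return app_serial - 1 if app_serial > LEAP_BUG_SERIAL else app_serial


print(on_load(59), on_load(60))  # 59 61
print(on_save(61))               # 60
```

With this split, only the application that wants the legacy numbering pays for it; the file itself carries serials that follow the real calendar.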
Microsoft's reasons for keeping things broken exist, but that doesn't mean ODF should follow their lead. Let Microsoft keep OOXML the laughing stock that it is within tech circles. Let us keep ODF sound. Or I should specify, if Microsoft messes up ISO ODF, OASIS should not follow suit.
People, open source is the key. ODF and other open standards are secondary. Standards are meant to enhance interop, but when that cannot be achieved, these standards lose their value. And interop among independent third parties within the context of a closed source monopoly dominated market is nonsensical.
Jose_X
2008-10-05 23:55:43
Roy Schestowitz
2008-10-06 00:03:26
Dan O'Brian
2008-10-06 00:59:14
I was only explaining what I thought AlexH was trying to explain (I admit to knowing very little about the internal workings of spreadsheet applications).
My position on this subject has always been that I'd leave it up to the experts.
Jose_X
2008-10-06 01:08:40
You should pay attention to arguments if you want to avoid being manipulated by the unscrupulous.
Jose_X
2008-10-06 01:10:36
Dan O'Brian
2008-10-06 01:15:38
When I say experts, I mean the experts implementing the Free Software office applications that are very unlikely to have been "bought" and/or other experts that I trust (which in this case is limited to the aforementioned group because I don't happen to know any proprietary office developers).
Dan O'Brian
2008-10-06 01:20:17
I care little about the file formats and the standards committees because I have far too many other things on my plate (like products I'm responsible for), and, as I said above, I trust the people involved with OOo, Gnumeric, etc to DTRT and make my documents continue to work.
Dan O'Brian
2008-10-06 01:21:47
Jose_X
2008-10-06 01:32:12
And I don't refuse to contact anyone. Do you have contact info because I don't. What I refuse to do is waste time. Everyone has to prioritize their time.
>> When I say experts, I mean the experts implementing the Free Software office applications that are very unlikely to have been “bought” and/or other experts that I trust (which in this case is limited to the aforementioned group because I don’t happen know any proprietary office developers).
I should mention that "experts" disagree all the time.
Also, I should mention that those developing "free" office suites are sometimes (many times perhaps) paid. Their software may be "free software" as defined by the FSF, but that doesn't mean they work for free.
Finally, if you do listen to these groups, you probably want to be trying to explain to AlexH why many of these groups don't like OOXML instead of why some do.
Jose_X
2008-10-06 01:38:49
Dan O'Brian
2008-10-06 01:52:08
However, even though the people I know working on OOo are paid, I trust their honesty.
As far as AlexH, how do we know he doesn't listen to these groups?
Jose_X
2008-10-06 01:55:04
Just in case I was misunderstood, I wasn't trying to be condescending or sarcastic. I was honestly saying that we should seek advice/help from individuals/groups that we find trustworthy in order to help us manage complexity. Complexity is anything we haven't yet taken the time to figure out for ourselves. Time is a limited resource. What one day appears to be extremely complex, can later appear to be quite simple. What one can figure out, so can others. But we all have limited time. In my case, I may not be taking the time to try and understand the problem as well as I can or to present it as well as I can, but am so far willing to keep up with contrary arguments. Does someone want other/better examples from me than whatever I may have given?
As an aside, I am here partially watching the "Brad Pitt Troy movie". The scene just shown was of the Trojan king right after the Trojans defeated the Greeks who now supposedly will go back home. An argument is made to the king that attacking the Greeks by their ships would be foolish. The king ignores this because his trusted priest person says that the gods think the Greeks will be vanquished in an attack.
Funny coincidence. We have to trust someone whenever we don't dive into the details of something. Sometimes it works out and sometimes it doesn't.
Jose_X
2008-10-06 02:05:47
Or that he does. Or that I do or don't.
Should we attack the Greeks? Whose advice do we take or do we dig into the details?
Anyway, I don't worry about misunderstandings if people can/will work to fix them. More upsetting is purposeful deception. As long as we stay away from purposeful deception as much as possible everything should work itself out slowly. We all cheat here and there though. Balance is good. I have seen myself and others go overboard at times. On the surface, I think most people will expect anyone trying to defend Microsoft to come a little bit more prepared than usual, and they will be seen very critically if they don't do a convincing job.
Jose_X
2008-10-06 02:32:08
Let me add.. I don't think anyone wants to waste time.. ie, others don't want to waste time with me either. To enter into some discussions, you need to do some homework if possible. That takes time.
In any case, if anyone has a link to a related public discussion, feel free to post that info here for the benefit of all.
AlexH
2008-10-06 06:23:25
@Jose:
You said, "If we don’t know this extra date context, then we play it safe and keep the integer as an integer.".
That's precisely the situation! We don't have the extra information, so the integer stays as an integer.
However, the integer is still a buggy offset and is usually "one off" (i.e., is x+1 when the real value should be x).
So the situation is that you have to encode various schemes in order to deal with the buggy data, because you cannot convert it when you upgrade the file format.
It's really as simple as that.
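To make the "one off" offset concrete, here is a minimal Python sketch. It assumes the legacy numbers are 1900-system serial day counts in which serial 60 stands for the nonexistent 29 February 1900. The function name is hypothetical and not taken from any spreadsheet's code.

```python
from datetime import date, timedelta


def legacy_serial_to_date(serial: int) -> date:
    """Convert a 1900-system legacy serial day number to a calendar date.

    Assumption: serial 1 is 1900-01-01 and serial 60 is the phantom
    1900-02-29 (1900 was not a leap year).
    """
    if serial < 1:
        raise ValueError("serial must be >= 1")
    if serial == 60:
        raise ValueError("serial 60 is the nonexistent 1900-02-29")
    # Before the phantom day the count is correct; after it, every
    # serial is one higher than a correct day count would be.
    base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return base + timedelta(days=serial)
```

The sketch only shows the arithmetic. Whether a given stored integer should be passed through such a function at all is the open question, since nothing in the number itself says it is a date.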
Roy Schestowitz
2008-10-06 06:40:34
As I wrote earlier, this discussion was resolved before; Microsoft just didn't fix its specs though.
AlexH
2008-10-06 06:56:28
If this was so easy, no-one would bother to encode the legacy behaviour into ODF 1.2. However, it's not that easy, so that behaviour is being put into the standard.
Coping with legacy data makes ODF actually useful. If we couldn't convert old data, ODF would be a significantly harder sell. There is a big difference between legacy file formats and legacy data, which people here don't seem to understand.
Roy Schestowitz
2008-10-06 07:02:58
Roy Schestowitz
2008-10-06 07:07:26
It's actually Alex Hudson.
http://www.alexhudson.com/
AlexH
2008-10-06 07:49:28
Slightly sad that people will attempt to tie you to Microsoft for expressing opinions which don't fit with their world view. One thing I respect about Jose is that he always argues on the topic, not ad hominem.
Roy Schestowitz
2008-10-06 07:55:25
What raises this suspicion are actual past incidents. Microsoft deserves no trust anymore, as it was caught many times before employing forum shills and such (some examples). It continues to this day.
BTW, you have not seen that comment because it's only moments ago that I checked to see what was trapped by the automated filter.
AlexH
2008-10-06 08:14:11
I just think some people find it very easy to wonder aloud at possible connections as a way of avoiding discussion of actual issues.
Pedro Gimeno
2008-10-06 08:51:41
1. Implement a "Legacy Date" cell format. This cell format interprets a cell's number as a date with the 1900 bug for showing. Cells with date format in Excel files would be converted to "Legacy Date" when imported.
2. Implement a "LEGACYDATE()" function which accepts one argument, which converts a number into a proper date taking into account the 1900 bug. Excel formulas which have functions accepting dates as arguments would be fixed so that each argument that is accepted as a date is first passed through LEGACYDATE(). For example, WEEKDAY(a3+b3) would become WEEKDAY(LEGACYDATE(a3+b3)).
Scripts can't be supported, though. It's impossible to analyze a script and they would require manual fixing.
Of course portable documents should never use the legacy cell format or function. Documents intended to be portable should be manually transformed to get rid of the legacy bits.
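The formula rewrite in point 2 could be sketched roughly as follows in Python. The list of date-taking functions and the name wrap_date_arguments are assumptions made for illustration. The sketch only wraps the first argument and ignores string literals that contain parentheses or commas, so it is not a full formula parser.

```python
import re

# Hypothetical list of functions whose first argument is a date.
DATE_FUNCTIONS = ("WEEKDAY", "YEAR", "MONTH", "DAY")
_CALL = re.compile(r"\b(" + "|".join(DATE_FUNCTIONS) + r")\(", re.IGNORECASE)


def wrap_date_arguments(formula: str) -> str:
    """Wrap the first argument of each date function call in LEGACYDATE()."""
    out = []
    pos = 0
    while True:
        match = _CALL.search(formula, pos)
        if match is None:
            out.append(formula[pos:])
            return "".join(out)
        start = match.end()  # index just after the opening parenthesis
        # Scan to the end of the first argument: a top-level "," or ")".
        depth = 0
        end = start
        while end < len(formula):
            ch = formula[end]
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:
                    break
                depth -= 1
            elif ch == "," and depth == 0:
                break
            end += 1
        out.append(formula[pos:start])
        # Recurse so nested date functions inside the argument are wrapped too.
        out.append("LEGACYDATE(" + wrap_date_arguments(formula[start:end]) + ")")
        pos = end
```

With this, wrap_date_arguments("WEEKDAY(a3+b3)") yields "WEEKDAY(LEGACYDATE(a3+b3))", matching the example above.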
Pedro Gimeno
2008-10-06 08:55:38
AlexH
2008-10-06 09:10:10
It's a nice idea, but it's not great for a couple of reasons. A big one is that you're adding this LEGACYDATE() function into existing formulas, which will confuse users who are expecting the previous formulas.
That's actually a huge issue: formulas are basically user interface, and is one reason why they are so clunky even in OpenDocument 1.2. If we were designing something from scratch right now, I don't think it would look much like the existing system, but migration is a huge problem.
The second issue is that you're dropping all this *LEGACY() stuff into the sheet, but you're getting the same effect as setting a base epoch sheet-wide.
So I agree that it could work (although it could fail if anyone has written custom functions to do date manipulation within a spreadsheet), but it's the same solution as that already proposed in OpenDocument 1.2: you put in place the facility to manage dates with legacy epochs. I would venture that the ODF solution is cleaner; you're writing the same code (changing date offsets), but putting the function call internally in the spreadsheet code rather than externally in the spreadsheet formula.
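A rough Python sketch of the sheet-wide approach: the epoch (what ODF's calculation settings call the null date) is a single per-sheet setting, and the formulas are left untouched. The class and field names are hypothetical. The default of 1899-12-30 is an assumption, chosen so that serials from 61 upward come out on the same dates as legacy 1900-system values.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Sheet:
    # The day that serial number 0 refers to (the sheet's epoch).
    null_date: date = date(1899, 12, 30)

    def to_date(self, serial: int) -> date:
        """Interpret a stored number as a day offset from the sheet's epoch."""
        return self.null_date + timedelta(days=serial)


legacy_1900 = Sheet()
mac_1904 = Sheet(null_date=date(1904, 1, 1))
```

The same stored number then yields different dates depending on the sheet's setting, with no per-formula wrapper needed.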
AlexH
2008-10-06 09:11:25
So my point was about numbers in cells, not formula function calls.
Ianp
2008-10-06 09:25:48
Jose_X
2008-10-06 11:57:03
Take one:
You said originally: >> The problem is that you can’t just “convert” user data when you convert the file format, because spreadsheet data isn’t typed and you can’t know which numbers to adjust.
Here is my simplified response:
The formatting or some other clues give away the intended usage of a number as a date that uses some (possibly broken) algorithm; thus, you can convert this number type value into an ODF date type value, adjusting so as to map the original value into a value that works with the correct algorithm.
There is no problem. We know we had a date. We know the formulas to use in all cases.
If no such clues can be found then don't convert. In other words, don't convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.
Again there is no problem. We simply mapped each number to itself and used an ODF number type. If the orig was not intended as a date, we correctly left it alone. If the orig was intended as a date by some other application (since the formatting wasn't done for user visual purposes), we still preserved that value. There is no problem. New applications would not treat a number as a date because it got saved under the number type not the date type.
Where is the problem? Also, if you think there is a problem, give an example.
Take two:
>> Ok, I just thought of a great example. You know that config option in OpenOffice.org, where you state what range double-digit years are in? By default it’s set to 1931-2030 or something, so you know that '98' == '1998', '31' == '1931', etc.
>> It’s exactly like that. Unless you know what the “range” is to begin with, you cannot hope to convert the data accurately, because you’re missing enough information.
In simple terms here is why this example you gave before is not a counter-example.
You are *not* missing the type information in this example. The config option is known if you convert using the app that uses that config option (presumably the same app the user would use to open the file anyway or else the user would be screwed anyway, even prior to any conversion).
The config option is known and the ODF date semantics are also known. The mapping is straightforward.
So why is there a problem here? We know we have a date. We know all the conversions necessary.
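The two-digit-year window mentioned in the quote can be written down in a few lines of Python. The function name and the default window start of 1931 are assumptions taken from the "1931-2030 or something" example, not from OpenOffice.org's code.

```python
def expand_two_digit_year(yy: int, window_start: int = 1931) -> int:
    """Map a two-digit year onto the 100-year window starting at window_start."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    century = window_start - window_start % 100
    year = century + yy
    if year < window_start:
        year += 100  # falls before the window, so it belongs to the next century
    return year
```

The mapping is only well defined once window_start is known, which is the piece of information the two sides disagree about having.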
AlexH
2008-10-06 12:46:15
I've told you in many instances you can't know whether or not a certain number is a date. You can't say "don't convert", because then anything which uses the unconverted values starts spitting out the wrong answer!
Jose_X
2008-10-06 13:15:58
You didn't show any contradiction. I think you aren't understanding what I am saying. Please show the two items that contradict.
>> I’ve told you in many instances you can’t know whether or not a certain number is a date.
Of course you can know if a number is being used as a date. One way is if it is formatted as a date.
[This formatting information is found within the same file as the number in the case of Excel spreadsheets (or so reverse engineering or special access to Microsoft has determined, I believe.. as I think that is how OO.o interprets Excel files).]
>> You can’t say “don’t convert”, because then anything which uses the unconverted values starts spitting out the wrong answer!
What are you talking about? Can you give an example of this nonsensical statement? I must not be understanding you.
You need to give more context in your replies.
[I'm waiting any minute now for me or you to start saying "oops, my bad", but it's not happening. This is emboldening me to be more reckless to see if I go too far, but the problem is that you are not giving examples, as, in fact, you did not challenge my rebuttal of your lone example.]
Jose_X
2008-10-06 13:21:39
Let me rephrase.
>> If no such clues can be found then don’t convert. In other words, don’t convert to ODF or convert to ODF but mapping the original numerical value identically into the same numerical value of an ODF number type.
If no such clues can be found for a particular numerical value then don’t convert that value. In other words, don’t convert the original file format to ODF, or do convert the original file to ODF but map such a numerical value identically to itself and to an ODF number type.
Anyway, so what is the problem now, with anything of what I wrote in this reply from which you quote? http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-26078
Roy Schestowitz
2008-10-06 13:37:04
AlexH
2008-10-06 13:59:04
First, there are no "dates" - that concept doesn't exist. A spreadsheet stores numbers. Some of those numbers may be formatted as "dates", where the spreadsheet interprets them as date offsets from an epoch.
Formatting is no indicator of type. Where a value is used in a calculation, it may or may not be formatted correctly - that's up to the user.
Second, if you're altering data, you have to ensure you have correctly altered *all* instances within a spreadsheet. So, you cannot do a reverse topological sort from "date formatted" cells to try to work out what other cells contains "dates": there may be data that is in the sheet, but not currently used in a calculation. One obvious example would be the result of a VLOOKUP().
It's that simple. You cannot look at a number like "5" and say "oh, that's actually a date". There are some clues in a spreadsheet. There are not sufficient clues to know. This is why no vendor automatically converts values; if it was that easy people would do it!
AlexH
2008-10-06 14:04:01
"15","0",=A1+B1 "0","2",=A2+B2 "0","5",=A3+B3 "=WEEKDAY(OFFSET(C1,2))"
A1:B3 are formatted as "date". Should be easy, no?
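One possible reading of this example, sketched in Python: if a converter adjusts every date-formatted cell in A1:B3 by a fixed correction, column C (which is not formatted as a date but feeds WEEKDAY) moves by twice that correction. The shift of -1 is purely illustrative, and the helper name is made up.

```python
# The sheet above: (A, B) raw numbers per row; C is computed as A + B.
rows = [(15, 0), (0, 2), (0, 5)]


def column_c(cells, shift=0):
    """Recompute C = A + B after adding `shift` to every A and B cell."""
    return [(a + shift) + (b + shift) for a, b in cells]


original = column_c(rows)            # values WEEKDAY would read before conversion
shifted = column_c(rows, shift=-1)   # after "correcting" each formatted cell
drift = [s - o for s, o in zip(shifted, original)]
```

Each input moved by one day, but the value read through OFFSET(C1,2) moved by two, because the converter cannot tell which cells are dates and which are durations added to them.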
Ianp
2008-10-06 14:06:48
If you encounter this situation, you've got a dead file. If a new spreadsheet application cannot find out if its operating on a number or a date then that makes the file unusable so this argument is meaningless.
Jose_X
2008-10-06 14:13:09
file nodates.old: 5 3 6
file hiddendates.old: 5 3 6
file obviousdates.old: 5 3 6 fd232
The first file is a spreadsheet that has 3 number values. These values have to do with how many oranges, pears, and watermelons we sold last week. There are no dates.
The second file has a "3" which is a date. It refers to the third day during the week. Ordinarily, we can't tell this refers to a date (let's assume we can't tell in this case unless we ask the author manually -- ie, there is no hint in the .old file that this is a date).
The third file has a date also, but this information can be deduced because the "fd232" code means that the second field (the "3") is a date interpreted according to the broken leap year formula (I made this code up but let's flow with it).
Here is how I am saying that these three files would be handled. An app that understands these .old formats would map...
.. nodates.old into an ODF file with the numbers not changing values and staying as number types within a table.
.. hiddendates.old into an ODF file with the numbers also not changing values and staying as number types within a table. Thus hiddendates.odf and nodates.odf may look essentially the same.
.. obviousdates.old into an ODF file with the "5" and "6" staying put as number types; with the "fd232" turning into whatever formatting code yields the same effect in ODF as in .old; and with the "3" being turned into the right number so that it maps properly when we use the correct date formula, and the type of the data would be date.
Now, nodates.old -> nodates.odf presented no ambiguities to the app doing the saving. There was nothing much to be done: the data looks identically as it should.
hiddendates.old -> hiddendates.odf presented no ambiguities to the app doing the saving. There was nothing much to be done: the data looks identically as it should. Alex has no problem with this case because I did not translate. The data is the same. No information is lost. No new semantics are implied because I used the ODF number type which has ordinary number semantics (not date semantics) just as is found in the .old files.
obviousdates.old -> obviousdates.odf presented no ambiguities to the app doing the saving because it was able to identify the date and know the associated algorithm and it knows the algorithm associated with the ODF date type. New applications know that the ODF date type uses the correct algorithm, so no prob there. Old applications, can't even read ODF, so a helper function would need to be constructed. This helper function knows to convert ODF date values into the values used by the .old as all of the information needed is known: the semantics of the date type for ODF are known and the semantics for the .old dates are also known (otherwise we would not have translated to ODF in the first place.. but the "fd232" was assumed to give this information in total).
Note that if "fd232" did not include everything we needed (eg, the algorithm, any timezone offsets, etc) then we would have applied case 2 and simply mapped the number into an identical valued ODF number of the number type.
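The three cases can be summarised in a short Python sketch. It assumes that the made-up "fd232" code marks a 1900-system serial day number stored with the leap-year bug; the function name and that interpretation of the code are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Union

# Made-up format code from the example: marks a cell as a date stored
# with the legacy leap-year bug.
LEGACY_DATE_CODE = "fd232"


def convert_cell(value: int, format_code: Optional[str]) -> Union[int, date]:
    """Return a date only when the format code identifies one; otherwise
    return the number unchanged (identity mapping into a number type)."""
    if format_code != LEGACY_DATE_CODE:
        return value  # no clue that this is a date: leave it alone
    if value < 1 or value == 60:
        return value  # no real calendar day to map to: stay conservative
    base = date(1899, 12, 31) if value < 60 else date(1899, 12, 30)
    return base + timedelta(days=value)


nodates = [convert_cell(v, None) for v in (5, 3, 6)]
hiddendates = [convert_cell(v, None) for v in (5, 3, 6)]
obviousdates = [convert_cell(v, c) for v, c in ((5, None), (3, "fd232"), (6, None))]
```

Here nodates and hiddendates come out identical and unchanged, while only the flagged "3" in obviousdates becomes a typed date.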
Roy Schestowitz
2008-10-06 14:15:56
AlexH
2008-10-06 14:39:42
The only way your "conversion" system could work in the face of that is to flag individually each cell which had been "converted", so that the stuff you couldn't convert could be later "fixed". But it's ugly, and quite rightly no-one does it.
ODF 1.2 takes the right approach by allowing variable epoch calculation. It's simple, and it works.
Jose_X
2008-10-06 14:44:12
>> First, there are no “dates” - that concept doesn’t exist.
Fine. I accepted this from the start. At least we are on the same page so far. Step 1: check.
>> Formatting is no indicator of type....
If it is formatted as a date, I suggested we do assume it is a date; otherwise, this is a bug in the original spreadsheet.
Sure, a value can double as a date and as a password or something else. These oddball cases should be rooted out. The conversion would presumably be done by someone that has a clue over the specifics of the spreadsheet page. In any case, this odd scenario likely is not common. Also, there is no need to convert. A conservative company would start by not converting anything or converting and checking. However, it makes no sense to bind all time into the future to use the bugs of the past on account of a failure to find a simple rule that would apply 100.00% of the time.
>> ... Where a value is used in a calculation, it may or may not be formatted correctly - that’s up to the user.
Right. Any arbitrary value can be used as a date (but not indicated as such within the same file or through any clues given to the processor converting into ODF) by any arbitrary piece of code, whether that code is called a spreadsheet formula or is a utility application that resides on another file on another computer on another network.
In the absence of date formatting and any other needed information that would be needed by the given file type to suggest an unambiguous date, we don't map into the ODF date type. Instead, we map identically into the/an ODF number type.
>> Second, if you’re altering data, you have to ensure you have correctly altered *all* instances within a spreadsheet. So, you cannot do a reverse topological sort from “date formatted” cells to try to work out what other cells contains “dates”: there may be data that is in the sheet, but not currently used in a calculation. One obvious example would be the result of a VLOOKUP().
First let's start by pointing out that these scenarios may apply to some spreadsheet file types but not to others.
Now, my short answer here is that if we can't tell for sure, then as stated already, we map the numbers unchanged into ODF number types. This amounts to an identity/null conversion and is no worse than what OOXML demands.
I may try and break this down more later to analyze Excel files. Worst case, we would have all Excel file numbers map identically into ODF number types. However, the ODF date type is still there for when we know we have a date.
>> It’s that simple. You cannot look at a number like "5" and say “oh, that’s actually a date”. There are some clues in a spreadsheet. There are not sufficient clues to know. This is why no vendor automatically converts values; if it was that easy people would do it!
If we don't know, we don't adjust the values or map to ODF date types. There is no problem. This simply means we aren't trying to deduce semantics from the old format to identify candidates for the ODF date type.
These cases don't present problems.
And the cases where we do have enough info means that the converter can know if an injective mapping is possible (to guarantee that we can find the inverses uniquely or at least without problems -- depending on the particular semantics of the file format, the mapping may not even need to be injective). BTW, X+1 is essentially injective as are all linear functions (scaling and translating). http://en.wikipedia.org/wiki/Injective_function
The point though is that we would have to be sure we could undo the "damage" of conversion. If we couldn't guarantee that, then we would not attempt the mapping into the ODF date type and just stick with the number type.
Remember that we aren't just talking about Excel spreadsheets. Any arbitrary file might be mappable into ODF. ODF is a general purpose file format. It makes no sense to cripple it when all scenarios can be handled gracefully. Sure, for Excel files, maybe a crippled ODF would smell just as bad, but we don't have to accept a smelly ODF format period, as we can do better.
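The "can we undo the conversion" condition can be checked mechanically. The Python sketch below assumes the legacy values are 1900-system serials with the phantom 29 February 1900 at serial 60; the function names are hypothetical. It verifies that legacy serial -> date -> legacy serial is the identity for every serial except 60, which has no real date to map to.

```python
from datetime import date, timedelta

EPOCH = date(1899, 12, 30)


def to_date(serial: int) -> date:
    """Legacy serial to calendar date (undefined for the phantom serial 60)."""
    base = date(1899, 12, 31) if serial < 60 else EPOCH
    return base + timedelta(days=serial)


def to_serial(d: date) -> int:
    """Inverse mapping: calendar date back to the legacy serial."""
    days = (d - EPOCH).days
    return days if days >= 61 else days - 1


serials = [s for s in range(1, 50000) if s != 60]
dates = [to_date(s) for s in serials]
is_injective = len(set(dates)) == len(serials)
round_trip_ok = all(to_serial(d) == s for s, d in zip(serials, dates))
```

Under these assumptions the mapping is injective and reversible on everything except the one phantom day, which is the single value a converter would have to leave as a plain number.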
Jose_X
2008-10-06 14:48:31
Roy Schestowitz
2008-10-06 14:55:11
Jose_X
2008-10-06 14:59:11
The "flag" is automatic. It is called the date type. Only things converted become the date type. Again, it is automatic.
If a strange bifurcation would be needed to account for all possibilities, then we could just not convert to the date type. The date type implies "date" and nothing else. The/a number type can always be used.
I'll quote from the comment that hasn't shown up yet, >> Sure, a value can double as a date and as a password or something else. These oddball cases should be rooted out. The conversion would presumably be done by someone that has a clue over the specifics of the spreadsheet page. In any case, this odd scenario likely is not common. Also, there is no need to convert. A conservative company would start by not converting anything or converting and checking. However, it makes no sense to bind all time into the future to use the bugs of the past on account of a failure to find a simple rule that would apply 100.00% of the time.
What I mean here is that something formatted as a date might also take on a very different role. In this case, changing that value, although correct insofar as the role of the number as a date is concerned, would lead to problems for the number's alter ego.
Remember that we can always be conservative and not convert, but this is no reason not to have a correct date type.
In fact, I don't see any argument for having an incorrect date type. If the old Excel files don't have date type information as you say, then why ever would we consider converting into a date type (except to be aggressive)? Hence date types would only be used for new data, in which case what does the legacy argument have to do with anything since legacy means not new.
Shane Coyle
2008-10-06 15:05:40
Suppose some govt office has a bunch of old spreadsheets which were saved by this buggy excel version, even suppose they still have that old 386 and the excel version running to access them and print to that ancient printer over there, once a year (if ya think its not likely, ya havent worked for the govt).
Anyhow, we decide we want to open those files in a shared folder from our shiny workstation with Office 14 or OO3 or whatever. The modern app needs to know how to tell if that file has this known bug, render the information correctly in present use, and also cannot (IMO) change the file itself by 'repairing the bug' because it would wreck the file for its native app version, which expects its 'buggy' data in order to give the expected result.
Translating is ugly, but the important thing is the data and getting it right, each time we open it. In terms of file type conversion, different issue because then you know you can safely ignore that version-specific bug and just save the correct information after you translated it in and corrected for the bug.
My wonder is more, was this bug ever fixed, or was this a case of hiding your sins in a closed source/closed format application?
Jose_X
2008-10-06 15:06:16
Alright. I have not looked at the details. I can accept an attribute that would specify the algorithm to be used for converting the dates/numbers. That is OK.
I would want sane defaults.
But despite this, OOXML's approach of forcing a twisted conversion calculation looks to be a folly. It's even sadder if we consider that Microsoft's own past formats did not type dates (Alex stated this for I would otherwise have no clue). Why, if you are only now going to add dates to the repertoire, would you want a crippled date type? All past "dates" would just map to number types as a conservative default anyway.
Well, I can hypothesize some reasons why Microsoft would do this. I was trying to imply "why would any format based on technical merits want to have a crippled date type on purpose..."
Jose_X
2008-10-06 15:17:08
Modern apps have all the information they need (barring proprietary secrets of course). If the format is X then use meaning A. If the format is Y then use meaning B. This is possible if formats X and Y existed when the app was created/updated.
You are correct that the old application won't know. But then why would you convert into ODF in the first place since the old application couldn't read ODF? Not every user would convert their old files into ODF. If you build a translator from ODF into the old format, that translator can make the adjustments as it knows the semantics it needs before and after. If it wouldn't know the semantics unambiguously (Alex gives examples where messes can occur if we try to be aggressive converters), then this info would have been known and the conversions would not have taken place in the first place (at least not without user approval).
None of this, we can see, has any implication to lead us to want to cripple the date type semantics of a new format. We are always safe by converting as is into a number type. Future creations of data as date type should be clean.
We can only imagine why OOXML would force the crippling upon us.
AlexH
2008-10-06 15:20:49
Seriously, the differences between OOXML and ODF in this area are minimal. Both have a system to deal with older integer encodings. Both have date types which do not feature this bug. Both can cope with different epochs.
Jose_X
2008-10-06 15:37:12
OK, I went back and read this link http://boycottnovell.com/2008/10/02/ooxml-leaked/#comment-25680 and the comparison appears to be against the Excel format and not against OOXML.
This link also mentioned earlier http://www.robweir.com/blog/2006/10/leap-back.html does mention OOXML; however, as you already noted, is dated two years ago.
>> Seriously, the differences between OOXML and ODF in this area are minimal. Both have a system to deal with older integer encodings. Both have date types which do not feature this bug. Both can cope with different epochs.
Are you kidding me? So we worked on a non-issue with the current OOXML and ODF? I mean, sure it was a fun mental exercise to an extent.
Past experiences suggest that just because you say "everything is fine" doesn't make it so; however, I have no other reason to complain about any specific format with what I have currently verified.
I also need to get on with some other work.
PS: Alex, thanks for the examples you eventually gave. It can be annoying to come up with them, but it helps track down where our minds are not meeting. It's still not completely clear to me where the gap existed, but I have a better idea. Of course, this potentially being a non-issue .... .. Roy, this forum is a great time sink! Thanks. Thanks a lot ;-)
Roy Schestowitz
2008-10-06 15:45:55
Well, if it's any solace, this thread/page has been viewed well over 10,000 times and this server fed almost 50 gigs so far this month (mirrors and CORAL excluded).
AlexH
2008-10-06 15:47:02
Well, I did say at the beginning it wasn't really a file format issue ;)
Both OOXML and ODF use the same format for dates - the ISO format - so it only comes down to how to import legacy data. I guess OOXML mandates .xls-compatible defaults, but in practice that's just stating the bleeding obvious... :)
Ian
2008-10-06 16:50:02
"If your data file contains enough information for an application to interpret the meaning of values (and type), then you can ‘rescue’ this data from the bug. It’s very simple, really!"
I'm always nervous with the concept of a "best guess" data conversion. If you have a spreadsheet with 3 rows and five columns, it's really not a big issue. When you have a 20 MB file with thousands of possible rows, you have to trust the computer to not screw anything up. Best guess data conversion isn't necessarily a trustworthy process, certainly something I wouldn't trust. I don't care if it's OO.org, Excel, 1-2-3, whatever.
Roy Schestowitz
2008-10-06 17:10:19
Jose_X
2008-10-06 17:41:11
Your apprehension is shared. Caution applies to any type of automated data manipulation.
Jose_X
2008-10-06 17:46:17
From http://www.oasis-open.org/specs/
We have the full standard on a single webpage. http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1-html/OpenDocument-v1.1.html [note, this is a large webpage]
AlexH
2008-10-06 18:04:13
Roy Schestowitz
2008-10-06 18:06:36
AlexH
2008-10-06 18:30:30
ISO aren't exactly the place you'd want to go for standards documents. They charge €220 for ISO 26300...
Jose_X
2008-10-06 22:21:34
tuomoks
2008-10-07 03:32:12
Now, the whole date problem! Or other such problems. Nothing new - I once had to sort out what to do in an insurance company, calculating dates 200 years back and 200 years to the future - already 152 (or was it 162?) different programs, procedures, methods, etc which gave different answers but already in many (18000+) applications and used by different databases for calculations! Talk about nightmare, actually only two gave correct answers in each case(!) and one of them was SLOW! So - I can understand the technical pain but not the politics - deal with it, it's a fact no matter what you think.
The whole problem as I see it is the current love of (new?) metalanguages. Why not then use SGML and enhance it? Clean, simple, proven, etc? Much faster by design than anything after that? Add LaTeX and simple authentication, authorization, AES encryption, etc to that, together they would support any and all requirements we can think of today - even binary or interactive data would be no problem? Not invented here syndrome - once again? Besides - prove to me that metalanguages are better for computing than binary! Maybe for humans but the computers do the work and I'm not even sure of the benefits for humans - it is as easy to read hex, octal, whatever as ASCII which actually covers a very small part of all the needed (human) languages - try to read NLS supported meta sometimes - back to interpretation and translation?
Sorry, seen these fights too many times (probably?) - they have nothing to do with "a better way to solve a problem" but who's controlling, who makes the decisions, politics (and money) as usually. Nothing technically difficult but (still?) understandable, IT is a young field and going through the growing pains.
Roy Schestowitz
2008-10-07 07:10:57
tuomoks
2008-10-07 08:30:29
Roy Schestowitz
2008-10-07 09:12:37
Roy Schestowitz
2008-10-07 09:13:58
tuomoks
2008-10-07 10:11:44
"Microsoft originally developed the specification as a successor to its earlier binary and Office 2003 XML file formats. The specification was later handed over to Ecma International to be developed as the Ecma 376 standard, under the stewardship of Ecma International Technical Committee TC45. Ecma 376 was published in December 2006[9] and can be freely downloaded from Ecma International.
An amended version of the format, ISO/IEC DIS 29500 (Draft International Standard 29500), received the necessary votes for approval as an ISO/IEC Standard as the result of a JTC 1 fast tracking standardization process that concluded in April 2008. Next and last step in the standardization process is the final publication as of ISO/IEC IS 29500, Information technology – Office Open XML formats as an international standard."
Yes, of course MS used (again?) ISO/whatever to force something but this time it may not work well. Seen the rumors that MS wants to take over ODF? Good for them, bad for people who let it happen - if they do, I hope not!
Now, working in/with small and huge corporations I can tell - they think weird! Any, even a small company can participate but for some reason they just refuse and take whatever is given? Huge corporations have their own problems, often slow to react, internal fights which prevent them making decisions before too late, whatever. So, if MS can do the next "coup", start managing ODF as seems with some other OSS projects (amazingly many) - good for them and instead of complaining people should start working if they don't like it.
As I have said, I'm not a big MS fan but at least they react! In some small companies I have worked, the price of a VP lunch would have paid one year in standards committee, guess which one they select - they just keep complaining instead of making their own future! A weird world we have!
Jeetje
2008-10-07 13:25:30
The benefits of option 2 are manifold: a) New implementations of the correct spec aren't burdened with the obligation to account for all possible faults, hence the resulting software will be small and fast. b) Converters from an old, faulty spec to the new, correct spec can be implemented separately, allowing for bulk conversion of old documents into new, cleaned up documents.
The biggest downside: The original manufacturer of the faulty software (based on its own faulty specs) is caught with his pants down and may very well lose a lot of business to people who ARE capable of keeping data accessible for decades to come.
Basically, MS used a process akin to ISO 9000 series certification in the most perverse way possible, asking ISO to confirm that the way data has been handled since the inception of Word / Excel and so on is compliant with the spec they have now drawn up. From a business point of view, ISO had very little choice but to agree the data is compliant to specs, whereas from a technical PoV they should have rejected the whole spec as being a waste of the trees used to produce the paper it was printed on.
The right way forward is saying byebye to all the errors MS ever made in storing our data, the only corporation that is able to help us do that is MS itself, and if they don't help us out quickly they may very well help themselves out of business pretty fast (considering how fast we are approaching a big recession, as MS Office still is a pretty poor value-for-money proposition).
AlexH
2008-10-07 13:31:33
This isn't a recent problem, nor is it arguably MS's problem. If it was so trivially fixable, Microsoft would have done it already - not least in the early days, since that would have caused added incompatibility with Lotus 1-2-3.
Jose_X
2008-10-07 17:07:19
My two cents:
It is not a problem to create a new item whose map from legacy is not well defined in all cases. This just means that legacy stays legacy, but the new can have new good solid home.
As one example, in the case of "dates" in formats that don't have that type, it just means that you keep them as "numbers", whether in the old format or the new, if you need to be conservative or want maximal flexibility. Where possible, you may migrate to date types. Also, new dates that are created will have their date type as well.
As far as having many choices, eg, dates based on X or Y alg or reference point, that is a different issue. I like choice. I also like constraining choice for use cases (that's what types do.. for particular use cases they limit the range of possibilities). So overall, I have no problem if odd date formats exist, but I like to have "profiles" or whatever you want to call it (eg, "portable documents") where you will find a restricted well-defined environment. Judicious use of limits for well-defined scenarios is a plus.. but you also want an ample toolbox to be able ultimately to handle a great many scenarios.
This brings up extensions and monopoly leverage. Extensions are good if used for good. They are bad (too few contract constraints) in the hands of someone that can and will abuse it, eg, via the embrace, extend, extinguish strategy.
The best of both worlds is to recognize that monopolies and perhaps other types of players need special restrictions but the rest of us don't (at least not yet). Reach monopoly status, and you graduate. The Microsoft clan should have left a long time ago and left Microsoft on cruise .. to one day be overtaken by others. Their existing power reach while still aboard Microsoft is unhealthy for the rest of us.
Luc Bollen
2008-10-07 17:35:31
I would just like to say that I fully agree with your analysis: the .xls files have not enough information, in some cases, to reliably adjust the data for the 1900 bug.
We only differ on the semantic analysis of the text contained in the OpenFormula spec: do they *standardise* the "1900 bug" or do they *document* it ?
AlexH
2008-10-07 17:58:16
I'm not sure what difference you see between standardising something and documenting something. At the end of the day, a standard is simply a documented specification for something.
Does ODF mandate handling the leap-year bug? No; both ODF and OOXML have a specific date type for data which doesn't suffer this problem. It only applies to importing legacy data.
oliver
2008-10-07 18:28:56
e7o.de
2008-10-07 19:06:09
After the many irregularities that surfaced during the "standardisation" of OOXML, the standard itself has now turned up on the net as well. Typically, the copyright club gets pulled out, and so the file can no longer be found in the blog entry: ...
Jose_X
2008-10-07 20:08:43
Reminds me of piracy.
It's all good for the vendor.
rcfa
2008-10-07 21:20:59
AlexH
2008-10-07 21:30:48
Having a transition plan is the only way you can get people to upgrade to new formats like OpenDocument.
It's not technically nice, no. But it's a practical necessity. A new format which people can't upgrade to is of very little use to people who need to do real work.
Roy Schestowitz
2008-10-07 21:48:37
“We’re disheartened because Microsoft helped W3C develop the very standards that they’ve failed to implement in their browser. We’re also dismayed to see Microsoft continue adding proprietary extensions to these standards when support for the essentials remains unfinished.”
–George Olsen, Web Standards Project
AlexH
2008-10-07 21:54:27
The Web Standards Project has had a Microsoft Task Force for a number of years now, which seems to be having a real effect.
The web browser market has been changed massively by free software, and Microsoft are not in a position to ignore standards now. And if you want to see fewer places use Silverlight, you should be rooting for better standards support in IE, because without SVG/etc. you don't have many other options - and it's only IE behind in that area.
Roy Schestowitz
2008-10-07 21:59:10
Ah! That makes it OK. Let's just forget all the crime where (age >= 2 years).
Roy Schestowitz
2008-10-07 22:00:51
name required
2008-10-07 22:41:56
stealthnet://?hash=6AED03BB4BA2B91393BB5E97E5CCA8F49BBF650BD33D7D59D446B4EAA4B10FE2A78528CAA3F48E00EDD075E6A014FD5AC924FDEEB7B4B3CF63ED88860437CE48&name=OOXML-ISO-standard-english_leaked-html-edition_october-2008-1080-boycottnovell.com.rar&size=164435005
Use stealthnet for your p2p needs and take part to make it larger and stronger and enrich it with your content.
Don't let these war- and money-mongers rule this planet and enslave humanity any further.
RJoe
2008-10-08 08:11:21
First: How can the documentation of an ISO standard be secret? Is there no obligation to publish such a document???
Second: A new standard should not implement errors of previous applications. The 1900 bug should not even have any effect on the OOXML formatted data, because we talk about a calendar date. This should be formatted yyyy.mm.dd or something like that, but not in days starting from a specific date! If an application wants to be compatible with previous versions, it can rebuild it in its internal data.
It's a shame what happened in Norway these days. The officials from ISO don't have any spine. Otherwise they would have rejected this document from MS.
AlexH
2008-10-08 08:20:59
@RJoe: one of the things ISO has always done is charge money for paper standards. They've never been published openly except where another organisation also has their own copy (e.g., OpenDocument). That's obviously something which ought to change.
Your point about dates is correct, and you've actually pointed out how the modern date type basically works. But as I said previously, it's not as simple as saying "just convert old data", because you can't. This hack will be with us for many years to come.
Roy Schestowitz
2008-10-08 09:11:18
ISO was, in part, stuffed by Microsoft employees, so the decision to let this abomination happen was down to Microsoft, too. This impulsive thing was a response to corruption in the process where people got bullied, bribed, blackmailed. I thought that only the 'non-finalisation' of the text was the reason it was not out there. It's surprising to find that so-called 'open' standards are not open even for access (an afterthought and a realisation that came to me only later, so I removed the files).
Jeetje
2008-10-08 11:14:25
I'm on the same page as Jose as far as choice is concerned for using X or Y alg or reference point, however as Rob Weir showed in his piece regarding the YEARFRAC function (http://www.robweir.com/blog/2008/05/fractured-yearfrac-and-discounted-disc.html), those algorithms and reference points need to be unambiguously defined lest we run the risk of crashing another bank or Mars lander ^^
And if we have two well defined algorithms with associated reference points, it's a trivial exercise to specify a mathematical mapping from the faulty one to the correct one AS LONG AS one point in the faulty spec's space doesn't map on multiple points of the correct space. If the latter case occurs, context will need to be taken into account to try and estimate the correct mapping and as with all algorithms taking context into account, the best judge of the final result will probably be a human.
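To make the mapping argument concrete, here is a minimal Python sketch for the leap-year case discussed in this thread. It assumes the commonly described legacy numbering, where serial 1 is shown as 1900-01-01 and serial 60 is shown as the non-existent 1900-02-29; the function name `legacy_serial_to_date` is made up for illustration and is not taken from any spec. It also assumes we already know the cell holds a date, which is exactly the information an old file may lack.

```python
from datetime import date, timedelta
from typing import Optional


def legacy_serial_to_date(serial: int) -> Optional[date]:
    """Map a legacy day serial to a real calendar date.

    Assumed legacy numbering: 1 -> 1900-01-01, 60 -> the bogus 1900-02-29.
    Returns None when no real calendar day corresponds to the serial, so the
    caller can fall back on context or a human decision.
    """
    if serial < 1:
        raise ValueError("serials start at 1 in this sketch")
    if serial == 60:
        # The phantom leap day: there is nothing correct to map it to.
        return None
    if serial < 60:
        # Before the phantom day, count from 1899-12-31.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day, every serial is one too high, so count from
    # one day earlier.
    return date(1899, 12, 30) + timedelta(days=serial)


for s in (1, 59, 60, 61):
    print(s, legacy_serial_to_date(s))
```

Every serial except 60 maps to exactly one date, so the automatic part really is trivial; the hard part is the one flagged value and, more importantly, deciding whether a bare number was meant as a date in the first place.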
The bigger question though is: how many documents CANNOT be mapped automatically, i.e. need context and maybe human intervention to correct any errors?
However, SC 34 is still muddying the waters regarding the future spec unifying ISO 29500 and 26300, diluting that process with the simultaneous task of ensuring the mapping of legacy MS documents to the new format will be relatively painless for MS (i.e. NOT aiming for the best possible unified format for the next coupla decades). Already a number of countries encompassing a sizable portion of the globe's population have stated their preferred document format is ODF, so if SC 34 doesn't cut away all legacy fluff from ISO 29500 and strive for unification by the end of 2009, their efforts will become wholly irrelevant. And that would definitely be a shame, as that committee is about the only forum outside MS that is at all able to draw up mappings from faulty specs to correct specs...
First things first: 1) A (mathematically correct) unified document format by the end of 2009 2) see 1 3) see 1 4) As soon as 1 has been developed, spawn X workgroups to help out with conversion algorithms from legacy to unified.
AlexH
2008-10-08 11:33:17
I think you actually raise two different problems. The "leap year" bug is a very specific and quite unique issue, in that it's basically impossible for software to "fix" spreadsheets. The best approach so far is to put the standard (legacy) epoch back one day into 1899, so that the values are 99% correct without the need for any conversion; only people with spreadsheets that care about days in 1900 will experience problems. That's sound engineering.
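As a rough illustration of the shifted-epoch approach, the Python sketch below treats a legacy serial as a plain day count from 1899-12-30 and compares it with what a legacy application would display. The legacy numbering (serial 1 shown as 1900-01-01, serial 60 shown as 1900-02-29) is an assumption based on the usual description of the bug, and `SHIFTED_EPOCH` and `serial_to_date` are names invented for this example.

```python
from datetime import date, timedelta

# Assumption: the legacy epoch is moved back one day so that every serial
# after the phantom 1900-02-29 needs no adjustment at all.
SHIFTED_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: int) -> date:
    """Interpret a legacy serial as a plain day count from the shifted epoch."""
    return SHIFTED_EPOCH + timedelta(days=serial)


print(serial_to_date(39729))  # 2008-10-08, same as the legacy display
print(serial_to_date(61))     # 1900-03-01, same as the legacy display
print(serial_to_date(1))      # 1899-12-31, legacy display would be 1900-01-01
```

No per-value conversion is needed; the only disagreement with the legacy display is for serials 1 to 60, i.e. the first two months of 1900.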
The other issues, like YEARFRAC, are where OOXML is not soundly specified enough. I think this is just competition in action: one early advantage of OOXML was that it went much deeper than the OpenDocument specification, and this was touted as a benefit. Now, the boot is somewhat on the other foot, because OpenDocument is reaching the same depths but at a greater level of detail.
We're sadly still in the same situation of copying what Excel does, but that's because this is really user interface. Any change here impacts users, not the vendors.
Pedro Gimeno
2008-10-08 11:51:33
>> @rcfa: with that attitude, we’d be stuck with .xls forever more.
Wouldn't that be .wks instead?
AlexH
2008-10-08 11:56:13
rcfa
2008-10-08 13:26:22
The y2k issue was neither quick, nor cheap; it was what you'd call "paying for past sins". The same needs to happen with these date and calculation bugs. Just defining a bug as a standard is as ridiculous as redefining the meaning of noon during "summer time" (there's no such time, because noon is when the sun is highest, not when a bunch of politicians decide it to be).
The reason I bring up summer time is no accident: instead of having summer and winter HOURS (as in opening or business hours), the government decides to "cheat" everyone by redefining an astronomical event. They could equally easily mandate that school and government office hours start one hour earlier in summer, and more or less the rest of the economy would follow suit (working parents have to bring kids to school, businesses want to sell to government, etc.).
That would be the right approach. It seems that getting things done right doesn't count anymore, only slop counts, as long as it "gets done, who gives a f* how it gets done". And it's that attitude that creates that sort of mess in the first place.
If you screw up, you have to pay for it. You can pay now, or a lot more later. The price just goes up the longer you wait.
So the point of a quick transition is completely lost if the transition doesn't fix the legacy issues in the process. I'd rather see a much slower transition and adoption, and be able to count on there being no dead legacy dogs buried in new documents.
oliver
2008-10-08 17:20:59
So if _that_ is already given - what is the plan to get rid of the hack in the long run? I mean, even if I accept that this hack can't be fixed _now_, can I at least expect that people are working to completely fix this over the coming years? Or did you actually mean to say "This hack will be with us for as long as Microsoft is in business"?
rcfa
2008-10-08 17:37:19
AlexH
2008-10-08 17:52:43
It will die over time as people move to typed spreadsheet formats. At some point, probably in five years or so, the feature will get dropped from the specs, then later the apps will stop supporting it.
There's not really a huge amount of point removing stuff from the specification while it's still in use by users and has to be supported by applications. That's one reason why HTML5 is a lot more promising than XHTML2: in fact, XHTML2 is almost the case study in why technical perfection does not work.