01.11.07
Proprietary Open XML Extensions (Already!)
As you are likely aware, Excel 2007 includes a new file format for storing data; actually, it apparently has a few new file formats. None of them is OpenDocument, in case you were wondering.
Rob Weir takes Office 2007 for a spin, and has some interesting things to report regarding the file formats being used by Excel 2007.
In addition to the default Open XML file format (.xlsx) added in Office 2007, there is a format called the Excel Macro-Enabled Workbook (.xlsm, written “xlsxm” in Rob’s post), which contains binary-only data not specified in the ECMA standard. There is also an all-new binary-only format (.xlsb), which Microsoft says provides "optimal performance and backward compatibility" (wasn’t that the point of Open XML?).
The “Excel Macro-Enabled Workbook” option saves as an “xlsxm” extension. It is OOXML plus proprietary Microsoft extensions. These extensions, in the form of binary blob called vbaProject.bin, represent the source code of the macros. This part of the format is not described in the OOXML specification. It does not appear to be a compiled version of the macro. I could reload the document in Excel and restore the original text of my macro, including whitespace and comments. So source code appears to be stored, but in an opaque format that defied my attempts at deciphering it.
(What’s so hard about storing a macro, guys? It’s frickin’ text. How could you you [sic] screw it up?)
This has some interesting consequences. It is effectively a container for source code that not only requires Office to run it, but requires Office to even read it. So you could have your intellectual property in the form of extensive macros that you have written, and if Microsoft one day decides that your copy of Office is not “genuine” you could effectively be locked out of your own source code.
There is also a way to add further formats to save to, including PDF and Microsoft’s XPS, but there is no native ODF support yet.
Overall, Rob’s experience was a bit buggy, and there was an incident where trying to save to Open XML prompted a message about incompatible features (so much for backward compatibility, hey try the new binary-only format…).
I wonder how Novell OpenOffice.org’s VBA support is going to handle the new binary information in the macro-enabled workbook? Still better than the next MS Office for Mac, I suppose.
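For anyone who wants to look at the binary part themselves, here is a minimal sketch in Python. It relies on the fact that Office 2007 files are ZIP packages, lists every part whose name ends in .bin, and checks whether each one starts with the signature of an OLE2 compound file, the container used by the older binary Office formats. That vbaProject.bin carries this signature is an assumption worth verifying on a real workbook. The function names and the in-memory sample package are hypothetical stand-ins, not part of any Microsoft tooling.

```python
import io
import zipfile

# Signature of an OLE2 compound file, the container used by pre-2007 Office binaries.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def binary_parts(package):
    """Return {part_name: starts_with_ole2_signature} for every .bin part.

    `package` is a path or file-like object for an Office 2007 ZIP package.
    """
    found = {}
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".bin"):
                with zf.open(name) as part:
                    found[name] = part.read(len(OLE2_MAGIC)) == OLE2_MAGIC
    return found


def build_sample_package():
    """Build a tiny stand-in package in memory.

    Hypothetical contents for illustration only; this is not a valid workbook.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("xl/workbook.xml", "<workbook/>")
        # Fake macro part: OLE2 signature followed by padding.
        zf.writestr("xl/vbaProject.bin", OLE2_MAGIC + b"\x00" * 16)
        # Fake non-OLE2 binary part.
        zf.writestr("xl/printerSettings/printerSettings1.bin", b"\x01\x02\x03\x04")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, is_ole2 in binary_parts(build_sample_package()).items():
        print(name, "OLE2 container" if is_ole2 else "other binary data")
```

To try it on a real file, pass the path of a macro-enabled workbook to `binary_parts` instead of the sample package. The sketch only locates the opaque parts; it makes no attempt to decode the macro source inside them.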
Stephane Rodriguez said,
January 23, 2007 at 12:27 pm
I am a new visitor. While I find your blog instructive (subscribed), I’d like to shed some light on a confusion here.
VBA projects are encoded in bin parts in the new Office 2007 file formats, whether it’s Word, Excel or Powerpoint. Those parts are the actual VBA streams that we find in older versions of the corresponding applications. That’s a direct extraction.
There are other binary parts, such as printer settings parts and OLE objects parts.
The new XLSB file format pushes XLSM even further by encoding the important XML parts in much the same way as BIFF did in older versions. Actually, those bin parts are christened BIFF12. BIFF12 is undocumented right now.
I have written an article on the subject here :
http://www.codeproject.com/useritems/office2007bin.asp
Roy Schestowitz said,
January 23, 2007 at 12:47 pm
Thank you for the information, Stephane. For the record, I notice that you are among the Open XML team members:
http://openxmldeveloper.org/members/Stephane+Rodriguez.aspx
Also, yesterday’s news indicates that Microsoft pays people to “bring balance” to content which speaks about Open XML. I just hope you are not being paid to post here.
In any event, your comment does not invalidate the fact that Open XML has undocumented, binary parts.
Stephane Rodriguez said,
January 23, 2007 at 1:41 pm
If you think I may be a paid shill, take a look at this : http://xlsgen.arstdesign.com/special/OOXML_objections.pdf
(this is on my website)
The OOXML specs are so bad I had to develop my own tool, called “diffopc”, to make any progress in my product (an Excel file format component which recently added partial support for Excel 2007).
shane said,
January 23, 2007 at 1:52 pm
Thank you, Stephane. We’ve had a huge influx of trolling over the last few days, and perhaps are a bit on edge. There is no doubt from your first linked article (I didn’t read the PDF yet) that you are well versed in the OOXML spec.
I love the fact that we do have discussions on the site, and want to encourage further discussion and corrections. I want our arguments to withstand scrutiny, and encourage any readers to question or comment on our premises.
Let’s all just keep it mature, and there is no reason to post using pseudonyms folks, just type in "anonymous" or "i disagree". I still do not intend to censor any non-spam comments, nor shut down commenting after x number of days on an article, since many are still actively being viewed and linked to.
Michael said,
September 12, 2007 at 9:10 am
Thanks for the nice post!