On Jan 2, 7:04 pm, John Machin <[email protected]> wrote: > On Jan 3, 2:01 am, brooklineTom <[email protected]> wrote: > > <snip> > > My point was that however the original XLS files were created or > acquired, the first step in your solution involves converting the XLS > file to "XML Spreadsheet" format, which requires a copy of Excel on a > Windows box. Many people start with an XLS file, no Excel and no > Windows box, no COM, and users who can't be relied on to open a file > and save it in the right format with the right name and extension.
True enough. I develop on a WinXP box and have Excel. I just used it. > BTW, did you consider opening the XLS files with OpenOffice.org's Calc > and saving it in their default ods format (chunks of XML in a zip > file)? No. As I said, I have Excel. > > 3. The largest file I used was about 228M, > > Is that the XLS file or the XML file? That's the xml file size. I just *love* file-bloat, don't you? :-) > > <snip> > > > I knew that I needed only > > a small subset of the xlrd behavior, and I concluded (perhaps > > incorrectly) that it would be easier to roll my own parser than find, > > extract, and then port (to my own framework) the corresponding parts > > of xlrd. > > Possibly incorrectly. If approached at the time, I would have said: > (a) if desperate to DIY: > (a1) ignore any code for old Excel versions (self.biff_version < 80) > (a2) ignore any code for extracting formatting info > (self.formatting_info) > (a3) find the Sheet.put_cell* methods in sheet.py e.g. > > def put_cell(self, rowx, colx, ctype, value, xf_index): > > ignore the xf_index arg and subvert them to your own needs instead of > filling up a big rectangular arena with data > > (b) if not really so desperate, talk to me about implementing an > option in xlrd where callers can specify a callback to be used instead > of the Sheet.put_cell* methods > > (c) What is all this "port to my own framework" caper anyway? If you > need to extract data from a database, do you rummage in their code > libraries and port the relevant bits to your own framework? I'm sure xlrd is fine, I wasn't any way suggesting that any "improvements" are needed. I was just explaining what I did. > > <snip> > > "Crack"? It's not the Enigma code. It's not even rot13. Spreadsheet > XML tells you the type (String, Boolean, ...). xlrd tells you the type > (XL_CELL_TEXT, XL_CELL_BOOLEAN, ...). Sure, once you've sorted through the excess MS-specific stuff, dealt with the runs of empty cells, and so on. I needed to write an exception file, containing the original spreadsheet entries that failed, so that our curators could find and fix (or delete) them. Thus, I already had to do row-by-row handling of the input, I had to catch and handle exceptions, I had to know and remember enough about the original SS format/layout to replicate it in the exception file, and so on. > I think so. Thanks. You didn't directly address the "steep learning > curve" question, but you explained enough of where you came from. > > It's a matter of "horses for courses". I guess some people might > regard (xml.dom, xml.dom.minidom, DIY "pullparser") as having a > slightly non-horizontal learning curve :-) *LOL* Yes. Sadly, I had already climbed the dom/minidom/pullparser curve for all the other stuff I have to do. In the space I work in, xml handling is pervasive enough that I already have all that stuff "in my hands" anyway. I wonder if the OP ever read any of this.... :-) Thx, Tom -- http://mail.python.org/mailman/listinfo/python-list
