Now that there is some filter work going on, people are waking up to the idea of adding more features. Notably, some would like the msoscheme parser to be more lenient with invalid data.
The technical background is this: in PowerPoint files, the data is split into records. Each record starts with a header that contains a type number and a size. Records can be nested. Even without knowing the details of every record, one can still parse them; one simply cannot assign a meaning to them.

The parser in msoscheme fails when it does not recognize some data. There are three cases where some would like the parser to be more lenient:
- if the order of data members in a record is wrong
- if some record has data members that are invalid
- if a record has unknown data members

The historic reason why the parser is so strict is simple: we want to follow the documentation published by Microsoft and be explicit about any exceptions to it. So far, quite a few exceptions are noted in the mso.xml file from which the parser is generated. Yet there are still ppt files out there that cannot be parsed. The reasons range from bugs in the software that created them to gaps in the documentation. If the parser were more lenient, it would probably be able to parse these structures.

I think making the parser more lenient is a good idea, given these conditions:
- mso.xml will not change: being lenient does not change the original definition
- each of the aspects of leniency listed above can be enabled separately
- there are callbacks that report where a file violates the specification (position in the file, size of record, type of record)

Patches to the parser generator that meet these requirements are very welcome.

There is an additional cost, though. If the parser is more lenient, this has consequences for the assumptions made in code that uses the results of the parser. For the three types of leniency, the consequences are:
- if the order of data members in a record is wrong, just parse them.
  This has no consequences if each member in a record has a unique type. If two members have the same type, they might be swapped.
  Typically, swapping will not have large consequences.
- if some record has data members that are invalid, just parse them.
  This is dangerous and invasive. If you need to check after parsing whether each member is valid, the code that interprets the parser results will blow up with 'if' statements.
- if a record has unknown data members, just ignore them.
  This is fine, unless you want to be able to save them back or are worried about losing information.

Cheers,
Jos

--
Jos van den Oever, software architect
+49 391 25 19 15 53 074 3491911
http://kogmbh.com/legal/
_______________________________________________
calligra-devel mailing list
calligra-devel@kde.org
https://mail.kde.org/mailman/listinfo/calligra-devel
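The record framing described in the technical background can be illustrated with a small sketch. This is not msoscheme or the code it generates. It is a hypothetical standalone walker in Python. It assumes the 8-byte record header from the [MS-PPT] specification: a 16-bit field whose low 4 bits are recVer and whose high 12 bits are recInstance, a 16-bit recType and a 32-bit recLen, all little-endian, with recVer 0xF marking a container. The sketch shows how records can be walked without understanding their contents, and how a violation callback could report position, size and type. The names walk_records, Record, header and the known-types set are made up for this example. It covers only the "unknown records" kind of leniency, not reordered or invalid members.

```python
import struct
from dataclasses import dataclass, field
from typing import Callable, List, Set

# Assumed 8-byte record header layout from [MS-PPT], little-endian:
# uint16 with recVer (low 4 bits) and recInstance (high 12 bits),
# uint16 recType, uint32 recLen (size of the record body in bytes).
HEADER = struct.Struct("<HHI")
CONTAINER_VER = 0xF  # recVer value that marks a container record

# Hypothetical callback: (offset in file, record size, record type, message).
# The type is -1 when not even a complete header could be read.
ViolationCallback = Callable[[int, int, int, str], None]


@dataclass
class Record:
    offset: int
    rec_type: int
    rec_len: int
    children: List["Record"] = field(default_factory=list)


def walk_records(data: bytes, start: int, end: int, known_types: Set[int],
                 on_violation: ViolationCallback) -> List[Record]:
    """Walk the records in data[start:end] and return them as a tree.

    Unknown record types are reported but kept, so their bytes are not lost.
    Structural errors are reported and stop parsing inside the current parent.
    """
    records = []
    pos = start
    while pos < end:
        if end - pos < HEADER.size:
            on_violation(pos, end - pos, -1, "truncated record header")
            break
        ver_inst, rec_type, rec_len = HEADER.unpack_from(data, pos)
        body = pos + HEADER.size
        if body + rec_len > end:
            on_violation(pos, rec_len, rec_type, "record extends past its parent")
            break
        if rec_type not in known_types:
            on_violation(pos, rec_len, rec_type, "unknown record type")
        rec = Record(pos, rec_type, rec_len)
        if (ver_inst & 0x000F) == CONTAINER_VER:
            # A container's body is itself a sequence of records.
            rec.children = walk_records(data, body, body + rec_len,
                                        known_types, on_violation)
        records.append(rec)
        pos = body + rec_len
    return records


def header(ver: int, inst: int, rec_type: int, rec_len: int) -> bytes:
    """Build a record header (helper for the usage example only)."""
    return HEADER.pack(ver | (inst << 4), rec_type, rec_len)


# Usage: a container of type 0x03E8 (RT_Document) holding one atom of an
# arbitrary unknown type 0x1234, followed by three stray bytes.
atom = header(0, 0, 0x1234, 2) + b"\x01\x02"
data = header(CONTAINER_VER, 0, 0x03E8, len(atom)) + atom + b"\x00\x00\x00"

violations = []
tree = walk_records(data, 0, len(data), {0x03E8},
                    lambda off, size, t, msg: violations.append((off, size, t, msg)))
print(violations)
# [(8, 2, 4660, 'unknown record type'), (18, 3, -1, 'truncated record header')]
```

In this sketch, strictness remains the caller's choice. A callback that raises an exception turns every reported violation back into a hard failure. A callback that collects violations, as above, gives the lenient behavior while still recording where the file deviates from the specification.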