Re: Review Request 109887: Create a stream reader better suited for ODF than QXmlStreamReader

Sebastian Sauer Tue, 09 Apr 2013 08:57:53 -0700


> On April 9, 2013, 9:05 a.m., Inge Wallin wrote:
> > libs/odf/KoXmlStreamReader.cpp, line 229
> > <http://git.reviewboard.kde.org/r/109887/diff/2/?file=131690#file131690line229>
> >
> >     This code is indeed slower than the normal QXmlStreamReader. Since it 
> > builds on top of it, it would be impossible not to be. But you shouldn't 
> > compare this class with the normal stream reader. You should compare 
> > QXmlStreamReader plus the code that calls it, i.e. the odf parser on top of 
> > it to this class plus the code that calls _it_.
> >     
> >     The purpose of this class is not primarily to make the code faster but 
> > to make it nicer. As I wrote in the description of the review request it 
> > will result in much easier to read code. See the description for more 
> > details.
> >     
> >     Regarding the plans, I have 3 terms:
> >     
> >     Short term, I want to use this for the docx export filter. After good 
> > feedback to the odf traversing library I realized that stream reading was a 
> > much better approach. I will port the odf traverser to use stream reading 
> > and use that for the filter.
> >     
> >     Medium term I think that we could port the KoXmlReader (DOM based) to 
> > KoXmlStreamReader. We would have to do extensive testing before doing that. 
> > This would let us remove the call to the extremely ugly fixNamespaces() 
> > function and therefore make it a lot faster.  Since 
> > KoXmlReader.setDocument() uses around 15% of the total parsing time I think 
> > we could get a nice speedup there.
> >     
> >     Long term it would be very nice if we could start to use stream reading 
> > in our loading code. But for that to happen we need to change the structure 
> > of loading so that we can combine DOM based loading with stream based. I 
> > don't have any particular plans but I think it will mature when the current 
> > classes are used more.
> 
> Sebastian Sauer wrote:
>     > This code is indeed slower than the normal QXmlStreamReader.
>     
>     :(
>     
>     I still thing the overall idea may indeed not bad if done under 
> performance pov. I mean there certainly is room for optimization, always :)
>     
>     > The purpose of this class is not primarily to make the code faster
>     
>     Then we may disagree about priorities. For me that code-path is very hot 
> and I know that I am not the only one who spend *WEEKS* on profiling that to 
> win as less as a single digit percent more performance. Same goes for 
> memory-consumption which is the other hot topic in that code.
>     
>     > but to make it nicer.
>     
>     I am not even sure it is nicer. Sure it is at lease less verbose if you 
> compare:
>     if (e.ns() == KoXML::foo && e.tagName() == "tag")
>     with
>     if (e.name() == "foo:tag")
>     
>     but the first statement can be easily made nicer without any performance 
> hit using a macro or a template function like:
>     
>     inline bool elementEquals(e, ns, tag) {}
>     if (elementEquals(e, KoXML::foo, "tag"))
>     
>     or since we have a well known small number of namespaces we could even go 
> with:
>     
>     inline bool elementEqualsFoo(e, tagname) {}
>     if (elementEqualsFoo(e, "tag"))
>     
>     or if we are very funny we could inherit from QXMLStreamReader and add a 
> == operator and restructure the code to actually use the namespace in a 
> useful manner like:
>     
>     if (e == KoXML::foo) {
>        if(e == "tag1") {}
>        else if(e == "tag2") {}
>     } else if (e == KoXML::bar) {
>        if(e == "tag3") {}
>        else if(e == "tag4") {}
>     }
>     
>     or add proper specialized operators or even methods to use like:
>     
>     if (e.isFoo("tag"))
>     
>     > let us remove the call to the extremely ugly fixNamespaces() function 
> and therefore make it a lot faster.
>     
>     Well, it cannot be removed. Its clear defined how namespaces work and 
> hard-coding the prefix violates XML and ODF. See 
> https://tools.oasis-open.org/issues/browse/OFFICE-3712
>     
>     "Instances of XML documents can employ any [xml-names]-admissable 
> declarations of prefixes and default namespaces that accomplish binding of 
> the listed namespaces."
>     
>     What we could do is indeed to make that if-conditions smaller/nicer since 
> that's what this is about or? For that we not need to write our own 
> StreamReader that introduces as side-effect performance and probably memory 
> consumption regressions. Definitively not.
>


> Since KoXmlReader.setDocument() uses around 15% of the total parsing time

Because it does

QXmlStreamReader reader(device);
reader.setNamespaceProcessing(namespaceProcessing);
bool result = doc.setContent(&reader, errorMsg, errorLine, errorColumn);
return result;

What is processing the whole document :)

But now I wonder. Is it faster or is it not?

> Long term it would be very nice if we could start to use stream reading in 
> our loading code.

+1 since it may indeed gives us performance++ and memory++ :)


- Sebastian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/109887/#review30755
-----------------------------------------------------------


On April 6, 2013, 3:56 p.m., Inge Wallin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/109887/
> -----------------------------------------------------------
> 
> (Updated April 6, 2013, 3:56 p.m.)
> 
> 
> Review request for Calligra, Jarosław Staniek and Jos van den Oever.
> 
> 
> Description
> -------
> 
> This patch contains a new XML stream reader based on the QXmlStreamReader 
> that is better suited for ODF.
> 
> Much ODF parsing code in Calligra looks like:
> 
>   if (el.namespaceUri() == KoXml::fo && el.name == "border") { ... }
> 
> The reason for this complicated construction is that the prefix (the "fo" in 
> "fo:border") is not unique but is declared at the beginning of each Xml file. 
> Even though "fo" is the normal prefix there is no guarantee that it is  the 
> same in every document. 
> 
> However, it is a very rare document where it is *not* the normal prefix, so 
> what we want to do is to be able to write code like this:
> 
>   If (el.qualifiedName() == "fo:border") { ... }
> 
> and make the XML stream reader or dom tree rewrite the qualified name in the 
> very rare cases that the prefix does not match what we want.
> 
> This is exactly what the KoXmlStreamReader does. It allows you to write 
> easier and faster code while still be correct in the few cases where the 
> prefixes are not the expected ones. It does this by letting the user enter 
> the expected namespace declarations and then compare those to the actual 
> namespace declarations in the document. If they match everything will be as 
> fast as possible. If they don't, it will be slower but still correct.
> 
> As an extra feature I have allowed the user to declare some extra namespaces 
> (see fixNamespace() in KoXmlReader.cpp). This will let documents created by 
> old versions of OpenOffice.org be read even though they use other namespaces.
> 
> I have code that uses this file but that is not yet ready for review. I 
> wanted to put this up early to get feedback while the rest of the yet 
> unfinished code is maturing.
> 
> 
> Diffs
> -----
> 
>   libs/odf/CMakeLists.txt 3680486 
>   libs/odf/KoXmlStreamReader.h PRE-CREATION 
>   libs/odf/KoXmlStreamReader.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/109887/diff/
> 
> 
> Testing
> -------
> 
> Not much. I will do that when the code that uses this code is ready. This 
> review is for getting feedback on the ideas and implementation details.
> 
> 
> Thanks,
> 
> Inge Wallin
> 
>

_______________________________________________
calligra-devel mailing list
calligra-devel@kde.org
https://mail.kde.org/mailman/listinfo/calligra-devel

Re: Review Request 109887: Create a stream reader better suited for ODF than QXmlStreamReader

Reply via email to