[ 
https://issues.apache.org/jira/browse/XERCESJ-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698715#comment-16698715
 ] 

Cristian Talau commented on XERCESJ-1704:
-----------------------------------------

h1. Improvement 1

If XML validation is enabled, almost half of the time is spent in:
{code:java}
org.apache.xerces.impl.dtd.XMLDTDProcessor.attributeDecl(String, String, 
String, String[], String, XMLString, XMLString, Augmentations){code}
specifically in the following call, whose cost is linear in the number of 
attributes. Because it is called once for each attribute, the total cost is quadratic:
{code:java}
grammar.getAttributeDeclIndex(elementIndex, attributeName){code}
An improvement would be to compute {{duplicateAttributeDef}} lazily, for 
example only if {{fWarnDuplicateAttdef}} is enabled.
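The effect of skipping the lookup can be illustrated with a minimal sketch. It is not Xerces code: the class {{LazyDuplicateCheckSketch}}, its fields, and the comparison counter are hypothetical, the boolean only plays the role of {{fWarnDuplicateAttdef}}, and any other place where the duplicate flag might be consulted is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one element's attribute declarations; not Xerces code. */
final class LazyDuplicateCheckSketch {
    private final List<String> declaredNames = new ArrayList<>();
    private final boolean warnDuplicateAttdef; // plays the role of fWarnDuplicateAttdef
    long comparisons = 0;                      // counts name comparisons, for illustration only
    int warnings = 0;                          // counts duplicate warnings that would be reported

    LazyDuplicateCheckSketch(boolean warnDuplicateAttdef) {
        this.warnDuplicateAttdef = warnDuplicateAttdef;
    }

    /** Linear scan, comparable in cost to a lookup over a linked list of declarations. */
    private int indexOf(String name) {
        for (int i = 0; i < declaredNames.size(); i++) {
            comparisons++;
            if (declaredNames.get(i).equals(name)) {
                return i;
            }
        }
        return -1;
    }

    void attributeDecl(String name) {
        // Short-circuit: the linear lookup runs only when its result is needed.
        if (warnDuplicateAttdef && indexOf(name) != -1) {
            warnings++;
        }
        // Simplification: the name is recorded unconditionally.
        declaredNames.add(name);
    }
}
```

With 1000 distinct names, an instance created with the flag enabled performs 0 + 1 + ... + 999 = 499500 comparisons, while an instance created with the flag disabled performs none.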
h1. Improvement 2

In
{code:java}
org.apache.xerces.impl.dtd.DTDGrammar.setAttributeDecl(int, int, 
XMLAttributeDecl){code}
the code looks for the freshly allocated {{attributeDeclIndex}} in the 
linked list of attribute declarations. It will never be there, so we can avoid 
this linear search.
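A rough sketch of the idea, assuming the declarations form an index-based linked list with first and last pointers. The class {{AppendWithoutSearchSketch}} and all of its members are hypothetical and use a fixed capacity for brevity; this is not the actual {{DTDGrammar}} code.

```java
import java.util.Arrays;

/** Hypothetical index-based linked list of attribute declarations; not Xerces code. */
final class AppendWithoutSearchSketch {
    private final int[] next;  // next[i] = index of the declaration after i, or -1
    private int first = -1;    // head of the list, -1 while empty
    private int last = -1;     // tail of the list, -1 while empty
    private int allocated = 0; // number of indices handed out so far

    AppendWithoutSearchSketch(int capacity) {
        next = new int[capacity];
        Arrays.fill(next, -1);
    }

    /** Allocates a fresh index and links it at the tail in constant time. */
    int appendFresh() {
        int index = allocated++; // fresh, so it cannot already be in the list
        if (first == -1) {
            first = index;       // first declaration for this element
        } else {
            next[last] = index;  // link after the current tail, no traversal needed
        }
        last = index;
        return index;
    }

    /** Walks the list; used here only to show that the links are intact. */
    int length() {
        int n = 0;
        for (int i = first; i != -1; i = next[i]) {
            n++;
        }
        return n;
    }
}
```

Because the index is newly allocated, linking it at the tail needs no membership check, so each append is constant time instead of linear.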
h1. Improvement 3

There is another hotspot:
{code:java}
org.apache.xerces.impl.dtd.DTDGrammar.attributeDecl(String, String, String, 
String[], String, XMLString, XMLString, Augmentations)  > 
org.apache.xerces.impl.dtd.DTDGrammar.getAttributeDeclIndex(int, String){code}
Here, inlining some methods should also improve performance. Using a 
HashSet would additionally speed up the case in which the attribute 
was not previously declared.
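A minimal sketch of the HashSet idea, assuming one set of attribute names is kept per element. The class {{DeclaredAttributeNamesSketch}} and its method are hypothetical; a real change would also have to key on the element and keep the existing declaration indices.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-element set of declared attribute names; not Xerces code. */
final class DeclaredAttributeNamesSketch {
    private final Set<String> names = new HashSet<>();

    /**
     * Records the name and reports whether it was new.
     * Set.add returns false when the element is already present,
     * so the "not previously declared" case needs no list traversal.
     */
    boolean declare(String attributeName) {
        return names.add(attributeName);
    }
}
```

The first {{declare("id")}} call returns true and a second call with the same name returns false, in expected constant time for both.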

> XML Parsing slow when many attributes are declared for each element in DTD
> --------------------------------------------------------------------------
>
>                 Key: XERCESJ-1704
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1704
>             Project: Xerces2-J
>          Issue Type: Improvement
>          Components: DTD
>    Affects Versions: 2.12.0
>            Reporter: Cristian Talau
>            Priority: Minor
>         Attachments: file.xml, perf.dtd
>
>
> If ~1000 attributes are declared in the DTD for each XML element, 
> parsing an XML document is very slow.



