On 09.03.2004 13:43, Vadim Gritsenko wrote:
Yes, I see your objection - and asked for them already in the bug http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25934 ;)
So what are the practical use cases in which this might occur? Maybe it's only a theoretical problem, depending on the "thing" the index is created from? On which SAX stream does the LuceneIndexHandler operate?
I remember there were already issues in other components with text being split across multiple character events. So, think of this as preventive maintenance.
Yes, for example this bug: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26219. Two character events following each other come out of an XSLT process that uses <xsl:value-of> twice in a row. And the AbstractDOMTransformer, or more probably one of the components it uses, drops the second and all following text events.
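To illustrate why split character events matter: SAX allows contiguous text to arrive in several characters() calls, so a consumer that wants whole text runs has to buffer them itself. The following is only a minimal sketch of that buffering, assuming a plain org.xml.sax DefaultHandler; the class name CoalescingTextHandler and its getTextRuns() method are made up for this example and are not part of Cocoon.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not a Cocoon class): merges adjacent characters()
// events into one text run, flushed at the next element boundary.
class CoalescingTextHandler extends DefaultHandler {
    private final StringBuilder pending = new StringBuilder();
    private final List<String> textRuns = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only buffer here; the text may continue in the next event.
        pending.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Emit the buffered text as one run, if there is any.
    private void flush() {
        if (pending.length() > 0) {
            textRuns.add(pending.toString());
            pending.setLength(0);
        }
    }

    List<String> getTextRuns() {
        return textRuns;
    }
}
```

With this, two consecutive events 'key' and 'word' inside one element end up as the single run "keyword" instead of the second event being lost or treated as a separate word.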
I also don't get what you mean by "had_start_or_end_element_in_between_char_events". But I had a look at endElement(). It gets the elements from a stack and already tests for text:
if (text != null && text.length() > 0) {
Would it make sense to add the space in endElement, if the element contains text, i.e. the above is true?
This was my first thought... But then, multiple closing tags will cause multiple spaces...
OK, if that is a problem.
So, I thought, this should work:
startElement: flag = true;
endElement: flag = true;
characters: if (flag) x.append(' '); flag = false;
Does it solve the problem?
Unfortunately not:
startElement event
character event 'key'
character event 'word'
character event 'test'
endElement event
So you would have 'key wordtest'.
What about
characters: flag = true;
endElement: if (flag) x.append(' '); flag = false;
This is similar to the endElement text check mentioned above, but would prevent multiple spaces in the output, wouldn't it?
startElement a
character 'key'
startElement b
character 'word'
Will become "keyword" instead of "key word". No, this won't work either :-) Addition of
startElement:
if (flag)
x.append(' ');
flag = false;
Should fix it, shouldn't it?
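A rough Java sketch of this flag scheme, just to make the event sequences easy to try out. It assumes a plain org.xml.sax DefaultHandler; the class SpacingTextCollector, its checkOnStart switch and getText() are invented for illustration and are not the actual Cocoon handler code. Only x and flag are taken from the pseudocode above.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the flag scheme discussed above; not Cocoon code.
class SpacingTextCollector extends DefaultHandler {
    private final StringBuilder x = new StringBuilder();
    // true: also check the flag in startElement (the proposed addition)
    private final boolean checkOnStart;
    // true while text has been appended since the last separator
    private boolean flag = false;

    SpacingTextCollector(boolean checkOnStart) {
        this.checkOnStart = checkOnStart;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Adjacent character events are concatenated without a space.
        x.append(ch, start, length);
        flag = true;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (checkOnStart) {
            separate();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        separate();
    }

    // Append at most one space after a run of text.
    private void separate() {
        if (flag) {
            x.append(' ');
        }
        flag = false;
    }

    String getText() {
        return x.toString();
    }
}
```

For startElement a, 'key', startElement b, 'word', the variant without the startElement check gives "keyword" and the variant with it gives "key word". Text split across several character events inside one element stays joined, and several closing tags in a row add only one space. A trailing space remains after the last text run, so the caller may want to trim the result.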
Vadim
