On 09.03.2004 13:43, Vadim Gritsenko wrote:

Yes, I see your objection - and asked for them already in the bug http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25934 ;)

So what are the practical use cases where this might occur? Maybe it's only a theoretical problem, depending on the "thing" the index is created from? On which SAX stream does the LuceneIndexHandler operate?

I remember there were already issues in other components with text being split up across multiple character events. So, think of this as preventive maintenance.

Yes, for example this bug: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26219. Two character events following each other come out of an XSLT process that uses <xsl:value-of> twice in a row. And the AbstractDOMTransformer, or more probably one of the components it uses, drops the second and following text events.
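
To illustrate why adjacent character events have to be accumulated rather than overwritten or dropped, here is a minimal Java sketch. It is not Cocoon code: `CoalescingTextHandler` and `getTexts()` are names made up for this example, and only `DefaultHandler` and its callbacks come from the standard SAX API.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Cocoon): collects text runs even when
// they arrive split over several characters() events.
class CoalescingTextHandler extends DefaultHandler {
    private final StringBuilder current = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush(); // an element boundary ends the current text run
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of replacing, so a second event cannot
        // overwrite or lose what the first one delivered.
        current.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Store the accumulated run, if any, and reset the buffer.
    private void flush() {
        if (current.length() > 0) {
            texts.add(current.toString());
            current.setLength(0);
        }
    }

    List<String> getTexts() {
        return texts;
    }
}
```

Feeding it the events 'key' and 'word' inside one element should leave a single entry, "keyword", in the collected list.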


I also don't get the implications of your "had_start_or_end_element_in_between_char_events". But I had a look at endElement(). It gets the elements from a stack and already tests for text:
if (text != null && text.length() > 0) {
Would it make sense to add the space in endElement, if the element contains text, i.e. the above is true?

This was my first thought... But then, multiple closing tags will cause multiple spaces...

OK, if that is a problem.


So, I thought, this should work:

startElement:
   flag = true;

endElement:
   flag = true;

characters:
   if (flag) {
       x.append(' ');
       flag = false;
   }

Does it solve the problem?

Unfortunately not:


startElement event
character event 'key'
character event 'word'
character event 'test'
endElement event

So you would have 'key wordtest'.

What about

characters:
    flag = true;

endElement:
    if (flag) {
        x.append(' ');
        flag = false;
    }

This is similar to the above-mentioned endElement text check, but would prevent multiple spaces in the output, wouldn't it?
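
Spelled out as a rough Java sketch, under the assumption that the handler simply appends all text to one buffer: `x` and `flag` are the names from the pseudocode above, while `SpaceOnEndElementHandler` and `text()` are hypothetical and not taken from the actual index handler.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the indexing handler; only the spacing logic is shown.
class SpaceOnEndElementHandler extends DefaultHandler {
    private final StringBuilder x = new StringBuilder();
    private boolean flag = false; // true once text was seen since the last separator

    @Override
    public void characters(char[] ch, int start, int length) {
        if (length > 0) {
            x.append(ch, start, length); // split events are joined without a space
            flag = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only the first closing tag after some text emits a separator,
        // so several closing tags in a row do not produce several spaces.
        if (flag) {
            x.append(' ');
            flag = false;
        }
    }

    String text() {
        return x.toString();
    }
}
```

With the events 'key', 'word', 'test' followed by two endElement events in a row, this should give 'keywordtest ' with a single trailing space.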

Joerg
