(11/07/15 8:23), SBS wrote:
I have come across a somewhat baffling problem.  I am indexing HTML documents
and one of them is larger than the rest at about 200K.  For some reason when
I search for terms which occur only towards the end of the document (i.e.
after some apparent "cutoff" point in the document), the document itself is
returned as a match but when I call Highlighter#getBestFragments() it
returns an empty array.  This same method returns fragments if the terms
occur in the first part of the document.

So, am I running into some size limitation in either documents or fragments?
What else could be causing this behaviour?

Yes, there is a limit: by default the Highlighter analyzes only the first
50*1024 characters of a document. Try raising it with the following setter:

http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/search/highlight/Highlighter.html#setMaxDocCharsToAnalyze%28int%29
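
To see why the document still matches while no fragments come back, here is a rough toy model in plain Java. It is not Lucene's code: the class CutoffDemo and the methods bestFragments and sampleDocument are made up for illustration, and only the 50*1024 default is taken from above. With the real Highlighter, the corresponding fix would be a call such as highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE) before getBestFragments().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical stand-in for the highlighter's cutoff, not Lucene code.
class CutoffDemo {
    // Default limit quoted in the reply above (50*1024 characters).
    static final int DEFAULT_MAX_CHARS = 50 * 1024;

    // Returns a snippet around every hit that lies entirely within the
    // first maxChars characters; hits past that point are never seen.
    static List<String> bestFragments(String text, String term, int maxChars) {
        List<String> fragments = new ArrayList<>();
        int limit = Math.min(text.length(), maxChars);
        int from = 0;
        while (true) {
            int hit = text.indexOf(term, from);
            if (hit < 0 || hit + term.length() > limit) {
                break; // no more hits inside the analyzed prefix
            }
            int start = Math.max(0, hit - 20);
            int end = Math.min(limit, hit + term.length() + 20);
            fragments.add(text.substring(start, end));
            from = hit + term.length();
        }
        return fragments;
    }

    // Builds a document of roughly 200K characters with one term at the
    // very start ("early") and one at the very end ("late").
    static String sampleDocument() {
        StringBuilder sb = new StringBuilder("early ");
        while (sb.length() < 200 * 1024) {
            sb.append("filler ");
        }
        sb.append("late");
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = sampleDocument();
        // Term near the end, default limit: nothing to highlight.
        System.out.println(bestFragments(doc, "late", DEFAULT_MAX_CHARS).size());
        // Same term with the limit raised: a fragment is found.
        System.out.println(bestFragments(doc, "late", Integer.MAX_VALUE).size());
        // Term near the start is found even with the default limit.
        System.out.println(bestFragments(doc, "early", DEFAULT_MAX_CHARS).size());
    }
}
```

The search index covers the whole document, so the query still matches it; only the highlighting pass stops at the limit, which is the behaviour described in the question.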

koji
--
http://www.rondhuit.com/en/
