(11/09/23 8:57), O. Klein wrote:
The content_text field is filled with text from PDFs, so this is not the
problem. Besides, the regex fragmenter gives back multiple snippets as
expected.

This doesn't show that BoundaryScanner has a bug. The Highlighter's fragmenter
and the FVH (FastVectorHighlighter) FragmentsBuilder are totally different.

Have you tested whether a BoundaryScanner of type LINE gives back multiple
snippets with your content?

No, I haven't. Do you mean the LINE type causes the problem? Can you get two
snippets if you use a WORD type BreakIteratorBoundaryScanner?

If you think the LINE BreakIterator doesn't work as you expect, you can
implement your own BoundaryScanner instead.
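
Before writing a custom scanner, it may help to look at the boundaries the
underlying iterators report for a sample of your text. The following is only
a minimal sketch using plain java.text.BreakIterator, on the assumption that
the WORD and LINE types are backed by getWordInstance() and getLineInstance();
the class name BoundaryDemo and the sample string are made up for
illustration. Note that a line BreakIterator reports positions where text
could be wrapped, which is not the same thing as newline characters.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Solr or Lucene: lists the boundary
// offsets that a java.text.BreakIterator reports for a piece of text.
class BoundaryDemo {

    // Collects every boundary offset, from the start of the text to its end.
    static List<Integer> boundaries(BreakIterator bi, String text) {
        bi.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = bi.first(); pos != BreakIterator.DONE; pos = bi.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Sample content only; replace it with text taken from your own field.
        String text = "first line of text\nsecond line of text";

        // Assumption: these two factory methods correspond to the WORD and
        // LINE scanner types discussed above.
        System.out.println("WORD: "
                + boundaries(BreakIterator.getWordInstance(Locale.ROOT), text));
        System.out.println("LINE: "
                + boundaries(BreakIterator.getLineInstance(Locale.ROOT), text));
    }
}
```

If the offsets printed for LINE are not where you expect snippets to start
and end, that would point at the choice of iterator rather than at
BoundaryScanner itself.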

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
