(11/09/23 8:57), O. Klein wrote:
> The content_text field is filled with text from PDFs, so this is not the problem. Besides, the regex fragmenter returns multiple snippets as expected.
This doesn't show that BoundaryScanner has the bug. The Highlighter's fragmenter and the FVH FragmentsBuilder are entirely different.
> Have you tested whether a BoundaryScanner of type LINE returns multiple snippets with your content?
No, I haven't. Do you mean the LINE type causes the problem? Can you get two snippets if you use the WORD type with BreakIteratorBoundaryScanner? If you think the LINE BreakIterator doesn't work as you expect, you can implement your own BoundaryScanner instead.

koji
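To compare the WORD and LINE types on your own text before writing a custom BoundaryScanner, you could print the boundaries each java.text.BreakIterator instance reports. The sketch below is a standalone illustration that uses only the JDK. It is not Lucene's BreakIteratorBoundaryScanner, and the `snapStart` helper is hypothetical: it only approximates what a boundary scanner does when it moves a fragment's start offset to a boundary.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BoundaryDemo {
    // Collects every boundary offset that the given BreakIterator reports for text.
    static List<Integer> boundaries(BreakIterator bi, String text) {
        bi.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            result.add(b);
        }
        return result;
    }

    // Hypothetical helper (not Lucene's API): moves a fragment start offset back
    // to the nearest boundary at or before it. Assumes 0 <= offset <= text.length().
    static int snapStart(BreakIterator bi, String text, int offset) {
        bi.setText(text);
        if (bi.isBoundary(offset)) {
            return offset;
        }
        // offset is not a boundary, so it is > 0 and a preceding boundary exists
        return bi.preceding(offset);
    }

    public static void main(String[] args) {
        String text = "hello world"; // replace with a sample of the PDF-extracted text
        BreakIterator word = BreakIterator.getWordInstance(Locale.US);
        BreakIterator line = BreakIterator.getLineInstance(Locale.US);
        System.out.println("WORD boundaries: " + boundaries(word, text));
        System.out.println("LINE boundaries: " + boundaries(line, text));
        System.out.println("WORD start for offset 5: " + snapStart(word, text, 5));
        System.out.println("LINE start for offset 5: " + snapStart(line, text, 5));
    }
}
```

On a standard JDK, the WORD iterator reports a boundary on each side of the space, while the LINE iterator reports only the break opportunity after it. Running this against the real field content should show whether LINE boundaries are too sparse for your fragment size.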