Hi Tomas, I think this is the same case Marian reported before.
https://issues.apache.org/jira/browse/SOLR-3193
https://issues.apache.org/jira/browse/SOLR-3901

--- On Mon, 10/29/12, Tomas Zerolo <tomas.zer...@axelspringer.de> wrote:

> From: Tomas Zerolo <tomas.zer...@axelspringer.de>
> Subject: SOLR 4.0 + ReversedWildcardFilterFactory + DefaultSolrHighlighter + multibyte chars => crash?
> To: solr-user@lucene.apache.org
> Date: Monday, October 29, 2012, 5:23 PM
>
> Hi, SOLR gurus
>
> We're experiencing a crash with SOLR 4.0 whenever the results contain
> multibyte characters (more precisely: German umlauts, UTF-8 encoded).
>
> The crashes only occur when using ReversedWildcardFilterFactory (which
> is necessary in 4.0 to be able to have wildcards at the beginning of
> the search pattern, as far as I understand), *and* the highlighter is
> on. The stack trace (heavily snipped) looks like this:
>
> | 12.09.2012 13:08:12 org.apache.solr.common.SolrException log
> | SCHWERWIEGEND: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
> |   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:517)
> |   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
> |   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
> |   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
> |   [...]
> |   at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> |   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> |   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> |   at java.lang.Thread.run(Thread.java:662)
> | Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
> |   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
> |   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
> |   ... 32 more
>
> (Excuse the German locale: SCHWERWIEGEND means SEVERE.)
>
> Poking around in the sources seems to point (to my untrained eye, that
> is) to:
>
> <https://issues.apache.org/jira/browse/LUCENE-3080>
>
> Is this the issue biting us? Are there any known workarounds? Is there
> anything we might try to pinpoint the problem or to fix the bug?
>
> Thanks for any insights, regards
> --
> Tomás Zerolo
> Axel Springer AG
> Axel Springer media Systems
> BILD Produktionssysteme
> Axel-Springer-Straße 65
> 10888 Berlin
> Tel.: +49 (30) 2591-72875
> tomas.zer...@axelspringer.de
> www.axelspringer.de
>
> Axel Springer AG, registered office Berlin, Charlottenburg District Court, HRB 4998
> Chairman of the Supervisory Board: Dr. Giuseppe Vita
> Executive Board: Dr. Mathias Döpfner (Chairman),
> Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele
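For pinpointing: the "Caused by" frame is Lucene's `Highlighter.getBestTextFragments`, and its message appears to come from a check that rejects any token whose start or end offset lies past `text.length()` of the stored field value. The stdlib-only Java sketch here models that check. The names `OffsetCheck`, `Tok` and `offendingTokens` are hypothetical, and the idea that some component in the chain counts UTF-8 bytes instead of Java chars is only an assumption used to show how umlauts could push an offset past the end. It is not a confirmed diagnosis of SOLR-3193 or SOLR-3901.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetCheck {

    /** Hypothetical stand-in for a Lucene token: term text plus start/end offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Models the condition behind "Token X exceeds length of provided text sized N":
     * a token is rejected if either offset lies past text.length(), which counts
     * UTF-16 code units (Java chars), not UTF-8 bytes.
     */
    static List<String> offendingTokens(String text, List<Tok> tokens) {
        List<String> bad = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.end > text.length() || t.start > text.length()) {
                bad.add("Token " + t.term + " exceeds length of provided text sized " + text.length());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String text = "Gr\u00f6\u00dfe"; // "Größe"
        int chars = text.length();                                  // 5 chars
        int bytes = text.getBytes(StandardCharsets.UTF_8).length;   // 7 bytes: ö and ß take 2 each

        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("gr\u00f6\u00dfe", 0, chars)); // char-based offsets: accepted
        tokens.add(new Tok("gr\u00f6\u00dfe", 0, bytes)); // ASSUMED byte-based end offset: rejected

        System.out.println(chars + " chars, " + bytes + " bytes");
        for (String msg : offendingTokens(text, tokens)) {
            System.out.println(msg);
        }
    }
}
```

With char-based offsets the token passes. With the hypothetical byte-based end offset (7 for the 5-char string) it is rejected with the same message as in the log. On a real index, the equivalent check would be to run the field's analyzer over the stored text and compare each token's `OffsetAttribute` against the text length.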