Re: SOLR 4.0 + ReversedWildcardFilterFactory + DefaultSolrHighlighter + multibyte chars => crash?

Ahmet Arslan Mon, 29 Oct 2012 08:55:58 -0700

Hi Tomas,

I think this is the same case Marian reported before.


https://issues.apache.org/jira/browse/SOLR-3193
https://issues.apache.org/jira/browse/SOLR-3901
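
For context on the message in the trace quoted below: as far as I know, that exception is raised when a token's start or end offset lies beyond the end of the text handed to the highlighter. Here is a minimal sketch of that kind of bounds check in plain Java, with no Lucene dependency. The class name, the Token type and the sample string are made up for illustration; this is not the actual Lucene code, and it says nothing about why the offsets end up wrong in the first place.

```java
public class OffsetCheckSketch {

    /** Hypothetical stand-in for a token carrying character offsets into the source text. */
    public static final class Token {
        public final String term;
        public final int startOffset;
        public final int endOffset;

        public Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Returns true when both offsets lie within the text (measured in Java chars). */
    public static boolean offsetsFit(String text, Token token) {
        return token.startOffset <= text.length() && token.endOffset <= text.length();
    }

    /** Throws if the token points past the end of the text, mirroring the message in the trace. */
    public static void check(String text, Token token) {
        if (!offsetsFit(text, token)) {
            throw new IllegalStateException("Token " + token.term
                    + " exceeds length of provided text sized " + text.length());
        }
    }

    public static void main(String[] args) {
        // "Grüße aus Berlin", written with escapes so the source file encoding does not matter.
        String text = "Gr\u00fc\u00dfe aus Berlin";

        // Offsets that match the text pass the check.
        check(text, new Token("berlin", 10, 16));

        // Offsets shifted by one run past the end of the text and are rejected.
        try {
            check(text, new Token("berlin", 11, 17));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So the question to chase is which analysis step produces offsets that no longer match the stored text.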


--- On Mon, 10/29/12, Tomas Zerolo <tomas.zer...@axelspringer.de> wrote:

> From: Tomas Zerolo <tomas.zer...@axelspringer.de>
> Subject: SOLR 4.0 + ReversedWildcardFilterFactory + DefaultSolrHighlighter + 
> multibyte chars => crash?
> To: solr-user@lucene.apache.org
> Date: Monday, October 29, 2012, 5:23 PM
> Hi, SOLR gurus
> 
> We're experiencing a crash with SOLR 4.0 whenever the results contain
> multibyte characters (more precisely: German umlauts, utf-8 encoded).
> 
> The crashes only occur when using ReversedWildcardFilterFactory (which
> is necessary in 4.0 to be able to have wildcards at the beginning of
> the search pattern, as far as I understand), *and* the highlighter is
> on. The stack trace (heavily snipped) looks like this:
> 
>  | 12.09.2012 13:08:12 org.apache.solr.common.SolrException log
>  | SCHWERWIEGEND: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
>  |         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:517)
>  |         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
>  |         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
>  |         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
>  | [...]
>  |         at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>  |         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>  |         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>  |         at java.lang.Thread.run(Thread.java:662)
>  | Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
>  |         at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
>  |         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
>  |         ... 32 more
> 
> (excuse the German locale.) 
> 
> Poking around in the sources seems to point (to my untrained eye,
> that is) to:
> 
>   <https://issues.apache.org/jira/browse/LUCENE-3080>
> 
> Is this the issue biting us? Any known workarounds? Anything we might
> try to pinpoint the problem or to fix the bug?
> 
> Thanks for any insights, regards
> -- 
> Tomás Zerolo
> Axel Springer AG
> Axel Springer media Systems
> BILD Produktionssysteme
> Axel-Springer-Straße 65
> 10888 Berlin
> Tel.: +49 (30) 2591-72875
> tomas.zer...@axelspringer.de
> www.axelspringer.de
> 
> Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg,
> HRB 4998
> Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita
> Vorstand: Dr. Mathias Döpfner (Vorsitzender)
> Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele
>
