We are using Solr trunk (1.4), currently "nightly exported - yonik - 2009-02-05 08:06:00".
-Peter

On Mon, Feb 23, 2009 at 8:07 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
> Jacob,
>
> What Solr version are you using? There is a bug in SolrHighlighter in Solr
> 1.3; you may want to look at:
>
> https://issues.apache.org/jira/browse/SOLR-925
> https://issues.apache.org/jira/browse/LUCENE-1500
>
> regards,
>
> Koji
>
>
> Jacob Singh wrote:
>>
>> Hi,
>>
>> We ran into a weird one today. We have a document which is written in
>> German, and every time we make a query which matches it, we get the
>> following:
>>
>> java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
>>     at java.lang.String.substring(String.java:1935)
>>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
>>
>> From source diving, it looks like Lucene's highlighter is calling
>> substring with an offset that is outside the bounds of the body field
>> it is highlighting. Running an fq against the ID of the document
>> returns it fine (because no highlighting is done). I took the body and
>> tried cutting the first 2822 chars: index 2822 is near the end of the
>> body, but still in range.
>>
>> Here is the related code:
>>
>>     startOffset = tokenGroup.matchStartOffset;
>>     endOffset = tokenGroup.matchEndOffset;
>>     tokenText = text.substring(startOffset, endOffset);
>>
>> This leads me to believe there is some problem with multibyte string
>> encoding and Lucene's offset counting.
>>
>> Any ideas here? Tomcat is configured with UTF-8, btw.
>>
>> Best,
>> Jacob

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia, Inc.
peter.wola...@acquia.com
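A minimal Java sketch of the encoding hypothesis raised in the report: `String.substring` and `String.length()` count UTF-16 code units, not UTF-8 bytes, so for German text with umlauts or ß the byte length is larger than the char length, and any offset computed in bytes but applied as a char index can run past the end of the string. The sample text, the class name `OffsetUnitsDemo`, and the helper `safeSubstring` are hypothetical illustrations, not Lucene or Solr APIs, and this does not claim to be the actual cause or the fix in SOLR-925 / LUCENE-1500.

```java
import java.nio.charset.StandardCharsets;

class OffsetUnitsDemo {

    // Hypothetical helper (not part of Lucene): check highlighter-style
    // token offsets against the text before slicing, instead of letting
    // String.substring throw StringIndexOutOfBoundsException.
    static String safeSubstring(String text, int startOffset, int endOffset) {
        if (startOffset < 0 || endOffset > text.length() || startOffset > endOffset) {
            throw new IllegalArgumentException(
                "Token offsets [" + startOffset + ", " + endOffset
                + ") exceed text length " + text.length());
        }
        return text.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        // Hypothetical German sample, written with Unicode escapes so the
        // source compiles identically under any platform encoding:
        // "Grüße aus München, schöne Straße"
        String text = "Gr\u00fc\u00dfe aus M\u00fcnchen, sch\u00f6ne Stra\u00dfe";

        int charLength = text.length();                                  // UTF-16 code units
        int utf8Length = text.getBytes(StandardCharsets.UTF_8).length;  // UTF-8 bytes

        // prints "chars: 32, UTF-8 bytes: 37"
        System.out.println("chars: " + charLength + ", UTF-8 bytes: " + utf8Length);

        // An end offset that is valid when counting bytes is out of range
        // when used as a char index.
        try {
            safeSubstring(text, 0, utf8Length);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Offsets counted in chars slice correctly ("Grüße").
        System.out.println(safeSubstring(text, 0, 5));
    }
}
```

If the analyzer's token offsets and the stored field text ever disagree on units or content, a check like this turns a crash deep inside highlighting into an explicit, diagnosable error.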