Hi,

We ran into a weird one today. We have a document written in German, and every time we make a query that matches it, we get the following:

java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
    at java.lang.String.substring(String.java:1935)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)

From source diving, it looks like Lucene's highlighter is calling substring with an offset that is outside the bounds of the body field it is highlighting. Running an fq against the ID of the document returns it fine (because no highlighting is done). I also took the body and tried to cut the first 2822 chars, and while that is near the end of the body, it is still in range.

Here is the related code:

    startOffset = tokenGroup.matchStartOffset;
    endOffset = tokenGroup.matchEndOffset;
    tokenText = text.substring(startOffset, endOffset);

This leads me to believe there is some problem with multibyte string encoding and how Lucene counts offsets. Any ideas here? Tomcat is configured with UTF-8, btw.

Best,
Jacob

--
+1 510 277-0891 (o)
+91 9999 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
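
P.S. To make the multibyte suspicion concrete, here is a minimal sketch in plain Java. It is not Lucene code, and the class `OffsetCheck` and its methods are hypothetical names made up for this illustration. It only shows that `String.substring` indexes by UTF-16 chars, while the same German text is longer when counted in UTF-8 bytes, so an offset computed in bytes could overshoot the string. Whether that is what actually happens inside the highlighter is an assumption I have not verified.

```java
import java.nio.charset.StandardCharsets;

public class OffsetCheck {
    // Length in UTF-16 code units, which is what String.substring indexes by.
    public static int charLength(String text) {
        return text.length();
    }

    // Length in bytes when the same text is encoded as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical guard: returns the fragment, or null when the offsets
    // do not fit inside the text (where substring would otherwise throw).
    public static String safeSubstring(String text, int startOffset, int endOffset) {
        if (startOffset < 0 || startOffset > endOffset || endOffset > text.length()) {
            return null;
        }
        return text.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        // "Grüße aus München", written with escapes so the source file encoding does not matter.
        String text = "Gr\u00fc\u00dfe aus M\u00fcnchen";
        System.out.println("chars:      " + charLength(text));
        System.out.println("utf8 bytes: " + utf8Length(text));
        // An end offset taken from the byte count is past the end of the string.
        System.out.println("fragment:   " + safeSubstring(text, 10, utf8Length(text)));
    }
}
```

Comparing `matchEndOffset` against `text.length()` in the same way, just before the `substring` call, would at least show which token carries the bad offset.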