: I see now that getBestTextFragments() takes in a token stream - and
: each token in this stream already has start/end positions set. So, the
: patch at LUCENE-1500 would mitigate the exception, but it looks like
: the real bug is in Solr.
So what does the analysis screen tell you about each token produced from that input text, given your configuration? In verbose mode it will show the start/end offsets for every token, so it should be fairly easy to identify where the bug is.

-Hoss
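For what it's worth, the invariant being debugged here - that each token's start/end offsets must point back into the original text, which is what the highlighter relies on and what the analysis screen displays - can be illustrated with a toy whitespace tokenizer. This is plain Java, not Lucene's actual TokenStream/OffsetAttribute API, just a sketch of the same offset contract:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {
    // A token plus the character offsets it was cut from, analogous to
    // what Lucene exposes via OffsetAttribute.
    record Token(String term, int start, int end) {}

    // Toy whitespace tokenizer that records offsets as it goes.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "quick brown fox";
        for (Token t : tokenize(text)) {
            // The highlighter's assumption: substring(start, end) of the
            // original text yields exactly the token's term. If an analysis
            // component emits offsets that break this, highlighting throws.
            System.out.println(t.term() + " [" + t.start() + "," + t.end() + ")");
        }
    }
}
```

If a token filter in the chain rewrote terms without adjusting offsets, the `substring(start, end)` round trip above would no longer match - which is exactly the kind of mismatch the verbose analysis screen makes visible per token.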