Hello all

I would like to handle German umlauts (Umlaute) by replacing each accented
character with its two-letter substitute (e.g. ä => ae). For this I use the
char filter solr.MappingCharFilterFactory, configured with a mapping file
containing entries like "ä" => "ae". I also want to use
solr.DictionaryCompoundWordTokenFilterFactory to find words that are parts of
compound words (e.g. revision in totalrevision). Finally, I want to use Solr
highlighting. However, there seems to be a problem when I combine the char
filter and the compound word filter with highlighting: an
org.apache.lucene.search.highlight.InvalidTokenOffsetsException is raised.

Here are the details:

types:
--------
    <fieldType name="textAnalyzedFailed" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="words.txt"/>
      </analyzer>
    </fieldType>

schema:
-----------
  <fields>
     <field name="id"         type="string"               indexed="true" stored="true" required="true" />
     <field name="title"      type="textAnalyzedFailed"   indexed="true" stored="true"/>
  </fields>

document:
--------------
  <doc>
     <field name="id">1</field> 
     <field name="title">banküberfall</field> 
  </doc>

mapping.txt:
-----------------
"ü" => "ue"

words.txt:
--------------
fall

The resulting error when searching with:

http://localhost:8080/solr/select/?q=banküberfall&hl=true&hl.fl=title

Nov 4, 2011 4:29:12 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select/ params={q=bank?berfall&hl.fl=title_hl&hl=true} hits=1 status=0 QTime=13
Nov 4, 2011 4:29:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token fall exceeds length of provided text sized 12
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:469)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:851)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:278)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token fall exceeds length of provided text sized 12
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:228)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:462)
        ... 23 more
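To see where the numbers in the exception could come from, the following Java sketch models the offset arithmetic. It is not Lucene code: applyMapping and naiveSubwordOffsets are hypothetical helpers, and the sketch assumes (this is not confirmed above) that the compound filter computes a sub-word's offsets from its position inside the already-mapped term text, without correcting them back to the original text. Under that assumption "fall" gets offsets [9, 13) in a stored value that is only 12 characters long, which is consistent with "Token fall exceeds length of provided text sized 12".

```java
public class OffsetMismatchSketch {
    // Simplified stand-in for MappingCharFilterFactory with mapping.txt: "ü" => "ue".
    static String applyMapping(String text) {
        return text.replace("\u00fc", "ue");
    }

    // ASSUMPTION: sub-word offsets = token start offset + index of the sub-word
    // inside the *mapped* term text, with no char-filter offset correction.
    static int[] naiveSubwordOffsets(String original, String subword) {
        String mapped = applyMapping(original);
        int tokenStart = 0; // the whole field value is one whitespace token
        int start = tokenStart + mapped.indexOf(subword);
        return new int[] {start, start + subword.length()};
    }

    public static void main(String[] args) {
        String original = "bank\u00fcberfall"; // "banküberfall", 12 chars
        int[] offsets = naiveSubwordOffsets(original, "fall");
        int realStart = original.indexOf("fall");

        System.out.println("mapped text      = " + applyMapping(original));      // bankueberfall
        System.out.println("original length  = " + original.length());           // 12
        System.out.println("fall offsets     = [" + offsets[0] + ", " + offsets[1] + ")"); // [9, 13)
        System.out.println("fall in original = [" + realStart + ", " + (realStart + 4) + ")"); // [8, 12)
        System.out.println("end > length?    = " + (offsets[1] > original.length())); // true
    }
}
```

The one-character shift comes from "ü" (one char) becoming "ue" (two chars) before tokenization; any sub-word located after an umlaut would be shifted the same way.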

Thanks a lot for any suggestions and best regards,
Edwin
