We use the PatternTokenizerFactory. We have the following in our schema:

<tokenizer class="solr.PatternTokenizerFactory" pattern="([^a-zA-Z0-9_])"/>
And to get rid of '_' we just remove it from the pattern.

-----Original Message-----
From: solr-user-return-32434-laurent.vauthrin=disney....@lucene.apache.org [mailto:solr-user-return-32434-laurent.vauthrin=disney....@lucene.apache.org] On Behalf Of Christopher Ball
Sent: Thursday, February 11, 2010 6:35 AM
To: solr-user@lucene.apache.org
Cc: 'Ahmet Arslan'
Subject: RE: The Riddle of the Underscore and the Dollar Sign . . .

Unfortunately, the underscore is being quite resilient =(

I tried the solr.MappingCharFilterFactory and know the mapping is working, as I am changing "c" => "q" just fine. But the underscore refuses to go!

I am baffled . . .

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, February 11, 2010 3:11 AM
To: solr-user@lucene.apache.org
Subject: Re: The Riddle of the Underscore and the Dollar Sign . . .

> 1) How can I get rid of underscores('_') without using the
> wordDelimiter
> Filter (which gets rid of other syntax I need)?

Before the TokenizerFactory you can apply

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>

which will replace "_" with " " or "", depending on your needs.

mapping.txt will contain:

"_" => ""

or

"_" => " "
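
For anyone following the thread, the effect of dropping '_' from the tokenizer pattern can be approximated outside Solr with plain java.util.regex. This is only a rough sketch, not Solr's own code: it assumes the tokenizer splits the input on every match of the pattern (no group attribute is set in the schema line at the top of this message) and that empty pieces are discarded. The class name, method name, and sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
class UnderscoreSplitSketch {

    // Split the input on every match of the delimiter regex and drop empty
    // pieces, which approximates a split-style pattern tokenizer.
    static List<String> tokenize(String input, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(delimiterRegex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "foo_bar $baz"; // made-up sample input

        // '_' is listed inside the negated class, so it is NOT a delimiter
        // and stays inside the token.
        System.out.println(tokenize(text, "([^a-zA-Z0-9_])"));

        // '_' removed from the class, so it becomes a delimiter like any
        // other non-alphanumeric character.
        System.out.println(tokenize(text, "([^a-zA-Z0-9])"));
    }
}
```

Under those assumptions the first call keeps "foo_bar" as one token and the second splits it into "foo" and "bar". The Analysis page in the Solr admin UI is the place to confirm what the real analyzer chain does with a given field type.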