We use the PatternTokenizerFactory.  We have the following in our
schema:

<tokenizer class="solr.PatternTokenizerFactory"
           pattern="([^a-zA-Z0-9_])"/>

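To get rid of '_', we just remove it from the character class in the
pattern, so that it is treated as a delimiter like any other
non-alphanumeric character.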
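
For illustration, the tokenizer with '_' taken out of the pattern might
look like the fragment below. This is only a sketch: it assumes the
factory's default group setting, under which the pattern matches the
delimiters between tokens rather than the tokens themselves.

```xml
<!-- Sketch: same tokenizer as above, with "_" dropped from the negated
     character class so underscores now act as token delimiters. -->
<tokenizer class="solr.PatternTokenizerFactory"
           pattern="([^a-zA-Z0-9])"/>
```

Under that assumption, an input such as foo_bar should come out as the
two tokens foo and bar instead of the single token foo_bar.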

-----Original Message-----
From: solr-user-return-32434-laurent.vauthrin=disney....@lucene.apache.org
[mailto:solr-user-return-32434-laurent.vauthrin=disney....@lucene.apache.org]
On Behalf Of Christopher Ball
Sent: Thursday, February 11, 2010 6:35 AM
To: solr-user@lucene.apache.org
Cc: 'Ahmet Arslan'
Subject: RE: The Riddle of the Underscore and the Dollar Sign . . .

Unfortunately, the underscore is being quite resilient =(

I tried the solr.MappingCharFilterFactory and know the mapping is
working, as I am changing "c" => "q" just fine. But the underscore
refuses to go!

I am baffled . . .



-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Thursday, February 11, 2010 3:11 AM
To: solr-user@lucene.apache.org
Subject: Re: The Riddle of the Underscore and the Dollar Sign . . .

> 1) How can I get rid of underscores ('_') without using the
> wordDelimiter Filter (which gets rid of other syntax I need)?

Before the tokenizer, you can apply <charFilter
class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>, which will
replace "_" with " " or "", depending on your needs.

mapping.txt will contain either

"_" => ""

or

"_" => " "
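
As a rough sketch of where the charFilter sits, an analyzer chain could
look something like the fragment below. The field type name and the
choice of whitespace tokenizer are assumptions made purely for
illustration, not taken from anyone's actual schema; the point is only
that the charFilter element comes before the tokenizer element.

```xml
<!-- Sketch only: "text_no_underscore" is a hypothetical field type name,
     and WhitespaceTokenizerFactory is just a placeholder tokenizer. -->
<fieldType name="text_no_underscore" class="solr.TextField">
  <analyzer>
    <!-- Char filters rewrite the raw character stream before tokenizing. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

mapping.txt is expected to sit in the same conf directory as the schema,
and should contain only one of the two rules shown above.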


      

