I have done this with a custom TokenFilter that (among other things)
detects hyphenated words and expands each one into three variations. It
matches the incoming token against this regex:

(\w+)-(\w+)

and, on a match, applies the following regex transform:

s/(\w+)-(\w+)/$1$2__$1 $2/

It then splits the result on "__" and passes the original token plus the
one-word and two-word versions through a SynonymFilter further down the
chain (see Lucene in Action, 2nd Edition for code).
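
For illustration, here is a minimal plain-Java sketch of just the
match / transform / split step described above. It is not the actual
filter: the class name HyphenVariants and the method expand are
hypothetical, and there is no Lucene dependency. In a real analysis
chain, I would expect this logic to sit inside a TokenFilter's
incrementToken() and emit the extra variants as additional tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Lucene or Solr.
public class HyphenVariants {
    // Same pattern as in the message: two runs of word characters joined by one hyphen.
    private static final Pattern HYPHENATED = Pattern.compile("(\\w+)-(\\w+)");

    /**
     * Returns the original token followed by its joined and spaced forms,
     * or only the original token when it is not of the form word-word.
     */
    public static List<String> expand(String token) {
        List<String> variants = new ArrayList<>();
        variants.add(token);
        Matcher m = HYPHENATED.matcher(token);
        if (m.matches()) {
            // Java equivalent of s/(\w+)-(\w+)/$1$2__$1 $2/
            String transformed = m.replaceFirst("$1$2__$1 $2");
            // Caveat: \w also matches '_', so a token that already contains
            // "__" would be split in the wrong place by this delimiter.
            for (String variant : transformed.split("__")) {
                variants.add(variant);
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        System.out.println(expand("manchester-united"));
    }
}
```

Running main prints [manchester-united, manchesterunited, manchester united].
Tokens with no hyphen, or with more than one hyphen, come back unchanged
because the pattern has to match the whole token.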

-sujit

On Tue, 2011-08-09 at 06:27 -0700, roySolr wrote:
> Hello,
> 
> I have some terms in my index with special characters. An example is
> "manchester-united". I want a user to be able to search for
> "manchester-united", "manchester united" and "manchesterunited". What's the
> best way to fix this? I have used the patternReplaceFilter and some
> tokenizers, but they couldn't handle the last case (manchesterunited). Can
> someone help me?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3238942.html
> Sent from the Solr - User mailing list archive at Nabble.com.
