I have done this using a custom TokenFilter that (among other things) detects hyphenated words and converts each one into its 3 variations. It matches the incoming token against the regex (\w+)-(\w+) and, on a match, runs the following regex transform:

    s/(\w+)-(\w+)/$1$2__$1 $2/

It then splits the result on "__" and passes the original token, the one-word version and the two-word version through a SynonymFilter further down the chain (see Lucene in Action, 2nd Edition for code).

-sujit

On Tue, 2011-08-09 at 06:27 -0700, roySolr wrote:
> Hello,
>
> I have some terms in my index with specials characters. An example is
> "manchester-united". I want that a user can search for
> "manchester-united","manchester united" and "manchesterunited". What's the
> best way to fix this? i have used the patternReplaceFilter and some
> tokenizers but it couldn't fix the last situation(manchesterunited). Can
> someone helps me?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3238942.html
> Sent from the Solr - User mailing list archive at Nabble.com.
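
For anyone who wants to try the hyphen expansion from the reply outside of Lucene, here is a minimal sketch in plain Java. It only illustrates the regex step and is not the actual TokenFilter: the class name HyphenVariants and the method expand are made up for this example, and the use of a whole-token match (rather than a partial one) is an assumption. Wiring the resulting variants into a TokenFilter and SynonymFilter chain is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
class HyphenVariants {
    // Pattern from the message: two runs of word characters joined by one hyphen.
    private static final Pattern HYPHENATED = Pattern.compile("(\\w+)-(\\w+)");

    // Returns the original token, followed by the one-word and two-word
    // variants when the whole token looks like "word-word".
    static List<String> expand(String token) {
        List<String> variants = new ArrayList<>();
        variants.add(token); // the original token is always kept

        Matcher m = HYPHENATED.matcher(token);
        if (m.matches()) { // assumption: the entire token must match
            // Java equivalent of s/(\w+)-(\w+)/$1$2__$1 $2/
            String transformed = m.replaceFirst("$1$2__$1 $2");
            // Note: \w also matches '_', so a token that already contains
            // "__" would be split in extra places; this sketch ignores that.
            for (String variant : transformed.split("__")) {
                variants.add(variant);
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        // Prints: [manchester-united, manchesterunited, manchester united]
        System.out.println(expand("manchester-united"));
    }
}
```

Tokens without a hyphen come back unchanged as a single-element list, so the helper can be applied to every incoming token.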