Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread roySolr
THANK YOU!! I thought I could only use one character for the pattern. Now I use a regular expression :) I don't need the wordDelimiter anymore. It splits on # and whitespace.
dataset: mcdonald's#burgerking#Free record shop#h&m
mcdonald's burgerking free record shop h&m
This is exactly how we
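
For readers who want to see what a `#|\s` pattern does to the sample dataset, here is a minimal Python sketch. It only emulates the splitting with Python's `re` module and is not Solr code; the function name `tokenize`, the dropping of empty tokens, and the lowercasing step (implied by "free record shop" in the output above) are assumptions made for illustration.

```python
import re

# The pattern mentioned in the thread: split on '#' or any whitespace character.
SPLIT_PATTERN = re.compile(r"#|\s")

def tokenize(text):
    """Hypothetical helper: split on the pattern, drop empty pieces, lowercase.

    Lowercasing is an assumption; the thread does not say which filter does it.
    """
    return [piece.lower() for piece in SPLIT_PATTERN.split(text) if piece]

if __name__ == "__main__":
    print(tokenize("mcdonald's#burgerking#Free record shop#h&m"))
```

Note that the apostrophe in `mcdonald's` and the ampersand in `h&m` survive, since neither character is part of the split pattern.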

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread Erick Erickson
It's a little obscure, but you can use http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory in front of WhitespaceTokenizer if you prefer. Note that a CharFilterFactory is different from a FilterFactory, so read carefully. Best, Erick On Tue, Jun 14,
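
The idea of a character filter running before the tokenizer can be sketched roughly as follows. This is a plain Python emulation, not Solr configuration: the helper names `char_filter` and `whitespace_tokenize` are hypothetical, and replacing `#` with a single space is an assumed choice of pattern and replacement.

```python
import re

def char_filter(text):
    """Hypothetical stand-in for a pattern-replace char filter: '#' becomes a space.

    It rewrites the raw character stream before any tokens exist.
    """
    return re.sub(r"#", " ", text)

def whitespace_tokenize(text):
    """Hypothetical stand-in for a whitespace tokenizer: split on runs of whitespace."""
    return text.split()

if __name__ == "__main__":
    raw = "mcdonald's#burgerking#Free record shop#h&m"
    print(whitespace_tokenize(char_filter(raw)))
```

The point of the ordering is that the replacement acts on the raw text, whereas an ordinary filter would only see tokens the tokenizer has already produced.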

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread lee carroll
Do you need the word delimiter? #|\s I think it's just a regex in the pattern tokeniser, though I might be wrong. On 14 June 2011 11:15, roySolr wrote: > Ok, with catenatewords the index term will be mcdonalds. But that's not what > i want. > > I only use the wordDelimiter to split on whitespa

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread roySolr
OK, with catenateWords the index term will be mcdonalds, but that's not what I want. I only use the wordDelimiter to split on whitespace. I have already used the PatternTokenizerFactory, so I can't use the WhitespaceTokenizer. I want my index to look like this: dataset: mcdonald's#burgerking#Free r

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-10 Thread Erick Erickson
Hmmm, that is confusing. stemEnglishPossessive=0 actually leaves the 's' in the index, just not attached to the word. The admin/analysis page can help show this. Setting it equal to 1 removes it entirely from the stream. If you set catenateWords=1, you'll get "mcdonalds" in your index if s