THANK YOU!!
I thought I could only use one character for the pattern. Now I use a
regular expression :)
I don't need the wordDelimiter anymore. It's split on # and whitespace.
dataset: mcdonald's#burgerking#Free record shop#h&m
mcdonald's
burgerking
free
record
shop
h&m
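For anyone reading along, here is a rough Python sketch of what that split does. It is not Solr code: it only imitates a pattern tokenizer using the regex #|\s, and it assumes a lowercase step follows (the output above shows "free" in lowercase). The function name tokenize is made up for illustration.

```python
import re

def tokenize(text, pattern=r"#|\s"):
    # Split on every match of the pattern ('#' or a whitespace character),
    # drop empty strings, and lowercase each token (assumed lowercase step).
    return [tok.lower() for tok in re.split(pattern, text) if tok]

tokens = tokenize("mcdonald's#burgerking#Free record shop#h&m")
for tok in tokens:
    print(tok)
```

Because the apostrophe and the & are not part of the pattern, mcdonald's and h&m stay intact as single tokens.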
This is exactly how we
It's a little obscure, but you can use
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory
in front of WhitespaceTokenizer if you prefer. Note that
a CharFilterFactory is different than a FilterFactory, so
read carefully ..
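As a rough illustration of that idea (plain Python, not Solr configuration): the char filter rewrites the raw text first, and the whitespace tokenizer then splits the result. The function name replace_then_split and the choice to replace # with a single space are assumptions for this sketch.

```python
import re

def replace_then_split(text, pattern="#", replacement=" "):
    # Char-filter step: rewrite the raw character stream before tokenizing.
    filtered = re.sub(pattern, replacement, text)
    # Whitespace-tokenizer step: split() with no arguments splits on runs
    # of whitespace and never returns empty strings.
    return filtered.split()

result = replace_then_split("mcdonald's#burgerking#Free record shop#h&m")
print(result)
```

No lowercasing happens here, so "Free" keeps its capital unless a separate lowercase step runs afterwards.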
Best
Erick
On Tue, Jun 14,
Do you need the word delimiter?
#|\s
I think it's just a regex in the pattern tokeniser. I might be wrong though.
On 14 June 2011 11:15, roySolr wrote:
> Ok, with catenatewords the index term will be mcdonalds. But that's not what
> i want.
>
> I only use the wordDelimiter to split on whitespa
Ok, with catenateWords the index term will be mcdonalds. But that's not what
I want.
I only use the wordDelimiter to split on whitespace. I have already used the
PatternTokenizerFactory so I can't use the WhitespaceTokenizer.
I want my index to look like this:
dataset: mcdonald's#burgerking#Free r
Hmmm, that is confusing. Setting stemEnglishPossessive=0
actually leaves the 's' in the index, just not attached to the
word. The admin/analysis page can help show this.
Setting it equal to 1 removes it entirely from the stream.
If you set catenateWords=1, you'll get "mcdonalds" in
your index if s