--- On Sat, 10/2/10, Ahmet Arslan <iori...@yahoo.com> wrote:

> > I don't understand. Many tags like "electric吉他" ("electric guitar") or
> > "古典吉他" ("classical guitar") have no whitespace at all, so how does
> > WhitespaceTokenizer help?
> 
> It makes sense for tags that have more than one word, e.g.
> "electric guitar".
> 
> If you tokenize this with WhitespaceTokenizer, you get two tokens.
> If you use KeywordTokenizer, you always get exactly one token.
> 
> In other words, if you want the query "gui" to return "electric
> guitar", you need WhitespaceTokenizer.
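A rough Python sketch of the two tokenizers being compared. It only approximates Lucene's behaviour: `whitespace_tokenize` and `keyword_tokenize` are hypothetical helpers, and Python's `str.split()` is assumed to be close enough to WhitespaceTokenizer's splitting on whitespace characters. It shows that the whitespace split yields two tokens for "electric guitar" but leaves the Chinese tags whole, since they contain no spaces.

```python
def whitespace_tokenize(text: str) -> list[str]:
    # Split on runs of whitespace, roughly like WhitespaceTokenizer.
    return text.split()


def keyword_tokenize(text: str) -> list[str]:
    # Emit the entire input as a single token, like KeywordTokenizer.
    return [text]


for tag in ["electric guitar", "electric吉他", "古典吉他"]:
    print(tag, whitespace_tokenize(tag), keyword_tokenize(tag))
```

For "electric吉他" and "古典吉他" both helpers return the same single token, which is the point of the original question.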


But I thought NGramFilterFactory generates substrings that can start in the
"middle" of a token, which would let autocomplete match in the middle of a tag.

So in the case of "electric guitar", KeywordTokenizer would create one token:
"electric guitar".

NGramFilterFactory would then take that one token ("electric guitar") and
generate n-grams from it. One of those n-grams would be "guit", because "guit"
is a substring of "electric guitar".
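A minimal Python sketch of the reasoning above, assuming an n-gram filter that emits every substring of the single keyword token whose length lies between a minimum and maximum gram size. `ngrams`, `min_gram` and `max_gram` are hypothetical stand-ins for the filter and its `minGramSize`/`maxGramSize` attributes; token order and other Lucene details are not modelled. It suggests "guit" only appears if the maximum gram size is at least 4, and that the same mechanism yields "吉他" from "electric吉他" without any whitespace.

```python
def ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Return every substring of token with length in [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# KeywordTokenizer keeps "electric guitar" as one token, space included.
print("guit" in ngrams("electric guitar", min_gram=2, max_gram=4))  # True
print("guit" in ngrams("electric guitar", min_gram=1, max_gram=2))  # False: grams too short
print("吉他" in ngrams("electric吉他", min_gram=2, max_gram=4))      # True
```

Note that with a single keyword token some grams span the space (for example "c gu"), which a whitespace-split token stream would never produce.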

Or did I misunderstand how NGramFilterFactory works?