--- On Sat, 10/2/10, Ahmet Arslan <iori...@yahoo.com> wrote:
> > I don't understand. Many tags like "electric吉他" or "古典吉他"
> > have no whitespace at all, so how does WhitespaceTokenizer help?
>
> It makes sense for tags that have more than one word, e.g.
> "electric guitar".
>
> If you tokenize this using WhitespaceTokenizer, you obtain two tokens.
> If you use KeywordTokenizer, you always obtain only one token.
>
> In other words, if you want the query gui to return "electric
> guitar", you need WhitespaceTokenizer.

But I thought NGramFilterFactory would generate substrings that start in the "middle" of a token, which is what makes autocomplete match in the middle.

So in the case of "electric guitar", KeywordTokenizer would create one token: "electric guitar".

NGramFilterFactory would then take that one token ("electric guitar") and generate n-grams out of it. One of the n-grams would be "guit", because "guit" is a substring of "electric guitar".

Or did I misunderstand how NGramFilterFactory works?
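
To make the reasoning above concrete, here is a rough sketch in plain Python, not Solr itself. The function name `ngrams` is made up for illustration, and `min_gram`/`max_gram` only stand in for the filter's minGramSize/maxGramSize settings. It assumes the filter emits every substring of a token within that length range, which is exactly the assumption being asked about, so treat it as a statement of that assumption rather than a description of what Solr actually does.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of `token` with length in [min_gram, max_gram].

    Hypothetical helper: it models the assumed n-gram behaviour,
    it is not Solr's implementation.
    """
    grams = []
    for size in range(min_gram, max_gram + 1):
        # Slide a window of `size` characters across the whole token.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# KeywordTokenizer-style input: the whole tag is a single token.
grams = ngrams("electric guitar", 2, 5)
print("guit" in grams)   # substring that starts in the middle of the tag
print("c gu" in grams)   # n-grams of the single token also span the space

# A tag with no whitespace at all is handled the same way.
print("吉他" in ngrams("electric吉他", 2, 5))
```

Under that assumption, a single keyword token already yields "guit", so whitespace tokenization would not be needed just to match in the middle of a tag.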