Hi, I've been trying to use the NGramTokenizer and I ran into a problem. It seems like Solr is trying to match documents against the query term as a whole instead of against the individual tokens the analyzer produces from it. So if I index a document whose title field has the value "nice dog" and then search for "dog" (with the NGramTokenizer configured to generate tokens of min 2 and max 2), I get no results. In the Analysis tool I can see that the tokenizer generates the right tokens, but when Solr actually searches it tries to match the exact phrase instead of the tokens.
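For reference, the relevant part of my schema.xml looks roughly like this (a minimal sketch; the type name "ngram_text" and the field declaration are just illustrative, the tokenizer and filter factories are the stock Solr ones):

<fieldType name="ngram_text" class="solr.TextField">
  <analyzer>
    <!-- emit every 2-character gram, lowercased; same analyzer at index and query time -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="ngram_text" indexed="true" stored="true"/>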
I tried the same thing directly in Lucene and there it works as expected, so it seems to be a Solr issue. Any hint on where I should look in order to fix it? Here is the Lucene code I used to test the behavior of the Lucene NGramTokenizer:

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.LockObtainFailedException;

public static void main(String[] args) throws ParseException, CorruptIndexException, LockObtainFailedException, IOException {
    // Analyzer that emits lowercased 2-grams, mirroring the Solr field analyzer
    Analyzer n = new Analyzer() {
        @Override
        public TokenStream tokenStream(String s, Reader reader) {
            TokenStream result = new NGramTokenizer(reader, 2, 2);
            result = new LowerCaseFilter(result);
            return result;
        }
    };

    // index a single document whose title field is "nice dog"
    IndexWriter writer = new IndexWriter("sample_index", n);
    Document doc = new Document();
    Field f = new Field("title", new StringReader("nice dog"));
    doc.add(f);
    writer.addDocument(doc);
    writer.close();

    // search for "dog"; the query term goes through the same n-gram analyzer
    IndexSearcher is = new IndexSearcher("sample_index");
    QueryParser qp = new QueryParser("", n);
    Query parse = qp.parse("title:dog");
    Hits hits = is.search(parse);
    System.out.println(hits.length());   // number of matching documents
    System.out.println(parse.toString()); // the query after analysis
}

Thanks!!!
Jonathan