Hi,
I've been trying to use the NGramTokenizer and I ran into a problem.
It seems like Solr tries to match documents against all of the tokens the
analyzer produces for the query term, taken together as a single phrase. For
example, if I index a document whose title field has the value "nice dog"
and then search for "dog" (with the NGramTokenizer configured for a minimum
and maximum gram size of 2), I get no results.
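Just to be explicit about what I expect: with minGram=2 and maxGram=2, the
term "dog" should be split into the tokens "do" and "og". Here is a small
sketch that prints them (the class name NGramCheck is just for illustration;
it uses the Lucene 2.x contrib NGramTokenizer and the old TokenStream API):

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;

public class NGramCheck {
    public static void main(String[] args) throws Exception {
        // Tokenize "dog" with minGram=2 and maxGram=2, as in my field definition.
        TokenStream ts = new NGramTokenizer(new StringReader("dog"), 2, 2);
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText()); // expect "do", then "og"
        }
    }
}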
I can see in the Analysis tool that the tokenizer generates the right
tokens, but when Solr runs the search it tries to match the exact phrase
instead of the individual tokens.
I tried the same thing in Lucene directly and it works as expected, so it
seems to be a Solr issue. Any hint as to where I should look in order to fix it?
Here is the Lucene code I used to test the behavior of the NGramTokenizer:
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.LockObtainFailedException;

public class NGramTest {
    public static void main(String[] args) throws ParseException,
            CorruptIndexException, LockObtainFailedException, IOException {
        // Analyzer that splits the input into 2-grams and lowercases them.
        Analyzer n = new Analyzer() {
            @Override
            public TokenStream tokenStream(String fieldName, Reader reader) {
                TokenStream result = new NGramTokenizer(reader, 2, 2);
                result = new LowerCaseFilter(result);
                return result;
            }
        };
        // Index a single document whose title field contains "nice dog".
        IndexWriter writer = new IndexWriter("sample_index", n);
        Document doc = new Document();
        Field f = new Field("title", new StringReader("nice dog"));
        doc.add(f);
        writer.addDocument(doc);
        writer.close();
        // Search for "dog"; the query term goes through the same analyzer.
        IndexSearcher is = new IndexSearcher("sample_index");
        QueryParser qp = new QueryParser("", n); // default field unused; the query names "title"
        Query parse = qp.parse("title:dog");
        Hits hits = is.search(parse);
        System.out.println(hits.length());    // number of hits
        System.out.println(parse.toString()); // the query Lucene actually ran
    }
}
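(For reference, this is written against the Lucene 2.x API; NGramTokenizer
comes from the contrib analyzers jar, so that needs to be on the classpath.)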
Thanks!!!
Jonathan