port of Nutch CommonGrams to Solr for help with slow phrase queries

Burton-West, Tom Mon, 24 Nov 2008 10:39:54 -0800

Hello all,

We are having problems with extremely slow phrase queries when the
phrase query contains common words. We are reluctant to just use stop
words, because they cause false hits and make some phrases impossible
to search for. (For example "to be or not to be", "the who",
"man in the moon" vs "man on the moon", etc.)
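
To make the stop word problem concrete, here is a minimal sketch in plain Python (not Solr or Lucene code). The stop list and the whitespace tokenizer are assumptions chosen for illustration; a real analyzer chain would differ.

```python
# Hypothetical stop list, for illustration only.
STOP_WORDS = {"a", "be", "in", "not", "on", "or", "the", "to"}


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop every stop word."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


# Two different phrases collapse to the same terms (a false hit).
print(remove_stop_words("man in the moon"))     # ['man', 'moon']
print(remove_stop_words("man on the moon"))     # ['man', 'moon']

# A phrase made only of stop words leaves nothing to search for.
print(remove_stop_words("to be or not to be"))  # []
print(remove_stop_words("the who"))             # ['who']
```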


The approach to this problem used by Nutch looks promising.  Has anyone
ported the Nutch CommonGrams filter to Solr?

"Construct n-grams for frequently occuring terms and phrases while
indexing. Optimize phrase queries to use the n-grams. Single terms are
still indexed too, with n-grams overlaid."
http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html
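
As I read the description quoted above, the index-time half of the idea can be sketched roughly as follows. This is plain Python, not the Nutch implementation: the common-word list, the "_" separator, and the function name common_grams are all assumptions made for illustration.

```python
# Hypothetical list of frequent words, for illustration only.
COMMON_WORDS = {"be", "in", "not", "on", "or", "the", "to"}


def common_grams(tokens, common=COMMON_WORDS, sep="_"):
    """Return (position, term) pairs for a token list.

    Every single term is kept. A bigram is overlaid at the same position
    as its first word whenever either word of an adjacent pair is common.
    """
    out = []
    for pos, tok in enumerate(tokens):
        out.append((pos, tok))
        if pos + 1 < len(tokens):
            nxt = tokens[pos + 1]
            if tok in common or nxt in common:
                out.append((pos, tok + sep + nxt))
    return out


for position, term in common_grams("man in the moon".split()):
    print(position, term)
```

For "man in the moon" this emits man, man_in, in, in_the, the, the_moon and moon, while "man on the moon" yields man_on and on_the instead. A phrase query rewritten to use the bigrams could then tell the two phrases apart and would match against the much shorter posting lists of the bigrams rather than those of "in", "on" and "the".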


Tom

Tom Burton-West
Information Retrieval Programmer
Digital Library Production Services
University of Michigan Library
