Some thoughts below, sorry for the late reply...
On Aug 6, 2009, at 2:27 PM, Mark Bennett wrote:
I'm investigating a problem I bet some of you have hit before, and
exploring several options to address it. I suspect that this specific
IDF scenario is common enough that it even has a name, t
As soon as I started reading your message I thought "common grams",
so that is what I would try first, especially since somebody has
already ported it from Nutch to Solr (see Solr JIRA).
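
In case the term is unfamiliar, here is a rough sketch of the general
common-grams idea: keep every token, and additionally emit a joined
bigram whenever either word of an adjacent pair is a very common word.
That way a phrase query can match on the rarer bigram term rather than
on the high-frequency word by itself. This is only an illustration in
Python, not the Nutch or Solr code; the function name, the "_"
separator, and the sample word list are my own assumptions.

```python
def common_grams(tokens, common_words, sep="_"):
    """Return tokens plus joined bigrams for pairs involving a common word.

    Illustrative only: the name, separator, and output order are
    assumptions, not taken from the Nutch or Solr implementation.
    """
    out = []
    for i, tok in enumerate(tokens):
        # Always keep the original single-word token.
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Emit a bigram only when at least one side is a common word.
            if tok in common_words or nxt in common_words:
                out.append(tok + sep + nxt)
    return out


if __name__ == "__main__":
    common = {"of", "the"}  # hypothetical list of high-frequency words
    print(common_grams(["state", "of", "the", "art"], common))
    # ['state', 'state_of', 'of', 'of_the', 'the', 'the_art', 'art']
```

Pairs of two uncommon words (e.g. "quick brown") produce no bigram, so
the index only grows around the high-frequency terms.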
Otis