On 30-Aug-07, at 4:01 PM, Chris Hostetter wrote:
You could accomplish the goal without any coding by using phrase
queries: "calico calico calico"~10000 will match only documents
that have at least three occurrences of calico. If this is
performant enough, you are done. Otherwise, you'll have to do
some custom coding.
I'll be searching article content so literals like "cat cat cat"
are improbable.
I think you misunderstood Mike's point ... the query string...
foo:"cat cat cat"~10000
...will only match documents containing three instances of the term
"cat" in the field "foo" where those instances are all within
10000 term positions of each other ... the idea being that as long
as the "slop" (number) used is bigger than the largest document you
expect to deal with, this will essentially give you what you want.
Note too that by default Solr only indexes the first 10,000 tokens of
a field, so this should work for all documents in the index.
-Mike
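
For anyone who wants to generate these queries rather than type them by hand, here is a minimal sketch of the sloppy-phrase trick described above. The helper names (min_occurrence_query, select_params) are made up for illustration and are not part of Solr; the sketch only builds the query string and a URL-encoded "q" parameter, and it assumes the term is a single token with no whitespace or quotes. Pick a slop larger than the number of tokens indexed per field.

```python
from urllib.parse import urlencode


def min_occurrence_query(field, term, count, slop=10000):
    """Build a sloppy phrase query such as foo:"cat cat cat"~10000.

    Hypothetical helper: repeats `term` `count` times inside a phrase
    and appends the slop, following the approach from the thread.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if not term or '"' in term or any(ch.isspace() for ch in term):
        raise ValueError("term must be a single token without quotes")
    phrase = " ".join([term] * count)
    return f'{field}:"{phrase}"~{slop}'


def select_params(query):
    """URL-encode the query as a `q` parameter (hypothetical helper)."""
    return urlencode({"q": query})


if __name__ == "__main__":
    q = min_occurrence_query("foo", "cat", 3)
    print(q)
    print(select_params(q))
```

The encoded string can be appended to whatever select URL your Solr instance exposes; that URL depends on your setup and is left out here.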