Mike Klaas wrote:
On 30-Aug-07, at 4:01 PM, Chris Hostetter wrote:
You could accomplish the goal without any coding by using phrase
queries: "calico calico calico"~10000 will match only documents
that have at least three occurrences of calico. If this is
performant enough, you are done. Otherwise, you'll have to do some
custom coding.
I'll be searching article content so literals like "cat cat cat" are
improbable.
I think you misunderstood Mike's point ... the query string...
foo:"cat cat cat"~10000
...will only match documents containing three instances of the term
"cat" in the field "foo" where those instances are all within 10000
term positions of each other ... the idea being that as long as the
"slop" (number) used is bigger than the largest document you expect
to deal with, this will essentially give you what you want.
Note too that by default Solr only indexes the first 10k tokens, so
this should work for all documents in the index.
-Mike
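
For anyone wanting to generate these queries for an arbitrary term and
count, here is a minimal sketch in Python. The helper name
min_occurrence_query and its parameters are hypothetical, not part of
Solr; the field "foo" and the slop of 10000 are taken from the example
above. It only builds the query string and assumes the term needs no
escaping.

```python
from urllib.parse import urlencode


def min_occurrence_query(field, term, min_count, slop=10000):
    """Build a sloppy phrase query that repeats `term` `min_count` times.

    Hypothetical helper: it only formats the query string described in
    this thread and does not escape special query-syntax characters.
    """
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    # Repeat the term so the phrase needs that many separate occurrences.
    phrase = " ".join([term] * min_count)
    return f'{field}:"{phrase}"~{slop}'


# Query for documents with at least three occurrences of "cat" in "foo".
query = min_occurrence_query("foo", "cat", 3)

# URL-encode it as the q parameter of a search request.
params = urlencode({"q": query})
```

The slop only has to exceed the largest number of term positions a
field can hold, so 10000 is a reasonable default if the indexing limit
mentioned above is in effect; raise it if that limit has been changed.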
Whoa! When I first read the original suggestion, I was thinking ^10000
because I happened to be googling "solr filter by score" (another topic
I learned is hardly worth pursuing).
Yeah, I'm going to try that right now....
Jed