: How would you handle a query like "johnson AND johnson"? I don't want
: something that has "author: linden b. johnson" to hit, only things that
: actually have two occurrences.
I'm not even sure if/how that would be possible using the underlying
Lucene Query objects available: IIUC the BooleanQuery(.Builder) class
will optimize away duplicate clauses.
I think what you would need is something like TermQuery with a "minTf"
option? The code for that probably wouldn't be too hard, but I'm not
sure how you'd solve it for the general case, e.g. "(+(A B) +(B C))",
where if neither A nor C matches, there must be 2 instances of B.
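To make the two cases above concrete, here is a small in-memory sketch in plain Java. It is NOT Lucene API: it assumes a document is just a list of already-analyzed (lowercased) tokens, and the class and method names (DistinctOccurrenceMatch, termWithMinTf, matchesWithDistinctPositions) are hypothetical. The first method models the "minTf" idea; the second models the general case by requiring every required clause to be satisfied by a *different* token position.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only, NOT Lucene API: a document is modeled as a
// list of analyzed tokens and the "distinct occurrence" semantics are checked in memory.
public class DistinctOccurrenceMatch {

    // "minTf" idea: the term must occur at least minTf times in the document.
    static boolean termWithMinTf(List<String> tokens, String term, int minTf) {
        int tf = 0;
        for (String t : tokens) {
            if (t.equals(term)) {
                tf++;
            }
        }
        return tf >= minTf;
    }

    // General case: each required clause is a disjunction of terms, e.g.
    // "+(A B) +(B C)" becomes [{a, b}, {b, c}]. The document matches only if
    // every clause can be satisfied by a different token position.
    static boolean matchesWithDistinctPositions(List<String> tokens, List<Set<String>> clauses) {
        return assign(tokens, clauses, 0, new boolean[tokens.size()]);
    }

    // Backtracking search: try every unused position that satisfies clause i.
    private static boolean assign(List<String> tokens, List<Set<String>> clauses, int i, boolean[] used) {
        if (i == clauses.size()) {
            return true; // every clause got its own position
        }
        for (int pos = 0; pos < tokens.size(); pos++) {
            if (!used[pos] && clauses.get(i).contains(tokens.get(pos))) {
                used[pos] = true;
                if (assign(tokens, clauses, i + 1, used)) {
                    return true;
                }
                used[pos] = false; // undo and try the next candidate position
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed tokenization of the example field values.
        List<String> oneJohnson = List.of("author", "linden", "b", "johnson");
        List<String> twoJohnsons = List.of("johnson", "and", "johnson");

        // "johnson AND johnson" as two required clauses on the same term
        List<Set<String>> jj = List.of(Set.of("johnson"), Set.of("johnson"));
        System.out.println(termWithMinTf(oneJohnson, "johnson", 2));
        System.out.println(termWithMinTf(twoJohnsons, "johnson", 2));
        System.out.println(matchesWithDistinctPositions(oneJohnson, jj));
        System.out.println(matchesWithDistinctPositions(twoJohnsons, jj));

        // "(+(A B) +(B C))": with neither A nor C present, B must occur twice
        List<Set<String>> abbc = List.of(Set.of("a", "b"), Set.of("b", "c"));
        System.out.println(matchesWithDistinctPositions(List.of("b"), abbc));
        System.out.println(matchesWithDistinctPositions(List.of("b", "b"), abbc));
        System.out.println(matchesWithDistinctPositions(List.of("a", "b"), abbc));
    }
}
```

The backtracking search is exponential in the worst case; since the general problem is "assign a distinct position to each clause", a real implementation would more likely treat it as a bipartite matching between clauses and positions.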
Oh wait... one way you could probably do this would be with SpanNotQuery?
I think something like "SpanNotQuery(SpanTermQuery("johnson"),
SpanTermQuery("johnson"))" would work.
I believe the only existing Solr QParser that can create SpanNotQueries is
the XMLQueryParser:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-XMLQueryParser
-Hoss
http://www.lucidworks.com/