: how would you handle a query like "johnson AND johnson"? i don't want
: something that has "author: linden b. johnson" to hit, only things that
: actually have two occurrences.
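
To pin down the requirement in the question: a document should match only if the term occurs at least twice in the field, not merely once. Below is a minimal plain-Java sketch of that check, assuming the field has already been tokenized. It is not Lucene or Solr API, and `MinTfSketch` and `matchesMinTf` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: models "term must occur at least minTf times" on a
// pre-tokenized field value. Nothing here is a Lucene/Solr class.
class MinTfSketch {

    // Hypothetical helper: true if `term` occurs at least `minTf` times in `tokens`.
    static boolean matchesMinTf(List<String> tokens, String term, int minTf) {
        int count = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count >= minTf;
    }

    public static void main(String[] args) {
        List<String> oneOccurrence = Arrays.asList("linden", "b", "johnson");
        List<String> twoOccurrences = Arrays.asList("johnson", "and", "johnson");

        // A single occurrence should not satisfy "johnson AND johnson".
        System.out.println(matchesMinTf(oneOccurrence, "johnson", 2));
        // Two occurrences should.
        System.out.println(matchesMinTf(twoOccurrences, "johnson", 2));
    }
}
```

A real implementation would have to read term frequencies from the index at query time rather than scanning tokens, so this only fixes the semantics, not the mechanism.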

I'm not even sure if/how that would be possible using the underlying Lucene Query objects available -- IIUC the BooleanQuery(.Builder) class will optimize away duplicate clauses.

I think what you would need is something like TermQuery with a "minTf" option? The code for that probably wouldn't be too hard -- but I'm not sure how you'd solve it for the general case, ex: "(+(A B) +(B C))", such that if neither A nor C matches then there must be 2 instances of B.

Oh wait ... one way you could probably do this would be with SpanNotQuery? I think something like "SpanNotQuery(SpanTermQuery("johnson"), SpanTermQuery("johnson"))" would work.

I believe the only existing Solr QParser that can create SpanNotQueries is the XMLQueryParser...

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-XMLQueryParser

-Hoss
http://www.lucidworks.com/