On 12-Jul-07, at 5:58 PM, Lance Lance wrote:
> Are there any known bugs in the syntax parser? We're using
> lucene-2.2.0 and Solr 1.2.
>
> We have documents with searchable text and a field 'collection'.
>
> This query works as expected, finding everything except for
> collections 'pile1' and 'pile2'.
>
> text -(collection:pile1 OR collection:pile2)
>
> When we apply De Morgan's Law, we get 0 records:
>
> text (-collection:pile1 AND -collection:pile2)
>
> This should return all records, but it returns nothing:
>
> text (-collection:pile1 OR -collection:pile2)

Lucene's "boolean" operators are not true boolean operators.
Instead, every clause is one of:
OPTIONAL
REQUIRED
PROHIBITED
For a query (or parenthesized subquery) to match, all REQUIRED
clauses must match, zero PROHIBITED clauses must match, and if there
are no REQUIRED clauses, at least one OPTIONAL clause must match. You
cannot have only PROHIBITED clauses: a query or subquery made up of
nothing but PROHIBITED clauses matches no documents.
Now, the syntax for each is no prefix (OPTIONAL), + (REQUIRED), and
- (PROHIBITED), and the prefixes can be applied to entire subqueries
using brackets:
+hello -(goodbye -night)
returns docs that have hello, and do not have (goodbye without night)
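
To make the rule above concrete, here is a minimal sketch in Python. It is a toy model of the clause semantics described in this message, not Lucene's actual implementation; the names Clause, BoolQuery and matches are made up for illustration, and a document is just a set of terms.

```python
from dataclasses import dataclass
from typing import List, Set, Union

# Hypothetical labels for the three clause types (not Lucene identifiers).
OPTIONAL, REQUIRED, PROHIBITED = "optional", "required", "prohibited"


@dataclass
class Clause:
    occur: str
    query: Union[str, "BoolQuery"]  # a single term or a bracketed subquery


@dataclass
class BoolQuery:
    clauses: List[Clause]


def matches(query: Union[str, BoolQuery], doc: Set[str]) -> bool:
    """Apply the OPTIONAL/REQUIRED/PROHIBITED rule to a toy document."""
    if isinstance(query, str):
        return query in doc
    hits = {OPTIONAL: [], REQUIRED: [], PROHIBITED: []}
    for clause in query.clauses:
        hits[clause.occur].append(matches(clause.query, doc))
    if any(hits[PROHIBITED]):      # zero PROHIBITED clauses may match
        return False
    if hits[REQUIRED]:             # all REQUIRED clauses must match
        return all(hits[REQUIRED])
    return any(hits[OPTIONAL])     # otherwise at least one OPTIONAL must match


# +hello -(goodbye -night)
goodbye_without_night = BoolQuery([
    Clause(OPTIONAL, "goodbye"),
    Clause(PROHIBITED, "night"),
])
query = BoolQuery([
    Clause(REQUIRED, "hello"),
    Clause(PROHIBITED, goodbye_without_night),
])

# A subquery with only PROHIBITED clauses, e.g. (-night)
only_prohibited = BoolQuery([Clause(PROHIBITED, "night")])
```

In this model, query matches {"hello"} and {"hello", "goodbye", "night"} but not {"hello", "goodbye"}, and only_prohibited matches nothing, because with no REQUIRED and no OPTIONAL clauses there is nothing left that can match.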
In Lucene, AND/OR/NOT are syntactic sugar that translates clauses to
the above form. However, the translation imperfectly matches people's
(rational) expectations of how boolean operators work. Also, brackets
_create subqueries_, not just group operators. I suggest that AND and
OR never be used programmatically, if possible.
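
One way to follow that advice when building query strings in code is to emit only the prefix form. The sketch below is an assumption about how you might do it: build_query and group are hypothetical helpers, not part of Lucene or Solr, and they do no escaping of special characters in the clauses.

```python
def group(*clauses: str) -> str:
    """Wrap clauses in brackets, which creates a subquery."""
    return "(" + " ".join(clauses) + ")"


def build_query(required=(), optional=(), prohibited=()) -> str:
    """Join clauses using only the +, - and no-prefix forms (no AND/OR/NOT)."""
    parts = [f"+{clause}" for clause in required]
    parts += list(optional)
    parts += [f"-{clause}" for clause in prohibited]
    return " ".join(parts)


piles = group("collection:pile1", "collection:pile2")
exclude_piles = build_query(optional=["text"], prohibited=[piles])
require_piles = build_query(required=["text", piles])
```

Here exclude_piles is the string text -(collection:pile1 collection:pile2) and require_piles is +text +(collection:pile1 collection:pile2), so each clause's type is visible in the string itself.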
Try these alternatives:
Docs that (must) contain 'text' and do not match (col=pile1 or col=pile2):
text -(collection:pile1 collection:pile2)
Same as above:
text -collection:pile1 -collection:pile2
Docs that (must) contain 'text' and (must) match (col=pile1 or col=pile2):
+text +(collection:pile1 collection:pile2)
Note that in the last example, the + is necessary before text because
otherwise that clause would be optional rather than required (as there
are other required clauses).
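
As a rough check of these alternatives, the sketch below runs them against four made-up documents, using a compact Python model of the clause rule described earlier in this message. It is a simplification, not Lucene itself: a document is a set of tokens, a query is a list of (prefix, subquery) pairs, and the corpus is invented for illustration.

```python
def matches(query, doc):
    """query is a term (str) or a list of (prefix, subquery) pairs,
    where prefix is "+" (required), "-" (prohibited) or "" (optional)."""
    if isinstance(query, str):
        return query in doc
    required = [matches(q, doc) for prefix, q in query if prefix == "+"]
    optional = [matches(q, doc) for prefix, q in query if prefix == ""]
    prohibited = [matches(q, doc) for prefix, q in query if prefix == "-"]
    if any(prohibited):
        return False
    return all(required) if required else any(optional)


# Hypothetical corpus: three docs containing 'text', one without it.
docs = {
    "d1": {"text", "collection:pile1"},
    "d2": {"text", "collection:pile2"},
    "d3": {"text", "collection:pile3"},
    "d4": {"collection:pile1"},
}

piles = [("", "collection:pile1"), ("", "collection:pile2")]

queries = {
    "text -(collection:pile1 collection:pile2)":
        [("", "text"), ("-", piles)],
    "text -collection:pile1 -collection:pile2":
        [("", "text"), ("-", "collection:pile1"), ("-", "collection:pile2")],
    "+text +(collection:pile1 collection:pile2)":
        [("+", "text"), ("+", piles)],
    "text +(collection:pile1 collection:pile2)":
        [("", "text"), ("+", piles)],
}

results = {
    label: sorted(name for name, doc in docs.items() if matches(query, doc))
    for label, query in queries.items()
}
```

In this model the first two queries both select only d3. The third selects d1 and d2, while the fourth, without the + on text, also selects d4, a document that does not contain 'text' at all.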
-Mike