On 12-Jul-07, at 5:58 PM, Lance Lance wrote:

Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and
Solr 1.2.

We have documents with searchable text and a field 'collection'.

This query works as expected, finding everything except for collections
'pile1' and 'pile2'.

    text -(collection:pile1 OR collection:pile2)

When we apply De Morgan's Law, we get 0 records:

    text (-collection:pile1 AND -collection:pile2)

This should return all records, but it returns nothing:

    text (-collection:pile1 OR -collection:pile2)

Lucene's "boolean" operators are not true boolean operators. Instead, every clause is one of:

OPTIONAL
REQUIRED
PROHIBITED

For a query (or a parenthesized subquery) to match, all REQUIRED clauses must match, no PROHIBITED clause may match, and if there are no REQUIRED clauses, at least one OPTIONAL clause must match. You cannot have only PROHIBITED clauses: in Lucene such a query matches nothing.
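The three rules above can be sketched as a small per-document check. This is only a model of the matching rules as described here, not Lucene code: the function name `matches`, the (occur, query) tuples, and the representation of a document as a set of term strings are all assumptions made for illustration.

```python
OPTIONAL, REQUIRED, PROHIBITED = "OPTIONAL", "REQUIRED", "PROHIBITED"

def matches(clauses, doc_terms):
    """Apply the three rules to one document (illustrative model only).

    clauses   -- list of (occur, query) pairs; query is a term string
                 or a nested clause list (a subquery)
    doc_terms -- set of terms the document contains
    """
    def hit(query):
        if isinstance(query, str):
            return query in doc_terms
        return matches(query, doc_terms)  # subqueries follow the same rules

    required = [q for occur, q in clauses if occur == REQUIRED]
    optional = [q for occur, q in clauses if occur == OPTIONAL]
    prohibited = [q for occur, q in clauses if occur == PROHIBITED]

    if not required and not optional:
        return False  # only PROHIBITED clauses: nothing can match
    if any(hit(q) for q in prohibited):
        return False  # zero PROHIBITED clauses may match
    if required:
        return all(hit(q) for q in required)
    return any(hit(q) for q in optional)  # no REQUIRED: need one OPTIONAL

doc = {"text", "collection:pile3"}
print(matches([(OPTIONAL, "text"), (PROHIBITED, "collection:pile1")], doc))  # True
print(matches([(PROHIBITED, "collection:pile1")], doc))                      # False
```

The second call returns False even though the document is not in pile1, because a clause list holding only PROHIBITED clauses has nothing to select from.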

Now, the syntax for each is, in that order: no prefix, +, and -. A prefix can also be applied to an entire subquery using brackets:

    +hello -(goodbye -night)

This returns docs that contain hello and do not match (goodbye without night).
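To make the prefix notation concrete, here is a toy parser that turns such a string into a nested list of (occur, query) pairs. It is a sketch under stated assumptions, not Lucene's QueryParser: it handles only bare terms, the + and - prefixes, and brackets, and ignores AND/OR/NOT, quoted phrases, and escaping. The names `parse`, `OCCUR`, and `TOKEN` are made up for this example.

```python
import re

# Prefix character -> clause type; no prefix means OPTIONAL.
OCCUR = {"": "OPTIONAL", "+": "REQUIRED", "-": "PROHIBITED"}

# A token is an (optionally prefixed) "(", a ")", or an (optionally prefixed) term.
TOKEN = re.compile(r"[+-]?\(|\)|[+-]?[^\s()]+")

def parse(query):
    """Parse a prefix-style query string into nested (occur, query) pairs."""
    tokens = TOKEN.findall(query)
    pos = 0

    def clauses():
        nonlocal pos
        out = []
        while pos < len(tokens) and tokens[pos] != ")":
            tok = tokens[pos]
            pos += 1
            prefix = tok[0] if tok[0] in "+-" else ""
            body = tok[len(prefix):]
            if body == "(":
                sub = clauses()  # brackets create a subquery
                pos += 1         # skip the closing ")"
                out.append((OCCUR[prefix], sub))
            else:
                out.append((OCCUR[prefix], body))
        return out

    return clauses()

print(parse("+hello -(goodbye -night)"))
```

For the query above this yields a REQUIRED clause for hello and a PROHIBITED subquery that itself contains an OPTIONAL goodbye and a PROHIBITED night.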

In Lucene, AND/OR/NOT are syntactic sugar that the parser translates into clauses of the above form. However, the translation only imperfectly matches people's (rational) expectations of how boolean operators work. Also, brackets _create subqueries_; they do not just group operators. I suggest that AND and OR never be used programmatically, if possible.

Try these alternatives:

docs that (must) contain 'text' and do not match (col=pile1 or col=pile2)
    text -(collection:pile1 collection:pile2)

same as above
    text -collection:pile1 -collection:pile2

docs that (must) contain 'text' and (must) match (col=pile1 or col=pile2)
    +text +(collection:pile1 collection:pile2)

Note that in the last example the + before text is necessary: without it, text would be optional rather than required, because the query already contains another required clause.
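The alternatives can be compared on a small made-up corpus by treating each clause as a set of document ids: REQUIRED clauses intersect, OPTIONAL clauses union (only when nothing is REQUIRED), and PROHIBITED clauses are subtracted. This is a sketch of the rules described above, not of Lucene's implementation; the corpus `DOCS`, the function `search`, and the use of "", "+" and "-" as clause markers are assumptions for illustration.

```python
# Hypothetical corpus: doc id -> set of indexed terms.
DOCS = {
    1: {"text", "collection:pile1"},
    2: {"text", "collection:pile2"},
    3: {"text", "collection:pile3"},
    4: {"collection:pile1"},
    5: {"collection:pile3"},
}

def search(clauses):
    """Return ids of docs matching a list of (prefix, query) clauses."""
    def hits(q):
        if isinstance(q, str):
            return {d for d, terms in DOCS.items() if q in terms}
        return search(q)  # nested list = subquery

    required = [hits(q) for occ, q in clauses if occ == "+"]
    optional = [hits(q) for occ, q in clauses if occ == ""]
    prohibited = [hits(q) for occ, q in clauses if occ == "-"]

    if required:
        result = set.intersection(*required)
    elif optional:
        result = set.union(*optional)
    else:
        return set()  # only prohibited clauses: matches nothing
    return result - set().union(*prohibited)

piles = [("", "collection:pile1"), ("", "collection:pile2")]

# text -(collection:pile1 collection:pile2)
print(sorted(search([("", "text"), ("-", piles)])))   # [3]
# text -collection:pile1 -collection:pile2
print(sorted(search([("", "text"),
                     ("-", "collection:pile1"),
                     ("-", "collection:pile2")])))    # [3]
# +text +(collection:pile1 collection:pile2)
print(sorted(search([("+", "text"), ("+", piles)])))  # [1, 2]
# text +(collection:pile1 collection:pile2), i.e. the + on text dropped
print(sorted(search([("", "text"), ("+", piles)])))   # [1, 2, 4]
```

The last line shows the effect of dropping the +: doc 4 is returned although it does not contain text, since text no longer constrains the result once another clause is required.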

-Mike



