Re: Is there a parsing issue with "OR NOT" or is something else going on? (Solr 6)

Erick Erickson Mon, 02 Oct 2017 15:44:04 -0700

Solr does not implement pure boolean logic, and never has. See:
https://lucidworks.com/2011/12/28/why-not-and-or-and-not/


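I think your second query (b) is evaluated as though it were:

("batman" AND "indiana jones") OR (*:* -"cancer")

which is much closer to what you want. In your first query (a) the NOT
clause is not treated as a set of its own; as the parsedquery shows, it
only subtracts the "cancer" documents from the AND matches.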
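
To make the difference concrete, here is a minimal toy model in Python of
how the two parsed queries combine their clauses. It is only a sketch, not
Lucene's actual code: the BoolQuery class, the evaluate function and the
four-document "index" are all made up for illustration. It assumes the
usual clause semantics (MUST, SHOULD, MUST_NOT), where a query with only
MUST_NOT clauses matches nothing because there is nothing to subtract from.

```python
from dataclasses import dataclass, field


@dataclass
class BoolQuery:
    """Hypothetical stand-in for a boolean query with three clause types."""
    must: list = field(default_factory=list)      # printed as +clause
    should: list = field(default_factory=list)    # printed with no prefix
    must_not: list = field(default_factory=list)  # printed as -clause


def evaluate(query, all_docs):
    """Return the set of matching doc ids. A leaf is a plain set of ids."""
    if isinstance(query, (set, frozenset)):
        return set(query)
    must = [evaluate(q, all_docs) for q in query.must]
    should = [evaluate(q, all_docs) for q in query.should]
    must_not = [evaluate(q, all_docs) for q in query.must_not]
    if must:
        hits = set.intersection(*must)
    elif should:
        hits = set.union(*should)   # no MUST: at least one SHOULD must match
    else:
        hits = set()                # purely negative: nothing to subtract from
    for excluded in must_not:
        hits -= excluded
    return hits


# Made-up index: doc ids containing each term.
ALL_DOCS = {1, 2, 3, 4}
batman = {1, 2}
indiana_jones = {1, 2, 3}
cancer = {2, 4}

and_clause = BoolQuery(must=[batman, indiana_jones])

# a) (+batman +"indiana jones") -cancer
query_a = BoolQuery(should=[and_clause], must_not=[cancer])

# b) (+batman +"indiana jones") (-cancer +*:*)
query_b = BoolQuery(
    should=[and_clause, BoolQuery(must=[ALL_DOCS], must_not=[cancer])]
)

# A bare negative clause on its own.
pure_not = BoolQuery(must_not=[cancer])

print(sorted(evaluate(query_a, ALL_DOCS)))   # [1]
print(sorted(evaluate(query_b, ALL_DOCS)))   # [1, 2, 3]
print(sorted(evaluate(pure_not, ALL_DOCS)))  # []
```

In this model, (a) returns only the AND matches that do not mention
"cancer", while (b) returns the union you expected, because the *:* gives
the negative clause a set to subtract from.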

Best,
Erick

On Mon, Oct 2, 2017 at 10:41 AM, Michael Joyner <mich...@newsrx.com> wrote:
> Hello all,
>
> What is the difference between the following two queries that causes them to
> give different results? Is there a parsing issue with "OR NOT" or is
> something else going on?
>
> a) ("batman" AND "indiana jones") OR NOT ("cancer") /*only seems to match
> the and clause*/
>
> parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 |
> (_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1)
> +((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 |
> (_text_txt_en_split:"indiana jone")^0.1)) -(+((_text_ws:cancer)^2.0 |
> (_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1))))
>
> b) ("batman" AND "indiana jones") OR (NOT ("cancer")) /*gives the results we
> expected*/
>
> parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 |
> (_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1)
> +((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 |
> (_text_txt_en_split:"indiana jone")^0.1)) (-(+((_text_ws:cancer)^2.0 |
> (_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1)) +*:*)^1.0))
>
> The first thing I notice is the '+*:*)^1.0' component in the 2nd query's
> parsedquery which is not in the 1st query's parsedquery response. The first
> query does not seem to be matching any of the "NOT" articles to include in
> the union of sets and is not giving us the expected results. Is wrapping
> "NOT" a general requirement when preceded by an operator?
>
> We are using SolrCloud 6.6 and are using q.op=AND with edismax.
>
> Thanks!
>
> -Michael/NewsRx
>
> Full debug outputs:
>
> {rawquerystring={!boost
> b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND
> "indiana jones") OR NOT ("cancer")), querystring={!boost
> b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND
> "indiana jones") OR NOT ("cancer")),
> parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 |
> (_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1)
> +((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 |
> (_text_txt_en_split:"indiana jone")^0.1)) -(+((_text_ws:cancer)^2.0 |
> (_text_txt:cancer)^0.5 |
> (_text_txt_en_split:cancer)^0.1)))),1.0/(3.16E-11*float(ms(const(1506916800000),date(issuedate_tdt)))+1.0))),
> parsedquery_toString=boost(+(+((+((_text_ws:batman)^2.0 |
> (_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1)
> +((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 |
> (_text_txt_en_split:"indiana jone")^0.1)) -(+((_text_ws:cancer)^2.0 |
> (_text_txt:cancer)^0.5 |
> (_text_txt_en_split:cancer)^0.1)))),1.0/(3.16E-11*float(ms(const(1506916800000),date(issuedate_tdt)))+1.0)),
> QParser=ExtendedDismaxQParser, altquerystring=null, boost_queries=null,
> parsed_boost_queries=[], boostfuncs=null,
> boost_str=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1),
> boost_parsed=org.apache.lucene.queries.function.valuesource.ReciprocalFloatFunction:1.0/(3.16E-11*float(ms(const(1506916800000),date(issuedate_tdt)))+1.0),
> filter_queries=[issuedate_tdt:[2000\-09\-18T04\:00\:00Z/DAY TO
> 2017\-10\-02T04\:00\:00Z/DAY+1DAY}, types_ss:(TrademarkApp OR Stockmarket OR
> AllClinicalTrials OR PressRelease OR Patent OR SEC OR Scholarly OR
> ClinicalTrial)], parsed_filter_queries=[+issuedate_tdt:[969249600000 TO
> 1507003200000}, +(types_ss:TrademarkApp types_ss:Stockmarket
> types_ss:AllClinicalTrials types_ss:PressRelease types_ss:Patent
> types_ss:SEC types_ss:Scholarly types_ss:ClinicalTrial)]}
>
> {rawquerystring={!boost
> b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND
> "indiana jones") OR (NOT ("cancer"))), querystring={!boost
> b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND
> "indiana jones") OR (NOT ("cancer"))),
> parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 |
> (_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1)
> +((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 |
> (_text_txt_en_split:"indiana jone")^0.1)) (-(+((_text_ws:cancer)^2.0 |
> (_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1))
> +*:*)^1.0)),1.0/(3.16E-11*float(ms(const(1506916800000),date(issuedate_tdt)))+1.0))),
> parsedquery_toString=boost(+(+((+((_text_ws:batman)^2.0 |
> (_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1)
> +((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 |
> (_text_txt_en_split:"indiana jone")^0.1)) (-(+((_text_ws:cancer)^2.0 |
> (_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1))
> +*:*)^1.0)),1.0/(3.16E-11*float(ms(const(1506916800000),date(issuedate_tdt)))+1.0)),
> QParser=ExtendedDismaxQParser, altquerystring=null, boost_queries=null,
> parsed_boost_queries=[], boostfuncs=null,
> boost_str=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1),
> boost_parsed=org.apache.lucene.queries.function.valuesource.ReciprocalFloatFunction:1.0/(3.16E-11*float(ms(const(1506916800000),date(issuedate_tdt)))+1.0),
> filter_queries=[issuedate_tdt:[2000\-09\-18T04\:00\:00Z/DAY TO
> 2017\-10\-02T04\:00\:00Z/DAY+1DAY}, types_ss:(TrademarkApp OR Stockmarket OR
> AllClinicalTrials OR PressRelease OR Patent OR SEC OR Scholarly OR
> ClinicalTrial)], parsed_filter_queries=[+issuedate_tdt:[969249600000 TO
> 1507003200000}, +(types_ss:TrademarkApp types_ss:Stockmarket
> types_ss:AllClinicalTrials types_ss:PressRelease types_ss:Patent
> types_ss:SEC types_ss:Scholarly types_ss:ClinicalTrial)]}
