Re: Solr Clustering Issue

Upayavira Thu, 23 Jul 2015 07:37:39 -0700

I've seen something like this on another system - where the OR is
consumed as a query term rather than an operator.


Remember that edismax first tries to parse the query with the full Lucene
syntax (which supports OR, etc.), and only falls back to dismax-style
parsing if that attempt hits a syntax error.

What I'd suggest here is trying the same query on the standard /select
URL (i.e. using the lucene query parser) and seeing whether it works there.
Remember to add debugQuery=true so you can see how the query is parsed.
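
A rough sketch of that comparison, in case it helps. It builds the two
debug URLs (one for /clustering, one for /select) and scans a
"parsedquery" string for boolean operators that ended up as field terms.
The helper names (debug_url, operators_parsed_as_terms) are made up for
illustration; the host, collection, and field names are the ones from
the URL quoted below. Passing defType=lucene explicitly is an
assumption on my part, in case /select is configured with a different
default parser. Fetching the URLs is left out on purpose.

```python
import re
from urllib.parse import urlencode

# Values taken from the URL in the thread; adjust for your own setup.
BASE = "http://server1:9100/solr/MYCOL1"
QUERY = "Collection:(COLLECT1008 OR COLLECT2587) AND (amazon AND soap)"


def debug_url(handler, query, df, extra=None, base=BASE):
    """Build a request URL with debugQuery=true (hypothetical helper)."""
    params = {
        "q": query,
        "df": df,
        "wt": "json",
        "indent": "true",
        "rows": "1",
        "debugQuery": "true",
    }
    if extra:
        params.update(extra)
    return f"{base}/{handler}?{urlencode(params)}"


# Matches things like id:OR or text:and, i.e. an operator used as a term.
OPERATOR_AS_TERM = re.compile(r"\w+:(?:AND|OR|NOT)\b", re.IGNORECASE)


def operators_parsed_as_terms(parsedquery):
    """Return the field:operator pairs found in a parsedquery string."""
    return sorted({m.group(0) for m in OPERATOR_AS_TERM.finditer(parsedquery)})


if __name__ == "__main__":
    # Same query against both handlers; compare the "parsedquery" of each.
    print(debug_url("clustering", QUERY, "FULL_DOCUMENT", {"clustering": "true"}))
    print(debug_url("select", QUERY, "FULL_DOCUMENT", {"defType": "lucene"}))
```

Run against the parsedquery quoted below, operators_parsed_as_terms
reports id:AND, id:OR, text:and and text:or, which is the symptom I
described: the operators are being searched for as words.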

Upayavira

On Thu, Jul 23, 2015, at 02:51 PM, Joseph Obernberger wrote:
> Hi Upayavira - the URL was:
> 
> http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debugQuery=true
> 
> Here is the relevant part of the response - notice that the default 
> field (FULL_DOCUMENT) is not in the response, and that it appears to 
> ignore parts of the query string.
> 
> 
>      "rawquerystring":"Collection:(COLLECT1008 OR COLLECT2587) AND 
> (amazon AND soap)",
>      "querystring":"Collection:(COLLECT1008 OR COLLECT2587) AND (amazon 
> AND soap)",
>      "parsedquery":"(+(Collection:(COLLECT1008 
> DisjunctionMaxQuery((id:OR^10.0 | text:or^0.5)) 
> DisjunctionMaxQuery((id:COLLECT2587)^10.0 | text:collect2587^0.5)) 
> DisjunctionMaxQuery((id:AND^10.0 | text:and^0.5)) 
> DisjunctionMaxQuery((id:(amazon^10.0 | text:amazon^0.5)) 
> DisjunctionMaxQuery((id:AND^10.0 | text:and^0.5)) 
> DisjunctionMaxQuery((id:soap)^10.0 | text:soap^0.5))))/no_coord",
>      "parsedquery_toString":"+(Collection:(COLLECT1008 (id:OR^10.0 | 
> text:or^0.5) (id:COLLECT2587)^10.0 | text:collect2587^0.5) (id:AND^10.0 
> | text:and^0.5) (id:(amazon^10.0 | text:amazon^0.5) (id:AND^10.0 | 
> text:and^0.5) (id:soap)^10.0 | text:soap^0.5))",
>      "QParser":"ExtendedDismaxQParser",
>      "altquerystring":null,
>      "boost_queries":null,
>      "parsed_boost_queries":[],
>      "boostfuncs":null,
>      "explain":{
>        "COLLECT20001188691550":"\n0.05504096 = product of:\n 0.09632167 
> = sum of:\n    0.0077696578 = max of:\n      0.0077696578 = 
> weight(text:and^0.5 in 209834) [DefaultSimilarity], result of:\n        
> 0.0077696578 = score(doc=209834,freq=1.0), product of:\n          
> 0.005366315 = queryWeight, product of:\n 0.5 = boost\n            
> 4.633143 = idf(docFreq=431817, maxDocs=16336337)\n            
> 0.0023164903 = queryNorm\n 1.4478571 = fieldWeight in 209834, product 
> of:\n            1.0 = tf(freq=1.0), with freq of:\n              1.0 = 
> termFreq=1.0\n            4.633143 = idf(docFreq=431817, 
> maxDocs=16336337)\n            0.3125 = fieldNorm(doc=209834)\n 
> 0.03348729 = max of:\n      0.03348729 = weight(text:amazon^0.5 in 
> 209834) [DefaultSimilarity], result of:\n        0.03348729 = 
> score(doc=209834,freq=1.0), product of:\n          0.01114077 = 
> queryWeight, product of:\n            0.5 = boost\n 9.618664 = 
> idf(docFreq=2951, maxDocs=16336337)\n 0.0023164903 = 
> queryNorm\n          3.0058324 = fieldWeight in 209834, product 
> of:\n            1.0 = tf(freq=1.0), with freq of:\n              1.0 = 
> termFreq=1.0\n            9.618664 = idf(docFreq=2951, 
> maxDocs=16336337)\n            0.3125 = fieldNorm(doc=209834)\n    
> 0.0077696578 = max of:\n 0.0077696578 = weight(text:and^0.5 in 209834) 
> [DefaultSimilarity], result of:\n        0.0077696578 = 
> score(doc=209834,freq=1.0), product of:\n          0.005366315 = 
> queryWeight, product of:\n            0.5 = boost\n            4.633143 
> = idf(docFreq=431817, maxDocs=16336337)\n            0.0023164903 = 
> queryNorm\n          1.4478571 = fieldWeight in 209834, product 
> of:\n            1.0 = tf(freq=1.0), with freq of:\n 1.0 = 
> termFreq=1.0\n            4.633143 = idf(docFreq=431817, 
> maxDocs=16336337)\n            0.3125 = fieldNorm(doc=209834)\n 
> 0.047295064 = max of:\n      0.047295064 = weight(text:soap^0.5 in 
> 209834) [DefaultSimilarity], result of:\n        0.047295064 = 
> score(doc=209834,freq=1.0), product of:\n          0.013239852 = 
> queryWeight, product of:\n            0.5 = boost\n 11.430959 = 
> idf(docFreq=481, maxDocs=16336337)\n 0.0023164903 = queryNorm\n          
> 3.5721745 = fieldWeight in 209834, product of:\n            1.0 = 
> tf(freq=1.0), with freq of:\n              1.0 = 
> termFreq=1.0\n            11.430959 = idf(docFreq=481, 
> maxDocs=16336337)\n            0.3125 = fieldNorm(doc=209834)\n  
> 0.5714286 = coord(4/7)\n"}}}
> 
> On 7/22/2015 3:36 PM, Upayavira wrote:
> > I'd be curious to see the parsed query that you get when adding
> > debugQuery=true to the URL. I bet that the clustering component is
> > extracting terms from the parsed query, and perhaps each of those
> > queries is parsed in some way differently?
> >
> > Upayavira
> >
> > On Wed, Jul 22, 2015, at 08:29 PM, Joseph Obernberger wrote:
> >> Upon further investigation, it looks like it is either ignoring the
> >> default field, or when the default field is specified the rest of the
> >> query is ignored.
> >>
> >> Example:
> >> q=Field1:(term1 OR term2) AND (item1 OR item2)&df=Field2
> >> that does not cluster correctly, but this does:
> >> q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)
> >>
> >> -Joe
> >>
> >> On 7/22/2015 3:21 PM, Joseph Obernberger wrote:
> >>> Hi - I'm using carrot2 inside of solr cloud and have noticed that
> >>> queries that involve parentheses don't seem to work correctly.  For
> >>> example if I have:
> >>> q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)
> >>> The clustering seems to ignore the values in parentheses.  If instead
> >>> I do:
> >>> q=(Field1:term1 OR Field1:term2) AND (Field2:item1 OR Field2:item2)
> >>> this works as expected.  Anyone else having this issue?  Apart from
> >>> re-writing the cluster query, I'm not sure of a solution.
> >>>
> >>> Thank you!
> >>>
> >>> -Joe
> >>>
>
