Re: Solr Clustering Issue

Joseph Obernberger Thu, 23 Jul 2015 06:52:21 -0700

Hi Upayavira - the URL was:

http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debugQuery=true
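To replay this request from a script instead of a browser, the parameters can be assembled with Python's standard library. This is a minimal sketch that only builds the URL and makes no network call. `build_url` is a hypothetical helper, and the host, collection, and parameters are simply those in the URL above. `urlencode` percent-encodes the colons and parentheses that the hand-typed URL leaves literal, but both forms decode to the same `q`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Host, collection, and parameters are taken from the URL quoted above.
BASE = "http://server1:9100/solr/MYCOL1/clustering"
params = {
    "q": "Collection:(COLLECT1008 OR COLLECT2587) AND (amazon AND soap)",
    "wt": "json",
    "indent": "true",
    "clustering": "true",
    "rows": "1",
    "df": "FULL_DOCUMENT",
    "debugQuery": "true",
}


def build_url(base, query_params):
    """Hypothetical helper: append percent-encoded parameters to base.

    urlencode() encodes spaces as '+', and ':' and parentheses as %3A, %28, %29.
    """
    return base + "?" + urlencode(query_params)


url = build_url(BASE, params)
print(url)

# Decoding the generated query string yields exactly the original parameters.
decoded = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
assert decoded == params
```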

Here is the relevant part of the response. Notice that the default field (FULL_DOCUMENT) does not appear anywhere in it, and that parts of the query string appear to be ignored.

    "rawquerystring":"Collection:(COLLECT1008 OR COLLECT2587) AND (amazon AND soap)",
    "querystring":"Collection:(COLLECT1008 OR COLLECT2587) AND (amazon AND soap)",
    "parsedquery":"(+(Collection:(COLLECT1008 DisjunctionMaxQuery((id:OR^10.0 | text:or^0.5)) DisjunctionMaxQuery((id:COLLECT2587)^10.0 | text:collect2587^0.5)) DisjunctionMaxQuery((id:AND^10.0 | text:and^0.5)) DisjunctionMaxQuery((id:(amazon^10.0 | text:amazon^0.5)) DisjunctionMaxQuery((id:AND^10.0 | text:and^0.5)) DisjunctionMaxQuery((id:soap)^10.0 | text:soap^0.5))))/no_coord",
    "parsedquery_toString":"+(Collection:(COLLECT1008 (id:OR^10.0 | text:or^0.5) (id:COLLECT2587)^10.0 | text:collect2587^0.5) (id:AND^10.0 | text:and^0.5) (id:(amazon^10.0 | text:amazon^0.5) (id:AND^10.0 | text:and^0.5) (id:soap)^10.0 | text:soap^0.5))",

    "QParser":"ExtendedDismaxQParser",
    "altquerystring":null,
    "boost_queries":null,
    "parsed_boost_queries":[],
    "boostfuncs":null,
    "explain":{

      "COLLECT20001188691550":"\n0.05504096 = product of:\n 0.09632167 = sum of:\n 0.0077696578 = max of:\n 0.0077696578 = weight(text:and^0.5 in 209834) [DefaultSimilarity], result of:\n 0.0077696578 = score(doc=209834,freq=1.0), product of:\n 0.005366315 = queryWeight, product of:\n 0.5 = boost\n 4.633143 = idf(docFreq=431817, maxDocs=16336337)\n 0.0023164903 = queryNorm\n 1.4478571 = fieldWeight in 209834, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 4.633143 = idf(docFreq=431817, maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n 0.03348729 = max of:\n 0.03348729 = weight(text:amazon^0.5 in 209834) [DefaultSimilarity], result of:\n 0.03348729 = score(doc=209834,freq=1.0), product of:\n 0.01114077 = queryWeight, product of:\n 0.5 = boost\n 9.618664 = idf(docFreq=2951, maxDocs=16336337)\n 0.0023164903 = queryNorm\n 3.0058324 = fieldWeight in 209834, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 9.618664 = idf(docFreq=2951, maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n 0.0077696578 = max of:\n 0.0077696578 = weight(text:and^0.5 in 209834) [DefaultSimilarity], result of:\n 0.0077696578 = score(doc=209834,freq=1.0), product of:\n 0.005366315 = queryWeight, product of:\n 0.5 = boost\n 4.633143 = idf(docFreq=431817, maxDocs=16336337)\n 0.0023164903 = queryNorm\n 1.4478571 = fieldWeight in 209834, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 4.633143 = idf(docFreq=431817, maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n 0.047295064 = max of:\n 0.047295064 = weight(text:soap^0.5 in 209834) [DefaultSimilarity], result of:\n 0.047295064 = score(doc=209834,freq=1.0), product of:\n 0.013239852 = queryWeight, product of:\n 0.5 = boost\n 11.430959 = idf(docFreq=481, maxDocs=16336337)\n 0.0023164903 = queryNorm\n 3.5721745 = fieldWeight in 209834, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 11.430959 = idf(docFreq=481, maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n 0.5714286 = coord(4/7)\n"}}}

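As a sanity check on the explain output, the final score can be recomputed from the numbers it reports. This sketch assumes the classic Lucene TF-IDF (DefaultSimilarity) layout shown above: each matching clause scores queryWeight times fieldWeight, the clause scores are summed, and the sum is multiplied by coord. All constants are copied from the explain tree; `clause_score` is a hypothetical helper name.

```python
# Constants copied from the explain output above (classic TF-IDF scoring).
QUERY_NORM = 0.0023164903
FIELD_NORM = 0.3125   # fieldNorm(doc=209834)
BOOST = 0.5           # the ^0.5 boost on the text field
COORD = 4 / 7         # coord(4/7): 4 of the 7 query clauses matched

# idf of each matching text: term, in the order the clauses appear.
matched_idf = [
    ("and", 4.633143),
    ("amazon", 9.618664),
    ("and", 4.633143),
    ("soap", 11.430959),
]


def clause_score(idf, tf=1.0):
    """Hypothetical helper: score = queryWeight * fieldWeight for one term."""
    query_weight = BOOST * idf * QUERY_NORM   # boost * idf * queryNorm
    field_weight = tf * idf * FIELD_NORM      # tf * idf * fieldNorm
    return query_weight * field_weight


scores = [clause_score(idf) for _, idf in matched_idf]
total = sum(scores) * COORD
for (term, _), s in zip(matched_idf, scores):
    print(f"text:{term:<7} {s:.8f}")
print(f"final score {total:.8f}")
```

The recomputed values agree with the explain tree to within float rounding. The 7 in coord(4/7) lines up with the seven whitespace-separated tokens of q, and the parsed query shows OR and AND being searched as ordinary terms.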

On 7/22/2015 3:36 PM, Upayavira wrote:

I'd be curious to see the parsed query that you get when adding
debugQuery=true to the URL. I bet that the clustering component is
extracting terms from the parsed query, and perhaps each of those
queries is parsed differently in some way?
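Following the debugQuery suggestion, the parsed query can also be inspected programmatically. This is a minimal sketch, assuming the wt=json layout in which debug output sits under a top-level "debug" key; the sample JSON reuses the rawquerystring, QParser, and parsedquery_toString values from Joe's reply, and `fields_in_parsed_query` is a hypothetical helper. It lists the field names the parser actually queried, which shows at a glance whether the df field (FULL_DOCUMENT) was applied.

```python
import json
import re

# Minimal stand-in for a wt=json&debugQuery=true response; the values are
# copied from Joe's reply and every other key is omitted.
SAMPLE = json.dumps({
    "debug": {
        "rawquerystring": "Collection:(COLLECT1008 OR COLLECT2587) AND (amazon AND soap)",
        "QParser": "ExtendedDismaxQParser",
        "parsedquery_toString": (
            "+(Collection:(COLLECT1008 (id:OR^10.0 | text:or^0.5) "
            "(id:COLLECT2587)^10.0 | text:collect2587^0.5) "
            "(id:AND^10.0 | text:and^0.5) (id:(amazon^10.0 | text:amazon^0.5) "
            "(id:AND^10.0 | text:and^0.5) (id:soap)^10.0 | text:soap^0.5))"
        ),
    }
})

# In Lucene's query toString, a field reference is written as "name:".
FIELD_REF = re.compile(r"([A-Za-z_]\w*):")


def fields_in_parsed_query(response_text):
    """Return the sorted field names referenced by the parsed query."""
    debug = json.loads(response_text)["debug"]
    return sorted(set(FIELD_REF.findall(debug["parsedquery_toString"])))


used = fields_in_parsed_query(SAMPLE)
print(used)                      # fields the parser really queried
print("FULL_DOCUMENT" in used)   # was the df field applied?
```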

Upayavira

On Wed, Jul 22, 2015, at 08:29 PM, Joseph Obernberger wrote:

Upon further investigation, it looks like it is either ignoring the
default field, or when the default field is specified the rest of the
query is ignored.

Example:
q=Field1:(term1 OR term2) AND (item1 OR item2)&df=Field2
That does not cluster correctly, but this does:
q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)
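Since Joe reports that the explicitly qualified form clusters correctly, one client-side workaround is to apply df yourself before sending q. The helper below is hypothetical (not a Solr or Carrot2 API) and only handles flat, unnested parenthesized groups that start q or follow whitespace; whether it fixes clustering in a given setup is untested here.

```python
import re

# A flat parenthesized group that starts q or follows whitespace, i.e. one
# that has no "field:" prefix attached to it.
BARE_GROUP = re.compile(r"(?<!\S)\(([^()]*)\)")


def qualify_bare_groups(q, default_field):
    """Hypothetical helper: prefix unqualified (a OR b) groups with default_field.

    Only flat, unnested groups are handled; quoted phrases containing
    parentheses are not special-cased.
    """
    return BARE_GROUP.sub(lambda m: f"{default_field}:({m.group(1)})", q)


print(qualify_bare_groups("Field1:(term1 OR term2) AND (item1 OR item2)", "Field2"))
```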

-Joe

On 7/22/2015 3:21 PM, Joseph Obernberger wrote:

Hi - I'm using Carrot2 inside SolrCloud and have noticed that
queries involving parentheses don't seem to work correctly. For
example, if I have:
q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)
The clustering seems to ignore the values in parentheses. If I
instead do:
q=(Field1:term1 OR Field1:term2) AND (Field2:item1 OR Field2:item2)
this works as expected. Is anyone else having this issue? Apart from
rewriting the clustering query, I'm not sure of a solution.
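The rewrite described here, turning Field:(a OR b) into (Field:a OR Field:b), can be automated on the client. This is a sketch of a hypothetical helper, not part of Solr or Carrot2: it assumes whitespace-separated single terms and the operators AND, OR, and NOT, and it does not handle nested groups or quoted phrases.

```python
import re

# A field name followed by a flat (unnested) parenthesized group,
# e.g. "Field1:(term1 OR term2)".
GROUP = re.compile(r"(\w+):\(([^()]*)\)")
OPERATORS = {"AND", "OR", "NOT"}


def expand_field_groups(q):
    """Hypothetical helper: rewrite Field:(a OR b) as (Field:a OR Field:b).

    Handles only whitespace-separated single terms and boolean operators;
    nested groups and quoted phrases are left unsupported.
    """
    def expand(m):
        field, body = m.group(1), m.group(2)
        # Keep operators as-is; qualify every other token with the field.
        parts = [tok if tok in OPERATORS else f"{field}:{tok}" for tok in body.split()]
        return "(" + " ".join(parts) + ")"

    return GROUP.sub(expand, q)


print(expand_field_groups("Field1:(term1 OR term2) AND Field2:(item1 OR item2)"))
```

Unqualified groups such as (amazon AND soap) are left untouched, since they carry no field prefix to distribute.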

Thank you!

-Joe
