Hi Upayavira - the URL was:
http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debugQuery=true
Here is the relevant part of the response. Notice that the default
field (FULL_DOCUMENT) does not appear anywhere in the response, and
that the parser appears to ignore parts of the query string.
"rawquerystring":"Collection:(COLLECT1008 OR COLLECT2587) AND
(amazon AND soap)",
"querystring":"Collection:(COLLECT1008 OR COLLECT2587) AND (amazon
AND soap)",
"parsedquery":"(+(Collection:(COLLECT1008
DisjunctionMaxQuery((id:OR^10.0 | text:or^0.5))
DisjunctionMaxQuery((id:COLLECT2587)^10.0 | text:collect2587^0.5))
DisjunctionMaxQuery((id:AND^10.0 | text:and^0.5))
DisjunctionMaxQuery((id:(amazon^10.0 | text:amazon^0.5))
DisjunctionMaxQuery((id:AND^10.0 | text:and^0.5))
DisjunctionMaxQuery((id:soap)^10.0 | text:soap^0.5))))/no_coord",
"parsedquery_toString":"+(Collection:(COLLECT1008 (id:OR^10.0 |
text:or^0.5) (id:COLLECT2587)^10.0 | text:collect2587^0.5) (id:AND^10.0
| text:and^0.5) (id:(amazon^10.0 | text:amazon^0.5) (id:AND^10.0 |
text:and^0.5) (id:soap)^10.0 | text:soap^0.5))",
"QParser":"ExtendedDismaxQParser",
"altquerystring":null,
"boost_queries":null,
"parsed_boost_queries":[],
"boostfuncs":null,
"explain":{
"COLLECT20001188691550":"\n0.05504096 = product of:\n 0.09632167
= sum of:\n 0.0077696578 = max of:\n 0.0077696578 =
weight(text:and^0.5 in 209834) [DefaultSimilarity], result of:\n
0.0077696578 = score(doc=209834,freq=1.0), product of:\n
0.005366315 = queryWeight, product of:\n 0.5 = boost\n
4.633143 = idf(docFreq=431817, maxDocs=16336337)\n
0.0023164903 = queryNorm\n 1.4478571 = fieldWeight in 209834, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 =
termFreq=1.0\n 4.633143 = idf(docFreq=431817,
maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n
0.03348729 = max of:\n 0.03348729 = weight(text:amazon^0.5 in
209834) [DefaultSimilarity], result of:\n 0.03348729 =
score(doc=209834,freq=1.0), product of:\n 0.01114077 =
queryWeight, product of:\n 0.5 = boost\n 9.618664 =
idf(docFreq=2951, maxDocs=16336337)\n 0.0023164903 =
queryNorm\n 3.0058324 = fieldWeight in 209834, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 =
termFreq=1.0\n 9.618664 = idf(docFreq=2951,
maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n
0.0077696578 = max of:\n 0.0077696578 = weight(text:and^0.5 in 209834)
[DefaultSimilarity], result of:\n 0.0077696578 =
score(doc=209834,freq=1.0), product of:\n 0.005366315 =
queryWeight, product of:\n 0.5 = boost\n 4.633143
= idf(docFreq=431817, maxDocs=16336337)\n 0.0023164903 =
queryNorm\n 1.4478571 = fieldWeight in 209834, product
of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 =
termFreq=1.0\n 4.633143 = idf(docFreq=431817,
maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n
0.047295064 = max of:\n 0.047295064 = weight(text:soap^0.5 in
209834) [DefaultSimilarity], result of:\n 0.047295064 =
score(doc=209834,freq=1.0), product of:\n 0.013239852 =
queryWeight, product of:\n 0.5 = boost\n 11.430959 =
idf(docFreq=481, maxDocs=16336337)\n 0.0023164903 = queryNorm\n
3.5721745 = fieldWeight in 209834, product of:\n 1.0 =
tf(freq=1.0), with freq of:\n 1.0 =
termFreq=1.0\n 11.430959 = idf(docFreq=481,
maxDocs=16336337)\n 0.3125 = fieldNorm(doc=209834)\n
0.5714286 = coord(4/7)\n"}}}
On 7/22/2015 3:36 PM, Upayavira wrote:
I'd be curious to see the parsed query that you get when adding
debugQuery=true to the URL. I bet that the clustering component is
extracting terms from the parsed query, and perhaps each of those
queries is parsed in some way differently?
Upayavira
On Wed, Jul 22, 2015, at 08:29 PM, Joseph Obernberger wrote:
Upon further investigation, it looks like clustering is either
ignoring the default field or, when the default field is specified,
ignoring the rest of the query.
Example:
q=Field1:(term1 OR term2) AND (item1 OR item2)&df=Field2
That does not cluster correctly, but this does:
q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)
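One way to compare the two forms side by side is to request both with
debugQuery=true and look at the parsedquery value each one returns. A
minimal sketch in Python, assuming the host, collection, and handler
from the URL quoted at the top of this thread; the helper name
clustering_url is hypothetical and only builds the URLs, it does not
send them:

```python
from urllib.parse import urlencode

# Assumption: base URL copied from the request quoted in this thread.
BASE = "http://server1:9100/solr/MYCOL1/clustering"


def clustering_url(q, df=None):
    """Build a clustering request URL with debugQuery enabled.

    Hypothetical helper: it only URL-encodes the parameters so that
    spaces, colons, and parentheses in q survive the round trip.
    """
    params = {
        "q": q,
        "wt": "json",
        "clustering": "true",
        "rows": "1",
        "debugQuery": "true",
    }
    if df is not None:
        params["df"] = df
    return BASE + "?" + urlencode(params)


# Form that relies on the default field for the second group.
with_df = clustering_url("Field1:(term1 OR term2) AND (item1 OR item2)", df="Field2")
# Form that names the field explicitly on the second group.
explicit = clustering_url("Field1:(term1 OR term2) AND Field2:(item1 OR item2)")

print(with_df)
print(explicit)
```

Fetching both URLs and comparing the parsedquery entries should show
whether the df parameter is being applied at all.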
-Joe
On 7/22/2015 3:21 PM, Joseph Obernberger wrote:
Hi - I'm using Carrot2 inside of SolrCloud and have noticed that
queries that involve parentheses don't seem to work correctly. For
example, if I have:
q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)
The clustering seems to ignore the values in parentheses. If instead
I do:
q=(Field1:term1 OR Field1:term2) AND (Field2:item1 OR Field2:item2)
this works as expected. Is anyone else having this issue? Apart from
rewriting the cluster query, I'm not sure of a solution.
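For reference, the rewrite can be done mechanically. A rough sketch in
Python, assuming each group is a single field with plain terms; the
function names expand_field_group and build_query are hypothetical,
and no escaping of Solr special characters is attempted:

```python
def expand_field_group(field, terms, op="OR"):
    """Return '(field:t1 OP field:t2 ...)' with the field repeated per term.

    Assumption: terms are plain tokens; special characters are not escaped.
    """
    if not terms:
        raise ValueError("terms must not be empty")
    joined = f" {op} ".join(f"{field}:{term}" for term in terms)
    return f"({joined})"


def build_query(groups, op="AND"):
    """Join several expanded field groups with the given operator."""
    return f" {op} ".join(expand_field_group(field, terms) for field, terms in groups)


q = build_query([
    ("Field1", ["term1", "term2"]),
    ("Field2", ["item1", "item2"]),
])
print(q)
```

This produces the fully qualified form shown above, so the application
can keep building queries from field/term lists without relying on the
grouped Field:(...) syntax.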
Thank you!
-Joe