Hello Mike,

Structured queries in Solr are rather cumbersome. Start from:

q=+{!dismax v="skirt" qf="name"} +{!parent which=content_type:product score=min v=$childq}&childq=+in_stock:true^=0 {!func}list_price_gbp&...

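A minimal sketch of how such a request could be built so that every + reaches Solr as %2B, assuming a Python client. The host, port and core name are hypothetical, the sort and debugQuery parameters are carried over from the original message, and the parent clause is written with v=$childq on the assumption that the child query is meant to be picked up from the separate childq parameter through parameter dereferencing.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/products/select"

params = {
    # Both clauses are mandatory (+); the parent clause takes its child
    # query from the separate childq parameter via $childq.
    "q": '+{!dismax v="skirt" qf="name"} '
         '+{!parent which=content_type:product score=min v=$childq}',
    # ^=0 gives the stock filter a constant score of 0, so only the
    # function query contributes to the child score.
    "childq": "+in_stock:true^=0 {!func}list_price_gbp",
    "sort": "score asc",
    "debugQuery": "on",
}

# urlencode() percent-encodes each value: a literal "+" becomes "%2B"
# and a space becomes "+", so the two cannot be confused on the wire.
query_string = urlencode(params)
url = f"{SOLR_SELECT}?{query_string}"

# Decoding the query string again recovers the original parameters.
decoded = parse_qs(query_string)

print(url)
```

With debugQuery=on, the parsed query entry in the debug section of the response shows how Solr actually interpreted q, which is a quick way to confirm that no + was lost in transit.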
Besides "explain", the debug output has a parsed query entry that is more useful for troubleshooting. Please also make sure that each + is properly encoded as %2B so that it survives the HTTP request.

On Fri, Nov 18, 2016 at 2:14 PM, Mike Allen <mike.al...@thecommercepartnership.com> wrote:

> Apologies if I'm doing something incredibly stupid as I'm new to Solr. I am
> having an issue with scoring child documents in a block join query when
> including a dismax query. I'm actually a little unclear on whether or not
> that's a complete oxymoron, combining dismax and block join.
>
> Problem statement: Given a set of Product documents - which contain the
> product names and descriptions - which contain nested variant documents (see
> below for abridged example) - which contain the boolean stock status
> (in_stock) and the variant prices (list_price_gbp) - I want to do a Dismax
> query of, say, "skirt" on the product name (name) and sort the resulting
> product documents by the minimum price (list_price_gbp) of their child
> variant documents. Note that, although the abridged document doesn't show
> them, there are a number of other arbitrary fields which may be used as
> filter queries on the child documents, for example size or colour, which
> will in effect change the "active" minimum price of a product. Hence,
> denormalizing, or flattening, the documents is not really an option I want
> to pursue.
>
> An abridged example document returned by the Solr Admin Query console which
> I am querying:
>
> <doc>
>   <str name="id">12345</str>
>   <str name="content_type">product</str>
>   <str name="name">black flared skirt</str>
>   <float name="min_list_price_gbp">40.0</float>
>   <result name="doc" numFound="2" start="0">
>     <doc>
>       <str name="skuid">12345abcd</str>
>       <str name="productid">12345</str>
>       <str name="content_type">variant</str>
>       <float name="list_price_gbp">65.0</float>
>       <bool name="in_stock">true</bool>
>     </doc>
>     <doc>
>       <str name="skuid">12345fghi</str>
>       <str name="productid">12345</str>
>       <str name="content_type">variant</str>
>       <float name="list_price_gbp">40.0</float>
>       <bool name="in_stock">true</bool>
>     </doc>
>   </result>
> </doc>
>
> So I am familiar with the block join score mode; setting aside the dismax
> aspect for now, this query, using the Function Query {!func}list_price_gbp,
> with score ascending, returns documents ordered correctly, with a £2.00
> (cheapest) product first:
>
> q={!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> The "explain" for this is:
>
> 2.0000184 = Score based on 1 child docs in range from 26752 to 26752, best match:
>   2.0000184 = sum of:
>     1.8374416E-5 = weight(in_stock:T in 26752) [], result of:
>       1.8374416E-5 = score(doc=26752,freq=1.0 = termFreq=1.0
>       ), product of:
>         1.8374416E-5 = idf(docFreq=27211, docCount=27211)
>         1.0 = tfNorm, computed from:
>           1.0 = termFreq=1.0
>           1.2 = parameter k1
>           0.0 = parameter b (norms omitted for field)
>     2.0 = FunctionQuery(float(list_price_gbp)), product of:
>       2.0 = float(list_price_gbp)=2.0
>       1.0 = boost
>       1.0 = queryNorm
>
> Even though this is doing what I want, I have a slight
> niggle that the overall score is not just the result of the Function Query;
> however, as all results get the same tiny fraction added, it doesn't matter.
>
> However, when I prepend my dismax query:
>
> q={!dismax v="skirt" qf="name"}+{!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> The scoring is only dependent on the dismax scoring, where the "explain" for
> this is:
>
> 2.7600822 = sum of:
>   2.7600822 = weight(name:skirt in 13406) [], result of:
>     2.7600822 = score(doc=13406,freq=1.0 = termFreq=1.0
>     ), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       0.76987 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         7.111111 = fieldLength
>
> So in actual fact, with score ascending, it is ordering the results by least
> matching first and the nested document list_price_gbp is irrelevant. I
> strongly suspect I am being totally dumb and that this is expected behaviour
> for an obvious reason that escapes me, apart from perhaps it's because the
> two scoring methods are just plainly incompatible.
>
> I have additionally tried just doing a lucene query instead:
>
> q=+name:skirt +{!parent which=content_type:product score=min} (in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> The "explain" of this indicates it's scoring products, for which
> list_price_gbp simply does not exist, as the Function Query always returns
> zero.
>
> 6243963 = sum of:
>   3.624396 = weight(name:skirt in 18113) [], result of:
>     3.624396 = score(doc=18113,freq=1.0 = termFreq=1.0
>     ), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       1.0109531 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         4.0 = fieldLength
>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>     1.0 = boost
>     1.0 = queryNorm
>   0.0 = FunctionQuery(float(list_price_gbp)), product of:
>     0.0 = float(list_price_gbp)=0.0
>     1.0 = boost
>     1.0 = queryNorm
>
> Indeed, if I change the Function Query field to a product scoped field,
> min_list_price_gbp, like so:
>
> q=+name:skirt +{!parent which=content_type:product score=min}+(in_stock:(true)){!func}min_list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> then the "explain" certainly does show the Function Query evaluating:
>
> 8.624397 = sum of:
>   3.624396 = weight(name:skirt in 17890) [], result of:
>     3.624396 = score(doc=17890,freq=1.0 = termFreq=1.0
>     ), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       1.0109531 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         4.0 = fieldLength
>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>     1.0 = boost
>     1.0 = queryNorm
>   14.0 = FunctionQuery(float(min_list_price_gbp)), product of:
>     14.0 = float(min_list_price_gbp)=14.0
>     1.0 = boost
>     1.0 = queryNorm
>
> My grasp of the syntax is pretty flakey, so I would be immensely grateful if
> someone could point out if I'm just doing something incredibly dumb.
> In my head, I see what I am trying to do as
>
> (some dismax or lucene query on parent document [e.g. "skirt"])
>   => (get a subset of these parent docs based on a block join)
>     => (where the children match a bunch of arbitrary filter queries [e.g. "colour:red"])
>       => (then subquery the child docs that match the same filter queries [e.g. "colour:red"])
>         => (then score this subset of child documents)
>           => (and order by that score)
>
> Is this actually possible? I've been googling about this for a day or so and
> can't quite find anything definitive. I'm going to maybe try and dive into
> the Solr source code, but I'm a C# guy, not Java, and I have no debuggable
> environment set up as I haven't needed one yet, so that could prove pretty
> painful.
>
> Any help would be appreciated, even if it is just "can't be done", as at
> least I could stop chasing my tail.
>
> Mike

--
Sincerely yours
Mikhail Khludnev