Hi Mikhail,

Thanks for your advice. It went a long way towards helping me get the right documents in the first place, especially parameterising the block join with an explicit v, as otherwise it was a nightmare of parser errors. Not to mention I'm still figuring out the nuances of where I need whitespace and where I don't! However, I spent part of the weekend fiddling around with spaces and +'s and I believe I've got it working as I'd hoped.

Again, many thanks,
Mike

-----Original Message-----
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: 18 November 2016 12:58
To: solr-user
Subject: Re: Combined Dismax and Block Join Scoring on nested documents

Hello Mike,

Structured queries in Solr are way cumbersome. Start from:

q=+{!dismax v="skirt" qf="name"} +{!parent which=content_type:product score=min v=childq}&childq=+in_stock:true^=0 {!func}list_price_gbp&...

Besides "explain", there is a parsed query entry in the debug output that is more useful for troubleshooting. Please also make sure that + is properly encoded as %2B so that it gets through HTTP intact.

On Fri, Nov 18, 2016 at 2:14 PM, Mike Allen <mike.al...@thecommercepartnership.com> wrote:

> Apologies if I'm doing something incredibly stupid as I'm new to Solr.
> I am having an issue with scoring child documents in a block join
> query when including a dismax query. I'm actually a little unclear on
> whether or not that's a complete oxymoron, combining dismax and block join.
>
> Problem statement: Given a set of Product documents - which contain
> the product names and descriptions - which contain nested variant
> documents (see below for abridged example) - which contain the boolean
> stock status (in_stock) and the variant prices (list_price_gbp) - I want
> to do a Dismax query of, say, "skirt" on the product name (name) and sort
> the resulting product documents by the minimum price (list_price_gbp) of
> their child variant documents. Note that, although the abridged
> document doesn't show them, there are a number of other arbitrary
> fields which may be used as filter queries on the child documents, for
> example size or colour, which will in effect change the "active"
> minimum price of a product. Hence, denormalizing, or flattening, the
> documents is not really an option I want to pursue.
>
> An abridged example document returned by the Solr Admin Query console
> which I am querying:
>
> <doc>
>   <str name="id">12345</str>
>   <str name="content_type">product</str>
>   <str name="name">black flared skirt</str>
>   <float name="min_list_price_gbp">40.0</float>
>   <result name="doc" numFound="2" start="0">
>     <doc>
>       <str name="skuid">12345abcd</str>
>       <str name="productid">12345</str>
>       <str name="content_type">variant</str>
>       <float name="list_price_gbp">65.0</float>
>       <bool name="in_stock">true</bool>
>     </doc>
>     <doc>
>       <str name="skuid">12345fghi</str>
>       <str name="productid">12345</str>
>       <str name="content_type">variant</str>
>       <float name="list_price_gbp">40.0</float>
>       <bool name="in_stock">true</bool>
>     </doc>
>   </result>
> </doc>
>
> So I am familiar with the block join score mode; setting aside the
> dismax aspect for now, this query, using the Function Query
> {!func}list_price_gbp, with score ascending, returns documents ordered
> correctly, with a £2.00 (cheapest) product first:
>
> q={!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> The "explain" for this is:
>
> 2.0000184 = Score based on 1 child docs in range from 26752 to 26752, best match:
>   2.0000184 = sum of:
>     1.8374416E-5 = weight(in_stock:T in 26752) [], result of:
>       1.8374416E-5 = score(doc=26752,freq=1.0 = termFreq=1.0
> ), product of:
>         1.8374416E-5 = idf(docFreq=27211, docCount=27211)
>         1.0 = tfNorm, computed from:
>           1.0 = termFreq=1.0
>           1.2 = parameter k1
>           0.0 = parameter b (norms omitted for field)
>     2.0 = FunctionQuery(float(list_price_gbp)), product of:
>       2.0 = float(list_price_gbp)=2.0
>       1.0 = boost
>       1.0 = queryNorm
>
> Even though this is doing what I want, I have a slight niggle that the
> overall score is not just the result of the Function Query. However,
> as all results get the same tiny fraction added, it doesn't matter.
>
> However, when I prepend my dismax query:
>
> q={!dismax v="skirt" qf="name"}+{!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> The scoring is only dependent on the dismax scoring, where the "explain"
> for this is:
>
> 2.7600822 = sum of:
>   2.7600822 = weight(name:skirt in 13406) [], result of:
>     2.7600822 = score(doc=13406,freq=1.0 = termFreq=1.0
> ), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       0.76987 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         7.111111 = fieldLength
>
> So in actual fact, with score ascending, it is ordering the results by
> least matching first and the nested document list_price_gbp is
> irrelevant. I strongly suspect I am being totally dumb and that this
> is expected behaviour for an obvious reason that escapes me, apart
> from perhaps it's because the two scoring methods are just plainly
> incompatible.
>
> I have additionally tried just doing a lucene query instead:
>
> q=+name:skirt +{!parent which=content_type:product score=min} (in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> The "explain" of this indicates it's scoring products, for which
> list_price_gbp simply does not exist, as the Function Query always
> returns zero.
>
> 4.6243963 = sum of:
>   3.624396 = weight(name:skirt in 18113) [], result of:
>     3.624396 = score(doc=18113,freq=1.0 = termFreq=1.0
> ), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       1.0109531 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         4.0 = fieldLength
>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>     1.0 = boost
>     1.0 = queryNorm
>   0.0 = FunctionQuery(float(list_price_gbp)), product of:
>     0.0 = float(list_price_gbp)=0.0
>     1.0 = boost
>     1.0 = queryNorm
>
> Indeed, if I change the Function Query field to a product scoped
> field, min_list_price_gbp, like so:
>
> q=+name:skirt +{!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
> then the "explain" certainly does show the Function Query evaluating:
>
> 8.624397 = sum of:
>   3.624396 = weight(name:skirt in 17890) [], result of:
>     3.624396 = score(doc=17890,freq=1.0 = termFreq=1.0
> ), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       1.0109531 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         4.0 = fieldLength
>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>     1.0 = boost
>     1.0 = queryNorm
>   14.0 = FunctionQuery(float(min_list_price_gbp)), product of:
>     14.0 = float(min_list_price_gbp)=14.0
>     1.0 = boost
>     1.0 = queryNorm
>
> My grasp of the syntax is pretty flakey, so I would be immensely
> grateful if someone could point out if I'm just doing something
> incredibly dumb. In my head, I see what I am trying to do as
>
> (some dismax or lucene query on parent document [e.g. "skirt"])
>   => (get a subset of these parent docs based on a block join)
>     => (where the children match a bunch of arbitrary filter queries [e.g. "colour:red"])
>       => (then subquery the child docs that match the same filter queries [e.g. "colour:red"])
>         => (then score this subset of child documents)
>           => (and order by that score)
>
> Is this actually possible? I've been googling about this for a day or
> so and can't quite find anything definitive. I'm going to maybe try
> and dive into the Solr source code, but I'm a C# guy, not Java, and I
> have no debuggable environment set up as I haven't needed one yet, so
> that could prove pretty painful.
>
> Any help would be appreciated, even if it is just "can't be done", as
> at least I could stop chasing my tail.
>
> Mike

--
Sincerely yours
Mikhail Khludnev
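
As an illustration of the encoding point in the reply above, here is a minimal sketch of building the suggested request parameters so that every literal + reaches Solr as %2B. It assumes Python and its standard urllib.parse module, neither of which appears in the thread. The helper name build_query_string is hypothetical, and the block join is written with v=$childq on the assumption that it is meant to dereference the childq request parameter.

```python
from urllib.parse import urlencode, parse_qs


def build_query_string() -> str:
    """Hypothetical helper: encode the parameters from the reply for HTTP."""
    params = {
        # Assumption: v=$childq, so the block join reads the childq parameter.
        "q": '+{!dismax v="skirt" qf="name"} '
             '+{!parent which=content_type:product score=min v=$childq}',
        "childq": "+in_stock:true^=0 {!func}list_price_gbp",
        "sort": "score asc",
        "debugQuery": "on",
    }
    # urlencode percent-encodes each value: a literal "+" becomes "%2B",
    # while a space becomes "+", which is decoded back to a space.
    return urlencode(params)


query_string = build_query_string()

# Decoding the string again should give back the original values, which is
# a quick way to check that no "+" was silently turned into a space.
decoded = parse_qs(query_string)
```

Without this step a raw + in the URL is decoded as a space, so a mandatory clause would silently become an optional one.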