A blog article about what you learned would be very welcome. These edge cases are something other people could certainly learn from. Share the knowledge forward etc.
Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 21 November 2016 at 23:57, Mike Allen <mike.al...@thecommercepartnership.com> wrote:
> Hi Mikhail,
>
> Thanks for your advice, it went a long way towards helping me get the right
> documents in the first place, especially parameterising the block join with an
> explicit v, as otherwise it was a nightmare of parser errors. Not to mention
> I'm still figuring out the nuances of where I need a whitespace and where I
> don't! However, I spent a part of the weekend fiddling around with spaces and
> +'s and I believe I've got it working as I'd hoped.
>
> Again, many thanks,
>
> Mike
>
> -----Original Message-----
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: 18 November 2016 12:58
> To: solr-user
> Subject: Re: Combined Dismax and Block Join Scoring on nested documents
>
> Hello Mike,
> Structured queries in Solr are way cumbersome.
> Start from:
> q=+{!dismax v="skirt" qf="name"} +{!parent which=content_type:product
> score=min v=$childq}&childq=+in_stock:true^=0 {!func}list_price_gbp&...
>
> Besides "explain", there is a parsed query entry in debug that's more useful
> for troubleshooting purposes.
> Please also make sure that + is properly encoded as %2B so it passes the HTTP hurdle.
>
> On Fri, Nov 18, 2016 at 2:14 PM, Mike Allen <
> mike.al...@thecommercepartnership.com> wrote:
>
>> Apologies if I'm doing something incredibly stupid as I'm new to Solr.
>> I am having an issue with scoring child documents in a block join
>> query when including a dismax query. I'm actually a little unclear on
>> whether or not that's a complete oxymoron, combining dismax and block join.
>>
>> Problem statement: Given a set of Product documents - which contain
>> the product names and descriptions - which contain nested variant
>> documents (see below for abridged example) - which contain the boolean
>> stock status (in_stock) and the variant prices (list_price_gbp) - I want
>> to do a Dismax query of, say, "skirt" on the product name (name) and sort
>> the resulting product documents by the minimum price (list_price_gbp) of
>> their child variant documents. Note that, although the abridged
>> document doesn't show them, there are a number of other arbitrary
>> fields which may be used as filter queries on the child documents, for
>> example size or colour, which will in effect change the "active"
>> minimum price of a product. Hence, denormalizing, or flattening, the
>> documents is not really an option I want to pursue.
>>
>> An abridged example document returned by the Solr Admin Query console
>> which I am querying:
>>
>> <doc>
>>   <str name="id">12345</str>
>>   <str name="content_type">product</str>
>>   <str name="name">black flared skirt</str>
>>   <float name="min_list_price_gbp">40.0</float>
>>   <result name="doc" numFound="2" start="0">
>>     <doc>
>>       <str name="skuid">12345abcd</str>
>>       <str name="productid">12345</str>
>>       <str name="content_type">variant</str>
>>       <float name="list_price_gbp">65.0</float>
>>       <bool name="in_stock">true</bool>
>>     </doc>
>>     <doc>
>>       <str name="skuid">12345fghi</str>
>>       <str name="productid">12345</str>
>>       <str name="content_type">variant</str>
>>       <float name="list_price_gbp">40.0</float>
>>       <bool name="in_stock">true</bool>
>>     </doc>
>>   </result>
>> </doc>
>>
>> So I am familiar with the block join score mode; setting aside the
>> dismax aspect for now, this query, using the Function Query
>> {!func}list_price_gbp, with score ascending, returns documents ordered
>> correctly, with a £2.00 (cheapest) product first:
>>
>> q={!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp
>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>> &doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]
>> &sort=score asc&debugQuery=on&wt=xml
>>
>> The "explain" for this is:
>>
>> 2.0000184 = Score based on 1 child docs in range from 26752 to 26752, best match:
>>   2.0000184 = sum of:
>>     1.8374416E-5 = weight(in_stock:T in 26752) [], result of:
>>       1.8374416E-5 = score(doc=26752,freq=1.0 = termFreq=1.0), product of:
>>         1.8374416E-5 = idf(docFreq=27211, docCount=27211)
>>         1.0 = tfNorm, computed from:
>>           1.0 = termFreq=1.0
>>           1.2 = parameter k1
>>           0.0 = parameter b (norms omitted for field)
>>     2.0 = FunctionQuery(float(list_price_gbp)), product of:
>>       2.0 = float(list_price_gbp)=2.0
>>       1.0 = boost
>>       1.0 = queryNorm
>>
>> Even though this is doing what I want, I have a slight niggle that the
>> overall score is not just the result of the Function Query; however,
>> as all results get the same tiny fraction added, it doesn't matter.
>>
>> However, when I prepend my dismax query:
>>
>> q={!dismax v="skirt" qf="name"}+{!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp
>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>> &doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]
>> &sort=score asc&debugQuery=on&wt=xml
>>
>> The scoring is only dependent on the dismax scoring, where the "explain"
>> for this is:
>>
>> 2.7600822 = sum of:
>>   2.7600822 = weight(name:skirt in 13406) [], result of:
>>     2.7600822 = score(doc=13406,freq=1.0 = termFreq=1.0), product of:
>>       3.5851278 = idf(docFreq=103, docCount=3731)
>>       0.76987 = tfNorm, computed from:
>>         1.0 = termFreq=1.0
>>         1.2 = parameter k1
>>         0.75 = parameter b
>>         4.108818 = avgFieldLength
>>         7.111111 = fieldLength
>>
>> So in actual fact, with score ascending, it is ordering the results by
>> least matching first and the nested document list_price_gbp is
>> irrelevant. I strongly suspect I am being totally dumb and that this
>> is expected behaviour for an obvious reason that escapes me, apart
>> from perhaps it's because the two scoring methods are just plainly
>> incompatible.
>>
>> I have additionally tried just doing a lucene query instead:
>>
>> q=+name:skirt +{!parent which=content_type:product score=min} (in_stock:(true)){!func}list_price_gbp
>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>> &doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]
>> &sort=score asc&debugQuery=on&wt=xml
>>
>> The "explain" of this indicates it's scoring products, for which
>> list_price_gbp simply does not exist, as the Function Query always
>> returns zero.
>>
>> 4.6243963 = sum of:
>>   3.624396 = weight(name:skirt in 18113) [], result of:
>>     3.624396 = score(doc=18113,freq=1.0 = termFreq=1.0), product of:
>>       3.5851278 = idf(docFreq=103, docCount=3731)
>>       1.0109531 = tfNorm, computed from:
>>         1.0 = termFreq=1.0
>>         1.2 = parameter k1
>>         0.75 = parameter b
>>         4.108818 = avgFieldLength
>>         4.0 = fieldLength
>>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>>     1.0 = boost
>>     1.0 = queryNorm
>>   0.0 = FunctionQuery(float(list_price_gbp)), product of:
>>     0.0 = float(list_price_gbp)=0.0
>>     1.0 = boost
>>     1.0 = queryNorm
>>
>> Indeed, if I change the Function Query field to a product scoped
>> field, min_list_price_gbp, like so:
>>
>> q=+name:skirt +{!parent which=content_type:product score=min}+(in_stock:(true)){!func}min_list_price_gbp
>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>> &doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]
>> &sort=score asc&debugQuery=on&wt=xml
>>
>> then the "explain" certainly does show the Function Query evaluating:
>>
>> 18.624397 = sum of:
>>   3.624396 = weight(name:skirt in 17890) [], result of:
>>     3.624396 = score(doc=17890,freq=1.0 = termFreq=1.0), product of:
>>       3.5851278 = idf(docFreq=103, docCount=3731)
>>       1.0109531 = tfNorm, computed from:
>>         1.0 = termFreq=1.0
>>         1.2 = parameter k1
>>         0.75 = parameter b
>>         4.108818 = avgFieldLength
>>         4.0 = fieldLength
>>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>>     1.0 = boost
>>     1.0 = queryNorm
>>   14.0 = FunctionQuery(float(min_list_price_gbp)), product of:
>>     14.0 = float(min_list_price_gbp)=14.0
>>     1.0 = boost
>>     1.0 = queryNorm
>>
>> My grasp of the syntax is pretty flakey, so I would be immensely
>> grateful if someone could point out if I'm just doing something
>> incredibly dumb. In my head, I see what I am trying to do as
>>
>> (some dismax or lucene query on parent document [e.g. "skirt"])
>>   => (get a subset of these parent docs based on a block join)
>>     => (where the children match a bunch of arbitrary filter queries [e.g. "colour:red"])
>>       => (then subquery the child docs that match the same filter queries [e.g. "colour:red"])
>>         => (then score this subset of child documents)
>>           => (and order by that score)
>>
>> Is this actually possible? I've been googling about this for a day or
>> so and can't quite find anything definitive. I'm going to maybe try
>> and dive into the Solr source code, but I'm a C# guy, not Java,
>> without a debuggable environment, as I haven't needed one yet, and
>> that could prove pretty painful.
>>
>> Any help would be appreciated, even if it is just "can't be done", as
>> at least I could stop chasing my tail.
>>
>> Mike
>
> --
> Sincerely yours
> Mikhail Khludnev
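
For anyone following up on the %2B point in Mikhail's reply, here is a minimal sketch of building the request with Python's standard library, so that the + operators are percent-encoded rather than being read as spaces. The host and the core name "products" are made up for illustration. The block join is written with v=$childq, on the assumption that Solr's usual $-style parameter dereferencing is what was intended, and the sort and debugQuery values are simply carried over from Mike's queries.

```python
from urllib.parse import urlencode

# Parameters based on the query suggested in this thread.
# Assumption: v=$childq dereferences the separate "childq" parameter.
params = {
    "q": '+{!dismax v="skirt" qf="name"} '
         "+{!parent which=content_type:product score=min v=$childq}",
    "childq": "+in_stock:true^=0 {!func}list_price_gbp",
    "sort": "score asc",
    "debugQuery": "on",
}

# urlencode percent-encodes every value: a literal "+" becomes %2B,
# while a space becomes "+", which is why an unencoded "+" gets lost.
query_string = urlencode(params)

# Hypothetical host and core name, for illustration only.
url = "http://localhost:8983/solr/products/select?" + query_string
print(url)
```

With debugQuery on, the parsed query entry in the debug section of the response is a quick way to confirm that both mandatory clauses reached the parser intact.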