You could do:
*) LinkedIn
*) Wiki
*) Write it up, give it to me and I'll stick it as a guest post on my
   blog (with attribution of your choice)
*) Write it up, give it to Lucidworks and they may (not sure about
   rules) stick it on their blog
Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 22 November 2016 at 02:36, Mike Allen
<mike.al...@thecommercepartnership.com> wrote:
> Sure thing Alex. I don't actually do any personal blogging, but if there's a
> suitable place - the Solr Wiki perhaps - you'd suggest I can write something
> up I'd be more than happy to. What goes around comes around!
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: 21 November 2016 13:01
> To: solr-user
> Subject: Re: Combined Dismax and Block Join Scoring on nested documents
>
> A blog article about what you learned would be very welcome. These edge cases
> are something other people could certainly learn from.
> Share the knowledge forward etc.
>
> Regards,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 21 November 2016 at 23:57, Mike Allen
> <mike.al...@thecommercepartnership.com> wrote:
>> Hi Mikhail,
>>
>> Thanks for your advice, it went a long way towards helping me get the right
>> documents in the first place, especially parameterising the block join with
>> an explicit v, as otherwise it was a nightmare of parser errors. Not to
>> mention I'm still figuring out the nuances of where I need a whitespace and
>> where I don't! However, I spent a part of the weekend fiddling around with
>> spaces and +'s and I believe I've got it working as I'd hoped.
>>
>> Again, many thanks,
>>
>> Mike
>>
>> -----Original Message-----
>> From: Mikhail Khludnev [mailto:m...@apache.org]
>> Sent: 18 November 2016 12:58
>> To: solr-user
>> Subject: Re: Combined Dismax and Block Join Scoring on nested
>> documents
>>
>> Hello Mike,
>> Structured queries in Solr are way cumbersome.
>> Start from:
>> q=+{!dismax v="skirt" qf="name"} +{!parent which=content_type:product
>> score=min v=childq}&childq=+in_stock:true^=0 {!func}list_price_gbp&...
>>
>> Besides "explain", there is a parsed query entry in debug that's more
>> useful for troubleshooting purposes.
>> Please also make sure that + is properly encoded as %2B so that it passes
>> the HTTP hurdle.
>>
>> On Fri, Nov 18, 2016 at 2:14 PM, Mike Allen <
>> mike.al...@thecommercepartnership.com> wrote:
>>
>>> Apologies if I'm doing something incredibly stupid as I'm new to Solr.
>>> I am having an issue with scoring child documents in a block join
>>> query when including a dismax query. I'm actually a little unclear on
>>> whether or not that's a complete oxymoron, combining dismax and block join.
>>>
>>> Problem statement: Given a set of Product documents - which contain
>>> the product names and descriptions - which contain nested variant
>>> documents (see below for abridged example) - which contain the
>>> boolean stock status (in_stock) and the variant prices
>>> (list_price_gbp) - I want to do a Dismax query of, say, "skirt" on
>>> the product name (name) and sort the resulting product documents by
>>> the minimum price (list_price_gbp) of their child variant documents.
>>> Note that, although the abridged document doesn't show them, there
>>> are a number of other arbitrary fields which may be used as filter
>>> queries on the child documents, for example size or colour, which
>>> will in effect change the "active" minimum price of a product.
>>> Hence, denormalizing, or flattening, the documents is not really an
>>> option I want to pursue.
>>>
>>> An abridged example document returned by the Solr Admin Query console
>>> which I am querying:
>>>
>>> <doc>
>>>   <str name="id">12345</str>
>>>   <str name="content_type">product</str>
>>>   <str name="name">black flared skirt</str>
>>>   <float name="min_list_price_gbp">40.0</float>
>>>   <result name="doc" numFound="2" start="0">
>>>     <doc>
>>>       <str name="skuid">12345abcd</str>
>>>       <str name="productid">12345</str>
>>>       <str name="content_type">variant</str>
>>>       <float name="list_price_gbp">65.0</float>
>>>       <bool name="in_stock">true</bool>
>>>     </doc>
>>>     <doc>
>>>       <str name="skuid">12345fghi</str>
>>>       <str name="productid">12345</str>
>>>       <str name="content_type">variant</str>
>>>       <float name="list_price_gbp">40.0</float>
>>>       <bool name="in_stock">true</bool>
>>>     </doc>
>>>   </result>
>>> </doc>
>>>
>>> So I am familiar with the block join score mode; setting aside the
>>> dismax aspect for now, this query, using the Function Query
>>> {!func}list_price_gbp, with score ascending, returns documents
>>> ordered correctly, with a £2.00 (cheapest) product first:
>>>
>>> q={!parent which=content_type:product
>>> score=min}+(in_stock:(true)){!func}list_price_gbp
>>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>>> &doc.fq=(in_stock:(true))&start=0&rows=103
>>> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>>>
>>> The "explain" for this is:
>>>
>>> 2.0000184 = Score based on 1 child docs in range from 26752 to 26752, best match:
>>>   2.0000184 = sum of:
>>>     1.8374416E-5 = weight(in_stock:T in 26752) [], result of:
>>>       1.8374416E-5 = score(doc=26752,freq=1.0 = termFreq=1.0), product of:
>>>         1.8374416E-5 = idf(docFreq=27211, docCount=27211)
>>>         1.0 = tfNorm, computed from:
>>>           1.0 = termFreq=1.0
>>>           1.2 = parameter k1
>>>           0.0 = parameter b (norms omitted for field)
>>>     2.0 = FunctionQuery(float(list_price_gbp)), product of:
>>>       2.0 = float(list_price_gbp)=2.0
>>>       1.0 = boost
>>>       1.0 = queryNorm
>>>
>>> Even though this is doing what I want, I have a slight niggle that
>>> the overall score is not just the result of the Function Query;
>>> however, as all results get the same tiny fraction added, it doesn't
>>> matter.
>>>
>>> However, when I prepend my dismax query:
>>>
>>> q={!dismax v="skirt" qf="name"}+{!parent which=content_type:product
>>> score=min}+(in_stock:(true)){!func}list_price_gbp
>>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>>> &doc.fq=(in_stock:(true))&start=0&rows=103
>>> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>>>
>>> The scoring is only dependent on the dismax scoring, where the
>>> "explain" for this is:
>>>
>>> 2.7600822 = sum of:
>>>   2.7600822 = weight(name:skirt in 13406) [], result of:
>>>     2.7600822 = score(doc=13406,freq=1.0 = termFreq=1.0), product of:
>>>       3.5851278 = idf(docFreq=103, docCount=3731)
>>>       0.76987 = tfNorm, computed from:
>>>         1.0 = termFreq=1.0
>>>         1.2 = parameter k1
>>>         0.75 = parameter b
>>>         4.108818 = avgFieldLength
>>>         7.111111 = fieldLength
>>>
>>> So in actual fact, with score ascending, it is ordering the results
>>> by least matching first and the nested document list_price_gbp is
>>> irrelevant. I strongly suspect I am being totally dumb and that this
>>> is expected behaviour for an obvious reason that escapes me, apart
>>> from perhaps it's because the two scoring methods are just plainly
>>> incompatible.
>>>
>>> I have additionally tried just doing a lucene query instead:
>>>
>>> q=+name:skirt +{!parent which=content_type:product score=min}
>>> (in_stock:(true)){!func}list_price_gbp
>>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>>> &doc.fq=(in_stock:(true))&start=0&rows=103
>>> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>>>
>>> The "explain" of this indicates it's scoring products, for which
>>> list_price_gbp simply does not exist, as the Function Query always
>>> returns zero.
>>>
>>> 6243963 = sum of:
>>>   3.624396 = weight(name:skirt in 18113) [], result of:
>>>     3.624396 = score(doc=18113,freq=1.0 = termFreq=1.0), product of:
>>>       3.5851278 = idf(docFreq=103, docCount=3731)
>>>       1.0109531 = tfNorm, computed from:
>>>         1.0 = termFreq=1.0
>>>         1.2 = parameter k1
>>>         0.75 = parameter b
>>>         4.108818 = avgFieldLength
>>>         4.0 = fieldLength
>>>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>>>     1.0 = boost
>>>     1.0 = queryNorm
>>>   0.0 = FunctionQuery(float(list_price_gbp)), product of:
>>>     0.0 = float(list_price_gbp)=0.0
>>>     1.0 = boost
>>>     1.0 = queryNorm
>>>
>>> Indeed, if I change the Function Query field to a product scoped
>>> field, min_list_price_gbp, like so:
>>>
>>> q=+name:skirt +{!parent which=content_type:product
>>> score=min}+(in_stock:(true)){!func}min_list_price_gbp
>>> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
>>> &doc.fq=(in_stock:(true))&start=0&rows=103
>>> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>>>
>>> then the "explain" certainly does show the Function Query evaluating:
>>>
>>> 8.624397 = sum of:
>>>   3.624396 = weight(name:skirt in 17890) [], result of:
>>>     3.624396 = score(doc=17890,freq=1.0 = termFreq=1.0), product of:
>>>       3.5851278 = idf(docFreq=103, docCount=3731)
>>>       1.0109531 = tfNorm, computed from:
>>>         1.0 = termFreq=1.0
>>>         1.2 = parameter k1
>>>         0.75 = parameter b
>>>         4.108818 = avgFieldLength
>>>         4.0 = fieldLength
>>>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>>>     1.0 = boost
>>>     1.0 = queryNorm
>>>   14.0 = FunctionQuery(float(min_list_price_gbp)), product of:
>>>     14.0 = float(min_list_price_gbp)=14.0
>>>     1.0 = boost
>>>     1.0 = queryNorm
>>>
>>> My grasp of the syntax is pretty flakey, so I would be immensely
>>> grateful if someone could point out if I'm just doing something
>>> incredibly dumb. In my head, I see what I am trying to do as
>>>
>>> (some dismax or lucene query on parent document [e.g. "skirt"])
>>>   => (get a subset of these parent docs based on a block join)
>>>     => (where the children match a bunch of arbitrary filter
>>>         queries [e.g. "colour:red"])
>>>       => (then subquery the child docs that match the same filter
>>>           queries [e.g. "colour:red"])
>>>         => (then score this subset of child documents)
>>>           => (and order by that score)
>>>
>>> Is this actually possible? I've been googling about this for a day or
>>> so and can't quite find anything definitive. I'm going to maybe try
>>> and dive into the Solr source code, but I'm a C# guy, not Java, and
>>> I don't have a debuggable environment set up as I haven't needed one
>>> yet, so that could prove pretty painful.
>>>
>>> Any help would be appreciated, even if it is just "can't be done", as
>>> at least I could stop chasing my tail.
>>>
>>> Mike
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
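
For anyone landing on this thread later, here is a minimal Python sketch of the %2B encoding point from Mikhail's reply. It builds the request URL with the standard library so that every "+" in the query arrives at Solr as %2B rather than being decoded as a space. The host, port and core name ("products") are hypothetical placeholders, and writing the reference as v=$childq assumes Solr's usual $param dereferencing; the exact spacing Mike ended up with is not given in the thread, so treat this as a starting point rather than his final query.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical endpoint: host, port and core name ("products") are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_params(term):
    """Assemble the request parameters for the combined dismax + block join query."""
    return {
        # Two mandatory (+) clauses: dismax picks the products, {!parent}
        # gives each product the minimum score of its matching children.
        # Assumption: $childq dereferences the separate childq parameter.
        "q": '+{!dismax v="' + term + '" qf="name"} '
             "+{!parent which=content_type:product score=min v=$childq}",
        # ^=0 gives the stock clause a constant score of 0, so the child
        # score is just the value of list_price_gbp.
        "childq": "+in_stock:true^=0 {!func}list_price_gbp",
        "sort": "score asc",
        "fl": "score,*",
        "debugQuery": "on",
    }


def build_url(term):
    """Return the full request URL with every parameter percent-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode(build_params(term))


if __name__ == "__main__":
    url = build_url("skirt")
    print(url)
    # Round trip: decoding the query string gives back the original clauses.
    print(parse_qs(url.split("?", 1)[1])["childq"][0])
```

With debugQuery=on, the parsed query entry in the debug section is the quickest way to confirm that both "+" clauses survived the trip.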