Hello Mike,
Structured queries in Solr are quite cumbersome to write.
Start from:
q=+{!dismax v="skirt" qf="name"} +{!parent which=content_type:product
score=min v=$childq}&childq=+in_stock:true^=0 {!func}list_price_gbp&...
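
To show what the childq part is meant to do, here is a rough toy model in
Python. It is not Solr code, only a sketch under these assumptions: ^=0 pins
the score of the in_stock clause to zero, the function query contributes the
field value, and score=min takes the lowest score among the matching
children. The helper names child_score and parent_score_min are made up for
illustration, and the dismax clause's own contribution at the top level is
left out.

```python
from typing import Mapping, Optional, Sequence


def child_score(child: Mapping) -> Optional[float]:
    """Toy score of one child for `+in_stock:true^=0 {!func}list_price_gbp`."""
    if not child["in_stock"]:
        return None  # the required (+) clause fails, so the child does not match
    filter_score = 0.0  # ^=0 makes the filter clause a constant score of zero
    func_score = float(child["list_price_gbp"])  # function query: the field value
    return filter_score + func_score


def parent_score_min(children: Sequence[Mapping]) -> Optional[float]:
    """Toy equivalent of score=min: the lowest score among matching children."""
    scores = [s for s in map(child_score, children) if s is not None]
    return min(scores) if scores else None


# The two variants from the abridged example document in the quoted message.
variants = [
    {"skuid": "12345abcd", "in_stock": True, "list_price_gbp": 65.0},
    {"skuid": "12345fghi", "in_stock": True, "list_price_gbp": 40.0},
]
print(parent_score_min(variants))  # 40.0
```

With the in_stock clause zeroed out, the small idf fraction seen in the
quoted "explain" no longer shows up in this model.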

Besides "explain", the debug output has a parsed query entry, which is more
useful for troubleshooting.
Please also make sure that + is properly encoded as %2B so that it survives
the HTTP request; an unencoded + in a query string is decoded as a space.
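
A minimal sketch of the encoding step, assuming the request is built from
Python with only the standard library. The parameter values are the ones
from the query above, with childq referenced as v=$childq (the local-params
dereference form, which is an assumption here), and debugQuery=on is taken
from the quoted message. No host or core name is shown because none is
given in this thread.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "q": '+{!dismax v="skirt" qf="name"} '
         '+{!parent which=content_type:product score=min v=$childq}',
    "childq": "+in_stock:true^=0 {!func}list_price_gbp",
    "debugQuery": "on",
}

# urlencode() percent-encodes a literal "+" as %2B and writes spaces as "+".
query_string = urlencode(params)
print(query_string)

# Decoding the encoded string gives back the original values, "+" included.
decoded = parse_qs(query_string)
assert decoded["q"][0] == params["q"]

# An unencoded "+" is read as a space, so the required-clause marker is lost.
naive = parse_qs("q=+name:skirt")
print(repr(naive["q"][0]))
```

The last line prints the value with a leading space where the + used to be,
which is the symptom to look for in the parsed query.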

On Fri, Nov 18, 2016 at 2:14 PM, Mike Allen <
mike.al...@thecommercepartnership.com> wrote:

> Apologies if I'm doing something incredibly stupid as I'm new to Solr. I am
> having an issue with scoring child documents in a block join query when
> including a dismax query. I'm actually a little unclear on whether or not
> that's a complete oxymoron, combining dismax and block join.
>
>
>
> Problem statement: Given a set of Product documents - which contain the
> product names and descriptions - which contain nested variant documents
> (see below for abridged example) - which contain the boolean stock status
> (in_stock) and the variant prices (list_price_gbp) - I want to do a Dismax
> query of, say, "skirt" on the product name (name) and sort the resulting
> product documents by the minimum price (list_price_gbp) of their child
> variant documents. Note that, although the abridged document doesn't show
> them, there are a number of other arbitrary fields which may be used as
> filter queries on the child documents, for example size or colour, which
> will in effect change the "active" minimum price of a product. Hence,
> denormalizing, or flattening, the documents is not really an option I want
> to pursue.
>
>
>
> An abridged example document returned by the Solr Admin Query console which
> I am querying:
>
>
>
> <doc>
>   <str name="id">12345</str>
>   <str name="content_type">product</str>
>   <str name="name">black flared skirt</str>
>   <float name="min_list_price_gbp">40.0</float>
>   <result name="doc" numFound="2" start="0">
>     <doc>
>       <str name="skuid">12345abcd</str>
>       <str name="productid">12345</str>
>       <str name="content_type">variant</str>
>       <float name="list_price_gbp">65.0</float>
>       <bool name="in_stock">true</bool>
>     </doc>
>     <doc>
>       <str name="skuid">12345fghi</str>
>       <str name="productid">12345</str>
>       <str name="content_type">variant</str>
>       <float name="list_price_gbp">40.0</float>
>       <bool name="in_stock">true</bool>
>     </doc>
>   </result>
> </doc>
>
>
>
> So I am familiar with the block join score mode; setting aside the dismax
> aspect for now, this query, using the Function Query {!func}list_price_gbp,
> with score ascending, returns documents ordered correctly, with a £2.00
> (cheapest) product first:
>
>
>
> q={!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp
> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
> &doc.fq=(in_stock:(true))&start=0&rows=103
> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
>
>
> The "explain" for this is:
>
>
>
> 2.0000184 = Score based on 1 child docs in range from 26752 to 26752, best match:
>   2.0000184 = sum of:
>     1.8374416E-5 = weight(in_stock:T in 26752) [], result of:
>       1.8374416E-5 = score(doc=26752,freq=1.0 = termFreq=1.0), product of:
>         1.8374416E-5 = idf(docFreq=27211, docCount=27211)
>         1.0 = tfNorm, computed from:
>           1.0 = termFreq=1.0
>           1.2 = parameter k1
>           0.0 = parameter b (norms omitted for field)
>     2.0 = FunctionQuery(float(list_price_gbp)), product of:
>       2.0 = float(list_price_gbp)=2.0
>       1.0 = boost
>       1.0 = queryNorm
>
>
>
> Even though this is doing what I want, I have a slight niggle that the
> overall score is not just the result of the Function Query; however, as all
> results get the same tiny fraction added, it doesn't matter.
>
>
>
> However, when I prepend my dismax query:
>
>
>
> q={!dismax v="skirt" qf="name"}+{!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp
> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
> &doc.fq=(in_stock:(true))&start=0&rows=103
> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
>
>
> The scoring is only dependent on the dismax scoring, where the "explain" for
> this is:
>
>
>
> 2.7600822 = sum of:
>   2.7600822 = weight(name:skirt in 13406) [], result of:
>     2.7600822 = score(doc=13406,freq=1.0 = termFreq=1.0), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       0.76987 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         7.111111 = fieldLength
>
>
>
> So in actual fact, with score ascending, it is ordering the results by least
> matching first and the nested document list_price_gbp is irrelevant. I
> strongly suspect I am being totally dumb and that this is expected behaviour
> for an obvious reason that escapes me, apart from perhaps it's because the
> two scoring methods are just plainly incompatible.
>
>
>
> I have additionally tried just doing a lucene query instead:
>
>
>
> q=+name:skirt +{!parent which=content_type:product score=min} (in_stock:(true)){!func}list_price_gbp
> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
> &doc.fq=(in_stock:(true))&start=0&rows=103
> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
>
>
> The "explain" of this indicates it's scoring products, for which
> list_price_gbp simply does not exist, as the Function Query always returns
> zero.
>
>
>
> 4.6243963 = sum of:
>   3.624396 = weight(name:skirt in 18113) [], result of:
>     3.624396 = score(doc=18113,freq=1.0 = termFreq=1.0), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       1.0109531 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         4.0 = fieldLength
>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>     1.0 = boost
>     1.0 = queryNorm
>   0.0 = FunctionQuery(float(list_price_gbp)), product of:
>     0.0 = float(list_price_gbp)=0.0
>     1.0 = boost
>     1.0 = queryNorm
>
>
>
> Indeed, if I change the Function Query field to a product scoped field,
> min_list_price_gbp, like so:
>
>
>
> q=+name:skirt +{!parent which=content_type:product score=min}+(in_stock:(true)){!func}min_list_price_gbp
> &doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*
> &doc.fq=(in_stock:(true))&start=0&rows=103
> &fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml
>
>
>
> then the "explain" certainly does show the Function Query evaluating
>
>
>
> 18.624397 = sum of:
>   3.624396 = weight(name:skirt in 17890) [], result of:
>     3.624396 = score(doc=17890,freq=1.0 = termFreq=1.0), product of:
>       3.5851278 = idf(docFreq=103, docCount=3731)
>       1.0109531 = tfNorm, computed from:
>         1.0 = termFreq=1.0
>         1.2 = parameter k1
>         0.75 = parameter b
>         4.108818 = avgFieldLength
>         4.0 = fieldLength
>   1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
>     1.0 = boost
>     1.0 = queryNorm
>   14.0 = FunctionQuery(float(min_list_price_gbp)), product of:
>     14.0 = float(min_list_price_gbp)=14.0
>     1.0 = boost
>     1.0 = queryNorm
>
>
>
> My grasp of the syntax is pretty flakey, so I would be immensely grateful if
> someone could point out if I'm just doing something incredibly dumb. In my
> head, I see what I am trying to do as
>
>
>
> (some dismax or lucene query on parent document [e.g. "skirt"])
>   => (get a subset of these parent docs based on a block join)
>     => (where the children match a bunch of arbitrary filter queries [e.g. "colour:red"])
>       => (then subquery the child docs that match the same filter queries [e.g. "colour:red"])
>         => (then score this subset of child documents)
>           => (and order by that score)
>
>
>
>
> Is this actually possible? I've been googling about this for a day or so and
> can't quite find anything definitive. I may try diving into the Solr source
> code, but I'm a C# guy, not Java, and I don't have a debuggable environment
> set up since I haven't needed one yet, so that could prove pretty painful.
>
>
>
> Any help would be appreciated, even if it is just "can't be done", as at
> least I could stop chasing my tail.
>
>
>
> Mike


-- 
Sincerely yours
Mikhail Khludnev
