Apologies if I'm doing something incredibly stupid, as I'm new to Solr. I am
having an issue with scoring child documents in a block join query when
including a dismax query. I'm actually a little unclear on whether combining
dismax and block join is a contradiction in terms.

 

Problem statement: I have a set of Product documents, which contain the
product names and descriptions. Each product contains nested variant
documents (see below for an abridged example), which hold the boolean stock
status (in_stock) and the variant price (list_price_gbp). I want to run a
dismax query of, say, "skirt" on the product name (name) and sort the
resulting product documents by the minimum price (list_price_gbp) of their
child variant documents. Note that, although the abridged document doesn't
show them, there are a number of other arbitrary fields which may be used as
filter queries on the child documents, for example size or colour, which
will in effect change the "active" minimum price of a product. Hence
denormalizing, or flattening, the documents is not really an option I want
to pursue.

 

An abridged example document returned by the Solr Admin Query console which
I am querying:

                  

<doc>
  <str name="id">12345</str>
  <str name="content_type">product</str>
  <str name="name">black flared skirt</str>
  <float name="min_list_price_gbp">40.0</float>
  <result name="doc" numFound="2" start="0">
    <doc>
      <str name="skuid">12345abcd</str>
      <str name="productid">12345</str>
      <str name="content_type">variant</str>
      <float name="list_price_gbp">65.0</float>
      <bool name="in_stock">true</bool>
    </doc>
    <doc>
      <str name="skuid">12345fghi</str>
      <str name="productid">12345</str>
      <str name="content_type">variant</str>
      <float name="list_price_gbp">40.0</float>
      <bool name="in_stock">true</bool>
    </doc>
  </result>
</doc>

 

So I am familiar with the block join score mode; setting aside the dismax
aspect for now, this query, using the Function Query {!func}list_price_gbp,
with score ascending, returns documents ordered correctly, with a £2.00
(cheapest) product first:

 

q={!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp
&doc.q={!terms f="productid" v=$row.id}
&doc.rows=1000
&doc.fl=score,*
&doc.fq=(in_stock:(true))
&start=0
&rows=103
&fl=score,*,doc:[subquery]
&sort=score asc
&debugQuery=on
&wt=xml
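
Because the pasted query is easy to mangle when wrapped, here is a minimal Python sketch of how the same parameters could be assembled and URL-encoded. It assumes the "+" before (in_stock:(true)) is a literal Lucene "must" operator rather than an encoded space; if it was meant as a space, the encoded form differs. Only the standard library is used, and no Solr host or core name is assumed.

```python
from urllib.parse import urlencode, parse_qs

# Parameters copied from the query above. Assumption: the "+" in q is a
# literal Lucene "must" operator, not an already-encoded space.
params = {
    "q": "{!parent which=content_type:product score=min}"
         "+(in_stock:(true)){!func}list_price_gbp",
    "doc.q": '{!terms f="productid" v=$row.id}',
    "doc.rows": "1000",
    "doc.fl": "score,*",
    "doc.fq": "(in_stock:(true))",
    "start": "0",
    "rows": "103",
    "fl": "score,*,doc:[subquery]",
    "sort": "score asc",
    "debugQuery": "on",
    "wt": "xml",
}

# urlencode sends a literal "+" as %2B and a space as "+".
query_string = urlencode(params)

# Decoding the string again should give back exactly what was put in.
decoded = {key: values[0] for key, values in parse_qs(query_string).items()}
print(query_string)
```

Round-tripping through parse_qs is a cheap way to check that nothing was lost or reinterpreted in the encoding step.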

 

The "explain" for this is:

 

2.0000184 = Score based on 1 child docs in range from 26752 to 26752, best match:
  2.0000184 = sum of:
    1.8374416E-5 = weight(in_stock:T in 26752) [], result of:
      1.8374416E-5 = score(doc=26752,freq=1.0 = termFreq=1.0), product of:
        1.8374416E-5 = idf(docFreq=27211, docCount=27211)
        1.0 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    2.0 = FunctionQuery(float(list_price_gbp)), product of:
      2.0 = float(list_price_gbp)=2.0
      1.0 = boost
      1.0 = queryNorm

 

Even though this is doing what I want, I have a slight niggle that the
overall score is not just the result of the Function Query. However, as all
results get the same tiny fraction added, it doesn't matter.
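
The tiny fraction can be reproduced from the numbers in the explain. The sketch below assumes the idf term follows the usual BM25 form, log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)); that formula is an assumption on my part and not something the explain output states. Since in_stock:T matches all 27211 documents counted, the idf is close to zero and identical for every child.

```python
import math

def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style idf (assumed formula, see the note above)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Values taken from the explain: in_stock:T occurs in every counted document.
idf = bm25_idf(doc_freq=27211, doc_count=27211)

tf_norm = 1.0         # tfNorm reported by the explain (b = 0, termFreq = 1)
function_score = 2.0  # float(list_price_gbp) of the matching child

# "sum of" the term weight and the Function Query value.
total = idf * tf_norm + function_score
print(idf, total)
```

Under that assumption the constant offset is the same for every matching child, so it cannot change the ordering.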

 

However, when I prepend my dismax query:

 

q={!dismax v="skirt" qf="name"}+{!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp
&doc.q={!terms f="productid" v=$row.id}
&doc.rows=1000
&doc.fl=score,*
&doc.fq=(in_stock:(true))
&start=0
&rows=103
&fl=score,*,doc:[subquery]
&sort=score asc
&debugQuery=on
&wt=xml

 

The score then depends only on the dismax match; the "explain" for this is:

 

2.7600822 = sum of:
  2.7600822 = weight(name:skirt in 13406) [], result of:
    2.7600822 = score(doc=13406,freq=1.0 = termFreq=1.0), product of:
      3.5851278 = idf(docFreq=103, docCount=3731)
      0.76987 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        4.108818 = avgFieldLength
        7.111111 = fieldLength
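
For reference, this score is pure text relevance, with no price component at all. A minimal sketch of the arithmetic, assuming tfNorm follows the standard BM25 length normalisation (the formula is my assumption; the inputs are copied from the explain):

```python
def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    """Standard BM25 term-frequency normalisation (assumed formula)."""
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))

idf = 3.5851278  # idf(docFreq=103, docCount=3731), copied from the explain

tf_norm = bm25_tf_norm(
    freq=1.0,
    k1=1.2,
    b=0.75,
    field_length=7.111111,
    avg_field_length=4.108818,
)

# "product of" idf and tfNorm.
score = idf * tf_norm
print(tf_norm, score)
```

Only name:skirt statistics feed into the result, which is consistent with list_price_gbp having no effect on the ordering.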

                                

So in actual fact, with score ascending, it is ordering the results by least
matching first, and the nested document list_price_gbp is irrelevant. I
strongly suspect I am being totally dumb and that this is expected behaviour
for an obvious reason that escapes me, unless the two scoring methods are
simply incompatible.

 

I have additionally tried just doing a lucene query instead:

 

q=+name:skirt +{!parent which=content_type:product score=min} (in_stock:(true)){!func}list_price_gbp
&doc.q={!terms f="productid" v=$row.id}
&doc.rows=1000
&doc.fl=score,*
&doc.fq=(in_stock:(true))
&start=0
&rows=103
&fl=score,*,doc:[subquery]
&sort=score asc
&debugQuery=on
&wt=xml

 

The "explain" of this indicates that the Function Query is being evaluated
against the product documents, on which list_price_gbp simply does not
exist, so it always returns zero:

 

4.6243963 = sum of:

  3.624396 = weight(name:skirt in 18113) [], result of:
    3.624396 = score(doc=18113,freq=1.0 = termFreq=1.0), product of:
      3.5851278 = idf(docFreq=103, docCount=3731)
      1.0109531 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        4.108818 = avgFieldLength
        4.0 = fieldLength
  1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
    1.0 = boost
    1.0 = queryNorm
  0.0 = FunctionQuery(float(list_price_gbp)), product of:
    0.0 = float(list_price_gbp)=0.0
    1.0 = boost
    1.0 = queryNorm

                

Indeed, if I change the Function Query field to a product-scoped field,
min_list_price_gbp, like so:

                

q=+name:skirt +{!parent which=content_type:product score=min}+(in_stock:(true)){!func}min_list_price_gbp
&doc.q={!terms f="productid" v=$row.id}
&doc.rows=1000
&doc.fl=score,*
&doc.fq=(in_stock:(true))
&start=0
&rows=103
&fl=score,*,doc:[subquery]
&sort=score asc
&debugQuery=on
&wt=xml

 

then the "explain" does show the Function Query evaluating to a non-zero
value:

 

18.624397 = sum of:

  3.624396 = weight(name:skirt in 17890) [], result of:
    3.624396 = score(doc=17890,freq=1.0 = termFreq=1.0), product of:
      3.5851278 = idf(docFreq=103, docCount=3731)
      1.0109531 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        4.108818 = avgFieldLength
        4.0 = fieldLength
  1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
    1.0 = boost
    1.0 = queryNorm
  14.0 = FunctionQuery(float(min_list_price_gbp)), product of:
    14.0 = float(min_list_price_gbp)=14.0
    1.0 = boost
    1.0 = queryNorm

 

My grasp of the syntax is pretty flaky, so I would be immensely grateful if
someone could point out if I'm just doing something incredibly dumb. In my
head, I see what I am trying to do as:

 

(some dismax or lucene query on parent document [e.g. "skirt"])
  => (get a subset of these parent docs based on a block join)
    => (where the children match a bunch of arbitrary filter queries [e.g. "colour:red"])
      => (then subquery the child docs that match the same filter queries [e.g. "colour:red"])
        => (then score this subset of child documents)
          => (and order by that score)
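
To pin down the intended behaviour independently of Solr syntax, the steps above can be sketched as a small in-memory Python model. This is only an illustration of the desired result, not of how Solr evaluates anything: the class names, the colour field, and the second and third products are made up, and a plain word match stands in for dismax.

```python
from dataclasses import dataclass, field

@dataclass
class Variant:
    skuid: str
    list_price_gbp: float
    in_stock: bool
    colour: str = ""  # hypothetical child-level filter field

@dataclass
class Product:
    id: str
    name: str
    variants: list = field(default_factory=list)

def cheapest_first(products, term, child_filter):
    """Rank products whose name contains `term` by the minimum price of
    the variants that pass `child_filter`, cheapest first."""
    ranked = []
    for product in products:
        if term not in product.name.split():
            continue  # parent-level text match (stand-in for dismax)
        prices = [v.list_price_gbp for v in product.variants if child_filter(v)]
        if not prices:
            continue  # parent needs at least one matching child
        ranked.append((min(prices), product.id))
    return [product_id for _, product_id in sorted(ranked)]

products = [
    Product("12345", "black flared skirt", [
        Variant("12345abcd", 65.0, True, "black"),
        Variant("12345fghi", 40.0, True, "black"),
    ]),
    Product("67890", "red pleated skirt", [  # made-up example data
        Variant("67890abcd", 25.0, False, "red"),
        Variant("67890fghi", 55.0, True, "red"),
    ]),
    Product("11111", "blue jeans", [  # made-up, should never match "skirt"
        Variant("11111abcd", 10.0, True, "blue"),
    ]),
]

in_stock_order = cheapest_first(products, "skirt", lambda v: v.in_stock)
unfiltered_order = cheapest_first(products, "skirt", lambda v: True)
print(in_stock_order, unfiltered_order)
```

The two calls show why flattening is awkward: the child filter changes which variant supplies the "active" minimum price, and therefore the order of the parents.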

 


Is this actually possible? I've been googling this for a day or so and
can't quite find anything definitive. I may try diving into the Solr source
code, but I'm a C# guy, not Java, and I don't have a debuggable environment
set up as I haven't needed one yet, so that could prove pretty painful.

 

Any help would be appreciated, even if it is just "can't be done", as at
least I could stop chasing my tail.

 

Mike

 

 

 

 

                
