Apologies if I'm doing something incredibly stupid as I'm new to Solr. I am having an issue with scoring child documents in a block join query when including a dismax query. I'm actually a little unclear on whether or not that's a complete oxymoron, combining dismax and block join.
Problem statement: Given a set of Product documents, which contain the product names and descriptions, and which in turn contain nested variant documents (see below for an abridged example) holding the boolean stock status (in_stock) and the variant prices (list_price_gbp), I want to run a dismax query for, say, "skirt" on the product name (name) and sort the resulting product documents by the minimum price (list_price_gbp) of their child variant documents.

Note that, although the abridged document doesn't show them, there are a number of other arbitrary fields which may be used as filter queries on the child documents, for example size or colour, which will in effect change the "active" minimum price of a product. Hence, denormalizing, or flattening, the documents is not really an option I want to pursue.

An abridged example document, as returned by the Solr Admin Query console which I am querying:

  <doc>
    <str name="id">12345</str>
    <str name="content_type">product</str>
    <str name="name">black flared skirt</str>
    <float name="min_list_price_gbp">40.0</float>
    <result name="doc" numFound="2" start="0">
      <doc>
        <str name="skuid">12345abcd</str>
        <str name="productid">12345</str>
        <str name="content_type">variant</str>
        <float name="list_price_gbp">65.0</float>
        <bool name="in_stock">true</bool>
      </doc>
      <doc>
        <str name="skuid">12345fghi</str>
        <str name="productid">12345</str>
        <str name="content_type">variant</str>
        <float name="list_price_gbp">40.0</float>
        <bool name="in_stock">true</bool>
      </doc>
    </result>
  </doc>

So I am familiar with the block join score mode; setting aside the dismax aspect for now, this query, using the Function Query {!func}list_price_gbp, with score ascending, returns documents ordered correctly, with a £2.00 (cheapest) product first:

  q={!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml

The "explain" for this is:

  2.0000184 = Score based on 1 child docs in range from 26752 to 26752, best match:
    2.0000184 = sum of:
      1.8374416E-5 = weight(in_stock:T in 26752) [], result of:
        1.8374416E-5 = score(doc=26752,freq=1.0 = termFreq=1.0 ), product of:
          1.8374416E-5 = idf(docFreq=27211, docCount=27211)
          1.0 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.0 = parameter b (norms omitted for field)
      2.0 = FunctionQuery(float(list_price_gbp)), product of:
        2.0 = float(list_price_gbp)=2.0
        1.0 = boost
        1.0 = queryNorm

Even though this is doing what I want, I have a slight niggle that the overall score is not just the result of the Function Query. However, as all results get the same tiny fraction added, it doesn't matter.

The trouble starts when I prepend my dismax query:

  q={!dismax v="skirt" qf="name"}+{!parent which=content_type:product score=min}+(in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml

The scoring now depends only on the dismax scoring. The "explain" for this is:

  2.7600822 = sum of:
    2.7600822 = weight(name:skirt in 13406) [], result of:
      2.7600822 = score(doc=13406,freq=1.0 = termFreq=1.0 ), product of:
        3.5851278 = idf(docFreq=103, docCount=3731)
        0.76987 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          4.108818 = avgFieldLength
          7.111111 = fieldLength

So in actual fact, with score ascending, it is ordering the results by least matching first and the nested document list_price_gbp is irrelevant. I strongly suspect I am being totally dumb and that this is expected behaviour for an obvious reason that escapes me, unless the two scoring methods are simply incompatible.
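As a sanity check on how I'm reading those two explains, here is a minimal Python sketch that recomputes both scores from the numbers shown. It assumes the BM25 formulas that the explain output itself implies (idf from docFreq and docCount, tfNorm from k1, b and the field lengths); the helper names are my own and are not part of Solr. If it is right, the dismax score contains nothing but the BM25 weight of name:skirt, with no price term at all.

```python
import math

# Hypothetical helper names; the formulas are an assumption based on the
# terms listed in the explain output (docFreq, docCount, k1, b, lengths).

def bm25_idf(doc_freq, doc_count):
    """idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    """tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * len / avgLen))."""
    length_ratio = field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * length_ratio))

# Block join only: weight(in_stock:T) plus the function query value 2.0.
# With b = 0.0 (norms omitted) the two length arguments are placeholders
# and have no effect on the result.
in_stock_weight = bm25_idf(27211, 27211) * bm25_tf_norm(1.0, 1.2, 0.0, 1.0, 1.0)
block_join_score = in_stock_weight + 2.0

# With dismax prepended: only weight(name:skirt) appears in the explain.
dismax_score = bm25_idf(103, 3731) * bm25_tf_norm(1.0, 1.2, 0.75, 7.111111, 4.108818)

print(block_join_score)  # compare with 2.0000184 in the first explain
print(dismax_score)      # compare with 2.7600822 in the second explain
```

Small differences in the last digits are to be expected, since Python works in double precision and the explain values look like single-precision floats.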
I have additionally tried just doing a lucene query instead:

  q=+name:skirt +{!parent which=content_type:product score=min} (in_stock:(true)){!func}list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml

The "explain" of this indicates it's scoring products, for which list_price_gbp simply does not exist, so the Function Query always returns zero:

  6243963 = sum of:
    3.624396 = weight(name:skirt in 18113) [], result of:
      3.624396 = score(doc=18113,freq=1.0 = termFreq=1.0 ), product of:
        3.5851278 = idf(docFreq=103, docCount=3731)
        1.0109531 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          4.108818 = avgFieldLength
          4.0 = fieldLength
    1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
      1.0 = boost
      1.0 = queryNorm
    0.0 = FunctionQuery(float(list_price_gbp)), product of:
      0.0 = float(list_price_gbp)=0.0
      1.0 = boost
      1.0 = queryNorm

Indeed, if I change the Function Query field to a product scoped field, min_list_price_gbp, like so:

  q=+name:skirt +{!parent which=content_type:product score=min}+(in_stock:(true)){!func}min_list_price_gbp&doc.q={!terms f="productid" v=$row.id}&doc.rows=1000&doc.fl=score,*&doc.fq=(in_stock:(true))&start=0&rows=103&fl=score,*,doc:[subquery]&sort=score asc&debugQuery=on&wt=xml

then the "explain" does show the Function Query evaluating to a non-zero value:

  8.624397 = sum of:
    3.624396 = weight(name:skirt in 17890) [], result of:
      3.624396 = score(doc=17890,freq=1.0 = termFreq=1.0 ), product of:
        3.5851278 = idf(docFreq=103, docCount=3731)
        1.0109531 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.75 = parameter b
          4.108818 = avgFieldLength
          4.0 = fieldLength
    1.0 = {!cache=false}ConstantScore(BitDocIdSetFilterWrapper(QueryBitSetProducer(content_type:product))), product of:
      1.0 = boost
      1.0 = queryNorm
    14.0 = FunctionQuery(float(min_list_price_gbp)), product of:
      14.0 = float(min_list_price_gbp)=14.0
      1.0 = boost
      1.0 = queryNorm

My grasp of the syntax is pretty flaky, so I would be immensely grateful if someone could point out if I'm just doing something incredibly dumb.

In my head, I see what I am trying to do as:

  (some dismax or lucene query on the parent documents [e.g. "skirt"])
  => (get a subset of these parent docs based on a block join)
  => (where the children match a bunch of arbitrary filter queries [e.g. "colour:red"])
  => (then subquery the child docs that match the same filter queries [e.g. "colour:red"])
  => (then score this subset of child documents)
  => (and order by that score)

Is this actually possible? I've been googling about this for a day or so and can't quite find anything definitive. I may try to dive into the Solr source code, but I'm a C# guy, not Java, and I haven't needed a debuggable environment so far, so that could prove pretty painful.

Any help would be appreciated, even if it is just "can't be done", as at least I could stop chasing my tail.

Mike