: First let me say that this is very possibly the "x - y problem" so let me
: state up front what my ultimate need is -- then I'll ask about the thing I
: imagine might help...  which, of course, is heavily biased in the direction
: of my experience coding Java and writing SQL...

Thank you so much for asking your question this way!

Right off the bat, the background you've provided seems suspicious...

: I have a piece of a query that calculates a score based on a "weighting"
        ...
: The specific line is this:
: <str name="bf">product(field(category_weight),20)</str>
: 
: What I just realized is that when I query Solr for a string that has NO
: matches in the entire corpus, I still get a slew of results because EVERY
: doc has the weighting value in the category_weight field - and therefore
: every doc gets some score.

...that is *NOT* how dismax and edismax normally work.

While both the "bf" and "bq" params result in "additive" boosting, and the 
implementation of that "additive boost" comes from adding new optional 
clauses to the top level BooleanQuery that is executed, that only happens 
after the "main" query (from your "q" param) is added to that top level 
BooleanQuery as a "mandatory" clause.

So, for example, "bf=true()" and "bq=*:*" should match & boost every doc, 
but with the techproducts configs/data these requests still don't match 
anything...

/select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
/select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query

...and if you look at the debug output, the parsed query shows that the 
"bogus" part of the query is mandatory...

+DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) 
FunctionQuery(const(true))

(i didn't use "pf" in that example, but the effect is the same, the "pf" 
based clauses are optional, while the "qf" based clauses are mandatory)
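
To make the mandatory vs optional distinction concrete, here is a toy model 
of how a top level BooleanQuery decides whether a doc matches.  This is only 
a sketch of the matching rule, not Lucene code: the Clause class, the 
boolean_query_matches function and the sample doc are all made up for 
illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, object]


@dataclass
class Clause:
    """Hypothetical stand-in for one clause of a BooleanQuery."""
    matches: Callable[[Doc], bool]
    required: bool  # True ~ mandatory ("+"), False ~ optional


def boolean_query_matches(clauses: List[Clause], doc: Doc) -> bool:
    """Toy version of the matching rule described above."""
    must = [c for c in clauses if c.required]
    should = [c for c in clauses if not c.required]
    if must:
        # Optional clauses only add to the score; they cannot rescue
        # a doc that fails a mandatory clause.
        return all(c.matches(doc) for c in must)
    # With no mandatory clause, any single optional match is enough.
    return any(c.matches(doc) for c in should)


doc = {"text": "hello world", "category_weight": 1.5}
main_query = lambda d: "bogus" in str(d["text"])  # the "q" param
boost_func = lambda d: True                       # bf matches every doc

# Normal (e)dismax structure: main query mandatory, bf optional.
expected = [Clause(main_query, True), Clause(boost_func, False)]
# Structure seen in the problem output: nothing is mandatory.
observed = [Clause(main_query, False), Clause(boost_func, False)]

print(boolean_query_matches(expected, doc))
print(boolean_query_matches(observed, doc))
```

With the first structure the doc is rejected because "bogus" is missing; 
with the second, the always-true boost clause alone is enough to match.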

If you compare that example to your debug output, you'll notice a 
difference in structure.  It's a bit hard to see in your example (it 
should be more obvious if you simplify your qf, pf, and q params), but 
AFAICT the "main" parts of your query are getting wrapped in an extra 
layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in 
the top level query.  i don't see *any* mandatory clauses in your top 
level BooleanQuery, which is why any match on a bf or bq function is 
enough to cause a document to match.

I suspect the reason your parsed query structure is so different has to do 
with this...

:        <str name="defType">synonym_edismax</str>>


1) how exactly is "synonym_edismax" defined in your solrconfig.xml? 
2) what QParserPlugin are you using to implement that?

I suspect whatever QParserPlugin you are using has a bug in it :)


If you can't fix the bug, one possible workaround would be to abandon the bf 
and bq params completely, and instead wrap the query it produces in a 
{!boost} parser with whatever function you want (using functions like 
sum() or product() to combine multiple functions, and query() to incorporate 
your current bq param).  Doing this will require changing how you specify 
your input (example below) and it will result in *multiplicative* boosts -- 
so your scores will be much different, and you will likely have to adjust 
your constants, but: 1) multiplicative boosts are almost always what people 
*really* want anyway; 2) it will ensure the boosts are only applied for 
things matching your main query, no matter how that query parser works or 
what bugs it has.

Example of using {!boost} to wrap an arbitrary other parser...

instead of...
  defType=foofoo
  q=barbarbar

use...
  q={!boost b=$func defType=foofoo v=$qq}
  qq=barbarbar
  func=sum(something,somethingelse)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers
https://cwiki.apache.org/confluence/display/solr/Function+Queries
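
If it helps to see the {!boost} wrapping from the client side, here is a 
rough sketch of building those params.  The helper name 
boost_wrapped_params is made up, and plugging in your /foo handler, 
"synonym_edismax" and the category_weight function is just my guess at how 
you'd apply it.  It also assumes the bf and bq defaults have been removed 
from the handler, since the wrapped parser would otherwise still pick them 
up.

```python
from urllib.parse import parse_qs, urlencode


def boost_wrapped_params(user_query, inner_parser, boost_func):
    """Build params that wrap inner_parser in a {!boost} parser.

    Hypothetical helper, for illustration only.
    """
    return {
        # $func and $qq are param references that Solr resolves
        # from the other request params below.
        "q": "{!boost b=$func defType=%s v=$qq}" % inner_parser,
        "qq": user_query,   # what the user actually typed
        "func": boost_func, # multiplied into the main query's score
    }


params = boost_wrapped_params(
    '"μ-heavy chain disease"',
    "synonym_edismax",
    "product(field(category_weight),20)",
)

# urlencode takes care of escaping the braces, quotes and non-ASCII chars.
query_string = urlencode(params)
print("/foo?" + query_string)
```

Because the function is now multiplied into the score of docs that matched 
the main query, a doc that doesn't match "qq" never gets a score at all.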




: 
: What I would like is to return zero results if there is no match for the
: querystring.  My collection is small enough that I don't care if the actual
: calculation runs on each doc (although that's wasteful) -- I just don't
: want to see results come back for zero matches to the querystring
: 
: (The /select endpoint does this of course, but my custom endpoint includes
: this "weighting" piece and therefore returns every doc in the corpus
: because they all have the weighting.
: 
: ====================
: Enter my imagined solution...  The potential X-Y problem...
: ====================
: 
: So - given that I come from a programming background, I immediately start
: thinking of an if statement ...
: 
:      if(some_score_for_the_primary_search_string) {
:           run_the_category_weight_calculation;
:      } else {
:           do_NOT_run_category_weight_calc;
:      }
: 
: 
: Another way of thinking of it would be something like the "WHERE" clause in
: SQL...
: 
:  run_category_weight_calculation WHERE "searchstring" is found in the
: document, not otherwise.
: 
: I'm aware that things could be handled in the client-side of my web app,
: but if possible, I'd like the interface to SOLR to be as clean as possible,
: and massage incoming SOLR data as little as possible.
: 
: In other words, do NOT return any docs if the querystring (and any
: synonyms) match zero docs.
: 
: Here is the endpoint XML for the query.  I've highlighted the specific line
: that is causing the unintended results...
: 
: 
:  <requestHandler name="/foo" class="solr.SearchHandler">
:     <!-- default values for query parameters can be specified, these
:          will be overridden by parameters in the request
:       -->
:      <lst name="defaults">
:        <str name="echoParams">all</str>
:        <int name="rows">20</int>
:        <!-- Query settings -->
:        <str name="df">text</str>
:       <!-- <str name="df">title</str> -->
:        <str name="defType">synonym_edismax</str>>
:        <str name="synonyms">true</str>
:     <!-- The line below balances out the weighting of exact matches to the
: synonym phrase entered by the user
:          with the category_weight calculation and the titleQuery calc.
: These numbers exist in a balance and
:          if one is raised or lowered, the others (probably) need to change
: as well.  It may be better to go with decimals
:          for all of them... .4 instead of 4 and 2 instead of 20 and 2.5
: instead of 25.
:          In the end, I'm not sure it really matters, but don't change one
: without changing the others
:          unless you've tested and are sure you want the results  -->
:        <float name="synonyms.originalBoost">1.5</float>
:        <float name="synonyms.synonymBoost">1.1</float>
:        <str name="mm">75%</str>
:        <str name="q.alt">*:*</str>
:        <str name="rows">20</str>
:        <str name="fq">meta_doc_type:chapterDoc</str>
:        <str name="bq">{!synonym_edismax qf='title' synonyms='true'
: synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
: v=$q}</str>
:        <str name="fl">id category_weight title category_ss score
: contentType</str>
:        <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
: =====================================================
:        *<str name="bf">product(field(category_weight),20)</str>*
: =====================================================
:        <str name="bf">product(query($titleQuery),4)</str>
:        <str name="qf">text contentType^1000</str>
:        <str name="wt">python</str>
:        <str name="debug">true</str>
:        <str name="debug.explain.structured">true</str>
:        <str name="indent">true</str>
:        <str name="echoParams">all</str>
:      </lst>
:   </requestHandler>
: 
: And here is the debug output for a query.  (This was a test for synonyms,
: which you'll see in the output.) The original query string was, of
: course, "μ-heavy
: chain disease"
: 
: You'll note that although there is no score in the first doc explain for
: the actual querystring, the highlighted section does get a score for
: product(double(category_weight)=1.5,const(20))
: 
: ... which is the thing that is currently causing all the docs in the
: collection to "match" even though the querystring is not in any of them.
: 
: "debug":{ "rawquerystring":"\"μ-heavy chain disease\"",
: "querystring":"\"μ-heavy
: chain disease\"", "parsedquery":"(DisjunctionMaxQuery((text:\"μ heavy chain
: disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
: ((+DisjunctionMaxQuery((text:\"mu heavy chain disease\" | (contentType:\"mu
: heavy chain disease\")^1000.0)))/no_coord^1.1)
: ((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
: hcd\")^1000.0)))/no_coord^1.1) ((+DisjunctionMaxQuery((text:\"μ heavy chain
: disease\" | (contentType:\"μ heavy chain disease\")^1000.0)))/no_coord^1.1)
: ((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
: hcd\")^1000.0)))/no_coord^1.1)) ((DisjunctionMaxQuery((title:\"μ heavy
: chain disease\"))^2.5 ((+DisjunctionMaxQuery((title:\"mu heavy chain
: disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
: hcd\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ heavy chain
: disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
: hcd\")))/no_coord^1.1)))
: FunctionQuery(product(double(category_weight),const(20)))
: FunctionQuery(product(query(+(title:\"μ heavy chain
: disease\"),def=0.0),const(4)))", "parsedquery_toString":"(((text:\"μ heavy
: chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
: ((+(text:\"mu heavy chain disease\" | (contentType:\"mu heavy chain
: disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
: hcd\")^1000.0))^1.1) ((+(text:\"μ heavy chain disease\" | (contentType:\"μ
: heavy chain disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
: hcd\")^1000.0))^1.1)) ((((title:\"μ heavy chain disease\"))^2.5
: ((+(title:\"mu heavy chain disease\"))^1.1) ((+(title:\"μ hcd\"))^1.1)
: ((+(title:\"μ heavy chain disease\"))^1.1) ((+(title:\"μ hcd\"))^1.1)))
: product(double(category_weight),const(20)) product(query(+(title:\"μ heavy
: chain disease\"),def=0.0),const(4))", "explain":{ "
: 33d808fe-6ccf-4305-a643-48e94de34d18":{ "match":true, "value":30.0, "
: description":"sum of:", "details":[{ "match":true, "value":30.0, "
: description":"FunctionQuery(product(double(category_weight),const(20))),
: product of:",
: =====================================================
: *"details":**[{ "match":true, "value":30.0,
: "description":"product(double(category_weight)=1.5,const(20))"}, {*
: =====================================================
: 
: "match":true, "value":1.0, "description":"boost"}, { "match":true, "value":
: 1.0, "description":"queryNorm"}]}, {
: 

-Hoss
http://www.lucidworks.com/
