Index time or query time boost, and help with boost syntax

jimi.hullegard Mon, 22 Feb 2016 23:43:18 -0800

Hi,

We have a use case where we want to influence the score of the documents based 
on the document type, and I am a bit unsure what is the best way to achieve 
this. In essence we have about 100,000 documents, of about 15 different 
document types. And we more or less want to tweak the score differently for 
each document type (i.e. it is not just one document type that should be 
boosted over all the others).


How would you suggest that we do this? At first I thought that query time 
boosting would be perfect for this, because that way we can tweak and fine-tune 
the boost levels without having to reindex everything each time. But to be 
honest, I really don't understand how I would put such a query together using 
the edismax parser. I can't find a single edismax example that uses the 
multiplicative boost to boost like this: documentType:person^1.8 
documentType:publication^1.5 documentType:news^1.5 documentType:event^1.3 
etc. Can someone help me out with the syntax?
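
To make the question concrete, here is a minimal sketch of how such a request 
might be assembled. It assumes that the edismax "boost" parameter takes a 
function query whose value is multiplied into the score, and that 
documentType is an indexed, untokenized string field so that termfreq() can 
match the exact type value. The helper names are hypothetical and the factors 
are just the ones from the example above.

```python
from urllib.parse import urlencode

# Per-type multipliers from the example above; types not listed get 1.0.
TYPE_BOOSTS = {"person": 1.8, "publication": 1.5, "news": 1.5, "event": 1.3}


def type_boost_function(field, boosts, default=1.0):
    """Build a nested if(termfreq(...)) function query.

    The first matching document type decides the multiplier; documents of
    any other type fall through to `default`.
    """
    expr = str(default)
    # Wrap from the last type outwards so the first type ends up outermost.
    for doc_type, factor in reversed(list(boosts.items())):
        expr = "if(termfreq({f},'{t}'),{b},{rest})".format(
            f=field, t=doc_type, b=factor, rest=expr
        )
    return expr


def edismax_params(user_query, field="documentType", boosts=TYPE_BOOSTS):
    """Request parameters for an edismax query with a multiplicative boost."""
    return {
        "defType": "edismax",
        "q": user_query,
        "boost": type_boost_function(field, boosts),
    }


if __name__ == "__main__":
    # The encoded string would be appended to the select handler URL.
    print(urlencode(edismax_params("solr boosting")))
```

Because the factors live only in the request, they can be changed between 
queries without reindexing.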

Another approach could be to use index time boost. That would simplify the 
queries, and to be honest I don't think that we need to modify the boosting 
factors much after the initial tweaking is done. Our indexing process is also 
fairly quick and lightweight, so it isn't a big deal to perform a full 
reindex.
But here I am also unsure of how to set that up properly. Basically we want to 
boost the documents based on document type, regardless of the query. According 
to the documentation, this is what happens when one uses the boost attribute on 
the doc element in the XML. However, the documentation also mentions that this 
is just "a convenience mechanism equivalent to specifying a boost attribute on 
each of the individual fields that support norms". This leaves me wondering:

1. If boost is defined on both the doc and field level, how is that 
interpreted? Are the values merged using 
add/multiply/max/some-other-math-function? Or is the doc boost just used as a 
default value for fields that don't define their own boost?
2. What about fields that don't have norms? If a query matches such a field, 
wouldn't that affect the score, without me being able to influence that score?
3. On a general note: Is the score I'm boosting really the 
total/outermost/final score of the document? So that a boost of 2.0 would 
double the final score of that document, all else being equal? Or am I simply 
boosting one "inner score", which in turn is used in some complex math 
expression, so that in some circumstances it might not influence the final 
score at all, and at other times might only influence it in a much smaller 
way?
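
For reference, the index time variant described above could be produced 
roughly as follows. This is only a sketch of an XML update message with a 
boost attribute on the doc element; the field names id and title and the 
helper name are made up for illustration, and whether a given Solr version 
still accepts index time boosts should be checked against its documentation.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_id, doc_type, title, doc_boost):
    """Return an <add> update message whose <doc> carries a boost attribute."""
    add = ET.Element("add")
    # Document-level boost, chosen per document type by the indexing code.
    doc = ET.SubElement(add, "doc", {"boost": str(doc_boost)})
    for name, value in (("id", doc_id), ("documentType", doc_type), ("title", title)):
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message("42", "person", "Jane Doe", 1.8))
```

A field-level boost would be an extra boost attribute on an individual field 
element, which is the case question 1 asks about.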

An alternative, I guess, could be to start out with query time boosting like 
above, to find the appropriate boosting levels, and then convert this to some 
kind of hybrid solution afterwards, where the boost factor is stored in a field 
in the document (thus being set at index time) and then used in a boost 
function in the query. With this solution, I guess that it would also be 
possible to have multiple "boost fields" in the documents, each with different 
relative boost values based on document type, and then choose at query time 
which boost field we want. Do you think that would be a good solution? 
But would it be possible to go from a query boost of the type 
"documentType:person^1.8 ..." to a function query boost that uses a document 
field with that value? I.e., would the resulting scores be the same for 
"documentType:person^1.8 ..." on one hand, and a function boost query with a 
field that has the value 1.8 for documents of type person on the other? Or 
could the boost values from these different boost styles result in different 
final scores?
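
A minimal sketch of the hybrid idea, assuming each document carries a 
single-valued numeric field holding its factor. The field names typeBoost 
and typeBoostAlt are hypothetical; def() is the Solr function that supplies a 
fallback value when a document has no value in the field.

```python
def field_boost_params(user_query, boost_field="typeBoost"):
    """edismax parameters that multiply the score by a per-document field value."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Documents without a value in boost_field are left unboosted (1.0).
        "boost": "def({0},1.0)".format(boost_field),
    }


if __name__ == "__main__":
    # Choosing between several boost fields at query time is just a
    # different argument; nothing needs to be reindexed to switch.
    print(field_boost_params("solr boosting"))
    print(field_boost_params("solr boosting", boost_field="typeBoostAlt"))
```

Changing a factor in this setup means reindexing the affected documents, 
whereas switching between already indexed boost fields does not.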

Regards
/Jimi
