On Jul 7, 2016, at 7:30 AM, Mark T. Trembley <mark.tremb...@etrailer.com> wrote:
I have a question about the best way to rank my results based on a score field
that can have different values per document and where each document can have
different scores based on which term is queried.
Essentially, I want to attach a list of terms to each document, each with its own score, so that
when a query matches one of those terms the corresponding score boosts that document. So if I had
a document with a multi-valued field named B1_ss with terms [Boost1|10], [Boost2|20], [Boost3|100]
and my search query is "Boost2", I want that document's result to be boosted by 20. Also note
that "Boost2" can boost different documents at different levels. The query to select the
actual documents will select against other fields in the document and could possibly return
documents with any combination of B1 terms.
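To make the intended behaviour concrete, here is a minimal Python sketch of the lookup I'm after. It is not Solr code; parse_boosts and boost_for are hypothetical helper names, and treating a term with no "|weight" part (such as "NoBoost") as a boost of 0.0 is an assumption.

```python
def parse_boosts(entries):
    """Turn entries like "Boost1|10" into a {term: weight} mapping."""
    boosts = {}
    for entry in entries:
        term, sep, weight = entry.partition("|")
        # Assumption: an entry without "|weight" contributes no boost.
        boosts[term] = float(weight) if sep else 0.0
    return boosts


def boost_for(entries, query_term):
    """Return the boost this document should receive for query_term."""
    return parse_boosts(entries).get(query_term, 0.0)


doc_b1 = ["Boost1|10", "Boost2|20", "Boost3|100"]
print(boost_for(doc_b1, "Boost2"))  # 20.0
```

The same term can map to a different weight on another document, which is why the lookup is per document and not global.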
I'm still trying to figure out how best to model this in my index, either as
child documents, or in another collection, or if it would make more sense to
figure out how to make it work via payloads or by boosting the terms at index
time.
I'm running Solr 5.5.1 in cloud mode. Each server has a complete replica of all
collections.
The document structure I've been toying with the most is to put the boosts into a
separate index and join them using !join syntax and returning the scores, but I've not
had any luck getting quality results from those tests. The extra "scores" index
is structured like this (I'll add the json for my test collections at the end of the
email):
id:Document1_Boost1
B1_s:Boost1
B1_f:10

id:Document1_Boost3
B1_s:Boost3
B1_f:100
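For reference, the "scores" documents can be derived mechanically from a main document's B1_ss values. The following Python sketch shows one way to do that; build_score_docs is a hypothetical helper, and the id convention (document id, underscore, term) is simply what my test data uses.

```python
def build_score_docs(doc_id, b1_entries):
    """Create one "scores" document per B1_ss entry of a main document."""
    score_docs = []
    for entry in b1_entries:
        term, sep, weight = entry.partition("|")
        doc = {"id": f"{doc_id}_{term}", "B1_s": term}
        if sep:
            # Only entries written as "term|weight" get a B1_f value.
            doc["B1_f"] = float(weight)
        score_docs.append(doc)
    return score_docs


docs = build_score_docs("Document1", ["Boost1|10", "Boost3|100"])
# The ids double as the join keys stored in B1_name_ss on the main document.
join_keys = [d["id"] for d in docs]
print(docs)
print(join_keys)
```

Keeping the join keys in B1_name_ss on the main document is what lets the join go from id in "scores" to B1_name_ss in the main collection.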
Using this structure, I get close, but the scores are not what I'm expecting.
If I use the following query, the explain output says it's using the score from
Document6_Boost2 even though my query specifies B1_s:Boost3:
http://localhost:8983/solr/generic/select?q={!join from=id to=B1_name_ss
fromIndex=scores score=max}B1_s:Boost3{!func}B1_f&fl=*,score&debugQuery=true
<lst name="explain">
<str name="Document6">
*3.379996* = Score based on join value Document6_Boost2
</str>
<str name="Document1">
*2.2533307* = Score based on join value Document1_Boost1
</str>
<str name="Document7">
*0.24786638* = Score based on join value Document7_Boost333
</str>
<str name="Document3">*0.0* = Score based on join value Document3_NoBoost</str>
</lst>
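A quick arithmetic check on these numbers, sketched in Python. It pairs each reported score with the B1_f value of the "scores" document named in the explain line (values taken from the SCORES data at the end of this email) and divides one by the other. Reading the ratio as a shared normalization factor is an assumption on my part, not something the explain output states.

```python
# (reported join score, B1_f of the "scores" document named in the explain)
observed = {
    "Document6_Boost2": (3.379996, 15.0),
    "Document1_Boost1": (2.2533307, 10.0),
    "Document7_Boost333": (0.24786638, 1.1),
}

# If the score were simply B1_f times a constant, every ratio would agree.
ratios = {doc_id: score / b1_f for doc_id, (score, b1_f) in observed.items()}
for doc_id, ratio in ratios.items():
    print(f"{doc_id}: score / B1_f = {ratio:.6f}")
```

All three ratios agree to about six decimal places, and none of the named "scores" documents is a Boost3 document, which fits the suspicion that the B1_s:Boost3 clause is not restricting what gets scored.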
My guess is that it's now running a match-all query against the "scores" collection to
return the scores, in addition to the B1_s query I've passed in. I can't figure out where it's
getting those scores from, because a similar simple query against the "scores" collection
returns the scores I'd expect:
http://192.168.1.194:8983/solr/scores/select?q=B1_s:Boost3 AND
_val_:B1_f&fl=score,*&debugQuery=true
<lst name="explain">
<str name="Document1_Boost3">
*46.834885* = sum of:
  1.7682717 = weight(B1_s:Boost3 in 1) [ClassicSimilarity], result of:
    1.7682717 = score(doc=1,freq=1.0), product of:
      0.8926926 = queryWeight, product of:
        1.9808292 = idf(docFreq=2, maxDocs=8)
        0.45066613 = queryNorm
      1.9808292 = fieldWeight in 1, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.9808292 = idf(docFreq=2, maxDocs=8)
        1.0 = fieldNorm(doc=1)
  45.066612 = FunctionQuery(float(B1_f)), product of:
    100.0 = float(B1_f)=100.0
    1.0 = boost
    0.45066613 = queryNorm
</str>
<str name="Document6_Boost3">
*15.288256* = sum of:
  1.7682717 = weight(B1_s:Boost3 in 5) [ClassicSimilarity], result of:
    1.7682717 = score(doc=5,freq=1.0), product of:
      0.8926926 = queryWeight, product of:
        1.9808292 = idf(docFreq=2, maxDocs=8)
        0.45066613 = queryNorm
      1.9808292 = fieldWeight in 5, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.9808292 = idf(docFreq=2, maxDocs=8)
        1.0 = fieldNorm(doc=5)
  13.519984 = FunctionQuery(float(B1_f)), product of:
    30.0 = float(B1_f)=30.0
    1.0 = boost
    0.45066613 = queryNorm
</str>
</lst>
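The totals in this explain output can be reproduced by hand, which is why these scores look right to me. A small Python sketch of the arithmetic, using only constants copied from the explain text above (expected_score is a hypothetical helper name):

```python
# Constants copied from the explain output of the direct "scores" query.
QUERY_NORM = 0.45066613
TERM_WEIGHT = 1.7682717  # weight(B1_s:Boost3), identical for both documents


def expected_score(b1_f, boost=1.0):
    """Term weight plus the function query part (B1_f * boost * queryNorm)."""
    return TERM_WEIGHT + b1_f * boost * QUERY_NORM


print(expected_score(100.0))  # compare with 46.834885 for Document1_Boost3
print(expected_score(30.0))   # compare with 15.288256 for Document6_Boost3
```

So in the direct query the B1_f value reaches the final score scaled by queryNorm, on top of the term match.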
I feel like I'm getting close to what I need, but it's just not clear to me
what I'm missing at this point.
The other option I've been toying with is using payloads, but actually
utilizing the payloads as part of the scoring process is beyond me at this time.
Any thoughts or hints on the best way to boost the relevancy of these
scores would be appreciated.
Thanks
Mark
GENERIC:
{
"id" : "Document1",
"B1_ss" : ["Boost1|10","Boost3|100"],
"title_s" : "Title1",
"otherstuff_ss" : ["stuff1","suggestion"],
"B1_name_ss" : ["Document1_Boost1","Document1_Boost3"]
},
{
"id" : "Document2",
"B1_ss" : ["Boost2|20"],
"name_s" : "Product2",
"title_s" : "Title2",
"otherstuff_ss" : ["stuff2","recommendation"],
"B1_name_ss" : ["Document2_Boost1"]
},
{
"id" : "Document3",
"name_s" : "Product3",
"B1_ss" : ["NoBoost"],
"title_s" : "Title3",
"otherstuff_ss" : ["stuff3","new","suggestion"],
"B1_name_ss" : ["Document3_NoBoost"]
},
{
"id" : "Document4",
"name_s" : "Product4",
"title_s" : "Title4",
"otherstuff_ss" : ["stuff4","old","suggestion"]
},
{
"id" : "Document5",
"name_s" : "Product5",
"title_s" : "Title5",
"otherstuff_ss" : ["stuff5","recommendation"]
},
{
"id" : "Document6",
"name_s" : "Product6",
"B1_ss" : ["Boost2|15","Boost3|30"],
"title_s" : "Title6",
"B1_name_ss" : ["Document6_Boost2","Document6_Boost3"]
},
{
"id" : "Document7",
"name_s" : "Product7",
"B1_ss" : ["NoBoost","Boost333|1.1"],
"title_s" : "Title7",
"B1_name_ss" : ["Document7_NoBoost","Document7_Boost333"]
}
SCORES:
{
"id" : "Document1_Boost1",
"B1_s" : "Boost1",
"B1_f" : 10
},
{
"id" : "Document1_Boost3",
"B1_s" : "Boost3",
"B1_f" : 100
},
{
"id" : "Document2_Boost2",
"B1_s" : "Boost2",
"B1_f" : 20
},
{
"id" : "Document3_NoBoost",
"B1_s" : "NoBoost"
},
{
"id" : "Document6_Boost2",
"B1_s" : "Boost2",
"B1_f" : 15
},
{
"id" : "Document6_Boost3",
"B1_s" : "Boost3",
"B1_f" : 30
},
{
"id" : "Document7_NoBoost",
"B1_s" : "NoBoost"
},
{
"id" : "Document7_Boost333",
"B1_s" : "Boost333",
"B1_f" : 1.1
}