Re: boost parameter produces garbage hits

Walter Underwood Thu, 18 Apr 2019 14:34:29 -0700

For your application, I would probably do everything with the qf and pf 
parameters. Your minimally tokenized fields are better evidence for relevance, 
so weight them higher. Something like this, with phrase matches counting twice 
as much as word matches:


      <str name="qf">text_minimal^2 text_stem</str>
      <str name="pf">text_minimal^4 text_stem^2</str>

I most often use boost for popularity, almost always with this formula:

       <str name="boost">sum(log(sum(popularity,1)),1)</str>

If there is a chance that popularity might be negative, do this:

       <str name="boost">sum(log(sum(max(popularity,0),1)),1)</str>
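
To see what these two formulas do to a score, here is a minimal Python sketch, 
not Solr code. It assumes Solr's log() function is base 10 (ln() is the natural 
log). The helper name popularity_boost and the clamp_negative flag are made up 
for illustration.

```python
import math


def popularity_boost(popularity, clamp_negative=False):
    """Mirror sum(log(sum(popularity,1)),1), assuming log() is base 10.

    With clamp_negative=True this mirrors the max(popularity,0) variant.
    """
    if clamp_negative:
        popularity = max(popularity, 0)
    return math.log10(popularity + 1) + 1


# Multiplier applied to the relevance score at a few popularity values.
for p in (0, 9, 99, 999):
    print(p, popularity_boost(p))

# A negative popularity is clamped to 0, so the multiplier is 1.
print(-5, popularity_boost(-5, clamp_negative=True))
```

A document with popularity 0 gets a multiplier of 1, so its score is unchanged, 
and each roughly tenfold increase in popularity adds 1 to the multiplier.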

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 18, 2019, at 12:55 PM, Webster Homer 
> <webster.ho...@milliporesigma.com> wrote:
> 
> Looked at boost a bit more. The # of results remains the same whether the 
> boost parameter is present or not. If it is present the behavior seems to be 
> that if it matches a hit in the result, it does what I expect, however if it 
> does not match the hit, what ends up in the result is completely unexpected 
> with 0 relevancy. 
> It does appear that bq does what I want, but the behavior of boost seems like 
> a bug. We use boost elsewhere and it works as we want, that use case does not 
> involve using the query function though.
> 
> -----Original Message-----
> From: Webster Homer <webster.ho...@milliporesigma.com> 
> Sent: Thursday, April 18, 2019 12:16 PM
> To: solr-user@lucene.apache.org
> Subject: boost parameter produces garbage hits
> 
> Hi,
> 
> I am trying to understand how the boost (and bq) parameters are supposed to 
> work.
> My application searches our product schema and returns the best matches. To 
> enable an exactish match on product name we created fields that are minimally 
> tokenized (keyword tokenizer/lowercase). Now I want the search to boost 
> results that match on those fields. I thought that either the boost or bq 
> parameter would work. I found very few good examples of the boost parameter 
> used on a query. A lot of permutations resulted in errors such as this:
> org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
> 'ethyl alcohol'
> 
> I am using Solr 7.2 and the eDismax query parser.
> I have gotten boost to work, sort of, but it changes the query results in a 
> bad way. I'm sure that I'm doing something wrong. Here is an example of my 
> boost parameter:
> boost=product(query({!edismax qf="search_en_p_pri_name_min 
> search_en_root_name_min" v=$q boost=}, 0),10000)
> 
> When I search for "ethyl alcohol" products named "ethyl alcohol" come first, 
> which is what I want. We have a range of ethyl alcohol products. Normally I 
> expect to see "ethyl alcohol, pure" and "ethyl alcohol, denatured" after the 
> initial "ethyl alcohol", and I see this without the boost. With the boost I 
> get "ethyl alcohol" with a score of 3.87201088E8. The second hit is 
> "Brilliant Cresyl blue" with a score of 0. All subsequent hits have a score 
> of 0.
> 
> Why are there any matches returned with a score of 0? Why are these hits with 
> a 0 score being returned at all? Especially when more relevant matches are 
> not being returned? I suspect that there is something wrong with my boost 
> function, but it looks right. However if I take it and instead submit the 
> function shown above as a bf parameter I get a syntax error:
> bf=product(query({!edismax qf="search_en_p_pri_name_min 
> search_en_root_name_min" v=$q bf=}),10000)
> org.apache.solr.search.SyntaxError: Expected identifier at pos 23 
> str='product(query({!edismax'"
> 
> From the documentation I expected that the bf and boost parameters only 
> differed as to how the result was boosted with boost being multiplicative and 
> the bf being additive, but I cannot find an equivalent which actually works 
> with the bf parameter.
> 
> The bq parameter doesn't throw an error, but it doesn't seem to have any 
> effect in how the results are ordered.
> 
> What am I doing wrong? Why does the boost parameter return garbage hits with 
> 0 score? What would work as a bf parameter function?
>
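
One possible reading of the zero scores in the quoted message, sketched in 
Python rather than Solr: boost multiplies each document's score by the function 
value, and query(subquery, 0) evaluates to the default 0 for documents that do 
not match the subquery, so their product is 0. The number of results stays the 
same because boost does not filter. The document scores below are invented, and 
the offset variant (adding 1 to the function value) is an untested assumption, 
not something confirmed on Solr 7.2.

```python
def boosted_score(main_score, subquery_score, default=0.0, factor=10000.0, offset=0.0):
    """Mimic a multiplicative boost of offset + query(subquery, default) * factor.

    subquery_score is None when the document does not match the subquery,
    in which case the default value is used instead.
    """
    q = default if subquery_score is None else subquery_score
    return main_score * (offset + q * factor)


# Hypothetical documents: (name, main query score, exact-name subquery score).
docs = [
    ("ethyl alcohol", 12.0, 3.2),
    ("ethyl alcohol, pure", 10.5, None),
    ("Brilliant Cresyl blue", 1.3, None),
]

# As in the quoted boost: non-matching documents are multiplied by 0.
zeroed = {name: boosted_score(m, s) for name, m, s in docs}

# Hypothetical variant: adding 1 keeps the original score for non-matching documents.
offset = {name: boosted_score(m, s, offset=1.0) for name, m, s in docs}

for name, _, _ in docs:
    print(name, zeroed[name], offset[name])
```

With every non-matching document tied at 0, the order among them no longer 
reflects relevance, which would be consistent with an unrelated product 
appearing second.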
