If you just discovered the omitTf parameter because of this post, please be aware that I haven't really explained its purpose properly, and note that using it will prevent phrase queries from working. See this thread for clarification on its use: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200903.mbox/%3c897559.95769...@web50301.mail.re2.yahoo.com%3e
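A rough sketch of why omitNorms and omitTf change title scores. It assumes Lucene's classic DefaultSimilarity, where a field's term-frequency factor is sqrt(freq) and its length norm is 1/sqrt(number of terms). The sketch ignores idf, queryNorm, boosts and the lossy one-byte encoding of norms, and it counts every whitespace-separated word as a term, although a real analyzer may drop stop words. The helper name and the short title are hypothetical.

```python
import math


def classic_field_weight(freq: int, num_terms: int,
                         omit_tf: bool = False, omit_norms: bool = False) -> float:
    """Simplified tf * lengthNorm factor from Lucene's classic DefaultSimilarity.

    idf, queryNorm, boosts and the lossy one-byte encoding of norms are ignored.
    """
    # tf(freq) = sqrt(freq); with term frequencies omitted, every match counts as freq 1.
    tf = 1.0 if omit_tf else math.sqrt(freq)
    # lengthNorm(numTerms) = 1/sqrt(numTerms); with norms omitted the factor is 1.0.
    norm = 1.0 if omit_norms else 1.0 / math.sqrt(num_terms)
    return tf * norm


if __name__ == "__main__":
    # Hypothetical titles that both match q=indiana exactly once.
    short_title = "Indiana Jones"
    long_title = "Indiana Jones and the Kingdom of the Crystal Skull"
    for omit_norms in (False, True):
        weights = [classic_field_weight(1, len(t.split()), omit_norms=omit_norms)
                   for t in (short_title, long_title)]
        print(f"omit_norms={omit_norms}: short={weights[0]:.3f} long={weights[1]:.3f}")
```

With norms kept, the two-word title gets about 0.707 against 0.333 for the nine-word one. With norms omitted, both get 1.000, so a bq boost on media is no longer competing with title length.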
-- Dean

-----Original Message-----
From: Dean Missikowski (Consultant), CLSA
Sent: 16/03/2009 10:30 AM
To: solr-user@lucene.apache.org
Subject: RE: How to correctly boost results in Solr Dismax query

Hi,

My experience is that the BQ parameter can be used with any query type. You can define boosts on the query fields (qf) that are used with the query terms (q) in your query, AND you can define additional boosts for fields that are not used with the query terms through the bq or bf parameters.

I think that when you assign a particular boost to a field via BQ, you need to take the other fields in your query into consideration, since that boost's weight in the overall score is relative to them.

If you're searching on titles, you might want to consider setting omitNorms=true for title in your schema.xml (this means length-normalization factors aren't stored) and, if you're using Solr 1.4, omitTf=true (this means term frequencies aren't indexed), so that results aren't skewed by short versus long titles, or by titles that contain multiple occurrences of the same term. Setting these requires you to reindex. I think this should make BQ boosts like &bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective.

-- Dean

-----Original Message-----
From: Pete Smith [mailto:pete.sm...@lovefilm.com]
Sent: 13/03/2009 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: How to correctly boost results in Solr Dismax query

Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> bq works only with q.alt query and not with q queries. So, in your case you
> would be using qf parameter for field boosting, you will have to give both
> the fields in qf parameter i.e. both title and media.
>
> try this
>
> <str name="qf">media^1.0 title^100.0</str>

But with that, how will it know to rank media:DVD higher than media:BLU-RAY?

Cheers,
Pete

> Pete Smith-3 wrote:
> >
> > Hi Amit,
> >
> > Thanks again for your reply. I am understanding it a bit better but I
> > think it would help if I posted an example.
> > Say I have three records:
> >
> > <doc>
> > <long name="id">1</long>
> > <str name="media">BLU-RAY</str>
> > <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
> > </doc>
> > <doc>
> > <long name="id">2</long>
> > <str name="media">DVD</str>
> > <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
> > </doc>
> > <doc>
> > <long name="id">3</long>
> > <str name="media">DVD</str>
> > <str name="title">Casino Royale</str>
> > </doc>
> >
> > Now, if I search for indiana: select?q=indiana
> >
> > I want the first two rows to come back (not the third, as it does not
> > contain 'indiana'). I would like record 2 to be scored higher than
> > record 1 as its media type is DVD.
> >
> > At the moment I have in my config:
> >
> > <str name="qf">title</str>
> >
> > And I was trying to boost by media having a specific value by using 'bq',
> > but from what you told me that is incorrect.
> >
> > Cheers,
> > Pete
> >
> >
> > On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
> >> Pete,
> >>
> >> Sorry if I wasn't clear. Here is the explanation.
> >>
> >> Suppose you have 2 records and they have films and media as 2 columns.
> >>
> >> Now the first record has values like films="Indiana" and media="blue ray"
> >> and the 2nd record has values like films="Bond" and media="Indiana".
> >>
> >> Values for the qf parameter:
> >>
> >> <str name="qf">media^2.0 films^1.0</str>
> >>
> >> Now, search for q=Indiana: it should display both of the records, but
> >> record #2 will be ranked above the 1st.
> >>
> >> Let me know if you still have questions.
> >>
> >> Cheers,
> >> amit
> >>
> >>
> >> Pete Smith-3 wrote:
> >> >
> >> > Hi Amit,
> >> >
> >> > Thanks very much for your reply. What you said makes things a bit
> >> > clearer but I am still a bit confused.
> >> >
> >> > On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
> >> >> If you want to boost the records with their field value then you must use
> >> >> the q query parameter instead of q.alt. The 'q' parameter actually uses the qf
> >> >> parameters from solrConfig for field boosting.
> >> >
> >> > From the documentation for Dismax queries, I thought that "q" is simply
> >> > a keyword parameter:
> >> >
> >> > From http://wiki.apache.org/solr/DisMaxRequestHandler:
> >> > q
> >> > The guts of the search defining the main "query". This is designed to
> >> > support raw input strings provided by users with no special escaping.
> >> > '+' and '-' characters are treated as "mandatory" and "prohibited"
> >> > modifiers for the subsequent terms. Text wrapped in balanced quote
> >> > characters '"' are treated as phrases, any query containing an odd
> >> > number of quote characters is evaluated as if there were no quote
> >> > characters at all. Wildcards in this "q" parameter are not supported.
> >> >
> >> > And I thought 'qf' is a list of fields and boost scores:
> >> >
> >> > From http://wiki.apache.org/solr/DisMaxRequestHandler:
> >> > qf (Query Fields)
> >> > List of fields and the "boosts" to associate with each of them when
> >> > building DisjunctionMaxQueries from the user's query. The format
> >> > supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
> >> > fieldOne has a boost of 2.3, fieldTwo has the default boost, and
> >> > fieldThree has a boost of 0.4 ... this indicates that matches in
> >> > fieldOne are much more significant than matches in fieldTwo, which are
> >> > more significant than matches in fieldThree.
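The qf format quoted above, and the way dismax combines per-field matches, can be sketched as a toy model. This is not Solr's implementation. `parse_qf` and `dismax_score` are hypothetical helpers, and the per-field raw scores are made-up inputs rather than real Lucene scores. `tie` stands in for the dismax tie-breaker parameter. The model follows the DisjunctionMaxQuery idea: a document scores by its best-matching boosted field, plus `tie` times the other matching fields.

```python
def parse_qf(qf: str) -> dict[str, float]:
    """Parse a dismax-style qf value such as 'fieldOne^2.3 fieldTwo fieldThree^0.4'."""
    boosts: dict[str, float] = {}
    for token in qf.split():
        field, sep, boost = token.partition("^")
        # A field listed without '^' gets the default boost of 1.0.
        boosts[field] = float(boost) if sep else 1.0
    return boosts


def dismax_score(raw_scores: dict[str, float], qf: dict[str, float],
                 tie: float = 0.0) -> float:
    """Toy disjunction-max combination: the best boosted field score, plus
    `tie` times the sum of the other matching fields' boosted scores."""
    boosted = [raw_scores[f] * b for f, b in qf.items() if raw_scores.get(f, 0.0) > 0.0]
    if not boosted:
        return 0.0  # the user's terms matched none of the qf fields
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


if __name__ == "__main__":
    qf = parse_qf("media^2.0 films^1.0")  # dabboo's example configuration
    # Hypothetical, equal raw scores for 'indiana' in whichever field matched.
    record1 = {"films": 0.5}  # films="Indiana", media="blue ray"
    record2 = {"media": 0.5}  # films="Bond", media="Indiana"
    print(dismax_score(record1, qf), dismax_score(record2, qf))  # 0.5 1.0
```

In this model a qf boost rewards *which field* the query term matched, not what that field contains. With Pete's records and qf=title, "indiana" only ever matches title, so the media value never enters the score. That is the gap his question about ranking media:DVD above media:BLU-RAY points at.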
> >> >
> >> > But if I want to, say, search for films with 'indiana' in the title,
> >> > with media=DVD scoring higher than media=BLU-RAY, then do I need to do
> >> > something like:
> >> >
> >> > solr/select?q=indiana
> >> >
> >> > And in my config:
> >> >
> >> > <str name="qf">media^2</str>
> >> >
> >> > But I don't see where the actual *contents* of the media field would
> >> > determine the boost.
> >> >
> >> > Sorry if I have misunderstood what you mean.
> >> >
> >> > Cheers,
> >> > Pete
> >> >
> >> >> Pete Smith-3 wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I have managed to build an index in Solr which I can search on keyword,
> >> >> > produce facets, query facets etc. This is all working great. I have
> >> >> > implemented my search using a dismax query so it searches predetermined
> >> >> > fields.
> >> >> >
> >> >> > However, my results are coming back sorted by score, which appears to be
> >> >> > calculated by keyword relevancy only. I would like to adjust the score
> >> >> > where fields have pre-determined values. I think I can do this with
> >> >> > boost query and boost functions, but the documentation here:
> >> >> >
> >> >> > http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
> >> >> >
> >> >> > is not particularly helpful. I tried adding a bq argument to my
> >> >> > search:
> >> >> >
> >> >> > &bq=media:DVD^2
> >> >> >
> >> >> > (yes, this is an index of films!) but I find when I start adding more
> >> >> > and more:
> >> >> >
> >> >> > &bq=media:DVD^2&bq=media:BLU-RAY^1.5
> >> >> >
> >> >> > I find the negative results, e.g. films that are DVD but are not
> >> >> > BLU-RAY, get negatively affected in their score. In the end it all seems
> >> >> > to even out and my score is as it was before I started boosting.
> >> >> >
> >> >> > I must be doing this wrong and I wonder whether "boost function" comes
> >> >> > in somewhere.
> >> >> > Any ideas on how to correctly use boost?
> >> >> >
> >> >> > Cheers,
> >> >> > Pete
> >> >> >
> >> >> > --
> >> >> > Pete Smith
> >> >> > Developer
> >> >> >
> >> >> > No.9 | 6 Portal Way | London | W3 6RU |
> >> >> > T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
> >> >> >
> >> >> > LOVEFiLM.com
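A minimal sketch of the request Dean describes: keyword matching through q/qf, plus one bq clause per media value. It assumes the handler behind /select is configured for dismax, as Pete's appears to be, and it passes qf on the request instead of relying on solrconfig. The endpoint URL and the `build_boosted_query` helper are hypothetical. The sketch only builds and inspects the URL and does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: adjust host, port and path for your own Solr install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_boosted_query(user_query: str, media_boosts: dict[str, float],
                        qf: str = "title") -> str:
    """Build a dismax select URL: q and qf drive the keyword match, and each
    media value becomes its own bq clause, e.g. bq=media:DVD^2."""
    params = [("q", user_query), ("qf", qf)]
    for value, boost in media_boosts.items():
        # A list of pairs (rather than a dict) lets bq repeat in the query string.
        params.append(("bq", f"media:{value}^{boost}"))
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_boosted_query("indiana", {"DVD": 2, "BLU-RAY": 1.5})
    print(url)
    # Decoding the query string shows both boost clauses survived encoding.
    print(parse_qs(urlsplit(url).query)["bq"])  # ['media:DVD^2', 'media:BLU-RAY^1.5']
```

How much each bq clause actually adds depends on more than its ^ value. Lucene's scoring also factors in, for example, how rare each media value is in the index. Adding debugQuery=on to the request and reading the score explanations is a practical way to check whether record 2 really ends up above record 1.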