Also note that we have an open, related issue in Lucene's bug tracking system. omitTf might get renamed to make it clearer that positional information is not stored, which prevents phrase queries.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: "Dean Missikowski (Consultant), CLSA" <dean.missikow...@clsa.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, March 16, 2009 4:46:32 AM
> Subject: RE: How to correctly boost results in Solr Dismax query
>
> If you just discovered the omitTf parameter because of this post, please
> be aware that I've not really explained its purpose properly, and note
> that using it will prevent phrase queries from working. See this thread
> for clarification on its use:
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200903.mbox/%3c897559.95769...@web50301.mail.re2.yahoo.com%3e
>
> -- Dean
>
> -----Original Message-----
> From: Dean Missikowski (Consultant), CLSA
> Sent: 16/03/2009 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: RE: How to correctly boost results in Solr Dismax query
>
> Hi,
>
> My experience is that the bq parameter can be used with any query type.
> You can define boosts on the query fields (qf) that are used with the
> query terms (q) in your query, AND you can define additional boosts for
> fields that are not used with the query terms through the bq or bf
> parameters.
>
> I think the effect that a particular bq boost has on the overall score
> depends on the other fields in your query. If you're searching on titles,
> you might want to consider setting omitNorms=true (means don't store
> length normalization factors) for title in your schema.xml, and if you're
> using Solr 1.4, omitTf=true (means don't store term frequencies), so
> that results aren't skewed by short and long titles, or by titles that
> contain multiple occurrences of the same term (setting these requires
> you to reindex). I think this should have the effect of making bq boosts
> like &bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective.
>
> -- Dean
>
> -----Original Message-----
> From: Pete Smith [mailto:pete.sm...@lovefilm.com]
> Sent: 13/03/2009 7:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to correctly boost results in Solr Dismax query
>
> Hi,
>
> On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> > bq works only with q.alt queries and not with q queries. So, in your
> > case you would be using the qf parameter for field boosting; you will
> > have to give both fields in the qf parameter, i.e. both title and media.
> >
> > try this
> >
> > media^1.0 title^100.0
>
> But with that, how will it know to rank media:DVD higher than
> media:BLU-RAY?
>
> Cheers,
> Pete
>
> > Pete Smith-3 wrote:
> > >
> > > Hi Amit,
> > >
> > > Thanks again for your reply. I am understanding it a bit better but I
> > > think it would help if I posted an example. Say I have three records:
> > >
> > > 1
> > > BLU-RAY
> > > Indiana Jones and the Kingdom of the Crystal Skull
> > >
> > > 2
> > > DVD
> > > Indiana Jones and the Kingdom of the Crystal Skull
> > >
> > > 3
> > > DVD
> > > Casino Royale
> > >
> > > Now, if I search for indiana: select?q=indiana
> > >
> > > I want the first two rows to come back (not the third as it does not
> > > contain 'indiana'). I would like record 2 to be scored higher than
> > > record 1 as its media type is DVD.
> > >
> > > At the moment I have in my config:
> > >
> > > title
> > >
> > > And I was trying to boost by media having a specific value by using
> > > 'bq', but from what you told me that is incorrect.
> > >
> > > Cheers,
> > > Pete
> > >
> > >
> > > On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
> > >> Pete,
> > >>
> > >> Sorry if I wasn't clear. Here is the explanation.
> > >>
> > >> Suppose you have 2 records and they have films and media as 2 columns.
> > >>
> > >> Now the first record has values like films="Indiana" and media="blue
> > >> ray", and the 2nd record has values like films="Bond" and
> > >> media="Indiana".
> > >>
> > >> Values for the qf parameter:
> > >>
> > >> media^2.0 films^1.0
> > >>
> > >> Now, search for q=Indiana. It should display both of the records, but
> > >> record #2 will be displayed above the 1st.
> > >>
> > >> Let me know if you still have questions.
> > >>
> > >> Cheers,
> > >> amit
> > >>
> > >>
> > >> Pete Smith-3 wrote:
> > >> >
> > >> > Hi Amit,
> > >> >
> > >> > Thanks very much for your reply. What you said makes things a bit
> > >> > clearer but I am still a bit confused.
> > >> >
> > >> > On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
> > >> >> If you want to boost the records with their field value then you
> > >> >> must use the q query parameter instead of q.alt. The 'q' parameter
> > >> >> actually uses the qf parameters from solrConfig for field boosting.
> > >> >
> > >> > From the documentation for Dismax queries, I thought that "q" is
> > >> > simply a keyword parameter:
> > >> >
> > >> > From http://wiki.apache.org/solr/DisMaxRequestHandler:
> > >> > q
> > >> > The guts of the search defining the main "query". This is designed
> > >> > to support raw input strings provided by users with no special
> > >> > escaping. '+' and '-' characters are treated as "mandatory" and
> > >> > "prohibited" modifiers for the subsequent terms. Text wrapped in
> > >> > balanced quote characters '"' are treated as phrases, any query
> > >> > containing an odd number of quote characters is evaluated as if
> > >> > there were no quote characters at all. Wildcards in this "q"
> > >> > parameter are not supported.
> > >> >
> > >> > And I thought 'qf' is a list of fields and boost scores:
> > >> >
> > >> > From http://wiki.apache.org/solr/DisMaxRequestHandler:
> > >> > qf (Query Fields)
> > >> > List of fields and the "boosts" to associate with each of them when
> > >> > building DisjunctionMaxQueries from the user's query. The format
> > >> > supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates
> > >> > that fieldOne has a boost of 2.3, fieldTwo has the default boost,
> > >> > and fieldThree has a boost of 0.4 ... this indicates that matches in
> > >> > fieldOne are much more significant than matches in fieldTwo, which
> > >> > are more significant than matches in fieldThree.
> > >> >
> > >> > But if I want to, say, search for films with 'indiana' in the title,
> > >> > with media=DVD scoring higher than media=BLU-RAY then do I need to
> > >> > do something like:
> > >> >
> > >> > solr/select?q=indiana
> > >> >
> > >> > And in my config:
> > >> >
> > >> > media^2
> > >> >
> > >> > But I don't see where the actual *contents* of the media field would
> > >> > determine the boost.
> > >> >
> > >> > Sorry if I have misunderstood what you mean.
> > >> >
> > >> > Cheers,
> > >> > Pete
> > >> >
> > >> >> Pete Smith-3 wrote:
> > >> >> >
> > >> >> > Hi,
> > >> >> >
> > >> >> > I have managed to build an index in Solr which I can search on
> > >> >> > keyword, produce facets, query facets etc. This is all working
> > >> >> > great. I have implemented my search using a dismax query so it
> > >> >> > searches predetermined fields.
> > >> >> >
> > >> >> > However, my results are coming back sorted by score which appears
> > >> >> > to be calculated by keyword relevancy only. I would like to adjust
> > >> >> > the score where fields have pre-determined values.
> > >> >> > I think I can do this with boost query and boost functions, but
> > >> >> > the documentation here:
> > >> >> >
> > >> >> > http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
> > >> >> >
> > >> >> > is not particularly helpful. I tried adding a bq argument to my
> > >> >> > search:
> > >> >> >
> > >> >> > &bq=media:DVD^2
> > >> >> >
> > >> >> > (yes, this is an index of films!) but when I start adding more
> > >> >> > and more:
> > >> >> >
> > >> >> > &bq=media:DVD^2&bq=media:BLU-RAY^1.5
> > >> >> >
> > >> >> > I find the negative results, e.g. films that are DVD but are not
> > >> >> > BLU-RAY, get negatively affected in their score. In the end it all
> > >> >> > seems to even out and my score is as it was before I started
> > >> >> > boosting.
> > >> >> >
> > >> >> > I must be doing this wrong and I wonder whether "boost function"
> > >> >> > comes in somewhere. Any ideas on how to correctly use boost?
> > >> >> >
> > >> >> > Cheers,
> > >> >> > Pete
> > >> >> >
> > >> >> > --
> > >> >> > Pete Smith
> > >> >> > Developer
> > >> >> >
> > >> >> > No.9 | 6 Portal Way | London | W3 6RU |
> > >> >> > T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
> > >> >> >
> > >> >> > LOVEFiLM.com
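
For reference, the kind of request discussed in this thread (qf boosts on the searched fields, plus one bq per media value to favour) could be assembled roughly as in the sketch below. The host, port and the build_dismax_url helper are hypothetical, and it assumes the request handler honours defType=dismax; the field names and boost values are simply the ones quoted in the thread, not tuned recommendations.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, q, qf, bq):
    """Build a dismax select URL with field boosts (qf) and boost queries (bq)."""
    params = [
        ("defType", "dismax"),  # assumption: the handler honours defType
        ("q", q),
        # qf: space-separated "field^boost" pairs applied to the user's terms
        ("qf", " ".join(f"{field}^{boost}" for field, boost in qf)),
    ]
    # One bq parameter per boosted field value, e.g. bq=media:DVD^2
    params.extend(("bq", f"{field}:{value}^{boost}") for field, value, boost in bq)
    return f"{base_url}?{urlencode(params)}"


url = build_dismax_url(
    "http://localhost:8983/solr/select",  # hypothetical host and port
    "indiana",
    [("title", 1.0)],
    [("media", "DVD", 2), ("media", "BLU-RAY", 1.5)],
)
print(url)
```

Here the user's term is matched only against title, while the two bq clauses add to the score of documents whose media field holds the given value, which is the behaviour Pete was asking for.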