Removing lengthNorm from the calculation
I know I'm missing something really obvious, but I'm spinning my wheels figuring out how to eliminate lengthNorm from the calculations. The specific problem I'm trying to solve is that naive queries are resulting in crummy short records near the top of the list. The reality is that the longer records tend to be higher quality, so if anything, they need to be emphasized. However, I'm missing something simple. Any advice or a pointer to an example I could model off would be greatly appreciated. Thanks, kyle
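For context, here is a minimal sketch of why short records float to the top. It assumes the classic Lucene default, where a field's length norm is 1 / sqrt(number of terms), and it ignores index-time boosts and the lossy one-byte encoding of the stored norm. The function names are illustrative, not Lucene API.

```python
import math

def length_norm(num_terms: int) -> float:
    """Classic default length normalization: shorter fields get a larger factor."""
    return 1.0 / math.sqrt(num_terms)

def flat_norm(num_terms: int) -> float:
    """What eliminating lengthNorm amounts to: every field gets the same factor."""
    return 1.0

if __name__ == "__main__":
    # Compare a very short record, a medium one, and a long one.
    for n in (4, 100, 2500):
        print(n, round(length_norm(n), 3), flat_norm(n))
```

In Solr the usual switch is omitNorms="true" on the field in schema.xml, followed by a reindex. Note that this also discards index-time boosts for that field.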
Best way to change weighting based on the presence of a field
Howdy all, We are attempting to provide access to about 8 million records of highly variable quality and length. In a nutshell, we are trying to find a way to deprioritize "suspect" records without discriminating against useful records that happen to be short. We do not wish to eliminate suspect records from the results -- just deprioritize them a bit. We have been indexing a field that marks a record as likely to be good or bad, and I'm trying to figure out the most efficient way to use it (should I be trying this at all?). As a newbie, my first inclination was to OR the search terms with the same terms combined with a "good record marker" with a modest boost. However, this method seems really clunky, and I'm wondering if there's a better way to accomplish what we're trying to do. Thanks, kyle
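The OR approach described above can be sketched as a small query builder. The field name "quality", the value "good", and the boost of 1.5 are placeholders, not values from our schema.

```python
def with_quality_boost(user_query: str,
                       marker_field: str = "quality",   # hypothetical field name
                       marker_value: str = "good",      # hypothetical marker value
                       boost: float = 1.5) -> str:
    """OR the user's terms with the same terms restricted to good records.

    Every record matching the terms is still returned; records that also
    carry the marker match the second, boosted clause and score higher.
    """
    base = f"({user_query})"
    boosted = f"({base} AND {marker_field}:{marker_value})^{boost}"
    return f"{base} OR {boosted}"

if __name__ == "__main__":
    print(with_quality_boost("solar panels"))
```

Suspect records stay in the result set because the first clause matches them on its own. Only the size of the boost decides how far they sink.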
Re: Best way to change weighting based on the presence of a field
> If you know at index time that the document is shady, the easiest way
> to de-emphasize it globally is to set the document boost to some
> value other than one.
>
> ...

I considered that, but assumed we'd get the values wrong at first and have to do a lot of tinkering before we got it right. Is there a good way to do this at query time, or do you really need to do this when loading? It would be feasible to boost at load time, but recovery times from bad decisions are longer than I was hoping for.

kyle
Re: Best way to change weighting based on the presence of a field
> In the near future, you can do a real query-time boost (score multiplication)
> by another field or function
> https://issues.apache.org/jira/browse/SOLR-334
>
> And even quickly update all the values of the field being used as the boost:
> https://issues.apache.org/jira/browse/SOLR-351

Thanks, all the feedback people are providing is very helpful. For the short term, it looks like the ticket might be to use a function query on the value stored in a field that represents the quality of the record.

kyle
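A rough sketch of what that short-term approach could look like as request parameters. It assumes the standard query parser's _val_ hook for function queries and a hypothetical numeric field named quality_score. The function value is added to the relevance score, not multiplied into it, so the field's scale would need tuning.

```python
from urllib.parse import urlencode

def function_query_params(user_query: str,
                          quality_field: str = "quality_score") -> dict:
    """Build Solr params that add a per-record quality value to the score.

    quality_field is a hypothetical indexed numeric field. The leading '+'
    keeps the user's terms required, so the function clause only affects
    ranking and never pulls in records that do not match the terms.
    """
    q = f'+({user_query}) _val_:"{quality_field}"'
    return {"q": q}

if __name__ == "__main__":
    params = function_query_params("solar panels")
    print(params["q"])
    print(urlencode(params))  # URL-encoded form for the request
```

Because the quality values live in the index and the function is applied at query time, changing how much they matter only means changing the query, not reloading the records.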
Re: Forced Top Document
This method Charlie suggested will work just fine with a minor tweak.

For relevancy sorting:

?q=foo OR (foo AND id:bar)

For nonrelevancy sorting, all you need is a multilevel sort. Just add a bogus field that only the important document contains. Then sort by bogus field in descending order before any other sorting criteria are applied.

Either way, the document only appears when it matches the search criteria, and it will always be on top.

kyle

On 10/24/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
> Yes, this will only work if the results are sorted by score (the
> default).
>
> One thing I thought of after I sent this out was that this will include
> the specified document even if it doesn't match your search criteria,
> which may not be what you want.
>
>
> -Original Message-
> From: mark angelillo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 24, 2007 12:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Forced Top Document
>
> Charlie,
>
> That's interesting. I did try something like this. Did you try your
> query with a sorting parameter?
>
> What I've read suggests that all the results are returned based on
> the query specified, but then resorted as specified. Boosting (which
> modifies the document's score) should not change the order unless the
> results are sorted by score.
>
> Mark
>
> On Oct 24, 2007, at 1:05 PM, Charlie Jackson wrote:
>
> > Do you know which document you want at the top? If so, I believe you
> > could just add an "OR" clause to your query to boost that document
> > very high, such as
> >
> > ?q=foo OR id:bar^1000
> >
> > Tried this on my installation and it did, indeed push the document
> > specified to the top.
> >
> >
> >
> > -Original Message-
> > From: Matthew Runo [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 24, 2007 10:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Forced Top Document
> >
> > I'd love to know this, as I just got a development request for this
> > very feature. I'd rather not spend time on it if it already exists.
> >
> > ++
> > | Matthew Runo
> > | Zappos Development
> > | [EMAIL PROTECTED]
> > | 702-943-7833
> > ++
> >
> >
> > On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:
> >
> >> Hi all,
> >>
> >> Is there a way to get a specific document to appear on top of
> >> search results even if a sorting parameter would push it further
> >> down?
> >>
> >> Thanks in advance,
> >> Mark
> >>
> >> mark angelillo
> >> snooth inc.
> >> o: 646.723.4328
> >> c: 484.437.9915
> >> [EMAIL PROTECTED]
> >> snooth -- 1.8 million ratings and counting...
> >>
> >>
> >
>
> mark angelillo
> snooth inc.
> o: 646.723.4328
> c: 484.437.9915
> [EMAIL PROTECTED]
> snooth -- 1.8 million ratings and counting...
>
>
>

--
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
[EMAIL PROTECTED] / 541.359.9599
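The two variants from the top of this message (the tweaked OR query and the multilevel sort) can be sketched as small helpers. The flag field name featured_s is made up for illustration, and the sort string assumes a Solr version that accepts a separate sort request parameter.

```python
def pinned_by_score(user_query: str, doc_id: str, id_field: str = "id") -> str:
    """Relevancy sorting: the pinned document matches both clauses, and it
    only matches at all when it also matches the user's query.

    Multi-term queries should be wrapped in parentheses by the caller.
    """
    return f"{user_query} OR ({user_query} AND {id_field}:{doc_id})"

def pinned_by_sort(user_sort: str, flag_field: str = "featured_s") -> str:
    """Nonrelevancy sorting: sort on a flag field that only the important
    document contains, then apply the sort the user asked for.

    flag_field is a hypothetical field name.
    """
    return f"{flag_field} desc, {user_sort}"

if __name__ == "__main__":
    print("q    =", pinned_by_score("foo", "bar"))
    print("sort =", pinned_by_sort("price asc"))
```

How documents that lack the flag field are ordered depends on the field type's handling of missing values, so that is worth checking against the schema before relying on it.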
Re: Forced Top Document
> The typical use case, though, is for the featured document to be on top only
> for certain queries. Like in an intranet where someone queries 401K or
> retirement or similar, you want to feature a document about benefits that
> would otherwise rank really low for that query. I have not been able to make
> sorting strategies work very well.

Depending on how many of these certain queries you have, it seems like you could still use some variation of the strategy based on a bogus tag sort. If you add a dynamic field to the document for each query term it should be featured for (e.g. foo_s, bar_s, etc.) and then detect when a query contains one of those special terms, you can still sort on the appropriate dynamic field before applying the rest of the sort.

kyle
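A minimal sketch of that detection step, assuming the application keeps its own set of special terms and that each featured document was indexed with a matching <term>_s dynamic field. The term list and the whitespace tokenization are placeholders.

```python
def sort_for_query(user_query: str,
                   featured_terms: set,
                   default_sort: str = "score desc") -> str:
    """Return a sort string that puts the featured document first when the
    query contains one of the special terms, else the default sort.

    Assumes documents featured for a term carry a '<term>_s' dynamic field.
    """
    for token in user_query.lower().split():
        if token in featured_terms:
            return f"{token}_s desc, {default_sort}"
    return default_sort

if __name__ == "__main__":
    featured = {"retirement", "benefits"}   # hypothetical special terms
    print(sort_for_query("retirement plans", featured))
    print(sort_for_query("vacation policy", featured))
```

Only the first special term found decides the sort here. A query containing several special terms would need a rule for which featured document wins.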