Removing lengthNorm from the calculation

2007-09-10 Thread Kyle Banerjee
I know I'm missing something really obvious, but I'm spinning my
wheels figuring out how to eliminate lengthNorm from the calculations.

The specific problem I'm trying to solve is that naive queries are
resulting in crummy short records near the top of the list. The
reality is that the longer records tend to be higher quality, so if
anything, they need to be emphasized.

However, I'm missing something simple. Any advice or a pointer to an
example I could model off would be greatly appreciated. Thanks,

kyle
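
For readers following along, here is a minimal sketch of why short records float to the top. It assumes the default Lucene length normalization of 1/sqrt(numTerms); `flat_length_norm` is a hypothetical stand-in for a custom Similarity whose `lengthNorm` returns a constant, or for a field indexed without norms (e.g. `omitNorms="true"` in the Solr schema, if your version supports it).

```python
import math


def default_length_norm(num_terms: int) -> float:
    """Length normalization as in Lucene's DefaultSimilarity: 1/sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)


def flat_length_norm(num_terms: int) -> float:
    """Hypothetical replacement: every field length gets the same factor."""
    return 1.0


# A short record (4 terms) and a long record (400 terms) with the same raw
# term score before length normalization is applied.
raw_score = 1.0
for norm in (default_length_norm, flat_length_norm):
    short_score = raw_score * norm(4)
    long_score = raw_score * norm(400)
    print(f"{norm.__name__}: short={short_score:.3f} long={long_score:.3f}")
```

With the default norm the short record gets ten times the weight of the long one; with the flat norm the two are treated equally, which is the effect being asked for.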


Best way to change weighting based on the presence of a field

2007-10-05 Thread Kyle Banerjee
Howdy all,

We are attempting to provide access to about 8 million records of
highly variable quality and length. In a nutshell, we are trying to
find a way to deprioritize "suspect" records without discriminating
against useful records that happen to be short. We do not wish to
eliminate suspect records from the results -- just deprioritize them a
bit.

We have been indexing a field that marks a record as likely to be good
or bad, and I'm trying to figure out the most efficient way to use it
(should I be trying this at all?). As a newbie, my first inclination
was to OR the search terms with the same terms combined with a "good
record marker" with a modest boost.

However, this method seems really clunky, and I'm wondering if there's
a better way to accomplish what we're trying to do. Thanks,

kyle
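
The OR approach described above could be sketched roughly as follows. The field name `quality`, the value `good`, and the boost of 1.5 are all assumptions for illustration, not values from our index, and the user query is not escaped here.

```python
def boost_good_records(user_query: str,
                       marker_field: str = "quality",
                       marker_value: str = "good",
                       boost: float = 1.5) -> str:
    """Build a Lucene-syntax query that matches everything the user query
    matches, and scores records carrying the good-record marker a bit higher.

    marker_field and marker_value are hypothetical names.
    """
    base = f"({user_query})"
    # Second clause only matches marked records; the boost applies to that clause.
    return f"{base} OR ({base} AND {marker_field}:{marker_value})^{boost}"


print(boost_good_records("civil war"))
```

Suspect records still match through the first clause, so they stay in the results; they just miss the extra score from the second clause.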


Re: Best way to change weighting based on the presence of a field

2007-10-05 Thread Kyle Banerjee
> If you know at index time that the document is shady, the easiest way
> to de-emphasize it globally is to set the document boost to some
> value other than one.
>
> ...

I considered that, but assumed we'd get the values wrong at first and
have to do a lot of tinkering before we got it right. Is there a good
way to do this at query time, or do you really need to do this when
loading? It would be feasible to boost at load time, but recovery
times from bad decisions are longer than I was hoping for.

kyle


Re: Best way to change weighting based on the presence of a field

2007-10-06 Thread Kyle Banerjee
> In the near future, you can do a real query-time boost (score multiplication)
> by another field or function
> https://issues.apache.org/jira/browse/SOLR-334
>
> And even quickly update all the values of the field being used as the boost:
> https://issues.apache.org/jira/browse/SOLR-351

Thanks, all the feedback people are providing is very helpful. For the
short term, it looks like the ticket might be to use a function query on
the value stored in a field that represents the quality of the record.

kyle
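
A rough sketch of what such a request might look like, assuming the dismax handler is registered under the name `dismax` and using its `bf` (boost function) parameter. The field names `text` and `quality` and the weight are hypothetical. Note that `bf` adds to the score rather than multiplying it, so it is not the same thing as the boost discussed in SOLR-334.

```python
from urllib.parse import urlencode


def quality_boosted_params(user_query: str,
                           quality_field: str = "quality",
                           weight: float = 2.0) -> dict:
    """Request parameters for a dismax query with an additive boost taken
    from a numeric quality field. All field names here are assumptions."""
    return {
        "qt": "dismax",                       # assumed handler name
        "q": user_query,
        "qf": "text",                         # hypothetical search field
        "bf": f"{quality_field}^{weight}",    # field value used as a function
    }


params = quality_boosted_params("oregon trail")
print("/select?" + urlencode(params))
```

Because the weight lives in the request rather than in the index, it can be tuned without reloading any records.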


Re: Forced Top Document

2007-10-24 Thread Kyle Banerjee
The method Charlie suggested will work just fine with a minor tweak.
For relevancy sorting:

?q=foo OR (foo AND id:bar)

For nonrelevancy sorting, all you need is a multilevel sort. Just add
a bogus field that only the important document contains. Then sort by
bogus field in descending order before any other sorting criteria are
applied.

Either way, the document only appears when it matches the search
criteria, and it will always be on top.

kyle
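
To illustrate the multilevel sort, here is a small simulation in plain Python rather than an actual Solr call. The documents, the `featured` field, and the `year` field are made up for the example. In Solr terms this corresponds roughly to a sort specification such as `featured desc, year desc`.

```python
# Toy index: only the document to be pinned carries the bogus "featured" field.
docs = [
    {"id": "a", "text": "foo basics", "year": 2005},
    {"id": "bar", "text": "foo overview", "year": 1999, "featured": "1"},
    {"id": "c", "text": "unrelated", "year": 2007},
    {"id": "d", "text": "more foo", "year": 2006},
]


def search(term, docs):
    """Return matching docs, featured first, then newest first."""
    hits = [d for d in docs if term in d["text"].split()]
    # First sort level: presence of the bogus field, descending.
    # Second sort level: the ordinary sort criterion (year, descending).
    return sorted(hits, key=lambda d: ("featured" in d, d["year"]), reverse=True)


print([d["id"] for d in search("foo", docs)])        # ['bar', 'd', 'a']
print([d["id"] for d in search("unrelated", docs)])  # ['c']
```

The pinned document leads the list when it matches the query and is absent when it does not.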

On 10/24/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
> Yes, this will only work if the results are sorted by score (the
> default).
>
> One thing I thought of after I sent this out was that this will include
> the specified document even if it doesn't match your search criteria,
> which may not be what you want.
>
>
> -----Original Message-----
> From: mark angelillo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 24, 2007 12:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Forced Top Document
>
> Charlie,
>
> That's interesting. I did try something like this. Did you try your
> query with a sorting parameter?
>
> What I've read suggests that all the results are returned based on
> the query specified, but then resorted as specified. Boosting (which
> modifies the document's score) should not change the order unless the
> results are sorted by score.
>
> Mark
>
> On Oct 24, 2007, at 1:05 PM, Charlie Jackson wrote:
>
> > Do you know which document you want at the top? If so, I believe you
> > could just add an "OR" clause to your query to boost that document
> > very high, such as
> >
> > ?q=foo OR id:bar^1000
> >
> > Tried this on my installation and it did, indeed push the document
> > specified to the top.
> >
> >
> >
> > -----Original Message-----
> > From: Matthew Runo [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 24, 2007 10:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Forced Top Document
> >
> > I'd love to know this, as I just got a development request for this
> > very feature. I'd rather not spend time on it if it already exists.
> >
> > ++
> >   | Matthew Runo
> >   | Zappos Development
> >   | [EMAIL PROTECTED]
> >   | 702-943-7833
> > ++
> >
> >
> > On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:
> >
> >> Hi all,
> >>
> >> Is there a way to get a specific document to appear on top of
> >> search results even if a sorting parameter would push it further
> >> down?
> >>
> >> Thanks in advance,
> >> Mark
> >>
> >> mark angelillo
> >> snooth inc.
> >> o: 646.723.4328
> >> c: 484.437.9915
> >> [EMAIL PROTECTED]
> >> snooth -- 1.8 million ratings and counting...
> >>
> >>
> >
>
> mark angelillo
> snooth inc.
> o: 646.723.4328
> c: 484.437.9915
> [EMAIL PROTECTED]
> snooth -- 1.8 million ratings and counting...
>
>
>


-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
[EMAIL PROTECTED] / 541.359.9599


Re: Forced Top Document

2007-10-24 Thread Kyle Banerjee
> The typical use case, though, is for the featured document to be on top only
> for certain queries.  Like in an intranet where someone queries 401K or
> retirement or similar, you want to feature a document about benefits that
> would otherwise rank really low for that query.  I have not been able to make
> sorting strategies work very well.

Depending on how many of these special queries you have, it seems like
you could still use some variation of the strategy based on a bogus
tag sort. If you add a dynamic field for each special query term (e.g.
foo_s, bar_s, etc.) to the documents that term should feature, and then
detect when a query contains one of those terms, you can still sort on
the appropriate dynamic field before applying the rest of the sort.

kyle
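
A minimal sketch of the detection step, assuming the application builds the sort specification before sending the query to Solr. The mapping of terms to dynamic fields and the default sort are hypothetical.

```python
# Hypothetical mapping from special query terms to the dynamic field that
# only the featured document for that term contains.
FEATURED_FIELDS = {"foo": "foo_s", "bar": "bar_s"}


def sort_spec(user_query: str, default_sort: str = "score desc") -> str:
    """Put the matching dynamic field first in the sort when the query
    contains a special term; otherwise leave the sort unchanged."""
    for term in user_query.lower().split():
        field = FEATURED_FIELDS.get(term)
        if field:
            return f"{field} desc, {default_sort}"
    return default_sort


print(sort_spec("Foo widgets"))
print(sort_spec("plain query"))
```

Only the first special term found is used here; a query containing several special terms would need a rule for which featured document wins.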