I'm having the same date boosting problem. I'm using this function:
F = recip(rord(creationDate),1,1000,1000)^10. However, since I have around
10,000 documents added in one day, rord(creationDate) returns very
different values for documents created on the same day. For example, the
last document added today will have rord(creationDate) = 1 while the first
document added today will have rord(creationDate) = 10,000. When
rord(creationDate) > 10,000, the value of F approaches 0. Therefore, the
boost function makes no real distinction between the first document added
today and a document added 10 days ago. Now if I replace 1000 in F with a
large number, say 100000, the boost function suddenly gives the last few
documents an enormous boost and makes the other query scores irrelevant.
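
To make the numbers concrete, here is a minimal Python sketch of the
function above. It assumes Solr's documented definition
recip(x,m,a,b) = a / (m*x + b); the helper names recip() and boost() are
illustrative only and are not Solr APIs.

```python
def recip(x, m, a, b):
    """Reciprocal function in the style of Solr's recip(): a / (m*x + b)."""
    return a / (m * x + b)


def boost(rord, a=1000, b=1000, power=10):
    """F = recip(rord, 1, a, b) ^ power, mirroring the formula in this mail."""
    return recip(rord, 1, a, b) ** power


if __name__ == "__main__":
    # Compare the original constants (1000) with the larger ones (100000)
    # for a range of reverse ordinals.
    for rord in (1, 100, 1000, 10000, 100000):
        print(rord, boost(rord), boost(rord, a=100000, b=100000))
```

Running it prints F for several rord values with both sets of constants,
which shows how quickly F collapses once rord passes the constant.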

So in my case (and many others', I believe), the "true" date value would be
more appropriate. I'm thinking along the same lines: adding a timestamp
field at index time. It wouldn't add much overhead this way, would it?
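
As a rough sketch of what I mean (plain Python, not Solr code), the date
could be converted to seconds since epoch once at index time, and the boost
computed from the true age. The names to_epoch_seconds(), age_boost() and
the half_scale_days parameter are hypothetical, chosen only for
illustration.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def to_epoch_seconds(iso_date):
    """Convert a UTC date like '2007-08-11T00:00:00Z' to seconds since epoch.

    This is the conversion that would happen once, at index time.
    """
    dt = datetime.strptime(iso_date, "%Y-%m-%dT%H:%M:%SZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


def age_boost(doc_ts, now_ts, half_scale_days=30):
    """Reciprocal boost on true age: scale / (age_in_days + scale).

    half_scale_days is an assumed tuning knob: a document that old
    receives half the boost of a brand new one.
    """
    age_days = max(0, now_ts - doc_ts) / SECONDS_PER_DAY
    return half_scale_days / (age_days + half_scale_days)


if __name__ == "__main__":
    now = to_epoch_seconds("2007-08-11T12:00:00Z")
    for date in ("2007-08-11T00:00:00Z", "2007-08-01T00:00:00Z"):
        print(date, age_boost(to_epoch_seconds(date), now))
```

With this shape, all documents added on the same day get nearly the same
boost, regardless of how many of them there are.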

Regards,



On 8/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : Actually, just thinking about this a bit more, perhaps adding a function
> : call such as parseDate() might add too much overhead to the actual query,
> : perhaps it would be better to first convert the date to a timestamp at index
> : time and store it in a field type slong?  This might be more efficient but
>
> i would agree with you there, this is where a more robust (ie:
> less efficient) DateField-ish class that supports configuration options
> to specify:
>   1) the output format
>   2) the input format(s)
>   3) the indexed format
> ...as SimpleDateFormat pattern strings would be handy.  The
> ValueSource it uses could return seconds (or some other unit based on
> another config option) since epoch as the intValue.
>
> it's been discussed before, but there are a lot of tricky issues involved
> which is probably why no one has really tackled it.
>
> : that still leaves the problem of obtaining the current timestamp to use in
> : the boost function.
>
> it would be pretty easy to write a ValueSource that just knew about "now"
> as seconds since epoch.
>
> : > While it seems to work pretty well, I've realised that this may not be
> : > quite as effective as i had hoped given that the calculation is based on the
> : > ordinal of the field value rather than the value of the field itself.  In
> : > cases where the field type is 'date' and the actual field values are not
> : > distributed evenly across all documents in the index, the value returned by
> : > rord() is not going to give a true reflection of document age.  For example,
>
> be careful what you wish for.  you are 100% correct that functions using
> the (r)ord value of a DateField aren't a function of true age, but
> depending on how you look at it that may be better than using the real age
> (i think so anyway).  While it sounds appealing to say that docA should
> score half as high as docB if it is twice as old, that typically isn't all
> that important when dealing with recent dates; and when dealing with older
> dates the ordinal value tends to approximate it decently well ... where a
> true measure of age might screw you up is when you have situations where
> few/no new articles get published on weekends (or late at night).  it's
> also very confusing to people when the ordering of documents changes even
> though no new documents have been published -- that can easily happen if
> you are heavily boosting on a true age calculation but will never happen
> when dealing with an ordinal ranking of documents by age.
>
> (although, this could be compensated for by doing all of your true age
> calculations relative to the "min age" of all articles in your index -- but
> you would still get really weird 'big' shifts in scores as soon as that
> first article gets published on monday morning.)
>
>
> -Hoss
>
>


-- 
Regards,

Cuong Hoang
