I'm having trouble with the date boosting function as well. I'm using this function: F = recip(rord(creationDate),1,1000,1000)^10. However, since around 10,000 documents are added each day, rord(creationDate) returns very different values for documents created on the same day. For example, the last document added today will have rord(creationDate) = 1, while the first document added today will have rord(creationDate) = 10,000. Once rord(creationDate) exceeds 10,000, F approaches 0. Therefore, the boost function barely distinguishes between the first document added today and a document added 10 days ago. If I instead replace 1000 in F with a much larger number, say 100000, the boost function suddenly gives the last few documents an enormous boost, making the other query scores irrelevant.
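To make the numbers concrete, here is a rough Python sketch (not Solr code) of the boost above. It assumes Solr's recip(x,m,a,b) = a/(m*x+b), treats the trailing ^10 as the bf boost weight (a multiplier, not an exponent), and assumes a steady 10,000 documents per day, so that rord grows by about 10,000 per day of age. `date_boost` and `DOCS_PER_DAY` are hypothetical names.

```python
DOCS_PER_DAY = 10_000  # assumption: a steady ~10,000 new documents per day


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr-style recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(rord: int, a: float = 1000.0, b: float = 1000.0,
               weight: float = 10.0) -> float:
    """recip(rord(creationDate), 1, a, b), multiplied by the bf weight (^10)."""
    return weight * recip(rord, 1.0, a, b)


# rord = 1 is the newest document; rord grows by ~DOCS_PER_DAY per day of age.
samples = {
    "newest document today": 1,
    "first document today": DOCS_PER_DAY,
    "ten days ago": 10 * DOCS_PER_DAY,
}

for label, r in samples.items():
    print(f"{label:>22}: rord={r:>7}  boost={date_boost(r):.3f}")
```

It prints boosts of 9.990, 0.909 and 0.099: roughly 90% of the boost range is used up within a single day's worth of documents, which is why yesterday's and last week's documents end up looking almost the same.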
So in my case (and, I believe, in many others'), the "true" date value would be more appropriate. I'm thinking along the same lines: indexing the date as a timestamp. It wouldn't add much overhead this way, would it?

Regards,

On 8/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : Actually, just thinking about this a bit more, perhaps adding a function
> : call such as parseDate() might add too much overhead to the actual query;
> : perhaps it would be better to first convert the date to a timestamp at
> : index time and store it in a field of type slong? This might be more
> : efficient, but
>
> I would agree with you there. This is where a more robust (i.e. less
> efficient) DateField-ish class would be handy, one that supports
> configuration options to specify:
>   1) the output format
>   2) the input format(s)
>   3) the indexed format
> ...as SimpleDateFormat pattern strings. The ValueSource it uses could
> return seconds (or some other unit, based on another config option) since
> the epoch as the intValue.
>
> It's been discussed before, but there are a lot of tricky issues involved,
> which is probably why no one has really tackled it.
>
> : that still leaves the problem of obtaining the current timestamp to use
> : in the boost function.
>
> It would be pretty easy to write a ValueSource that just knew about "now"
> as seconds since the epoch.
>
> : > While it seems to work pretty well, I've realised that this may not be
> : > quite as effective as I had hoped, given that the calculation is based
> : > on the ordinal of the field value rather than the value of the field
> : > itself. In cases where the field type is 'date' and the actual field
> : > values are not distributed evenly across all documents in the index,
> : > the value returned by rord() is not going to give a true reflection of
> : > document age. For example,
>
> Be careful what you wish for.
> You are 100% correct that functions using the (r)ord value of a DateField
> aren't a function of true age, but depending on how you look at it, that
> may be better than using the real age (I think so, anyway). While it
> sounds appealing to say that docA should score half as high as docB if it
> is twice as old, that typically isn't all that important when dealing with
> recent dates; and when dealing with older dates, the ordinal value tends
> to approximate it decently well. Where a true measure of age might screw
> you up is when few or no new articles get published on weekends (or late
> at night). It's also very confusing to people when the ordering of
> documents changes even though no new documents have been published. That
> can easily happen if you are heavily boosting on a true age calculation,
> but it will never happen when dealing with an ordinal ranking of documents
> by age.
>
> (Although this could be compensated for by doing all of your true age
> calculations relative to the "min age" of all articles in your index, you
> would still get really weird, big shifts in scores as soon as that first
> article gets published on Monday morning.)
>
>
> -Hoss
>

--
Regards,
Cuong Hoang
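Here is a minimal Python sketch, outside Solr, of the timestamp approach discussed above. creationDate is converted to seconds since the epoch at index time (as in the slong idea), and the boost is recip over the true age relative to a supplied "now". The helper names, the hour unit and a = b = 24 are hypothetical choices for illustration; this is not an existing Solr ValueSource.

```python
from datetime import datetime, timezone


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr-style recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)


def to_epoch_seconds(dt: datetime) -> int:
    """Index-time conversion (hypothetical): store creationDate as whole epoch seconds."""
    return int(dt.timestamp())


def true_age_boost(doc_epoch_s: int, now_epoch_s: int,
                   a: float = 24.0, b: float = 24.0) -> float:
    """recip over true age in hours: 1.0 for a brand-new document, 0.5 at one day old."""
    # Clamp at 0 so a document stamped slightly in the future is treated as brand new.
    age_hours = max(0, now_epoch_s - doc_epoch_s) / 3600.0
    return recip(age_hours, 1.0, a, b)


now = to_epoch_seconds(datetime(2007, 8, 11, 12, 0, tzinfo=timezone.utc))
for label, hours_ago in [("just now", 0), ("one day ago", 24), ("ten days ago", 240)]:
    boost = true_age_boost(now - hours_ago * 3600, now)
    print(f"{label:>12}: {boost:.3f}")
```

It prints 1.000, 0.500 and 0.091 for documents added just now, one day ago and ten days ago, regardless of how many documents arrive per day. In Solr itself, "now" would have to come from something like the ValueSource Hoss describes; the sketch simply passes it in.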
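Hoss's warning about ordering can be reproduced with a small Python simulation. The documents, relevance values and a = b = 24 are hypothetical, and "relevance + boost" is a simplification of how Solr actually combines scores. With a true-age boost, the ranking of two documents flips a day later even though nothing new was published. A rord-based boost has no time input at all. The last lines show the Monday-morning jump of the "min age" variant.

```python
HOUR = 3600  # seconds


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr-style recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)


def age_boost(age_s: float, a: float = 24.0, b: float = 24.0) -> float:
    """recip over age in hours (hypothetical a = b = 24)."""
    return recip(age_s / HOUR, 1.0, a, b)


# Hypothetical two-document index: A is more relevant, B is two days newer.
docs = {
    "A": {"relevance": 1.0, "published": 0},
    "B": {"relevance": 0.5, "published": 48 * HOUR},
}


def rank(boost_of) -> list:
    """Doc ids ordered by relevance + date boost, best first."""
    return sorted(docs, key=lambda d: docs[d]["relevance"] + boost_of(d), reverse=True)


def true_age_ranking(now: int) -> list:
    # Depends on the clock: re-running later can reorder the results.
    return rank(lambda d: age_boost(now - docs[d]["published"]))


def rord_ranking() -> list:
    # Depends only on publication order (rord 1 = newest), so there is no `now` input.
    newest_first = sorted(docs, key=lambda d: docs[d]["published"], reverse=True)
    rord = {d: i + 1 for i, d in enumerate(newest_first)}
    return rank(lambda d: recip(rord[d], 1.0, 1000.0, 1000.0))


print(true_age_ranking(48 * HOUR))  # ['B', 'A']: B is brand new
print(true_age_ranking(72 * HOUR))  # ['A', 'B']: one day later, nothing new published
print(rord_ranking())               # same answer whenever it is called


# "Min age" variant: measure age from the newest document instead of the clock.
def relative_boost(published: int, newest_published: int) -> float:
    return age_boost(newest_published - published)


friday = 0          # last article before the weekend
monday = 60 * HOUR  # first article on Monday morning (hypothetical 60-hour gap)
print(relative_boost(friday, friday))  # 1.0 while Friday's article is the newest
print(relative_boost(friday, monday))  # 24/84, about 0.286, once Monday's article appears
```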