Re: FunctionQuery and boosting documents using date arithmetic

Chris Hostetter Fri, 10 Aug 2007 19:31:27 -0700

: Actually, just thinking about this a bit more, perhaps adding a function
: call such as parseDate() might add too much overhead to the actual query,
: perhaps it would be better to first convert the date to a timestamp at index
: time and store it in a field type slong?  This might be more efficient but


i would agree with you there; this is where a more robust (i.e.
less efficient) DateField-ish class that supports configuration options
to specify:
  1) the output format
  2) the input format(s)
  3) the indexed format
...as SimpleDateFormat pattern strings would be handy.  The
ValueSource it uses could return seconds (or some other unit based on
another config option) since epoch as the intValue.
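A minimal sketch of what such a field's conversion logic might look like, assuming a hypothetical `ConfigurableDateField` helper (nothing by that name exists in Solr, and it does not plug into Solr's FieldType API). Input, indexed and output formats are plain `java.text.SimpleDateFormat` pattern strings, everything is pinned to UTC for simplicity, and `toEpochSeconds` is the value a ValueSource built on the field could expose:

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical "DateField-ish" helper; the names are illustrative, not part of Solr.
public class ConfigurableDateField {
    private final List<String> inputPatterns; // 2) accepted input formats, tried in order
    private final String indexedPattern;      // 3) format of the indexed term
    private final String outputPattern;       // 1) format used when returning values

    public ConfigurableDateField(List<String> inputPatterns, String indexedPattern, String outputPattern) {
        this.inputPatterns = new ArrayList<>(inputPatterns);
        this.indexedPattern = indexedPattern;
        this.outputPattern = outputPattern;
    }

    // SimpleDateFormat is not thread-safe, so this sketch builds a fresh one per call, pinned to UTC.
    private static SimpleDateFormat format(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        return f;
    }

    // Returns the parsed date only if the pattern consumed the whole string, else null.
    private static Date parseFully(String pattern, String value) {
        ParsePosition pos = new ParsePosition(0);
        Date d = format(pattern).parse(value, pos);
        return (d != null && pos.getIndex() == value.length()) ? d : null;
    }

    // Try each configured input pattern; the first full match wins.
    public Date parseInput(String value) {
        for (String p : inputPatterns) {
            Date d = parseFully(p, value);
            if (d != null) {
                return d;
            }
        }
        throw new IllegalArgumentException("unparseable date: " + value);
    }

    // Input value -> indexed term.
    public String toIndexed(String value) {
        return format(indexedPattern).format(parseInput(value));
    }

    // Indexed term -> display value.
    public String toOutput(String indexedValue) {
        Date d = parseFully(indexedPattern, indexedValue);
        if (d == null) {
            throw new IllegalArgumentException("not an indexed value: " + indexedValue);
        }
        return format(outputPattern).format(d);
    }

    // What a ValueSource built on this field might expose: seconds since epoch.
    public long toEpochSeconds(String value) {
        return parseInput(value).getTime() / 1000L;
    }

    public static void main(String[] args) {
        ConfigurableDateField field = new ConfigurableDateField(
                Arrays.asList("yyyy-MM-dd'T'HH:mm:ss'Z'", "yyyy-MM-dd"),
                "yyyyMMddHHmmss",
                "yyyy-MM-dd HH:mm");
        String indexed = field.toIndexed("2007-08-10");
        System.out.println(indexed + " -> " + field.toOutput(indexed)
                + " (" + field.toEpochSeconds("2007-08-10") + "s since epoch)");
    }
}
```

Checking that each pattern consumes the entire string matters because `SimpleDateFormat` will otherwise happily parse a prefix, e.g. "yyyy-MM-dd" would accept a full timestamp and silently drop the time.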

it's been discussed before, but there are a lot of tricky issues involved
which is probably why no one has really tackled it.

: that still leaves the problem of obtaining the current timestamp to use in
: the boost function.

it would be pretty easy to write a ValueSource that just knew about "now"
as seconds since epoch.
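A rough sketch of the "now" idea, assuming a hypothetical `NowSource` class. It does not implement Lucene's real ValueSource API; it only shows the value such a source would hand back for every document. `java.time.Clock` is used purely so "now" can be pinned in tests, and the clock is read once so every document scored in one request sees the same value:

```java
import java.time.Clock;

// Hypothetical stand-in for a ValueSource that "just knows about now"; it does not
// implement Lucene's ValueSource API, it only models the value such a source returns.
public class NowSource {
    private final long nowSeconds;

    // Read the clock once per request so all documents in that request share one "now".
    public NowSource(Clock clock) {
        this.nowSeconds = clock.instant().getEpochSecond();
    }

    // The same value for every document: seconds since epoch at request time.
    public long longVal(int docId) {
        return nowSeconds;
    }

    // Age of a document whose date field was indexed as epoch seconds.
    public long ageSeconds(long docEpochSeconds) {
        return nowSeconds - docEpochSeconds;
    }

    public static void main(String[] args) {
        NowSource now = new NowSource(Clock.systemUTC());
        System.out.println("now = " + now.longVal(0) + "s since epoch");
    }
}
```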

: > While it seems to work pretty well, I've realised that this may not be
: > quite as effective as i had hoped given that the calculation is based on the
: > ordinal of the field value rather than the value of the field itself.  In
: > cases where the field type is 'date' and the actual field values are not
: > distributed evenly across all documents in the index, the value returned by
: > rord() is not going to give a true reflection of document age.  For example,

be careful what you wish for.  you are 100% correct that functions using
the (r)ord value of a DateField aren't a function of true age, but
depending on how you look at it that may be better than using the real age
(i think so anyway).  While it sounds appealing to say that docA should
score half as high as docB if it is twice as old, that typically isn't all
that important when dealing with recent dates; and when dealing with older
dates the ordinal value tends to approximate it decently well ... where a
true measure of age might screw you up is when you have situations where
few/no new articles get published on weekends (or late at night).  it's
also very confusing to people when the ordering of documents changes even
though no new documents have been published -- that can easily happen if
you are heavily boosting on a true age calculation (every document's age
keeps growing with the clock, so the boosts keep shifting relative to the
relevance scores) but will never happen when dealing with an ordinal
ranking of documents by age.
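A small illustration of that reordering effect, under made-up numbers: two documents with fixed relevance scores, one boosted by true age in hours and one by reverse ordinal. The `recip` helper has the same shape as Solr's `recip(x,m,a,b) = a/(m*x+b)` function; the class name, relevance values and timestamps are all hypothetical:

```java
import java.util.stream.LongStream;

// Illustrative comparison of a true-age boost with an ordinal (rord) boost.
// recip() mirrors the shape of Solr's recip(x,m,a,b) = a/(m*x+b); everything else is made up.
public class AgeBoostDemo {
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    // Reverse ordinal over distinct timestamps: the newest gets 1, the next newest 2, and so on.
    static int rord(long ts, long[] all) {
        return (int) LongStream.of(all).distinct().filter(t -> t > ts).count() + 1;
    }

    // Boost from the real age in hours: keeps changing as the clock advances.
    static double trueAgeScore(double relevance, long ts, long nowSeconds) {
        double ageHours = (nowSeconds - ts) / 3600.0;
        return relevance + recip(ageHours, 1, 1, 1);
    }

    // Boost from the ordinal: only changes when documents are added or removed.
    static double ordinalScore(double relevance, long ts, long[] all) {
        return relevance + recip(rord(ts, all), 1, 1, 1);
    }

    public static void main(String[] args) {
        long now = 946684800L;            // arbitrary "now" (2000-01-01T00:00:00Z)
        long docA = now;                  // docA: just published, relevance 1.0
        long docB = now - 10 * 3600L;     // docB: 10 hours old, relevance 1.1
        long[] index = {docA, docB};

        // Same index, two points in time: the true-age ranking flips.
        for (long later : new long[] {0L, 10 * 3600L}) {
            double a = trueAgeScore(1.0, docA, now + later);
            double b = trueAgeScore(1.1, docB, now + later);
            System.out.printf("+%2dh true age: A=%.3f B=%.3f%n", later / 3600, a, b);
        }
        // The ordinal ranking does not depend on the clock at all.
        System.out.printf("ordinal (any time): A=%.3f B=%.3f%n",
                ordinalScore(1.0, docA, index), ordinalScore(1.1, docB, index));
    }
}
```

With nothing added to the index, docA outranks docB at first and falls behind it ten hours later under the true-age boost, while the ordinal boost keeps A ahead the whole time.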

(although, this could be compensated for by doing all of your true age
calculations relative to the "min age" of all articles in your index -- but
you would still get really weird 'big' shifts in scores as soon as that
first article gets published on monday morning.)
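A sketch of that compensation and its Monday-morning side effect, assuming a hypothetical `RelativeAgeDemo` class and an arbitrary 1/(age+1) boost: age is measured from the newest document in the index rather than from "now", so scores stay put over a quiet weekend, but a single new article changes every existing boost at once:

```java
import java.util.stream.LongStream;

// Illustrative only: age measured from the newest document in the index instead of from "now".
public class RelativeAgeDemo {
    // Hours between this document and the newest one in the index.
    static double relativeAgeHours(long ts, long[] all) {
        long newest = LongStream.of(all).max().orElse(ts);
        return (newest - ts) / 3600.0;
    }

    // A 1/(age+1) boost: the newest document always gets exactly 1.0.
    static double boost(long ts, long[] all) {
        return 1.0 / (relativeAgeHours(ts, all) + 1.0);
    }

    public static void main(String[] args) {
        long lastBeforeWeekend = 946684800L;                 // arbitrary timestamp
        long older = lastBeforeWeekend - 3 * 3600L;          // published 3 hours earlier
        long firstOnMonday = lastBeforeWeekend + 60 * 3600L; // 60 hours later

        long[] weekend = {lastBeforeWeekend, older};
        long[] monday = {lastBeforeWeekend, older, firstOnMonday};

        // Over the weekend nothing moves, however much wall-clock time passes...
        System.out.printf("weekend: last=%.3f older=%.3f%n",
                boost(lastBeforeWeekend, weekend), boost(older, weekend));
        // ...then one new article drags every existing boost down in one step.
        System.out.printf("monday:  last=%.3f older=%.3f%n",
                boost(lastBeforeWeekend, monday), boost(older, monday));
    }
}
```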


-Hoss
