On Jun 23, 2009, at 8:05 AM, Asif Rahman wrote:
> Hi Grant,
>
> I'll give a real life example of the problem that we are trying to
> solve.
>
> We index a large number of current news articles on a continuing
> basis. We tag these articles with news topics (e.g. Barack Obama,
> Iran, etc.). We then use these tags to facet our queries. For
> example, we might issue a query for all articles in the last 24
> hours. The facets would then tell us which news topics have been
> written about the most in that period. The problem is that "Barack
> Obama", for example, is always written about in high frequency, as
> opposed to "Iran" which is currently very hot in the news, but which
> has not always been the case. In this case, we'd like to see "Iran"
> show up higher than "Barack Obama" in the facet results.
>
> To me, this seems identical to the tf-idf scoring expression that is
> used in normal search. The facet count is analogous to the tf and I
> can access the facet term idf's through the Similarity API.

I'd say faceting is akin to the DF (doc freq) part of search, not TF.
TF is per document; DF is across all the docs. Faceting just counts
the docs that contain each term in that field across the result set.
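
To make the TF vs. DF distinction concrete, here is a minimal sketch in
plain Python (not Solr code). The tiny result set and the facet_counts
helper are made up for illustration; the point is only that a facet
count adds one per matching doc, no matter how often the term occurs
inside that doc.

```python
from collections import Counter

# Hypothetical result set: each entry is the list of topic tags on one article.
result_set = [
    ["Barack Obama", "Iran"],
    ["Barack Obama"],
    ["Iran", "Iran"],  # a repeated tag still counts this doc only once
]

def facet_counts(docs):
    """Count, per term, how many docs in the result set contain it."""
    counts = Counter()
    for tags in docs:
        # set() collapses repeats within a doc, so this behaves like DF, not TF.
        counts.update(set(tags))
    return counts

counts = facet_counts(result_set)
```

Here both topics end up with a count of 2, even though "Iran" occurs
three times in total.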
Regardless of the semantics, it doesn't sound like DF would give you
what you want. It is entirely possible that in some short timespan the
number of docs on Iran could match the number on Obama (maybe not for
that particular example), in which case your "hot" item would no
longer appear hot.
One idea is that you could take baselines of all the facets nightly
for that field (via *:* or something) and then track the trends by
calculating the diffs. Of course, you could then do this hour to hour
and get into all kinds of trend detection stuff.
In other words, it does seem like something you could do with Solr.
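
A rough sketch of the baseline/diff idea, done client-side in Python.
Everything here is an assumption for illustration: the counts are made
up, trend_scores is a hypothetical helper, and I am assuming you have
already fetched two sets of facet counts, e.g. with the standard
faceting parameters (q=*:*, rows=0, facet=true, facet.field=<your tag
field>) for the baseline and the same facet over your 24-hour query
for the current window. Raw counts from windows of different sizes
aren't comparable, so this version diffs each term's share of docs
rather than the counts themselves.

```python
def trend_scores(baseline, current, baseline_docs, current_docs):
    """Rank facet terms by (current share of docs) - (baseline share of docs)."""
    scores = {}
    for term, count in current.items():
        current_share = count / current_docs
        # Terms missing from the baseline are treated as having a share of 0.
        baseline_share = baseline.get(term, 0) / baseline_docs
        scores[term] = current_share - baseline_share
    # Largest positive diff first: these are the "hot" terms.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up numbers, purely illustrative.
baseline = {"Barack Obama": 50_000, "Iran": 5_000}  # nightly facet counts over *:*
current = {"Barack Obama": 900, "Iran": 700}        # facet counts for the last 24 hours

ranked = trend_scores(baseline, current,
                      baseline_docs=1_000_000, current_docs=10_000)
```

With these numbers "Iran" ranks above "Barack Obama" even though its
raw count is lower, because its share of recent docs has grown much
more relative to the baseline. A ratio of shares instead of a
difference would be another reasonable choice.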
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/