On Jun 23, 2009, at 6:23 PM, Chris Hostetter wrote:
: Regardless of the semantics, it doesn't sound like DF would give you what
: you want. It could be entirely possible that in some short timespan the
: number of docs on Iran could match up w/ the number on Obama (maybe not for
: that particular example) in which case your "hot" item would no longer
: appear hot.
but if the numbers match up in that timespan then the "hot" item isn't as
"hot" anymore.
Not necessarily true. Consider the case where over the year there are
50 stories about Obama. Then, in the span of 5 days, there are 50
stories about Iran. Iran, in my view, is still hotter than Obama. In
Asif's case, he was suggesting comparing against the global DF.
Not to worry, though, your proposal is much the same as mine, namely
take a baseline based on some set of docs (I chose *:*, you chose past
month) and then compare.
Maybe i'm misunderstanding: but it sounds like Asif's question essentially
boils down to getting facet constraints sorted after using some
normalizing fraction ... the simplest case being the inverse ratio (this
is where i think Asif is comparing it to IDF) of the number of matches for
that facet in some larger docset to the size of the docset -- typically
that docset could be the entire index, but it could also be the same
search over a large window of time.
So if i was doing a news search for all docs in the last 24 hours, I could
multiply each of those facet counts by the ratio of the number of articles
from the past month to the corresponding counts from the past month, to
see how much "hotter" they are in my smaller result set...
current result set facet counts (X)...
News:1100
Obama:1000
Iran:800
Miley Cyrus:700
iPod:500
facet counts from the past month (Y), during which time 9000 (Z)
documents were published...
News:9000
Obama:7000
Iran:1000
Miley Cyrus:4000
iPod:5000
X*(Z/Y)...
Iran:7200
Miley Cyrus:1575
Obama:1285.7
News:1100
iPod:900
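The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the X*(Z/Y) normalization using the counts from the example, not Solr code; the function name `hotness` and the choice to skip terms with no baseline count are assumptions made for the sketch.

```python
# Facet counts copied from the example above.
current = {"News": 1100, "Obama": 1000, "Iran": 800,
           "Miley Cyrus": 700, "iPod": 500}            # X: last 24 hours
baseline = {"News": 9000, "Obama": 7000, "Iran": 1000,
            "Miley Cyrus": 4000, "iPod": 5000}         # Y: past month
baseline_docs = 9000                                   # Z: docs in past month


def hotness(current, baseline, baseline_docs):
    """Return (term, X * Z / Y) pairs sorted hottest first."""
    scores = {}
    for term, x in current.items():
        y = baseline.get(term, 0)
        if y == 0:
            # Assumption: a term with no baseline count is skipped
            # rather than treated as infinitely hot.
            continue
        scores[term] = x * baseline_docs / y
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = hotness(current, baseline, baseline_docs)
for term, score in ranked:
    print(f"{term}:{score:.1f}")
```

Sorting by the normalized score rather than the raw count is what moves Iran ahead of Obama, even though Obama has more matches in the current result set.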
Doing this in a Solr plugin would be the best way to do this -- because
otherwise your "hot" terms might not even show up in the facet lists.
Any attempt to do it on the client would just be an approximation, and
could easily miss the "hottest" item if it was just below the cutoff for
the number of constraints to be returned.
-Hoss
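To make the cutoff problem concrete, here is a small Python sketch. It reuses the counts from the example and adds one made-up term, "Tehran", with hypothetical counts chosen only to show the effect; `LIMIT` stands in for whatever cap is placed on the number of constraints returned. It simulates a client that only sees the top constraints by raw count, and is not Solr code.

```python
Z = 9000  # documents published in the past month (from the example)

# term -> (current count X, past-month count Y).
# "Tehran" is a hypothetical term added purely for illustration.
facets = {
    "News": (1100, 9000),
    "Obama": (1000, 7000),
    "Iran": (800, 1000),
    "Miley Cyrus": (700, 4000),
    "iPod": (500, 5000),
    "Tehran": (400, 450),
}

LIMIT = 5  # hypothetical cap on the number of constraints returned


def score(term):
    """Normalized hotness X * Z / Y for one term."""
    x, y = facets[term]
    return x * Z / y


# Client side: only the top LIMIT constraints by raw count are visible,
# so the re-ranking can only consider those.
visible = sorted(facets, key=lambda t: facets[t][0], reverse=True)[:LIMIT]
client_hottest = max(visible, key=score)

# Server side (e.g. inside a plugin): every term can be scored.
server_hottest = max(facets, key=score)

print("client sees:", visible)
print("client hottest:", client_hottest)
print("server hottest:", server_hottest)
```

With these numbers the client never receives "Tehran" because its raw count falls below the cutoff, so it reports Iran as hottest, while scoring every term picks "Tehran".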
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search