On Jun 23, 2009, at 6:23 PM, Chris Hostetter wrote:


: Regardless of the semantics, it doesn't sound like DF would give you what you
: want. It could be entirely possible that in some short timespan the number of
: docs on Iran could match up w/ the number on Obama (maybe not for that
: particular example) in which case your "hot" item would no longer appear hot.

But if the numbers match up in that timespan, then the "hot" item isn't as
"hot" anymore.

Not necessarily true. Consider the case where over the year there are 50 stories about Obama. Then, in the span of 5 days, there are 50 stories about Iran. Both terms have the same DF (50), yet Iran, in my view, is still hotter than Obama. Asif, for his part, was suggesting comparing against the global DF.
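Putting rough per-day rates on that example shows why equal counts can still hide a big difference in how concentrated the coverage is. This reads "over the year" as 365 days.

```latex
\text{Obama: } \frac{50 \text{ stories}}{365 \text{ days}} \approx 0.14 \text{ stories/day},
\qquad
\text{Iran: } \frac{50 \text{ stories}}{5 \text{ days}} = 10 \text{ stories/day}
```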

Not to worry, though: your proposal is much the same as mine, namely to take a baseline from some set of docs (I chose *:*, you chose the past month) and then compare against it.
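Below is a minimal sketch, in Python, of the two facet-only requests such a baseline comparison needs. The field names `topic` and `pubdate` are hypothetical, so substitute your own schema. `q`, `fq`, `rows`, `facet`, `facet.field`, `facet.limit` and `facet.mincount` are standard Solr request parameters, and `NOW-1DAY` / `NOW-1MONTH` is Solr date math. A *:* baseline would simply drop the `fq`.

```python
from urllib.parse import urlencode

# Hypothetical schema: "topic" is the facet field, "pubdate" the publication date.
FACET_FIELD = "topic"
DATE_FIELD = "pubdate"

def facet_params(window, limit=-1):
    """Build parameters for a facet-only Solr request over [window TO NOW].

    window is a Solr date-math expression such as "NOW-1DAY" or "NOW-1MONTH".
    rows=0 because only numFound and the facet counts are needed.
    limit=-1 asks Solr for every constraint rather than just the top N.
    """
    return {
        "q": "*:*",
        "fq": f"{DATE_FIELD}:[{window} TO NOW]",
        "rows": 0,
        "facet": "true",
        "facet.field": FACET_FIELD,
        "facet.limit": limit,
        "facet.mincount": 1,
    }

# Query strings for the current window and the baseline window.
current = urlencode(facet_params("NOW-1DAY"))
baseline = urlencode(facet_params("NOW-1MONTH"))
```

In the baseline response, numFound gives the size of the baseline docset and the facet counts give the per-term baseline.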


Maybe I'm misunderstanding, but it sounds like Asif's question essentially
boils down to getting facet constraints sorted after applying some
normalizing fraction ... the simplest case being the inverse ratio (this is
where I think Asif is comparing it to IDF) of the number of matches for
that facet in some larger docset to the size of that docset. Typically
that docset would be the entire index, but it could also be the same
search over a larger window of time.
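Written as a formula, the definitions are as follows:

- X_t is the count for facet constraint t in the current result set.
- Y_t is its count in the larger docset.
- Z is the size of that larger docset.

These are the same X, Y and Z as in the example below. "hot" is just a label for the score.

```latex
\mathrm{hot}(t) = \frac{X_t}{Y_t / Z} = X_t \cdot \frac{Z}{Y_t}
```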

So if I was doing a news search for all docs in the last 24 hours, I could
multiply each of those facet counts by the ratio of the number of articles
from the past month to the corresponding counts from the past month, to
see how much "hotter" they are in my smaller result set...

current result set facet counts (X)...
 News:1100
 Obama:1000
 Iran:800
 Miley Cyrus:700
 iPod:500

facet counts from the past month (Y), during which time 9000 (Z)
documents were published...
 News:9000
 Obama:7000
 Iran:1000
 Miley Cyrus:4000
 iPod:5000

X*(Z/Y)...
 Iran:7200
 Miley Cyrus:1575
 Obama:1285.7
 News:1100
 iPod:900
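The arithmetic above can be sketched in Python as follows. The function name `hotness` is made up for illustration. It scales each current count by Z/Y_t and sorts hottest first, reproducing the X*(Z/Y) table.

```python
def hotness(current, baseline, baseline_total):
    """Score each term as current count * (baseline_total / baseline count),
    i.e. X*(Z/Y), and return (term, score) pairs sorted hottest first."""
    scores = {
        term: count * baseline_total / baseline[term]
        for term, count in current.items()
        if baseline.get(term)  # skip terms missing from (or zero in) the baseline
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

X = {"News": 1100, "Obama": 1000, "Iran": 800, "Miley Cyrus": 700, "iPod": 500}
Y = {"News": 9000, "Obama": 7000, "Iran": 1000, "Miley Cyrus": 4000, "iPod": 5000}
Z = 9000

for term, score in hotness(X, Y, Z):
    print(f"{term}: {score:.1f}")
```

If the baseline window contains the current one (the past month includes the last 24 hours), every current term has a baseline count at least as large as its current count. In that case the guard only matters when the two windows are disjoint.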


Doing this in a Solr plugin would be the best approach, because
otherwise your "hot" terms might not even show up in the facet lists.
Any attempt to do it on the client would just be an approximation, and
could easily miss the "hottest" item if it was just below the cutoff for
the number of constraints to be returned.
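Here is a small Python illustration of that cutoff problem, using the numbers from the example above. `client_side_hot` is a hypothetical helper. It normalizes only the top `limit` constraints by raw count, as a client would have to after a request with a small facet.limit.

```python
def client_side_hot(current, baseline, baseline_total, limit):
    """Return the hottest term the client can see when only the top `limit`
    constraints (by raw count) come back from the server."""
    top = sorted(current.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    # Normalization can only re-rank the constraints that survived the cutoff.
    scores = {t: c * baseline_total / baseline[t] for t, c in top}
    return max(scores, key=scores.get)

X = {"News": 1100, "Obama": 1000, "Iran": 800, "Miley Cyrus": 700, "iPod": 500}
Y = {"News": 9000, "Obama": 7000, "Iran": 1000, "Miley Cyrus": 4000, "iPod": 5000}
Z = 9000

print(client_side_hot(X, Y, Z, limit=2))  # sees only News and Obama
print(client_side_hot(X, Y, Z, limit=5))  # sees every constraint
```

With limit=2 the client only ever sees News and Obama, so it reports Obama. With all five constraints, Iran comes out on top, which is the ranking a server-side implementation would produce.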


-Hoss


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
