Asif Rahman wrote:
Hi Grant,

I'll give a real life example of the problem that we are trying to solve.

We index a large number of current news articles on a continuing basis.  We
tag these articles with news topics (e.g. Barack Obama, Iran, etc.).  We
then use these tags to facet our queries.  For example, we might issue a
query for all articles in the last 24 hours.  The facets would then tell us
which news topics have been written about the most in that period.  The
problem is that "Barack Obama", for example, is always written about in high
frequency, as opposed to "Iran" which is currently very hot in the news, but
which has not always been the case.  In this case, we'd like to see "Iran"
show up higher than "Barack Obama" in the facet results.


You're not looking for an IDF-based function.
You need to figure out what a 'normal' amount of news flow for a given topic is, and then determine when an abnormal amount is happening.
Note that an abnormal amount can be positive or negative.
We use a similar method on http://love.com, so we know, for example, that something is going on with Ed McMahon as I type.
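
To make the 'normal versus abnormal' idea concrete, here is a minimal sketch in Python. It assumes you keep a short history of per-topic daily counts; the function name, the `min_sigma` floor, and the numbers are all hypothetical, and a z-score is just one simple way to measure deviation from a baseline, not necessarily what any production system uses.

```python
from statistics import mean, stdev

def deviation_score(history, current, min_sigma=1.0):
    """Signed number of standard deviations `current` lies from the
    historical mean. `history` needs at least two data points.
    `min_sigma` is an assumed floor that avoids division by zero
    for topics whose counts never vary."""
    mu = mean(history)
    sigma = max(stdev(history), min_sigma)
    return (current - mu) / sigma

# Hypothetical daily article counts per topic (illustrative numbers only).
history = {
    "Barack Obama": [500, 520, 480, 510, 490],
    "Iran": [20, 25, 15, 30, 10],
}
today = {"Barack Obama": 505, "Iran": 300}

# Rank by absolute deviation, since an abnormal amount can be
# positive (a spike) or negative (a sudden drop).
ranked = sorted(
    today,
    key=lambda topic: abs(deviation_score(history[topic], today[topic])),
    reverse=True,
)
print(ranked)
```

With these made-up numbers "Iran" ranks ahead of "Barack Obama", because its count is far outside its own usual range even though its raw count is lower.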

I wouldn't look at using Solr for this kind of thing, by the way. Try something like Esper; I think it might hold some promise for this kind of thing (Esper is an open-source stream database).

Regards

To me, this seems identical to the tf-idf scoring expression that is used in
normal search.  The facet count is analogous to the tf and I can access the
facet term idf's through the Similarity API.
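
As a rough illustration of the reranking described here, the sketch below multiplies each facet count by an IDF weight and sorts by the product. It is plain Python rather than Solr code; the `idf` formula is one common smoothed form chosen as an assumption, and the counts, document frequencies, and function names are hypothetical.

```python
import math

def idf(doc_freq, num_docs):
    """One common smoothed IDF form (an assumption for this sketch)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def rerank_facets(facet_counts, global_df, num_docs):
    """Score each facet term as count * idf and sort descending.
    `facet_counts` plays the role of tf for the current query;
    `global_df` is the term's document frequency in the whole index."""
    scored = {
        term: count * idf(global_df[term], num_docs)
        for term, count in facet_counts.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical numbers: facet counts for the last 24 hours and
# index-wide document frequencies.
facet_counts = {"Barack Obama": 120, "Iran": 90}
global_df = {"Barack Obama": 50_000, "Iran": 2_000}
num_docs = 1_000_000

reranked = rerank_facets(facet_counts, global_df, num_docs)
for term, score in reranked:
    print(f"{term}: {score:.1f}")
```

In this made-up case "Iran" moves above "Barack Obama" despite the lower raw count, because it is much rarer in the index as a whole.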

Is my reasoning sound?  Can you provide any guidance as to the best way to
implement this?

Thanks for your help,

Asif


On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll <gsing...@apache.org> wrote:

On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:

Hi again,
I guess nobody has used facets in the way I described below before.  Do any
of the experts have any ideas as to how to do this efficiently and
correctly?  Any thoughts would be greatly appreciated.

Thanks,

Asif

On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman <a...@newscred.com> wrote:

Hi all,
We have an index of news articles that are tagged with news topics.
Currently, we use Solr facets to see which topics are popular for a given
query or time period.  I'd like to apply the concept of IDF to the facet
counts so as to penalize the topics that occur broadly through our index.
I've begun to write a custom facet component that applies the IDF to the
facet counts, but I also wanted to check if anyone has experience using
facets in this way.

I'm not sure I'm following.  Would you be faceting on one field, but using
the DF from some other field?  Faceting is already a count of all the
documents that contain the term on a given field for that search.  If I'm
understanding, you would still do the typical faceting, but then rerank by
the global DF values, right?

Backing up, what is the problem you are seeing that you are trying to
solve?

I think you could do this, but you'd have to hook it in yourself.  By
penalize, do you mean remove, or just have them in the sort?  Generally
speaking, looking up the DF value can be expensive, especially if you do a
lot of skipping around.  I don't know how pluggable the sort capabilities
are for faceting, but that might be the place to start if you are just
looking at the sorting options.



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search




