Your approach sounds like the well-known, old-school technique of
pseudo-relevance feedback:
http://nlp.stanford.edu/IR-book/html/htmledition/pseudo-relevance-feedback-1.html

I believe you can hack MLT and do what you need.
I'm working on something like this, and there are a number of approaches.

One of the simplest is to build a custom component, which will:
* use the docList (the retrieved page) as a docset,
* refine the results by that top docset, and
* count facets on those restricted results.

By analyzing these restricted counts you can work out which category filter to
apply.
Sounds puzzling, doesn't it?
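
To make the idea less puzzling, here is a minimal sketch of the counting step
in plain Python. It is not Solr code and not the custom component itself: it
only assumes the top page of results is available as a list of dicts, and the
field name "category_id", the helper names, and the thresholds are all
hypothetical choices for illustration.

```python
from collections import Counter


def pick_category_filters(top_docs, category_field="category_id",
                          max_categories=2, min_share=0.3):
    """Return the most frequent categories among the top documents.

    top_docs       -- list of dicts, one per document on the retrieved page
    category_field -- hypothetical name of the category field
    max_categories -- keep at most this many categories
    min_share      -- drop categories covering less than this fraction
    """
    # Count categories only within the top docs, not the whole result set.
    counts = Counter(doc[category_field] for doc in top_docs
                     if category_field in doc)
    total = sum(counts.values())
    if total == 0:
        return []
    return [cat for cat, n in counts.most_common(max_categories)
            if n / total >= min_share]


def build_filter_query(categories, category_field="category_id"):
    """Build a filter string in Lucene query syntax, or None if no categories."""
    if not categories:
        return None
    return "%s:(%s)" % (category_field,
                        " OR ".join(str(c) for c in categories))


if __name__ == "__main__":
    # Made-up top page: three docs in category 10, one each in 20 and 30.
    page = [{"id": 1, "category_id": 10}, {"id": 2, "category_id": 10},
            {"id": 3, "category_id": 20}, {"id": 4, "category_id": 10},
            {"id": 5, "category_id": 30}]
    print(build_filter_query(pick_category_filters(page)))
```

The resulting string could then be sent as the filter on the second query. In
a server-side component the same counting would be done over the docList
instead, which is what saves the extra client round trip.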

On Wed, May 16, 2012 at 12:53 PM, Samarendra Pratap <samarz...@gmail.com> wrote:

> Thanks Sujit, Mikhail for your suggestions
>
> Sujit -
> Continuing to do it at client side increases one extra cycle between server
> and the client.
> Moreover it does not remain centralized, so I may have to repeat client
> side logic to multiple places, depending upon how it is implemented.
>
> Mikhail -
> More Like This (mlt) is different than what I require. I am guessing the
> best matching categories for a "set of documents" and then filtering
> through only those categories, while mlt finds similar documents for
> "individual" documents. There are two things which will not work for me -
>
>   1. Mlt suggests similar documents for individual documents, while I am
>   working on aggregated result set.
>   2. Mlt is based on the same scoring mechanism which was not providing
>   relevant results and that's why I moved to this 2 query system
>
>
> I was wondering that many people might have thought about this but did not
> find if anybody worked on this.
> Is it a bad idea? Or something which has repercussions (other than slightly
> increased response time)?
>
> Thanks again
>
>
> On Wed, May 16, 2012 at 11:58 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > Hello,
> >
> > have you checked MoreLikeThis feature?
> >
> > On Tue, May 15, 2012 at 11:26 PM, Samarendra Pratap <samarz...@gmail.com> wrote:
> >
> > >   - We are calculating frequency of category ids in these top results.
> > >   We are not using facets because that gives count for all, relevant or
> > >   irrelevant, results.
> > >   - Based on category frequencies within top matching results we are
> > >   trying to find a few most frequent categories by simple calculation.
> > >   Now we are very confident that these categories are the ones which
> > >   best suit to our query.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Tech Lead
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> >  <mkhlud...@griddynamics.com>
> >
>
>
>
> --
> Regards,
> Samar
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
