Your approach sounds like a well-known, old-school one (pseudo-relevance feedback): http://nlp.stanford.edu/IR-book/html/htmledition/pseudo-relevance-feedback-1.html
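
To make the idea concrete, here is a minimal sketch in plain Python of that feedback loop applied to category ids. It is an illustration only, not Solr code: the `category_id` field, the function names `top_categories` and `refine`, the `min_share` threshold and the toy data are all assumptions made up for this example.

```python
from collections import Counter


def top_categories(ranked_docs, top_n=50, max_categories=3, min_share=0.2):
    """Return the category ids that dominate the first top_n ranked docs.

    ranked_docs is a list of dicts, best match first, each carrying a
    hypothetical "category_id" field. A category is kept only if it covers
    at least min_share of the inspected head of the list.
    """
    head = ranked_docs[:top_n]
    if not head:
        return []
    counts = Counter(doc["category_id"] for doc in head)
    threshold = min_share * len(head)
    return [cat for cat, n in counts.most_common(max_categories) if n >= threshold]


def refine(ranked_docs, categories):
    """Second pass: keep only docs in the chosen categories, order preserved."""
    allowed = set(categories)
    return [doc for doc in ranked_docs if doc["category_id"] in allowed]


# Toy ranked result list (made-up data), best match first.
results = [
    {"id": 1, "category_id": "phones"},
    {"id": 2, "category_id": "phones"},
    {"id": 3, "category_id": "cases"},
    {"id": 4, "category_id": "phones"},
    {"id": 5, "category_id": "books"},
    {"id": 6, "category_id": "books"},
]

# Count categories over the top 4 hits only, then filter the full list.
cats = top_categories(results, top_n=4, max_categories=1)
filtered = refine(results, cats)
```

Counting only over the head of the ranked list is the point: "books" appears twice in the full result set but never in the top four hits, so it does not influence the chosen filter.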
I believe you can hack MLT and do what you need. I'm working on something
like this, and there are a number of approaches. One of the simpler ones is
to build a custom component, which will:

* use the docList (the retrieved page) as a docset,
* refine the results by that top docset, and
* count facets on those restricted results.

By analyzing these restricted counts you can work out which category filter
to apply. Sounds puzzling, doesn't it?

On Wed, May 16, 2012 at 12:53 PM, Samarendra Pratap <samarz...@gmail.com> wrote:

> Thanks Sujit, Mikhail for your suggestions
>
> Sujit -
> Continuing to do it at client side increases one extra cycle between server
> and the client.
> Moreover it does not remain centralized, so I may have to repeat client
> side logic to multiple places, depending upon how it is implemented.
>
> Mikhail -
> More Like This (mlt) is different than what I require. I am guessing the
> best matching categories for a "set of documents" and then filtering
> through only those categories, while mlt finds similar documents for
> "individual" documents. There are two things which will not work for me -
>
> 1. Mlt suggests similar documents for individual documents, while I am
>    working on aggregated result set.
> 2. Mlt is based on the same scoring mechanism which was not providing
>    relevant results and that's why I moved to this 2 query system
>
> I was wondering that many people might have thought about this but did not
> find if anybody worked on this.
> Is it a bad idea? Or something which has repercussions (other than slightly
> increased response time)?
>
> Thanks again
>
> On Wed, May 16, 2012 at 11:58 AM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
>
> > Hello,
> >
> > have you checked MoreLikeThis feature?
> >
> > On Tue, May 15, 2012 at 11:26 PM, Samarendra Pratap
> > <samarz...@gmail.com> wrote:
> >
> > > - We are calculating frequency of category ids in these top results.
> > >   We are not using facets because that gives count for all, relevant
> > >   or irrelevant, results.
> > > - Based on category frequencies within top matching results we are
> > >   trying to find a few most frequent categories by simple calculation.
> > >   Now we are very confident that these categories are the ones which
> > >   best suit to our query.
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Tech Lead
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> > <mkhlud...@griddynamics.com>
>
> --
> Regards,
> Samar

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>