Hi -
On 10/30/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Yes, a custom hit collector would work. Searcher.doc() would be
> deadly... but since each doc has at most one category, the FieldCache
> could be used (it quickly maps id to field value and was historically
> used for sorting).
Not to be dense, but how do I use a custom HitCollector with Solr?
I've checked the wiki, and searched the mailing list, and don't see
anything. Is there a way to configure this, or do I just build a
custom version of Solr?
I have no problems doing this in Lucene, but I'm not quite sure where
to configure/code this in Solr.
Thanks,
Tom
On 10/30/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Hi Tom, I moderated your email in... you need to subscribe to prevent
> your emails from being blocked in the future.
Thanks. That's fixed, I hope. I was using the wrong address.
> http://incubator.apache.org/solr/mailing_lists.html
>
> On 10/30/06, Tom <[EMAIL PROTECTED]> wrote:
> > I'd like to be able to limit the number of documents returned from
> > any particular group of documents, much as Google only shows a max of
> > two results from any one website.
>
> You bring up an interesting problem that may be of general use.
> Solr doesn't currently do this, but it should be possible (with some
> work in the internals).
>
> > The docs are all marked as to which group they belong to. There will
> > probably be multiple groups returned from any search. Documents
> > belong to only one group.
>
> Documents belonging to only one group does make things easier.
>
> > I could just examine each returned document and discard documents
> > from groups I have seen before, but that seems slow (though I'm not
> > sure there is a better alternative).
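The post-filtering approach Tom describes can be sketched as follows, generalized from "one per group" to at most `maxPerGroup` hits per group (as with Google's two per site). This is a minimal illustration, not Solr code: `groupOf[doc]` is a hypothetical stand-in for however the group id is obtained, and in Lucene that lookup is exactly the expensive part if it means loading each stored document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupPostFilter {
    // Walks hits that are already sorted by descending score and keeps at most
    // maxPerGroup of them from each group. groupOf is a hypothetical doc-id -> group table.
    public static List<Integer> limitPerGroup(int[] rankedDocs, String[] groupOf, int maxPerGroup) {
        Map<String, Integer> seen = new HashMap<>(); // group -> hits kept so far
        List<Integer> kept = new ArrayList<>();
        for (int doc : rankedDocs) {
            String group = groupOf[doc];
            int count = seen.getOrDefault(group, 0);
            if (count < maxPerGroup) {
                kept.add(doc);
                seen.put(group, count + 1);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] groupOf = {"a", "a", "b", "a", "c", "b"}; // group of docs 0..5
        int[] ranked = {3, 0, 1, 5, 4, 2};                 // best score first
        System.out.println(limitPerGroup(ranked, groupOf, 2));
    }
}
```

Because some hits are discarded, filling a page of N results may require pulling more than N hits from the ranked list.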
> >
> > The number of groups is a fairly high percentage of the number of
> > documents (maybe 5% of all documents), so building something like a
> > filter for each group doesn't seem feasible.
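A quick back-of-the-envelope check shows why one filter per group is impractical. It assumes a filter is essentially a bit set with one bit per document in the index; only the 5% ratio comes from the thread, and the one-million-document index size is a hypothetical figure chosen for illustration.

```java
public class GroupFilterMemory {
    // One filter per group, each holding one bit per document in the index.
    public static long bytesForPerGroupFilters(long numDocs, long groupPercent) {
        long numGroups = numDocs * groupPercent / 100; // integer math avoids rounding surprises
        long bytesPerFilter = (numDocs + 7) / 8;       // one bit per document, rounded up to bytes
        return numGroups * bytesPerFilter;
    }

    public static void main(String[] args) {
        // Hypothetical 1,000,000-doc index with groups = 5% of docs (50,000 groups).
        long bytes = bytesForPerGroupFilters(1_000_000L, 5);
        System.out.println(bytes + " bytes");
    }
}
```

Under these assumptions that is 50,000 filters of 125,000 bytes each, about 6.25 GB, and the total grows with the square of the index size because both the number of groups and the size of each filter scale with the document count.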
> >
> > A custom HitCollector of some sort could work, but there is the
> > comment in the javadoc that you "should not call Searcher.doc(int)
> > or IndexReader.document(int) on every document number encountered,"
> > which would seem to be necessary to get the group id.
>
> Yes, a custom hit collector would work. Searcher.doc() would be
> deadly... but since each doc has at most one category, the FieldCache
> could be used (it quickly maps id to field value and was historically
> used for sorting).
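The custom-collector idea can be sketched like this, assuming a single-valued group field. In Lucene the per-document array would come from something like `FieldCache.DEFAULT.getStrings(reader, "group")` (the field name `group` is hypothetical) and the class would extend `HitCollector`; here a plain `String[]` stands in so the sketch runs without Lucene. Since `collect()` is not called in score order, each group keeps a small min-heap of its best hits instead of simply keeping the first ones seen.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class GroupLimitingCollector {
    public static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Lowest score at the head, so the weakest kept hit is cheap to evict.
    private static final Comparator<Hit> BY_SCORE_ASC = (a, b) -> Float.compare(a.score, b.score);

    private final String[] groupOf;  // stand-in for a FieldCache array: doc id -> group
    private final int maxPerGroup;
    private final Map<String, PriorityQueue<Hit>> perGroup = new HashMap<>();

    public GroupLimitingCollector(String[] groupOf, int maxPerGroup) {
        this.groupOf = groupOf;
        this.maxPerGroup = maxPerGroup;
    }

    // Same shape as HitCollector.collect(int doc, float score): an array lookup, no stored-doc load.
    public void collect(int doc, float score) {
        PriorityQueue<Hit> heap = perGroup.computeIfAbsent(groupOf[doc], g -> new PriorityQueue<>(BY_SCORE_ASC));
        if (heap.size() < maxPerGroup) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                   // drop this group's weakest kept hit
            heap.add(new Hit(doc, score));
        }
    }

    // Merge the survivors of every group, best score first.
    public List<Integer> topDocs() {
        List<Hit> all = new ArrayList<>();
        for (PriorityQueue<Hit> heap : perGroup.values()) all.addAll(heap);
        all.sort(BY_SCORE_ASC.reversed());
        List<Integer> docs = new ArrayList<>();
        for (Hit h : all) docs.add(h.doc);
        return docs;
    }

    public static void main(String[] args) {
        GroupLimitingCollector c = new GroupLimitingCollector(new String[]{"a", "a", "a", "b", "b", "c"}, 2);
        float[] scores = {0.5f, 0.9f, 0.7f, 0.3f, 0.8f, 0.6f};
        for (int doc = 0; doc < scores.length; doc++) c.collect(doc, scores[doc]);
        System.out.println(c.topDocs());
    }
}
```

Memory here is bounded by the number of groups times `maxPerGroup`, which stays modest even when groups are 5% of the documents.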
>
> It might be useful to see what Nutch does in this regard too.
>
> -Yonik
>