Hi Tom, I moderated your email in... you need to subscribe to prevent
your emails being blocked in the future.
http://incubator.apache.org/solr/mailing_lists.html

On 10/30/06, Tom <[EMAIL PROTECTED]> wrote:
> I'd like to be able to limit the number of documents returned from
> any particular group of documents, much as Google only shows a max of
> two results from any one website.

You bring up an interesting problem that may be of general use.
Solr doesn't currently do this, but it should be possible (with some
work in the internals).

> The docs are all marked as to which group they belong to. There will
> probably be multiple groups returned from any search. Documents
> belong to only one group.

Documents belonging to only one group does make things easier.

> I could just examine each returned document, and discard documents
> from groups I have seen before, but that seems slow (but I'm not sure
> there is a better alternative).
>
> The number of groups is a fairly high percentage of the number of
> documents (maybe 5% of all documents), so building something like a
> filter for each group doesn't seem feasible.
>
> A custom HitCollector of some sort could work, but there is the comment
> in the javadoc about "should not call Searcher.doc(int)
> or IndexReader.document(int) on every document number encountered."
> which would seem to be necessary to get the group id.

Yes, a custom hit collector would work.  Searcher.doc() would be
deadly... but since each doc has at most one category, the FieldCache
could be used (it quickly maps id to field value and was historically
used for sorting).
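
As a rough illustration of that idea (a sketch, not code from Solr or
Lucene): assume the group of every document has already been loaded into
an array indexed by document number, which is the kind of array
FieldCache.DEFAULT.getStrings(reader, field) hands back in Lucene. The
collector then does one array lookup per hit and never loads a stored
document. The class and method names below are made up for the example.
Since a hit collector is called in index order rather than score order,
the sketch keeps the best-scoring hits of each group instead of the
first ones it happens to see.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; these names are not Solr or Lucene API.
class GroupLimitingCollector {

    /** One matching document and its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Min-heap order: the lowest-scoring hit of a group sits at the head.
    private static final Comparator<Hit> WORST_FIRST =
            (a, b) -> Float.compare(a.score, b.score);

    private final String[] groupOfDoc;  // doc number -> group id (assumed preloaded)
    private final int maxPerGroup;
    private final Map<String, PriorityQueue<Hit>> bestPerGroup = new HashMap<>();

    GroupLimitingCollector(String[] groupOfDoc, int maxPerGroup) {
        this.groupOfDoc = groupOfDoc;
        this.maxPerGroup = maxPerGroup;
    }

    /** Same shape as a hit-collector callback: called once per matching doc. */
    void collect(int doc, float score) {
        String group = groupOfDoc[doc];  // array lookup, no stored-field access
        PriorityQueue<Hit> best = bestPerGroup.computeIfAbsent(
                group, g -> new PriorityQueue<Hit>(WORST_FIRST));
        best.add(new Hit(doc, score));
        if (best.size() > maxPerGroup) {
            best.poll();  // evict the lowest-scoring hit of this group
        }
    }

    /** All surviving hits, highest score first. */
    List<Hit> topHits() {
        List<Hit> all = new ArrayList<>();
        for (PriorityQueue<Hit> best : bestPerGroup.values()) {
            all.addAll(best);
        }
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return all;
    }
}
```

Memory stays proportional to (groups matched) x (max per group) rather
than to the number of hits, which matters when groups are around 5% of
all documents. Paging through results is not handled here.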

It might be useful to see what Nutch does in this regard too.

-Yonik
