Hi Tom, I moderated your email in... you need to subscribe to prevent your emails being blocked in the future. http://incubator.apache.org/solr/mailing_lists.html
On 10/30/06, Tom <[EMAIL PROTECTED]> wrote:
I'd like to be able to limit the number of documents returned from any particular group of documents, much as Google only shows a max of two results from any one website.
You bring up an interesting problem that may be of general use. Solr doesn't currently do this, but it should be possible (with some work in the internals).
The docs are all marked as to which group they belong to. There will probably be multiple groups returned from any search. Documents belong to only one group.
Documents belonging to only one group does make things easier.
I could just examine each returned document and discard documents from groups I have seen before, but that seems slow (though I'm not sure there is a better alternative). The number of groups is a fairly high percentage of the number of documents (maybe 5% of all documents), so building something like a filter for each group doesn't seem feasible. A custom HitCollector of some sort could work, but there is the comment in the javadoc about "should not call Searcher.doc(int) or IndexReader.document(int) on every document number encountered.", and calling one of those would seem to be necessary to get the group id.
Yes, a custom hit collector would work. Calling Searcher.doc() for every hit would be deadly... but since each doc has at most one category, the FieldCache could be used (it quickly maps a document id to its field value and was historically used for sorting). It might be useful to see what Nutch does in this regard too.
-Yonik
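
For illustration, the collector idea discussed above might look roughly like the sketch below. It is not Solr or Lucene code and uses no Lucene classes. The class name GroupLimitingCollector, the Hit holder, and the results() method are all made up for this example, and the plain String[] array is only a stand-in for the kind of doc-number-to-field-value lookup a FieldCache gives you. The collect(int doc, float score) method is modeled on the shape of a hit collector callback. Because hits are not assumed to arrive in score order, the sketch keeps a small min-heap per group and evicts the weakest hit once a group goes over its limit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: keeps at most maxPerGroup of the best-scoring hits per group. */
class GroupLimitingCollector {

    /** One collected hit: an internal document number and its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Assumption: groupByDoc[doc] holds the group id of each document,
    // standing in for a FieldCache-style in-memory lookup.
    private final String[] groupByDoc;
    private final int maxPerGroup;

    // One min-heap per group, ordered by score, so the weakest kept hit is on top.
    private final Map<String, PriorityQueue<Hit>> kept = new HashMap<>();

    GroupLimitingCollector(String[] groupByDoc, int maxPerGroup) {
        this.groupByDoc = groupByDoc;
        this.maxPerGroup = maxPerGroup;
    }

    /** Called once per matching document, in any order. Never loads stored fields. */
    void collect(int doc, float score) {
        String group = groupByDoc[doc]; // cheap array lookup instead of Searcher.doc()
        PriorityQueue<Hit> heap = kept.computeIfAbsent(
                group, g -> new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score)));
        heap.add(new Hit(doc, score));
        if (heap.size() > maxPerGroup) {
            heap.poll(); // drop the lowest-scoring hit of this group
        }
    }

    /** Returns the surviving document numbers, best score first (ties broken by doc number). */
    List<Integer> results() {
        List<Hit> all = new ArrayList<>();
        for (PriorityQueue<Hit> heap : kept.values()) {
            all.addAll(heap);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : all) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Six documents spread over three groups (made-up data).
        String[] groupByDoc = {"a.com", "a.com", "b.com", "a.com", "b.com", "c.com"};
        float[] scores = {0.9f, 0.5f, 0.8f, 0.7f, 0.3f, 0.1f};

        GroupLimitingCollector collector = new GroupLimitingCollector(groupByDoc, 2);
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        // Document 1 is dropped: it is the third and weakest hit from a.com.
        System.out.println(collector.results()); // prints [0, 2, 3, 4, 5]
    }
}
```

The memory cost here grows with the number of distinct groups that actually match a query, not with the number of groups in the index, which matters when groups are around 5% of all documents. Whether this is fast enough in practice would need measuring against a real index.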