Yonik Seeley wrote:
On 1/3/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
Thanks. Yes, the presentation layer could group results, but that is
not practical if I want to show the first 20 results out of 200,000
matches.
Nutch groups the results by site. Any idea how they do it?
Good question.
Off the top of my head, one could use a priority queue that can change
its size dynamically. One could increment a group count for each hit
(like faceted search with the FieldCache) and if the group count
exceeds "n", then you increment the size of the priority queue to
allow an additional item to be collected to compensate.
-Yonik
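
A minimal sketch of the idea quoted above, in Python rather than against the
Lucene API, under stated assumptions: hits arrive as (doc_id, score, group)
tuples, "rows" is the number of results wanted and "n" is the per-group limit.
The function name and the tuple layout are hypothetical and are not part of
Solr or Lucene. The queue limit grows by one each time a hit pushes its group
past "n", and the per-group limit is applied in a final pass over the
collected hits.

```python
import heapq
from collections import defaultdict


def collapse_top_hits(hits, rows, n):
    """Return the top `rows` hits by score, keeping at most `n` per group.

    `hits` is an iterable of (doc_id, score, group) tuples in any order.
    This tuple layout is an assumption made for illustration only.
    """
    if rows <= 0 or n <= 0:
        return []

    capacity = rows                    # current size limit of the queue
    seen_per_group = defaultdict(int)  # group count, incremented for every hit
    heap = []                          # min-heap of (score, doc_id, group)

    for doc_id, score, group in hits:
        seen_per_group[group] += 1
        if seen_per_group[group] > n:
            # Surplus hit for this group: grow the queue by one to compensate.
            capacity += 1
        entry = (score, doc_id, group)
        if len(heap) < capacity:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Queue is full: replace the current lowest-ranked entry.
            heapq.heapreplace(heap, entry)

    # Walk the collected hits from best to worst and apply the group limit.
    kept_per_group = defaultdict(int)
    results = []
    for score, doc_id, group in sorted(heap, reverse=True):
        if kept_per_group[group] < n:
            kept_per_group[group] += 1
            results.append((doc_id, score, group))
            if len(results) == rows:
                break
    return results
```

With rows=20 and n=1 this would return the 20 best hits with at most one per
site, without sorting all 200,000 matches. The cost is that the queue can grow
by one entry for every surplus hit, so a heavily duplicated result set needs
more memory than a fixed-size queue.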
You might as well say that I have to change the dilithium crystals in the flux
capacitor :-)
One of the reasons I like Solr so much is because I get impressive results
without having to know Lucene, which is something that will have to change
because I also need this feature.
Not knowing much about the internals of Solr/Lucene, I had a look at the Facet
code in search of ideas, but from what I could see the facet counts are
calculated after the Documents are added to the response. It seems to me that
any kind of grouping has to be done before that... right?
Could you explain in more detail where I should look?
Can the TopFieldDocCollector/TopFieldDocs classes be used to this end?
I'm immersing myself in Lucene, but it will take some time.
Side note: Over here, besides Solr, we also use the "FAST" search platform and
they call this feature "Field collapsing":
<http://www.fastsearch.com/glossary.aspx?m=48&amid=299>
I like the syntax they use:
"&collapseon=<fieldname>&collapsenum=N" -> Collapse, but keep N number of
collapsed documents
For some reason they can only collapse on numeric fields (int32).
Regards,
Luis Neves