In my case, I want to filter out "duplicate" docs so that the returned docs are unique with respect to a certain field (not the schema's unique field, of course). A "duplicate" doc here is one that has the same value for a checksum field as one of the docs already in the results. It would be great if I could somehow express that with a query, but I don't think that would be possible.
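
To make the rule concrete, here is a minimal sketch of the "first doc with a given checksum wins" filter in plain Java. It is not Solr API: `DocFilter`, `FirstSeenFilter`, and `DocFilters` are hypothetical names, documents are stood in for by `Map<String, Object>`, and the field name `checksum` is only an example. It also assumes docs lacking the field should be kept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical interface modeled on java.io.FileFilter; not part of Solr.
interface DocFilter {
    boolean accept(Map<String, Object> doc);
}

// Accepts a doc only the first time its value for the given field is seen.
// Stateful: create one instance per result list.
final class FirstSeenFilter implements DocFilter {
    private final String field;
    private final Set<Object> seen = new HashSet<>();

    FirstSeenFilter(String field) {
        this.field = field;
    }

    @Override
    public boolean accept(Map<String, Object> doc) {
        Object value = doc.get(field);
        // Assumption: a doc without the field is never treated as a duplicate.
        if (value == null) {
            return true;
        }
        // Set.add returns false when the value was already present.
        return seen.add(value);
    }
}

final class DocFilters {
    private DocFilters() {}

    // Returns the accepted docs, preserving the original (ranked) order.
    static List<Map<String, Object>> apply(List<Map<String, Object>> docs,
                                           DocFilter filter) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (filter.accept(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

Calling `DocFilters.apply(docs, new FirstSeenFilter("checksum"))` on one page of results keeps the highest-ranked doc for each checksum. Applied to a page that has already been fetched, though, it can return fewer docs than the requested page size and cannot see duplicates on other pages or shards, which is why I'd like this to happen inside Solr.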

On Thu, Feb 24, 2011 at 5:11 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> Hmm, depending on what you are actually needing to do, can you do it with a
> simple fq param to filter out what you want filtered out, instead of needing
> to write custom Java as you are suggesting? It would be a lot easier to just
> use an fq.
>
> How would you describe the documents you want to filter from the query
> results page? Can that description be represented by a Solr query you can
> already represent using the lucene, dismax, or any other existing query? If
> so, why not just use a negated fq describing what to omit from the results?
> ________________________________________
> From: Babak Farhang [farh...@gmail.com]
> Sent: Thursday, February 24, 2011 6:58 PM
> To: solr-user
> Subject: query results filter
>
> Hi everyone,
>
> I have some existing solr cores that for one reason or another have
> documents that I need to filter from the query results page.
>
> I would like to do this inside Solr instead of doing it on the
> receiving end, in the client. After searching the mailing list
> archives and Solr wiki, it appears you do this by registering a custom
> SearchHandler / SearchComponent with Solr. Still, I don't quite
> understand how this machinery fits together. Any suggestions / ideas
> / pointers much appreciated!
>
> Cheers,
> -Babak
>
> ~~
>
> Ideally, I'd like to find / code a solution that does the following:
>
> 1. A request handler that works like the StandardRequestHandler but
> which allows an optional DocFilter (say, modeled like the
> java.io.FileFilter interface)
> 2. Allows current pagination to work transparently.
> 3. Works transparently with distributed/sharded queries.
>