Hello, Solrs,

We are trying to filter out documents written by (one or more of) the authors from a mediumish list (~2K). The document set itself is in the millions.

Apart from the obvious approach of building a huge OR-list and appending it to the query, writing a Lucene[1] filter (or a SolrFilter[2]) seems to suggest itself. In fact, [3] seems to strongly encourage this approach.

Basically, as we understand it, the filter's getDocIdSet method gets called and is fed index segments, "one spoonful at a time". It then decides which docs of the segment will be accepted, setting the corresponding bits in the result (in our case, e.g., by looking up the document's author's name in a HashMap or something like it).

Our first question is: how does it all fit together? Would it be enough to write such a class? How do we reference it in the Solr configuration? In the query? Should it be a Lucene Filter or a SolrFilter?

The problem is that we are experiencing very slow response times, on the order of 12 seconds per query (the OR alternative, which we tested on a smallish author list of about a couple of hundred, is nearly instantaneous).

Our second question is: are we on track with this? Intuition would say, of course, that sifting sequentially through the index, checking each document for its author, *will* take its time. So maybe the approach is doomed? Are there other, better approaches?

Thanks for any pointers

------
[1] <https://builds.apache.org/job/Lucene-3.x/javadoc/all/org/apache/lucene/search/Filter.html?is-external=true>
[2] <http://lucene.apache.org/solr/api/org/apache/solr/search/SolrFilter.html>
[3] <http://wiki.apache.org/lucene-java/FilteringOptions>

-- tomás
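
P.S. To make the idea concrete, here is a minimal sketch of the per-segment logic as we understand it. It is plain Java and deliberately does not use the Lucene or Solr classes: the class AuthorFilterSketch, the method acceptedDocs, and the modelling of a segment as an array of author names are all made up for illustration. We also assume that "accepted" means "the author is on the list". A real filter would presumably have to get the author values from the index rather than from an array.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a stand-in for the per-segment filtering idea.
// This is NOT the Lucene Filter API; all names here are hypothetical.
final class AuthorFilterSketch {

    // A "segment" is modelled as an array holding the author of each
    // document, indexed by its segment-local doc id (an assumption).
    // Returns a bit set with one bit set per accepted document.
    static BitSet acceptedDocs(String[] segmentAuthors, Set<String> wantedAuthors) {
        BitSet bits = new BitSet(segmentAuthors.length);
        for (int docId = 0; docId < segmentAuthors.length; docId++) {
            // One hash lookup per document: this is the sequential
            // sifting through the index that we are worried about.
            if (wantedAuthors.contains(segmentAuthors[docId])) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("knuth", "lamport"));
        List<String[]> segments = Arrays.asList(
            new String[] {"knuth", "dijkstra", "lamport"},
            new String[] {"hoare", "knuth"});
        // Segments are handed over one at a time; prints {0, 2} then {1}.
        for (String[] segment : segments) {
            System.out.println(acceptedDocs(segment, wanted));
        }
    }
}
```

The cost of this loop grows with the number of documents rather than with the number of authors, which is why we suspect the approach does not scale to millions of documents.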