Hello, Solrs

We are trying to filter out documents written by (one or more of) the authors
from a medium-sized list (~2K authors). The document set itself is in the
millions.

Apart from the obvious approach of building a huge OR-list and appending it
to the query, writing a Lucene filter[1] (or a SolrFilter[2]) seems to suggest
itself. In fact, [3] seems to strongly encourage this approach.
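
For concreteness, the OR-list variant would look roughly like the sketch
below (plain Java; the helper and the field name "author" are our own
assumptions). Note that with ~2K names this produces ~2K boolean clauses,
which exceeds Lucene's default limit of 1024; Solr exposes that limit as
maxBooleanClauses in solrconfig.xml, so it would have to be raised for this
to work at all.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that builds a query-syntax clause matching any of the
// given authors, e.g. author:("Ann Lee" OR "Bo Chen"). The field name is an
// assumption; it should be whatever (untokenized) field holds the author.
class AuthorOrQuery {

    // Escape backslashes and double quotes so each name can sit safely
    // inside a quoted phrase in the standard query syntax.
    static String quote(String name) {
        return "\"" + name.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One OR'ed clause per author; with ~2K authors this means ~2K clauses.
    static String build(String field, List<String> authors) {
        if (authors.isEmpty()) {
            // "author:()" would not be valid query syntax
            throw new IllegalArgumentException("author list must not be empty");
        }
        StringJoiner or = new StringJoiner(" OR ", field + ":(", ")");
        for (String author : authors) {
            or.add(quote(author));
        }
        return or.toString();
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Ann Lee", "Bo \"B\" Chen");
        // A leading "-" turns the clause into an exclusion.
        System.out.println("fq=-" + build("author", authors));
    }
}
```

Sending it as a filter query (fq) rather than appending it to q keeps it out
of scoring and lets Solr cache the resulting document set in its filterCache;
the leading "-" excludes those authors instead of selecting them.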

Basically, as we understand it, the filter's getDocIdSet method gets called and
is fed with index segments, "one spoonful at a time". It then decides which docs
of the segment will be accepted, setting the corresponding bits in the result
(in our case, e.g. by looking up each document's author name in a HashMap or
something like it).
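
To check our understanding of this per-segment flow, here is a plain-Java
model of it (no Lucene classes; Segment and acceptedDocs are hypothetical
names). Each segment is represented by an array holding one author value per
document, roughly what FieldCache.DEFAULT.getStrings(reader, "author") returns
in Lucene 3.x, and the filter sets a bit in a java.util.BitSet for every
document whose author is in the set. In a real Lucene 3.x Filter this loop
would sit inside getDocIdSet(IndexReader reader) and return something like an
OpenBitSet.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of a document-driven filter. Segment is hypothetical
// scaffolding standing in for one index segment; authorOfDoc[d] plays the
// role of a per-document field value array (e.g. what a field cache gives).
class DocScanFilter {

    static final class Segment {
        final String[] authorOfDoc; // index = segment-local doc id

        Segment(String... authorOfDoc) {
            this.authorOfDoc = authorOfDoc;
        }
    }

    // Called once per segment; returns the accepted doc ids as bits.
    // Cost is one pass over all docs in the segment, whatever the list size.
    static BitSet acceptedDocs(Segment segment, Set<String> authors) {
        BitSet bits = new BitSet(segment.authorOfDoc.length);
        for (int doc = 0; doc < segment.authorOfDoc.length; doc++) {
            String author = segment.authorOfDoc[doc];
            if (author != null && authors.contains(author)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Set<String> authors = new HashSet<>(Arrays.asList("Ann Lee", "Bo Chen"));
        List<Segment> segments = Arrays.asList(
                new Segment("Ann Lee", "Cy Diaz", "Bo Chen"),
                new Segment("Cy Diaz", null, "Ann Lee"));
        for (Segment s : segments) {
            System.out.println(acceptedDocs(s, authors)); // {0, 2} then {2}
        }
    }
}
```

The loop visits every document of every segment regardless of how long the
author list is. The per-document check stays cheap as long as the author value
comes from an in-memory array; fetching each stored document instead would
likely be much slower.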

Our first question is: how does it all fit together? Would it be enough to
write such a class? How do I reference it in the Solr configuration? In the
query? Should it be a Lucene Filter or a SolrFilter?

The problem is that we are seeing very slow response times with the filter,
on the order of 12 seconds per query (the OR alternative, which we tested on
a smaller list of a couple of hundred authors, is nearly instantaneous).

Our second question is: are we on the right track? Intuition would, of
course, say that sifting sequentially through the index and checking each
document's author *will* take its time. So maybe the approach is doomed? Are
there other, better approaches?
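
One alternative we could think of is to drive the filter from the terms
rather than from the documents: for each of the ~2K authors, look up that
author's postings (the doc ids containing the term) and OR them into the bit
set, so the cost tracks the number of matching postings instead of a pass over
every document. As far as we know, this is the idea behind Lucene's contrib
TermsFilter. The model below is plain Java only; the Map standing in for a
segment's term dictionary and postings is a hypothetical simplification, not
Lucene's API.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a term-driven filter. The map from author term to the
// doc ids containing it is a hypothetical stand-in for one segment's term
// dictionary plus postings lists.
class TermDrivenFilter {

    static BitSet acceptedDocs(Map<String, int[]> postings, int maxDoc,
                               List<String> authors) {
        BitSet bits = new BitSet(maxDoc);
        for (String author : authors) {
            // An author absent from this segment costs a single lookup.
            int[] docs = postings.get(author);
            if (docs == null) {
                continue;
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("Ann Lee", new int[] {0, 5});
        postings.put("Bo Chen", new int[] {2});
        postings.put("Cy Diaz", new int[] {1, 3, 4});
        // "Dee Ng" has no postings in this segment and is simply skipped.
        System.out.println(acceptedDocs(postings, 6,
                Arrays.asList("Ann Lee", "Bo Chen", "Dee Ng"))); // {0, 2, 5}
    }
}
```

Both models yield the same kind of per-segment bit set; the difference is
only which side drives the loop.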

Thanks for any pointers

------

[1] <https://builds.apache.org/job/Lucene-3.x/javadoc/all/org/apache/lucene/search/Filter.html?is-external=true>
[2] <http://lucene.apache.org/solr/api/org/apache/solr/search/SolrFilter.html>
[3] <http://wiki.apache.org/lucene-java/FilteringOptions>

-- tomás
