Hello,

It seems you are talking about a huge disjunctive filter: fq=id:(1 2 3
4 5....). I have two suggestions.

I. Don't do that. It has the following drawbacks:
* parsing such a long query takes too long. As a result, the search
costs O(query-len) instead of O(numFound) (without scoring/sorting)
* it hits the BooleanQuery.TooManyClauses exception
* this huge BooleanQuery is used as a key in the Solr filter cache,
which is bad: even a cache hit costs you O(n^2) due to the
straightforward equals().

The proper solution is to bring your first search stage, which gives
you the list of ids, into Solr. Suppose you have some kind of external
index that maps a short key, e.g. a category id, to a set of ids. You
need to index that category field in Solr and send a short filter
query, fq=catId:666, instead of the huge one.
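
To make the size difference concrete, here is a minimal Java sketch
that builds both filter strings. The field names id and catId follow
the examples above; the class and method names are hypothetical and
the ids are made up for illustration.

```java
import java.util.StringJoiner;

public class FilterQuerySketch {

    // Builds the huge disjunctive filter, e.g. "id:(1 2 3)".
    static String idFilter(int[] ids) {
        StringJoiner joiner = new StringJoiner(" ", "id:(", ")");
        for (int id : ids) {
            joiner.add(Integer.toString(id));
        }
        return joiner.toString();
    }

    // Builds the short filter on an indexed category field, e.g. "catId:666".
    // Assumes the category field has already been indexed in Solr.
    static String categoryFilter(int categoryId) {
        return "catId:" + categoryId;
    }

    public static void main(String[] args) {
        // Made-up ids 1..10000, only to show how long the filter gets.
        int[] ids = new int[10000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i + 1;
        }
        System.out.println("short filter: fq=" + categoryFilter(666));
        System.out.println("huge filter length: " + idFilter(ids).length() + " chars");
    }
}
```

The short filter stays the same size no matter how many documents
belong to the category, so it is cheap to parse and cheap to compare
as a cache key.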

II. Some time ago I dealt with this challenge, though not with the
query-parsing part. The proper approach is to implement your own
org.apache.lucene.search.MultiTermQuery and back it with a list of
sorted ids encoded as vints. That gives you a fast equals(). Then
you'll need to implement your own query parser that decodes that vint
vector, and your app has to form a properly encoded filter query. But
the length is still limited by the URL length; see approach I.
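
The encoding side of this can be sketched in plain Java without any
Lucene dependency. This is only an illustration under assumptions:
ids are non-negative ints, the sorted ids are stored as deltas (my
addition, to keep the vints short), and the bytes are made URL-safe
with Base64. The class and method names are hypothetical; the custom
MultiTermQuery and query parser are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class VIntIdList {

    // Sorts the ids and writes each delta to the previous id as a vint:
    // 7 data bits per byte, high bit set when more bytes follow.
    // Assumes non-negative ids.
    static byte[] encode(int[] ids) {
        int[] sorted = ids.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : sorted) {
            int delta = id - prev;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = id;
        }
        return out.toByteArray();
    }

    // Reverses encode(): reads vint deltas and rebuilds the sorted ids.
    static int[] decode(byte[] bytes) {
        List<Integer> ids = new ArrayList<>();
        int pos = 0;
        int prev = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            ids.add(prev);
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    // URL-safe text form that the app could send as the filter value.
    static String toUrlParam(byte[] encoded) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(encoded);
    }

    // What a custom query parser would do first with the received value.
    static byte[] fromUrlParam(String param) {
        return Base64.getUrlDecoder().decode(param);
    }
}
```

With the ids held as one byte array, equals() on the query can be a
single Arrays.equals() over the bytes, which is linear in the encoded
length.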

Regards


On Fri, Jan 6, 2012 at 9:55 AM, solr_noob <diversau...@gmail.com> wrote:
> Hello,
>
> I'm new to SOLR. I am facing the same set of problem to solve. The idea is
> to search for key phrase(s) within a set of documents. I understand the
> query syntax somewhat. What if the list of document ids to search gets to
> about say, 10000 documents? what is the best way to craft the query?
>
> so it would be,in relational DB
>
>    SELECT * FROM documents WHERE query ='search term' and document_id in
> [.............];
>
> Thanks :)
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-tp502245p3637150.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics
