If your index is read-only in production, could you store a mapping from unique_id to Lucene docId in your kv store and build the filters externally? That would make the uniqueKey unnecessary in your production index, since you would be working at the Lucene docId level.
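A minimal sketch of that idea follows. It assumes the production index has been optimized and is never merged again, so Lucene docIds stay fixed. A plain `HashMap` stands in for the kv store, and the array `storedIds` stands in for the stored unique-ID field read from each document; with Lucene you would read that field per docId. The class and method names are hypothetical. The "filter" is just a `java.util.BitSet` with one bit per docId.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExternalDocIdFilter {

    // Stand-in for the kv store: unique_id -> Lucene docId.
    // Only valid while the index is read-only (no merges renumber docIds).
    private final Map<String, Integer> idToDocId = new HashMap<>();
    private final int maxDoc;

    // Built once after optimize, before the index is mounted in production.
    // storedIds[docId] is a hypothetical stand-in for the stored unique-ID field.
    public ExternalDocIdFilter(String[] storedIds) {
        this.maxDoc = storedIds.length;
        for (int docId = 0; docId < storedIds.length; docId++) {
            idToDocId.put(storedIds[docId], docId);
        }
    }

    // Translate an external list of unique IDs into a docId bitset.
    // IDs that are not in this index are skipped.
    public BitSet buildFilter(List<String> uniqueIds) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : uniqueIds) {
            Integer docId = idToDocId.get(id);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] storedIds = {"mdp.001", "mdp.002", "uc1.003", "uc1.004"};
        ExternalDocIdFilter f = new ExternalDocIdFilter(storedIds);
        BitSet filter = f.buildFilter(List.of("uc1.003", "mdp.001", "missing"));
        System.out.println(filter); // {0, 2}
    }
}
```

Because the mapping is keyed by docId, it has to be rebuilt every time the index is re-optimized and redeployed, which is where the large volume of kv-store updates comes from.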
That way, you move the problem offline, into the update/optimize phase. The ugly part is the large number of updates to your kv store... I am not really familiar with Solr, but working directly with Lucene this is doable, even with a parallel index that holds the unique ID as a stored field alongside another index with the indexed fields on the update master, and then only the index with the indexed fields in production.

On Fri, Oct 15, 2010 at 8:59 PM, Burton-West, Tom <tburt...@umich.edu> wrote:

> Hi Jonathan,
>
> The advantages of the obvious approach you outline are that it is simple,
> it fits into the existing Solr model, it doesn't require any customization
> or modification to Solr/Lucene Java code. Unfortunately, it does not scale
> well. We originally tried just what you suggest for our implementation of
> Collection Builder. For a user's personal collection we had a table that
> maps the collection id to the unique Solr ids.
> Then when they wanted to search their collection, we just took their search
> and added a filter query with the fq=(id:1 OR id:2 OR....). I seem to
> remember running into a limit on the number of OR clauses allowed. Even if
> you can set that limit larger, there are a number of efficiency issues.
>
> We ended up constructing a separate Solr index where we have a multi-valued
> collection number field. Unfortunately, until incremental field updating
> gets implemented, this means that every time someone adds a document to a
> collection, the entire document (including 700KB of OCR) needs to be
> re-indexed just to update the collection number field. This approach has
> allowed us to scale up to a total of something under 100,000 documents, but
> we don't think we can scale it much beyond that for various reasons.
>
> I was actually thinking of some kind of custom Lucene/Solr component that
> would for example take a query parameter such as &lookitUp=123 and the
> component might do a JDBC query against a database or kv store and return
> results in some form that would be efficient for Solr/Lucene to process. (Of
> course this assumes that a JDBC query would be more efficient than just
> sending a long list of ids to Solr.) The other part of the equation is
> mapping the unique Solr ids to internal Lucene ids in order to implement a
> filter query. I was wondering if something like the unique id to Lucene id
> mapper in zoie might be useful or if that is too specific to zoie. This
> may be totally off-base, since I haven't looked at the zoie code at all yet.
>
> In our particular use case, we might be able to build some kind of
> in-memory map after we optimize an index and before we mount it in
> production. In our workflow, we update the index and optimize it before we
> release it, and once it is released to production there is no
> indexing/merging taking place on the production index (so the internal
> Lucene ids don't change).
>
> Tom
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Friday, October 15, 2010 1:07 PM
> To: solr-user@lucene.apache.org
> Subject: RE: filter query from external list of Solr unique IDs
>
> Definitely interested in this.
>
> The naive obvious approach would be just putting all the IDs in the query.
> Like fq=(id:1 OR id:2 OR....). Or making it another clause in the 'q'.
>
> Can you outline what's wrong with this approach, to make it more clear
> what's needed in a solution?
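For comparison, the naive approach discussed in the thread can be sketched as below: one `id:` clause per ID, OR'd together into a single `fq`, with the clause count checked against Solr's `maxBooleanClauses` setting. The stock `solrconfig.xml` sets that to 1024, but treat the constant here as an assumption and check your own config. The class name is hypothetical, and the code only builds and URL-encodes the parameter; it does not talk to Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

public class NaiveIdFilterQuery {

    // Assumed limit: the default <maxBooleanClauses> in the stock solrconfig.xml.
    static final int MAX_BOOLEAN_CLAUSES = 1024;

    // Builds (id:"a" OR id:"b" ...). Each ID is quoted so characters such as
    // ':' are not parsed as query syntax; embedded quotes/backslashes are escaped.
    static String buildFq(String field, List<String> ids) {
        if (ids.size() > MAX_BOOLEAN_CLAUSES) {
            throw new IllegalArgumentException(
                ids.size() + " clauses exceeds maxBooleanClauses=" + MAX_BOOLEAN_CLAUSES);
        }
        StringJoiner joiner = new StringJoiner(" OR ", "(", ")");
        for (String id : ids) {
            String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
            joiner.add(field + ":\"" + escaped + "\"");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String fq = buildFq("id", List.of("1", "2", "3"));
        System.out.println(fq); // (id:"1" OR id:"2" OR id:"3")
        // URL-encode before appending to a request such as /select?q=...&fq=...
        System.out.println("fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8));
    }
}
```

Every request still re-sends and re-parses the whole ID list, so the cost grows with the size of each user's collection, independent of the clause limit.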