On 8/7/2015 11:48 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 8/7/2015 8:56 AM, Davis, Daniel (NIH/NLM) [C] wrote:
> > ... snip...
> > Each document has an id I wish to use as the unique ID, but I also want to
> > compute a signature. Is there some way I can use an
> > updateRequestProcessorChain to throw away a document if its signature and
> > document id match based on real-time get?
>
> My main Solr indexes are each generated from a MySQL database. One contains
> over 100 million rows, another over 200 million.
> A third contains about 18 million. Here's how we handle the requirement you
> asked about:
>
> The main table has a delete id column that is its primary key. This is an
> autoincrement column. There is another unique index
> on another column in that table, which is the canonical unique identifier,
> used as Solr's uniqueKey.
>
> The main table also has triggers for DELETE and UPDATE which add records to
> the idx_delete table (contains delete id values)
> and idx_reinsert table (contains unique key values). These extra tables each
> have a primary key on an autoincrement column.
> The build program (written in Java using SolrJ) tracks three values for every
> update -- the last did value in the main table, and
> the last id value in idx_delete and idx_reinsert.

Thanks, Shawn - this is a better solution, and I've used something similar with PostgreSQL in the past. I don't control the schema, but I can make the suggestion.
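
For anyone following along, here is a rough sketch of the trigger arrangement as I understand it from the description above. It is not Shawn's actual schema or build program: it uses Python with an in-memory SQLite database standing in for MySQL, and the names `main`, `doc_key`, `body`, and `changes_since` are my own placeholders. Only `did`, `idx_delete`, and `idx_reinsert` come from the message.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; trigger syntax differs slightly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main (
    did     INTEGER PRIMARY KEY AUTOINCREMENT,  -- delete id
    doc_key TEXT NOT NULL UNIQUE,               -- assumed name; Solr uniqueKey
    body    TEXT
);
CREATE TABLE idx_delete (
    id  INTEGER PRIMARY KEY AUTOINCREMENT,
    did INTEGER NOT NULL                        -- delete id of the removed row
);
CREATE TABLE idx_reinsert (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    doc_key TEXT NOT NULL                       -- unique key of the changed row
);
CREATE TRIGGER main_after_delete AFTER DELETE ON main
BEGIN
    INSERT INTO idx_delete (did) VALUES (OLD.did);
END;
CREATE TRIGGER main_after_update AFTER UPDATE ON main
BEGIN
    INSERT INTO idx_reinsert (doc_key) VALUES (NEW.doc_key);
END;
""")


def changes_since(conn, last_did, last_delete_id, last_reinsert_id):
    """Return (new_rows, deletes, reinserts) past the three tracked positions."""
    new_rows = conn.execute(
        "SELECT did, doc_key FROM main WHERE did > ? ORDER BY did",
        (last_did,),
    ).fetchall()
    deletes = conn.execute(
        "SELECT id, did FROM idx_delete WHERE id > ? ORDER BY id",
        (last_delete_id,),
    ).fetchall()
    reinserts = conn.execute(
        "SELECT id, doc_key FROM idx_reinsert WHERE id > ? ORDER BY id",
        (last_reinsert_id,),
    ).fetchall()
    return new_rows, deletes, reinserts


# First pass: three fresh rows, nothing deleted or updated yet.
conn.executemany(
    "INSERT INTO main (doc_key, body) VALUES (?, ?)",
    [("a", "one"), ("b", "two"), ("c", "three")],
)
first = changes_since(conn, 0, 0, 0)

# Second pass: one update, one delete, one new row since the last pass.
conn.execute("UPDATE main SET body = 'two, revised' WHERE doc_key = 'b'")
conn.execute("DELETE FROM main WHERE doc_key = 'c'")
conn.execute("INSERT INTO main (doc_key, body) VALUES ('d', 'four')")
second = changes_since(conn, 3, 0, 0)
```

In a real build the three result lists would presumably be turned into deletes and adds against Solr, after which the three positions are saved for the next run. That part is left out because I don't know the details of how the build program does it.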