On 12/5/2011 6:57 PM, Jamie Johnson wrote:
Question which is a bit off topic.  You mention your algorithm for
sharding, how do you handle updates or do you not have to deal with
that in your scenario?

I have a long-running program based on SolrJ that handles updates. Once a minute, it runs through an update cycle, which consists of deletes, document reinserts, and inserts of new content. The data is pulled from a MySQL database, with the sharding algorithm specified as part of the MySQL query. I keep track of which shards actually received changes, so that I do not do unnecessary commits.
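
As a rough illustration of the "commit only the shards that changed" idea described above, here is a minimal sketch in plain Java. It is not the actual program: the Shard interface, the Changes holder, and RecordingShard are hypothetical stand-ins (a real version would wrap a SolrJ client per shard and fill Changes from the database query), and documents are reduced to strings to keep it self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class UpdateCycle {
    /** Hypothetical stand-in for one shard's Solr client. */
    interface Shard {
        void deleteById(List<String> ids);
        void add(List<String> docs);
        void commit();
    }

    /** Hypothetical holder for the changes found for one shard in one pass. */
    static class Changes {
        final List<String> deletes = new ArrayList<>();
        final List<String> reinserts = new ArrayList<>();
        final List<String> inserts = new ArrayList<>();

        boolean isEmpty() {
            return deletes.isEmpty() && reinserts.isEmpty() && inserts.isEmpty();
        }
    }

    /** In-memory shard that only counts calls, so the sketch can run without Solr. */
    static class RecordingShard implements Shard {
        int deleted, added, commits;
        public void deleteById(List<String> ids) { deleted += ids.size(); }
        public void add(List<String> docs) { added += docs.size(); }
        public void commit() { commits++; }
    }

    /**
     * Applies pending.get(i) to shards.get(i), then commits only the shards
     * that received at least one change. Returns the indices that were committed.
     */
    static Set<Integer> runCycle(List<? extends Shard> shards, List<Changes> pending) {
        Set<Integer> dirty = new TreeSet<>();
        for (int i = 0; i < shards.size(); i++) {
            Changes c = pending.get(i);
            if (c.isEmpty()) {
                continue; // nothing changed on this shard, so no commit later
            }
            Shard s = shards.get(i);
            if (!c.deletes.isEmpty()) s.deleteById(c.deletes);
            if (!c.reinserts.isEmpty()) s.add(c.reinserts);
            if (!c.inserts.isEmpty()) s.add(c.inserts);
            dirty.add(i);
        }
        for (int i : dirty) {
            shards.get(i).commit();
        }
        return dirty;
    }
}
```

In a real program, runCycle would be called once a minute (for example from a scheduled executor), with the pending changes built from the per-shard database queries.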

For a full reindex, the build program starts a separate thread, which runs the dataimporter on a set of build cores and then swaps them with the live cores. The sharding algorithm is in the SQL entity in dih-config.conf, with parameters passed in via the URL.
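
To make the "parameters via the URL, then swap" flow more concrete, here is a small sketch that only builds the two request URLs. Several things are assumptions rather than details of my setup: the handler path /dataimport (it is whatever solrconfig.xml registers), the core names, and the parameter names numShards and shardId, which are made up for illustration. Extra request parameters like these can be referenced inside the DIH SQL entity as ${dataimporter.request.numShards} and so on.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

class RebuildUrls {
    /** URL-encodes one value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /**
     * Builds a DIH full-import URL for a build core. Each entry in params is
     * appended as a request parameter for the DIH config to pick up.
     * The "/dataimport" handler path is an assumption.
     */
    static String fullImportUrl(String solrBase, String buildCore, Map<String, String> params) {
        StringBuilder url = new StringBuilder(solrBase)
                .append('/').append(buildCore)
                .append("/dataimport?command=full-import");
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append('&').append(enc(p.getKey())).append('=').append(enc(p.getValue()));
        }
        return url.toString();
    }

    /** Builds the CoreAdmin SWAP URL that exchanges a live core with its build core. */
    static String swapUrl(String solrBase, String liveCore, String buildCore) {
        return solrBase + "/admin/cores?action=SWAP&core=" + enc(liveCore)
                + "&other=" + enc(buildCore);
    }
}
```

The build thread would request the first URL for each build core, poll the import status until it finishes, and only then request the swap URL.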

Thanks,
Shawn
