On 3/14/2012 12:58 PM, KeesSchepers wrote:
1. I wipe the reindex core
2. I run the DIH against the complete dataset (4 million documents) in
pieces of 20,000 records (to prevent very long MySQL locks)
3. After the DIH is finished (2 hours) we also have to update the
rebuild core with the changes from the last two hours; this is the problem
4. After updating is done and the core is no more than a few seconds
behind, we want to SWAP the cores.
Everything goes well except for step 3. The rebuild and the core swap are
both okay.
Because the website is undergoing changes every minute we cannot pause the
delta-import on the live core and fall behind for 2 hours. The problem is
that I can't figure out a watertight approach that does not delay the live
core too long and still uses the DIH instead of writing a lot of code.
I solve this problem by tracking the current position outside of the
database and Solr, in my build system. The primary key on the MySQL
table is an autoincrement BIGINT field - I just keep track of the last
value that was added. When a rebuild happens, I continue to track it
for the live cores, then when it comes time to swap, I restore the
tracked value to what it was when the rebuild started, so the next
update on the newly swapped-in cores picks up everything that was added
during the rebuild. You can put arbitrary parameters on your DIH URL
and have DIH construct your query with them.
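As a rough illustration of the idea above, here is a minimal sketch in
Python. It is not the actual build system; the class, the function, the
parameter names minId/maxId, the core name and the URL are all
hypothetical. It assumes the DIH config references the URL parameters
as ${dataimporter.request.minId} and ${dataimporter.request.maxId} in
its SQL query, and it only covers newly added rows.

```python
from urllib.parse import urlencode


class PositionTracker:
    """Hypothetical helper: remembers the last primary-key value indexed."""

    def __init__(self, last_id=0):
        self.last_id = last_id          # position of the live cores
        self.rebuild_start_id = None    # position saved when a rebuild begins

    def start_rebuild(self):
        # Remember where the live cores were when the rebuild started.
        self.rebuild_start_id = self.last_id

    def record_update(self, new_last_id):
        # Keep tracking the live cores while the rebuild is running.
        self.last_id = max(self.last_id, new_last_id)

    def finish_rebuild(self):
        # Call at swap time: rewind to the rebuild-start position so the
        # next update replays the rows added during the rebuild.
        if self.rebuild_start_id is None:
            raise RuntimeError("no rebuild in progress")
        self.last_id = self.rebuild_start_id
        self.rebuild_start_id = None


def dih_url(base_url, min_id, max_id):
    # minId/maxId are assumed names; DIH exposes any request parameter
    # to the config as ${dataimporter.request.<name>}.
    params = {
        "command": "full-import",
        "clean": "false",   # do not wipe the index for a partial import
        "commit": "true",
        "minId": min_id,
        "maxId": max_id,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    tracker = PositionTracker(last_id=100)
    tracker.start_rebuild()
    tracker.record_update(250)      # live cores keep moving during the rebuild
    newest = tracker.last_id
    tracker.finish_rebuild()        # swap time: rewind to the saved position
    print(dih_url("http://localhost:8983/solr/live/dataimport",
                  tracker.last_id, newest))
```

With a query along the lines of "WHERE id > ${dataimporter.request.minId}
AND id <= ${dataimporter.request.maxId}" (table and column names being
whatever your schema uses), the same handler can serve both the chunked
rebuild and the catch-up import after the swap.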
I used to have a build system written in Perl that did *all* index
activity with DIH. Now I have a Java build system that only uses DIH
for full index rebuilds. It uses SolrJ for everything else.
Thanks,
Shawn