Todd,

With the DIH request, are you specifying "cacheDeletePriorData=false"? Looking at the BerkleyBackedCache code, if this is set to true, it deletes the cache and assumes the current update is meant to fully repopulate it. If you want to do an incremental update to the cache, it needs to be false. You might also need to specify "clean=false", but I'm not sure whether that is a requirement.
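For illustration, here is a minimal sketch of building an incremental cache-update request. It assumes the parameters are passed on the DIH request URL, as the question above implies. Whether `cacheDeletePriorData` is actually read from the request, or has to be set as an entity attribute in data-config.xml (possibly as `${dataimporter.request.cacheDeletePriorData}`), should be checked against the BerkleyBackedCache source. The host, core name (`mycore`) and handler path (`/dataimport-cache1`) are hypothetical placeholders. `command=full-import` and `clean` are standard DIH request parameters.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical endpoint: replace host, core and handler path with your own.
SOLR_HOST = "localhost:8983"
HANDLER_PATH = "/solr/mycore/dataimport-cache1"


def build_incremental_cache_url(host: str, path: str) -> str:
    """Build a DIH request URL intended to add to an existing cache, not rebuild it.

    Assumption: the cache reads cacheDeletePriorData from the request
    (for example via ${dataimporter.request.cacheDeletePriorData}).
    """
    params = {
        "command": "full-import",         # standard DIH command
        "clean": "false",                 # do not clean existing data before importing
        "cacheDeletePriorData": "false",  # keep previously cached data (incremental update)
    }
    return urlunsplit(("http", host, path, urlencode(params), ""))


if __name__ == "__main__":
    # prints http://localhost:8983/solr/mycore/dataimport-cache1?command=full-import&clean=false&cacheDeletePriorData=false
    print(build_incremental_cache_url(SOLR_HOST, HANDLER_PATH))
```

Setting both flags to false reflects the advice above: `cacheDeletePriorData=false` is the one that keeps the existing cache, and `clean=false` may or may not also be required.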
I've used DIH with BerkleyBackedCache for a few years and it works well for us. But rather than using it inline, we have a number of DIH handlers that only build caches. When they're all built, a final DIH handler joins the data from the caches and indexes it to Solr. We also do what you're doing, with several handlers running at once, each processing part of the data.

I have to warn you, though, that this code hasn't been maintained by anyone. I'm using an older DIH jar (4.6) with a newer Solr. I think there may have been an API change or something similar that prevented the uncommitted caching code from working with newer versions, but I honestly forget. This is probably a viable solution if you don't want to write any code, but it might take some trial and error to get it working.

James Dyer
Ingram Content Group

-----Original Message-----
From: Todd Long [mailto:lon...@gmail.com]
Sent: Tuesday, November 17, 2015 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH Caching w/ BerkleyBackedCache

Mikhail Khludnev wrote
> It's worth to mention that for really complex relations scheme it might be
> challenging to organize all of them into parallel ordered streams.

This will most likely be the issue for us, which is why I would like to have the Berkley cache solution to fall back on, if possible. Again, I'm not sure why, but it appears that the Berkley cache is overwriting itself (i.e. cleaning up unused data) when building the database. I've read plenty of other threads where folks appear to be having success with that caching solution.

Mikhail Khludnev wrote
> threads... you said? Which ones? Declarative parallelization in
> EntityProcessor worked only with certain 3.x version.

We are running multiple DIH instances, each of which queries a specific partition of the data (i.e. by the mod of the document id we're indexing).
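To make the cache-then-join layout and the partitioning by document id mod N more concrete, here is a toy in-memory model. It is not BerkleyBackedCache's code: `ToyCache`, `build_partition` and `join_and_index` are hypothetical names. It also assumes that the partition handlers write into one shared cache and run one after another. The model shows one plausible way a "cache overwriting itself" symptom could arise if each partition handler opened the cache with delete-prior-data enabled. It does not diagnose the real setup.

```python
from typing import Dict, List


class ToyCache:
    """In-memory stand-in for a persistent DIH cache (hypothetical, NOT BerkleyBackedCache)."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}

    def open(self, delete_prior_data: bool) -> None:
        # Models the behaviour described above: when delete_prior_data is True,
        # existing contents are discarded because a full repopulation is assumed.
        if delete_prior_data:
            self.rows.clear()

    def add(self, row: dict) -> None:
        self.rows[row["id"]] = row


def build_partition(cache: ToyCache, source: List[dict], part: int,
                    n_parts: int, delete_prior_data: bool) -> None:
    """One cache-building 'handler' that caches only rows whose id mod n_parts == part."""
    cache.open(delete_prior_data)
    for row in source:
        if row["id"] % n_parts == part:
            cache.add(row)


def join_and_index(main: List[dict], cache: ToyCache) -> List[dict]:
    """Final step: join each main row with its cached child fields to form documents."""
    docs = []
    for row in main:
        child = cache.rows.get(row["id"], {})
        docs.append({**row, **{k: v for k, v in child.items() if k != "id"}})
    return docs


if __name__ == "__main__":
    main_rows = [{"id": i, "title": f"doc{i}"} for i in range(6)]
    child_rows = [{"id": i, "price": i * 10} for i in range(6)]

    for delete_prior in (True, False):
        cache = ToyCache()
        for part in range(3):  # three partition handlers, run sequentially here
            build_partition(cache, child_rows, part, 3, delete_prior)
        docs = join_and_index(main_rows, cache)
        joined = sum(1 for d in docs if "price" in d)
        # True -> "2 of 6 docs joined" (only the last partition survives)
        # False -> "6 of 6 docs joined"
        print(f"delete_prior_data={delete_prior}: {joined} of {len(docs)} docs joined")
```

With delete-prior-data on, each partition wipes what the previous one cached, so only the last partition's rows (ids 2 and 5) reach the join. With it off, all six documents get their child data.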