Todd,

With the DIH request, are you specifying "cacheDeletePriorData=false"?  Looking
at the BerkleyBackedCache code, if this is set to true, it deletes the cache and
assumes the current update will fully repopulate it.  If you want to do an
incremental update to the cache, it needs to be false.  You might also need to
specify "clean=false", but I'm not sure whether that is required.
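
For illustration, here is a minimal sketch of how such a request URL could be
put together.  The host, core name ("mycore"), handler path ("/dataimport"),
and the choice of "command=full-import" are assumptions made for the example;
only the "cacheDeletePriorData" and "clean" parameters come from the
discussion above.

```python
from urllib.parse import urlencode

# Hypothetical host, core name, and handler path; adjust to your own setup.
SOLR_BASE = "http://localhost:8983/solr/mycore"
HANDLER = "/dataimport"


def build_import_url(base=SOLR_BASE, handler=HANDLER):
    """Build a DIH request URL that asks for the existing cache to be kept."""
    params = {
        "command": "full-import",          # assumed command for this example
        "clean": "false",                  # may or may not be required
        "cacheDeletePriorData": "false",   # keep prior cache data
    }
    return f"{base}{handler}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_import_url())
```

Running it only prints the URL; sending the request is left out so the
parameters can be checked before anything touches the cache.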

I've used DIH with BerkleyBackedCache for a few years and it works well for us.
But rather than using it inline, we have a number of DIH handlers that just
build caches; once they're all built, a final DIH handler joins data from the
caches and indexes it into Solr.  Like you, we also run several handlers at
once, each handling part of the data.

But I have to warn you that nobody has maintained this code.  I'm using
an older DIH jar (4.6) with a newer Solr.  I think there might have been an API
change or something that prevented the uncommitted caching code from working
with newer versions, but I honestly forget.  This is probably a viable solution
if you don't want to write any code, but it might take some trial and error
to get it working.

James Dyer
Ingram Content Group


-----Original Message-----
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, November 17, 2015 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH Caching w/ BerkleyBackedCache

Mikhail Khludnev wrote
> It's worth mentioning that for a really complex relations scheme it might be
> challenging to organize all of them into parallel ordered streams.

This will most likely be the issue for us, which is why I would like to have
the Berkley cache solution to fall back on, if possible. Again, I'm not sure
why, but it appears that the Berkley cache is overwriting itself (i.e.
cleaning up unused data) when building the database... I've read plenty of
other threads where it appears folks are having success using that caching
solution.


Mikhail Khludnev wrote
> threads... you said? Which ones? Declarative parallelization in
> EntityProcessor worked only with certain 3.x version.

We are running multiple DIH instances which query against specific
partitions of the data (i.e. mod of the document id we're indexing).
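
As a rough illustration of the mod-based partitioning described above, the
sketch below assigns each document id to a partition.  The function names, the
number of partitions, and the assumption that ids are integers are all
hypothetical; the actual DIH queries are not shown in this thread.

```python
def partition_of(doc_id: int, num_partitions: int) -> int:
    """Return the partition index for a document id (id mod partition count)."""
    return doc_id % num_partitions


def split_ids(doc_ids, num_partitions):
    """Group document ids into one bucket per partition."""
    buckets = [[] for _ in range(num_partitions)]
    for doc_id in doc_ids:
        buckets[partition_of(doc_id, num_partitions)].append(doc_id)
    return buckets


if __name__ == "__main__":
    # Each bucket would be handled by its own DIH instance.
    for index, bucket in enumerate(split_ids(range(10), 3)):
        print(index, bucket)
```

Every id lands in exactly one bucket, so no two instances process the same
document.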



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4240562.html
Sent from the Solr - User mailing list archive at Nabble.com.
