Depending on how much data you're pulling back, 2 hours might be a reasonable amount of time. Of course, if it ran a lot faster with Endeca & Forge, I can understand your questioning this.

Keep in mind that the way you've set this up, it will build each cache one at a time. I'm pretty sure Forge does them serially like this too, unless you use complicated tricks to work around it. Likewise for DIH, there is a way to build your caches in parallel: set up multiple DIH handlers that first build the caches, then a final handler that indexes the pre-cached data. You need DIHCacheWriter and DIHCacheProcessor from SOLR-2943. A sketch of the handler setup follows.
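Here is a rough, untested sketch of the solrconfig.xml side of that setup. The handler names and config file names are made up, and exactly how each config wires in DIHCacheWriter/DIHCacheProcessor depends on which version of the SOLR-2943 patch you apply:

  <!-- One handler per cache; kick these off concurrently. Each one's
       data-config writes its cache with DIHCacheWriter instead of
       indexing. -->
  <requestHandler name="/dih-cache-categories"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">cache-categories-config.xml</str>
    </lst>
  </requestHandler>
  <requestHandler name="/dih-cache-products"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">cache-products-config.xml</str>
    </lst>
  </requestHandler>

  <!-- Final handler: its child entities read the pre-built caches via
       DIHCacheProcessor and index as usual. -->
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>

Kick off each /dih-cache-* handler at the same time, wait for all of them to finish, then run the final /dataimport full-import.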
The default for berkleyInternalCacheSize is 2% of your JVM heap. You might get better performance by increasing this, but you might also find that 2% of heap is already plenty and that you should make it smaller to conserve memory. The parameter takes bytes, so use 100000 for 100k, etc. (see sketch 1 at the bottom of this message). I think the cache file size is hardcoded to 1 GB, so if you're getting 9 files, it means your query is pulling back more than 8 GB of data. Sound right?

To get "defaultRowPrefetch" honored, try putting it in the <defaults /> section under <requestHandler name="/dataimport" ... /> in solrconfig.xml (sketch 2 below). Based on a quick review of the code, it seems DIH only honors JDBC parameters if they are in "defaults".

Also keep in mind that Lucene/Solr handle updates really well, and with the size of your data you'll likely want to use delta updates rather than re-indexing everything all the time. If so, perhaps the total time to pull everything back won't matter quite as much? To implement delta updates with DIH in your case, I'd recommend the approach outlined here (sketch 3 below):

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

You can still use bdb-je for the caches if that still makes sense, depending on how big the deltas are.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: mroosendaal [mailto:mroosend...@yahoo.com]
Sent: Thursday, November 15, 2012 8:52 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH nested entities don't work

Hi James,

Just gave it a go and it worked! That's the good news. The problem now is getting it to work faster: it took over 2 hours just to index 4 views, and I need to get information from 26. I tried adding defaultRowPrefetch="20000" as a JDBC parameter, but it does not seem to honour that. It should work, because it is part of the Oracle JDBC driver, but there's no mention of it in the Solr documentation.

Would it also help to increase the berkleyInternalCacheSize? For 'CATEGORIES' it creates 9 'files'.

Thanks,
Maarten
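Sketch 1: setting berkleyInternalCacheSize on a cached entity in data-config.xml. This assumes the BerkleyBackedCache patch you're already using; the entity, query, and key names below are placeholders, not your schema:

  <!-- 100000000 bytes = roughly 100 MB for bdb-je's internal cache;
       leave the attribute off to get the default of 2% of heap. -->
  <entity name="categories"
          query="SELECT * FROM CATEGORIES"
          cacheImpl="BerkleyBackedCache"
          cacheKey="PRODUCT_ID"
          cacheLookup="product.PRODUCT_ID"
          berkleyInternalCacheSize="100000000"/>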
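Sketch 2: passing defaultRowPrefetch through the handler's "defaults" in solrconfig.xml. The syntax below is standard Solr config; whether the parameter actually reaches the Oracle driver this way is based on my quick code review above, so treat it as untested:

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <!-- Oracle JDBC driver property: rows fetched per round trip -->
      <str name="defaultRowPrefetch">20000</str>
    </lst>
  </requestHandler>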
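Sketch 3: the delta-via-full-import trick from the wiki page linked above, adapted for Oracle. LAST_MODIFIED is a placeholder for whatever change-tracking column your views have; the TO_DATE format matches the "yyyy-MM-dd HH:mm:ss" format DIH uses when it substitutes last_index_time:

  <entity name="product" pk="PRODUCT_ID"
          query="SELECT * FROM PRODUCTS
                 WHERE '${dataimporter.request.clean}' != 'false'
                 OR LAST_MODIFIED &gt;
                    TO_DATE('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"/>

A full rebuild is then command=full-import (clean defaults to true), and a delta is command=full-import&clean=false, which only pulls rows changed since the last run.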