Depending on how much data you're pulling back, 2 hours might be a reasonable 
amount of time.  Of course, if it ran a lot faster with Endeca & Forge, I can 
understand your questioning this.  Keep in mind that the way you're setting it 
up, it will build each cache one at a time.  I'm pretty sure Forge does them 
serially like this too, unless you use complicated tricks to work around it.  
With DIH, however, there is a way to build your caches in parallel: set up 
multiple DIH handlers that first build the caches, then a final handler that 
indexes the pre-cached data.  You need DIHCacheWriter and DIHCacheProcessor 
from SOLR-2943.
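
Untested, and the exact parameter names may vary with which version of the 
SOLR-2943 patch you apply, but the solrconfig.xml side of it would look 
roughly like this (handler names and config filenames are mine, purely for 
illustration):

<!-- One handler per cache; each config uses DIHCacheWriter to persist
     its entity to a disk-backed cache instead of indexing it. -->
<requestHandler name="/dih-cache-categories"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-cache-categories.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dih-cache-products"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-cache-products.xml</str>
  </lst>
</requestHandler>

<!-- Final handler: its config reads the pre-built caches back with
     DIHCacheProcessor and writes the joined documents to the index. -->
<requestHandler name="/dih-index"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-index.xml</str>
  </lst>
</requestHandler>

You'd kick off all the cache-building handlers at the same time (each import 
runs in its own thread), wait for them to finish, then call the final handler.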

The default for berkleyInternalCacheSize is 2% of your JVM heap.  You might 
get better performance by increasing this, but then again you might find that 
2% of heap is already more than enough and you should make it smaller to 
conserve memory.  This parameter takes bytes, so use 100000 for 100k, etc.
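
It goes on the cached entity in your data-config.xml, something like this 
(the entity details are placeholders for whatever you have now; 104857600 
bytes = 100mb):

<entity name="CATEGORIES"
        processor="SqlEntityProcessor"
        query="select * from CATEGORIES"
        cacheImpl="BerkleyBackedCache"
        berkleyInternalCacheSize="104857600"
        cacheKey="PRODUCT_ID"
        cacheLookup="product.ID" />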

I think the file size is hardcoded to 1gb, so if you're getting 9 files, it 
means your query is pulling back more than 8gb of data. Sound right?

To get the "defaultRowPrefetch", try putting this in the <defaults /> section 
under <requestHandler name="/dataimport" ... /> in solrconfig.xml.  Based on a 
quick review of the code, it seems that it will only honor jdbc parameters if 
they are in "defaults".

Also keep in mind that Lucene/Solr handle updates really well, and with the 
size of your data you will likely want to use delta updates rather than 
re-indexing everything each time.  If so, then perhaps the total time to pull 
back everything won't matter quite as much?  To implement delta updates with 
DIH in your case, I'd recommend the approach outlined here: 
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ... (you 
can still use bdb-je for the caches if that still makes sense, depending on 
how big the deltas are)
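
The gist of that wiki page is to parameterize your main query on the last 
index time and always run full-import, using clean=false for deltas.  Roughly 
like this (table and column names are placeholders; on Oracle you'd likely 
need a to_date() around the timestamp):

<entity name="product"
        query="select * from PRODUCTS
               where '${dataimporter.request.clean}' != 'false'
                  or LAST_MODIFIED &gt;
                     to_date('${dataimporter.last_index_time}',
                             'YYYY-MM-DD HH24:MI:SS')">

A full rebuild is then command=full-import&clean=true, and a delta is 
command=full-import&clean=false.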

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: mroosendaal [mailto:mroosend...@yahoo.com] 
Sent: Thursday, November 15, 2012 8:52 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH nested entities don't work

Hi James,

Just gave it a go and it worked! That's the good news. The problem now is
getting it to work faster. It took over 2 hours just to index 4 views, and I
need to get information from 26.

I tried adding defaultRowPrefetch="20000" as a JDBC parameter, but DIH does
not seem to honour it. It should work because it is part of the Oracle JDBC
driver, but there's no mention of it in the Solr documentation.

Would it also help to increase the berkleyInternalCacheSize? For
'CATEGORIES' it creates 9 'files'.

Thanks,
Maarten




