Here are things I would try:
- You need to package the patch from SOLR-2943 in your jar as well as SOLR-2613 
(to get the class DIHCachePersistCacheProperties)

- You need to specify "cacheImpl", not "persistCacheImpl"

- You are correct to use "persistCacheName" and "persistCacheBaseDir", 
contrary to the test case, where these parameters are extraneous and 
out-of-date.

- I wouldn't cache the parent entity, just the child.

- Don't specify persistCachePartitionNumber unless you're actually trying to 
partition your caches (I wouldn't try this at first).
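
Applied to the data-config.xml quoted below, the suggestions above would give
roughly the following. This is an untested sketch: the entity names, queries,
attribute names and the BerkleyBackedCache class name are copied from the
quoted message, and it assumes the patched jar accepts them as written.

```xml
<document>
    <!-- Parent entity: no cache attributes, so it is not cached -->
    <entity name="END_FRG_PRODUCTS_VW"
            processor="SqlEntityProcessor"
            query="select PDT_ID, SEARCH_TITLE from END_FRG_PRODUCTS_VW">
        <!-- Child entity: cached, using "cacheImpl" instead of
             "persistCacheImpl"; persistCachePartitionNumber is left out -->
        <entity name="END_FRG_FEATURES_VW"
                processor="SqlEntityProcessor"
                cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                persistCacheName="FEATURE"
                persistCacheBaseDir="d:\cacheloc"
                cacheKey="PDT_ID"
                cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"
                berkleyInternalCacheSize="1000000"
                berkleyInternalShared="true"
                query="select * from END_FRG_FEATURES_VW"/>
    </entity>
</document>
```

The berkleyInternal* attributes are carried over unchanged from the quoted
configuration; whether they are needed is not something this sketch settles.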

What will happen is that it loops through the result set of the parent, 
document by document.  On the first iteration, it notes that the child 
entity's cache hasn't been initialized and builds a cache for it. Then, 
on each iteration, it pulls the child rows out of the cache while looping 
through the result set of the parent.

Hopefully this will work better for you.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: mroosendaal [mailto:mroosend...@yahoo.com] 
Sent: Friday, November 09, 2012 12:39 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH nested entities don't work

Hi James,

What I did:
* built a jar from the patch
* downloaded the BDB library
* added them to my classpath
* downloaded a nightly 4.1 Solr build
* created a db config according to:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestEphemeralCache.java

Although I got things working, I stopped the process after 2 hours of
indexing. For that amount of data it took Endeca 1h15. After looking at some
of the tests in the patch, I configured the data-config.xml as follows:
<document>
    <entity name="END_FRG_PRODUCTS_VW"
            processor="SqlEntityProcessor"
            persistCacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
            persistCacheName="END_FRG_PRODUCTS_VW"
            persistCachePartitionNumber="0"
            persistCacheBaseDir="d:\cacheloc"
            berkleyInternalCacheSize="1000000"
            berkleyInternalShared="true"
            query="select PDT_ID, SEARCH_TITLE from END_FRG_PRODUCTS_VW">
        <entity name="END_FRG_FEATURES_VW"
                processor="SqlEntityProcessor"
                persistCacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                persistCacheName="FEATURE"
                cacheKey="PDT_ID"
                cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"
                berkleyInternalCacheSize="1000000"
                berkleyInternalShared="true"
                persistCacheBaseDir="d:\cacheloc"
                query="select * from END_FRG_FEATURES_VW"/>
    </entity>
</document>

Although the behaviour was different
[snapshot from the indexing after 8 minutes: Requests: 2899, Fetched:
28974398, Skipped: 0, Processed: 2258], it was still slow, and the parameter
'persistCacheBaseDir' had no effect. The difference from the previous run is
that the previous run had made only 2 requests and hadn't processed anything
after 2 hours.

Hope you can help me.

Thanks,
Maarten





