Hi,

You are correct about not wanting to index everything every day. However, for
this PoC I need a 'bootstrap' mechanism that basically does what Endeca
does.

The 'defaultRowPrefetch' setting in solrconfig.xml does not seem to take
effect; I'll have a closer look.

As for the long run time, it appeared that one of the views I was reading was
also by far the biggest, with over 4 million entries. Other views should take
much less time.

With regards to the parallel processing, I have the 2 classes you mention
and have packaged them. The documentation in the patch was not clear on how
exactly to do that. My assumption is that
* for every entity you have to define a DIH in the solrconfig and refer to
a specific data-config-<entity>.xml
* define 1 import handler for the join in the solrconfig
* what isn't clear is what a data-config-<entity>.xml should look like (for
example, I see no reference in the documentation to a cacheName)
* and what the data-config-join.xml should look like
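
To make that assumption concrete, this is roughly what I picture in
solrconfig.xml: one plain DIH request handler per config file. It is only a
sketch of my understanding, not something taken from the patch. The handler
names (/dataimport-products, /dataimport-features, /dataimport-join) are
placeholders I made up; the config file names are the ones used in this mail.

```xml
<!-- Sketch only: handler names are hypothetical, not from the patch docs. -->

<!-- One handler per entity; each one writes its own persistent cache. -->
<requestHandler name="/dataimport-products"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-products.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-features"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-features.xml</str>
  </lst>
</requestHandler>

<!-- One handler that reads the caches back and joins them into documents. -->
<requestHandler name="/dataimport-join"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-join.xml</str>
  </lst>
</requestHandler>
```

The idea would be to run the two per-entity handlers first (in parallel) and
only then trigger the join handler.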

My first attempt:
the data-config-products.xml (parent)
<dataSource name="jdbc1"
            driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@//<host>:1521/ENDDEV"
            user="un"
            password="pw"/>
        <document>
                <entity name="END_FRG_PRODUCTS_VW"
                        processor="SqlEntityProcessor"
                        cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                        writerImpl="org.apache.solr.handler.dataimport.DIHCacheWriter"
                        dataSource="jdbc1"
                        rootEntity="true"
                        persistCacheName="PRODUCTS"
                        persistCacheBaseDir="d:\cacheloc"
                        berkleyInternalCacheSize="1000000"
                        persistCacheFieldNames="PDT_ID,SEARCH_TITLE,PDT_GLOBAL_ID,PDT_EAN_CODE,PDT_TYP_CODE,PDT_AVAILABILITY,AVAIL_CODE_OFF_STOCK,AVAIL_CODE_ON_STOCK,OFFER_TYPE"
                        persistCacheFieldTypes="STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING"
                        query="select PDT_ID,SEARCH_TITLE,PDT_GLOBAL_ID,PDT_EAN_CODE,PDT_TYP_CODE,PDT_AVAILABILITY,AVAIL_CODE_OFF_STOCK,AVAIL_CODE_ON_STOCK,OFFER_TYPE from END_FRG_PRODUCTS_VW">
                </entity>
        </document>

the data-config-features.xml (child):
<dataSource name="jdbc1"
            driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@//<host>:1521/ENDDEV"
            user="un"
            password="pw"
            batchSize="20000"/>
        
        <document>
                <entity name="END_FRG_FEATURES_VW"
                        processor="SqlEntityProcessor"
                        cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                        writerImpl="org.apache.solr.handler.dataimport.DIHCacheWriter"
                        persistCacheName="FEATURE"
                        persistCacheBaseDir="d:\cacheloc"
                        berkleyInternalCacheSize="1000000"
                        persistCacheFieldNames="PDT_ID,PDT_FEATURES"
                        persistCacheFieldTypes="STRING,STRING"
                        berkleyInternalShared="true"
                        cacheKey="PDT_ID"
                        cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"
                        dataSource="jdbc1"
                        query="select PDT_ID, PDT_FEATURES from END_FRG_FEATURES_VW"/>
        </document>

the data-config-join.xml
<entity name="END_FRG_PRODUCTS_VW"
        processor="org.apache.solr.handler.dataimport.DIHCacheProcessor"
        rootEntity="true"
        name="PARENT"
        persistCacheFieldNames="PDT_ID,SEARCH_TITLE,PDT_GLOBAL_ID,PDT_EAN_CODE,PDT_TYP_CODE,PDT_AVAILABILITY,AVAIL_CODE_OFF_STOCK,AVAIL_CODE_ON_STOCK,OFFER_TYPE"
        persistCacheFieldTypes="STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING"
        <entity name="END_FRG_FEATURES_VW"
                processor="org.apache.solr.handler.dataimport.DIHCacheProcessor"
                cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                persistCacheName="FEATURE"
                persistCacheBaseDir="d:\cacheloc"
                berkleyInternalCacheSize="1000000"
                persistCacheFieldNames="PDT_ID,PDT_FEATURES"
                persistCacheFieldTypes="STRING,STRING"
                berkleyInternalShared="true"
                cacheKey="PDT_ID"
                cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"/>

Is this a correct setup? Hope you can give some pointers.

Thanks,
Maarten



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4020727.html
Sent from the Solr - User mailing list archive at Nabble.com.
