Hi,
You are correct that I don't want to index everything every day. For
this PoC, however, I need a 'bootstrap' mechanism that basically does what Endeca
does.
The 'defaultRowPrefetch' setting in solrconfig.xml does not seem to take effect; I'll
have a closer look.
As for the long import time, it turned out that one of the views I was reading is also
by far the biggest, with over 4 million entries. The other views should take much less
time.
With regard to the parallel processing, I have taken the 2 classes you mentioned
and packaged them. The documentation in the patch was not clear on exactly how to
set this up. My assumption is that:
* for every entity you have to define a DIH in solrconfig.xml that refers to
a specific data-config-<entity>.xml
* you define 1 import handler for the join in solrconfig.xml
* what isn't clear is what a data-config-<entity>.xml should look like (for
example, I see no reference in the documentation to a cacheName)
* or what the data-config-join.xml should look like
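To make the first two points concrete, this is roughly the solrconfig.xml wiring I have in mind. It is only a sketch: the handler names (/dataimport-products and so on) are made up for illustration, and I am assuming each handler simply points at its own config file through the standard 'config' default.

```xml
<!-- One DIH per entity; each one fills its own persistent cache.
     Handler names are hypothetical. -->
<requestHandler name="/dataimport-products"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-products.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-features"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-features.xml</str>
  </lst>
</requestHandler>

<!-- One DIH for the join; it reads the caches written by the two handlers above. -->
<requestHandler name="/dataimport-join"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-join.xml</str>
  </lst>
</requestHandler>
```

As far as I understand, each handler would then be triggered separately with command=full-import: the two entity imports first (they could run in parallel), and the join import last.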
My first attempt:
the data-config-products.xml (parent):
<dataSource name="jdbc1"
driver="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:@//<host>:1521/ENDDEV" user="un"
password="pw"/>
<document>
<entity name="END_FRG_PRODUCTS_VW"
processor="SqlEntityProcessor"
cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
writerImpl="org.apache.solr.handler.dataimport.DIHCacheWriter"
dataSource="jdbc1"
rootEntity="true"
persistCacheName="PRODUCTS"
persistCacheBaseDir="d:\cacheloc"
berkleyInternalCacheSize="1000000"
persistCacheFieldNames="PDT_ID,SEARCH_TITLE,PDT_GLOBAL_ID,PDT_EAN_CODE,PDT_TYP_CODE,PDT_AVAILABILITY,AVAIL_CODE_OFF_STOCK,AVAIL_CODE_ON_STOCK,OFFER_TYPE"
persistCacheFieldTypes="STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING"
query="select
PDT_ID,SEARCH_TITLE,PDT_GLOBAL_ID,PDT_EAN_CODE,PDT_TYP_CODE,PDT_AVAILABILITY,AVAIL_CODE_OFF_STOCK,AVAIL_CODE_ON_STOCK,OFFER_TYPE
from END_FRG_PRODUCTS_VW">
</entity>
</document>
the data-config-features.xml (child):
<dataSource name="jdbc1"
driver="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:@//<host>:1521/ENDDEV" user="un" password="pw"
batchSize="20000"/>
<document>
<entity name="END_FRG_FEATURES_VW"
processor="SqlEntityProcessor"
cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
writerImpl="org.apache.solr.handler.dataimport.DIHCacheWriter"
persistCacheName="FEATURE"
persistCacheBaseDir="d:\cacheloc"
berkleyInternalCacheSize="1000000"
persistCacheFieldNames="PDT_ID,PDT_FEATURES"
persistCacheFieldTypes="STRING,STRING"
berkleyInternalShared="true"
cacheKey="PDT_ID"
cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"
dataSource="jdbc1"
query="select PDT_ID, PDT_FEATURES from
END_FRG_FEATURES_VW"/>
</document>
the data-config-join.xml:
<entity name="END_FRG_PRODUCTS_VW"
processor="org.apache.solr.handler.dataimport.DIHCacheProcessor"
rootEntity="true"
persistCacheFieldNames="PDT_ID,SEARCH_TITLE,PDT_GLOBAL_ID,PDT_EAN_CODE,PDT_TYP_CODE,PDT_AVAILABILITY,AVAIL_CODE_OFF_STOCK,AVAIL_CODE_ON_STOCK,OFFER_TYPE"
persistCacheFieldTypes="STRING,STRING,STRING,STRING,STRING,STRING,STRING,STRING">
<entity name="END_FRG_FEATURES_VW"
processor="org.apache.solr.handler.dataimport.DIHCacheProcessor"
cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
persistCacheName="FEATURE"
persistCacheBaseDir="d:\cacheloc"
berkleyInternalCacheSize="1000000"
persistCacheFieldNames="PDT_ID,PDT_FEATURES"
persistCacheFieldTypes="STRING,STRING"
berkleyInternalShared="true"
cacheKey="PDT_ID"
cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"/>
</entity>
Is this a correct setup? Hope you can give some pointers.
Thanks,
Maarten