Hi,

I am trying to use the DataImportHandler (DIH) to crawl over some XML files, extract fields from them with XPath, and then query a database using the filename as the key. That works, but reading ~30,000 docs would take almost 3 hours. When I looked at the DIH debug console, it showed me that far too many DB calls were being made: 1 for the 1st doc, then 2 for the 2nd, 3 for the 3rd, 4 for the 4th, and so on.
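To put that pattern in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the growth I saw for the first few docs (doc k triggering k queries) continues across the whole import, which I have not verified. The function name is made up for illustration only.

```python
def total_db_calls(num_docs: int) -> int:
    """Total sub-entity queries if doc k triggers k queries: 1 + 2 + ... + n.

    Assumption: the 1, 2, 3, 4, .. pattern from the debug console
    holds for every document in the import.
    """
    return num_docs * (num_docs + 1) // 2


if __name__ == "__main__":
    docs = 30_000
    print(f"expected (one query per doc):   {docs}")
    print(f"observed pattern (k for doc k): {total_db_calls(docs)}")
```

If that assumption holds, 30,000 docs would mean 450,015,000 queries instead of 30,000, which would go a long way towards explaining the import time.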
I tried different attribute combinations (e.g. stripping the config down to the minimum), but the behaviour stays the same.

This problem has been asked before:
http://lucene.472066.n3.nabble.com/DIH-multiple-queries-per-sub-entity-tt701038.html

Thanks a lot!

Regards,
Arne

--
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="cr-db" jndiName="xyz" type="JdbcDataSource" />
  <dataSource name="cr-xml" type="FileDataSource" encoding="utf-8" />
  <document name="doc">
    <entity dataSource="cr-xml" name="f"
            processor="FileListEntityProcessor"
            baseDir="/path/to/xml"
            filename="*.xml"
            recursive="true"
            rootEntity="true"
            onError="skip">
      <entity name="xml-data" dataSource="cr-xml"
              processor="XPathEntityProcessor"
              forEach="/root"
              url="${f.fileAbsolutePath}"
              transformer="DateFormatTransformer"
              onError="skip">
        <field column="id" xpath="/root/id" />
        <field column="A" xpath="/root/a" />
      </entity>
      <entity name="db-data" dataSource="cr-db"
              query="
                SELECT id, b
                FROM a_table
                WHERE id = '${f.file}'">
        <field column="B" name="b" />
      </entity>
    </entity>
  </document>
</dataConfig>
--