Hi,

I am trying to use the DataImportHandler (DIH) to crawl a set of XML files,
extract fields from them with XPath, and then query a db using the filename
as the key. That works, but importing ~30,000 docs would take almost 3 hours.
The DIH debug console showed me that far too many db calls are made: 1 for
the 1st doc, then 2 for the 2nd, 3 for the 3rd, 4 for the 4th, and so on.

I tried different attribute combinations (e.g. stripping the config down to
the minimum), but the behaviour is still the same.

This problem has been asked about before:
http://lucene.472066.n3.nabble.com/DIH-multiple-queries-per-sub-entity-tt701038.html

Thanks a lot!

Regards
Arne

--
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource 
        name="cr-db"
        jndiName="xyz"
        type="JdbcDataSource" />
    <dataSource 
        name="cr-xml" 
        type="FileDataSource" 
        encoding="utf-8" />


    <document name="doc">
        <entity 
            dataSource="cr-xml" 
            name="f" 
            processor="FileListEntityProcessor" 
            baseDir="/path/to/xml" 
            filename="*.xml" 
            recursive="true" 
            rootEntity="true" 
            onError="skip">
            <entity
                name="xml-data" 
                dataSource="cr-xml" 
                processor="XPathEntityProcessor" 
                forEach="/root" 
                url="${f.fileAbsolutePath}" 
                transformer="DateFormatTransformer" 
                onError="skip">
                <field column="id" xpath="/root/id" /> 

                <field column="A" xpath="/root/a" />
            </entity>

            <entity 
                name="db-data" 
                dataSource="cr-db"
                query="
                    SELECT  
                        id, b
                    FROM 
                        a_table
                    WHERE 
                        id = '${f.file}'">
                <field column="B" name="b" />
            </entity>
        </entity>
    </document>
</dataConfig>
--
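
A possible workaround, offered only as an untested sketch: DIH ships a
CachedSqlEntityProcessor that runs its query once and answers later lookups
from an in-memory cache. Assuming a_table is small enough to fit in memory,
the db-data entity could be rewritten roughly as below. The where attribute
pairs the cached column (id) with the parent value (f.file). Whether this
actually avoids the repeated queries in this setup is an assumption, not
something confirmed here.

--
<!-- Sketch only: loads all of a_table once and caches it in memory. -->
<!-- Assumes the id column holds the same value as f.file. -->
<entity
    name="db-data"
    dataSource="cr-db"
    processor="CachedSqlEntityProcessor"
    query="SELECT id, b FROM a_table"
    where="id=f.file">
    <field column="B" name="b" />
</entity>
--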




