You can cache the sub-entity; DIH will then retrieve all the data for that entity 
in one query.  

See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor for 
more information.  That section focuses on caching data from 
SqlEntityProcessor, but it is now possible to cache data from other 
entity types as well.  It is also possible to plug in a different cache 
implementation if the default in-memory cache does not scale for you.  See 
https://issues.apache.org/jira/browse/SOLR-2382 .
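
As a rough sketch only (untested against your setup), the db-data sub-entity 
from your config could be cached along these lines.  I am assuming the 
cacheKey/cacheLookup attributes described on the wiki page above, and that 
the "id" column of a_table really holds the value of ${f.file}; check both 
against your Solr version and schema.  Note that the WHERE clause is gone: 
the query runs once, every row is held in memory, and each file is then 
looked up in the cache rather than in the database.

```xml
<!-- Sketch: replaces the existing db-data entity inside entity "f". -->
<!-- Assumes the whole of a_table fits in the default in-memory cache. -->
<entity
    name="db-data"
    dataSource="cr-db"
    processor="CachedSqlEntityProcessor"
    query="SELECT id, b FROM a_table"
    cacheKey="id"
    cacheLookup="f.file">
    <field column="B" name="b" />
</entity>
```

If a_table is too large for memory, this will not help as written, and a 
pluggable cache implementation as per SOLR-2382 may be the better route.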

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: harpax [mailto:a.psczo...@pan-sonic.de] 
Sent: Monday, March 04, 2013 8:49 AM
To: solr-user@lucene.apache.org
Subject: solr-dih does multiple queries for sub-entities

Hi,

I am trying to use the DIH to crawl over some XML files, apply XPath to
them, and then access a DB with the filename as the key. That works, but
reading ~30,000 docs would take almost 3h. When I looked at the
DIH debug console, it showed me that way too many DB calls were made: 1 for
the 1st doc, then 2, 3, 4, ...

I tried different attribute combinations (e.g. stripped it down to the minimum),
but the result is still the same. 

This problem has been asked about before:
http://lucene.472066.n3.nabble.com/DIH-multiple-queries-per-sub-entity-tt701038.html

thanks a lot!

regards
Arne

--
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource 
        name="cr-db"
        jndiName="xyz"
        type="JdbcDataSource" />
    <dataSource 
        name="cr-xml" 
        type="FileDataSource" 
        encoding="utf-8" />


    <document name="doc">
        <entity 
            dataSource="cr-xml" 
            name="f" 
            processor="FileListEntityProcessor" 
            baseDir="/path/to/xml" 
            filename="*.xml" 
            recursive="true" 
            rootEntity="true" 
            onError="skip">
            <entity
                name="xml-data" 
                dataSource="cr-xml" 
                processor="XPathEntityProcessor" 
                forEach="/root" 
                url="${f.fileAbsolutePath}" 
                transformer="DateFormatTransformer" 
                onError="skip">
                <field column="id" xpath="/root/id" /> 

                <field column="A" xpath="/root/a" />
            </entity>

            <entity 
                name="db-data" 
                dataSource="cr-db"
                query="
                    SELECT  
                        id, b
                    FROM 
                        a_table
                    WHERE 
                        id = '${f.file}'">
                <field column="B" name="b" />
            </entity>
        </entity>
    </document>
</dataConfig>
--





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-dih-does-multiple-queries-for-sub-entities-tp4044522.html
Sent from the Solr - User mailing list archive at Nabble.com.

