Solr - DataImportHandler - Large Dataset results ?

Kay Kay Fri, 12 Dec 2008 12:50:39 -0800

The example in the wiki (http://wiki.apache.org/solr/DataImportHandler)
contains the following fragment:


<dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex"
                user="sa" />
    <document name="products">
        <entity name="item" query="select * from item">
            <field column="ID" name="id" />
            <field column="NAME" name="name" />
              ......................
        </entity>
    </document>
</dataConfig>

My scaled-down application looks very similar, except that my result set is 
far too big to fit in main memory. 

So I was planning to split this single query into multiple subqueries, each 
with an extra condition on the id (id >= 0 and id < 100, say). 

I am curious whether there is any way to specify such a condition 
declaratively, for example <splitData column="id" batch="10000" />, where the 
column is an integer. Internally, the implementation could then generate the 
subqueries itself: 

i) Get the min and max of the numeric column, and send queries to the database 
based on the batch size. 

ii) Add documents for each batch and close the result set. 
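
To make the two steps concrete, here is a rough sketch in Python. It is not 
DataImportHandler code and does not show how a splitData element would be 
implemented; it only illustrates the idea. sqlite3 stands in for the real JDBC 
database, the table and column names (item, id, name) are taken from the 
fragment above, and batch_ranges, import_in_batches and add_document are 
hypothetical names made up for this example.

```python
import sqlite3


def batch_ranges(min_id, max_id, batch):
    """Yield half-open (lo, hi) id ranges that together cover [min_id, max_id]."""
    lo = min_id
    while lo <= max_id:
        hi = lo + batch
        yield lo, hi
        lo = hi


def import_in_batches(conn, batch, add_document):
    """Run one bounded query per id range and return the number of rows seen.

    add_document is a hypothetical callback standing in for whatever
    indexes a row; it is not a Solr API.
    """
    # Step i: find the bounds of the numeric column.
    min_id, max_id = conn.execute("SELECT MIN(id), MAX(id) FROM item").fetchone()
    if min_id is None:  # empty table, nothing to import
        return 0

    total = 0
    for lo, hi in batch_ranges(min_id, max_id, batch):
        # Step ii: fetch one batch, add its documents, close the result set.
        cur = conn.execute(
            "SELECT id, name FROM item WHERE id >= ? AND id < ? ORDER BY id",
            (lo, hi),
        )
        for row in cur:
            add_document({"id": row[0], "name": row[1]})
            total += 1
        cur.close()
    return total
```

Each subquery returns at most `batch` rows when ids are dense, and fewer when 
there are gaps. If the ids are sparse, many of the subqueries will come back 
empty, which is part of the extra load on the database. 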

This might put more load on the database, but at least each batch would fit 
in main memory. 

Let me know if anyone else has run into similar issues and how they were 
handled.
