It only supports streaming if it is explicitly enabled, which is completely lame: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate the required heap space in your JVM, you can tell the driver to stream the results back one row at a time.

To enable this functionality, you need to create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.
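For reference, here is a compilable sketch of that pattern. The helper method name is mine, not part of any API; since there is no live database in this example, the main method only prints the JDBC constants involved so you can see the exact sentinel values the driver checks for:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingQuery {

    // Hypothetical helper: creates a statement configured for row-by-row
    // streaming as described in the Connector/J implementation notes.
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        // Integer.MIN_VALUE is the sentinel fetch size that tells the
        // MySQL driver to stream rather than buffer the whole result set.
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    public static void main(String[] args) {
        // With a real Connection you would call createStreamingStatement(conn)
        // and iterate the ResultSet as usual. Here we just print the
        // constants the pattern is built from.
        System.out.println(ResultSet.TYPE_FORWARD_ONLY); // 1003
        System.out.println(ResultSet.CONCUR_READ_ONLY);  // 1007
        System.out.println(Integer.MIN_VALUE);           // -2147483648
    }
}
```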



-Bryan




On Dec 12, 2008, at 2:15 PM, Kay Kay wrote:

I am using MySQL. I believe MySQL (since version 5) supports streaming.

More about streaming: can we assume that when the database driver supports streaming, the result set iterator is a forward-only iterator?

If, say, the streaming batch size is 10K records and we are trying to retrieve a total of 100K records, what exactly happens when the threshold is reached (say, after the first 10K records are retrieved)?

Are the previous records thrown away and replaced in memory by the next batch?



--- On Fri, 12/12/08, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
From: Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject: Re: Solr - DataImportHandler - Large Dataset results ?
To: solr-user@lucene.apache.org
Date: Friday, December 12, 2008, 9:41 PM

DataImportHandler is designed to stream rows one by one to create Solr
documents. As long as your database driver supports streaming, you should be
fine. Which database are you using?

On Sat, Dec 13, 2008 at 2:20 AM, Kay Kay <kaykay.uni...@yahoo.com> wrote:

As per the example in the wiki -
http://wiki.apache.org/solr/DataImportHandler - I am seeing the following
fragment.

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:/temp/example/ex" user="sa" />
  <document name="products">
    <entity name="item" query="select * from item">
      <field column="ID" name="id" />
      <field column="NAME" name="name" />
      ......................
    </entity>
  </document>
</dataConfig>

My scaled-down application looks very similar to this, except that my result set is so big that it cannot possibly fit in main memory.

So I was planning to split this single query into multiple subqueries, with an additional conditional on the id (say, id >= 0 and id < 100).

I am curious if there is any way to specify another conditional clause (e.g. <splitData column="id" batch="10000" />, where the column is supposed to be an integer value) so that, internally, the implementation could generate the subqueries:

i) get the min and max of the numeric column, and send queries to the database based on the batch size

ii) add documents for each batch and close the result set
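The range-splitting in step (i) can be sketched as plain Java. This is only an illustration of the proposed <splitData> idea, not anything DataImportHandler actually provides; the table and column names are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedQueryGenerator {

    // Splits a full-table scan into range subqueries on an integer column,
    // covering [min, max] in chunks of batchSize rows' worth of ids.
    static List<String> splitQueries(String table, String column,
                                     int min, int max, int batchSize) {
        List<String> queries = new ArrayList<>();
        for (int lo = min; lo <= max; lo += batchSize) {
            int hi = Math.min(lo + batchSize - 1, max);
            queries.add("select * from " + table
                    + " where " + column + " >= " + lo
                    + " and " + column + " <= " + hi);
        }
        return queries;
    }

    public static void main(String[] args) {
        // e.g. ids 1..100000 in batches of 10000 -> 10 subqueries
        List<String> qs = splitQueries("item", "id", 1, 100000, 10000);
        System.out.println(qs.size());
        System.out.println(qs.get(0));
    }
}
```

Each subquery's result set would then fit in memory on its own, at the cost of extra round trips to the database.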

This might end up putting more load on the database, but at least each batch would fit in main memory.

Let me know if anyone else has run into similar issues and how they were addressed.







--
Regards,
Shalin Shekhar Mangar.
