I noticed an enormous number of commits, which plausibly triggers segment
merges that hit OOM errors. Try disabling autocommit completely, and monitor
commit occurrences in the log.
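For example, autocommit is controlled by the autoCommit block in
solrconfig.xml; commenting it out (or removing it) prevents Solr from
committing on its own during the import. A sketch, assuming the default
DirectUpdateHandler2 update handler and illustrative threshold values:

```xml
<!-- In solrconfig.xml: disable automatic commits for the duration of the
     full import by commenting out the autoCommit block. With no autoCommit
     configured, Solr only commits when a client explicitly asks for one. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!--
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>15000</maxTime>
  </autoCommit>
  -->
</updateHandler>
```

A single commit can then be issued at the end of the run, e.g. by leaving
the DIH full-import command's default commit=true in place.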


On Sun, Apr 20, 2014 at 9:12 PM, Candygram For Mongo <
candygram.for.mo...@gmail.com> wrote:

> We have tried using fetchSize and we still got the same out of memory
> errors.
>
>
> On Fri, Apr 18, 2014 at 9:39 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
> > On 4/18/2014 6:15 PM, Candygram For Mongo wrote:
> > > We are getting Out Of Memory errors when we try to execute a full
> import
> > > using the Data Import Handler.  This error originally occurred on a
> > > production environment with a database containing 27 million records.
> >  Heap
> > > memory was configured for 6GB and the server had 32GB of physical
> memory.
> > >  We have been able to replicate the error on a local system with 6
> > million
> > > records.  We set the memory heap size to 64MB to accelerate the error
> > > replication.  The indexing process has been failing in different
> > scenarios.
> > >  We have 9 test cases documented.  In some of the test cases we
> increased
> > > the heap size to 128MB.  In our first test case we set heap memory to
> > 512MB
> > > which also failed.
> >
> > One characteristic of a JDBC connection is that unless you tell it
> > otherwise, it will try to retrieve the entire resultset into RAM before
> > any results are delivered to the application.  It's not Solr doing this,
> > it's JDBC.
> >
> > In this case, there are 27 million rows in the resultset.  It's highly
> > unlikely that this much data (along with the rest of Solr's memory
> > requirements) will fit in 6GB of heap.
> >
> > JDBC has a built-in way to deal with this.  It's called fetchSize.  By
> > using the batchSize parameter on your JdbcDataSource config, you can set
> > the JDBC fetchSize.  Set it to something small, between 100 and 1000,
> > and you'll probably get rid of the OOM problem.
> >
> > http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
> >
> > If you had been using MySQL, I would have recommended that you set
> > batchSize to -1.  This sets fetchSize to Integer.MIN_VALUE, which tells
> > the MySQL driver to stream results instead of trying to either batch
> > them or return everything.  I'm pretty sure that the Oracle driver
> > doesn't work this way -- you would have to modify the dataimport source
> > code to use their streaming method.
> >
> > Thanks,
> > Shawn
> >
> >
>
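For reference, the batchSize parameter Shawn describes goes on the
JdbcDataSource element of the DIH config file. A sketch only -- the driver
class, URL, credentials, and table/column names below are illustrative, not
taken from your setup:

```xml
<dataConfig>
  <!-- batchSize is passed through to the JDBC driver's setFetchSize();
       a small value keeps the driver from buffering the entire resultset
       in heap. (With MySQL, batchSize="-1" would request row streaming
       instead.) -->
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="solr_user"
              password="changeit"
              batchSize="500"/>
  <document>
    <entity name="records" query="SELECT id, title FROM records">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>
    </entity>
  </document>
</dataConfig>
```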



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
