solrconfig.xml has a setting, ramBufferSizeMB, that can be used to limit the memory consumed during indexing. When this limit is reached, the buffered documents are flushed to the current segment. NOTE: the segment is NOT closed; there is no implied commit here, and the data will not be searchable until a commit happens.
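If I remember right, in Solr 4.x it goes in the <indexConfig> section (in 3.x it was under <indexDefaults>). A sketch; the 100 MB value is purely illustrative, not a recommendation:

    <indexConfig>
      <!-- Flush in-memory indexing buffers to disk once they reach
           roughly 100 MB. This bounds indexing RAM but does NOT make
           the flushed documents searchable; only a commit does that. -->
      <ramBufferSizeMB>100</ramBufferSizeMB>
    </indexConfig>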
Best,
Erick

On Wed, Aug 22, 2012 at 7:10 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> Thanks, I will look into autoCommit.
>
> I assume there are memory implications of not committing? Or is it
> just writing to a separate file, so it can theoretically go on
> indefinitely?
>
> Regards,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> book)
>
>
> On Wed, Aug 22, 2012 at 2:42 AM, Lance Norskog <goks...@gmail.com> wrote:
>> Solr has a separate feature called 'autoCommit'. This is configured in
>> solrconfig.xml. You can set Solr to commit all documents every N
>> milliseconds or every N documents, whichever comes first. If you want
>> intermediate commits during a long DIH session, you have to use this
>> or write your own script that issues commits.
>>
>> On Tue, Aug 21, 2012 at 8:48 AM, Shawn Heisey <s...@elyograg.org> wrote:
>>> On 8/21/2012 6:41 AM, Alexandre Rafalovitch wrote:
>>>>
>>>> I am doing an import of large records (with large full-text fields),
>>>> and somewhere around 300,000 records DataImportHandler runs out of
>>>> heap memory on a Tika import (triggered from a custom processor) and
>>>> rolls back. I am using stored=false and trying some tricks to track
>>>> down possible memory leaks, but I also have a question about DIH
>>>> itself.
>>>>
>>>> What actually happens when I run DIH on a large (XML source) job? Does
>>>> it accumulate some sort of state in memory that it commits at the
>>>> end? If so, can I do intermediate commits to reduce the memory
>>>> requirements? Or would it help to do several passes over the same
>>>> dataset, importing only particular entries each time? I am using the
>>>> Solr 4 (alpha) UI, so I can see some of the options there.
>>>
>>> I use Solr 3.5 and a MySQL database for import, so my setup may not be
>>> completely relevant, but here is my experience.
>>>
>>> Unless you turn on autoCommit in solrconfig.xml, documents will not be
>>> searchable during the import. If you have commit=true for DIH (which I
>>> believe is the default), there will be a commit at the end of the import.
>>>
>>> It looks like there's an out-of-memory issue filed against Solr 4 DIH
>>> with Tika that is suspected to be a bug in Tika rather than Solr. The
>>> issue details describe some workarounds for those who are familiar with
>>> Tika -- I'm not. The issue URL:
>>>
>>> https://issues.apache.org/jira/browse/SOLR-2886
>>>
>>> Thanks,
>>> Shawn
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
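P.S. Since autoCommit came up in the thread below: it is configured in
the <updateHandler> section of solrconfig.xml. A rough sketch, with
purely illustrative thresholds; tune both to your indexing load:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- Commit after 10,000 uncommitted documents or after 60
             seconds (maxTime is in milliseconds), whichever comes
             first. -->
        <maxDocs>10000</maxDocs>
        <maxTime>60000</maxTime>
      </autoCommit>
    </updateHandler>

The end-of-import commit Shawn mentions corresponds to DIH's commit
request parameter, e.g. a request along the lines of
/dataimport?command=full-import&commit=true (the exact handler path
depends on how DIH is registered in your solrconfig.xml).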