Solr has a separate feature called 'autoCommit', configured in
solrconfig.xml. You can set Solr to commit pending documents every N
milliseconds or every N documents, whichever comes first. If you want
intermediate commits during a long DIH session, you have to use this
or write your own script that issues commits.
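
For reference, a minimal autoCommit setup in solrconfig.xml might look
like the following (the thresholds are placeholder values, not
recommendations -- tune them for your own data):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Commit automatically after 10,000 added docs or after 60
       seconds, whichever threshold is reached first. -->
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>

Keep in mind that in Solr 3.x every commit also opens a new searcher,
which has its own cost on a busy index; Solr 4 adds an <openSearcher>
option to skip that.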

On Tue, Aug 21, 2012 at 8:48 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 8/21/2012 6:41 AM, Alexandre Rafalovitch wrote:
>>
>> I am doing an import of large records (with large full-text fields),
>> and somewhere around 300,000 records DataImportHandler runs out of
>> heap memory on a Tika import (triggered from a custom Processor) and
>> does a rollback. I am using stored=false and trying some tricks while
>> tracking down possible memory leaks, but I also have a question about
>> DIH itself.
>>
>> What actually happens when I run DIH on a large (XML source) job? Does
>> it accumulate some sort of state in memory that it commits at the
>> end? If so, can I do intermediate commits to reduce the memory
>> requirements? Or will it help to do several passes over the same
>> dataset and import only particular entries on each pass? I am using
>> the Solr 4 (alpha) UI, so I can see some of the options there.
>
>
> I use Solr 3.5 and a MySQL database for import, so my setup may not be
> completely relevant, but here is my experience.
>
> Unless you turn on autoCommit in solrconfig.xml, documents will not be
> searchable during the import.  If you have "commit=true" set for DIH (which I
> believe is the default), there will be a commit at the end of the import.
>
> It looks like there's an out-of-memory issue filed against Solr 4's DIH with
> Tika that is suspected to be a bug in Tika rather than in Solr.  The issue
> details mention some workarounds for those who are familiar with Tika -- I'm not.
> The issue URL:
>
> https://issues.apache.org/jira/browse/SOLR-2886
>
> Thanks,
> Shawn
>
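
For what it's worth, the commit flag Shawn mentions is just a request
parameter on the DIH handler. Assuming the handler is registered at
/dataimport as in the example solrconfig.xml, a full-import with an
explicit commit looks something like:

http://localhost:8983/solr/dataimport?command=full-import&commit=true

Note that commit=true only commits once at the end of the run; it does
not give you intermediate commits, which is why autoCommit or an
external script is needed for those.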

-- 
Lance Norskog
goks...@gmail.com
