Hello,

I am importing a large number of records (with large full-text fields),
and somewhere around 300,000 records DataImportHandler runs out of heap
memory during a Tika import (triggered from a custom Processor) and
rolls back. I am using stored="false" and have been trying various
workarounds and chasing possible memory leaks, but I also have a
question about DIH itself.
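
In case it matters, the data-config is roughly along these lines (the
paths and field names here are made up, and the custom processor that
does the Tika extraction is omitted):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- stream="true" keeps XPathEntityProcessor from reading
           the whole XML file into memory at once -->
      <entity name="record" processor="XPathEntityProcessor"
              url="/data/records.xml" forEach="/records/record"
              stream="true">
        <field column="id" xpath="/records/record/id"/>
        <field column="body" xpath="/records/record/body"/>
      </entity>
    </document>
  </dataConfig>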

What actually happens when I run DIH on a large (XML source) job? Does
it accumulate some sort of state in memory that it commits only at the
end? If so, can I trigger intermediate commits to reduce the memory
requirements? Or would it help to make several passes over the same
dataset, importing only a subset of the entries each time? I am using
the Solr 4 (alpha) UI, so I can see some of the options there.
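
For example, if intermediate commits would help, I am guessing that an
autoCommit block along these lines in solrconfig.xml is the right
direction (the thresholds are just placeholders):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- commit automatically so uncommitted documents do not pile up -->
    <autoCommit>
      <maxDocs>10000</maxDocs>   <!-- commit every 10,000 documents -->
      <maxTime>60000</maxTime>   <!-- or at least once a minute (ms) -->
    </autoCommit>
  </updateHandler>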

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working. (Anonymous, via GTD
book)
