Hello,

I am doing an import of large records (with large full-text fields), and somewhere around 300,000 records DataImportHandler runs out of heap memory on a Tika import (triggered from a custom Processor) and rolls back. I am using stored="false", trying some tricks, and tracking down possible memory leaks, but I also have a question about DIH itself.
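For concreteness, the config has roughly the shape sketched below. Everything in it is a placeholder (entity names, paths, XPaths, and field names), and the stock TikaEntityProcessor stands in for my custom processor:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <dataSource type="BinFileDataSource" name="bin"/>
    <document>
      <!-- stream="true" keeps XPathEntityProcessor from buffering the whole source file -->
      <entity name="record" processor="XPathEntityProcessor"
              url="/data/records.xml" forEach="/records/record" stream="true">
        <field column="id"   xpath="/records/record/id"/>
        <field column="path" xpath="/records/record/path"/>
        <!-- nested entity runs Tika over the file each record points at -->
        <entity name="fulltext" processor="TikaEntityProcessor"
                dataSource="bin" url="${record.path}" format="text">
          <field column="text" name="body"/>
        </entity>
      </entity>
    </document>
  </dataConfig>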
What actually happens when I run DIH on a large (XML source) job? Does it accumulate some sort of state in memory that it commits at the end? If so, can I do intermediate commits to reduce the memory requirements? Or would it help to do several passes over the same dataset, importing only particular entries on each pass? I am using the Solr 4 (alpha) UI, so I can see some of the options there.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
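P.S. By "intermediate commits" I mean something along the lines of the autoCommit settings below in solrconfig.xml; the numbers are just illustrative:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- hard-commit every 10,000 docs or every 60 seconds, whichever comes first -->
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
  </updateHandler>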