On 10/13/2013 10:16 AM, Josh Lincoln wrote:
> I have a large Solr response in XML format and would like to import
> it into a new Solr collection. I'm able to use DIH with
> SolrEntityProcessor, but only if I first truncate the file to a small
> subset of the records. I was hoping to set stream="true" to handle
> the full file, but I still get an out-of-memory error, so I believe
> stream does not work with SolrEntityProcessor. (I know the docs only
> mention the stream option for XPathEntityProcessor, but I was hoping
> SolrEntityProcessor might have the same capability.)
> 
> Before I open a JIRA to request stream support for SolrEntityProcessor
> in DIH, is there an alternate approach for importing large files that
> are in the Solr results format? Maybe a way to use XPath to get the
> values and a transformer to set the field names? I'm hoping to avoid
> declaring the field names in dataConfig so I can reuse the process
> across data sets.

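(On the XPath alternative you mention: a rough sketch is below. The
file path and field names are placeholders, and as you note, each
field still has to be declared in dataConfig.)

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="docs" processor="XPathEntityProcessor"
              stream="true"
              url="/path/to/response.xml"
              forEach="/response/result/doc">
        <!-- DIH's streaming XPath subset supports attribute
             predicates, so Solr response fields can be addressed
             with str[@name='...'] and similar -->
        <field column="id" xpath="/response/result/doc/str[@name='id']"/>
        <field column="title" xpath="/response/result/doc/str[@name='title']"/>
      </entity>
    </document>
  </dataConfig>
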
How big is the XML file?  You might be running into a size limit for
HTTP POST.

In newer 4.x versions, Solr itself sets the size of the HTTP POST
buffer, regardless of the container's configuration. That limit
defaults to 2MB, but it is configurable with the
formdataUploadLimitInKB attribute on the requestParsers tag, which you
can find in the example solrconfig.xml.
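
In the 4.x example solrconfig.xml, that section looks something like
this (the values shown are the stock example defaults):

  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="2048" />

Raising formdataUploadLimitInKB raises the POST buffer accordingly.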

In Solr 3.x, the included Jetty was configured with an HTTP POST size
limit of 1MB. In early Solr 4.x releases, a bug in the included Jetty
prevented that configuration element from working, so the actual limit
was Jetty's default of 200KB. With other containers on these older
versions, you would need to change the container's own configuration.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130
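
For reference, the Jetty setting involved is maxFormContentSize. In
the jetty.xml shipped with Solr 3.x (Jetty 6), it was set along these
lines; on the Jetty 8 included with early 4.x, the property name
begins with org.eclipse.jetty instead of org.mortbay.jetty:

  <Call class="java.lang.System" name="setProperty">
    <Arg>org.mortbay.jetty.Request.maxFormContentSize</Arg>
    <Arg>1000000</Arg>
  </Call>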

Thanks,
Shawn
