On 10/13/2013 10:02 AM, Shawn Heisey wrote:
On 10/13/2013 10:16 AM, Josh Lincoln wrote:
I have a large Solr response in XML format and would like to import it into
a new Solr collection. I'm able to use DIH with SolrEntityProcessor, but
only if I first truncate the file to a small subset of the records. I was
hoping to set stream="true" to handle the full file, but I still get an
out-of-memory error, so I believe stream does not work with
SolrEntityProcessor (I know the docs only mention the stream option for
XPathEntityProcessor, but I was hoping SolrEntityProcessor might have the
same capability).

Before I open a Jira to request stream support for SolrEntityProcessor in
DIH, is there an alternate approach for importing large files that are in
the Solr results format?
Maybe a way to use XPath to get the values and a transformer to set the
field names? I'm hoping not to have to declare the field names in the
dataConfig so I can reuse the process across data sets.
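For reference, a minimal data-config along the lines of what's described above might look like the following. The URL, core name, and query are hypothetical placeholders, not taken from the original setup:

```xml
<dataConfig>
  <document>
    <!-- SolrEntityProcessor pulls documents from another Solr endpoint.
         The url and query below are illustrative placeholders. -->
    <entity name="sep"
            processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/source_core"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>
```

Note that SolrEntityProcessor buffers pages of results rather than streaming the response, which is consistent with the out-of-memory behavior described.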
How big is the XML file?  You might be running into a size limit for
HTTP POST.

In newer 4.x versions, Solr itself sets the size of the POST buffer
regardless of what the container config has. That size defaults to 2MB
but is configurable via the formdataUploadLimitInKB setting, which you
can find on the requestParsers tag in the example solrconfig.xml file.
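As a sketch, raising that limit in solrconfig.xml would look something like this (the 2048000 value is just an example, not a recommendation):

```xml
<requestDispatcher>
  <!-- formdataUploadLimitInKB caps the size of form-encoded POST bodies.
       The default is 2048 (2MB); raise it to accept larger uploads. -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="2048000"/>
</requestDispatcher>
```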

In Solr 3.x, the included Jetty had a configured HTTP POST size limit of
1MB. In early Solr 4.x, a bug in the included Jetty prevented the
configuration element from working, so the actual limit was Jetty's
default of 200KB. With other containers and these older versions, you
would need to change your container configuration.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130

Thanks,
Shawn

The SolrEntityProcessor (SEP) calls out to another Solr instance and reads the results. Are you importing data from another Solr and cross-connecting it with your uploaded XML?

If the memory errors are a problem with streaming, you could try "piping" your uploaded documents through a processor that does support streaming. That would push one document at a time into the processor that calls out to Solr and combines the records.
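The piping idea above could be sketched with a streaming XPathEntityProcessor as the outer entity feeding a nested SolrEntityProcessor. This is only an illustration; the file path, core name, and field names are hypothetical, and the exact nesting would depend on the actual schema:

```xml
<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <!-- Outer entity streams the uploaded XML one <doc> at a time. -->
    <entity name="outer"
            processor="XPathEntityProcessor"
            url="/path/to/response.xml"
            forEach="/response/result/doc"
            stream="true">
      <field column="id" xpath="/response/result/doc/str[@name='id']"/>
      <!-- Inner entity calls out to another Solr per document and
           merges the returned fields. URL and query are placeholders. -->
      <entity name="inner"
              processor="SolrEntityProcessor"
              url="http://localhost:8983/solr/other_core"
              query="id:${outer.id}"/>
    </entity>
  </document>
</dataConfig>
```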
