On 10/13/2013 10:16 AM, Josh Lincoln wrote:
> I have a large solr response in xml format and would like to import it
> into a new solr collection. I'm able to use DIH with solrEntityProcessor,
> but only if I first truncate the file to a small subset of the records.
> I was hoping to set stream="true" to handle the full file, but I still
> get an out of memory error, so I believe stream does not work with
> solrEntityProcessor (I know the docs only mention the stream option for
> the XPathEntityProcessor, but I was hoping solrEntityProcessor just
> might have the same capability).
>
> Before I open a jira to request stream support for solrEntityProcessor
> in DIH, is there an alternate approach for importing large files that
> are in the solr results format?
> Maybe a way to use xpath to get the values and a transformer to set the
> field names? I'm hoping to not have to declare the field names in
> dataConfig so I can reuse the process across data sets.
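On the alternate-approach question: as you note, the stream option is
documented for XPathEntityProcessor, so pointing that processor at the
file may handle it without loading everything into memory.  A minimal
sketch of that kind of dataConfig, assuming the response file is on
local disk and using hypothetical field names (XPathEntityProcessor
does require declaring them, which is what you were hoping to avoid):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="docs"
              processor="XPathEntityProcessor"
              stream="true"
              url="/path/to/solr-response.xml"
              forEach="/response/result/doc">
        <!-- hypothetical columns; the solr results format wraps each
             value in a typed element such as <str name="id"> -->
        <field column="id"    xpath="/response/result/doc/str[@name='id']"/>
        <field column="title" xpath="/response/result/doc/str[@name='title']"/>
      </entity>
    </document>
  </dataConfig>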
How big is the XML file?  You might be running into a size limit for
HTTP POST.

In newer 4.x versions, Solr itself sets the size of the POST buffer,
regardless of what the container config has.  That size defaults to 2MB,
but is configurable using the formdataUploadLimitInKB setting found on
the requestParsers tag in the example solrconfig.xml file (a snippet is
at the end of this message).

In Solr 3.x, the included Jetty had a configured HTTP POST size limit
of 1MB.  In early Solr 4.x, a bug in the included Jetty prevented the
configuration element from working, so the actual limit was Jetty's
default of 200KB.  With other containers and these older versions, you
would need to change your container configuration.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130

Thanks,
Shawn
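P.S. For reference, the relevant piece of the 4.x example solrconfig.xml
looks roughly like this (exact surrounding attributes may differ by
version); the 2048 value is the 2MB default in KB, so raising the limit
is just a matter of editing that attribute:

  <requestDispatcher handleSelect="false">
    <!-- formdataUploadLimitInKB caps the POST buffer discussed above -->
    <requestParsers enableRemoteStreaming="true"
                    multipartUploadLimitInKB="2048000"
                    formdataUploadLimitInKB="2048"/>
  </requestDispatcher>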