Thanks very much, both your and Rafal's advice is very helpful!

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, July 26, 2012 8:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Bulk indexing data into solr
On 7/26/2012 7:34 AM, Rafał Kuć wrote:
> If you use Java (and I think you do, because you mention Lucene) you
> should take a look at StreamingUpdateSolrServer. It not only allows
> you to send data in batches, but also index using multiple threads.

A caveat to what Rafał said: the streaming object has no error detection out of the box. It queues everything up internally and returns immediately. Behind the scenes, it uses multiple threads to send documents to Solr, but any errors encountered are simply sent to the logging mechanism and then ignored. When you use HttpSolrServer, all errors encountered will throw exceptions, but you have to wait for each request to complete.

If you need both concurrent capability and error detection, you would have to manage multiple indexing threads yourself. Apparently there is a method in the concurrent class that you can override to handle errors differently, though I have not seen how to write code so your program would know that an error occurred. I filed an issue with a patch to solve this, but some of the developers have come up with an idea that might be better. None of the ideas have been committed to the project.

https://issues.apache.org/jira/browse/SOLR-3284

Just an FYI, the streaming class was renamed to ConcurrentUpdateSolrServer in Solr 4.0 Alpha. Both are available in 3.6.x.

Thanks,
Shawn
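To make the override idea concrete, here is a rough sketch against the 3.6-era SolrJ API. It assumes handleError(Throwable) is the hook that StreamingUpdateSolrServer exposes, and it only counts failed update requests; it still cannot tell you which specific documents were rejected. The URL, queue size, thread count, and field names (id, title_t) are placeholders, not anything from Shawn's setup.

import java.net.MalformedURLException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

// StreamingUpdateSolrServer subclass that records failed update requests
// instead of only logging them.
public class CountingStreamingServer extends StreamingUpdateSolrServer {
    private final AtomicInteger errorCount = new AtomicInteger();

    public CountingStreamingServer(String url, int queueSize, int threads)
            throws MalformedURLException {
        super(url, queueSize, threads);
    }

    @Override
    public void handleError(Throwable ex) {
        errorCount.incrementAndGet();   // remember that a request failed
        super.handleError(ex);          // keep the default logging behavior
    }

    public int getErrorCount() {
        return errorCount.get();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL, queue size, and thread count.
        CountingStreamingServer server =
                new CountingStreamingServer("http://localhost:8983/solr", 100, 4);

        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));          // example fields only
            doc.addField("title_t", "document " + i);
            server.add(doc);            // queued; background threads send it
        }

        server.blockUntilFinished();    // wait for the internal queue to drain
        server.commit();

        if (server.getErrorCount() > 0) {
            System.err.println(server.getErrorCount() + " update requests failed");
        }
    }
}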
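For comparison, a sketch of the single-threaded HttpSolrServer route that Shawn mentions, where a failed request surfaces as an exception at the cost of waiting for each batch to complete. Again, the URL, batch size, and field names are only examples.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sends documents in batches and lets exceptions report failures.
public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));      // example fields only
            doc.addField("title_t", "document " + i);
            batch.add(doc);

            if (batch.size() == 1000) {    // send in batches of 1000
                sendBatch(server, batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sendBatch(server, batch);
        }
        server.commit();
    }

    private static void sendBatch(HttpSolrServer server,
                                  List<SolrInputDocument> docs) throws Exception {
        try {
            server.add(docs);              // blocks until Solr responds
        } catch (SolrServerException e) {
            // The whole batch is reported as failed; log it, retry it, or
            // re-send documents one at a time to isolate the bad one.
            System.err.println("Batch of " + docs.size() + " failed: " + e.getMessage());
        }
    }
}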