My two cents:

- pulling is better than pushing
- http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update
- DIH is not thread-safe: https://issues.apache.org/jira/browse/SOLR-3011
  But there are a few patches for trunk which fix it.
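
A minimal sketch of the pull idea, under stated assumptions: instead of pre-splitting the data and pushing each subset from a separate script, several worker threads pull batches from one shared document iterator and send them to Solr. The names `index_in_parallel`, `post_batch`, and `SOLR_UPDATE_URL` are hypothetical, and the endpoint assumes a Solr update handler that accepts a JSON array of documents; this is not the SolrJ or DIH implementation, only an illustration of the pattern.

```python
import itertools
import json
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint; adjust host, port, and handler path to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json"


def post_batch(batch, url=SOLR_UPDATE_URL):
    """POST one batch of documents (a list of dicts) to Solr as JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def index_in_parallel(docs, send=post_batch, workers=4, batch_size=100):
    """Let `workers` threads pull batches from one shared iterator.

    Returns the total number of documents handed to `send`.
    """
    iterator = iter(docs)
    lock = threading.Lock()

    def worker():
        sent = 0
        while True:
            with lock:  # only one thread advances the iterator at a time
                batch = list(itertools.islice(iterator, batch_size))
            if not batch:
                return sent
            send(batch)  # the slow network call runs outside the lock
            sent += len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        return sum(f.result() for f in futures)
```

Because each worker only takes the next batch when it is ready for one, a slow request does not leave other threads idle, and the source can be a generator that never holds the whole data set in memory. The sketch does not issue a commit; that would still have to be sent once at the end.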

Regards

On Mon, Feb 27, 2012 at 10:46 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> Yes, absolutely. Parallelizing indexing can make a huge difference. How
> you do so will depend on your indexing environment. Most crudely, running
> multiple indexing scripts on different subsets of data up to the
> limitations of your operating system and hardware is how many do it.
> SolrJ has some multithreaded facility, as does DataImportHandler.
> Distributing the indexing to multiple machines, but pointing all to the
> same Solr server, is effectively the same as multi-threading it... push
> documents into Solr from wherever as fast as it can handle it. This is
> definitely how many do this.
>
> Erik
>
> On Feb 27, 2012, at 13:24, Memory Makers wrote:
>
> > Hi,
> >
> > Is there a way to speed up indexing by increasing the number of threads
> > doing the indexing or perhaps by distributing indexing on multiple
> > machines?
> >
> > Thanks.

--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>