My two cents:
 - Pulling is better than pushing; see
   http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update
 - DIH is not thread-safe (https://issues.apache.org/jira/browse/SOLR-3011),
   but there are a few patches for trunk that fix it.
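
For the multi-threaded route Erik describes below, here is a rough sketch in plain Java: split the documents into one slice per thread and let each worker send its slice in batches. DocSink, indexInParallel and the String "documents" are hypothetical placeholders, not SolrJ API; in a real setup the sink would presumably wrap a call to your Solr client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexSketch {

    /** Hypothetical stand-in for whatever sends one batch to Solr. Must be thread-safe. */
    public interface DocSink {
        void send(List<String> batch);
    }

    /**
     * Splits docs into one contiguous slice per thread; each worker sends
     * its slice to the sink in batches. Returns the number of docs sent.
     */
    public static int indexInParallel(List<String> docs, int threads, int batchSize, DocSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            int sliceSize = (docs.size() + threads - 1) / threads; // ceiling division
            for (int start = 0; start < docs.size(); start += sliceSize) {
                List<String> slice = docs.subList(start, Math.min(start + sliceSize, docs.size()));
                results.add(pool.submit(() -> {
                    int sent = 0;
                    for (int i = 0; i < slice.size(); i += batchSize) {
                        List<String> batch = slice.subList(i, Math.min(i + batchSize, slice.size()));
                        sink.send(batch);
                        sent += batch.size();
                    }
                    return sent;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // waits for the worker and rethrows its failure, if any
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException("indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger received = new AtomicInteger();
        // The sink here only counts documents; replace it with a real client call.
        int sent = indexInParallel(docs, 4, 100, batch -> received.addAndGet(batch.size()));
        System.out.println("sent=" + sent + " received=" + received.get());
    }
}
```

The thread count and batch size are things to tune against your own hardware; the sketch only shows the partitioning, not any Solr-specific settings.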

Regards

On Mon, Feb 27, 2012 at 10:46 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
> you do so will depend on your indexing environment.  Most crudely, running
> multiple indexing scripts on different subsets of data up to the
> limitations of your operating system and hardware is how many do it.
> SolrJ has some multithreaded facility, as does DataImportHandler.
>  Distributing the indexing to multiple machines, but pointing all to the
> same Solr server, is effectively the same as multi-threading it.... push
> documents into Solr from wherever as fast as it can handle it.  This is
> definitely how many do this.
>
>        Erik
>
> On Feb 27, 2012, at 13:24 , Memory Makers wrote:
>
> > Hi,
> >
> > Is there a way to speed up indexing by increasing the number of threads
> > doing the indexing or perhaps by distributing indexing on multiple
> > machines?
> >
> > Thanks.
>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
