I'm curious whether anyone has worked out an optimal (or maximum) number of
client processes for indexing. That is, if I can spin up N processes that
each construct a POST to add/update a Solr document, is there a point at
which the number of clients posting simultaneously overwhelms Solr's ability
to keep up with the adds? I know this is very hardware dependent, but I'm
looking for ballpark guidelines. Solr will run in Tomcat on Windows Server
2008 as two instances: one master and one slave, using standard replication.
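
One way to find that ballpark on specific hardware is to measure it: send
the same set of pre-built update messages with 1, 2, 4, ... parallel clients
and watch where throughput stops improving. Below is a minimal Python
sketch, assuming Solr's XML update handler is reachable at the hypothetical
URL http://localhost:8983/solr/update (adjust for the actual Tomcat context
path). The names post_xml, index_concurrently and sweep_clients are
illustrative, not part of any Solr client library, and threads stand in for
separate client processes.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical endpoint (assumption); depends on the Tomcat/Solr deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def post_xml(body: str, url: str = SOLR_UPDATE_URL) -> int:
    """POST one XML update message to Solr and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_concurrently(bodies: Sequence[str], n_clients: int,
                       send: Callable[[str], int] = post_xml) -> List[int]:
    """Send every update message using n_clients parallel workers.

    Returns the status codes in the same order as `bodies`.
    No commit is sent here; issue one separately once indexing finishes.
    """
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return list(pool.map(send, bodies))


def sweep_clients(bodies: Sequence[str], client_counts: Sequence[int],
                  send: Callable[[str], int] = post_xml) -> Dict[int, float]:
    """Measure update messages sent per second for each client count."""
    rates: Dict[int, float] = {}
    for n in client_counts:
        start = time.perf_counter()
        index_concurrently(bodies, n, send)
        rates[n] = len(bodies) / (time.perf_counter() - start)
    return rates
```

Calling sweep_clients(bodies, [1, 2, 4, 8, 16]) against a test index gives a
rough curve; the point where adding clients no longer raises the rate (or
lowers it) is the practical maximum for that machine.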

Related to this, is there a best-practice number of documents to send in a
single POST? (Again, I know it depends on the complexity of the documents,
field types, analyzers/tokenizers, etc.)
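
Batch size is easiest to tune if the client groups documents and sends
several <doc> elements in one <add> message, then times a few candidate
sizes (say 100, 500 and 1000, which are values to measure, not
recommendations). A minimal sketch, assuming documents are flat dicts of
field name to value; batches and add_message are hypothetical helper names.

```python
from typing import Dict, Iterable, Iterator, List
from xml.sax.saxutils import escape, quoteattr


def batches(docs: Iterable[Dict[str, str]],
            size: int) -> Iterator[List[Dict[str, str]]]:
    """Group documents into lists of at most `size` items."""
    batch: List[Dict[str, str]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def add_message(batch: List[Dict[str, str]]) -> str:
    """Build one Solr XML <add> message holding every document in the batch."""
    parts = ["<add>"]
    for doc in batch:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr adds the surrounding quotes; escape handles &, <, >
            parts.append(
                f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)
```

Each message from add_message(b) for b in batches(docs, size) becomes one
POST, so the number of HTTP requests drops by roughly a factor of `size`.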

And finally, what do you find to be the best approach to getting data into
Solr? Assume technology isn't a constraint (except that I don't want to use
EmbeddedSolr) and the only goal is to get documents added/updated as quickly
as possible: POST, XML or CSV document upload, DataImportHandler, or
something else? I'm only looking for raw speed, not architectural factors.
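
For comparing against the XML route, the same documents can be serialized
as CSV for Solr's CSV update handler, which avoids per-field XML markup. A
minimal sketch, assuming that handler is registered (commonly at
/update/csv in example configurations; check solrconfig.xml) and that every
document uses the same flat set of fields; csv_body is a hypothetical
helper name.

```python
import csv
import io
from typing import Dict, List, Sequence


def csv_body(docs: Sequence[Dict[str, str]], fields: List[str]) -> str:
    """Serialize documents as CSV, with a header row naming the fields."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for doc in docs:
        # Missing fields become empty cells; values with commas get quoted.
        writer.writerow([doc.get(f, "") for f in fields])
    return buf.getvalue()
```

Timing the same document set sent as XML and as CSV, with the same client
count and batch size, isolates the effect of the upload format alone.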

So, in a nutshell: all other factors aside, I'm looking for the best approach
to indexing, with raw speed as the only criterion.

Thanks,
Ken
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-best-practices-tp973274p973274.html
Sent from the Solr - User mailing list archive at Nabble.com.