I was curious whether anyone has done work on finding the optimal (or maximum) number of client processes for indexing. That is, if I can spin up N processes that each construct a POST to add/update a Solr document, is there a point at which the number of clients posting simultaneously overwhelms Solr's ability to keep up with the adds? I know this is very hardware dependent, but I am looking for ballpark guidelines. Solr will be running in Tomcat on Windows Server 2008, with two Solr instances, one master and one slave, using standard replication.
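To make the question concrete, here is a minimal sketch (in Python) of what I imagine each client process doing before it posts: grouping documents into batches and wrapping each batch in a Solr XML add message. The names `build_add_xml` and `batches`, the sample fields, and the batch size of 2 are purely illustrative assumptions, and the HTTP POST to the update handler is left out.

```python
from xml.etree.ElementTree import Element, SubElement, tostring


def build_add_xml(docs):
    """Build a Solr XML <add> message from a list of {field: value} dicts."""
    add = Element("add")
    for doc in docs:
        doc_el = SubElement(add, "doc")
        for name, value in doc.items():
            field = SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return tostring(add, encoding="unicode")


def batches(docs, size):
    """Yield successive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


# Hypothetical sample data: five tiny documents, two per POST body.
docs = [{"id": i, "title": f"doc {i}"} for i in range(5)]
bodies = [build_add_xml(batch) for batch in batches(docs, 2)]
```

Each entry in `bodies` would be the payload of one POST, so the two knobs I am asking about are the batch size here and the number of processes running this loop at once.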
Related to this, is there a best-practice number of documents to send in a single POST? (Again, I know it depends on the complexity of the document, field types, analyzers/tokenizers, etc.)

And finally, what do you find to be the best approach to getting data into Solr? Assume the technology isn't an issue (except that I don't want to use EmbeddedSolr) and you just want to get documents added/updated as quickly as possible: POST, XML or CSV document upload, DataImportHandler, or something else? I'm looking for raw speed, not architectural factors.

So, in a nutshell, all other factors aside, I'm looking for the best approach to indexing, with raw speed as the only criterion.

Thanks,
Ken

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-best-practices-tp973274p973274.html
Sent from the Solr - User mailing list archive at Nabble.com.