On 1/7/2015 7:14 PM, Nishanth S wrote:
> Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for
> the moment would be in the 1,000 reads/second range. Guess finding out
> the right number of shards would be my starting point.
I don't think indexing 12,000 docs per second would be too much for Solr to handle, as long as you architect the indexing application properly. You would likely need to have several indexing threads or processes that index in parallel. Solr is fully thread-safe and can handle several indexing requests at the same time. If the indexing application is single-threaded, indexing speed will not reach its full potential.

Be aware that indexing at the same time as querying will reduce the number of queries per second that you can handle. In an environment where both reads and writes are heavy, like you have described, more shards and/or more replicas might be required.

For the query side ... even 1,000 queries per second is a fairly heavy query rate. You're likely to need at least a few replicas, possibly several, to handle that. The type and complexity of the queries you do will make a big difference as well. To handle that query level, I would still recommend only running one shard replica on each server. If you have three shards and three replicas, that means 9 Solr servers.

How many documents will you have in total? You said they are about 6KB each ... but depending on the fieldType definitions (and the analysis chain for TextField types), 6KB might be very large or fairly small. Do you have any idea how large the Solr index will be with all your documents? Estimating that will require indexing a significant percentage of your documents with the actual schema and config that you will use in production.

If I know how many documents you have, how large the full index will be, and can see an example of the more complex queries you will do, I can make *preliminary* guesses about the number of shards you might need. I do have to warn you that it will only be a guess. You'll have to experiment to see what works best.

Thanks,
Shawn
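P.S. The multi-threaded indexing advice above can be sketched roughly as below. This is only an illustration of the batching-plus-parallelism pattern, not a real Solr client: `send_batch` is a hypothetical placeholder where a production indexer would POST each batch to Solr's update handler (e.g. with pysolr or SolrJ's ConcurrentUpdateSolrClient).

```python
# Sketch of a parallel indexing client. Solr is thread-safe, so several
# batches can be submitted concurrently to keep indexing throughput up.
from concurrent.futures import ThreadPoolExecutor

def chunk(docs, size):
    """Split a document list into fixed-size batches."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def send_batch(batch):
    # Hypothetical placeholder: a real client would POST this batch to
    # Solr's /update handler. Here we just report how many docs it held.
    return len(batch)

def index_parallel(docs, batch_size=500, workers=8):
    """Index documents using several concurrent requests; returns doc count."""
    batches = chunk(docs, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches))
```

Batch sizes of a few hundred documents and a handful of worker threads are common starting points; the right numbers depend on document size and how busy the cluster is with queries.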
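The sizing arithmetic above (one shard replica per server, so three shards times three replicas means nine servers) and a rough index-size estimate can be written out as below. The `expansion` ratio is an assumption standing in for whatever you measure by indexing a representative sample with your real schema; it varies widely with fieldType definitions and analysis chains.

```python
# Back-of-the-envelope SolrCloud sizing. Illustrative only; real sizing
# must come from indexing a significant sample of real documents.

def total_servers(shards, replicas):
    """One shard replica per server, so servers = shards * replicas."""
    return shards * replicas

def index_size_bytes(num_docs, avg_doc_bytes, expansion=1.0):
    """Estimate on-disk index size from raw document size.

    expansion is an ASSUMED ratio of index size to raw source size;
    it depends heavily on the schema and must be measured, not guessed.
    """
    return int(num_docs * avg_doc_bytes * expansion)
```

For example, three shards with three replicas each gives `total_servers(3, 3)` = 9 Solr servers, matching the figure in the message above.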