Thanks, guys, for your inputs. I would be looking at around 100 TB of total index size with 5,100 million (5.1 billion) documents over a 30-day period before we purge the indexes. I estimated slightly on the high side, but that's where I expect we will be.
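To put these figures in perspective, here is a rough back-of-the-envelope calculation. It is a minimal sketch, not a sizing method from this thread. It assumes "100 TB" means decimal terabytes, and the 500 GB per-shard target and three copies of each shard are hypothetical placeholders to be replaced with numbers from real indexing tests. The one hard limit it uses is Lucene's per-index maximum of 2,147,483,519 documents (`IndexWriter.MAX_DOCS`), which on its own forces at least three shards for 5.1 billion documents.

```python
import math

# Figures from this thread; "100 TB" is assumed to mean decimal terabytes.
TOTAL_INDEX_BYTES = 100 * 10**12
TOTAL_DOCS = 5_100_000_000
RETENTION_DAYS = 30

# Hypothetical planning targets (assumptions, not Solr defaults or limits).
MAX_SHARD_BYTES = 500 * 10**9   # assumed comfortable index size per shard
REPLICAS_PER_SHARD = 3          # assumed total copies of each shard

# Hard Lucene limit on documents in a single index (IndexWriter.MAX_DOCS).
LUCENE_MAX_DOCS = 2_147_483_519


def sizing(total_bytes, total_docs, days, max_shard_bytes, replicas):
    """Return rough per-document, ingest-rate and shard/server estimates."""
    # Enough shards to respect both the size target and the Lucene doc limit.
    shards = max(math.ceil(total_bytes / max_shard_bytes),
                 math.ceil(total_docs / LUCENE_MAX_DOCS))
    return {
        "bytes_per_doc": total_bytes / total_docs,
        "avg_docs_per_sec": total_docs / (days * 86_400),
        "shards": shards,
        "docs_per_shard": math.ceil(total_docs / shards),
        # One shard replica per server, as recommended in the thread.
        "servers": shards * replicas,
    }


if __name__ == "__main__":
    estimate = sizing(TOTAL_INDEX_BYTES, TOTAL_DOCS, RETENTION_DAYS,
                      MAX_SHARD_BYTES, REPLICAS_PER_SHARD)
    for key, value in estimate.items():
        print(f"{key}: {value:,.0f}")
```

With these placeholder inputs the script reports about 19,608 bytes per document, an average of about 1,968 documents per second over 30 days (compare the 12,000 writes/second quoted below), 200 shards of 25,500,000 documents each, and 600 servers. The byte target, not the document limit, drives the shard count here, so the result is only as good as that assumption.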
Thanks,
Nishanth

On Wed, Jan 7, 2015 at 7:50 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/7/2015 7:14 PM, Nishanth S wrote:
> > Thanks, Shawn and Walter. Yes, those are 12,000 writes/second. Reads for the
> > moment would be around 1,000 reads/second. I guess finding the right
> > number of shards would be my starting point.
>
> I don't think indexing 12000 docs per second would be too much for Solr
> to handle, as long as you architect the indexing application properly.
> You would likely need to have several indexing threads or processes that
> index in parallel. Solr is fully thread-safe and can handle several
> indexing requests at the same time. If the indexing application is
> single-threaded, indexing speed will not reach its full potential.
>
> Be aware that indexing at the same time as querying will reduce the
> number of queries per second that you can handle. In an environment
> where both reads and writes are heavy like you have described, more
> shards and/or more replicas might be required.
>
> For the query side ... even 1000 queries per second is a fairly heavy
> query rate. You're likely to need at least a few replicas, possibly
> several, to handle that. The type and complexity of the queries you do
> will make a big difference as well. To handle that query level, I would
> still recommend only running one shard replica on each server. If you
> have three shards and three replicas, that means 9 Solr servers.
>
> How many documents will you have in total? You said they are about 6KB
> each ... but depending on the fieldType definitions (and the analysis
> chain for TextField types), 6KB might be very large or fairly small.
>
> Do you have any idea how large the Solr index will be with all your
> documents? Estimating that will require indexing a significant
> percentage of your documents with the actual schema and config that you
> will use in production.
>
> If I know how many documents you have, how large the full index will be,
> and can see an example of the more complex queries you will do, I can
> make *preliminary* guesses about the number of shards you might need. I
> do have to warn you that it will only be a guess. You'll have to
> experiment to see what works best.
>
> Thanks,
> Shawn
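The point about using several indexing threads can be sketched as follows. This is a minimal illustration, assuming documents are sent as JSON arrays to Solr's `/update` handler; the URL and the collection name `mycollection` are placeholders, and the batch size and thread count are hypothetical starting points rather than recommendations from this thread. No commit is requested, so visibility depends on the collection's autoCommit/autoSoftCommit settings.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while chunk := list(islice(it, size)):
        yield chunk


def post_to_solr(batch, url="http://localhost:8983/solr/mycollection/update"):
    """Send one batch as a JSON array to Solr's update handler.

    The host and collection name are placeholders. No commit parameter is
    sent; documents become visible according to the commit settings.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return len(batch)


def index_in_parallel(docs, send=post_to_solr, batch_size=1000, threads=8):
    """Send `docs` in batches over `threads` concurrent workers.

    Returns the total number of documents reported as sent.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send, batches(docs, batch_size)))
```

Passing a different `send` callable makes it easy to test the batching without a running Solr. Note that `Executor.map` collects its input eagerly, so for a continuous 12,000 docs/second feed you would want to bound the number of in-flight batches instead of handing it an unbounded generator.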