On 8/1/2015 6:49 PM, Jay Potharaju wrote:
> I currently have a single collection with 40 million documents and index
> size of 25 GB. The collections gets updated every n minutes and as a result
> the number of deleted documents is constantly growing. The data in the
> collection is an amalgamation of more than 1000+ customer records. The
> number of documents per each customer is around 100,000 records on average.
>
> Now that being said, I'm trying to get an handle on the growing deleted
> document size. Because of the growing index size both the disk space and
> memory is being used up. And would like to reduce it to a manageable size.
>
> I have been thinking of splitting the data into multiple core, 1 for each
> customer. This would allow me manage the smaller collection easily and can
> create/update the collection also fast. My concern is that number of
> collections might become an issue. Any suggestions on how to address this
> problem. What are my other alternatives to moving to a multicore
> collections.?
>
> Solr: 4.9
> Index size:25 GB
> Max doc: 40 million
> Doc count:29 million
>
> Replication:4
>
> 4 servers in solrcloud.

Creating 1000+ collections in SolrCloud is definitely problematic. If
you need to choose between a lot of shards and a lot of collections, I
would definitely go with a lot of shards. I would also want a lot of
servers for an index with that many pieces.

https://issues.apache.org/jira/browse/SOLR-7191

I don't think the number of collections or shards makes any difference
to how many deleted documents are in your index.

If you want to clean up a large number of deletes in an index, the best
option is an optimize. An optimize requires a large amount of disk I/O,
so it can be extremely disruptive if the query volume is high. It
should be done when the query volume is at its lowest. For the index
you describe, a nightly or weekly optimize seems like a good option.

Aside from having a lot of deleted documents in your index, what kind of
problems are you trying to solve?

Thanks,
Shawn
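
P.S. For reference, here is a rough Python sketch of the two things discussed above: working out how much of the index is deleted documents from the max doc and doc count figures you posted, and building the update request that asks Solr for an optimize. The host name, port, and collection name are placeholders I made up, and the helper function names are hypothetical; only the `optimize` and `maxSegments` parameters on the `/update` handler come from Solr itself. Treat it as a starting point, not a tested procedure for your cluster.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def deleted_stats(max_doc, num_docs):
    """Return (deleted document count, deleted fraction of max_doc)."""
    deleted = max_doc - num_docs
    return deleted, deleted / max_doc


def optimize_url(base_url, collection, max_segments=1):
    """Build the /update URL that requests an optimize on a collection."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


def run_optimize(base_url, collection):
    """Send the optimize request; returns the HTTP status code.

    This blocks until Solr responds and causes heavy disk I/O,
    so call it only during the quietest part of the day.
    """
    with urlopen(optimize_url(base_url, collection)) as response:
        return response.status


# Figures from the original post: max doc 40 million, doc count 29 million.
deleted, fraction = deleted_stats(40_000_000, 29_000_000)
print(f"{deleted} deleted documents ({fraction:.1%} of max doc)")

# Placeholder host and collection name (assumptions, not from the thread).
print(optimize_url("http://solr1.example.com:8983/solr", "customers"))
```

With the numbers you gave, that works out to 11 million deleted documents, a bit over a quarter of the index. A cron job that calls something like `run_optimize` nightly or weekly would match the schedule suggested above.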