Stop. 2 billion is _per shard_, not per collection. You'll probably never have that many in practice, as search performance would be pretty iffy. Every filterCache entry would occupy up to 0.25 GB, for instance. So just don't expect to fit 2B docs per shard unless you've tested the heck out of it and are doing totally simple searches.
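(Back-of-envelope, and my own arithmetic rather than anything stated in this thread: the ~0.25 GB figure follows if you assume each filterCache entry is a bitset with one bit per document in the shard.)

public class FilterCacheSizing {
    public static void main(String[] args) {
        // Assumption: one filterCache entry = one bit per document in the shard.
        long maxDocsPerShard = 2_000_000_000L;        // the ~2B per-shard (Lucene index) limit
        double bytesPerEntry = maxDocsPerShard / 8.0; // bits -> bytes
        System.out.printf("~%.2f GB per filterCache entry%n", bytesPerEntry / 1e9);
        // prints: ~0.25 GB per filterCache entry
    }
}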
I've seen between 10M and 300M docs on a shard give reasonable
performance. I've never seen 1B docs on a single shard work well in
production. It's possible, but I sure wouldn't plan on it. You have to
test to see what _your_ data and _your_ query patterns allow. See:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Thu, Jul 6, 2017 at 11:10 PM, <calamita.agost...@libero.it> wrote:
>
> Thanks Erick. I used implicit shards. So the right maintenance would be to
> add other shards after a period of time, change the rule that fills the
> partition field in the collection, and drop the old shards when they are
> empty. Is that right? How can I see that the 2-billion-record limit has
> been reached? Is there an API?
> --
> Sent from Libero Mail for Android
>
> Thursday, 06 July 2017, 11:17PM +02:00, from Erick Erickson
> erickerick...@gmail.com:
>
>>Right, every individual shard is limited to 2B records. That does
>>include deleted docs. But I've never seen a shard (a Lucene index,
>>actually) perform satisfactorily at that scale, so while this is the
>>limit, people usually add shards long before reaching it.
>>
>>There is no technical reason to optimize every time; normal segment
>>merging will eventually remove the data associated with deleted
>>documents. You'll carry forward a number of deleted docs, but I
>>usually see it stabilize around 10%-15%.
>>
>>You don't necessarily have to re-index; you can split existing shards.
>>
>>But from your e-mail, it looks like you think you have to do something
>>explicit to reclaim the resources associated with deleted documents.
>>You do not. Optimize is really just a special heavyweight merge.
>>Normal merging happens when you commit, and that process also
>>reclaims the resources held by deleted documents.
>>
>>Best,
>>Erick
>>
>>On Thu, Jul 6, 2017 at 11:59 AM, <calamita.agost...@libero.it> wrote:
>>> Hi,
>>>
>>> I'm working on an application that indexes CDRs (Call Detail Records)
>>> in SolrCloud with 1 collection and 3 shards.
>>>
>>> Every day the application indexes 30 million CDRs.
>>>
>>> I have a purge application that deletes records older than 10 days and
>>> then calls OPTIMIZE, so the collection keeps only about 300 million CDRs.
>>>
>>> Do you know if there is a limit on the maximum number of documents per
>>> shard, including deleted documents?
>>>
>>> I read in some blogs that there is a limit of 2 billion documents per
>>> shard, including deleted documents; that is, even if the collection is
>>> now empty, if I have already indexed 6 billion CDRs in it (2 billion on
>>> each of the 3 shards), I'll get an error. Is that true? Do I have to
>>> recreate the collection?
>>>
>>> I see that when I delete records, Apache Solr frees space on disk.
>>>
>>> Thanks.
>>>
>>> Agostino
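P.S. on the "Is there an API?" question above: one way to watch how close a
shard is getting to the limit is to ask the Luke handler of each shard's core
for its index stats; the maxDoc it reports (live docs plus not-yet-merged-away
deleted docs) is the number that counts against the per-shard limit. A minimal
sketch, assuming plain Java 11+ HTTP, a Solr node on localhost:8983, and a
made-up core name cdr_shard1_replica_n1 (adjust for your own cores):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ShardDocCountCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical core name; list your real ones via /solr/admin/cores?action=STATUS
        String core = "cdr_shard1_replica_n1";
        URI uri = URI.create("http://localhost:8983/solr/" + core
                + "/admin/luke?numTerms=0&wt=json");
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(uri).build(),
                      HttpResponse.BodyHandlers.ofString());
        // The "index" section of the response includes numDocs, deletedDocs and
        // maxDoc; maxDoc (= numDocs + deletedDocs) is the one to keep an eye on.
        System.out.println(resp.body());
    }
}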