The deleted records will be automatically cleaned up in the background. You don’t have to do anything.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
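A minimal sketch, assuming a SolrCloud node listening on localhost:8983, of how those per-shard counts can be watched: the CoreAdmin STATUS call reports numDocs, deletedDocs and maxDoc for every core it hosts, and maxDoc (live documents plus deletions not yet merged away) is the figure that approaches the per-shard limit discussed below.

import requests

def shard_doc_counts(solr_url="http://localhost:8983/solr"):
    """Print numDocs / deletedDocs / maxDoc for every core hosted on this node."""
    # CoreAdmin STATUS returns per-core index statistics under "status".
    status = requests.get(
        solr_url + "/admin/cores", params={"action": "STATUS", "wt": "json"}
    ).json()["status"]
    for core, info in status.items():
        idx = info["index"]
        # maxDoc = numDocs + deletedDocs; it is what creeps toward the ~2B Lucene limit.
        print("%s: numDocs=%s deletedDocs=%s maxDoc=%s"
              % (core, idx["numDocs"], idx["deletedDocs"], idx["maxDoc"]))

if __name__ == "__main__":
    shard_doc_counts()

Run alongside the nightly purge, this should show deletedDocs rising after each delete pass and falling again as background segment merges reclaim the space.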
> On Jul 7, 2017, at 1:25 PM, calamita.agost...@libero.it wrote:
>
> Sorry, I know that the size limit is per shard and not per collection. My doubt is:
> if every day I insert 10M documents into a shard and delete 10M documents
> (the old ones), do I have to add a new shard after 20 days or not? The number of
> undeleted documents is always the same (100M, for example).
> Thanks.
> Agos.
> --
> Sent from Libero Mail for Android
>
> Friday, 07 July 2017, 07:51PM +02:00 from Erick Erickson erickerick...@gmail.com:
>
>> You seem to be confusing shards with collections.
>>
>> You can have 100 shards each with 100M documents for a total of 10B
>> documents in the _collection_, but no individual shard has more than
>> 100M docs.
>>
>> Best,
>> Erick
>>
>> On Fri, Jul 7, 2017 at 10:02 AM, <calamita.agost...@libero.it> wrote:
>>>
>>> Ok. I will never have more than 100 million documents per shard at
>>> any one time, because I delete old documents every night to keep the
>>> last 10 days. What I don't understand is whether I have to add shards
>>> after months of indexing (inserts plus deletes can reach 2B after a
>>> few months) or leave the same shards forever.
>>> --
>>> Sent from Libero Mail for Android
>>>
>>> Friday, 07 July 2017, 06:46PM +02:00 from Erick Erickson erickerick...@gmail.com:
>>>
>>>> Stop. 2 billion is _per shard_, not per collection. You'll probably
>>>> never have that many in practice, as the search performance would be
>>>> pretty iffy. Every filterCache entry would occupy up to 0.25G, for
>>>> instance. So just don't expect to fit 2B docs per shard unless you've
>>>> tested the heck out of it and are doing totally simple searches.
>>>>
>>>> I've seen between 10M and 300M docs on a shard give reasonable
>>>> performance. I've never seen 1B docs on a single shard work well in
>>>> production. It's possible, but I sure wouldn't plan on it.
>>>>
>>>> You have to test to see what _your_ data and _your_ query patterns
>>>> allow. See:
>>>> https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Thu, Jul 6, 2017 at 11:10 PM, <calamita.agost...@libero.it> wrote:
>>>>>
>>>>> Thanks, Erick. I used implicit shards. So the right maintenance could
>>>>> be to add other shards after a period of time, change the rule that
>>>>> fills the partition field in the collection, and drop the old shards
>>>>> when they are empty. Is that right? How can I see that the 2 billion
>>>>> records limit has been reached? Is there an API?
>>>>> --
>>>>> Sent from Libero Mail for Android
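The maintenance Agostino outlines above for an implicit-router collection, adding a shard for each new period and dropping the oldest shard once its documents are gone, maps onto the Collections API roughly as in the following sketch; the collection name "cdr", the per-day shard names, and the host are assumptions made only for illustration.

import requests

SOLR = "http://localhost:8983/solr"

def collections_api(action, **params):
    """Call the Collections API and raise if Solr reports a failure."""
    resp = requests.get(
        SOLR + "/admin/collections",
        params=dict(action=action, wt="json", **params),
    ).json()
    if resp["responseHeader"]["status"] != 0:
        raise RuntimeError(resp)
    return resp

# Add the shard that will receive the next day's CDRs (CREATESHARD only
# works for collections using the implicit router) ...
collections_api("CREATESHARD", collection="cdr", shard="cdr_2017_07_08")

# ... and drop the oldest shard once its documents have been deleted.
collections_api("DELETESHARD", collection="cdr", shard="cdr_2017_06_28")

With the implicit router, each document's target shard comes from the _route_ parameter or from the field named by router.field at collection-creation time, which is presumably the "partition field" mentioned above.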
>>>>>
>>>>> Thursday, 06 July 2017, 11:17PM +02:00 from Erick Erickson erickerick...@gmail.com:
>>>>>
>>>>>> Right, every individual shard is limited to 2B records, and that does
>>>>>> include deleted docs. But I've never seen a shard (a Lucene index,
>>>>>> actually) perform satisfactorily at that scale, so while this is a
>>>>>> limit, people usually add shards long before reaching it.
>>>>>>
>>>>>> There is no technical reason to optimize every time; normal segment
>>>>>> merging will eventually remove the data associated with deleted
>>>>>> documents. You'll carry forward a number of deleted docs, but I
>>>>>> usually see it stabilize around 10%-15%.
>>>>>>
>>>>>> You don't necessarily have to re-index; you can split existing shards.
>>>>>>
>>>>>> But from your e-mail, it looks like you think you have to do something
>>>>>> explicit to reclaim the resources associated with deleted documents.
>>>>>> You do not have to do this. Optimize is really a special heavyweight
>>>>>> merge. Normal merging happens when you do a commit, and that process
>>>>>> also reclaims the deleted-document resources.
>>>>>>
>>>>>> Best,
>>>>>> Erick
>>>>>>
>>>>>> On Thu, Jul 6, 2017 at 11:59 AM, <calamita.agost...@libero.it> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm working on an application that indexes CDRs (Call Detail Records) in
>>>>>>> SolrCloud with 1 collection and 3 shards.
>>>>>>>
>>>>>>> Every day the application indexes 30 million CDRs.
>>>>>>>
>>>>>>> I have a purge application that deletes records older than 10 days and
>>>>>>> then calls OPTIMIZE, so the collection keeps only 300 million CDRs.
>>>>>>>
>>>>>>> Do you know if there is a limit on the maximum number of documents per
>>>>>>> shard, including deleted documents?
>>>>>>>
>>>>>>> I read in some blogs that there is a limit of 2 billion documents per
>>>>>>> shard, including deleted documents; that is, I could have an empty
>>>>>>> collection, but if I had already indexed 6 billion CDRs (2 billion per
>>>>>>> shard across 3 shards) in that collection, I would get an error. Is that
>>>>>>> true? Do I have to recreate the collection?
>>>>>>>
>>>>>>> I see that when I delete records, Apache Solr frees space on disk.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Agostino
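A minimal sketch of the nightly purge described at the start of this thread, adjusted to follow Erick's advice and skip the explicit optimize: it deletes CDRs older than ten days with a delete-by-query and lets ordinary segment merging reclaim the space. The collection name "cdr" and the date field "call_start" are assumptions, since the thread never names them.

import requests

SOLR = "http://localhost:8983/solr"

def purge_old_cdrs(collection="cdr", date_field="call_start", keep_days=10):
    """Delete documents older than keep_days; no optimize afterwards."""
    resp = requests.post(
        SOLR + "/" + collection + "/update",
        params={"commit": "true", "wt": "json"},
        # JSON update command: delete everything older than the retention window.
        json={"delete": {"query": "%s:[* TO NOW-%dDAYS]" % (date_field, keep_days)}},
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(purge_old_cdrs())

The commit=true here issues a hard commit so the example is self-contained; in practice the collection's existing autoCommit settings may already cover that.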