OK. I will never have more than 100 million documents per shard at any one time, because I delete the old documents every night to keep only the last 10 days. What I don't understand is whether I have to add shards after months of indexing (inserts plus deletes could reach 2B per shard after a few months) or whether I can keep the same shards forever.
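For context, the nightly purge amounts to a delete-by-query followed by a commit. A minimal sketch of it against Solr's JSON update API; the collection name "cdr", the date field "call_time" and the host are placeholders, not the real names:

# Nightly purge: remove CDRs older than 10 days with a delete-by-query.
# Assumed names: collection "cdr", date field "call_time", node at localhost:8983.
import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "cdr"

def purge_old_cdrs():
    # Solr date math: everything with call_time before NOW-10DAYS is deleted.
    body = {"delete": {"query": "call_time:[* TO NOW-10DAYS]"}}
    resp = requests.post(f"{SOLR}/{COLLECTION}/update",
                         json=body,
                         params={"commit": "true"})  # a normal commit is enough
    resp.raise_for_status()
    print(resp.json())

if __name__ == "__main__":
    purge_old_cdrs()

(As Erick points out below, the commit and normal segment merging already reclaim the space of the deleted documents, so an extra OPTIMIZE call is not required.)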
--
Sent from Libero Mail for Android

On Friday, 07 July 2017 at 06:46 PM +02:00, Erick Erickson <erickerick...@gmail.com> wrote:

>Stop.. 2 billion is _per shard_, not per collection. You'll probably
>never have that many in practice, as the search performance would be
>pretty iffy. Every filterCache entry would occupy up to 0.25G, for
>instance. So just don't expect to fit 2B docs per shard unless you've
>tested the heck out of it and are doing totally simple searches.
>
>I've seen between 10M and 300M docs on a shard give reasonable
>performance. I've never seen 1B docs on a single shard work well in
>production. It's possible, but I sure wouldn't plan on it.
>
>You have to test to see what _your_ data and _your_ query patterns
>allow. See:
>https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
>Best,
>Erick
>
>On Thu, Jul 6, 2017 at 11:10 PM, <calamita.agost...@libero.it> wrote:
>>
>> Thanks Erick. I use implicit shards. So the right maintenance would be to
>> add new shards after a period of time, change the rule that fills the
>> partition field of the collection, and drop the old shards once they are
>> empty. Is that right? How can I see when the 2 billion record limit is
>> reached? Is there an API?
>> --
>> Sent from Libero Mail for Android
>>
>> On Thursday, 06 July 2017 at 11:17 PM +02:00, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>>>Right, every individual shard is limited to 2B records. That does
>>>include deleted docs. But I've never seen a shard (a Lucene index,
>>>actually) perform satisfactorily at that scale, so while this is a hard
>>>limit, people usually add shards long before reaching it.
>>>
>>>There is no technical reason to optimize every time; normal segment
>>>merging will eventually remove the data associated with deleted
>>>documents. You'll carry forward a number of deleted docs, but I
>>>usually see it stabilize around 10%-15%.
>>>
>>>You don't necessarily have to re-index; you can split existing shards.
>>>
>>>But from your e-mail, it looks like you think you have to do something
>>>explicit to reclaim the resources associated with deleted documents.
>>>You do not have to do this. Optimize is really a special heavyweight
>>>merge. Normal merging happens when you do a commit, and that process
>>>also reclaims the deleted document resources.
>>>
>>>Best,
>>>Erick
>>>
>>>On Thu, Jul 6, 2017 at 11:59 AM, <calamita.agost...@libero.it> wrote:
>>>> Hi,
>>>>
>>>> I'm working on an application that indexes CDRs (Call Detail Records)
>>>> into SolrCloud, with 1 collection and 3 shards.
>>>>
>>>> Every day the application indexes 30 million CDRs.
>>>>
>>>> I have a purge application that deletes records older than 10 days and
>>>> calls OPTIMIZE, so the collection keeps only about 300 million CDRs.
>>>>
>>>> Do you know if there is a limit on the maximum number of documents per
>>>> shard, including deleted documents?
>>>>
>>>> I read in some blogs that there is a limit of 2 billion documents per
>>>> shard including deleted documents, meaning that even with an empty
>>>> collection, if I have already indexed 6 billion CDRs (2 billion on each
>>>> of the 3 shards) into that collection, I'll get an error. Is that true?
>>>> Do I have to recreate the collection?
>>>>
>>>> I see that when I delete records, Apache Solr frees space on disk.
>>>>
>>>> Thanks.
>>>>
>>>> Agostino
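To follow up on the "Is there an API?" question quoted above: the per-core number to watch is maxDoc (live documents plus deleted documents that have not yet been merged away), since that is what counts against Lucene's ~2.14 billion per-index limit. A small sketch that reads it from the CoreAdmin STATUS endpoint; the host and the 80% warning threshold are placeholders of mine, not from the thread:

# Check how close each core on a node is to Lucene's ~2.14 billion doc limit.
# maxDoc = live docs + not-yet-merged-away deleted docs, which is what the limit applies to.
# Host/port and the 80% warning threshold are placeholders.
import requests

NODE = "http://localhost:8983/solr"
LUCENE_MAX_DOCS = 2_147_483_519   # IndexWriter.MAX_DOCS

def report_core_sizes():
    status = requests.get(f"{NODE}/admin/cores",
                          params={"action": "STATUS", "wt": "json"}).json()
    for core, info in status["status"].items():
        idx = info["index"]
        max_doc = idx["maxDoc"]      # includes deleted docs
        num_docs = idx["numDocs"]    # live docs only
        pct = 100.0 * max_doc / LUCENE_MAX_DOCS
        flag = "  <-- consider adding/splitting shards" if pct > 80 else ""
        print(f"{core}: numDocs={num_docs} maxDoc={max_doc} ({pct:.1f}% of limit){flag}")

if __name__ == "__main__":
    report_core_sizes()

The same numDocs/maxDoc/deletedDocs figures are also shown per core in the Solr Admin UI.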
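And since the collection uses the implicit router, the maintenance described above (add a new shard for the next period, point the routing field at it, drop an old shard once its documents have been purged) maps onto the CREATESHARD and DELETESHARD actions of the Collections API. A sketch under the same placeholder names; the weekly shard-naming scheme is just an example, not the real one:

# Rotate shards in an implicit-router collection: create a shard for the next
# period, and drop an old shard once all of its documents have been purged.
# Collection name, shard names and host are placeholders.
import requests

NODE = "http://localhost:8983/solr"
COLLECTION = "cdr"

def create_shard(shard_name):
    # CREATESHARD is only allowed for collections using the implicit router.
    r = requests.get(f"{NODE}/admin/collections", params={
        "action": "CREATESHARD", "collection": COLLECTION,
        "shard": shard_name, "wt": "json"})
    r.raise_for_status()
    return r.json()

def delete_empty_shard(shard_name):
    # Sanity check: restrict a *:* count to this logical shard and only drop it
    # if it no longer holds any documents.
    count = requests.get(f"{NODE}/{COLLECTION}/select", params={
        "q": "*:*", "rows": 0, "shards": shard_name, "wt": "json",
    }).json()["response"]["numFound"]
    if count:
        raise RuntimeError(f"{shard_name} still has {count} docs; not deleting it")
    r = requests.get(f"{NODE}/admin/collections", params={
        "action": "DELETESHARD", "collection": COLLECTION,
        "shard": shard_name, "wt": "json"})
    r.raise_for_status()
    return r.json()

# Example: start indexing into a new weekly shard, then retire an old one.
# create_shard("shard_2017_28")
# delete_empty_shard("shard_2017_26")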