You seem to be confusing shards with collections.

You can have 100 shards each with 100M documents for a total of 10B
documents in the _collection_, but no individual shard has more than
100M docs.
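
A quick way to see the difference in practice is to compare the distributed
count for the whole collection with a per-core count taken with distrib=false.
A minimal sketch in Python (host, port, collection and core names here are
placeholders, not taken from this thread):

    import requests  # pip install requests

    SOLR = "http://localhost:8983/solr"

    # Collection-level count: the query fans out to every shard.
    total = requests.get(SOLR + "/mycollection/select",
                         params={"q": "*:*", "rows": 0, "wt": "json"}).json()
    print("collection numFound:", total["response"]["numFound"])

    # Single-shard count: hit one core directly and disable distribution.
    shard = requests.get(SOLR + "/mycollection_shard1_replica1/select",
                         params={"q": "*:*", "rows": 0, "wt": "json",
                                 "distrib": "false"}).json()
    print("shard1 numFound:", shard["response"]["numFound"])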

Best,
Erick

On Fri, Jul 7, 2017 at 10:02 AM,  <calamita.agost...@libero.it> wrote:
>
> Ok. I will never have more than 100 million documents per shard at the same
> time, because I delete old documents every night to keep only the last 10
> days. What I don't understand is whether I have to add shards after months of
> indexing (inserts and deletes can reach 2B after a few months) or leave the
> same shards forever.
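>
> If you stay with the implicit router, one option is rolling shards: create a
> new shard for each new period and drop the oldest shard once its documents
> have aged out, instead of deleting documents inside a long-lived shard. A
> hedged sketch against the Collections API (collection and shard names below
> are made up for illustration):
>
>     import requests  # pip install requests
>
>     SOLR = "http://localhost:8983/solr"
>     COLLECTION = "cdr"
>
>     # Add a shard for the next day (CREATESHARD works only with the
>     # implicit router).
>     requests.get(SOLR + "/admin/collections",
>                  params={"action": "CREATESHARD", "collection": COLLECTION,
>                          "shard": "day_2017_07_08", "wt": "json"})
>
>     # Drop the shard that has fallen out of the 10-day window.
>     requests.get(SOLR + "/admin/collections",
>                  params={"action": "DELETESHARD", "collection": COLLECTION,
>                          "shard": "day_2017_06_28", "wt": "json"})
>
> The indexer then writes each day's router-field value to match the shard the
> document should land in.
>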
> --
> Sent from Libero Mail for Android. Friday, 07 July 2017, 06:46PM +02:00, from
> Erick Erickson  erickerick...@gmail.com :
>
>>Stop. 2 billion is _per shard_, not per collection. You'll probably
>>never have that many in practice, as the search performance would be
>>pretty iffy. Every filterCache entry would occupy up to 0.25G, for
>>instance. So just don't expect to fit 2B docs per shard unless you've
>>tested the heck out of it and are doing totally simple searches.
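>>
>>The 0.25G figure follows from the filterCache keeping, in the worst case, one
>>bit per document in the shard, i.e. roughly maxDoc / 8 bytes per cached
>>filter. A back-of-the-envelope check:
>>
>>    # Worst-case size of a single filterCache entry (a bitset over all docs).
>>    max_doc = 2147483648              # ~2B documents, the per-shard ceiling
>>    entry_bytes = max_doc / 8         # one bit per document
>>    print(entry_bytes / (1024 ** 3))  # ~0.25 GiB per cached filter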
>>
>>I've seen between 10M and 300M docs on a shard give reasonable
>>performance. I've never seen 1B docs on a single shard work well in
>>production. It's possible, but I sure wouldn't plan on it.
>>
>>You have to test to see what _your_ data and _your_ query patterns
>>allow. See:  
>>https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>>Best,
>>Erick
>>
>>On Thu, Jul 6, 2017 at 11:10 PM,  < calamita.agost...@libero.it > wrote:
>>>
>>> Thanks Erick. I use implicit shards. So the right maintenance could be to
>>> add other shards after a period of time, change the rule that fills the
>>> partition field in the collection, and drop old shards when they are empty.
>>> Is that right? How can I see that the 2 billion record limit has been
>>> reached? Is there an API?
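>>>
>>> There is no single "limit reached" flag, but the CoreAdmin STATUS API
>>> reports maxDoc (live plus deleted documents) per core, which you can compare
>>> against the ~2.1 billion ceiling yourself. A rough sketch, assuming a local
>>> node (host and port are placeholders):
>>>
>>>     import requests  # pip install requests
>>>
>>>     resp = requests.get("http://localhost:8983/solr/admin/cores",
>>>                         params={"action": "STATUS", "wt": "json"}).json()
>>>
>>>     LIMIT = 2147483647  # roughly Lucene's hard cap on docs per index
>>>     for name, core in resp["status"].items():
>>>         max_doc = core["index"]["maxDoc"]    # includes deleted docs
>>>         num_docs = core["index"]["numDocs"]  # live docs only
>>>         pct = 100.0 * max_doc / LIMIT
>>>         print(name, num_docs, max_doc, "%.1f%% of limit" % pct)
>>>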
>>> --
>>> Sent from Libero Mail for Android. Thursday, 06 July 2017, 11:17PM +02:00,
>>> from Erick Erickson  erickerick...@gmail.com :
>>>
>>>>Right, every individual shard is limited to 2B records. That does
>>>>include deleted docs. But I've never seen a shard (a Lucene index,
>>>>actually) perform satisfactorily at that scale, so while that is the
>>>>hard limit, people usually add shards long before reaching it.
>>>>
>>>>There is no technical reason to optimize every time; normal segment
>>>>merging will eventually remove the data associated with deleted
>>>>documents. You'll carry forward a number of deleted docs, but I
>>>>usually see it stabilize around 10%-15%.
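>>>>
>>>>If you want to watch that on your own index, the Luke handler reports live
>>>>and total document counts per core, so the deleted-doc ratio is easy to
>>>>track over time. A small sketch (the core name is a placeholder):
>>>>
>>>>    import requests  # pip install requests
>>>>
>>>>    luke = requests.get(
>>>>        "http://localhost:8983/solr/cdr_shard1_replica1/admin/luke",
>>>>        params={"numTerms": 0, "wt": "json"}).json()
>>>>
>>>>    num_docs = luke["index"]["numDocs"]  # live documents
>>>>    max_doc = luke["index"]["maxDoc"]    # live + deleted
>>>>    print("deleted: %.1f%%" % (100.0 * (max_doc - num_docs) / max_doc))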
>>>>
>>>>You don't necessarily have to re-index, you can split existing shards.
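>>>>
>>>>For a collection on the default compositeId router, that is the SPLITSHARD
>>>>call in the Collections API; the parent shard stays in place until the two
>>>>new sub-shards are active. A hedged sketch (collection and shard names are
>>>>placeholders):
>>>>
>>>>    import requests  # pip install requests
>>>>
>>>>    requests.get("http://localhost:8983/solr/admin/collections",
>>>>                 params={"action": "SPLITSHARD", "collection": "cdr",
>>>>                         "shard": "shard1", "wt": "json"})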
>>>>
>>>>But from your e-mail, it looks like you think you have to do something
>>>>explicit to reclaim the resources associated with deleted documents.
>>>>You do not have to do this. Optimize is really a special heavyweight
>>>>merge. Normal merging happens when you do a commit and that process
>>>>also reclaims the deleted document resources.
>>>>
>>>>Best,
>>>>Erick
>>>>
>>>>On Thu, Jul 6, 2017 at 11:59 AM,  <  calamita.agost...@libero.it > wrote:
>>>>> Hi,
>>>>>
>>>>> I'm working on an application that indexes CDRs (Call Detail Records) in
>>>>> SolrCloud with 1 collection and 3 shards.
>>>>>
>>>>> Every day the application indexes 30 million CDRs.
>>>>>
>>>>> I have a purge application that deletes records older than 10 days and
>>>>> calls OPTIMIZE, so the collection keeps only about 300 million CDRs.
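>>>>>
>>>>> For reference, a nightly purge like that is typically a delete-by-query
>>>>> using Solr date math, followed by a commit. A minimal sketch (the field,
>>>>> collection, host and port names are assumptions, not taken from this
>>>>> thread):
>>>>>
>>>>>     import requests  # pip install requests
>>>>>
>>>>>     # Delete every CDR whose timestamp is older than 10 days, then commit.
>>>>>     requests.post("http://localhost:8983/solr/cdr/update",
>>>>>                   params={"commit": "true", "wt": "json"},
>>>>>                   json={"delete": {"query": "call_time:[* TO NOW-10DAYS]"}})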
>>>>>
>>>>> Do you know if there is a limit on the maximum number of documents per
>>>>> shard, including deleted documents?
>>>>>
>>>>> I read in some blogs that there is a limit of 2 billion per shard,
>>>>> including deleted documents; that is, even if the collection is now
>>>>> empty, once I have indexed 6 billion CDRs (2 billion on each of the 3
>>>>> shards) in that collection I'll get an error. Is that true? Do I have to
>>>>> recreate the collection?
>>>>>
>>>>> I see that when I delete records, Apache Solr frees space on disk.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Agostino
>>>>>
>>>>>
>>>>>
