Re: Shard size variation

2018-05-03 Thread Erick Erickson
"We generally try not to change defaults when possible, sounds like there will be new default settings for the segment sizes and merging policy?" usually wise. No, there won't be any change in the default settings. What _will_ change is the behavior of a forceMerge (aka optimize) and expungeDele

Re: Shard size variation

2018-05-03 Thread Michael Joyner
We generally try not to change defaults when possible, sounds like there will be new default settings for the segment sizes and merging policy? Am I right in thinking that expungeDeletes will (in theory) be a 7.4 forwards option? On 05/02/2018 01:29 PM, Erick Erickson wrote: You can always

Re: Shard size variation

2018-05-02 Thread Erick Erickson
You can always increase the maximum segment size. For large indexes that should reduce the number of segments. But watch your indexing stats, I can't predict the consequences of bumping it to 100G, for instance. I'd _expect_ bursty I/O when those large segments started to be created or merged
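The "maximum segment size" Erick refers to is the TieredMergePolicy's `maxMergedSegmentMB` setting, configured in `solrconfig.xml`. A sketch of the change, assuming a Solr 6.x+ style `mergePolicyFactory` element; the 20 GB value here is purely illustrative (the default is 5000 MB, i.e. 5 GB), not a recommendation:

```xml
<!-- solrconfig.xml: raise the largest segment the merge policy will produce.
     Default is 5000 MB (5 GB); 20480 MB (20 GB) shown only as an example. -->
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergedSegmentMB">20480</int>
  </mergePolicyFactory>
</indexConfig>
```

Larger segments mean fewer of them for the same index size, but, as noted above, merges involving such segments move more data at once, hence the expected bursty I/O.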

Re: Shard size variation

2018-05-02 Thread Michael Joyner
The main reason we go this route is that after a while (with default settings) we end up with hundreds of shards and performance of course drops abysmally as a result. By using a stepped optimize a) we don't run into the "we need 3x+ head room" issue, b) optimize performance penalty during opt

Re: Shard size variation

2018-04-30 Thread Shawn Heisey
On 4/30/2018 2:56 PM, Michael Joyner wrote: > Based on experience, 2x head room is not always enough, > sometimes not even 3x, if you are optimizing from many segments down > to 1 segment in a single go. In all situations a user is likely to encounter in the wild, having enough extra disk

Re: Shard size variation

2018-04-30 Thread Antony A
Thank you all. I have around 70% free space in production. I will compute for the additional fields. Sent from my mobile. Please excuse any typos. > On Apr 30, 2018, at 5:10 PM, Erick Erickson wrote: > > There's really no good way to purge deleted documents from the index > other than to wait

Re: Shard size variation

2018-04-30 Thread Erick Erickson
There's really no good way to purge deleted documents from the index other than to wait until merging happens. Optimize/forceMerge and expungeDeletes both suffer from the problem that they create massive segments that then stick around for a very long time, see: https://lucidworks.com/2017/10/13/s

Re: Shard size variation

2018-04-30 Thread Michael Joyner
Based on experience, 2x head room is not always enough, sometimes not even 3x, if you are optimizing from many segments down to 1 segment in a single go. We have however figured out a way that can work with as little as 51% free space via the following iteration cycle: public void so
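Michael's iteration-cycle code is truncated above. The following is a hypothetical sketch of the stepped-optimize idea it describes: instead of forceMerging from N segments straight down to 1 (which can need 2-3x the index size in free disk), repeatedly halve the segment-count target so each merge pass only rewrites part of the index. Class and method names are illustrative, and the SolrJ call appears only as a comment; this is not the original poster's code.

```java
import java.util.ArrayList;
import java.util.List;

public class SteppedOptimize {

    // Build the descending maxSegments targets: N/2, N/4, ..., 1.
    // Each step would be handed to a forceMerge/optimize call, waiting
    // for the previous pass to finish before issuing the next one.
    static List<Integer> stepSchedule(int currentSegments) {
        List<Integer> steps = new ArrayList<>();
        int target = currentSegments;
        while (target > 1) {
            target = Math.max(target / 2, 1);
            steps.add(target);
        }
        return steps;
    }

    public static void main(String[] args) {
        // With SolrJ (assumed), each step would be issued roughly as:
        //   solrClient.optimize(collection, true, true, target);
        // so only a fraction of the index is rewritten per pass, keeping
        // peak temporary disk usage well below the 2-3x worst case.
        System.out.println(stepSchedule(40)); // [20, 10, 5, 2, 1]
    }
}
```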

Re: Shard size variation

2018-04-30 Thread Walter Underwood
You need 2X the minimum index size in disk space anyway, so don’t worry about keeping the indexes as small as possible. Worry about having enough headroom. If your indexes are 250 GB, you need 250 GB of free space. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (m

Re: Shard size variation

2018-04-30 Thread Antony A
Thanks Erick/Deepak. The cloud is running on bare metal (128 GB/24 cpu). Is there an option to run a compaction on the data files to make the size equal on both the clouds? I am trying to find all the options before I add the new fields into the production cloud. Thanks AA On Mon, Apr 30, 2018 at 10:4

Re: Shard size variation

2018-04-30 Thread Erick Erickson
Anthony: You are probably seeing the results of removing deleted documents from the shards as they're merged. Even on replicas in the same _shard_, the size of the index on disk won't necessarily be identical. This has to do with which segments are selected for merging, which are not necessarily c

Re: Shard size variation

2018-04-30 Thread Deepak Goel
Could you please also give the machine details of the two clouds you are running? Deepak "The greatness of a nation can be judged by the way its animals are treated. Please stop cruelty to Animals, become a Vegan" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool Lin

Re: Shard size variation

2018-04-30 Thread Antony A
Hi Shawn, The cloud is running version 6.2.1 with ClassicIndexSchemaFactory. The sum of sizes from the admin UI across all the shards is around 265 GB vs 224 GB between the two clouds. I created the collection using "numShards", so compositeId router. If you need more information, please let me know. Than

Re: Shard size variation

2018-04-30 Thread Shawn Heisey
On 4/30/2018 9:51 AM, Antony A wrote: I am running two separate solr clouds. I have 8 shards in each with a total of 300 million documents. Both clouds are indexing documents from the same source/configuration. I am noticing there is a difference in the size of the collection between them

Shard size variation

2018-04-30 Thread Antony A
Hi all, I am trying to find if anyone has a suggestion for the below. I am running two separate solr clouds. I have 8 shards in each with a total of 300 million documents. Both clouds are indexing documents from the same source/configuration. I am noticing there is a difference in the size