25% overhead is pretty good. It is easy for a merge to need almost double the space of a minimum-sized index. It is possible to use 3X the space.
Don’t try to use the least possible disk space. If there isn’t enough free space on the disk, Solr cannot merge the big indexes. Ever. That may be what has happened here. Make sure the nodes have at least 100 GB of free space on the volumes, maybe 150. That space is not “wasted” or “unused”. It is necessary for merges.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 28, 2016, at 12:20 AM, Arkadi Colson <ark...@smartbit.be> wrote:
>
> The index size of 1 shard is about 125GB and we are running 11 shards with replication factor 2, so it's a lot of data. The deletions percentage at the bottom of the segments page is around 25%, so it's quite some space which we could recover. That's why I was looking at an optimize.
>
> Do you have any idea why the merge policy does not merge away the deletions? Should I tweak some parameters somehow? It's a default installation using the default settings and parameters. If you need more info, just let me know...
>
> Thx!
>
> On 27-10-16 17:40, Erick Erickson wrote:
>> Why do you think you need to get rid of the deleted data? During normal indexing, these will be "merged away". Optimizing has some downsides for continually changing indexes; in particular, since the default TieredMergePolicy tries to merge "like size" segments, deletions will accumulate in your one large segment and the percentage of deleted documents may get even higher.
>>
>> Unless there's some measurable performance gain that the users will notice, I'd just leave this alone.
>>
>> The exception here is if you have, say, an index that changes rarely, in which case optimizing then makes more sense.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 27, 2016 at 6:56 AM, Arkadi Colson <ark...@smartbit.be> wrote:
>> Thanks for the answer!
>> Do you know if there is a way to trigger an optimize for only 1 shard and not the whole collection at once?
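[Editor's note: to answer the single-shard question above, an optimize request sent to one replica's core with distrib=false should stay on that core instead of fanning out to the whole collection. A minimal sketch; the host and core name below are placeholders (check the actual core names in the Solr admin UI), and running it needs the free-merge-space Walter describes:]

```shell
# Hypothetical values -- substitute your own host and core name.
SOLR_HOST="localhost:8983"
CORE="collection1_shard1_replica1"

# distrib=false keeps the request on this one core only;
# maxSegments bounds how far the merge goes (1 = full optimize).
URL="http://${SOLR_HOST}/solr/${CORE}/update?optimize=true&distrib=false&maxSegments=1"
echo "$URL"

# To actually trigger the (long-running) optimize:
# curl "$URL"
```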
>>
>> On 27-10-16 15:30, Pushkar Raste wrote:
>>> Try a commit with expungeDeletes="true".
>>>
>>> I am not sure if it will merge old segments that have deleted documents.
>>>
>>> In the worst case you can 'optimize' your index, which should take care of removing deleted documents.
>>>
>>> On Oct 27, 2016 4:20 AM, "Arkadi Colson" <ark...@smartbit.be> wrote:
>>> Hi
>>>
>>> [screenshot of the segments page was attached here]
>>>
>>> As you can see in the screenshot above, in the oldest segments there are a lot of deletions. In total the shard has about 26% deletions. How can I get rid of them so the index will be smaller again? Can this only be done with an optimize, or does it also depend on the merge policy? If it depends on the merge policy, which one should I choose?
>>>
>>> Thanks!
>>>
>>> BR,
>>> Arkadi
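[Editor's note: Pushkar's expungeDeletes suggestion can be issued the same way, as a commit parameter on the update handler. If I remember right, it only rewrites segments whose deleted-document percentage exceeds the merge policy's threshold (forceMergeDeletesPctAllowed in TieredMergePolicy), so it may not touch every segment. Host and core names below are placeholders:]

```shell
# Hypothetical values -- substitute your own host and core name.
SOLR_HOST="localhost:8983"
CORE="collection1_shard1_replica1"

# A commit with expungeDeletes=true asks the merge policy to merge away
# segments carrying deleted documents; cheaper than a full optimize.
URL="http://${SOLR_HOST}/solr/${CORE}/update?commit=true&expungeDeletes=true"
echo "$URL"

# To actually run it:
# curl "$URL"
```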