Whoops - the way my mail comes in, it's not easy to tell whether I'm replying to the Lucene or the Solr list ;)
The way Solr works with Searchers and reopen, it shouldn't run into a situation that requires more than 2x the index size to optimize. I won't guarantee it ;) But based on what I know, it shouldn't happen under normal circumstances.

Mark Miller wrote:
> Phillip Farber wrote:
>
>> I am trying to automate a build process that adds documents to 10
>> shards over 5 machines and need to limit the size of a shard to no
>> more than 200GB because I only have 400GB of disk available to
>> optimize a given shard.
>>
>> Why does the size (du) of an index typically decrease after a commit?
>> I've observed a decrease in size of as much as from 296GB down to
>> 151GB or as little as from 183GB to 182GB. Is that size after a
>> commit close to the size the index would be after an optimize?
>>
> Likely. Until you commit or close the Writer, the unoptimized index is
> the "live" index. And then you also have the optimized index. Once you
> commit and make the optimized index the "live" index, the unoptimized
> index can be removed (depending on your delete policy, which by default
> only keeps the latest commit point).
>
>> For that matter, are there cases where optimization can take more than
>> 2x? I've heard of cases but have not observed them in my system. I
>> only do adds to the shards, never query them. An LVM snapshot of the
>> shard receives the queries.
>>
> There are cases where it takes over 2x, but they involve using reopen.
> If you have more than one Reader on the index, and only reopen some of
> them, the new Readers created can hold open the partially optimized
> segments that existed at that moment, creating a need for more than 2x.
>
>> Is doing a commit before I take a du a reliable way to gauge the size
>> of the shard? It is really bad news to allow a shard to go over 200GB
>> in my use case. How do others manage this problem of 2x space needed
>> to optimize with "limited" disk space?
>>
> Get more disk space ;) Or don't optimize.
> A lower mergeFactor can make optimizations less necessary.
>
>> Advice greatly appreciated.
>>
>> Phil
>
--
- Mark
http://www.lucidimagination.com
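For anyone automating the same kind of build, here is a minimal Python sketch of the sizing check Phillip describes. It assumes the rule of thumb from this thread: an optimize needs roughly 2x the committed index size on disk. It also assumes you measure only after a commit, because before the commit the old segments are still on disk. The helper names (`dir_size_bytes`, `fits_optimize`, `segment_count`) and the `factor` parameter are hypothetical, not part of Solr or Lucene. You can raise `factor` if you have Readers that may hold old segments open. `segment_count` is a deliberately idealized model of log-style merging: every flush writes one equal-size segment, and every `merge_factor` segments at one level merge into one. It is not Lucene's actual merge policy, which merges by size and has other settings.

```python
import os

GB = 1024 ** 3


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all files under `path`.

    This is close to `du -b`, not plain `du` (which counts allocated blocks).
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # A merge may delete segment files while we are walking.
                pass
    return total


def fits_optimize(index_bytes: int, budget_bytes: int, factor: float = 2.0) -> bool:
    """True if `factor` times the committed index size fits in the disk budget."""
    return index_bytes * factor <= budget_bytes


def segment_count(flushes: int, merge_factor: int) -> int:
    """Segments left after `flushes` equal-size flushes in an idealized log merge.

    Each level holds fewer than `merge_factor` segments, so the count is the
    digit sum of `flushes` written in base `merge_factor`.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    count = 0
    while flushes:
        flushes, remainder = divmod(flushes, merge_factor)
        count += remainder
    return count


if __name__ == "__main__":
    # Hypothetical shard location; point this at a real index directory.
    shard = "/path/to/shard/data/index"
    size = dir_size_bytes(shard)
    print(f"{size / GB:.1f} GB, optimize fits in 400 GB: {fits_optimize(size, 400 * GB)}")
    print(segment_count(99, 10))  # 18
    print(segment_count(99, 2))   # 4
```

In this toy model, lowering the merge factor from 10 to 2 leaves 4 segments instead of 18 after 99 flushes. This is the sense in which a lower mergeFactor makes optimizing less necessary. The cost is more merging work during indexing.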