Re: How much disk space does optimize really take

2009-10-07 Thread Yonik Seeley
On Wed, Oct 7, 2009 at 3:56 PM, Mark Miller wrote: > I guess you can't guarantee 2x though, as if you have queries coming in > that take a while, a commit opening a new Reader will not guarantee the > old Reader is quite ready to go away. Might want to wait a short bit > after the commit. Right -

Re: How much disk space does optimize really take

2009-10-07 Thread Mark Miller
Yonik Seeley wrote: > On Wed, Oct 7, 2009 at 3:31 PM, Mark Miller wrote: > >> I can't tell why calling a commit or restarting is going to help >> anything >> > > Depends on what scenarios you consider, and what you are taking 2x of. > > 1) Open reader on index > 2) Open writer and add two

Re: How much disk space does optimize really take

2009-10-07 Thread Yonik Seeley
On Wed, Oct 7, 2009 at 3:31 PM, Mark Miller wrote: > I can't tell why calling a commit or restarting is going to help > anything Depends on what scenarios you consider, and what you are taking 2x of. 1) Open reader on index 2) Open writer and add two documents... the first causes a large merge,

Re: How much disk space does optimize really take

2009-10-07 Thread Mark Miller
Okay - I think I've got you - you're talking about the case of adding a bunch of docs, not calling commit, and then trying to optimize. I keep coming at it from a cold optimize. Making sense to me now. Mark Miller wrote: > I can't tell why calling a commit or restarting is going to help > anything -

Re: How much disk space does optimize really take

2009-10-07 Thread Mark Miller
I can't tell why calling a commit or restarting is going to help anything - or why you need more than 2x in any case. The only reason I can see this being is if you have turned on auto-commit. Otherwise the Reader is *always* only referencing what would have to be around anyway. You're likely to jus

Re: How much disk space does optimize really take

2009-10-07 Thread Yonik Seeley
On Wed, Oct 7, 2009 at 3:16 PM, Phillip Farber wrote: > Wow, this is weird.  I commit before I optimize.  In fact, I bounce tomcat > before I optimize just in case. It makes sense, as you say, that then "the > open reader can only be holding references to segments that wouldn't be > deleted until

Re: How much disk space does optimize really take

2009-10-07 Thread Lance Norskog
Oops, sent before finished. "Partial Optimize" aka "maxSegments" is a recent Solr 1.4/Lucene 2.9 feature. As to 2x vs. 3x, the general wisdom is that an optimize on a "simple" index takes at most 2x disk space, and on a "compound" index takes at most 3x. "Simple" is the default (*). At Divvio we

Re: How much disk space does optimize really take

2009-10-07 Thread Phillip Farber
Wow, this is weird. I commit before I optimize. In fact, I bounce tomcat before I optimize just in case. It makes sense, as you say, that then "the open reader can only be holding references to segments that wouldn't be deleted until the optimize is complete anyway". But we're still exceedin

Re: How much disk space does optimize really take

2009-10-07 Thread Michael McCandless
On Wed, Oct 7, 2009 at 1:34 PM, Shalin Shekhar Mangar wrote: > On Wed, Oct 7, 2009 at 10:45 PM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> It would be good to be able to commit without opening a new >> reader however with Lucene 2.9 the segment readers for all >> available segmen

Re: How much disk space does optimize really take

2009-10-07 Thread Yonik Seeley
On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber wrote: > So this implies that for a "normal" optimize, in every case, due to the > Searcher holding open the existing segment prior to optimize that we'd > always need 3x even in the normal case. > > This seems wrong since it is repeatedly stated that i

Re: How much disk space does optimize really take

2009-10-07 Thread Jason Rutherglen
To be clear, the SRs created by merges don't have the term index loaded which is the main cost. One would need to use IndexReaderWarmer to load the term index before the new SR becomes a part of SegmentInfos. On Wed, Oct 7, 2009 at 10:34 AM, Shalin Shekhar Mangar wrote: > On Wed, Oct 7, 2009 at

Re: How much disk space does optimize really take

2009-10-07 Thread Phillip Farber
Yonik Seeley wrote: Does this mean that there's always a lucene IndexReader holding segment files open so they can't be deleted during an optimize so we run out of disk space > 2x? Yes. A feature could probably now be developed that avoids opening a reader until it's requested. That wa

Re: How much disk space does optimize really take

2009-10-07 Thread Mark Miller
I think that argument requires auto commit to be on and opening readers after the optimize starts? Otherwise, the optimized version is not put into place until a commit is called, and a Reader won't see the newly merged segments until then - so the original index is kept around in either case - hav

Re: How much disk space does optimize really take

2009-10-07 Thread Shalin Shekhar Mangar
On Wed, Oct 7, 2009 at 10:45 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > It would be good to be able to commit without opening a new > reader however with Lucene 2.9 the segment readers for all > available segments are already created and available via > getReader which manages the

Re: How much disk space does optimize really take

2009-10-07 Thread Jason Rutherglen
It would be good to be able to commit without opening a new reader; however, with Lucene 2.9 the segment readers for all available segments are already created and available via getReader, which manages the reference counting internally. Using reopen redundantly creates SRs that are already held inte

Re: How much disk space does optimize really take

2009-10-07 Thread Yonik Seeley
On Wed, Oct 7, 2009 at 12:51 PM, Phillip Farber wrote: > > In a separate thread, I've detailed how an optimize is taking > 2x disk > space. We don't use solr distribution/snapshooter.  We are using the default > deletion policy = 1. We can't optimize a 192G index in 400GB of space. > > This thread
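
As a rough check of the figures quoted in this thread (a sketch only, not something any participant posted): the 2x and 3x multipliers are the rules of thumb mentioned above, and the 192 GB index and 400 GB volume are the numbers from the original report. The function name `optimize_headroom_gb` is hypothetical, and the calculation assumes that the 400 GB is the total space on the volume, including the space the existing index already occupies.

```python
def optimize_headroom_gb(index_gb, volume_gb, multiplier):
    """Spare space (GB) at the peak of an optimize.

    Assumes peak disk usage is multiplier * index size (a rule of thumb,
    not a guarantee) and that volume_gb is the total space available,
    counting the space the index already uses.
    A negative result means the optimize would not fit.
    """
    return volume_gb - multiplier * index_gb


# Figures from the thread: a 192 GB index on a 400 GB volume.
for multiplier in (2, 3):
    spare = optimize_headroom_gb(192, 400, multiplier)
    verdict = "fits" if spare >= 0 else "does not fit"
    print(f"{multiplier}x peak: {multiplier * 192} GB needed, {spare} GB spare, {verdict}")
```

Under these assumptions a 2x peak (384 GB) leaves only 16 GB spare, while a 3x peak (576 GB) overshoots the volume by 176 GB. That is consistent with the report that the optimize fails if anything, such as an open reader holding old segments, pushes peak usage even slightly past 2x.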