Thanks Simon and Jay. That was helpful.

So what we are looking at during an optimize is 2 to 3 times the index
size in free disk space to recreate the index.
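
As a rough sanity check, the thread's rule of thumb can be sketched like this. The 2x/3x factors are assumptions taken from the discussion above (source segments plus the merged "target" segment, plus one more copy when the compound file format repacks the target), not guarantees from Lucene itself:

```python
# Rough worst-case free-disk budget for a Lucene/Solr optimize,
# following this thread's rule of thumb:
#   - the merged "target" segment is written while the "source"
#     segments are still on disk,
#   - with the compound file format (CFS) the target is packed
#     once more before the old files can be freed.

def optimize_free_disk_needed(index_size_gb, uses_cfs=True):
    """Free disk space (GB) to budget for optimizing one index."""
    factor = 3 if uses_cfs else 2
    return factor * index_size_gb

# Example: a 2.5 GB index (the largest single instance below).
print(optimize_free_disk_needed(2.5))         # 7.5
print(optimize_free_disk_needed(2.5, False))  # 5.0
```

Since only one index is optimized at a time, the factor applies to the largest individual index, not the combined size of all instances.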

Regards
Sujatha



On Wed, Oct 26, 2011 at 12:26 AM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:

> RAM cost during optimize / merge is generally low. Optimize is
> basically a merge of all segments into one, though there are
> exceptions. Lucene streams existing segments from disk and serializes
> the new segment on the fly. When you optimize, or in general when you
> merge segments, you need disk space for the "source" segments and the
> "target" (merged) segment.
>
> If you use the CompoundFileSystem (CFS) you need additional space once
> the merge is done and your files are packed into the CFS, which is
> basically the size of the "target" (merged) segment. Once the merge is
> done, Lucene can free the disk space unless you have an IndexReader open
> that references those segments (Lucene keeps track of these files and
> frees disk space once possible).
>
> That said, I think you should use optimize very, very rarely. If your
> document collection rarely changes, an optimize once in a while is
> useful and reasonable. If your collection is constantly changing, you
> should rely on the merge policy to balance the number of segments for
> you in the background. Lucene 3.4 has a nicely improved
> TieredMergePolicy that does a great job. (Previous versions are also
> good - just saying.)
>
> A commit basically flushes the segment you have in memory (IndexWriter
> memory) to disk. The compression ratio can be up to 30% of the RAM
> cost, or even more, depending on your data. The actual commit doesn't
> need a notable amount of memory.
>
> hope this helps
>
> simon
>
> On Mon, Oct 24, 2011 at 7:38 PM, Jaeger, Jay - DOT
> <jay.jae...@dot.wi.gov> wrote:
> > I have not spent a lot of time researching it, but one would expect the
> > OS RAM requirement for optimization of an index to be minimal.
> >
> > My understanding is that during optimization an essentially new index is
> > built.  Once complete, it switches out the indexes and throws away the
> > old one.  (On Windows it may not throw away the old one until the next
> > commit.)
> >
> > JRJ
> >
> > -----Original Message-----
> > From: Sujatha Arun [mailto:suja.a...@gmail.com]
> > Sent: Friday, October 21, 2011 12:10 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Optimization /Commit memory
> >
> > Just one more thing: when we are talking about optimization, we are
> > referring to HD free space for replicating the index (2 or 3 times the
> > index size). What is the role of RAM (OS) here?
> >
> > Regards
> > Sujatha
> >
> > On Fri, Oct 21, 2011 at 10:12 AM, Sujatha Arun <suja.a...@gmail.com>
> > wrote:
> >
> >> Thanks that helps.
> >>
> >> Regards
> >> Sujatha
> >>
> >>
> >> On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT
> >> <jay.jae...@dot.wi.gov> wrote:
> >>
> >>> Well, since the OS RAM includes the JVM RAM, that is part of your
> >>> requirement, yes?  Aside from the JVM and normal OS requirements, all
> >>> you need OS RAM for is file caching.  Thus, for updates, the OS RAM is
> >>> not a major factor.  For searches, you want sufficient OS RAM to cache
> >>> enough of the index to get the query performance you need, and to
> >>> cache queries inside the JVM if you get a lot of repeat queries (see
> >>> solrconfig.xml for the various caches: we have not played with them
> >>> much).  So, the amount of RAM necessary for that is very much
> >>> dependent upon the size of your index, so I cannot give you a simple
> >>> number.
> >>>
> >>> You seem to believe that you have to have sufficient memory to have the
> >>> entire index in memory.  Except where extremely high performance is
> >>> required, I have not found that to be the case.
> >>>
> >>> This is just one of those "your mileage may vary" things.  There is
> >>> not a single answer or formula that fits every situation.
> >>>
> >>> JRJ
> >>>
> >>> -----Original Message-----
> >>> From: Sujatha Arun [mailto:suja.a...@gmail.com]
> >>> Sent: Wednesday, October 19, 2011 11:58 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Optimization /Commit memory
> >>>
> >>> Thanks  Jay ,
> >>>
> >>> I was trying to compute the *OS RAM requirement*, *not JVM RAM*, for
> >>> a 14 GB index [the cumulative index size of all instances], and I put
> >>> it thus:
> >>>
> >>> The operating system RAM required for a 14 GB index is the index size
> >>> plus 3 times the maximum index size of an individual instance, for
> >>> optimize.
> >>>
> >>> That is to say, I have several instances with a combined index size
> >>> of 14 GB. The maximum individual index size is 2.5 GB, so my
> >>> requirement for OS RAM is 14 GB + 3 * 2.5 GB ~= 22 GB.
> >>>
> >>> Correct?
> >>>
> >>> Regards
> >>> Sujatha
> >>>
> >>>
> >>>
> >>> On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT
> >>> <jay.jae...@dot.wi.gov> wrote:
> >>>
> >>> > Commit does not particularly spike disk or memory usage, unless you
> >>> > are adding a very large number of documents between commits.  A
> >>> > commit can cause a need to merge indexes, which can increase disk
> >>> > space temporarily.  An optimize is *likely* to merge indexes, which
> >>> > will usually increase disk space temporarily.
> >>> >
> >>> > How much disk space depends very much upon how big your index is in
> >>> > the first place.  A 2 to 3 times factor of the sum of your peak
> >>> > index file size seems safe, to me.
> >>> >
> >>> > Solr uses only modest amounts of memory for the JVM for this stuff.
> >>> >
> >>> > JRJ
> >>> >
> >>> > -----Original Message-----
> >>> > From: Sujatha Arun [mailto:suja.a...@gmail.com]
> >>> > Sent: Wednesday, October 19, 2011 4:04 AM
> >>> > To: solr-user@lucene.apache.org
> >>> > Subject: Optimization /Commit memory
> >>> >
> >>> > Do we require 2 or 3 times the index size in OS RAM or hard disk
> >>> > space while performing a commit, an optimize, or both?
> >>> >
> >>> > What is the requirement in terms of RAM and HD size for commit and
> >>> > optimize?
> >>> >
> >>> > Regards
> >>> > Sujatha
> >>> >
> >>>
> >>
> >>
> >
>
