Re: segment number during optimize of index

Lance Norskog Wed, 10 Oct 2012 18:58:45 -0700

Study index merging. This is awesome.
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
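
To get a feel for why merging keeps the segment count bounded, here is a toy simulation. It is only a sketch: it assumes every flush writes one equal-sized segment and that segments are merged as soon as merge_factor of them have the same size. Real Lucene merge policies are size-based and more involved, and the function name and parameters below are made up for illustration.

```python
from collections import Counter


def simulate_flushes(num_flushes, merge_factor=10):
    """Return the segment sizes left after num_flushes flushes.

    Toy model (an assumption, not Lucene's actual policy): every flush
    writes one segment of size 1, and whenever merge_factor segments of
    equal size exist they are merged into a single larger segment.
    """
    segments = []  # segment sizes, measured in "flush units"
    for _ in range(num_flushes):
        segments.append(1)
        # A merge can produce enough equal-sized segments to trigger
        # another merge, so keep going until nothing qualifies.
        while True:
            counts = Counter(segments)
            size = next(
                (s for s, c in sorted(counts.items()) if c >= merge_factor),
                None,
            )
            if size is None:
                break
            for _ in range(merge_factor):
                segments.remove(size)
            segments.append(size * merge_factor)
    return sorted(segments, reverse=True)


if __name__ == "__main__":
    for flushes in (9, 99, 999, 1234):
        result = simulate_flushes(flushes)
        print(flushes, "flushes ->", len(result), "segments")
```

In this model the segment count grows roughly with the logarithm of the number of flushes, where leaving every flushed segment in place would give one segment per flush.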


Jame: opening lots of segments is not a problem. A major performance issue
you will run into is 'Large Pages'. This is an operating-system strategy for
managing servers with tens of gigabytes of memory. Without it, all large
programs run much more slowly than they could. It is not a Solr or JVM problem.
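
A quick way to see what a machine is doing about large pages is sketched below. It assumes a Linux host, where the transparent huge page mode is exposed in /sys/kernel/mm/transparent_hugepage/enabled and the huge-page counters appear in /proc/meminfo. Other operating systems expose this differently. The helper names are made up for illustration, and the script only reports state; it does not change any setting.

```python
import re
from pathlib import Path

# Linux-only locations (assumption: the host is Linux).
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")
MEMINFO_PATH = Path("/proc/meminfo")


def active_thp_mode(text):
    """Return the bracketed (active) mode from 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def hugepage_counters(meminfo_text):
    """Collect the huge-page related lines of /proc/meminfo as strings."""
    counters = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if "huge" in key.lower():
            counters[key.strip()] = value.strip()
    return counters


if __name__ == "__main__":
    if THP_PATH.exists():
        print("transparent huge pages:", active_thp_mode(THP_PATH.read_text()))
    if MEMINFO_PATH.exists():
        for name, value in hugepage_counters(MEMINFO_PATH.read_text()).items():
            print(name, "=", value)
```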


----- Original Message -----
| From: "jun Wang" <wangjun...@gmail.com>
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 6:36:09 PM
| Subject: Re: segment number during optimize of index
| 
| I have another question: does the number of segments affect the speed
| of index updates?
| 
| 2012/10/10 jame vaalet <jamevaa...@gmail.com>
| 
| > Guys,
| > thanks for all the inputs. I was continuing my research to learn more
| > about segments in Lucene. Below are my conclusions; please correct me
| > if I am wrong.
| >
| >    1. Segments are independent sub-indexes stored in separate files.
| >    While indexing, it is better to create a new segment, as that does
| >    not have to modify an existing file. While searching, however, the
| >    fewer segments the better, since with x segments in the index you
| >    open x (not exactly x, but x*n, a value proportional to x) physical
| >    files to search.
| >    2. Since Lucene uses memory mapping, for each file/segment in the
| >    index a new memory-mapped file is created and mapped to the physical
| >    file on disk. Can someone explain or correct this in detail? I am
| >    sure there are many people wondering how mmap works while you merge
| >    or optimize index segments.
| >
| >
| >
| > On 6 October 2012 07:41, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
| >
| > > If I were you.... and not knowing all your details...
| > >
| > > I would optimize indices that are static (not being modified) and
| > > would optimize down to 1 segment.
| > > I would do it when search traffic is low.
| > >
| > > Otis
| > > --
| > > Search Analytics -
| > > http://sematext.com/search-analytics/index.html
| > > Performance Monitoring - http://sematext.com/spm/index.html
| > >
| > >
| > > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet <jamevaa...@gmail.com> wrote:
| > > > Hi Erick,
| > > > I am in a major dilemma with my index now. I have 8 cores, each
| > > > around 300 GB in size; half of the documents in them are deleted,
| > > > and on top of that each has around 100 segments as well. Do I issue
| > > > an expungeDeletes and allow the merge policy to take care of the
| > > > segments, or optimize them into a single segment? Search performance
| > > > is not on par with the usual Solr speed.
| > > > If I have to optimize, what segment number should I choose? My RAM
| > > > size is around 120 GB and the JVM heap is around 45 GB (oldGen being
| > > > 30 GB). Please advise!
| > > >
| > > > thanks.
| > > >
| > > >
| > > > On 6 October 2012 00:00, Erick Erickson <erickerick...@gmail.com> wrote:
| > > >
| > > >> because eventually you'd run out of file handles. Imagine a
| > > >> long-running server with 100,000 segments. Totally
| > > >> unmanageable.
| > > >>
| > > >> I think Shawn was emphasizing that RAM requirements don't
| > > >> depend on the number of segments. There are other
| > > >> resources that files consume, however.
| > > >>
| > > >> Best
| > > >> Erick
| > > >>
| > > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet <jamevaa...@gmail.com> wrote:
| > > >> > Hi Shawn,
| > > >> > thanks for the detailed explanation.
| > > >> > I have one doubt: you said it doesn't matter how many segments
| > > >> > the index has, but then why does Solr have this merge policy
| > > >> > which merges segments frequently? Why can't it leave the
| > > >> > segments as they are rather than merging smaller ones into
| > > >> > bigger ones?
| > > >> >
| > > >> > thanks
| > > >> > .
| > > >> >
| > > >> > On 5 October 2012 05:46, Shawn Heisey <s...@elyograg.org> wrote:
| > > >> >
| > > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
| > > >> >>
| > > >> >>> So imagine I have merged the 150 GB index into a single
| > > >> >>> segment; this would make a single segment of 150 GB in memory.
| > > >> >>> When new docs are indexed it wouldn't alter this 150 GB index
| > > >> >>> unless I update or delete the older docs, right? Will a 150 GB
| > > >> >>> single segment have problems with memory swapping at the OS
| > > >> >>> level?
| > > >> >>>
| > > >> >>
| > > >> >> Supplement to my previous reply: the real memory mentioned in
| > > >> >> the last paragraph does not include the memory that the OS uses
| > > >> >> to cache disk access. If more memory is needed and all the free
| > > >> >> memory is being used by the disk cache, the OS will throw away
| > > >> >> part of the disk cache (a near-instantaneous operation that
| > > >> >> should never involve disk I/O) and give that memory to the
| > > >> >> application that requests it.
| > > >> >>
| > > >> >> Here's a very good breakdown of how memory gets used with
| > > >> >> MMapDirectory in Solr. It's applicable to any program that uses
| > > >> >> memory mapping, not just Solr:
| > > >> >>
| > > >> >>
| > > >> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
| > > >> >>
| > > >> >> Thanks,
| > > >> >> Shawn
| > > >> >>
| > > >> >>
| > > >> >
| > > >> >
| > > >> > --
| > > >> >
| > > >> > -JAME
| > > >>
| > > >
| > > >
| > > >
| > > > --
| > > >
| > > > -JAME
| > >
| >
| >
| >
| > --
| >
| > -JAME
| >
| 
| 
| 
| --
| from Jun Wang
|
