Re: segment number during optimize of index

jame vaalet Thu, 11 Oct 2012 02:59:45 -0700

Hi Lance,
My earlier point may have been misleading:

"1. Segments are independent sub-indexes in separate files. While indexing,
it's better to create a new segment, as it doesn't have to modify an existing
file. Whereas while searching, *the smaller the segment* the better it is,
since you open x (not exactly x, but xn, a value proportional to x) physical
files to search if you have got x segments in the index."


The "smaller" was referring to the number of segments rather than the
segment size.

When you said "Large Pages", does that mean the segment size should stay
below a threshold for better performance from the OS point of view? My main
concern here is: what would be the main disadvantage (for indexing or
searching) if I merge my entire 150 GB index (currently 100 segments) into a
single segment?
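
To make the "files proportional to segments" point concrete, here is a
minimal sketch in Python of how one might count the physical files per
segment from a directory listing. It assumes the usual Lucene naming, where
per-segment files start with "_" followed by the segment name (for example
"_0.fdt"), and that commit files such as "segments_2" belong to no single
segment. The function name and the sample listing are invented for
illustration; they are not taken from my real index.

```python
import re
from collections import Counter

# Assumption: per-segment files look like "_<name>.<ext>" or "_<name>_<suffix>.<ext>",
# where <name> is a lowercase base-36 segment name.
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)[._]")


def files_per_segment(file_names):
    """Return a Counter mapping segment name -> number of files that belong to it."""
    counts = Counter()
    for name in file_names:
        match = SEGMENT_FILE.match(name)
        if match:  # commit files such as "segments_2" do not match and are skipped
            counts[match.group(1)] += 1
    return counts


# Hypothetical listing, made up for this example.
listing = [
    "segments_2", "segments.gen",
    "_0.fdt", "_0.fdx", "_0.tis",
    "_1.fdt", "_1.fdx", "_1.tis", "_1_1.del",
]
counts = files_per_segment(listing)
print(len(counts), "segments,", sum(counts.values()), "per-segment files")
```

On a real index the listing could come from os.listdir() on the index
directory. Merging down to one segment should shrink the first number to 1
and the second to the file count of that single segment.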





On 11 October 2012 07:28, Lance Norskog <goks...@gmail.com> wrote:

> Study index merging. This is awesome.
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Jame- opening lots of segments is not a problem. A major performance
> problem you will find is 'Large Pages'. This is an operating-system
> strategy for managing servers with 10s of gigabytes of memory. Without it,
> all large programs run much more slowly than they could. It is not a Solr
> or JVM problem.
>
>
> ----- Original Message -----
> | From: "jun Wang" <wangjun...@gmail.com>
> | To: solr-user@lucene.apache.org
> | Sent: Wednesday, October 10, 2012 6:36:09 PM
> | Subject: Re: segment number during optimize of index
> |
> | I have another question: does the number of segments affect the speed of
> | index updates?
> |
> | 2012/10/10 jame vaalet <jamevaa...@gmail.com>
> |
> | > Guys,
> | > thanks for all the inputs. I was continuing my research to learn more
> | > about segments in Lucene. Below are my conclusions; please correct me
> | > if I am wrong.
> | >
> | >    1. Segments are independent sub-indexes in separate files. While
> | >    indexing, it's better to create a new segment, as it doesn't have to
> | >    modify an existing file. Whereas while searching, the smaller the
> | >    segment the better it is, since you open x (not exactly x, but xn, a
> | >    value proportional to x) physical files to search if you have got x
> | >    segments in the index.
> | >    2. Since Lucene has a memory-map concept, for each file/segment in the
> | >    index a new m-map file is created and mapped to the physical file on
> | >    disk. Can someone explain or correct this in detail? I am sure there
> | >    are many people wondering how m-map works while you merge or optimize
> | >    index segments.
> | >
> | >
> | >
> | > On 6 October 2012 07:41, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> | >
> | > > If I were you.... and not knowing all your details...
> | > >
> | > > I would optimize indices that are static (not being modified) and
> | > > would optimize down to 1 segment.
> | > > I would do it when search traffic is low.
> | > >
> | > > Otis
> | > > --
> | > > Search Analytics -
> | > > http://sematext.com/search-analytics/index.html
> | > > Performance Monitoring - http://sematext.com/spm/index.html
> | > >
> | > >
> | > > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet <jamevaa...@gmail.com> wrote:
> | > > > Hi Eric,
> | > > > I am in a major dilemma with my index now. I have 8 cores, each
> | > > > around 300 GB in size; about half of each is deleted documents, and
> | > > > on top of that each has around 100 segments. Do I issue an
> | > > > expungeDeletes and allow the merge policy to take care of the
> | > > > segments, or optimize them into a single segment? Search performance
> | > > > is not on par with the usual Solr speed.
> | > > > If I have to optimize, what segment number should I choose? My RAM
> | > > > size is around 120 GB and the JVM heap is around 45 GB (oldGen being
> | > > > 30 GB). Please advise!
> | > > >
> | > > > thanks.
> | > > >
> | > > >
> | > > > On 6 October 2012 00:00, Erick Erickson <erickerick...@gmail.com> wrote:
> | > > >
> | > > >> because eventually you'd run out of file handles. Imagine a
> | > > >> long-running server with 100,000 segments. Totally
> | > > >> unmanageable.
> | > > >>
> | > > >> I think Shawn was emphasizing that RAM requirements don't
> | > > >> depend on the number of segments. There are other
> | > > >> resources that files consume, however.
> | > > >>
> | > > >> Best
> | > > >> Erick
> | > > >>
> | > > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet <jamevaa...@gmail.com> wrote:
> | > > >> > hi Shawn,
> | > > >> > thanks for the detailed explanation.
> | > > >> > I have one doubt: you said it doesn't matter how many segments
> | > > >> > the index has, but then why does Solr have this merge policy,
> | > > >> > which merges segments frequently? Why can't it leave the segments
> | > > >> > as they are rather than merging smaller ones into bigger ones?
> | > > >> >
> | > > >> > thanks
> | > > >> > .
> | > > >> >
> | > > >> > On 5 October 2012 05:46, Shawn Heisey <s...@elyograg.org>
> | > > >> > wrote:
> | > > >> >
> | > > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
> | > > >> >>
> | > > >> >>> So imagine I have merged the 150 GB index into a single segment;
> | > > >> >>> this would make a single segment of 150 GB in memory. When new
> | > > >> >>> docs are indexed it wouldn't alter this 150 GB index unless I
> | > > >> >>> update or delete the older docs, right? Will a 150 GB single
> | > > >> >>> segment have problems with memory swapping at the OS level?
> | > > >> >>>
> | > > >> >>
> | > > >> >> Supplement to my previous reply:  the real memory mentioned in the
> | > > >> >> last paragraph does not include the memory that the OS uses to
> | > > >> >> cache disk access.  If more memory is needed and all the free
> | > > >> >> memory is being used by the disk cache, the OS will throw away
> | > > >> >> part of the disk cache (a near-instantaneous operation that should
> | > > >> >> never involve disk I/O) and give that memory to the application
> | > > >> >> that requests it.
> | > > >> >>
> | > > >> >> Here's a very good breakdown of how memory gets used with
> | > > >> >> MMapDirectory in Solr.  It's applicable to any program that uses
> | > > >> >> memory mapping, not just Solr:
> | > > >> >>
> | > > >> >>
> | > > >> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
> | > > >> >>
> | > > >> >> Thanks,
> | > > >> >> Shawn
> | > > >> >>
> | > > >> >>
> | > > >> >
> | > > >> >
> | > > >> > --
> | > > >> >
> | > > >> > -JAME
> | > > >>
> | > > >
> | > > >
> | > > >
> | > > > --
> | > > >
> | > > > -JAME
> | > >
> | >
> | >
> | >
> | > --
> | >
> | > -JAME
> | >
> |
> |
> |
> | --
> | from Jun Wang
> |
>



-- 

-JAME
