Hi Lance,

My earlier point may have been misleading:

> 1. Segments are independent sub-indexes in separate files. While indexing,
> it's better to create a new segment, as it doesn't have to modify an existing
> file. Whereas while searching, *smaller the segment* the better it is, since
> you open x (not exactly x, but xn, a value proportional to x) physical files
> to search if you have got x segments in the index.

By "smaller" I was referring to the number of segments rather than the segment size.

When you said "Large Pages", did you mean that segment size should stay below some threshold for better performance from the OS point of view? My main concern is this: what would be the main disadvantage (for indexing or searching) if I merged my entire 150 GB index (right now 100 segments) into a single segment?

On 11 October 2012 07:28, Lance Norskog <goks...@gmail.com> wrote:

> Study index merging. This is awesome.
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Jame- opening lots of segments is not a problem. A major performance
> problem you will find is 'Large Pages'. This is an operating-system
> strategy for managing servers with 10s of gigabytes of memory. Without it,
> all large programs run much more slowly than they could. It is not a Solr
> or JVM problem.
>
> ----- Original Message -----
> | From: "jun Wang" <wangjun...@gmail.com>
> | To: solr-user@lucene.apache.org
> | Sent: Wednesday, October 10, 2012 6:36:09 PM
> | Subject: Re: segment number during optimize of index
> |
> | I have another question: does the number of segments affect the speed of
> | index updates?
> |
> | 2012/10/10 jame vaalet <jamevaa...@gmail.com>
> |
> | > Guys,
> | > thanks for all the inputs, I was continuing my research to know more about
> | > segments in Lucene. Below are my conclusions, please correct me if I am wrong.
> | >
> | > 1. Segments are independent sub-indexes in separate files. While indexing,
> | > it's better to create a new segment, as it doesn't have to modify an existing
> | > file. Whereas while searching, smaller the segment the better it is, since
> | > you open x (not exactly x, but xn, a value proportional to x) physical files
> | > to search if you have got x segments in the index.
> | > 2.
> | > Since Lucene has a memory-map concept, for each file/segment in the index a
> | > new m-map file is created and mapped to the physical file on disk. Can
> | > someone explain or correct this in detail? I am sure there are many
> | > people wondering how m-map works while you merge or optimize index
> | > segments.
> | >
> | > On 6 October 2012 07:41, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> | >
> | > > If I were you.... and not knowing all your details...
> | > >
> | > > I would optimize indices that are static (not being modified) and
> | > > would optimize down to 1 segment.
> | > > I would do it when search traffic is low.
> | > >
> | > > Otis
> | > > --
> | > > Search Analytics - http://sematext.com/search-analytics/index.html
> | > > Performance Monitoring - http://sematext.com/spm/index.html
> | > >
> | > > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet <jamevaa...@gmail.com> wrote:
> | > > > Hi Eric,
> | > > > I am in a major dilemma with my index now. I have got 8 cores, each around
> | > > > 300 GB in size, and half of that is deleted documents, and on top of that
> | > > > each has got around 100 segments as well. Do I issue an expungeDelete and
> | > > > allow the merge policy to take care of the segments, or optimize them into
> | > > > a single segment? Search performance is not on par with the usual Solr
> | > > > speed.
> | > > > If I have to optimize, what segment number should I choose? My RAM size is
> | > > > around 120 GB and the JVM heap is around 45 GB (oldGen being 30 GB). Please
> | > > > advise!
> | > > >
> | > > > thanks.
> | > > >
> | > > > On 6 October 2012 00:00, Erick Erickson <erickerick...@gmail.com> wrote:
> | > > >
> | > > >> because eventually you'd run out of file handles.
> | > > >> Imagine a long-running server with 100,000 segments. Totally
> | > > >> unmanageable.
> | > > >>
> | > > >> I think Shawn was emphasizing that RAM requirements don't
> | > > >> depend on the number of segments. There are other
> | > > >> resources that files consume, however.
> | > > >>
> | > > >> Best
> | > > >> Erick
> | > > >>
> | > > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet <jamevaa...@gmail.com> wrote:
> | > > >> > hi Shawn,
> | > > >> > thanks for the detailed explanation.
> | > > >> > I have got one doubt: you said it doesn't matter how many segments the index
> | > > >> > has, but then why does Solr have this merge policy which merges segments
> | > > >> > frequently? Why can't it leave the segments as they are rather than merging
> | > > >> > smaller ones into bigger ones?
> | > > >> >
> | > > >> > thanks.
> | > > >> >
> | > > >> > On 5 October 2012 05:46, Shawn Heisey <s...@elyograg.org> wrote:
> | > > >> >
> | > > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
> | > > >> >>
> | > > >> >>> so imagine I have merged the 150 GB index into a single segment; this would
> | > > >> >>> make a single segment of 150 GB in memory. When new docs are indexed it
> | > > >> >>> wouldn't alter this 150 GB index unless I update or delete the older docs,
> | > > >> >>> right? Will a 150 GB single segment have problems with memory swapping at OS
> | > > >> >>> level?
> | > > >> >>
> | > > >> >> Supplement to my previous reply: the real memory mentioned in the last
> | > > >> >> paragraph does not include the memory that the OS uses to cache disk
> | > > >> >> access.
> | > > >> >> If more memory is needed and all the free memory is being used by
> | > > >> >> the disk cache, the OS will throw away part of the disk cache (a
> | > > >> >> near-instantaneous operation that should never involve disk I/O) and give
> | > > >> >> that memory to the application that requests it.
> | > > >> >>
> | > > >> >> Here's a very good breakdown of how memory gets used with MMapDirectory in
> | > > >> >> Solr. It's applicable to any program that uses memory mapping, not just
> | > > >> >> Solr:
> | > > >> >>
> | > > >> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
> | > > >> >>
> | > > >> >> Thanks,
> | > > >> >> Shawn
> | > > >> >
> | > > >> > --
> | > > >> > -JAME
> | > > >
> | > > > --
> | > > > -JAME
> | >
> | > --
> | > -JAME
> |
> | --
> | from Jun Wang

--
-JAME