Study index merging. This visualization of Lucene's segment merges is awesome: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
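For anyone who wants a feel for why merging keeps the segment count small, here is a toy simulation. It is only a sketch, loosely modelled on a log-style merge policy with a merge factor of 10. It is not Lucene's actual merge code, and the function name `simulate_flushes` and the "one unit per flush" sizing are assumptions made up for illustration.

```python
def simulate_flushes(num_flushes, merge_factor=10):
    """Return the segment sizes (in flush units, oldest first) left after
    `num_flushes` flushes under a simplified log-style merge rule."""
    segments = []
    for _ in range(num_flushes):
        # Every flush writes one new small segment; existing files are untouched.
        segments.append(1)
        # Cascade: whenever the newest `merge_factor` segments are the same
        # size, replace them with a single segment holding all of their data.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
    return segments


if __name__ == "__main__":
    for n in (99, 1000, 1234):
        sizes = simulate_flushes(n)
        print(n, "flushes ->", len(sizes), "segments:", sizes)
```

Without the merge step, 1234 flushes would leave 1234 segments (and several files each); with it, the count grows roughly with the logarithm of the number of flushes.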
Jame, opening lots of segments is not a problem. A major performance problem you will run into is not having 'Large Pages' enabled. This is an operating-system strategy for managing servers with tens of gigabytes of memory. Without it, all large programs run much more slowly than they could. It is not a Solr or JVM problem.

----- Original Message -----
| From: "jun Wang" <wangjun...@gmail.com>
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 6:36:09 PM
| Subject: Re: segment number during optimize of index
|
| I have another question: does the number of segments affect the speed
| of index updates?
|
| 2012/10/10 jame vaalet <jamevaa...@gmail.com>
|
| > Guys,
| > thanks for all the inputs. I was continuing my research to learn
| > more about segments in Lucene. Below are my conclusions; please
| > correct me if I am wrong.
| >
| > 1. Segments are independent sub-indexes in separate files. While
| >    indexing, it is better to create a new segment, as that doesn't
| >    have to modify an existing file. While searching, the smaller
| >    the number of segments the better, since you open x (not exactly
| >    x but xn, a value proportional to x) physical files to search if
| >    you have x segments in the index.
| > 2. Since Lucene has the memory-map concept, for each file/segment
| >    in the index a new m-map file is created and mapped to the
| >    physical file on disk. Can someone explain or correct this in
| >    detail? I am sure there are many people wondering how m-map
| >    works while you merge or optimize index segments.
| >
| >
| > On 6 October 2012 07:41, Otis Gospodnetic
| > <otis.gospodne...@gmail.com> wrote:
| >
| > > If I were you.... and not knowing all your details...
| > >
| > > I would optimize indices that are static (not being modified) and
| > > would optimize down to 1 segment.
| > > I would do it when search traffic is low.
| > >
| > > Otis
| > > --
| > > Search Analytics - http://sematext.com/search-analytics/index.html
| > > Performance Monitoring - http://sematext.com/spm/index.html
| > >
| > >
| > > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet
| > > <jamevaa...@gmail.com> wrote:
| > > > Hi Erick,
| > > > I am in a major dilemma with my index now. I have 8 cores, each
| > > > around 300 GB in size; half of the documents in them are
| > > > deleted, and on top of that each has around 100 segments as
| > > > well. Do I issue an expungeDeletes and allow the merge policy
| > > > to take care of the segments, or optimize them into a single
| > > > segment? Search performance is not on par with usual Solr
| > > > speed.
| > > > If I have to optimize, what segment number should I choose? My
| > > > RAM size is around 120 GB and the JVM heap is around 45 GB
| > > > (oldGen being 30 GB). Please advise!
| > > >
| > > > thanks.
| > > >
| > > >
| > > > On 6 October 2012 00:00, Erick Erickson
| > > > <erickerick...@gmail.com> wrote:
| > > >
| > > >> because eventually you'd run out of file handles. Imagine a
| > > >> long-running server with 100,000 segments. Totally
| > > >> unmanageable.
| > > >>
| > > >> I think Shawn was emphasizing that RAM requirements don't
| > > >> depend on the number of segments. There are other
| > > >> resources that files consume, however.
| > > >>
| > > >> Best
| > > >> Erick
| > > >>
| > > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet
| > > >> <jamevaa...@gmail.com> wrote:
| > > >> > hi Shawn,
| > > >> > thanks for the detailed explanation.
| > > >> > I have one doubt: you said it doesn't matter how many
| > > >> > segments the index has, but then why does Solr have this
| > > >> > merge policy which merges segments frequently? Why can't it
| > > >> > leave the segments as they are rather than merging smaller
| > > >> > ones into bigger ones?
| > > >> >
| > > >> > thanks
| > > >> > .
| > > >> >
| > > >> > On 5 October 2012 05:46, Shawn Heisey <s...@elyograg.org>
| > > >> > wrote:
| > > >> >
| > > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
| > > >> >>
| > > >> >>> so imagine I have merged the 150 GB index into a single
| > > >> >>> segment; this would make a single segment of 150 GB in
| > > >> >>> memory. When new docs are indexed it wouldn't alter this
| > > >> >>> 150 GB index unless I update or delete the older docs,
| > > >> >>> right? Will a 150 GB single segment have a problem with
| > > >> >>> memory swapping at the OS level?
| > > >> >>>
| > > >> >>
| > > >> >> Supplement to my previous reply: the real memory mentioned
| > > >> >> in the last paragraph does not include the memory that the
| > > >> >> OS uses to cache disk access. If more memory is needed and
| > > >> >> all the free memory is being used by the disk cache, the OS
| > > >> >> will throw away part of the disk cache (a
| > > >> >> near-instantaneous operation that should never involve disk
| > > >> >> I/O) and give that memory to the application that requests
| > > >> >> it.
| > > >> >>
| > > >> >> Here's a very good breakdown of how memory gets used with
| > > >> >> MMapDirectory in Solr. It's applicable to any program that
| > > >> >> uses memory mapping, not just Solr:
| > > >> >>
| > > >> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
| > > >> >>
| > > >> >> Thanks,
| > > >> >> Shawn
| > > >> >>
| > > >> >
| > > >> > --
| > > >> >
| > > >> > -JAME
| > > >>
| > > >
| > > > --
| > > >
| > > > -JAME
| > >
| >
| > --
| >
| > -JAME
|
| --
| from Jun Wang
|
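
On the m-map question raised in the thread, a small illustration may help. This is a sketch in Python rather than anything taken from Lucene or Solr; the helper names `read_via_mmap` and `demo`, the `.seg` suffix, and the file layout are all made up for the example. The point it tries to show is the one Shawn makes: mapping a file only reserves address space, and the bytes are pulled in by the OS (through its disk cache) when they are actually touched, not copied up front into the program's own heap.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Nothing is read yet; pages are
        # loaded by the OS only when the slice below touches them.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def demo():
    # Hypothetical stand-in for one index file; real Lucene segment files
    # are not laid out like this.
    fd, path = tempfile.mkstemp(suffix=".seg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"header" + b"\0" * 4096 + b"term-data")
        # Skip the 6-byte header and the 4096 bytes of padding.
        return read_via_mmap(path, 6 + 4096, 9)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Each mapped file gets its own mapping, so more segments means more mappings and open files, but not more heap. That fits Erick's remark that the cost of many segments shows up in file handles rather than RAM.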