I have another question: does the number of segments affect the speed of index updates?

2012/10/10 jame vaalet <jamevaa...@gmail.com>

> Guys,
> thanks for all the inputs, I was continuing my research to know more
> about segments in Lucene. Below are my conclusions, please correct me
> if I am wrong.
>
> 1. Segments are independent sub-indexes in separate files. While
>    indexing it's better to create a new segment, as it doesn't have to
>    modify an existing file, whereas while searching, the fewer the
>    segments the better, since you open x (not exactly x but xn, a
>    value proportional to x) physical files to search if you have got
>    x segments in the index.
> 2. Since Lucene has the memory map concept, for each file/segment in
>    the index a new m-map file is created and mapped to the physical
>    file on disk. Can someone explain or correct this in detail? I am
>    sure there are a lot of people wondering how m-map works while you
>    merge or optimize index segments.
>
> On 6 October 2012 07:41, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>
> > If I were you.... and not knowing all your details...
> >
> > I would optimize indices that are static (not being modified) and
> > would optimize down to 1 segment.
> > I would do it when search traffic is low.
> >
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
> > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet <jamevaa...@gmail.com> wrote:
> > > Hi Eric,
> > > I am in a major dilemma with my index now. I have got 8 cores,
> > > each around 300 GB in size, and half of them are deleted
> > > documents, and above that each has got around 100 segments as
> > > well. Do I issue an expungeDelete and allow the merge policy to
> > > take care of the segments, or optimize them into a single
> > > segment? Search performance is not on par with usual Solr speed.
> > > If I have to optimize, what segment number should I choose? My
> > > RAM size is around 120 GB and JVM heap is around 45 GB (oldGen
> > > being 30 GB). Please advise!
> > >
> > > thanks.
> > >
> > > On 6 October 2012 00:00, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> because eventually you'd run out of file handles. Imagine a
> > >> long-running server with 100,000 segments. Totally
> > >> unmanageable.
> > >>
> > >> I think Shawn was emphasizing that RAM requirements don't
> > >> depend on the number of segments. There are other
> > >> resources that files consume, however.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet <jamevaa...@gmail.com> wrote:
> > >> > hi Shawn,
> > >> > thanks for the detailed explanation.
> > >> > I have got one doubt: you said it doesn't matter how many
> > >> > segments the index has, but then why does Solr have this merge
> > >> > policy which merges segments frequently? Why can't it leave
> > >> > the segments as they are rather than merging smaller ones
> > >> > into bigger ones?
> > >> >
> > >> > thanks
> > >> >
> > >> > On 5 October 2012 05:46, Shawn Heisey <s...@elyograg.org> wrote:
> > >> >
> > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
> > >> >>
> > >> >>> so imagine I have merged the 150 GB index into a single
> > >> >>> segment, this would make a single segment of 150 GB in
> > >> >>> memory. When new docs are indexed it wouldn't alter this
> > >> >>> 150 GB index unless I update or delete the older docs,
> > >> >>> right? Will a 150 GB single segment have problems with
> > >> >>> memory swapping at OS level?
> > >> >>
> > >> >> Supplement to my previous reply: the real memory mentioned
> > >> >> in the last paragraph does not include the memory that the
> > >> >> OS uses to cache disk access. If more memory is needed and
> > >> >> all the free memory is being used by the disk cache, the OS
> > >> >> will throw away part of the disk cache (a near-instantaneous
> > >> >> operation that should never involve disk I/O) and give that
> > >> >> memory to the application that requests it.
> > >> >>
> > >> >> Here's a very good breakdown of how memory gets used with
> > >> >> MMapDirectory in Solr. It's applicable to any program that
> > >> >> uses memory mapping, not just Solr:
> > >> >>
> > >> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
> > >> >>
> > >> >> Thanks,
> > >> >> Shawn
> > >> >
> > >> > --
> > >> > -JAME
> > >
> > > --
> > > -JAME
>
> --
> -JAME

--
from Jun Wang