Re: Solr 4.0 segment flush times has bigger difference between two machines

Jun Wang Fri, 19 Oct 2012 04:59:47 -0700

I have found that segment flushing is controlled by
DocumentsWriterFlushControl, and indexing is implemented by
DocumentsWriterPerThread. DocumentsWriterFlushControl tracks the
number of docs and the size of the RAM buffer, but it seems to be shared by
all DocumentsWriterPerThread instances. Is the RAM limit applied to the sum of
the buffers of all DocumentsWriterPerThread instances?
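
To make the question concrete, here is a minimal sketch of what a shared limit would imply. It is a toy model in plain Python, not Lucene code: the function name, the round-robin distribution of docs, the fixed 1024 bytes per doc, and the "flush the largest buffer once the combined size reaches the limit" rule are all assumptions made for illustration.

```python
# Toy model, NOT Lucene code: several per-thread buffers share one RAM budget.
# Assumption: once the combined buffered size reaches the limit,
# the writer holding the most docs is flushed as one segment.

def simulate(num_writers, ram_limit_bytes, bytes_per_doc, total_docs):
    """Return the doc count of each segment flushed while indexing."""
    docs = [0] * num_writers              # docs currently buffered per writer
    flushed = []
    for i in range(total_docs):
        docs[i % num_writers] += 1        # spread docs round-robin over writers
        if sum(docs) * bytes_per_doc >= ram_limit_bytes:
            largest = max(range(num_writers), key=lambda w: docs[w])
            flushed.append(docs[largest])  # this writer becomes a segment
            docs[largest] = 0
    return flushed

limit = 64 * 1024 * 1024                  # 64 MB, as in ramBufferSizeMB
one = simulate(1, limit, 1024, 200_000)   # a single indexing thread
four = simulate(4, limit, 1024, 200_000)  # four concurrent indexing threads
print("1 writer :", one)
print("4 writers:", four)
```

In this model, with identical docs and an identical limit, each segment flushed from the four-writer run holds fewer docs than a segment from the single-writer run, because the budget is split among the buffers. If the real limit works this way, the number of concurrent indexing threads on each machine would matter as much as the per-doc RAM cost.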


2012/10/19 Jun Wang <wangjun...@gmail.com>

> Hi
>
> I have 2 machines for a collection, and it uses DIH to import data. DIH
> is triggered via a URL request on one machine, let's call it A, and A
> forwards some of the index data to machine B. Recently I found that segment
> flushes happen more often on machine B. Here is part of INFOSTREAM.txt.
>
> Machine A:
> ----------------------------
> DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as
> segment _4r3 numDocs=71616
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0
> deleted docs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no
> vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]:
> flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm,
> _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
> D
>
> Machine B
> ----------------------------------
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings
> as segment _zi0 numDocs=4302
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
> has 0 deleted docs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
> has no vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]:
> flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt,
> _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed
> codec=Lucene40
> D
>
> I have found that a flush occurs when the number of docs in RAM reaches
> 70000~9000 on machine A, but the number on machine B is very different,
> about 4000. It seems that every doc in the buffer uses more RAM on machine
> B than on machine A, which results in more flushes. Does anyone know why
> this happens?
>
> My conf is here.
>
> <ramBufferSizeMB>64</ramBufferSizeMB>
> <maxBufferedDocs>100000</maxBufferedDocs>
>
>
>
>
> --
> from Jun Wang
>
>
>
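
As a rough check on the numbers in the quoted logs, the sketch below works out the per-doc RAM cost that each flush would imply. It assumes every flush was triggered by a single DWPT filling the whole 64 MB buffer on its own, which may not hold if the buffer is shared among several DWPTs; the helper name `bytes_per_doc` is made up for this example.

```python
# Back-of-the-envelope estimate, assuming each flush happened because
# one DWPT alone had consumed the full ramBufferSizeMB (an assumption).

RAM_BUFFER_BYTES = 64 * 1024 * 1024    # ramBufferSizeMB=64 from the config

def bytes_per_doc(num_docs, buffer_bytes=RAM_BUFFER_BYTES):
    """Average buffered bytes per doc if the whole buffer held num_docs."""
    return buffer_bytes / num_docs

a = bytes_per_doc(71616)               # machine A, segment _4r3
b = bytes_per_doc(4302)                # machine B, segment _zi0
print(f"A: {a:.0f} bytes/doc, B: {b:.0f} bytes/doc, ratio {b / a:.1f}x")
```

Under that assumption, a doc on machine B would cost over ten times as much RAM as the same kind of doc on machine A. Since both machines index the same collection, a gap that large suggests the assumption itself is wrong for machine B, for example because the 64 MB is divided among several buffers there.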


-- 
from Jun Wang
