I have found that segment flushing is controlled by DocumentsWriterFlushControl, and indexing is implemented by DocumentsWriterPerThread. DocumentsWriterFlushControl tracks the number of docs and the size of the RAM buffer, but it seems to be shared by all DocumentsWriterPerThread instances. Does that mean the RAM limit applies to the sum of the buffers of all DocumentsWriterPerThread instances?
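
A minimal sketch of the scenario in question, assuming the RAM limit is one budget shared by all per-thread buffers and that the largest buffer is flushed once the combined usage reaches the limit. This is a toy model, not Lucene code: `FlushModel`, `Buffer`, and `simulate` are hypothetical names, and the fixed document size and round-robin thread assignment are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a RAM budget shared by several per-thread buffers (not Lucene code). */
public class FlushModel {

    /** Hypothetical stand-in for one per-thread indexing buffer. */
    static final class Buffer {
        long bytesUsed;
        int numDocs;
    }

    /**
     * Adds fixed-size docs round-robin to `threads` buffers. Whenever the
     * combined usage reaches `ramLimitBytes`, the largest buffer is flushed.
     * Returns the doc count of each of the first `flushes` flushed segments.
     */
    public static List<Integer> simulate(int threads, long ramLimitBytes,
                                         int bytesPerDoc, int flushes) {
        Buffer[] buffers = new Buffer[threads];
        for (int i = 0; i < threads; i++) {
            buffers[i] = new Buffer();
        }
        List<Integer> segments = new ArrayList<>();
        long totalBytes = 0;
        int next = 0;
        while (segments.size() < flushes) {
            // Index one document into the next buffer.
            Buffer b = buffers[next];
            next = (next + 1) % threads;
            b.bytesUsed += bytesPerDoc;
            b.numDocs++;
            totalBytes += bytesPerDoc;

            // Assumed policy: the limit is checked against the SUM of all buffers.
            if (totalBytes >= ramLimitBytes) {
                Buffer largest = buffers[0];
                for (Buffer c : buffers) {
                    if (c.bytesUsed > largest.bytesUsed) {
                        largest = c;
                    }
                }
                segments.add(largest.numDocs);
                totalBytes -= largest.bytesUsed;
                largest.bytesUsed = 0;
                largest.numDocs = 0;
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        long limit = 64L * 1024 * 1024; // 64 MB, as in ramBufferSizeMB
        System.out.println("1 buffer:  " + simulate(1, limit, 1024, 1));
        System.out.println("4 buffers: " + simulate(4, limit, 1024, 1));
    }
}
```

Under these assumptions, the same 64 MB limit yields a smaller first segment when more buffers are active, because each buffer holds only part of the shared budget at the moment the limit is reached. Whether this explains the difference between the two machines depends on how many threads are actually indexing on each, which the log excerpts do not show.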

2012/10/19 Jun Wang <wangjun...@gmail.com>

> Hi
>
> I have 2 machines for a collection, and it uses DIH to import data. DIH
> is triggered via a URL request on one machine, let's call it A, and A
> forwards part of the index to machine B. Recently I found that segment
> flushes happen more often on machine B. Here is part of INFOSTREAM.txt.
>
> Machine A:
> ----------------------------
> DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as segment _4r3 numDocs=71616
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0 deleted docs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm, _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
> D
>
> Machine B:
> ----------------------------------
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings as segment _zi0 numDocs=4302
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has 0 deleted docs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has no vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt, _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed codec=Lucene40
> D
>
> I found that a flush occurs when the number of docs in RAM reaches
> 70000~9000 on machine A, but the number on machine B is very different,
> around 4000. It seems that each buffered doc uses more RAM on machine B
> than on machine A, which results in more flushes. Does anyone know why
> this happens?
>
> My conf is here.
>
> <ramBufferSizeMB>64</ramBufferSizeMB>
> <maxBufferedDocs>100000</maxBufferedDocs>
>
> --
> from Jun Wang

--
from Jun Wang