This makes sense. Any ideas why Lucene/Solr would use a 10g heap for a 20g index? My hypothesis was that segment merging was trying to read it all into memory, but if that's not the case I am out of ideas. The one caveat is that we are adding documents quickly (~1g an hour), but if Lucene writes ~100m segments and does a streaming merge, that shouldn't matter?
On Sat, Jun 1, 2019 at 9:24 AM Walter Underwood <wun...@wunderwood.org> wrote:
>
> > On May 31, 2019, at 11:27 PM, John Davis <johndavis925...@gmail.com> wrote:
> >
> > 2. Merging segments - does solr load the entire segment in memory or
> > chunks of it? If the latter, how large are these chunks?
>
> No, it does not read the entire segment into memory.
>
> A fundamental part of the Lucene design is streaming posting lists into
> memory and processing them sequentially. The same amount of memory is
> needed for small or large segments. Each posting list is in document-id
> order. The merge is a merge of sorted lists, writing a new posting list
> in document-id order.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
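
For anyone following along, here is a rough sketch of the kind of k-way
streaming merge Walter describes. This is NOT Lucene's actual code (the
class and method names below are made up for illustration); it just shows
why heap usage scales with the number of lists being merged rather than
their size: each sorted posting list is consumed one doc id at a time
through an iterator, and only one entry per list sits in the heap.

import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntConsumer;

public class PostingMergeSketch {

    // Merge k posting lists, each already sorted by document id, into one
    // sorted output stream. Memory is O(k): one heap entry per input list.
    // (Real Lucene merges also remap each segment's doc ids into the new
    // segment's id space; that step is omitted here.)
    static void merge(List<Iterator<Integer>> postings, IntConsumer out) {
        // Heap entry: { current doc id, index of the source list }
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < postings.size(); i++) {
            Iterator<Integer> it = postings.get(i);
            if (it.hasNext()) heap.add(new int[] { it.next(), i });
        }
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.accept(top[0]); // write the next doc id to the merged list
            Iterator<Integer> it = postings.get(top[1]);
            if (it.hasNext()) heap.add(new int[] { it.next(), top[1] }); // advance that list
        }
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> lists = List.of(
            List.of(1, 4, 9).iterator(),
            List.of(2, 4, 7, 20).iterator());
        merge(lists, docId -> System.out.print(docId + " ")); // prints: 1 2 4 4 7 9 20
    }
}

If merging really works this way, the merge itself shouldn't need anywhere
near 10g of heap, which suggests the memory is going somewhere else
(indexing buffers, caches, etc.) rather than into reading whole segments.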