This makes sense. Any ideas why Lucene/Solr would use 10GB of heap for a
20GB index? My hypothesis was that merging segments was trying to read it
all into memory, but if that's not the case I am out of ideas. The one
caveat is that we are adding documents quickly (~1GB an hour), but if
Lucene writes ~100MB segments and does a streaming merge, that shouldn't
matter?
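
For what it's worth, the segment flush size on the Lucene side is set by
the indexing RAM buffer, which also bounds the heap used for buffering
documents before a flush. A minimal sketch of setting it through the raw
Lucene API (in Solr the equivalent knob is ramBufferSizeMB in
solrconfig.xml, which I believe defaults to 100MB; the /tmp/index path is
just for illustration):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    public class FlushSizeSketch {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            // Flush a new segment to disk once roughly 100 MB of documents
            // have buffered; indexing heap is bounded by this buffer, not
            // by the total index size on disk.
            cfg.setRAMBufferSizeMB(100.0);
            try (IndexWriter writer =
                     new IndexWriter(FSDirectory.open(Paths.get("/tmp/index")), cfg)) {
                // writer.addDocument(...) calls would go here
            }
        }
    }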

On Sat, Jun 1, 2019 at 9:24 AM Walter Underwood <wun...@wunderwood.org>
wrote:

> > On May 31, 2019, at 11:27 PM, John Davis <johndavis925...@gmail.com>
> wrote:
> >
> > 2. Merging segments - does Solr load the entire segment into memory
> > or chunks of it? If the latter, how large are these chunks?
>
> No, it does not read the entire segment into memory.
>
> A fundamental part of the Lucene design is streaming posting lists into
> memory and processing them sequentially. The same amount of memory is
> needed for small or large segments. Each posting list is in document-id
> order. The merge is a merge of sorted lists, writing a new posting list in
> document-id order.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
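
The streaming merge Walter describes is essentially a k-way merge of
doc-id-sorted lists, so memory scales with the number of input segments,
not their size. A minimal Java sketch of the idea (this is not Lucene's
actual merge code, which also remaps doc ids, drops deleted docs, and
handles positions and payloads):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.PriorityQueue;

    public class PostingListMerge {

        // One cursor per input posting list: the list plus a read position.
        static final class Cursor {
            final int[] postings;
            int pos = 0;
            Cursor(int[] postings) { this.postings = postings; }
            int current() { return postings[pos]; }
            boolean exhausted() { return pos >= postings.length; }
        }

        // Merge doc-id-sorted posting lists into one sorted list. Memory
        // is O(k) for k inputs, independent of list length, because each
        // list is consumed one entry at a time through the heap.
        static List<Integer> merge(int[]... lists) {
            PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
            for (int[] l : lists) {
                if (l.length > 0) heap.add(new Cursor(l));
            }
            List<Integer> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                out.add(c.current());            // emit smallest pending doc id
                c.pos++;
                if (!c.exhausted()) heap.add(c); // re-insert with its next doc id
            }
            return out;
        }

        public static void main(String[] args) {
            int[] a = {1, 4, 9};
            int[] b = {2, 4, 7};
            int[] c = {3, 8};
            System.out.println(merge(a, b, c)); // [1, 2, 3, 4, 4, 7, 8, 9]
        }
    }

The point is that each input is read sequentially one entry at a time, so
merging two 10GB segments needs roughly the same buffering as merging two
10MB ones.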
