Re: JVM GC Issue

S G Mon, 04 Dec 2017 10:28:49 -0800

I think the article below explains it well:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html


I had assumed that docValues need to be copied from the OS cache into the
JVM heap before they can be read.
It turns out that is not required: the docValues files are memory-mapped
into the process's virtual address space by the OS (this is what Lucene's
MMapDirectory, described in the article, relies on).
The JVM does not need to load them into its own heap, because it can read
the mapped virtual memory directly.
The OS decides whether the docValues pages are brought into physical memory
(when the JVM actually touches their addresses) or simply stay on disk.
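
To make the memory-mapping point concrete, here is a minimal sketch using
only the JDK's FileChannel.map. It is not Lucene code and does not use
Lucene's real docValues file format: the class name MmapSketch, the helper
methods, and the "one 8-byte long per document" layout are assumptions made
up for illustration. The point is only that the read goes through a mapped
buffer backed by the OS page cache, not through a heap copy of the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {

    // Writes one 8-byte value per "document", back to back, as a stand-in
    // for a column of per-document values (hypothetical layout, NOT the
    // actual Lucene docValues format).
    static void writeColumn(Path file, long[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        Files.write(file, buf.array());
    }

    // Maps the file into the process's virtual address space and reads the
    // value for a single docId. Only the pages that are actually touched
    // have to be resident in physical memory; the OS handles paging.
    static long readValue(Path file, int docId) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Absolute read at the byte offset of this document's value.
            return mapped.getLong(docId * Long.BYTES);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("column", ".bin");
        file.toFile().deleteOnExit();
        writeColumn(file, new long[] {10L, 20L, 30L, 40L});
        // Reads the third value (docId 2) through the mapping.
        System.out.println(readValue(file, 2));
    }
}
```

Running main should print 30. In a real index the mapping would be created
once and reused across many reads rather than re-created per lookup as in
this sketch.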


Thx - SG

On Sun, Dec 3, 2017 at 12:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/2/2017 6:59 PM, S G wrote:
>
>> I am a bit curious on the docValues implementation.
>> I understand that docValues do not use JVM memory and
>> they make use of OS cache - that is why they are more performant.
>>
>> But to return any response from the docValues, the values in the
>> docValues' column-oriented-structures would need to be brought
>> into the JVM's memory. And that will then increase the pressure
>> on the JVM's memory anyways. So how do docValues actually
>> help from memory perspective?
>>
>
> What I'm writing below is my understanding of docValues.  If it turns out
> that I've gotten any of it wrong, that is MY error, not Solr's.
>
> When there are no docValues, Solr must do something called "uninverting
> the index" in order to satisfy certain operations -- primarily faceting,
> grouping, and sorting.
>
> A Lucene index is an inverted index.  This means that it is a big list of
> terms, and then each of those entries is a second list that describes which
> fields and documents have the term, as well as some other information like
> positions.  Uninverting the index is pretty efficient, but it does take
> time.  The uninverted index structure is a list of all terms for a specific
> field.  Then there's a second phase -- the info in the uninverted field is
> read and processed for the query operation, which will use heap.  I do not
> know if there are additional phases. There might be.
>
> In case you don't know, in the Lucene index, docValues data on disk
> consists of every entry in the index for one field, written sequentially in
> an uncompressed format.
>
> This means that for those query types, docValues is *exactly* what Solr
> needs for the first phase.  And instead of generating it into heap memory
> and then reading it, Solr can just read the data right off the disk (which
> the OS caches, so it might be REALLY fast and use OS memory) in order to
> handle second and later phases.  This is faster than building an uninverted
> field, and consumes no heap memory.
>
> As I mentioned, the uninverted data is built from indexed terms.  The
> contents of docValue data is the same as a stored field -- the original
> indexed data.  Because docValues cannot be added to fields using
> solr.TextField, the only type that undergoes text analysis, there's no
> possibility of a difference between an uninverted field and docValues.
>
> Thanks,
> Shawn
>
