Re: JVM GC Issue

Shawn Heisey Sun, 03 Dec 2017 12:03:10 -0800

On 12/2/2017 6:59 PM, S G wrote:

I am a bit curious about the docValues implementation.
I understand that docValues do not use JVM memory and
they make use of the OS cache; that is why they are more performant.


But to return any response from the docValues, the values in the
docValues' column-oriented structures would need to be brought
into the JVM's memory. And that will then increase the pressure
on the JVM's memory anyway. So how do docValues actually
help from a memory perspective?

What I'm writing below is my understanding of docValues. If it turns out that I've gotten any of it wrong, that is MY error, not Solr's.

When there are no docValues, Solr must do something called "uninverting the index" in order to satisfy certain operations -- primarily faceting, grouping, and sorting.

A Lucene index is an inverted index. This means that it is a big list of terms, and each of those entries points to a second list that describes which fields and documents have the term, as well as some other information like positions. Uninverting the index is pretty efficient, but it does take time. The uninverted index structure is a list of all terms for a specific field. Then there's a second phase -- the info in the uninverted field is read and processed for the query operation, which will use heap. I do not know if there are additional phases. There might be.
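To make the two structures concrete, here is a minimal Java sketch of an inverted index for a single field and of uninverting it. It is a toy model under stated assumptions, not Lucene's code: the class and method names are hypothetical, each document has at most one value, and postings carry only doc IDs (no positions or frequencies). The uninverted array holds the strings themselves for readability; a real implementation would more likely use compact term ordinals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only (hypothetical class, not Lucene code): one single-valued
// string field, no positions, frequencies, or multi-valued documents.
public class UninvertSketch {

    // Inverted index for one field: sorted term -> doc IDs that contain it.
    static SortedMap<String, List<Integer>> buildInverted(String[] valueByDoc) {
        SortedMap<String, List<Integer>> inverted = new TreeMap<>();
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            String term = valueByDoc[doc];
            if (term == null) continue; // this doc has no value in the field
            inverted.computeIfAbsent(term, t -> new ArrayList<>()).add(doc);
        }
        return inverted;
    }

    // "Uninverting": visit every term and its posting list and build a
    // doc-indexed array on the Java heap, so doc -> value is a direct lookup.
    static String[] uninvert(SortedMap<String, List<Integer>> inverted, int maxDoc) {
        String[] valueByDoc = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : inverted.entrySet()) {
            for (int doc : entry.getValue()) {
                valueByDoc[doc] = entry.getKey();
            }
        }
        return valueByDoc;
    }

    public static void main(String[] args) {
        String[] docs = {"red", "blue", "red", null, "green"};
        SortedMap<String, List<Integer>> inverted = buildInverted(docs);
        System.out.println(inverted); // {blue=[1], green=[4], red=[0, 2]}
        System.out.println(Arrays.toString(uninvert(inverted, docs.length))); // [red, blue, red, null, green]
    }
}
```

In this sketch, uninvert() has to walk every term and every posting for the field, and its result is a heap array sized to the number of documents.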

In case you don't know, in the Lucene index, docValues data on disk consists of every entry in the index for one field, written sequentially in an uncompressed format.
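A rough way to picture a column-oriented layout is a file holding one value per document in doc ID order, read through a memory map. The sketch below assumes a hypothetical fixed-width format (one 8-byte long per document, no header) that is much simpler than Lucene's actual docValues codec; the class and method names are made up for illustration, while FileChannel.map and MappedByteBuffer are standard JDK APIs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy column file (hypothetical format, NOT the Lucene docValues codec):
// one 8-byte long per document, written in doc ID order with no header.
public class ColumnSketch {

    static void writeColumn(Path file, long[] valueByDoc) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(valueByDoc.length * Long.BYTES);
        for (long v : valueByDoc) {
            buf.putLong(v); // sequential: doc 0, doc 1, doc 2, ...
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    // Memory-map the file read-only. The mapped pages are managed by the OS
    // (page cache), not allocated on the Java heap; the value for doc N sits
    // at byte offset N * 8. The mapping stays valid after the channel closes.
    static LongBuffer mapColumn(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return mapped.asLongBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("price", ".col");
        file.toFile().deleteOnExit();
        writeColumn(file, new long[] {30, 10, 20});
        LongBuffer column = mapColumn(file);
        System.out.println(column.get(1)); // 10
    }
}
```

Because every value has the same width and the file is in doc ID order, a lookup needs no per-field structure on the heap, only an offset calculation.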

This means that for those query types, docValues is *exactly* what Solr needs for the first phase. And instead of generating it into heap memory and then reading it, Solr can just read the data right off the disk (which the OS caches, so it might be REALLY fast and use OS memory) in order to handle second and later phases. This is faster than building an uninverted field, and consumes no heap memory.
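A minimal Java sketch of those later phases, assuming a numeric single-valued field: it sorts matching documents and counts facet values by reading straight from a per-document column, with no uninverted structure built first. The LongBuffer parameter stands in for a memory-mapped docValues column; the class and method names are hypothetical, and the demo wraps a plain array so it runs without any files.

```java
import java.nio.LongBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: "column" holds one value per doc ID, as a
// docValues-style column would. In real use it would be memory-mapped.
public class DocValuesQuerySketch {

    // Sort matching doc IDs by their column value (stable for ties).
    static List<Integer> sortByColumn(List<Integer> matchingDocs, LongBuffer column) {
        List<Integer> sorted = new ArrayList<>(matchingDocs);
        sorted.sort(Comparator.comparingLong(doc -> column.get(doc)));
        return sorted;
    }

    // Facet counts: value -> number of matching docs with that value.
    static SortedMap<Long, Integer> facetCounts(List<Integer> matchingDocs, LongBuffer column) {
        SortedMap<Long, Integer> counts = new TreeMap<>();
        for (int doc : matchingDocs) {
            counts.merge(column.get(doc), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for a mapped column: doc 0 -> 30, doc 1 -> 10, doc 2 -> 20, doc 3 -> 10
        LongBuffer column = LongBuffer.wrap(new long[] {30, 10, 20, 10});
        List<Integer> matching = List.of(0, 1, 3);
        System.out.println(sortByColumn(matching, column)); // [1, 3, 0]
        System.out.println(facetCounts(matching, column));  // {10=2, 30=1}
    }
}
```

The sorted doc ID list and the facet counts are still heap objects, sized by the query's results; what this approach skips is building and caching a heap structure covering the whole field.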

As I mentioned, the uninverted data is built from indexed terms. The contents of docValues data are the same as those of a stored field -- the original data as it was sent for indexing. Because docValues cannot be added to fields using solr.TextField, the only type that undergoes text analysis, there's no possibility of a difference between an uninverted field and docValues.
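The analysis point can be illustrated with a tiny Java sketch. The analyze() method is a hypothetical stand-in for a TextField analysis chain (lowercase, then split on whitespace), and noAnalysis() mimics a string field that indexes the whole value as a single term. Since uninverting can only recover indexed terms, only the analyzed field would disagree with the original value that docValues hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not Solr's analysis API.
public class AnalysisSketch {

    // Stand-in for a TextField analyzer: lowercase, split on whitespace.
    static List<String> analyze(String original) {
        List<String> tokens = new ArrayList<>();
        for (String t : original.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A string-field-like type indexes the original value as one term.
    static List<String> noAnalysis(String original) {
        return List.of(original);
    }

    public static void main(String[] args) {
        String original = "Red Apple";
        // Indexed terms, i.e. what uninverting would recover:
        System.out.println(analyze(original));    // [red, apple]
        System.out.println(noAnalysis(original)); // [Red Apple]
    }
}
```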


Thanks,
Shawn
