On 12/2/2017 6:59 PM, S G wrote:
I am a bit curious about the docValues implementation.
I understand that docValues do not use JVM memory and
they make use of OS cache - that is why they are more performant.

But to return any response from the docValues, the values in the
docValues' column-oriented-structures would need to be brought
into the JVM's memory. And that will then increase the pressure
on the JVM's memory anyway. So how do docValues actually
help from a memory perspective?

What I'm writing below is my understanding of docValues. If it turns out that I've gotten any of it wrong, that is MY error, not Solr's.

When there are no docValues, Solr must do something called "uninverting the index" in order to satisfy certain operations -- primarily faceting, grouping, and sorting.

A Lucene index is an inverted index. This means that it is a big list of terms, and each of those entries has a second list that describes which documents have the term in a given field, as well as some other information like positions.

Uninverting the index is pretty efficient, but it does take time, and the result is built in heap memory. As I understand it, the uninverted structure for a specific field lets Solr go from a document to the terms that document has in that field. Then there's a second phase -- the info in the uninverted field is read and processed for the query operation, which will also use heap. I do not know if there are additional phases. There might be.
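To make "uninverting" a little more concrete, here is a toy Python sketch. It is only an illustration of the idea, not Lucene's actual code: the dictionary, the `uninvert` function, and the sample data are all made up, and it assumes a single-valued field.

```python
# Toy inverted index for ONE field: term -> list of document ids.
# Hypothetical data; real Lucene postings also carry positions, etc.
inverted = {
    "blue": [0, 2],
    "green": [1],
    "red": [3, 4],
}


def uninvert(inverted_field, num_docs):
    """Walk every term's posting list and build a doc id -> term array.

    Assumes each document has at most one term in this field.
    Documents with no term keep None.
    """
    column = [None] * num_docs
    for term, doc_ids in inverted_field.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column


# The result lives in the process's own memory (heap, in Solr's case).
column = uninvert(inverted, num_docs=5)
print(column)  # ['blue', 'green', 'blue', 'red', 'red']
```

The point of the sketch is that every posting list has to be visited before the per-document array is usable, and that array then sits in the application's memory.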

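In case you don't know, in the Lucene index, docValues data on disk consists of the value of one field for every document in the index, written sequentially. I believe it is largely uncompressed, though the exact encoding depends on the docValues type and the Lucene codec in use.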

This means that for those query types, docValues is *exactly* what Solr needs for the first phase. And instead of generating it into heap memory and then reading it, Solr can just read the data right off the disk (which the OS caches, so it might be REALLY fast and use OS memory) in order to handle second and later phases. This is faster than building an uninverted field, and the field data itself consumes no heap memory. Only the second and later phases use heap, and they would do that with or without docValues.
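Here is another toy Python sketch, this time of the second phase (faceting) working from a per-document column that already exists. The list stands in for docValues data read from disk, and `facet_counts` is a made-up name; none of this is Solr's real API.

```python
from collections import Counter

# Stand-in for a docValues column: index = document id, value = field value.
# In Solr this would be read from disk via the OS cache, not built on heap.
doc_values_column = ["blue", "green", "blue", "red", "red"]


def facet_counts(column, matching_doc_ids):
    """Count field values across the documents that matched a query."""
    return Counter(
        column[doc_id]
        for doc_id in matching_doc_ids
        if column[doc_id] is not None
    )


# Suppose a query matched documents 0, 2 and 3.
counts = facet_counts(doc_values_column, [0, 2, 3])
print(counts["blue"], counts["red"], counts["green"])  # 2 1 0
```

Only the small result (the counts) needs to be held in the application's memory here. The column is just looked up by document id, which is the access pattern that faceting, grouping, and sorting want.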

As I mentioned, the uninverted data is built from indexed terms. The content of docValues data is the same as a stored field -- the original indexed data. As of this writing, docValues cannot be added to fields using solr.TextField, which is the only type that undergoes text analysis, so there's no possibility of a difference between an uninverted field and docValues.

Thanks,
Shawn
