Hi,

It seems that docValues are not explained well anywhere.
Here are two links that try to explain them:
1) https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/
2) https://www.elastic.co/guide/en/elasticsearch/guide/current/docvalues.html

And here is the official Solr documentation, which does not explain the
internal details at all:
3) https://lucene.apache.org/solr/guide/6_6/docvalues.html

The first link says that:
  The row-oriented layout (stored fields) is:
  {
    'doc1': {'A':1, 'B':2, 'C':3},
    'doc2': {'A':2, 'B':3, 'C':4},
    'doc3': {'A':4, 'B':3, 'C':2}
  }

  while the column-oriented layout (docValues) is:
  {
    'A': {'doc1':1, 'doc2':2, 'doc3':4},
    'B': {'doc1':2, 'doc2':3, 'doc3':3},
    'C': {'doc1':3, 'doc2':4, 'doc3':2}
  }
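To make the two layouts from the first link concrete, here is a minimal Python sketch. It uses plain dicts (an assumption for illustration, not Lucene's on-disk format) and turns the row-oriented stored-fields view into the column-oriented docValues view by transposing it; `to_columns` is a hypothetical helper name:

```python
# Row-oriented view (stored fields): one record per document,
# holding all of that document's field values together.
rows = {
    'doc1': {'A': 1, 'B': 2, 'C': 3},
    'doc2': {'A': 2, 'B': 3, 'C': 4},
    'doc3': {'A': 4, 'B': 3, 'C': 2},
}


def to_columns(rows):
    """Transpose doc -> {field: value} into field -> {doc: value}.

    Hypothetical helper: a toy model of the column-oriented layout.
    """
    columns = {}
    for doc_id, fields in rows.items():
        for field, value in fields.items():
            columns.setdefault(field, {})[doc_id] = value
    return columns


columns = to_columns(rows)
print(columns['B'])  # {'doc1': 2, 'doc2': 3, 'doc3': 3}
```

Both views hold the same values; what changes is which key is looked up first. Reading field B for many documents touches one inner dict in the column view, but one inner dict per document in the row view.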

And the second link gives this example:
Doc values maps documents to the terms contained by the document:

  Doc      Terms
  -----------------------------------------------------------------
  Doc_1 | brown, dog, fox, jumped, lazy, over, quick, the
  Doc_2 | brown, dogs, foxes, in, lazy, leap, over, quick, summer
  Doc_3 | dog, dogs, fox, jumped, over, quick, the
  -----------------------------------------------------------------
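For comparison, here is a small Python sketch that treats the table above as the values of a single text field (an assumption; the table does not name the field) and builds the inverted index from it. The inverted index goes the other way, from each term to the documents containing it. Plain dicts and the helper name `invert` are illustrative only, not Lucene's real structures:

```python
# Doc -> terms for ONE text field, copied from the table above.
doc_terms = {
    'Doc_1': ['brown', 'dog', 'fox', 'jumped', 'lazy', 'over', 'quick', 'the'],
    'Doc_2': ['brown', 'dogs', 'foxes', 'in', 'lazy', 'leap', 'over',
              'quick', 'summer'],
    'Doc_3': ['dog', 'dogs', 'fox', 'jumped', 'over', 'quick', 'the'],
}


def invert(doc_terms):
    """Build term -> sorted list of doc ids (a toy inverted index)."""
    index = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            index.setdefault(term, []).append(doc_id)
    # Sort terms and postings so the result is deterministic.
    return {term: sorted(docs) for term, docs in sorted(index.items())}


inverted = invert(doc_terms)
print(inverted['dog'])    # ['Doc_1', 'Doc_3']
print(inverted['quick'])  # ['Doc_1', 'Doc_2', 'Doc_3']
```

A search for `dog` needs the term-to-docs direction; asking "which terms does Doc_2 have in this field?" needs the doc-to-terms direction shown in the table.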


To me, this example looks the same as the row-oriented (stored fields)
format from the first link.
Which one is right?



Also, the column-oriented layout (docValues) mentioned above is:
{
  'A': {'doc1':1, 'doc2':2, 'doc3':4},
  'B': {'doc1':2, 'doc2':3, 'doc3':3},
  'C': {'doc1':3, 'doc2':4, 'doc3':2}
}
Isn't this also what the inverted index looks like?
The inverted index maps each term (A, B, C) to the documents it occurs in
and the positions at which it occurs in each document.


Or is it better to say that the inverted index is of the form:
{
   map-for-field-A: {1: doc1, 2: doc2, 4: doc3}
   map-for-field-B: {2: doc1, 3: [doc2,doc3]}
   map-for-field-C: {3: doc1, 4: doc2, 2: doc3}
}
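The form proposed just above can be produced from the column view with a few lines of Python. This is a toy sketch with plain dicts (not Lucene's format), using a hypothetical helper `per_field_inverted`. It always stores a list of doc ids per value, so single-document entries become one-element lists:

```python
# Column view (field -> {doc: value}), as in the first link.
columns = {
    'A': {'doc1': 1, 'doc2': 2, 'doc3': 4},
    'B': {'doc1': 2, 'doc2': 3, 'doc3': 3},
    'C': {'doc1': 3, 'doc2': 4, 'doc3': 2},
}


def per_field_inverted(columns):
    """field -> {value: [doc ids]}: one small inverted index per field."""
    index = {}
    for field, values in columns.items():
        postings = index.setdefault(field, {})
        for doc_id, value in values.items():
            postings.setdefault(value, []).append(doc_id)
    return index


inverted = per_field_inverted(columns)
print(inverted['B'])  # {2: ['doc1'], 3: ['doc2', 'doc3']}
```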
But even if that is true, I do not see why sorting or faceting on field
A, B, or C would be a problem.
All the values for a field are in one data structure, so it should be
easy to sort or group by them.
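To compare the two directions on this exact question, here is a toy Python sketch. It is an illustration under simplifying assumptions, not how Solr or Lucene actually implement sorting or faceting. It assumes sorting and faceting apply to the documents that matched some query (the `matching` list is hypothetical), so the question per document is "what is this doc's value in field B?". The doc-to-value view answers that with one lookup per matching document. The value-to-docs view has to walk its entries and check each doc list against the result set:

```python
columns_b = {'doc1': 2, 'doc2': 3, 'doc3': 3}   # doc -> value (docValues-like)
inverted_b = {2: ['doc1'], 3: ['doc2', 'doc3']}  # value -> docs (inverted-like)

matching = ['doc3', 'doc1']  # hypothetical result set of some query


def sort_with_columns(docs, column):
    # One direct value lookup per matching document.
    return sorted(docs, key=lambda d: column[d])


def sort_with_inverted(docs, inverted):
    # Walk every value in order and keep the docs that matched:
    # the work grows with the number of distinct values, not just matches.
    wanted = set(docs)
    result = []
    for value in sorted(inverted):
        result.extend(d for d in inverted[value] if d in wanted)
    return result


def facet_with_columns(docs, column):
    # Count values by looking up each matching document once.
    counts = {}
    for d in docs:
        counts[column[d]] = counts.get(column[d], 0) + 1
    return counts


def facet_with_inverted(docs, inverted):
    # Every value's doc list must be intersected with the result set.
    wanted = set(docs)
    counts = {}
    for value, posting in inverted.items():
        n = sum(1 for d in posting if d in wanted)
        if n:
            counts[value] = n
    return counts


print(sort_with_columns(matching, columns_b))  # ['doc1', 'doc3']
```

Both approaches give the same answer on three documents; the point is only the direction of the lookup. Real index structures are compressed and far more elaborate, so treat this as an illustration of access patterns rather than of actual performance.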

Can someone please explain this more clearly? An explanation that builds
on the same example would be really helpful.


Thanks
SG
