Re: Docvalue v.s. invert index

Mikhail Khludnev Sun, 12 Aug 2018 22:49:56 -0700

My expectation is that scanning doc values might be faster than using the
inverted index if a query matches more than 25% of the documents.
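
For illustration, here is a minimal Python sketch of the two structures
discussed in the quoted messages below. It is a toy model, not how Lucene or
Solr actually store anything: the "street" field, the four-document corpus,
and all function names are made up for the example.

```python
from collections import defaultdict

# Toy corpus (hypothetical): doc id -> value of a single "street" field.
docs = {0: "maple", 1: "oak", 2: "maple", 3: "elm"}

# Inverted index: term -> ascending list of doc ids that contain it.
# Answers "for this word, what docs contain it in field X?"
inverted = defaultdict(list)
for doc_id in sorted(docs):
    inverted[docs[doc_id]].append(doc_id)

# Doc values: a column addressed by doc id.
# Answers "for this document, what is the value of field X?"
doc_values = [docs[doc_id] for doc_id in range(len(docs))]


def search_inverted(term):
    """Look the term up directly; only matching postings are touched."""
    return list(inverted.get(term, []))


def search_doc_values(term):
    """Scan every document's value; cost grows with the whole index."""
    return [doc_id for doc_id, value in enumerate(doc_values) if value == term]


def sort_by_field(doc_ids):
    """Sorting needs one per-document lookup, which doc values give directly."""
    return sorted(doc_ids, key=lambda doc_id: doc_values[doc_id])


print(search_inverted("maple"))
print(search_doc_values("maple"))
print(sort_by_field([0, 1, 3]))
```

Both searches return the same documents, but the doc-values version visits
every document no matter how few match. Whether a scan can win when a large
share of documents match depends on details this toy does not model.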


On Sun, Aug 12, 2018 at 7:59 PM Erick Erickson <[email protected]>
wrote:

> bq. I have been informed that the performance of such a search is
> absolutely terrible.
>
> Yep. Horrible.
>
> These two structures answer completely different questions:
> indexed - "for this word, what docs contain it in field X?"
> docValues - "for this document, what is the value of field X?"
>
> Oh my, my usual examples are going out of date: "phone book" and
> "dictionary". There used to be, in the old days, these book-like
> things that were printed on actual paper and you could use them to
> find people's phone number and address, or what the meaning of a word
> was. Siiiiggghhhh.
>
> Well, get a paper phone book from somewhere off the shelf and consider
> each entry a "document", and the phone number and address the "text".
>
> DocValues answers "for person X, what is the phone number" easily,
> because the whole thing is arranged alphabetically by person. But to
> answer the question "Who lives on Maple Street" you have to read
> _everything_ in the entire phone book. Think "table scan".
>
> To answer the question "Who lives on Maple street", you want to index
> all the text.
>
> The whole point of docValues is that the structure used to answer the
> per-document question ("what is the value of field X?") used to be
> built on the heap at runtime, consuming memory and CPU cycles.
> DocValues serialize that structure to disk at index time, where it is
> 1> easily read as memory pages
> 2> almost entirely kept in MMapDirectory space, see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best,
> Erick
>
>
> On Sun, Aug 12, 2018 at 8:56 AM, Shawn Heisey <[email protected]> wrote:
> > On 8/12/2018 4:39 AM, Zahra Aminolroaya wrote:
> >>
> >> Could we say that the docValues technique is better for sorting and
> >> faceting, and the inverted index is better for searching?
> >
> >
> > Yes.  That is how things work.
> >
> > If docValues do not exist, then an equivalent data structure must be
> > built in heap memory *from* the inverted index in order for faceting or
> > sorting to take place.  When docValues are present, Solr can just read
> > the data directly instead of generating it.  If there is plenty of spare
> > memory for the OS to cache data, this is faster.  It also uses less Java
> > heap memory.
> >
> >> Will I lose anything if I only use docvalue?
> >>
> >> Does docvalue technique have better performance?
> >
> >
> > From what I understand, it actually is possible to search when docValues
> > are present but the inverted index isn't, assuming that what you're
> > searching for is the full value of the field, not an individual word.  I
> > have been informed that the performance of such a search is absolutely
> > terrible.
> >
> > Thanks,
> > Shawn
> >
>


-- 
Sincerely yours
Mikhail Khludnev
