My expectation is that scanning DocValues might be faster than using the inverted index if a query matches more than 25% of documents.
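
To make the two access patterns concrete, here is a toy sketch in Python. It is not how Lucene or Solr store anything internally: the sample data, function names, and in-memory dict/list layout are all assumptions made purely for illustration. It only shows that a term lookup in an inverted index is a single probe, while answering the same question from a per-document column means visiting every document.

```python
from collections import defaultdict

# Hypothetical sample data: doc id -> value of one field.
docs = {
    0: "12 Maple Street",
    1: "7 Oak Avenue",
    2: "98 Maple Street",
    3: "3 Elm Road",
}


def build_inverted_index(docs):
    """term -> sorted doc ids ("for this word, what docs contain it?")."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_doc_values(docs):
    """Column of values addressed by doc id ("for this doc, what is the value?").

    Assumes doc ids are the contiguous integers 0..n-1, as in the sample data.
    """
    return [docs[doc_id] for doc_id in sorted(docs)]


def search_inverted(index, term):
    # One dictionary probe, regardless of how many documents exist.
    return index.get(term.lower(), [])


def search_by_scanning(doc_values, term):
    # Visits every document's value: the "table scan".
    term = term.lower()
    return [
        doc_id
        for doc_id, value in enumerate(doc_values)
        if term in value.lower().split()
    ]


index = build_inverted_index(docs)
doc_values = build_doc_values(docs)

print(search_inverted(index, "maple"))          # [0, 2]
print(search_by_scanning(doc_values, "maple"))  # [0, 2]
print(doc_values[1])                            # 7 Oak Avenue
```

Both searches return the same documents; the difference is only in how much data has to be touched. When a query matches a large share of the documents, the scan is no longer doing much wasted work, which is the intuition behind my expectation above.
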

On Sun, Aug 12, 2018 at 7:59 PM Erick Erickson <erickerick...@gmail.com> wrote:

> bq. I have been informed that the performance of such a search is
> absolutely terrible.
>
> Yep. Horrible.
>
> These two structures answer completely different questions
> indexed - "for this word, what docs contain it in field X?"
> DocValues - "for this document, what is the value of field X?"
>
> On my, my usual examples are going out of date. "phone book" and
> "dictionary". There used to be, in the old days, these book-like
> things that were printed on actual paper and you could use them to
> find people's phone number and address, or what the meaning of a word
> was. Siiiiggghhhh.
>
> Well, get a paper phone book from somewhere off the shelf and consider
> each entry a "document", and the phone number and address the "text"
>
> DocValues answers "for person X, what is the phone number" easily, the
> whole thing is alphabetically arranged. But to answer the question
> "Who lives on Maple street" you have to read _everything_ in the
> entire phone book. Think "table scan".
>
> To answer the question "Who lives on Maple street", you want to index
> all the text.
>
> The whole point of docValues was that the structure that was used to
> answer the first question was built in the heap at runtime, consuming
> memory and CPU cycles. DocValues serialized that structure to disk at
> index time where it is
> 1> easily read as memory pages
> 2> almost entirely kept in MMapDirectory space, see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best,
> Erick
>
>
> On Sun, Aug 12, 2018 at 8:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 8/12/2018 4:39 AM, Zahra Aminolroaya wrote:
> >>
> >> Could we say that docvalue technique is better for sorting and faceting
> >> and inverted index one is better for searching?
> >
> >
> > Yes. That is how things work.
> >
> > If docValues do not exist, then an equivalent data structure must be built
> > in heap memory *from* the inverted index in order for faceting or sorting to
> > take place. When docValues are present, Solr can just read the data
> > directly instead of generating it. If there is plenty of spare memory for
> > the OS to cache data, this is faster. It also uses less Java heap memory.
> >
> >> Will I lose anything if I only use docvalue?
> >>
> >> Does docvalue technique have better performance?
> >
> >
> > From what I understand, it actually is possible to search when docValues are
> > present but the inverted index isn't, assuming that what you're searching
> > for is the full value of the field, not an individual word. I have been
> > informed that the performance of such a search is absolutely terrible.
> >
> > Thanks,
> > Shawn
> >
>

--
Sincerely yours
Mikhail Khludnev