Re: Results grouping performance with group.ngroups=true
I mean, you can probably compute the same counts with a JSON facet *instead of* the slow grouping count, as in
https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789

On Sun, Aug 12, 2018 at 7:09 AM SayantiGmail wrote:

> Hi Mikhail
>
> Even after using JSON facets, latency seems to be high if
> group.ngroups=true.
>
> Regards,
> Sayan
>
> > On 12 Aug 2018, at 02:07, Mikhail Khludnev wrote:
> >
> > As far as I remember, group facets can be calculated with json.facet
> > much faster.
> >
> >> On Sat, Aug 11, 2018 at 1:43 PM SayantiGmail wrote:
> >>
> >> Hi,
> >>
> >> The time taken to group results when the result set has ~200k items is
> >> very high.
> >>
> >> Is there a way to optimize the performance?
> >> Both the group count and the facet count are required.
> >>
> >> Regards,
> >> Sayan

--
Sincerely yours
Mikhail Khludnev
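As a rough illustration of the suggestion above, the sketch below builds request parameters that ask the JSON Facet API for a distinct count of the grouping field, both overall and per facet bucket, using the `unique(field)` aggregation. The field names `product_id` and `category`, the helper name `build_group_count_params`, and the choice of `rows=0` are assumptions made for this example and do not come from the thread. Whether the result matches `group.ngroups` exactly (for instance for documents with no value in the group field) should be checked against your own data.

```python
import json
from urllib.parse import urlencode


def build_group_count_params(query, group_field, facet_field=None):
    """Build Solr query parameters that count distinct groups with json.facet.

    Hypothetical helper for illustration; it only builds parameters and
    does not contact a Solr server.
    """
    # Top-level aggregation: number of distinct values of the grouping field
    # among the documents matching the query.
    facet = {"ngroups": f"unique({group_field})"}
    if facet_field is not None:
        # Per-bucket aggregation: for each value of facet_field, count the
        # distinct groups instead of the matching documents.
        facet["by_" + facet_field] = {
            "type": "terms",
            "field": facet_field,
            "facet": {"groups": f"unique({group_field})"},
        }
    # rows=0 is an assumption: this request only fetches the counts.
    return {"q": query, "rows": 0, "json.facet": json.dumps(facet)}


# Assumed field names: documents are grouped by product_id, faceted by category.
params = build_group_count_params("*:*", "product_id", facet_field="category")
query_string = urlencode(params)
```

The resulting `query_string` could be appended to a collection's `/select` URL. The grouped documents themselves would still need a separate (or combined) request without `group.ngroups=true`.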
Docvalue v.s. invert index
Could we say that the docValues technique is better for sorting and faceting, and the inverted index is better for searching?

Will I lose anything if I only use docValues?

Does the docValues technique have better performance?

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Docvalue v.s. invert index
> Could we say that the docValues technique is better for sorting and faceting,
> and the inverted index is better for searching?

The short answer is yes.
In addition, there are several special data structures for numeric, date range, and geospatial search.
https://lucene.apache.org/solr/guide/7_4/field-types-included-with-solr.html

> Will I lose anything if I only use docValues?
> Does the docValues technique have better performance?

I don't think anyone can answer such a general question. If you have concrete problems or concerns, you should give more details about them to get good advice.

Regards,
Tomoko

--
Tomoko Uchida
Re: Docvalue v.s. invert index
On 8/12/2018 4:39 AM, Zahra Aminolroaya wrote:
> Could we say that the docValues technique is better for sorting and faceting,
> and the inverted index is better for searching?

Yes. That is how things work.

If docValues do not exist, then an equivalent data structure must be built in heap memory *from* the inverted index in order for faceting or sorting to take place. When docValues are present, Solr can just read the data directly instead of generating it. If there is plenty of spare memory for the OS to cache data, this is faster. It also uses less Java heap memory.

> Will I lose anything if I only use docValues?
> Does the docValues technique have better performance?

From what I understand, it actually is possible to search when docValues are present but the inverted index isn't, assuming that what you're searching for is the full value of the field, not an individual word. I have been informed that the performance of such a search is absolutely terrible.

Thanks,
Shawn
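To make the "built in heap memory from the inverted index" step concrete, here is a toy sketch in plain Python. It is not Lucene or Solr code: the `uninvert` function, the `color` field, and the sample data are invented for illustration, and a single-valued field is assumed.

```python
def uninvert(inverted_index, num_docs):
    """Build a doc-id -> value array from a term -> doc-ids mapping."""
    values = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            # Single-valued field assumed: each document holds one term.
            values[doc_id] = term
    return values


# Toy inverted index for a "color" field: term -> ids of documents containing it.
color_index = {"blue": [0, 3], "green": [2], "red": [1, 4]}

# Without a stored per-document column, a structure like this must be
# generated in memory before sorting or faceting can happen.
color_by_doc = uninvert(color_index, num_docs=5)

# Once the per-document column exists, sorting is a plain key lookup.
sorted_doc_ids = sorted(range(5), key=lambda doc_id: color_by_doc[doc_id])
```

In this picture, docValues correspond to having `color_by_doc` already written out at index time, so the `uninvert` pass is skipped at query time.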
Re: Docvalue v.s. invert index
On Sun, Aug 12, 2018 at 8:56 AM, Shawn Heisey wrote:
> I have been informed that the performance of such a search is
> absolutely terrible.

Yep. Horrible.

These two structures answer completely different questions:

indexed - "for this word, what docs contain it in field X?"
DocValues - "for this document, what is the value of field X?"

Oh my, my usual examples are going out of date: "phone book" and "dictionary". There used to be, in the old days, these book-like things that were printed on actual paper, and you could use them to find people's phone numbers and addresses, or what the meaning of a word was. Sigh.

Well, get a paper phone book from somewhere off the shelf and consider each entry a "document", and the phone number and address the "text".

DocValues answers "for person X, what is the phone number?" easily, because the whole thing is alphabetically arranged. But to answer the question "Who lives on Maple Street?" you have to read _everything_ in the entire phone book. Think "table scan".

To answer the question "Who lives on Maple Street?" efficiently, you want to index all the text.

The whole point of docValues is that the structure used to answer the "for this document" question used to be built on the heap at runtime, consuming memory and CPU cycles. DocValues serialize that structure to disk at index time, where it is
1> easily read as memory pages
2> almost entirely kept in MMapDirectory space, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick
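The phone book analogy can be sketched as a toy in plain Python. Nothing here is Solr or Lucene code: the names, the entries, and the whitespace tokenization are all invented for illustration.

```python
from collections import defaultdict

# Each phone book entry is a "document"; the list index is the document id.
phone_book = [
    {"name": "Alice Adams", "street": "Maple Street"},
    {"name": "Bob Brown", "street": "Oak Avenue"},
    {"name": "Carol Clark", "street": "Maple Street"},
]

# docValues-style column: document id -> value of the "street" field.
street_by_doc = [entry["street"] for entry in phone_book]

# Inverted index: word -> ids of documents containing it in "street".
street_index = defaultdict(list)
for doc_id, street in enumerate(street_by_doc):
    for word in street.lower().split():
        street_index[word].append(doc_id)


def search_by_scan(word):
    """Search using only the column: every document has to be read."""
    return [
        doc_id
        for doc_id, street in enumerate(street_by_doc)
        if word in street.lower().split()
    ]


def search_by_index(word):
    """Answer the same search with a single dictionary lookup."""
    return street_index.get(word, [])
```

Both functions return the same document ids for "maple", but `search_by_scan` touches every entry (the "table scan"), while `street_by_doc[doc_id]` answers "what is the value for this document?" directly.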
Re: Docvalue v.s. invert index
My expectation is that scanning docValues might be faster than using the inverted index if a query matches more than 25% of documents.

On Sun, Aug 12, 2018 at 7:59 PM Erick Erickson wrote:

> bq. I have been informed that the performance of such a search is
> absolutely terrible.
>
> Yep. Horrible.

--
Sincerely yours
Mikhail Khludnev
Re: Docvalue v.s. invert index
Thanks Erick, Shawn and Tomoko for the complete answers.

If I set both docValues and indexed to "true" on a field, will Solr know which technique to use for faceting or searching? Or is there any way to tell Solr which technique to use?