Re: Results grouping performance with group.ngroups=true

2018-08-12 Thread Mikhail Khludnev
I mean, you can probably compute the same counts with JSON facets *instead of*
the slow grouping count, as in
https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789
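
A minimal sketch of that idea, not taken from the linked comment: the JSON
Facet API's unique() aggregation (or the approximate hll()) counts the distinct
values of the group field over the matching documents, which should correspond
to the number that group.ngroups reports. The field names group_id and category
and the query are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode


def build_group_count_params(query, group_field, facet_field):
    """Build Solr request parameters that count distinct groups with
    JSON facets, without sending group.ngroups=true."""
    json_facet = {
        # Exact distinct count of the group field over matching documents.
        "ngroups": f"unique({group_field})",
        # Approximate (HyperLogLog) variant for high-cardinality fields.
        "ngroups_approx": f"hll({group_field})",
        # Facet buckets that also report the distinct groups per bucket.
        "by_facet": {
            "type": "terms",
            "field": facet_field,
            "facet": {"groups": f"unique({group_field})"},
        },
    }
    return {"q": query, "rows": 0, "json.facet": json.dumps(json_facet)}


# Hypothetical field names; the resulting string is the query part of a
# /select request against whatever collection is in use.
params = build_group_count_params("*:*", "group_id", "category")
print(urlencode(params))
```

The grouped documents themselves can still be requested with group=true; the
point is only to leave group.ngroups off and read the count from the facet
response.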



On Sun, Aug 12, 2018 at 7:09 AM SayantiGmail  wrote:

> Hi Mikhail
>
> Even after using json facets latency seems to be high if
> group.ngroups=true.
>
> Regards,
> Sayan
>
> > On 12 Aug 2018, at 02:07, Mikhail Khludnev  wrote:
> >
> > As far as I remember, group facets can be calculated much faster with
> > json.facet.
> >
> >> On Sat, Aug 11, 2018 at 1:43 PM SayantiGmail wrote:
> >>
> >> Hi,
> >>
> >> The time taken to group results when the resultset has ~ 200k items is
> >> very high.
> >>
> >> Is there a way to optimize the performance?
> >> The group count and facet count are required.
> >>
> >> Regards,
> >> Sayan
> >>
> >>
> >>
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Docvalue v.s. invert index

2018-08-12 Thread Zahra Aminolroaya
Could we say that the docValues technique is better for sorting and faceting and
the inverted index is better for searching?

Will I lose anything if I only use docValues?

Does the docValues technique have better performance?





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Docvalue v.s. invert index

2018-08-12 Thread Tomoko Uchida
> Could we say that docvalue technique is better for sorting and faceting
> and inverted index one is better for searching?

The short answer is yes.
In addition, there are several special data structures for numeric/date
range/geo spatial search.
https://lucene.apache.org/solr/guide/7_4/field-types-included-with-solr.html

> Will I lose anything if I only use docvalue?
> Does docvalue technique have better performance?

I don't think anyone can answer such a general question. If you have a
concrete problem or concern, please describe it in more detail to get
good advice.

Regards,
Tomoko

On Sun, Aug 12, 2018 at 19:39, Zahra Aminolroaya wrote:

> Could we say that docvalue technique is better for sorting and faceting and
> inverted index one is better for searching?
>
> Will I lose anything if I only use docvalue?
>
> Does docvalue technique have better performance?


-- 
Tomoko Uchida


Re: Docvalue v.s. invert index

2018-08-12 Thread Shawn Heisey

On 8/12/2018 4:39 AM, Zahra Aminolroaya wrote:

> Could we say that docvalue technique is better for sorting and faceting and
> inverted index one is better for searching?


Yes.  That is how things work.

If docValues do not exist, then an equivalent data structure must be 
built in heap memory *from* the inverted index in order for faceting or 
sorting to take place.  When docValues are present, Solr can just read 
the data directly instead of generating it.  If there is plenty of spare 
memory for the OS to cache data, this is faster.  It also uses less Java 
heap memory.
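
A toy illustration of that "uninverting" step, assuming a single-valued field
and a tiny hand-made index (this is not how Lucene stores either structure):
the per-document column has to be rebuilt from the term postings before
sorting or faceting can use it, whereas docValues would already be that column.

```python
from collections import Counter


def uninvert(inverted_index, num_docs):
    """Rebuild a doc-id -> value column from term -> doc-id postings.

    Assumes each document has at most one value for the field.
    """
    column = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column


# Inverted index for one field: term -> list of matching doc ids.
inverted = {"blue": [0, 3], "green": [2], "red": [1, 4]}

# Without docValues this column must be built in memory at query time.
column = uninvert(inverted, num_docs=5)

# Sorting and faceting both read the column by document id.
sorted_docs = sorted(range(len(column)), key=lambda doc: column[doc])
facet_counts = Counter(column)

print(column)
print(sorted_docs)
print(facet_counts)
```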



> Will I lose anything if I only use docvalue?
>
> Does docvalue technique have better performance?


From what I understand, it actually is possible to search when 
docValues are present but the inverted index isn't, assuming that what 
you're searching for is the full value of the field, not an individual 
word.  I have been informed that the performance of such a search is 
absolutely terrible.


Thanks,
Shawn



Re: Docvalue v.s. invert index

2018-08-12 Thread Erick Erickson
bq. I have been informed that the performance of such a search is
absolutely terrible.

Yep. Horrible.

These two structures answer completely different questions:
indexed - "for this word, what docs contain it in field X?"
DocValues - "for this document, what is the value of field X?"

Oh my, my usual examples are going out of date. "phone book" and
"dictionary". There used to be, in the old days, these book-like
things that were printed on actual paper and you could use them to
find people's phone number and address, or what the meaning of a word
was. Sigh.

Well, get a paper phone book from somewhere off the shelf and consider
each entry a "document", and the phone number and address the "text".

DocValues answers "for person X, what is the phone number?" easily, because
the whole thing is arranged alphabetically by name. But to answer the question
"Who lives on Maple street?" you have to read _everything_ in the
entire phone book. Think "table scan".
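
The phone book analogy can be sketched in a few lines. The entries below are
made up, and the two structures are plain Python containers standing in for
docValues and the inverted index, not Lucene's actual formats.

```python
from collections import defaultdict

# Hypothetical phone book: (name, address) per entry; position = doc id.
entries = [
    ("Adams", "12 Maple Street"),
    ("Baker", "7 Oak Avenue"),
    ("Clark", "3 Maple Street"),
]

# docValues-style column: doc id -> full field value.
address_by_doc = [address for _, address in entries]


def search_by_scan(word):
    """Answer a word query from the column: every entry must be read."""
    return [doc for doc, address in enumerate(address_by_doc)
            if word.lower() in address.lower().split()]


def build_inverted_index(values):
    """Map each word to the sorted list of doc ids that contain it."""
    index = defaultdict(list)
    for doc, value in enumerate(values):
        for token in set(value.lower().split()):
            index[token].append(doc)
    return index


address_index = build_inverted_index(address_by_doc)


def search_by_index(word):
    """Answer the same query with a single dictionary lookup."""
    return address_index.get(word.lower(), [])


print(address_by_doc[1])          # "for this document, what is the value?"
print(search_by_scan("maple"))    # full scan of every entry
print(search_by_index("maple"))   # direct lookup by word
```

Both searches return the same doc ids; the difference is that the scan touches
every entry while the index lookup touches only the postings for one word.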

To answer the question "Who lives on Maple street", you want to index
all the text.

The whole point of docValues is that the structure used to answer the
"for this document, what is the value?" question used to be built on the
heap at runtime, consuming memory and CPU cycles. DocValues serialize that
structure to disk at index time, where it is
1> easily read as memory pages
2> almost entirely kept in MMapDirectory space, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick




Re: Docvalue v.s. invert index

2018-08-12 Thread Mikhail Khludnev
My expectation is that scanning docValues might be faster than using the
inverted index if a query matches more than 25% of documents.



-- 
Sincerely yours
Mikhail Khludnev


Re: Docvalue v.s. invert index

2018-08-12 Thread Zahra Aminolroaya
Thanks Erick, Shawn and Tomoko for the complete answers.

If I set both docValues and indexed to "true" for a field, will Solr know
which technique to use for faceting and which for searching? Or is there a way
to tell Solr which technique to use?
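
For reference, a field carrying both structures could be declared roughly as
below. This is only a sketch: it builds a Schema API "add-field" command and
the parameters of one request that searches, sorts and facets on the same
field. The field name category and the query values are hypothetical, and the
comments describe my understanding of the usual behaviour rather than anything
stated in this thread.

```python
import json


def add_field_command(name, field_type="string"):
    """Schema API command for a field with both an inverted index
    (indexed) and a per-document column (docValues)."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,     # inverted index, used to match query terms
            "docValues": True,   # column store, used to sort and facet
            "stored": False,
        }
    }


def example_query_params(field):
    """One request that touches the field in all three ways."""
    return {
        "q": f"{field}:books",       # term lookup
        "sort": f"{field} asc",      # per-document value lookup
        "facet": "true",
        "facet.field": field,        # per-document value lookup
    }


print(json.dumps(add_field_command("category"), indent=2))
print(example_query_params("category"))
```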


