Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

Mikhail Khludnev Thu, 23 Aug 2012 11:10:21 -0700

Tom,
Feel free to take a look at my benchmark results for two alternative
joining approaches:
http://blog.griddynamics.com/2012/08/block-join-query-performs.html


Regards

On Thu, Aug 23, 2012 at 4:40 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Tom:
>
> I think my comments were that grouping on a field where there was
> a unique value _per document_ chewed up a lot of resources.
> Conceptually, there's a bucket for each unique group value. And
> grouping on a file path is just asking for trouble.
>
> But the memory used for grouping should max out as a function of
> the number of unique values in the grouped field.
>
> Best
> Erick
>
> On Wed, Aug 22, 2012 at 11:32 PM, Lance Norskog <goks...@gmail.com> wrote:
> > Yes, distributed grouping works, but grouping takes a lot of
> > resources. If you can avoid it in distributed mode, so much the better.
> >
> > On Wed, Aug 22, 2012 at 3:35 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> >> Thanks Tirthankar,
> >>
> >> So the issue is memory use for sorting.  I'm not sure I understand how
> >> sorting of grouping fields is involved with the defaults and field
> >> collapsing, since the default sorts by relevance, not by grouping field.  On
> >> the other hand, I don't know much about how field collapsing is
> >> implemented.
> >>
> >> So far the few tests I've made haven't revealed any memory problems.  We
> >> are using very small string fields for grouping and I think that we
> >> probably only have a couple of cases where we are grouping more than a few
> >> thousand docs.  I will try to find a query with a lot of docs per group
> >> and take a look at the memory use using JConsole.
> >>
> >> Tom
> >>
> >>
> >> On Wed, Aug 22, 2012 at 4:02 PM, Tirthankar Chatterjee <
> >> tchatter...@commvault.com> wrote:
> >>
> >>> Hi Tom,
> >>>
> >>> We had an issue where we were keeping millions of docs in a single node and
> >>> we were trying to group them on a string field which is nothing but the full
> >>> file path… that caused SOLR to go out of memory…
> >>>
> >>> Erick has explained nicely in the thread as to why it won’t work and I had
> >>> to find another way of architecting it.
> >>>
> >>> How do you think this is different in your case? If you want to group by a
> >>> string field with thousands of similar entries I am guessing you will face
> >>> the same issue.
> >>>
> >>> Thanks,
> >>>
> >>> Tirthankar
> >>>
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
>
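
To make Erick's point above concrete (one bucket per unique group value),
here is a toy Python sketch. It is only an illustration under my own
assumptions, not how Solr actually implements grouping: the function name
group_top_docs and the fields "type", "path" and "score" are made up for
the example. It keeps the best-scoring docs per group value and shows that
the number of buckets held in memory tracks the number of distinct values
in the grouped field.

```python
from collections import defaultdict


def group_top_docs(docs, field, limit=1):
    """Keep the top `limit` docs by score for each distinct value of `field`.

    Toy model only: one in-memory bucket per unique group value.
    """
    buckets = defaultdict(list)
    for doc in docs:
        bucket = buckets[doc[field]]
        bucket.append(doc)
        # Keep the bucket sorted by descending score and trim it to `limit`.
        bucket.sort(key=lambda d: d["score"], reverse=True)
        del bucket[limit:]
    return buckets


# Hypothetical corpus: every doc has a unique path but one of two types.
docs = [
    {
        "id": i,
        "path": f"/data/file{i}.txt",
        "type": "pdf" if i % 2 else "doc",
        "score": 1.0 / (i + 1),
    }
    for i in range(10_000)
]

by_type = group_top_docs(docs, "type")  # low-cardinality field
by_path = group_top_docs(docs, "path")  # one unique value per document

print(len(by_type), len(by_path))
```

Grouping on the low-cardinality field needs two buckets, while grouping on
the per-document path needs one bucket per document, which is the case
Tirthankar ran into with full file paths.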



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?