Tom,

Feel free to look at my benchmark results for two alternative joining approaches:
http://blog.griddynamics.com/2012/08/block-join-query-performs.html
Regards

On Thu, Aug 23, 2012 at 4:40 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Tom:
>
> I think my comments were that grouping on a field where there was
> a unique value _per document_ chewed up a lot of resources.
> Conceptually, there's a bucket for each unique group value. And
> grouping on a file path is just asking for trouble.
>
> But the memory used for grouping should max out as a function of
> the unique values in the grouped field.
>
> Best
> Erick
>
> On Wed, Aug 22, 2012 at 11:32 PM, Lance Norskog <goks...@gmail.com> wrote:
> > Yes, distributed grouping works, but grouping takes a lot of
> > resources. If you can avoid it in distributed mode, so much the better.
> >
> > On Wed, Aug 22, 2012 at 3:35 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> >> Thanks Tirthankar,
> >>
> >> So the issue is memory use for sorting. I'm not sure I understand how
> >> sorting of grouping fields is involved with the defaults and field
> >> collapsing, since the default sorts by relevance, not by grouping field. On
> >> the other hand, I don't know much about how field collapsing is implemented.
> >>
> >> So far the few tests I've made haven't revealed any memory problems. We
> >> are using very small string fields for grouping and I think that we
> >> probably only have a couple of cases where we are grouping more than a few
> >> thousand docs. I will try to find a query with a lot of docs per group
> >> and take a look at the memory use using JConsole.
> >>
> >> Tom
> >>
> >> On Wed, Aug 22, 2012 at 4:02 PM, Tirthankar Chatterjee <tchatter...@commvault.com> wrote:
> >>
> >>> Hi Tom,
> >>>
> >>> We had an issue where we were keeping millions of docs in a single node and
> >>> we were trying to group them on a string field which is nothing but the full
> >>> file path… that caused SOLR to go out of memory…
> >>>
> >>> Erick has explained nicely in the thread why it won’t work, and I had
> >>> to find another way of architecting it.
> >>>
> >>> How do you think this is different in your case? If you want to group by a
> >>> string field with thousands of similar entries I am guessing you will face
> >>> the same issue.
> >>>
> >>> Thanks,
> >>>
> >>> Tirthankar
> >
> > --
> > Lance Norskog
> > goks...@gmail.com

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
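
To illustrate Erick's point in the quoted thread that grouping conceptually keeps "a bucket for each unique group value", here is a toy Python model. It is not how Solr implements field collapsing; the function name, the document fields (`id`, `score`, `path`, `type`) and the corpus are all made up for illustration. It only shows that the number of buckets held in memory tracks the number of unique values in the grouped field, so a per-document value such as a full file path yields one bucket per document.

```python
from collections import defaultdict


def group_top_docs(docs, group_field, docs_per_group=1):
    """Toy grouping: keep the best-scoring docs for each unique group value.

    One bucket is created per unique value of `group_field`, so memory
    grows with the cardinality of that field (hypothetical model only).
    """
    buckets = defaultdict(list)
    for doc in docs:
        bucket = buckets[doc[group_field]]
        bucket.append((doc["score"], doc["id"]))
        bucket.sort(reverse=True)      # highest score first
        del bucket[docs_per_group:]    # trim to the per-group limit
    return buckets


# Hypothetical corpus: "type" has 3 distinct values, "path" is unique per doc.
docs = [
    {
        "id": i,
        "score": (i * 37) % 101,
        "path": f"/data/file_{i}.txt",
        "type": ("pdf", "doc", "txt")[i % 3],
    }
    for i in range(10000)
]

by_type = group_top_docs(docs, "type")
by_path = group_top_docs(docs, "path")

print("buckets when grouping by type:", len(by_type))
print("buckets when grouping by path:", len(by_path))
```

Grouping on the low-cardinality field keeps a handful of buckets regardless of corpus size, while grouping on the path keeps as many buckets as there are documents, which matches the out-of-memory experience Tirthankar describes.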