On Fri, Jul 12, 2013 at 12:19 PM, mihaela olteanu <mihaela...@yahoo.com> wrote:

> Hi Mikhail,
>
> I used the term "block join" incorrectly. When I said block join I was
> referring to a join performed on a single core, versus a cross join, which
> is performed on multiple cores.
> But I saw your benchmark (from cache) and it seems that block join has
> better performance. Is this functionality available in Solr 4.3.1?

Nope. SOLR-3076 has been waiting for ages.

> I did not find such examples on Solr's wiki page.
> Does this functionality require a special schema, or special indexing?

Special indexing - yes.

> How would I need to index the data from my tables? In my case all
> the indices have a common schema anyway, since I am using dynamic fields,
> so I can easily add all documents from all tables to one Solr core, adding
> a discriminator field to each document.

Correct, but the notion of a 'discriminator field' is a little bit different
for block join.

> Could you point me to some more documentation?

I can recommend only these:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
http://www.youtube.com/watch?v=-OiIlIijWH0

> Thanks in advance,
> Mihaela
>
>
> ________________________________
> From: Mikhail Khludnev <mkhlud...@griddynamics.com>
> To: solr-user <solr-user@lucene.apache.org>; mihaela olteanu <
> mihaela...@yahoo.com>
> Sent: Thursday, July 11, 2013 2:25 PM
> Subject: Re: Performance of cross join vs block join
>
>
> Mihaela,
>
> To me it seems reasonable that a single-core join takes the same time as a
> cross-core one. I just can't see what gain could be obtained in the former
> case.
> I'm hardly able to comment on the join code; I looked into it, and it's
> not trivial, at least. Block join doesn't need to obtain parentId term
> values/numbers and look up parents by them. Both of these actions are
> expensive. Also, block join works as an iterator, whereas join needs to
> allocate memory for a parents bitset and populate it out of order, which
> hurts scalability.
> Also, in None scoring mode BJQ doesn't need to walk through all children;
> it only hits the first one. Another nice feature is 'both side leapfrog':
> if a highly restrictive filter/query intersects with BJQ, it can skip many
> parents and children as well. That's not possible in Join, which has a
> fairly 'full-scan' nature.
> The main performance factor for Join is the number of child docs.
> I'm not sure I got all your questions; please specify them in more detail
> if something is still unclear.
> Have you seen my benchmark
> http://blog.griddynamics.com/2012/08/block-join-query-performs.html ?
>
>
>
> On Thu, Jul 11, 2013 at 1:52 PM, mihaela olteanu <mihaela...@yahoo.com>
> wrote:
>
> > Hello,
> >
> > Does anyone know of any performance measurements for cross
> > joins compared to joins inside a single index?
> >
> > Is a join inside a single index that stores all documents of
> > various types (from the parent table or from child tables) with a
> > discriminator field faster than a cross join (where each
> > document type resides in its own index)?
> >
> > I have performed some tests, but to me it seems that a join in a
> > single (bigger) index does not add much of a speed improvement
> > compared to cross joins.
> >
> > Why would a block join be faster than a cross join if this is the case?
> > What are the variables that count when trying to improve query
> > execution time?
> >
> > Thanks!
> > Mihaela
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
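
P.S. To illustrate the iterator-versus-bitset point from the quoted message, here is a minimal toy sketch in Python. It is not Lucene or Solr code. The in-memory DOCS list, the field names (parent_id, color) and both function names are hypothetical, invented for this illustration. The one thing it assumes about block indexing is that the children of a parent are stored contiguously, immediately before the parent document.

```python
from bisect import bisect_left

# Toy "index": doc id = list position. Each block holds the children first
# and their parent last (the layout assumed for block join in this sketch).
DOCS = [
    {"type": "child", "parent_id": "A", "color": "red"},   # 0
    {"type": "child", "parent_id": "A", "color": "red"},   # 1
    {"type": "parent", "id": "A"},                         # 2
    {"type": "child", "parent_id": "B", "color": "blue"},  # 3
    {"type": "parent", "id": "B"},                         # 4
    {"type": "child", "parent_id": "C", "color": "red"},   # 5
    {"type": "child", "parent_id": "C", "color": "blue"},  # 6
    {"type": "parent", "id": "C"},                         # 7
]

# Sorted doc ids of all parents; stands in for a cached parents filter.
PARENT_DOC_IDS = [i for i, d in enumerate(DOCS) if d["type"] == "parent"]


def block_join(child_hits):
    """Lazily yield parent doc ids for sorted child hits (no score needed).

    No field values are read: the parent is simply the next parent doc id
    after the child. Once a parent is found, the remaining child hits in
    the same block are skipped.
    """
    pos = 0
    while pos < len(child_hits):
        child = child_hits[pos]
        parent = PARENT_DOC_IDS[bisect_left(PARENT_DOC_IDS, child)]
        yield parent
        # Jump past every remaining child hit that belongs to this parent.
        pos = bisect_left(child_hits, parent, pos)


def term_join(child_hits):
    """Join by parent_id value: read a value per child, then look parents up.

    The result must be materialised in a bitset-like array that is filled
    in arbitrary order before any parent can be returned.
    """
    parent_keys = {DOCS[c]["parent_id"] for c in child_hits}
    id_to_doc = {d["id"]: i for i, d in enumerate(DOCS) if d["type"] == "parent"}
    bits = [False] * len(DOCS)
    for key in parent_keys:
        bits[id_to_doc[key]] = True
    return [i for i, b in enumerate(bits) if b]


RED_HITS = [i for i, d in enumerate(DOCS) if d.get("color") == "red"]
print(list(block_join(RED_HITS)))
print(term_join(RED_HITS))
```

Both functions return the same parents. The difference is in the work done: term_join touches every child hit and allocates an array sized to the whole index, while block_join streams results in doc-id order and skips the rest of a block after its first hit.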