Re: Performance of cross join vs block join

Mikhail Khludnev Fri, 12 Jul 2013 01:59:52 -0700

On Fri, Jul 12, 2013 at 12:19 PM, mihaela olteanu <mihaela...@yahoo.com> wrote:


> Hi Mikhail,
>
> I used the term "block join" incorrectly. When I said block join, I was
> referring to a join performed on a single core, versus a cross join, which
> is performed across multiple cores.
> But I saw your benchmark (from cache) and it seems that block join has
> better performance. Is this functionality available in Solr 4.3.1?

Nope. SOLR-3076 has been waiting for ages.


> I did not find any such examples on Solr's wiki page.
> Does this functionality require a special schema or special indexing?

Special indexing - yes.
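
As a rough illustration of what "special indexing" means here: in Lucene's block join, as described in the blog post linked further down, each parent and its children are indexed together as one contiguous block, with the children first and the parent last. The sketch below is only a toy model in plain Python; the function name index_blocks and the sample documents are made up for illustration and are not part of the Solr or Lucene API.

```python
# Toy model only: a plain Python list stands in for the index.
# index_blocks and the sample docs are hypothetical, not Solr/Lucene API.

def index_blocks(families):
    """Flatten (parent, children) pairs into one doc list:
    children first, parent last, each family kept contiguous."""
    docs = []
    for parent, children in families:
        docs.extend(children)  # children of one parent stay adjacent
        docs.append(parent)    # the parent closes its block
    return docs

families = [
    ({"type": "parent", "id": "p1"},
     [{"type": "child", "id": "c1"}, {"type": "child", "id": "c2"}]),
    ({"type": "parent", "id": "p2"},
     [{"type": "child", "id": "c3"}]),
]

docs = index_blocks(families)
order = [d["id"] for d in docs]
print(order)
```

The point of the layout is that the parent/child relationship is encoded by document order alone, so no parentId lookup is needed at query time.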


> How would I need to index the data from my tables? In my case all the
> indices have a common schema anyway, since I am using dynamic fields, so I
> can easily add all documents from all tables to one Solr core, adding a
> discriminator field to each document.
>
Correct, but the notion of a 'discriminator field' is a little different for
block join.
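
To sketch the difference: in block join the discriminator only has to mark which documents are parents. Because children sit directly before their parent, the parent of a matching child is simply the next parent-flagged document. The snippet below is a toy model under that assumption; parents_of and is_parent are hypothetical names, doc ids are just list positions, and none of this is the actual Solr or Lucene API.

```python
# Toy model only: doc ids are list positions, is_parent plays the role
# of the parent filter. Names are hypothetical, not Solr/Lucene API.

def parents_of(child_hits, is_parent):
    """Map sorted child doc ids to their parent doc ids in one forward pass.
    Assumes child_hits holds only child ids and the last doc is a parent."""
    parents = []
    next_allowed = 0
    for child in child_hits:
        if child < next_allowed:
            continue  # block already matched; further child hits are skipped
        parent = child
        while not is_parent[parent]:
            parent += 1  # walk forward to the parent that closes the block
        parents.append(parent)
        next_allowed = parent + 1
    return parents

# Layout: c c P c P c c c P
is_parent = [False, False, True, False, True, False, False, False, True]
result = parents_of([0, 1, 5, 7], is_parent)
print(result)
```

Here the first child hit in a block is enough to emit its parent, which mirrors the "None scoring mode" behaviour mentioned in the quoted message below; a real implementation would use a bitset rather than a linear walk.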


>
> Could you point me to some more documentation?
>

I can recommend only these:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
http://www.youtube.com/watch?v=-OiIlIijWH0


> Thanks in advance,
> Mihaela
>
>
> ________________________________
>  From: Mikhail Khludnev <mkhlud...@griddynamics.com>
> To: solr-user <solr-user@lucene.apache.org>; mihaela olteanu <
> mihaela...@yahoo.com>
> Sent: Thursday, July 11, 2013 2:25 PM
> Subject: Re: Performance of cross join vs block join
>
>
> Mihaela,
>
> To me it seems reasonable that a single-core join takes the same time as a
> cross-core one. I just can't see what gain could be obtained in the former
> case.
> I can hardly comment on the join code; I looked into it, and it's not
> trivial, at the least. Block join doesn't need to obtain parentId term
> values/numbers and look up parents by them. Both of these actions are
> expensive. Also, block join works as an iterator, whereas join needs to
> allocate memory for a parents bitset and populate it out of order, which
> hurts scalability.
> Also, in None scoring mode BJQ doesn't need to walk through all children;
> it only needs to hit the first one. Another nice feature is the 'both side
> leapfrog': if a highly restrictive filter/query intersects with BJQ, it can
> skip many parents and children as well. That's not possible in Join, which
> has a fairly 'full-scan' nature.
> The main performance factor for Join is the number of child docs.
> I'm not sure I got all your questions; please specify them in more detail
> if something is still unclear.
> Have you seen my benchmark
> http://blog.griddynamics.com/2012/08/block-join-query-performs.html ?
>
>
>
> On Thu, Jul 11, 2013 at 1:52 PM, mihaela olteanu <mihaela...@yahoo.com>
> wrote:
>
> > Hello,
> >
> > Does anyone know of any performance measurements for cross joins
> > compared to joins inside a single index?
> >
> > Is a join inside a single index that stores all documents of various
> > types (from the parent table or from child tables) with a discriminator
> > field faster than a cross join (where each document type resides in
> > its own index)?
> >
> > I have performed some tests, but to me it seems that a join in a
> > single (bigger) index does not add much of a speed improvement
> > compared to cross joins.
> >
> > Why would a block join be faster than a cross join if this is the case?
> > What are the variables that count when trying to improve the query
> > execution time?
> >
> > Thanks!
> > Mihaela
>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

 <http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
