On Thu, 2007-03-15 at 08:31 -0500, Tim Chase wrote:
> I'm trying to inject a table into my query, but need to do an 
> INNER JOIN with a custom ON clause.  This is due to the volumes 
> of data in the joined table.  Without being able to do something like
> 
> 
>       ... INNER JOIN tblBig b ON
>       (b.other_id = other.id AND
>       b.foo = 42 AND
>       b.bar = 'zip'
>       )
> 
> the cartesian product makes the join huge.  I've performance 
> tested the resulting queries at the psql prompt, and the above 
> version runs in under 2 seconds.  Doing the logical equiv of
> 
> 
>       ..., tblBig b
>       WHERE
>       b.other_id = other.id AND
>       b.foo = 42 AND
>       b.bar = 'zip'
> 
> takes upwards of 10-15 minutes.  My figuring is that it's doing a 
> full cartesian product of the two tables (producing billions of 
> rows) and then filtering that result.  

Something is odd here.  Any respectable database server knows that "FROM
tableA, tableB where tableA.id = tableB.id" is the same as "FROM tableA
INNER JOIN tableB ON (id)". In fact, the former form is often an easier
one for them to work with internally because the optimiser has a bit
more flexibility about the order in which to construct the joins.
PostgreSQL, for example, has a setting for how many inner joins it is
permitted to rewrite in the earlier form to help the optimiser. But it
will not construct the full outer product at any point, don't worry. I
would be looking to ensure the right indexes are on the right columns,
etc -- basically staring at "EXPLAIN ..." output for a while -- but
that's not the issue you are asking about here.

> By filtering in the ON 
> clause, I can reduce the impact of having 600,000+ rows in tblBig 
> because it only does the product of the outside stuff with the 
> germane data.
> 
> I'm currently mucking under the hood with a custom object that 
> toys with the query, but am fighting with it to a degree that 
> someone else may have already done this or would have hints on 
> best practices.

Trying to tweak current QuerySet objects like this is a very hard at the
moment -- as you've no doubt discovered -- because internally they just
push string fragments around. I am in the middle of rewriting QuerySet
to make exactly this type of Query manipulation easier. That is targeted
for pre-1.0 (and in fact, very shortly after 0.96 because it is a
preliminary step for a bunch of other improvements and enhancements).

Regards,
Malcolm


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to