#29262: Custom Left Outer Join in Queries
-------------------------------------+-------------------------------------
Reporter: Sassan Haradji | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: ORM Join | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Thorben Luepkes):
I think I am running into a similar problem here, where I would like to
append multiple `conditions` onto my join:
I will try to be precise with this as much as possible.
Imagine these two models. whose relation was set up years ago:
{{{
class Event(models.Model):
instance_created_date = models.DateTimeField(auto_now_add=True)
car = models.ForeignKey(Car, on_delete=models.CASCADE,
related_name="car_events")
...
a lot of normal text fields here, but they dont matter for this
problem.
}}}
and
{{{
class Car(models.Model):
a lot of text fields here, but they dont matter for this problem.
hide_from_company_search = models.BooleanField(default=False)
images = models.ManyToManyField(Image, through=CarImage)
}}}
Lets say I want to query the amount of events for a given car:
{{{
def get_car_events_qs() -> QuerySet:
six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
microsecond=0) - timedelta(days=6))
cars = Car.objects.prefetch_related(
'car_events',
).filter(
some_conditions_on_fields=False,
).annotate(
num_car_events=Count(
'car_events',
filter=Q(car_events__instance_created_date__gt=six_days_ago),
distinct=True)
)
return cars
}}}
The really tricky part for this is the performance of the query: `Cars`
has `450.000` entries, and `Events` has `156.850.048`. All fields that I
am using to query are indexed. The query takes around 4 minutes to
complete, depending on the db load. It took 18 minutes before adding the
indicies.
This above ORM query will result in the following sql:
{{{
SELECT
"core_car"."id",
COUNT("analytics_carevent"."id") FILTER (WHERE
("analytics_carevent"."event" = 'view'
AND "analytics_carevent"."instance_created_date"
>= '2022-05-10T07:45:16.672279+00:00'::timestamptz
AND "analytics_carevent"."instance_created_date" <
'2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
"analytics_carevent"."car_id")
WHERE
... some conditions that dont matter
GROUP BY
"core_car"."id"
}}}
I somehow suspect this `FILTER` to be a problem.
I tried with
{{{
.annotate(num_car_events=Count('car_events'))
}}}
and moving the `car_events__instance_created_date__gt=six_days_ago` into
the `filter`:
{{{
.filter(some_conditions_on_fields=False,
car_events__instance_created_date__gt=six_days_ago)
}}}
But of course this would filter out Cars with no Events, which is not what
we want - but it is super fast!
I fiddled a bit with it in raw sql and came to his nice working example,
that I now would like to write into ORM, since we dont really want to use
rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
faaaaar less than the 18minutes.
{{{
SELECT
"core_car"."id",
COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
FROM
"core_car"
LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
"analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
AND "analytics_carevent"."instance_created_date" >
'2022-05-14T00:00:00+02:00'::timestamptz
AND "analytics_carevent"."instance_created_date" <=
'2022-05-15T00:00:00+02:00'::timestamptz)
WHERE (some conditions that dont matter)
GROUP BY "core_car"."id";
}}}
My question now is:
How can I make the above query into the ORM?
I need to put the "filter" or conditions onto the `left join`. If I just
use `filter()` it will just put it into the `where` clause, which is
wrong.
I tried:
{{{
two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
microsecond=0) - timedelta(days=2))
cars = Car.objects.prefetch_related(
'car_events',
).filter(some_filters,)
}}}
and
{{{
cars =
cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
flat=True), car_events__instance_created_date__gt=six_days_ago)
}}}
But I dont think this is quite correct. I also need the count of the
annotation.
Using Django 4 and latest python release as of this writing. :)
Thanks a lot!
**TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
of `queryset.filter()`**
--
Ticket URL: <https://code.djangoproject.com/ticket/29262#comment:19>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/0107018114139efa-41cd6312-f731-4538-8350-ac14635fbaee-000000%40eu-central-1.amazonses.com.