#33751: Multiple Conditions on LEFT JOIN - FilteredRelations not working?
-----------------------------------+--------------------------------------
Reporter: Thorben Luepkes | Owner: nobody
Type: Uncategorized | Status: new
Component: Utilities | Version: 4.0
Severity: Normal | Resolution:
Keywords: orm, django, join | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------------+--------------------------------------
Description changed by Thorben Luepkes:
Old description:
> I think I am running into a similar problem here, where I would like to
> append multiple `conditions` onto my join:
> I will try to be precise with this as much as possible.
> Imagine these two models. whose relation was set up years ago:
>
> {{{
>
> class Event(models.Model):
> instance_created_date = models.DateTimeField(auto_now_add=True)
> car = models.ForeignKey(Car, on_delete=models.CASCADE,
> related_name="car_events")
> ...
> a lot of normal text fields here, but they dont matter for this
> problem.
> }}}
>
> and
>
> {{{
>
> class Car(models.Model):
> a lot of text fields here, but they dont matter for this problem.
> hide_from_company_search = models.BooleanField(default=False)
> images = models.ManyToManyField(Image, through=CarImage)
> }}}
>
>
> Lets say I want to query the amount of events for a given car:
> {{{
> def get_car_events_qs() -> QuerySet:
> six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
> microsecond=0) - timedelta(days=6))
> cars = Car.objects.prefetch_related(
> 'car_events',
> ).filter(
> some_conditions_on_fields=False,
> ).annotate(
> num_car_events=Count(
> 'car_events',
> filter=Q(car_events__instance_created_date__gt=six_days_ago),
> distinct=True)
> )
>
> return cars
> }}}
>
> The really tricky part for this is the performance of the query: `Cars`
> has `450.000` entries, and `Events` has `156.850.048`. All fields that I
> am using to query are indexed. The query takes around 4 minutes to
> complete, depending on the db load. It took 18 minutes before adding the
> indicies.
>
> This above ORM query will result in the following sql:
>
> {{{
> SELECT
> "core_car"."id",
> COUNT("analytics_carevent"."id") FILTER (WHERE
> ("analytics_carevent"."event" = 'view'
> AND "analytics_carevent"."instance_created_date"
> >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
> AND "analytics_carevent"."instance_created_date"
> < '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
> LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
> "analytics_carevent"."car_id")
> WHERE
> ... some conditions that dont matter
> GROUP BY
> "core_car"."id"
> }}}
>
>
> I somehow suspect this `FILTER` to be a problem.
> I tried with
>
> {{{
> .annotate(num_car_events=Count('car_events'))
> }}}
>
> and moving the `car_events__instance_created_date__gt=six_days_ago` into
> the `filter`:
>
> {{{
> .filter(some_conditions_on_fields=False,
> car_events__instance_created_date__gt=six_days_ago)
> }}}
>
> But of course this would filter out Cars with no Events, which is not
> what we want - but it is super fast!
> I fiddled a bit with it in raw sql and came to his nice working example,
> that I now would like to write into ORM, since we dont really want to use
> rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
> faaaaar less than the 18minutes.
>
> {{{
> SELECT
> "core_car"."id",
> COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
> FROM
> "core_car"
> LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
> "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
> AND "analytics_carevent"."instance_created_date" >
> '2022-05-14T00:00:00+02:00'::timestamptz
> AND "analytics_carevent"."instance_created_date" <=
> '2022-05-15T00:00:00+02:00'::timestamptz)
>
> WHERE (some conditions that dont matter)
> GROUP BY "core_car"."id";
> }}}
>
>
> My question now is:
> How can I make the above query into the ORM?
> I need to put the "filter" or conditions onto the `left join`. If I just
> use `filter()` it will just put it into the `where` clause, which is
> wrong.
> I tried:
>
> {{{
> two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
> microsecond=0) - timedelta(days=2))
> cars = Car.objects.prefetch_related(
> 'car_events',
> ).filter(some_filters,)
> }}}
>
> and
>
> {{{
> cars =
> cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
> flat=True), car_events__instance_created_date__gt=six_days_ago)
> }}}
>
> But I dont think this is quite correct. I also need the count of the
> annotation.
>
> Using Django 4 and latest python release as of this writing. :)
>
> Thanks a lot!
>
> **TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
> of `queryset.filter()`**
New description:
I think I am running into a similar problem here, where I would like to
append multiple `conditions` onto my join:
I will try to be precise with this as much as possible.
Imagine these two models. whose relation was set up years ago:
{{{
class Event(models.Model):
instance_created_date = models.DateTimeField(auto_now_add=True)
car = models.ForeignKey(Car, on_delete=models.CASCADE,
related_name="car_events")
...
a lot of normal text fields here, but they dont matter for this
problem.
}}}
and
{{{
class Car(models.Model):
a lot of text fields here, but they dont matter for this problem.
hide_from_company_search = models.BooleanField(default=False)
images = models.ManyToManyField(Image, through=CarImage)
}}}
Lets say I want to query the amount of events for a given car:
{{{
def get_car_events_qs() -> QuerySet:
six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
microsecond=0) - timedelta(days=6))
cars = Car.objects.prefetch_related(
'car_events',
).filter(
some_conditions_on_fields=False,
).annotate(
num_car_events=Count(
'car_events',
filter=Q(car_events__instance_created_date__gt=six_days_ago),
distinct=True)
)
return cars
}}}
The really tricky part for this is the performance of the query: `Cars`
has `450.000` entries, and `Events` has `156.850.048`. All fields that I
am using to query are indexed. The query takes around 4 minutes to
complete, depending on the db load. It took 18 minutes before adding the
indicies.
This above ORM query will result in the following sql:
{{{
SELECT
"core_car"."id",
COUNT("analytics_carevent"."id") FILTER (WHERE
("analytics_carevent"."event" = 'view'
AND "analytics_carevent"."instance_created_date"
>= '2022-05-10T07:45:16.672279+00:00'::timestamptz
AND "analytics_carevent"."instance_created_date" <
'2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
"analytics_carevent"."car_id")
WHERE
... some conditions that dont matter
GROUP BY
"core_car"."id"
}}}
I somehow suspect this `FILTER` to be a problem.
I tried with
{{{
.annotate(num_car_events=Count('car_events'))
}}}
and moving the `car_events__instance_created_date__gt=six_days_ago` into
the `filter`:
{{{
.filter(some_conditions_on_fields=False,
car_events__instance_created_date__gt=six_days_ago)
}}}
But of course this would filter out Cars with no Events, which is not what
we want - but it is super fast!
I fiddled a bit with it in raw sql and came to his nice working example,
that I now would like to write into ORM, since we dont really want to use
rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
faaaaar less than the 18minutes.
{{{
SELECT
"core_car"."id",
COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
FROM
"core_car"
LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
"analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
AND "analytics_carevent"."instance_created_date" >
'2022-05-14T00:00:00+02:00'::timestamptz
AND "analytics_carevent"."instance_created_date" <=
'2022-05-15T00:00:00+02:00'::timestamptz)
WHERE (some conditions that dont matter)
GROUP BY "core_car"."id";
}}}
My question now is:
How can I make the above query into the ORM?
I need to put the "filter" or conditions onto the `left join`. If I just
use `filter()` it will just put it into the `where` clause, which is
wrong.
I tried:
{{{
two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
microsecond=0) - timedelta(days=2))
cars = Car.objects.prefetch_related(
'car_events',
).filter(some_filters,)
}}}
and
{{{
cars =
cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
flat=True), car_events__instance_created_date__gt=six_days_ago)
}}}
But I dont think this is quite correct. I also need the count of the
annotation.
I also tried with
{{{
cars = cars.annotate(events=FilteredRelation('car_events',
condition=Q(car_events__car_id__in=ads.values_list("id",
flat=True)))).filter(events__instance_created_date__gt=six_days_ago)
}}}
But this results in an inner join, which i dont think is wanted here
Using Django 4 and latest python release as of this writing. :)
Thanks a lot!
**TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
of `queryset.filter()`**
--
--
Ticket URL: <https://code.djangoproject.com/ticket/33751#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/0107018114174c13-9fd2eda6-ff31-4ce8-afac-987a8bf903d0-000000%40eu-central-1.amazonses.com.