#33751: Multiple Conditions on LEFT JOIN - FilteredRelations not working?
-----------------------------------+--------------------------------------
     Reporter:  Thorben Luepkes    |                    Owner:  nobody
         Type:  Uncategorized      |                   Status:  new
    Component:  Utilities          |                  Version:  4.0
     Severity:  Normal             |               Resolution:
     Keywords:  orm, django, join  |             Triage Stage:  Unreviewed
    Has patch:  0                  |      Needs documentation:  0
  Needs tests:  0                  |  Patch needs improvement:  0
Easy pickings:  0                  |                    UI/UX:  0
-----------------------------------+--------------------------------------
Description changed by Thorben Luepkes:

Old description:

> I think I am running into a similar problem here, where I would like to
> append multiple `conditions` onto my join:
> I will try to be precise with this as much as possible.
> Imagine these two models. whose relation was set up years ago:
>

> {{{
>
> class Event(models.Model):
>     instance_created_date = models.DateTimeField(auto_now_add=True)
>     car = models.ForeignKey(Car, on_delete=models.CASCADE,
> related_name="car_events")
>     ...
>     a lot of normal text fields here, but they dont matter for this
> problem.
> }}}
>

> and
>

> {{{
>
> class Car(models.Model):
>     a lot of text fields here, but they dont matter for this problem.
>     hide_from_company_search = models.BooleanField(default=False)
>     images = models.ManyToManyField(Image, through=CarImage)
> }}}
>

>
> Lets say I want to query the amount of events for a given car:
> {{{
> def get_car_events_qs() -> QuerySet:
>     six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
> microsecond=0) - timedelta(days=6))
>     cars = Car.objects.prefetch_related(
>         'car_events',
>     ).filter(
>         some_conditions_on_fields=False,
>     ).annotate(
>         num_car_events=Count(
>             'car_events',
>             filter=Q(car_events__instance_created_date__gt=six_days_ago),
> distinct=True)
>     )
>
>     return cars
> }}}
>

> The really tricky part for this is the performance of the query: `Cars`
> has `450.000` entries, and `Events` has `156.850.048`. All fields that I
> am using to query are indexed. The query takes around 4 minutes to
> complete, depending on the db load. It took 18 minutes before adding the
> indicies.
>
> This above ORM query will result in the following sql:
>

> {{{
> SELECT
>   "core_car"."id",
>                 COUNT("analytics_carevent"."id") FILTER (WHERE
> ("analytics_carevent"."event" = 'view'
>                         AND "analytics_carevent"."instance_created_date"
> >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
>                         AND "analytics_carevent"."instance_created_date"
> < '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
>         LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
> "analytics_carevent"."car_id")
> WHERE
>  ... some conditions that dont matter
> GROUP BY
>         "core_car"."id"
> }}}
>

>
> I somehow suspect this `FILTER` to be a problem.
> I tried with
>
> {{{
> .annotate(num_car_events=Count('car_events'))
> }}}
>
> and moving the `car_events__instance_created_date__gt=six_days_ago` into
> the `filter`:
>
> {{{
> .filter(some_conditions_on_fields=False,
> car_events__instance_created_date__gt=six_days_ago)
> }}}
>

> But of course this would filter out Cars with no Events, which is not
> what we want - but it is super fast!
> I fiddled a bit with it in raw sql and came to his nice working example,
> that I now would like to write into ORM, since we dont really want to use
> rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
> faaaaar less than the 18minutes.
>

> {{{
> SELECT
>         "core_car"."id",
>         COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
> FROM
>         "core_car"
>         LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
> "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
> AND "analytics_carevent"."instance_created_date" >
> '2022-05-14T00:00:00+02:00'::timestamptz
>                 AND "analytics_carevent"."instance_created_date" <=
> '2022-05-15T00:00:00+02:00'::timestamptz)
>
> WHERE (some conditions that dont matter)
> GROUP BY "core_car"."id";
> }}}
>

>
> My question now is:
> How can I make the above query into the ORM?
> I need to put the "filter" or conditions onto the `left join`. If I just
> use `filter()` it will just put it into the `where` clause, which is
> wrong.
> I tried:
>

> {{{
>  two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
> microsecond=0) - timedelta(days=2))
>     cars = Car.objects.prefetch_related(
>         'car_events',
>     ).filter(some_filters,)
> }}}
>
> and
>
> {{{
> cars =
> cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
> flat=True), car_events__instance_created_date__gt=six_days_ago)
> }}}
>

> But I dont think this is quite correct. I also need the count of the
> annotation.
>
> Using Django 4 and latest python release as of this writing. :)
>
> Thanks a lot!
>
> **TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
> of `queryset.filter()`**

New description:

 I think I am running into a similar problem here, where I would like to
 append multiple `conditions` onto my join:
 I will try to be precise with this as much as possible.
 Imagine these two models. whose relation was set up years ago:


 {{{

 class Event(models.Model):
     instance_created_date = models.DateTimeField(auto_now_add=True)
     car = models.ForeignKey(Car, on_delete=models.CASCADE,
 related_name="car_events")
     ...
     a lot of normal text fields here, but they dont matter for this
 problem.
 }}}


 and


 {{{

 class Car(models.Model):
     a lot of text fields here, but they dont matter for this problem.
     hide_from_company_search = models.BooleanField(default=False)
     images = models.ManyToManyField(Image, through=CarImage)
 }}}



 Lets say I want to query the amount of events for a given car:
 {{{
 def get_car_events_qs() -> QuerySet:
     six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
 microsecond=0) - timedelta(days=6))
     cars = Car.objects.prefetch_related(
         'car_events',
     ).filter(
         some_conditions_on_fields=False,
     ).annotate(
         num_car_events=Count(
             'car_events',
             filter=Q(car_events__instance_created_date__gt=six_days_ago),
 distinct=True)
     )

     return cars
 }}}


 The really tricky part for this is the performance of the query: `Cars`
 has `450.000` entries, and `Events` has `156.850.048`. All fields that I
 am using to query are indexed. The query takes around 4 minutes to
 complete, depending on the db load. It took 18 minutes before adding the
 indicies.

 This above ORM query will result in the following sql:


 {{{
 SELECT
   "core_car"."id",
                 COUNT("analytics_carevent"."id") FILTER (WHERE
 ("analytics_carevent"."event" = 'view'
                         AND "analytics_carevent"."instance_created_date"
 >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
                         AND "analytics_carevent"."instance_created_date" <
 '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
         LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
 "analytics_carevent"."car_id")
 WHERE
  ... some conditions that dont matter
 GROUP BY
         "core_car"."id"
 }}}



 I somehow suspect this `FILTER` to be a problem.
 I tried with

 {{{
 .annotate(num_car_events=Count('car_events'))
 }}}

 and moving the `car_events__instance_created_date__gt=six_days_ago` into
 the `filter`:

 {{{
 .filter(some_conditions_on_fields=False,
 car_events__instance_created_date__gt=six_days_ago)
 }}}


 But of course this would filter out Cars with no Events, which is not what
 we want - but it is super fast!
 I fiddled a bit with it in raw sql and came to his nice working example,
 that I now would like to write into ORM, since we dont really want to use
 rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
 faaaaar less than the 18minutes.


 {{{
 SELECT
         "core_car"."id",
         COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
 FROM
         "core_car"
         LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
 "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
 AND "analytics_carevent"."instance_created_date" >
 '2022-05-14T00:00:00+02:00'::timestamptz
                 AND "analytics_carevent"."instance_created_date" <=
 '2022-05-15T00:00:00+02:00'::timestamptz)

 WHERE (some conditions that dont matter)
 GROUP BY "core_car"."id";
 }}}



 My question now is:
 How can I make the above query into the ORM?
 I need to put the "filter" or conditions onto the `left join`. If I just
 use `filter()` it will just put it into the `where` clause, which is
 wrong.
 I tried:


 {{{
  two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
 microsecond=0) - timedelta(days=2))
     cars = Car.objects.prefetch_related(
         'car_events',
     ).filter(some_filters,)
 }}}

 and

 {{{
 cars =
 
cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
 flat=True), car_events__instance_created_date__gt=six_days_ago)
 }}}


 But I dont think this is quite correct. I also need the count of the
 annotation.
 I also tried with

 {{{
 cars = cars.annotate(events=FilteredRelation('car_events',
 condition=Q(car_events__car_id__in=ads.values_list("id",
 flat=True)))).filter(events__instance_created_date__gt=six_days_ago)
 }}}

 But this results in an inner join, which i dont think is wanted here

 Using Django 4 and latest python release as of this writing. :)

 Thanks a lot!

 **TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
 of `queryset.filter()`**

--

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33751#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018114174c13-9fd2eda6-ff31-4ce8-afac-987a8bf903d0-000000%40eu-central-1.amazonses.com.

Reply via email to