#33751: Multiple Conditions on LEFT JOIN - FilteredRelations not working?
-------------------------------------+-------------------------------------
               Reporter:  Thorben    |          Owner:  nobody
  Luepkes                            |
                   Type:             |         Status:  new
  Uncategorized                      |
              Component:  Utilities  |        Version:  4.0
               Severity:  Normal     |       Keywords:  orm, django, join
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 I think I am running into a similar problem here, where I would like to
 append multiple `conditions` onto my join:
 I will try to be precise with this as much as possible.
 Imagine these two models. whose relation was set up years ago:


 {{{

 class Event(models.Model):
     instance_created_date = models.DateTimeField(auto_now_add=True)
     car = models.ForeignKey(Car, on_delete=models.CASCADE,
 related_name="car_events")
     ...
     a lot of normal text fields here, but they dont matter for this
 problem.
 }}}


 and


 {{{

 class Car(models.Model):
     a lot of text fields here, but they dont matter for this problem.
     hide_from_company_search = models.BooleanField(default=False)
     images = models.ManyToManyField(Image, through=CarImage)
 }}}



 Lets say I want to query the amount of events for a given car:
 {{{
 def get_car_events_qs() -> QuerySet:
     six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
 microsecond=0) - timedelta(days=6))
     cars = Car.objects.prefetch_related(
         'car_events',
     ).filter(
         some_conditions_on_fields=False,
     ).annotate(
         num_car_events=Count(
             'car_events',
             filter=Q(car_events__instance_created_date__gt=six_days_ago),
 distinct=True)
     )

     return cars
 }}}


 The really tricky part for this is the performance of the query: `Cars`
 has `450.000` entries, and `Events` has `156.850.048`. All fields that I
 am using to query are indexed. The query takes around 4 minutes to
 complete, depending on the db load. It took 18 minutes before adding the
 indicies.

 This above ORM query will result in the following sql:


 {{{
 SELECT
   "core_car"."id",
                 COUNT("analytics_carevent"."id") FILTER (WHERE
 ("analytics_carevent"."event" = 'view'
                         AND "analytics_carevent"."instance_created_date"
 >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
                         AND "analytics_carevent"."instance_created_date" <
 '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
         LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
 "analytics_carevent"."car_id")
 WHERE
  ... some conditions that dont matter
 GROUP BY
         "core_car"."id"
 }}}



 I somehow suspect this `FILTER` to be a problem.
 I tried with

 {{{
 .annotate(num_car_events=Count('car_events'))
 }}}

 and moving the `car_events__instance_created_date__gt=six_days_ago` into
 the `filter`:

 {{{
 .filter(some_conditions_on_fields=False,
 car_events__instance_created_date__gt=six_days_ago)
 }}}


 But of course this would filter out Cars with no Events, which is not what
 we want - but it is super fast!
 I fiddled a bit with it in raw sql and came to his nice working example,
 that I now would like to write into ORM, since we dont really want to use
 rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
 faaaaar less than the 18minutes.


 {{{
 SELECT
         "core_car"."id",
         COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
 FROM
         "core_car"
         LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
 "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
 AND "analytics_carevent"."instance_created_date" >
 '2022-05-14T00:00:00+02:00'::timestamptz
                 AND "analytics_carevent"."instance_created_date" <=
 '2022-05-15T00:00:00+02:00'::timestamptz)

 WHERE (some conditions that dont matter)
 GROUP BY "core_car"."id";
 }}}



 My question now is:
 How can I make the above query into the ORM?
 I need to put the "filter" or conditions onto the `left join`. If I just
 use `filter()` it will just put it into the `where` clause, which is
 wrong.
 I tried:


 {{{
  two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
 microsecond=0) - timedelta(days=2))
     cars = Car.objects.prefetch_related(
         'car_events',
     ).filter(some_filters,)
 }}}

 and

 {{{
 cars =
 
cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
 flat=True), car_events__instance_created_date__gt=six_days_ago)
 }}}


 But I dont think this is quite correct. I also need the count of the
 annotation.

 Using Django 4 and latest python release as of this writing. :)

 Thanks a lot!

 **TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
 of `queryset.filter()`**

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33751>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070181141510a6-a1207766-7247-4ca9-bc88-13379c45066b-000000%40eu-central-1.amazonses.com.

Reply via email to