#29262: Custom Left Outer Join in Queries
-------------------------------------+-------------------------------------
     Reporter:  Sassan Haradji       |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  ORM Join             |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Thorben Luepkes):

 I think I am running into a similar problem here, where I would like to
 append multiple `conditions` onto my join:
 I will try to be precise with this as much as possible.
 Imagine these two models. whose relation was set up years ago:


 {{{

 class Event(models.Model):
     instance_created_date = models.DateTimeField(auto_now_add=True)
     car = models.ForeignKey(Car, on_delete=models.CASCADE,
 related_name="car_events")
     ...
     a lot of normal text fields here, but they dont matter for this
 problem.
 }}}


 and


 {{{

 class Car(models.Model):
     a lot of text fields here, but they dont matter for this problem.
     hide_from_company_search = models.BooleanField(default=False)
     images = models.ManyToManyField(Image, through=CarImage)
 }}}



 Lets say I want to query the amount of events for a given car:
 {{{
 def get_car_events_qs() -> QuerySet:
     six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
 microsecond=0) - timedelta(days=6))
     cars = Car.objects.prefetch_related(
         'car_events',
     ).filter(
         some_conditions_on_fields=False,
     ).annotate(
         num_car_events=Count(
             'car_events',
             filter=Q(car_events__instance_created_date__gt=six_days_ago),
 distinct=True)
     )

     return cars
 }}}


 The really tricky part for this is the performance of the query: `Cars`
 has `450.000` entries, and `Events` has `156.850.048`. All fields that I
 am using to query are indexed. The query takes around 4 minutes to
 complete, depending on the db load. It took 18 minutes before adding the
 indicies.

 This above ORM query will result in the following sql:


 {{{
 SELECT
   "core_car"."id",
                 COUNT("analytics_carevent"."id") FILTER (WHERE
 ("analytics_carevent"."event" = 'view'
                         AND "analytics_carevent"."instance_created_date"
 >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
                         AND "analytics_carevent"."instance_created_date" <
 '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
         LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" =
 "analytics_carevent"."car_id")
 WHERE
  ... some conditions that dont matter
 GROUP BY
         "core_car"."id"
 }}}



 I somehow suspect this `FILTER` to be a problem.
 I tried with

 {{{
 .annotate(num_car_events=Count('car_events'))
 }}}

 and moving the `car_events__instance_created_date__gt=six_days_ago` into
 the `filter`:

 {{{
 .filter(some_conditions_on_fields=False,
 car_events__instance_created_date__gt=six_days_ago)
 }}}


 But of course this would filter out Cars with no Events, which is not what
 we want - but it is super fast!
 I fiddled a bit with it in raw sql and came to his nice working example,
 that I now would like to write into ORM, since we dont really want to use
 rawsql. This query takes `2.2s`, which is in our acceptable boundary, but
 faaaaar less than the 18minutes.


 {{{
 SELECT
         "core_car"."id",
         COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
 FROM
         "core_car"
         LEFT JOIN "analytics_carevent" ON ("core_car"."id" =
 "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view'
 AND "analytics_carevent"."instance_created_date" >
 '2022-05-14T00:00:00+02:00'::timestamptz
                 AND "analytics_carevent"."instance_created_date" <=
 '2022-05-15T00:00:00+02:00'::timestamptz)

 WHERE (some conditions that dont matter)
 GROUP BY "core_car"."id";
 }}}



 My question now is:
 How can I make the above query into the ORM?
 I need to put the "filter" or conditions onto the `left join`. If I just
 use `filter()` it will just put it into the `where` clause, which is
 wrong.
 I tried:


 {{{
  two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0,
 microsecond=0) - timedelta(days=2))
     cars = Car.objects.prefetch_related(
         'car_events',
     ).filter(some_filters,)
 }}}

 and

 {{{
 cars =
 
cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id",
 flat=True), car_events__instance_created_date__gt=six_days_ago)
 }}}


 But I dont think this is quite correct. I also need the count of the
 annotation.

 Using Django 4 and latest python release as of this writing. :)

 Thanks a lot!

 **TLDR: Putting a filter or conditions on `LEFT JOIN` in django, instead
 of `queryset.filter()`**

-- 
Ticket URL: <https://code.djangoproject.com/ticket/29262#comment:19>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018114139efa-41cd6312-f731-4538-8350-ac14635fbaee-000000%40eu-central-1.amazonses.com.

Reply via email to