Hey,
I'm trying to filter on a subquery expressions using Exists and I ran into 
a weird issue. 
According to the docs 
<https://docs.djangoproject.com/en/2.2/ref/models/expressions/#filtering-on-a-subquery-expression>,
 
to filter on subquery you first need to annotate it, and than filter on it. 
This causes the subquery to appear both in the WHERE and in the SELECT 
which can lead to poor performance.

For example, my queryset:

Payment
.annotate(reconciled=Exists(
    Record
    .objects
    .filter(payment_id=OuterRef('pk'))
    .values_list('payment_id')
))
.filter(reconciled=True)

In the resulting SQL, the subquery is used in WHERE and appears in the 
SELECT as well:

SELECT 
    "payment"."id",
    -- ... many more fields
   



* EXISTS (        SELECT U0."id"        FROM "record" U0        WHERE 
U0."payment_id" = ("payment"."id")    ) AS "reconciled"*
FROM 
    "payment"
WHERE 




*    EXISTS (        SELECT U0."id"        FROM "leumicard_record" U0      
  WHERE U0."payment_id" = ("payment"."id")    ) = False*


This causes the "exists" query to be evaluated twice:

 Seq Scan on payment  (cost=0.00..63982927.73 rows=4772627 width=155)
   Filter: (NOT (SubPlan 2))
   SubPlan 1
     ->  Index Only Scan using record_payment_id_058ca67f on 
leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           Index Cond: (payment_id = payment.id)
   SubPlan 2
     ->  Index Only Scan using record_payment_id_058ca67f on 
leumicard_record u0_1  (cost=0.42..8.47 rows=2 width=0)
           Index Cond: (payment_id = payment.id)

This is most likely because the subquery is "annotated" and so the ORM adds 
it to the values list. 

I tried to `defer` the field, and got the following error:

django.core.exceptions.FieldDoesNotExist: Payment has no field named 
'reconciled'

To make sure that the query can be executed as intended without the 
subquery in the select list, I tried to explicitly list "values_list":


Payment
.annotate(reconciled=Exists(
    Record
    .objects
    .filter(payment_id=OuterRef('pk'))
    .values_list('payment_id')
))
.filter(reconciled=True)
.values_list('pk')

Query is now as intended:

SELECT 
    "payment"."id" 
FROM 
    "payment" 
WHERE 
    EXISTS(
        SELECT U0."payment_id" 
        FROM "record" U0 
        WHERE U0."payment_id" = ("payment"."id")
    ) = False

Plan is cheaper:


 Seq Scan on payment  (cost=0.00..42747797.33 rows=4772627 width=4)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Index Only Scan using record_payment_id_058ca67f on 
leumicard_record u0  (cost=0.42..8.47 rows=2 width=0)
           Index Cond: (payment_id = payment.id)

My questions are:

*-  Am I doing it right?*
*- Is this intended?*
*- What is the best way to exclude the subquery from the query SELECT list?*

Thanks,
*Haki Benita. *

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/b8213fe1-fbb0-4bf3-87fd-3a6f323456ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to