Re: Feature Request (have sample implemenation): ALL subquery

David Sanders Wed, 24 Aug 2022 09:26:32 -0700

I'm no expert on how PG optimises queries but NOT EXISTS appears to produce 
the desired execution plan.


My original rationale was to improve the readability and "solution 
accessibility" for lack of a better term. This all came about because a 
colleague couldn't figure out how to do it as a Django query so resorted to 
app-level rigmarole & multiple queries (our use case involved multiple 
many-to-many's). I showed them the ~Exists() approach then the All() 
approach and they picked All() because of the natural readability 🤷‍♂️.

This is just something I quickly whipped up (ignore the execution time when 
I ran them through pgbench they were roughly the same on average):

# update pgbench_accounts set abalance = 10 where bid in (4, 7);
UPDATE 200000

# explain analyze SELECT * FROM pgbench_branches b WHERE 't' = ALL (SELECT 
abalance > 0 FROM pgbench_accounts a WHERE b.bid = a.bid );
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pgbench_branches b  (cost=0.00..163366.10 rows=5 width=364) 
(actual time=486.614..602.467 rows=2 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 8
   SubPlan 1
     ->  Seq Scan on pgbench_accounts a  (cost=0.00..32423.00 rows=100000 
width=1) (actual time=46.929..58.739 rows=20001 loops=10)
           Filter: (b.bid = bid)
           Rows Removed by Filter: 430258
 Planning Time: 0.360 ms
 Execution Time: 602.516 ms
(9 rows)

# explain analyze SELECT * FROM pgbench_branches b WHERE NOT EXISTS (SELECT 
1 FROM pgbench_accounts a WHERE b.bid = a.bid and abalance = 0);
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=0.00..32183.84 rows=1 width=364) (actual 
time=753.089..951.244 rows=2 loops=1)
   Join Filter: (b.bid = a.bid)
   Rows Removed by Join Filter: 3702576
   ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=364) 
(actual time=0.008..0.013 rows=10 loops=1)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..32173.00 rows=799867 
width=4) (actual time=0.002..68.581 rows=370258 loops=10)
         Filter: (abalance = 0)
         Rows Removed by Filter: 80000
 Planning Time: 0.206 ms
 Execution Time: 951.263 ms
(9 rows)


On Thursday, 25 August 2022 at 00:41:43 UTC+10 charettes wrote:

> Hello David,
>
> Do you know if ALL provides any performance benefits over NOT EXISTS? 
> Given we already support
>
> Employee.objects.filter(
>     ~Exists(Employee.objects.filter(
>          manager=OuterRef("pk"),
>          join_date__gt=OuterRef("join_date")
>      )
> )
>
> If it's not the case I'm not sure adding support for ALL to core is worth 
> it, particularly because we need to emulate it on SQLite.
>
> If we were to add it I think All(lookup, subquery) would make the most 
> sense as it still looks a bit like how you'd do things In Python and it's 
> relatively easy to translate to NOT EXISTS as you basically take the lookup 
> and the selected value and turn it into 
> ~Exists(subquery.exclude(Lookup(OuterRef(field, value))).
>
> Employee.objects.filter(
>     All(
>        "join_date__lte", Employee.objects.filter(
>            manager=OuterRef("pk")
>        ).values("join_date")
>     )
> )
>
> Generally I'm +0 on the idea though unless we can prove there are use 
> cases for it that Exists can't solve.
>
> Cheers,
> Simon
>
> Le mercredi 24 août 2022 à 09:53:46 UTC-4, shang.xia...@gmail.com a 
> écrit :
>
>> Hi folks!
>>
>> I have a piece of low hanging fruit I'd like to suggest and contribute: 
>> ALL subqueries.
>>
>> Here's the PG docs on ALL subqueries 
>> <https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-ALL>
>>  
>> but basically it causes the subquery expression to evaluate to true only if 
>> all rows themselves evaluate to true when compared against the supplied 
>> value & comparison operator.
>>
>> The syntax is:
>>
>> <value> <operator> ALL (<subquery>)
>>
>> I've played around with this idea in my "Stupid Django Tricks" repo 
>> <https://github.com/shangxiao/stupid-django-tricks/tree/master/all_subqueries>
>>  
>> and have taken the liberty of also creating a small patch for Django 
>> <https://github.com/shangxiao/django/commit/ddfecdc0561b299ce834485b19d2fd57bca8d4b3>
>> .
>>
>> There are a couple of points to note:
>>
>> 1. Postgres, MySQL & SQL Server all support ALL subquerying *but 
>> unfortunately SQLite does not*. A cursory search about Oracle suggests 
>> it may support it but couldn't see anything in the official docs.
>>
>> 2. An equivalent expression is to use NOT EXISTS and move the comparison 
>> logic into the subquery & invert it (but leaving any correlating logic 
>> uninverted).
>>
>> A couple of further points on the NOT EXISTS approach:
>>
>>    1. It uses double negatives and hinders the readability.
>>    2. It's not always obvious what the correct way to invert the 
>>    subquery is – I found I had to be careful with filters involving many 
>>    relationship lookups.
>>
>>
>> That being said I found that getting ALL working in Django only took a 
>> few lines. From my initial experimentation I saw 3 possible ways of doing 
>> this:
>>
>> 1. Lookups: A lookup could be defined for each operator supported.
>>
>> eg using Exact:
>>
>> class All(Exact):
>>     lookup_name = "all"
>>     def get_rhs_op(self, connection, rhs):
>>         return connection.operators[super().lookup_name] % f"ALL {rhs}"
>>
>>
>> Parent.objects.annotate(test=Value("test")).filter(test__all=Subquery(subquery))
>>
>> 2. Subclass Subquery
>>
>> This is my favoured approach as it resembles Python's all() builtin. It 
>> uses almost identical code to Exists except for one line that defines the 
>> exists expression. The idea is that subqueries supply a single boolean 
>> column created with annotate:
>>
>> class All(Subquery):
>>     template = "'t' = ALL (%(subquery)s)"
>>     output_field = fields.BooleanField()
>>     # similar methods to Exists omitted
>>
>> Parent.objects.filter(
>>     All(
>>         Child.objects.annotate(is_result=Q(<expression evaluating to 
>> bool>)).filter(something=OuterRef('something').values('is_result')
>>     )
>> )
>>
>> 2. Translate the expression to NOT EXISTS.
>>
>> I'm not entirely sure what the best approach is for this, or even whether 
>> it's possible, but it would need to negate the subquery filtering while 
>> leaving the correlating filter (ie the OuterRef) alone.
>>
>> My naive attempt here works for the simplest of cases but doesn't take 
>> into account anything more complex. 
>> <https://github.com/shangxiao/stupid-django-tricks/blob/master/all_subqueries/models.py#L119>
>>
>>
>> I found this work well for my codebase and I'd love to explore 
>> contributing it back to Django. I'd suggest the subquery subclass as it 
>> results in more readable code – but there is the question about lack of 
>> support for SQLite. One approach would be to note in the documentation on 
>> how to write the equivalent using NOT EXISTS.
>>
>> What do folks think?
>>
>> Cheers,
>> David
>>
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1188e085-b99e-4735-a2b9-0b951ad2abd5n%40googlegroups.com.

Re: Feature Request (have sample implemenation): ALL subquery

Reply via email to