Re: "Unify by values" setting in Oracle's base.py

charettes Mon, 12 Apr 2021 07:48:56 -0700

Hello Nigel,

> I wonder if it would be possible to catch cases where a bind variable 
would normally be included in both the SELECT list and the GROUP BY.


It should be possible at the compiler level which is overriddable by 
backend (e.g. here's how it's done for a different purpose on MySQL backend 
[0]).

Thinking more about it while writing this response I do wonder if changing 
BaseExpression.get_group_by_cols to return Ref(alias) when an alias is 
specified could make things easier[1].

diff --git a/django/db/models/expressions.py 
b/django/db/models/expressions.py
index bdd23617ac..8dbfaa13fe 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -348,6 +348,8 @@ def copy(self):

     def get_group_by_cols(self, alias=None):
         if not self.contains_aggregate:
+            if alias:
+                return [Ref(alias, self)]
             return [self]
         cols = []
         for source in self.get_source_expressions():

Taking the ticket's query as an example

Book.objects.annotate(
    discount_price=F('price') * 0.75
).values(
    'discount_price'
).annotate(sum_discount=Sum('price'))

It should change its generated SQL from

SELECT (book.price * :arg0) discount_price, SUM(book.price) sum_discount
FROM book
GROUP BY (book.price * :arg1)

to

SELECT (book.price * :arg0) discount_price, SUM(book.price) sum_discount
FROM book
GROUP BY discount_price

Which circumvents the reported issue.

I'm not sure Oracle supports or require a subquery pushdown as you 
demonstrated in your response. If it does require adjustment though they'll 
be way easier to make if a proper list of references need to be pushed down 
is already provided by the expression layer.

I cannot commit to writing a full patch myself but hopefully this gives you 
enough pointers to give it a stab yourself.

FWIW the queries, aggregation, and aggregation_regress parts of the suite 
all pass with the proposed patch on both SQLite and PostgreSQL. I don't 
have a local Oracle setup anymore but I spun up a PR to give our CI a run 
against it[2].

Cheers,
Simon

[0] 
https://github.com/django/django/blob/e4430f22c8e3d29ce5d9d0263fba57121938d06d/django/db/backends/mysql/compiler.py#L5
[1] 
https://github.com/django/django/blob/e4430f22c8e3d29ce5d9d0263fba57121938d06d/django/db/models/expressions.py#L349-L351
[2] https://github.com/django/django/pull/14251
Le lundi 12 avril 2021 à 07:15:25 UTC-4, NPB a écrit :

> Hi Simon,
>
> Thanks for tracking that down.  I see - an interesting problem that lacks 
> an obvious solution. Nevertheless, there is a generic way to deal with this 
> and it will optimize OK in Oracle (at least in the vast majority of cases). 
>
> select x, sum(sal) from (select :arg0 * deptno as x, sal from emp) group 
> by x;
>
> There would be a need to generate suitably obscure column aliases. So - 
> not "x"! Also, more than one alias per SQL statement might be needed too of 
> course.
>
> I don't think this should be used for all cases of aggregation. IMHO it 
> would be best to target it based on how the Django model is used. I wonder 
> if it would be possible to catch cases where a bind variable would normally 
> be included in both the SELECT list and the GROUP BY.
>
> Regards,
> Nigel
>
> On Monday, 5 April 2021 at 15:45:31 UTC+1 charettes wrote:
>
>> Hello Nigel,
>>
>> Through git blame for unify_by_values I figured it was introduced[0] to 
>> deal with an issue during aggregation[1].
>>
>> Mariusz might be able to provide more context here as I don't have much 
>> knowledge around Oracle cursor cache but it seems you'll have to find 
>> another way to address the aggregation issue[1] if you'd like to change 
>> this behaviour.
>>
>> Cheers,
>> Simon
>>
>> [0] 
>> https://github.com/django/django/commit/6dbe56ed7855f34585884a2381fb1cec22ddc824
>> [1] https://code.djangoproject.com/ticket/27632
>>
>> Le samedi 3 avril 2021 à 18:30:28 UTC-4, NPB a écrit :
>>
>>> Hi,
>>>
>>> Can you tell me why *execute* in .../backends/oracle/base.py sets 
>>> *unify_by_values=True* when it calls *_fix_for_params*? It has an 
>>> interesting effect on the Oracle cursor cache.
>>>
>>> For example, if I use a Django model called Logger like this:
>>>
>>> from . import models
>>> ...
>>>    a = models.Logger(t1="1", t2="2", t3="3", i1=1, i2=2, i3=3)
>>>    a.save()
>>>    b = models.Logger(t3="3", t2="2", t1="1", i3=3, i2=2, i1=1)
>>>    b.save()
>>>    c = models.Logger(t1="1", t2="2", t3="3", i1=1,i2=1,i3=3)
>>>    c.save()
>>>    d = models.Logger(t1="1", t2="2", t3="3", i1=3,i2=1,i3=3)
>>>    d.save()
>>>    e = models.Logger(t1="1", t2="1", t3="1", i1=1,i2=1,i3=1)
>>>    e.save()
>>>
>>> It results in the following SQL statements in the Oracle cursor cache:
>>>
>>> INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES 
>>> (:arg0, :arg1, :arg2, :arg3, :arg4, :arg3) RETURNING "POLLS_LOGGER"."ID" 
>>> INTO :arg5
>>> INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES 
>>> (:arg0, :arg0, :arg0, :arg1, :arg1, :arg1) RETURNING "POLLS_LOGGER"."ID" 
>>> INTO :arg2
>>> INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES 
>>> (:arg0, :arg1, :arg2, :arg3, :arg4, :arg5) RETURNING "POLLS_LOGGER"."ID" 
>>> INTO :arg6
>>> INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES 
>>> (:arg0, :arg1, :arg2, :arg3, :arg3, :arg4) RETURNING "POLLS_LOGGER"."ID" 
>>> INTO :arg5
>>>
>>> Bind variable names are re-used and shuffled around if any share the 
>>> same value. This results in multiple SQL IDs.
>>>
>>> If, instead, execute used  *unify_by_values=False*, then the Oracle 
>>> buffer cache would have a single version of the statement irrespective of 
>>> bind values:
>>>
>>> INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES 
>>> (:arg0, :arg1, :arg2, :arg3, :arg4, :arg5) RETURNING "POLLS_LOGGER"."ID" 
>>> INTO :arg6
>>>
>>> On the face of it, the use of unify_by_values=False is more efficient 
>>> (at least from the database's perspective) because there is only one hard 
>>> parse. I guess it doesn't look like a big deal in this example, but I have 
>>> seen cases where the cursor cache has (literally) tens of thousands of 
>>> repeated cursor cache entries where one would have been used if bind names 
>>> were kept consistent and independent of bind value.
>>>
>>> Do you think there is a valid case for this default behavior to be 
>>> changed?
>>>
>>> Regards,
>>> Nigel
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/9c688e9f-09d3-4b4d-b058-0fe396598a21n%40googlegroups.com.

Re: "Unify by values" setting in Oracle's base.py

Reply via email to