Subquery join support

2020-04-06 Thread Alexandr Tatarinov
Hello folks,

Following the discussion 
https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, I 
would like to suggest adding the ability to join QuerySet with a subquery 
clause to Django.
The benefits are pretty much described and discussed in the linked topic, 
the only one I would like to add (my use case) is the ability to select 
multiple columns from a Subquery without the need to repeat the subquery 
clause, which hurts performance.
As Anssi Kääriäinen mentioned, generic subquery supports seems to be a 
better idea than CTEs, and simple implementation as a starting point will 
allow moving gradually and add RETURNING support later on (for nested 
creates and updates).

I have created a working prototype and successfully using it in one of my 
projects in production. After reading Daniel Miller's 
https://github.com/dimagi/django-cte, I've rewritten the API inspired by 
his ideas, plus tried to keep it as Django-style as possible.
Here is the usage example.

taxes = JoinedSubquery(Tax.objects.all())
products = Product.objects.all()

# .join is probably better be placed in QuerySet class, so it will be 
products.join(taxes, ...)
products_with_taxes = taxes.join(
products,
join_type=LOUTER,  # Optional argument, LOUTER by default
country=taxes.F('product_country'),  # Also Q objects are supported 
type=taxes.F('product_type')
).annotate(
tax_amount=taxes.F('amount')  # .join() and .annotate() calls needn't 
to be chained
)


I am not sure about named joins discussed in the CTE topic
qs.attach(q1=some_qs).filter(a=F('q1__b'))

It seems to be more implicit and harder to implement, requiring changes to 
existing functionality and thus introducing a big patch which I would like 
to avoid.

It is not possible to follow relations in *join.F* expressions, but it is 
easily achievable by firstly fetching needed columns by *annotate*(), so I 
don't see the need to implement it.

Source code is available here 
https://gist.github.com/tatarinov1997/126c49c4b1bb44ae6c57afbc5f43f58d , 
tested with Django 2.2. Feel free to copy and play around.

So, please let me know what do you think about it. I am willing to continue 
working on it, and any help is appreciated.

Thanks,
Alexandr.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/8c41e40f-876c-46c0-9e24-09e8c72592ce%40googlegroups.com.


Re: Subquery join support

2020-04-06 Thread Alexandr Tatarinov
Accidentally removed gist, new one is here 
https://gist.github.com/tatarinov1997/068fe786366401ed640dcbbbe5d959e4

On Monday, 6 April 2020 16:34:55 UTC+3, Alexandr Tatarinov wrote:
>
> Hello folks,
>
> Following the discussion 
> https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, I 
> would like to suggest adding the ability to join QuerySet with a subquery 
> clause to Django.
> The benefits are pretty much described and discussed in the linked topic, 
> the only one I would like to add (my use case) is the ability to select 
> multiple columns from a Subquery without the need to repeat the subquery 
> clause, which hurts performance.
> As Anssi Kääriäinen mentioned, generic subquery supports seems to be a 
> better idea than CTEs, and simple implementation as a starting point will 
> allow moving gradually and add RETURNING support later on (for nested 
> creates and updates).
>
> I have created a working prototype and successfully using it in one of my 
> projects in production. After reading Daniel Miller's 
> https://github.com/dimagi/django-cte, I've rewritten the API inspired by 
> his ideas, plus tried to keep it as Django-style as possible.
> Here is the usage example.
>
> taxes = JoinedSubquery(Tax.objects.all())
> products = Product.objects.all()
>
> # .join is probably better be placed in QuerySet class, so it will be 
> products.join(taxes, ...)
> products_with_taxes = taxes.join(
> products,
> join_type=LOUTER,  # Optional argument, LOUTER by default
> country=taxes.F('product_country'),  # Also Q objects are supported 
> type=taxes.F('product_type')
> ).annotate(
> tax_amount=taxes.F('amount')  # .join() and .annotate() calls needn't 
> to be chained
> )
>
>
> I am not sure about named joins discussed in the CTE topic
> qs.attach(q1=some_qs).filter(a=F('q1__b'))
>
> It seems to be more implicit and harder to implement, requiring changes to 
> existing functionality and thus introducing a big patch which I would like 
> to avoid.
>
> It is not possible to follow relations in *join.F* expressions, but it is 
> easily achievable by firstly fetching needed columns by *annotate*(), so 
> I don't see the need to implement it.
>
> Source code is available here 
> https://gist.github.com/tatarinov1997/126c49c4b1bb44ae6c57afbc5f43f58d , 
> tested with Django 2.2. Feel free to copy and play around.
>
> So, please let me know what do you think about it. I am willing to 
> continue working on it, and any help is appreciated.
>
> Thanks,
> Alexandr.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/7935e9cd-d0f0-4680-be29-36aacce15004%40googlegroups.com.


Re: Subquery join support

2020-04-07 Thread Alexandr Tatarinov
Thanks, folks, that's a very valuable insight. 
I really love the idea of resolving *OuterRef* via .*join()*, it may help 
overcome problems I had with reusing *Subquery*.
However, I believe *Subquery* exists apart from *QuerySet* for a reason - 
to separate responsibilities, so should we mix responsibilities now with 
Subquery / JoinedSubquery? 

If we reuse Subquery for *QuerySet.join()*, we need a way to refer to the 
columns from that Subquery, and either current *.F(...) *method will be 
added to Subquery, or we need to use named joins ( 
*QuerySet.join(taxes=Subquery(...))* ).
Named joins are problematic, because current implementation of Query can .
*resolve_ref()* (*setup_joins* / *names_to_path*) only to related table OR 
*filtered_relation*. And handling *yet* another case (OR *joined_subquery)* 
will make the branching even more complex.
We could refactor somehow to reduce conditionals usage, but it is not 
something I believe I am capable of.

Yes, it is probably possible to modify existing code, but I am trying to 
avoid adding new paths and conditions here and there - because over time 
such changes will lead to hard to understand codebase no one wants to 
touch. I believe that new behavior is better added via extending, not 
modifications (open-closed principle). There is already some conditional 
handling in ORM (i.e. *filtered_relation*, which can be found in many 
places across whole ORM code, yet it is just one feature (don't get me 
wrong, it's a valuable contribution, I am just worried of a general 
direction), and this way of introducing changes will complicate ORM more 
and more. 

Instead, the suggested solution relies on existing code, but does not 
modify it and is relatively small. 

I will definitely try to create something with *OuterRef*, but I am afraid 
we will end up with: Subquery which actually contains code of two separate 
classes, and several conditional branches (i.e. *if subquery.is_joined*) in 
the QuerySet/Query code.
I will come back with new information as soon as I have time (probably by 
the weekend).

Have a nice day,
Alexandr.

On Tuesday, 7 April 2020 08:39:47 UTC+3, schinckel wrote:
>
> Great work on this, Alexandr. I've been thinking a lot about doing joins 
> in Django to subqueries. As you point out, when you are doing several 
> subquery annotations of the same subquery, just with different columns, 
> that can really hurt performance. Currently, I've been recommending doing a 
> JSON object as the subquery column.
>
> I really like the idea of an explicit join to a subquery, as it can 
> perform significantly better - in the case where most of the rows share the 
> same subquery, the database can perform the subquery once per distinct row 
> joined, rather than once for each row.
>
> I think, instead of using a `JoinedSubquery`, we should just make it that 
> a `Subquery`, containing unresolved OuterRef instances, can also be 
> resolved by doing a `QuerySet.join(Subquery(...))`.
>
> We could also have a lateral argument that allowed a LATERAL JOIN to a 
> subquery, but perhaps that's later down the track.
>
> (Actually, now I think about it, a QuerySet should probably just act as a 
> Subquery when it's used as one, but perhaps that's too magic).
>
> Matt.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/f32053b3-f4e9-4736-ba64-c15d9a62ba25%40googlegroups.com.


Re: Subquery join support

2020-04-12 Thread Alexandr Tatarinov
Hello everyone.
So far, I've managed to implement joins via Subquery and OuterRef with some 
caveats.

To begin with, the API looks like this

Product.objects.create(name='Pizza', country='USA', type='FOOD')
Tax.objects.create(name='USA Food Tax', amount=10, product_country='USA', 
product_type='FOOD')

taxes = Tax.objects.filter(
product_type=OuterRef('type'),
product_country=OuterRef('country')
)
products = Product.objects.annotate(taxes=Subquery(taxes, join=True, 
join_type=LOUTER))
product = products.annotate(tax=F('taxes__amount'), tax_name=F('taxes__name'
))

The generated SQL is
SELECT "core_product"."id", "core_product"."name", "core_product"."country", 
"core_product"."type", "taxes"."__col3" AS "tax", "taxes"."__col4" AS 
"tax_name" 
FROM "core_product" 
LEFT OUTER JOIN 
(
SELECT 
"core_tax"."id", "core_tax"."product_country", "core_tax".
"product_type", "core_tax"."name", "core_tax"."amount", 
"core_tax"."product_country" AS "__col1", "core_tax"."product_type" 
AS "__col2", "core_tax"."amount" AS "__col3", "core_tax"."name" AS "__col4"
FROM "core_tax"
) "taxes" ON ("taxes"."__col1" = ("core_product"."country") AND "taxes".
"__col2" = ("core_product"."type"))

We use *.annotate* for joins to avoid adding new methods, also *.annotate* has 
some useful checks to avoid name collisions.
To distinguish plain subquery from the joined subquery, a *join *flag is 
added. To specify join type, there is an optional *join_type* param to 
Subquery.

Then, one can refer to joined subquery columns by the given alias, but only 
via *F()* expression.
This means it is not possible to write *.values('taxes__name') *directly*, *but 
can be achieved via *.annotate(**tax_name=F('taxes__name'))* 
*.values('tax_name').*
Other examples are *.order_by(F('taxes__name')) *and* 
.values_list(F('taxes__name')), *they work without *.annotate() *but 
require *F() *wrapper. 

The reasons why the string fields and *F* expressions are treated 
separately and differently in various QuerySet methods are still unclear 
for me.
The *F *expression works because the implementation relies on 
*Query.resolve_ref 
*to be called to short-circuit columns from the joined subqueries.
Maybe, we will be able to fix this later or leave as-is with the 
documentation in place to always use *.annotate*.

The implementation is ~170 lines long, but I think some corner cases are 
yet to be discovered.
For now, it would be great to hear your feedback on the API and the feature 
itself, so we can move on with the ticket and discuss the implementation.

Thanks, Alexandr.


On Monday, 6 April 2020 16:34:55 UTC+3, Alexandr Tatarinov wrote:
>
> Hello folks,
>
> Following the discussion 
> https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, I 
> would like to suggest adding the ability to join QuerySet with a subquery 
> clause to Django.
> The benefits are pretty much described and discussed in the linked topic, 
> the only one I would like to add (my use case) is the ability to select 
> multiple columns from a Subquery without the need to repeat the subquery 
> clause, which hurts performance.
> As Anssi Kääriäinen mentioned, generic subquery supports seems to be a 
> better idea than CTEs, and simple implementation as a starting point will 
> allow moving gradually and add RETURNING support later on (for nested 
> creates and updates).
>
> I have created a working prototype and successfully using it in one of my 
> projects in production. After reading Daniel Miller's 
> https://github.com/dimagi/django-cte, I've rewritten the API inspired by 
> his ideas, plus tried to keep it as Django-style as possible.
> Here is the usage example.
>
> taxes = JoinedSubquery(Tax.objects.all())
> products = Product.objects.all()
>
> # .join is probably better be placed in QuerySet class, so it will be 
> products.join(taxes, ...)
> products_with_taxes = taxes.join(
> products,
> join_type=LOUTER,  # Optional argument, LOUTER by default
> country=taxes.F('product_country'),  # Also Q objects are supported 
> type=taxes.F('product_type')
> ).annotate(
> tax_amount=taxes.F('amount')  # .join() and .annotate() calls needn't 
> to 

Re: Subquery join support

2020-04-15 Thread Alexandr Tatarinov
This has not received a lot of feedback so far, so I think some code can 
help. I am not sure what is the next steps; as I've understood, we need 
some kind of consensus to open a ticket.

Please, don't hesitate to express your opinion on the feature and the API, 
as well as the implementation to move this forward.
https://github.com/tatarinov1997/django/pull/1/

Cheers,
Alexandr.



On Sunday, 12 April 2020 19:35:55 UTC+3, Alexandr Tatarinov wrote:
>
> Hello everyone.
> So far, I've managed to implement joins via Subquery and OuterRef with 
> some caveats.
>
> To begin with, the API looks like this
>
> Product.objects.create(name='Pizza', country='USA', type='FOOD')
> Tax.objects.create(name='USA Food Tax', amount=10, product_country='USA', 
> product_type='FOOD')
> 
> taxes = Tax.objects.filter(
> product_type=OuterRef('type'),
> product_country=OuterRef('country')
> )
> products = Product.objects.annotate(taxes=Subquery(taxes, join=True, 
> join_type=LOUTER))
> product = products.annotate(tax=F('taxes__amount'), tax_name=F(
> 'taxes__name'))
>
> The generated SQL is
> SELECT "core_product"."id", "core_product"."name", "core_product".
> "country", "core_product"."type", "taxes"."__col3" AS "tax", "taxes".
> "__col4" AS "tax_name" 
> FROM "core_product" 
> LEFT OUTER JOIN 
> (
> SELECT 
> "core_tax"."id", "core_tax"."product_country", "core_tax".
> "product_type", "core_tax"."name", "core_tax"."amount", 
> "core_tax"."product_country" AS "__col1", "core_tax".
> "product_type" AS "__col2", "core_tax"."amount" AS "__col3", "core_tax".
> "name" AS "__col4"
> FROM "core_tax"
> ) "taxes" ON ("taxes"."__col1" = ("core_product"."country") AND "taxes".
> "__col2" = ("core_product"."type"))
>
> We use *.annotate* for joins to avoid adding new methods, also *.annotate* 
> has 
> some useful checks to avoid name collisions.
> To distinguish plain subquery from the joined subquery, a *join *flag is 
> added. To specify join type, there is an optional *join_type* param to 
> Subquery.
>
> Then, one can refer to joined subquery columns by the given alias, but 
> only via *F()* expression.
> This means it is not possible to write *.values('taxes__name') *directly*, 
> *but can be achieved via *.annotate(**tax_name=F('taxes__name'))* 
> *.values('tax_name').*
> Other examples are *.order_by(F('taxes__name')) *and* 
> .values_list(F('taxes__name')), *they work without *.annotate() *but 
> require *F() *wrapper. 
>
> The reasons why the string fields and *F* expressions are treated 
> separately and differently in various QuerySet methods are still unclear 
> for me.
> The *F *expression works because the implementation relies on 
> *Query.resolve_ref 
> *to be called to short-circuit columns from the joined subqueries.
> Maybe, we will be able to fix this later or leave as-is with the 
> documentation in place to always use *.annotate*.
>
> The implementation is ~170 lines long, but I think some corner cases are 
> yet to be discovered.
> For now, it would be great to hear your feedback on the API and the 
> feature itself, so we can move on with the ticket and discuss the 
> implementation.
>
> Thanks, Alexandr.
>
>
> On Monday, 6 April 2020 16:34:55 UTC+3, Alexandr Tatarinov wrote:
>>
>> Hello folks,
>>
>> Following the discussion 
>> https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, I 
>> would like to suggest adding the ability to join QuerySet with a subquery 
>> clause to Django.
>> The benefits are pretty much described and discussed in the linked topic, 
>> the only one I would like to add (my use case) is the ability to select 
>> multiple columns from a Subquery without the need to repeat the subquery 
>> clause, which hurts performance.
>> As Anssi Kääriäinen mentioned, generic subquery supports seems to be a 
>> better idea than CTEs, and simple implementation as a starting point will 
>> allow moving gradually and add RETURNING support later on (for nested 
>> creates and updates).
>>
>> I have created a worki

Re: The blacklist / master issue

2020-06-21 Thread Alexandr Tatarinov
I would like to share this article 
 which 
has pretty compelling arguments, especially regarding the feelings (point 
4).

On Monday, 15 June 2020 19:28:23 UTC+3, Tom Carrick wrote:
>
> This ticket was closed wontfix 
>  as requiring a 
> discussion here.
>
> David Smith mentioned this Tox issue 
>  stating it had been closed, 
> but to me it seems like it hasn't been closed (maybe there's something I 
> can't see) and apparently a PR would be accepted to add aliases at the 
> least (this is more recent than the comment on the Django ticket).
>
> My impetus to bring this up mostly comes from reading this ZDNet article 
> 
>  
> - it seems like Google have already made moves in this direction and GitHub 
> is also planning to. Usually Django is somewhere near the front for these 
> types of changes.
>
> I'm leaning towards renaming the master branch and wherever else we use 
> that terminology, but I'm less sure about black/whitelist, though right now 
> it seems more positive than negative. Most arguments against use some kind 
> of etymological argument, but I don't think debates about historical terms 
> are as interesting as how they affect people in the here and now.
>
> I don't think there is an easy answer here, and I open this can of worms 
> somewhat reluctantly. I do think Luke is correct that we should be 
> concerned with our credibility if we wrongly change this, but I'm also 
> worried about our credibility if we don't.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1dae409c-bef0-404a-988d-89e548fbdfaao%40googlegroups.com.