Hello everyone.
So far, I've managed to implement joins via Subquery and OuterRef with some 
caveats.

To begin with, the API looks like this

Product.objects.create(name='Pizza', country='USA', type='FOOD')
Tax.objects.create(name='USA Food Tax', amount=10, product_country='USA', 
product_type='FOOD')
        
taxes = Tax.objects.filter(
    product_type=OuterRef('type'),
    product_country=OuterRef('country')
)
products = Product.objects.annotate(taxes=Subquery(taxes, join=True, 
join_type=LOUTER))
product = products.annotate(tax=F('taxes__amount'), tax_name=F('taxes__name'
))

The generated SQL is
SELECT "core_product"."id", "core_product"."name", "core_product"."country", 
"core_product"."type", "taxes"."__col3" AS "tax", "taxes"."__col4" AS 
"tax_name" 
FROM "core_product" 
LEFT OUTER JOIN 
(
    SELECT 
        "core_tax"."id", "core_tax"."product_country", "core_tax".
"product_type", "core_tax"."name", "core_tax"."amount", 
        "core_tax"."product_country" AS "__col1", "core_tax"."product_type" 
AS "__col2", "core_tax"."amount" AS "__col3", "core_tax"."name" AS "__col4"
    FROM "core_tax"
) "taxes" ON ("taxes"."__col1" = ("core_product"."country") AND "taxes".
"__col2" = ("core_product"."type"))

We use *.annotate* for joins to avoid adding new methods, also *.annotate* has 
some useful checks to avoid name collisions.
To distinguish plain subquery from the joined subquery, a *join *flag is 
added. To specify join type, there is an optional *join_type* param to 
Subquery.

Then, one can refer to joined subquery columns by the given alias, but only 
via *F()* expression.
This means it is not possible to write *.values('taxes__name') *directly*, *but 
can be achieved via *.annotate(**tax_name=F('taxes__name'))* 
*.values('tax_name').*
Other examples are *.order_by(F('taxes__name')) *and* 
.values_list(F('taxes__name')), *they work without *.annotate() *but 
require *F() *wrapper. 

The reasons why the string fields and *F* expressions are treated 
separately and differently in various QuerySet methods are still unclear 
for me.
The *F *expression works because the implementation relies on 
*Query.resolve_ref 
*to be called to short-circuit columns from the joined subqueries.
Maybe, we will be able to fix this later or leave as-is with the 
documentation in place to always use *.annotate*.

The implementation is ~170 lines long, but I think some corner cases are 
yet to be discovered.
For now, it would be great to hear your feedback on the API and the feature 
itself, so we can move on with the ticket and discuss the implementation.

Thanks, Alexandr.


On Monday, 6 April 2020 16:34:55 UTC+3, Alexandr Tatarinov wrote:
>
> Hello folks,
>
> Following the discussion 
> https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, I 
> would like to suggest adding the ability to join QuerySet with a subquery 
> clause to Django.
> The benefits are pretty much described and discussed in the linked topic, 
> the only one I would like to add (my use case) is the ability to select 
> multiple columns from a Subquery without the need to repeat the subquery 
> clause, which hurts performance.
> As Anssi Kääriäinen mentioned, generic subquery supports seems to be a 
> better idea than CTEs, and simple implementation as a starting point will 
> allow moving gradually and add RETURNING support later on (for nested 
> creates and updates).
>
> I have created a working prototype and successfully using it in one of my 
> projects in production. After reading Daniel Miller's 
> https://github.com/dimagi/django-cte, I've rewritten the API inspired by 
> his ideas, plus tried to keep it as Django-style as possible.
> Here is the usage example.
>
> taxes = JoinedSubquery(Tax.objects.all())
> products = Product.objects.all()
>
> # .join is probably better be placed in QuerySet class, so it will be 
> products.join(taxes, ...)
> products_with_taxes = taxes.join(
>     products,
>     join_type=LOUTER,  # Optional argument, LOUTER by default
>     country=taxes.F('product_country'),  # Also Q objects are supported 
>     type=taxes.F('product_type')
> ).annotate(
>     tax_amount=taxes.F('amount')  # .join() and .annotate() calls needn't 
> to be chained
> )
>
>
> I am not sure about named joins discussed in the CTE topic
> qs.attach(q1=some_qs).filter(a=F('q1__b'))
>
> It seems to be more implicit and harder to implement, requiring changes to 
> existing functionality and thus introducing a big patch which I would like 
> to avoid.
>
> It is not possible to follow relations in *join.F* expressions, but it is 
> easily achievable by firstly fetching needed columns by *annotate*(), so 
> I don't see the need to implement it.
>
> Source code is available here 
> https://gist.github.com/tatarinov1997/126c49c4b1bb44ae6c57afbc5f43f58d , 
> tested with Django 2.2. Feel free to copy and play around.
>
> So, please let me know what do you think about it. I am willing to 
> continue working on it, and any help is appreciated.
>
> Thanks,
> Alexandr.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/021ace4e-a066-46ec-aefc-1478f2d8c127%40googlegroups.com.

Reply via email to