Thanks, Alexandr, for your effort in addressing what I personally consider a much needed feature in the ORM. I gave your PR a try and noticed that evaluating the same Product.objects.annotate line below twice in a row returns two different SQL queries, the second of which results in an error.
Product.objects.annotate(taxes=Subquery(taxes, join=True, join_type=LOUTER)) I'm using different sample models in my project since I didn't have your models handy to test, but here's a gist with the terminal session and traceback: https://gist.github.com/airstrike/e6ac4feb96136b71dda5e3fcfa32418e You should be able to recreate the error easily with your own models by just running the line above twice. Seems like the Subquery() instantiation is having some unintended side-effects... Now, more broadly, it seems like we currently have various initial attempts at expanding the ability to handle subqueries in the ORM but with a few different purposes – I count at least 5: - Improving support for subqueries in UPDATE statements (https://code.djangoproject.com/ticket/29214) - Adding support for CTEs (https://github.com/ashleywaite/django/tree/cte-dev) - Adding support for *recursive* CTEs for hierarchical / tree data (https://code.djangoproject.com/ticket/28919 and https://github.com/matthiask/django-cte-forest) - Alexandr's suggested approach of joining tables with Subqueries (this thread) - My personal desire / need for derived tables for advanced annotation / aggregation, i.e. subqueries in the FROM clause (https://www.essentialsql.com/get-ready-to-learn-sql-server-22-using-subqueries-in-the-from-clause/) which to me feels essentially equivalent to non-recursive CTE Rather than letting each of these efforts operate (and for the most part, die) independently, I believe Django would benefit from a concerted approach for a few reasons: 1. leveraging each other's expertise 2. leveraging code between those different efforts and minimizing complexity / churn 3. creating a consistent API 4. actually getting buy-in from the core developers to have this merged at some point I'm not close enough to Django or sufficiently well-versed in the ORM to know exactly what the next steps are, but it feels to me that we need a DEP that starts by reviewing all of the past efforts, existing needs and outlines a consistent API along with the equivalent SQL for clarity. So I invite others to provide feedback on whether they believe a DEP is the right path forward – though I'll note that I don't have the time or, most importantly, the expertise to be its Author <https://github.com/django/deps/blob/master/final/0001-dep-process.rst#forming-the-team> Best, Andy On Wednesday, April 15, 2020 at 3:52:03 AM UTC-4, Alexandr Tatarinov wrote: > This has not received a lot of feedback so far, so I think some code can > help. I am not sure what is the next steps; as I've understood, we need > some kind of consensus to open a ticket. > > Please, don't hesitate to express your opinion on the feature and the API, > as well as the implementation to move this forward. > https://github.com/tatarinov1997/django/pull/1/ > > Cheers, > Alexandr. > > > > On Sunday, 12 April 2020 19:35:55 UTC+3, Alexandr Tatarinov wrote: >> >> Hello everyone. >> So far, I've managed to implement joins via Subquery and OuterRef with >> some caveats. >> >> To begin with, the API looks like this >> >> Product.objects.create(name='Pizza', country='USA', type='FOOD') >> Tax.objects.create(name='USA Food Tax', amount=10, product_country='USA', >> product_type='FOOD') >> >> taxes = Tax.objects.filter( >> product_type=OuterRef('type'), >> product_country=OuterRef('country') >> ) >> products = Product.objects.annotate(taxes=Subquery(taxes, join=True, >> join_type=LOUTER)) >> product = products.annotate(tax=F('taxes__amount'), tax_name=F( >> 'taxes__name')) >> >> The generated SQL is >> SELECT "core_product"."id", "core_product"."name", "core_product". >> "country", "core_product"."type", "taxes"."__col3" AS "tax", "taxes". >> "__col4" AS "tax_name" >> FROM "core_product" >> LEFT OUTER JOIN >> ( >> SELECT >> "core_tax"."id", "core_tax"."product_country", "core_tax". >> "product_type", "core_tax"."name", "core_tax"."amount", >> "core_tax"."product_country" AS "__col1", "core_tax". >> "product_type" AS "__col2", "core_tax"."amount" AS "__col3", "core_tax". >> "name" AS "__col4" >> FROM "core_tax" >> ) "taxes" ON ("taxes"."__col1" = ("core_product"."country") AND "taxes". >> "__col2" = ("core_product"."type")) >> >> We use *.annotate* for joins to avoid adding new methods, also >> *.annotate* has some useful checks to avoid name collisions. >> To distinguish plain subquery from the joined subquery, a *join *flag is >> added. To specify join type, there is an optional *join_type* param to >> Subquery. >> >> Then, one can refer to joined subquery columns by the given alias, but >> only via *F()* expression. >> This means it is not possible to write *.values('taxes__name') *directly*, >> *but can be achieved via *.annotate(**tax_name=F('taxes__name'))* >> *.values('tax_name').* >> Other examples are *.order_by(F('taxes__name')) *and* >> .values_list(F('taxes__name')), *they work without *.annotate() *but >> require *F() *wrapper. >> >> The reasons why the string fields and *F* expressions are treated >> separately and differently in various QuerySet methods are still unclear >> for me. >> The *F *expression works because the implementation relies on >> *Query.resolve_ref >> *to be called to short-circuit columns from the joined subqueries. >> Maybe, we will be able to fix this later or leave as-is with the >> documentation in place to always use *.annotate*. >> >> The implementation is ~170 lines long, but I think some corner cases are >> yet to be discovered. >> For now, it would be great to hear your feedback on the API and the >> feature itself, so we can move on with the ticket and discuss the >> implementation. >> >> Thanks, Alexandr. >> >> >> On Monday, 6 April 2020 16:34:55 UTC+3, Alexandr Tatarinov wrote: >>> >>> Hello folks, >>> >>> Following the discussion >>> https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, >>> I would like to suggest adding the ability to join QuerySet with a subquery >>> clause to Django. >>> The benefits are pretty much described and discussed in the linked >>> topic, the only one I would like to add (my use case) is the ability to >>> select multiple columns from a Subquery without the need to repeat the >>> subquery clause, which hurts performance. >>> As Anssi Kääriäinen mentioned, generic subquery supports seems to be a >>> better idea than CTEs, and simple implementation as a starting point will >>> allow moving gradually and add RETURNING support later on (for nested >>> creates and updates). >>> >>> I have created a working prototype and successfully using it in one of >>> my projects in production. After reading Daniel Miller's >>> https://github.com/dimagi/django-cte, I've rewritten the API inspired >>> by his ideas, plus tried to keep it as Django-style as possible. >>> Here is the usage example. >>> >>> taxes = JoinedSubquery(Tax.objects.all()) >>> products = Product.objects.all() >>> >>> # .join is probably better be placed in QuerySet class, so it will be >>> products.join(taxes, ...) >>> products_with_taxes = taxes.join( >>> products, >>> join_type=LOUTER, # Optional argument, LOUTER by default >>> country=taxes.F('product_country'), # Also Q objects are supported >>> type=taxes.F('product_type') >>> ).annotate( >>> tax_amount=taxes.F('amount') # .join() and .annotate() calls >>> needn't to be chained >>> ) >>> >>> >>> I am not sure about named joins discussed in the CTE topic >>> qs.attach(q1=some_qs).filter(a=F('q1__b')) >>> >>> It seems to be more implicit and harder to implement, requiring changes >>> to existing functionality and thus introducing a big patch which I would >>> like to avoid. >>> >>> It is not possible to follow relations in *join.F* expressions, but it >>> is easily achievable by firstly fetching needed columns by *annotate*(), >>> so I don't see the need to implement it. >>> >>> Source code is available here >>> https://gist.github.com/tatarinov1997/126c49c4b1bb44ae6c57afbc5f43f58d , >>> tested with Django 2.2. Feel free to copy and play around. >>> >>> So, please let me know what do you think about it. I am willing to >>> continue working on it, and any help is appreciated. >>> >>> Thanks, >>> Alexandr. >>> >> -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/e78e87f6-ab88-4922-9d3d-a4970ecadc1f%40googlegroups.com.