Hello everyone. So far, I've managed to implement joins via Subquery and OuterRef with some caveats.
To begin with, the API looks like this Product.objects.create(name='Pizza', country='USA', type='FOOD') Tax.objects.create(name='USA Food Tax', amount=10, product_country='USA', product_type='FOOD') taxes = Tax.objects.filter( product_type=OuterRef('type'), product_country=OuterRef('country') ) products = Product.objects.annotate(taxes=Subquery(taxes, join=True, join_type=LOUTER)) product = products.annotate(tax=F('taxes__amount'), tax_name=F('taxes__name' )) The generated SQL is SELECT "core_product"."id", "core_product"."name", "core_product"."country", "core_product"."type", "taxes"."__col3" AS "tax", "taxes"."__col4" AS "tax_name" FROM "core_product" LEFT OUTER JOIN ( SELECT "core_tax"."id", "core_tax"."product_country", "core_tax". "product_type", "core_tax"."name", "core_tax"."amount", "core_tax"."product_country" AS "__col1", "core_tax"."product_type" AS "__col2", "core_tax"."amount" AS "__col3", "core_tax"."name" AS "__col4" FROM "core_tax" ) "taxes" ON ("taxes"."__col1" = ("core_product"."country") AND "taxes". "__col2" = ("core_product"."type")) We use *.annotate* for joins to avoid adding new methods, also *.annotate* has some useful checks to avoid name collisions. To distinguish plain subquery from the joined subquery, a *join *flag is added. To specify join type, there is an optional *join_type* param to Subquery. Then, one can refer to joined subquery columns by the given alias, but only via *F()* expression. This means it is not possible to write *.values('taxes__name') *directly*, *but can be achieved via *.annotate(**tax_name=F('taxes__name'))* *.values('tax_name').* Other examples are *.order_by(F('taxes__name')) *and* .values_list(F('taxes__name')), *they work without *.annotate() *but require *F() *wrapper. The reasons why the string fields and *F* expressions are treated separately and differently in various QuerySet methods are still unclear for me. The *F *expression works because the implementation relies on *Query.resolve_ref *to be called to short-circuit columns from the joined subqueries. Maybe, we will be able to fix this later or leave as-is with the documentation in place to always use *.annotate*. The implementation is ~170 lines long, but I think some corner cases are yet to be discovered. For now, it would be great to hear your feedback on the API and the feature itself, so we can move on with the ticket and discuss the implementation. Thanks, Alexandr. On Monday, 6 April 2020 16:34:55 UTC+3, Alexandr Tatarinov wrote: > > Hello folks, > > Following the discussion > https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, I > would like to suggest adding the ability to join QuerySet with a subquery > clause to Django. > The benefits are pretty much described and discussed in the linked topic, > the only one I would like to add (my use case) is the ability to select > multiple columns from a Subquery without the need to repeat the subquery > clause, which hurts performance. > As Anssi Kääriäinen mentioned, generic subquery supports seems to be a > better idea than CTEs, and simple implementation as a starting point will > allow moving gradually and add RETURNING support later on (for nested > creates and updates). > > I have created a working prototype and successfully using it in one of my > projects in production. After reading Daniel Miller's > https://github.com/dimagi/django-cte, I've rewritten the API inspired by > his ideas, plus tried to keep it as Django-style as possible. > Here is the usage example. > > taxes = JoinedSubquery(Tax.objects.all()) > products = Product.objects.all() > > # .join is probably better be placed in QuerySet class, so it will be > products.join(taxes, ...) > products_with_taxes = taxes.join( > products, > join_type=LOUTER, # Optional argument, LOUTER by default > country=taxes.F('product_country'), # Also Q objects are supported > type=taxes.F('product_type') > ).annotate( > tax_amount=taxes.F('amount') # .join() and .annotate() calls needn't > to be chained > ) > > > I am not sure about named joins discussed in the CTE topic > qs.attach(q1=some_qs).filter(a=F('q1__b')) > > It seems to be more implicit and harder to implement, requiring changes to > existing functionality and thus introducing a big patch which I would like > to avoid. > > It is not possible to follow relations in *join.F* expressions, but it is > easily achievable by firstly fetching needed columns by *annotate*(), so > I don't see the need to implement it. > > Source code is available here > https://gist.github.com/tatarinov1997/126c49c4b1bb44ae6c57afbc5f43f58d , > tested with Django 2.2. Feel free to copy and play around. > > So, please let me know what do you think about it. I am willing to > continue working on it, and any help is appreciated. > > Thanks, > Alexandr. > -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/021ace4e-a066-46ec-aefc-1478f2d8c127%40googlegroups.com.