#32801: Support for non-aggregation subqueries in FROM clause
------------------------------------------+------------------------
               Reporter:  Jameel Al-Aziz  |          Owner:  nobody
                   Type:  New feature     |         Status:  new
              Component:  Uncategorized   |        Version:  3.2
               Severity:  Normal          |       Keywords:
           Triage Stage:  Unreviewed      |      Has patch:  0
    Needs documentation:  0               |    Needs tests:  0
Patch needs improvement:  0               |  Easy pickings:  0
                  UI/UX:  0               |
------------------------------------------+------------------------
 This relates to https://code.djangoproject.com/ticket/24462 and is perhaps
 a duplicate, but the underlying issue is a bit different.

 When using JSONB columns in Postgres, it is sometimes useful to query JSON
 array data as if they're rows. This can be accomplished with the
 `jsob_array_elements` function. The problem with this function is that the
 resulting row cannot be filtered directly and must first be wrapped in a
 subquery.

 In particular, I would like to be able to generate a query such as:
 {{{
 select foo_w_elems.id
 from (
         select id, jsonb_array_elements(json_data->'some_array') as elem
         from foo as foo1
         union
         select id, jsonb_array_elements(json_data->'other_array') as elem
         from foo as foo2
 ) as foo_w_elems
 where (elem->>'bar_id')::int in (
         select id
         from bar
         where expires_at >= CURRENT_TIMESTAMP
 )
 }}}

 Even without the union, I could not find a way to generate such a query
 with Django today.

 It appears that Django doesn't support querying FROM a subquery. While
 `AggregateQuery` gets us part of the way there, it doesn't support
 filtering or any other operation applied after a `FROM` clause.

 I have managed to hack together support with something similar to:
 {{{
 class SubqueryTable:
     join_type = None
     parent_alias = None

     def __init__(self, query, alias):
         self.subquery = query
         self.subquery_alias = alias

     def as_sql(self, compiler, connection):
         alias_str = f' {self.subquery_alias}'
         base_sql, params = self.subquery.as_sql(compiler, connection)
         return base_sql + alias_str, params

     def relabeled_clone(self, change_map):
         return self.__class__(
             self.subquery,
             change_map.get(self.subquery_alias, self.subquery_alias),
         )


 class JsonbArrayElements(Func):
     function = 'jsonb_array_elements'


 class FooQuerySet(models.QuerySet):
     def has_expired_bar(self):
         obj = self._chain()

         # Expand each leg into a separate row
         foo_with_some_array = self.annotate(
             elem=JsonbArrayElements(KeyTransform('some_array',
 'json_data'))
         )
         foo_with_other_array = self.annotate(
             elem=JsonbArrayElements(KeyTransform('other_array',
 'json_data'))
         )
         foo_with_array_elems =
 foo_with_some_array.union(foo_with_other_array)

         # Convert the query into a subquery
         subquery = foo_with_array_elems.query
         subquery.subquery = True

         # Exclude foo with active bars
         # The use of "Ref" is a bit of a hack and an abuse of the API.
         # Unfortunately, I couldn't find a better way to reference
         # fields defined in the subquery.
         obj = obj.annotate(
             elem_bar_id=Cast(
                 KeyTextTransform('bar_id', Ref('elem', subquery)),
                 models.IntegerField()
             ),
         )
         bars = Bar.objects.active()
         obj = obj.exclude(elem_bat_id__in=bars.values('pk'))

         # This is a hack to force Django to relabel a field's table name
         # to match the FROM clause table alias
         relabels = {t: 'subquery' for t in subquery.alias_map}
         relabels[None] = 'subquery'
         obj.query.change_aliases(relabels)

         # Override the FROM clause by aliasing to the subquery
         obj.query.alias_map['subquery'] = SubqueryTable(subquery,
 'subquery')

         # Remove the annotation from the select clause to avoid
         # introducing a hidden extra column
         annotation_mask = obj.query.annotation_select_mask
         annotation_mask = annotation_mask.remove('elem_bar_id') if
 annotation_mask else []
         obj.query.set_annotation_mask(annotation_mask)

         return obj.distinct()
 }}}

 While implementing this workaround, I found a few issues:
 1. I was unable to find a good way to have the query layer automatically
 resolve a reference to an annotation defined in a subquery without
 redefining it. While aggregated queries technically seem to have support
 for this, it appears they "lift" the annotation into the outer query
 instead of referring to the annotation within the subquery. This
 distinction is important for `jsonb_array_elements`.
 2. Because of #1, I had to annotate the outer query and then find a way to
 trick the query later to use a field name that would resolve when executed
 against the DB.
 3. While the query layer seems to handle renaming aliases in subqueries
 for aggregate queries, I had to manually change the outer query table
 alias to properly refer to the subquery fields.

 I realize that full support for subqueries is likely far more challenging
 than my workaround, but support would be appreciated as filtering
 subqueries in an outer query is sometimes necessary to make use of more
 advanced DB features.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/32801>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/050.fab4d31aabf9a6c3f59fb2741c73c8a5%40djangoproject.com.

Reply via email to