#32801: Support for non-aggregation subqueries in FROM clause
--------------------------------+--------------------------------------
     Reporter:  Jameel Al-Aziz  |                    Owner:  nobody
         Type:  New feature     |                   Status:  new
    Component:  Uncategorized   |                  Version:  3.2
     Severity:  Normal          |               Resolution:
     Keywords:                  |             Triage Stage:  Unreviewed
    Has patch:  0               |      Needs documentation:  0
  Needs tests:  0               |  Patch needs improvement:  0
Easy pickings:  0               |                    UI/UX:  0
--------------------------------+--------------------------------------
Description changed by Jameel Al-Aziz:

Old description:

> This relates to https://code.djangoproject.com/ticket/24462 and is
> perhaps a duplicate, but the underlying issue is a bit different.
>
> When using JSONB columns in Postgres, it is sometimes useful to query
> JSON array data as if they're rows. This can be accomplished with the
> `jsob_array_elements` function. The problem with this function is that
> the resulting row cannot be filtered directly and must first be wrapped
> in a subquery.
>
> In particular, I would like to be able to generate a query such as:
> {{{
> select foo_w_elems.id
> from (
>         select id, jsonb_array_elements(json_data->'some_array') as elem
>         from foo as foo1
>         union
>         select id, jsonb_array_elements(json_data->'other_array') as elem
>         from foo as foo2
> ) as foo_w_elems
> where (elem->>'bar_id')::int in (
>         select id
>         from bar
>         where expires_at >= CURRENT_TIMESTAMP
> )
> }}}
>
> Even without the union, I could not find a way to generate such a query
> with Django today.
>
> It appears that Django doesn't support querying FROM a subquery. While
> `AggregateQuery` gets us part of the way there, it doesn't support
> filtering or any other operation applied after a `FROM` clause.
>
> I have managed to hack together support with something similar to:
> {{{
> class SubqueryTable:
>     join_type = None
>     parent_alias = None
>
>     def __init__(self, query, alias):
>         self.subquery = query
>         self.subquery_alias = alias
>
>     def as_sql(self, compiler, connection):
>         alias_str = f' {self.subquery_alias}'
>         base_sql, params = self.subquery.as_sql(compiler, connection)
>         return base_sql + alias_str, params
>
>     def relabeled_clone(self, change_map):
>         return self.__class__(
>             self.subquery,
>             change_map.get(self.subquery_alias, self.subquery_alias),
>         )
>

> class JsonbArrayElements(Func):
>     function = 'jsonb_array_elements'
>

> class FooQuerySet(models.QuerySet):
>     def has_expired_bar(self):
>         obj = self._chain()
>
>         # Expand each leg into a separate row
>         foo_with_some_array = self.annotate(
>             elem=JsonbArrayElements(KeyTransform('some_array',
> 'json_data'))
>         )
>         foo_with_other_array = self.annotate(
>             elem=JsonbArrayElements(KeyTransform('other_array',
> 'json_data'))
>         )
>         foo_with_array_elems =
> foo_with_some_array.union(foo_with_other_array)
>
>         # Convert the query into a subquery
>         subquery = foo_with_array_elems.query
>         subquery.subquery = True
>
>         # Exclude foo with active bars
>         # The use of "Ref" is a bit of a hack and an abuse of the API.
>         # Unfortunately, I couldn't find a better way to reference
>         # fields defined in the subquery.
>         obj = obj.annotate(
>             elem_bar_id=Cast(
>                 KeyTextTransform('bar_id', Ref('elem', subquery)),
>                 models.IntegerField()
>             ),
>         )
>         bars = Bar.objects.active()
>         obj = obj.exclude(elem_bat_id__in=bars.values('pk'))
>
>         # This is a hack to force Django to relabel a field's table name
>         # to match the FROM clause table alias
>         relabels = {t: 'subquery' for t in subquery.alias_map}
>         relabels[None] = 'subquery'
>         obj.query.change_aliases(relabels)
>
>         # Override the FROM clause by aliasing to the subquery
>         obj.query.alias_map['subquery'] = SubqueryTable(subquery,
> 'subquery')
>
>         # Remove the annotation from the select clause to avoid
>         # introducing a hidden extra column
>         annotation_mask = obj.query.annotation_select_mask
>         annotation_mask = annotation_mask.remove('elem_bar_id') if
> annotation_mask else []
>         obj.query.set_annotation_mask(annotation_mask)
>
>         return obj.distinct()
> }}}
>
> While implementing this workaround, I found a few issues:
> 1. I was unable to find a good way to have the query layer automatically
> resolve a reference to an annotation defined in a subquery without
> redefining it. While aggregated queries technically seem to have support
> for this, it appears they "lift" the annotation into the outer query
> instead of referring to the annotation within the subquery. This
> distinction is important for `jsonb_array_elements`.
> 2. Because of (1), I had to annotate the outer query and then find a way
> to trick the query later to use a field name that would resolve when
> executed against the DB.
> 3. While the query layer seems to handle renaming aliases in subqueries
> for aggregate queries, I had to manually change the outer query table
> alias to properly refer to the subquery fields.
>
> I realize that full support for subqueries is likely far more challenging
> than my workaround, but support would be appreciated as filtering
> subqueries in an outer query is sometimes necessary to make use of more
> advanced DB features.

New description:

 This relates to #24462 and is perhaps a duplicate, but the underlying
 issue is a bit different.

 When using JSONB columns in Postgres, it is sometimes useful to query JSON
 array data as if they're rows. This can be accomplished with the
 `jsob_array_elements` function. The problem with this function is that the
 resulting row cannot be filtered directly and must first be wrapped in a
 subquery.

 In particular, I would like to be able to generate a query such as:
 {{{
 select foo_w_elems.id
 from (
         select id, jsonb_array_elements(json_data->'some_array') as elem
         from foo as foo1
         union
         select id, jsonb_array_elements(json_data->'other_array') as elem
         from foo as foo2
 ) as foo_w_elems
 where (elem->>'bar_id')::int in (
         select id
         from bar
         where expires_at >= CURRENT_TIMESTAMP
 )
 }}}

 Even without the union, I could not find a way to generate such a query
 with Django today.

 It appears that Django doesn't support querying FROM a subquery. While
 `AggregateQuery` gets us part of the way there, it doesn't support
 filtering or any other operation applied after a `FROM` clause.

 I have managed to hack together support with something similar to:
 {{{
 class SubqueryTable:
     join_type = None
     parent_alias = None

     def __init__(self, query, alias):
         self.subquery = query
         self.subquery_alias = alias

     def as_sql(self, compiler, connection):
         alias_str = f' {self.subquery_alias}'
         base_sql, params = self.subquery.as_sql(compiler, connection)
         return base_sql + alias_str, params

     def relabeled_clone(self, change_map):
         return self.__class__(
             self.subquery,
             change_map.get(self.subquery_alias, self.subquery_alias),
         )


 class JsonbArrayElements(Func):
     function = 'jsonb_array_elements'


 class FooQuerySet(models.QuerySet):
     def has_expired_bar(self):
         obj = self._chain()

         # Expand each leg into a separate row
         foo_with_some_array = self.annotate(
             elem=JsonbArrayElements(KeyTransform('some_array',
 'json_data'))
         )
         foo_with_other_array = self.annotate(
             elem=JsonbArrayElements(KeyTransform('other_array',
 'json_data'))
         )
         foo_with_array_elems =
 foo_with_some_array.union(foo_with_other_array)

         # Convert the query into a subquery
         subquery = foo_with_array_elems.query
         subquery.subquery = True

         # Exclude foo with active bars
         # The use of "Ref" is a bit of a hack and an abuse of the API.
         # Unfortunately, I couldn't find a better way to reference
         # fields defined in the subquery.
         obj = obj.annotate(
             elem_bar_id=Cast(
                 KeyTextTransform('bar_id', Ref('elem', subquery)),
                 models.IntegerField()
             ),
         )
         bars = Bar.objects.active()
         obj = obj.exclude(elem_bat_id__in=bars.values('pk'))

         # This is a hack to force Django to relabel a field's table name
         # to match the FROM clause table alias
         relabels = {t: 'subquery' for t in subquery.alias_map}
         relabels[None] = 'subquery'
         obj.query.change_aliases(relabels)

         # Override the FROM clause by aliasing to the subquery
         obj.query.alias_map['subquery'] = SubqueryTable(subquery,
 'subquery')

         # Remove the annotation from the select clause to avoid
         # introducing a hidden extra column
         annotation_mask = obj.query.annotation_select_mask
         annotation_mask = annotation_mask.remove('elem_bar_id') if
 annotation_mask else []
         obj.query.set_annotation_mask(annotation_mask)

         return obj.distinct()
 }}}

 While implementing this workaround, I found a few issues:
 1. I was unable to find a good way to have the query layer automatically
 resolve a reference to an annotation defined in a subquery without
 redefining it. While aggregated queries technically seem to have support
 for this, it appears they "lift" the annotation into the outer query
 instead of referring to the annotation within the subquery. This
 distinction is important for `jsonb_array_elements`.
 2. Because of (1), I had to annotate the outer query and then find a way
 to trick the query later to use a field name that would resolve when
 executed against the DB.
 3. While the query layer seems to handle renaming aliases in subqueries
 for aggregate queries, I had to manually change the outer query table
 alias to properly refer to the subquery fields.

 I realize that full support for subqueries is likely far more challenging
 than my workaround, but support would be appreciated as filtering
 subqueries in an outer query is sometimes necessary to make use of more
 advanced DB features.

--

-- 
Ticket URL: <https://code.djangoproject.com/ticket/32801#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.e775b36668fc8ab91e296a53b783a017%40djangoproject.com.

Reply via email to