#32801: Support for non-aggregation subqueries in FROM clause
------------------------------------------+------------------------
Reporter: Jameel Al-Aziz | Owner: nobody
Type: New feature | Status: new
Component: Uncategorized | Version: 3.2
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------+------------------------
This relates to https://code.djangoproject.com/ticket/24462 and is perhaps
a duplicate, but the underlying issue is a bit different.
When using JSONB columns in Postgres, it is sometimes useful to query JSON
array data as if they're rows. This can be accomplished with the
`jsob_array_elements` function. The problem with this function is that the
resulting row cannot be filtered directly and must first be wrapped in a
subquery.
In particular, I would like to be able to generate a query such as:
{{{
select foo_w_elems.id
from (
select id, jsonb_array_elements(json_data->'some_array') as elem
from foo as foo1
union
select id, jsonb_array_elements(json_data->'other_array') as elem
from foo as foo2
) as foo_w_elems
where (elem->>'bar_id')::int in (
select id
from bar
where expires_at >= CURRENT_TIMESTAMP
)
}}}
Even without the union, I could not find a way to generate such a query
with Django today.
It appears that Django doesn't support querying FROM a subquery. While
`AggregateQuery` gets us part of the way there, it doesn't support
filtering or any other operation applied after a `FROM` clause.
I have managed to hack together support with something similar to:
{{{
class SubqueryTable:
join_type = None
parent_alias = None
def __init__(self, query, alias):
self.subquery = query
self.subquery_alias = alias
def as_sql(self, compiler, connection):
alias_str = f' {self.subquery_alias}'
base_sql, params = self.subquery.as_sql(compiler, connection)
return base_sql + alias_str, params
def relabeled_clone(self, change_map):
return self.__class__(
self.subquery,
change_map.get(self.subquery_alias, self.subquery_alias),
)
class JsonbArrayElements(Func):
function = 'jsonb_array_elements'
class FooQuerySet(models.QuerySet):
def has_expired_bar(self):
obj = self._chain()
# Expand each leg into a separate row
foo_with_some_array = self.annotate(
elem=JsonbArrayElements(KeyTransform('some_array',
'json_data'))
)
foo_with_other_array = self.annotate(
elem=JsonbArrayElements(KeyTransform('other_array',
'json_data'))
)
foo_with_array_elems =
foo_with_some_array.union(foo_with_other_array)
# Convert the query into a subquery
subquery = foo_with_array_elems.query
subquery.subquery = True
# Exclude foo with active bars
# The use of "Ref" is a bit of a hack and an abuse of the API.
# Unfortunately, I couldn't find a better way to reference
# fields defined in the subquery.
obj = obj.annotate(
elem_bar_id=Cast(
KeyTextTransform('bar_id', Ref('elem', subquery)),
models.IntegerField()
),
)
bars = Bar.objects.active()
obj = obj.exclude(elem_bat_id__in=bars.values('pk'))
# This is a hack to force Django to relabel a field's table name
# to match the FROM clause table alias
relabels = {t: 'subquery' for t in subquery.alias_map}
relabels[None] = 'subquery'
obj.query.change_aliases(relabels)
# Override the FROM clause by aliasing to the subquery
obj.query.alias_map['subquery'] = SubqueryTable(subquery,
'subquery')
# Remove the annotation from the select clause to avoid
# introducing a hidden extra column
annotation_mask = obj.query.annotation_select_mask
annotation_mask = annotation_mask.remove('elem_bar_id') if
annotation_mask else []
obj.query.set_annotation_mask(annotation_mask)
return obj.distinct()
}}}
While implementing this workaround, I found a few issues:
1. I was unable to find a good way to have the query layer automatically
resolve a reference to an annotation defined in a subquery without
redefining it. While aggregated queries technically seem to have support
for this, it appears they "lift" the annotation into the outer query
instead of referring to the annotation within the subquery. This
distinction is important for `jsonb_array_elements`.
2. Because of #1, I had to annotate the outer query and then find a way to
trick the query later to use a field name that would resolve when executed
against the DB.
3. While the query layer seems to handle renaming aliases in subqueries
for aggregate queries, I had to manually change the outer query table
alias to properly refer to the subquery fields.
I realize that full support for subqueries is likely far more challenging
than my workaround, but support would be appreciated as filtering
subqueries in an outer query is sometimes necessary to make use of more
advanced DB features.
--
Ticket URL: <https://code.djangoproject.com/ticket/32801>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/050.fab4d31aabf9a6c3f59fb2741c73c8a5%40djangoproject.com.