Order by in a sub query when aggregating the main query
Hi all, I have a question related to the order by clause used in a subquery of a main query that uses one or more aggregation functions with a group by. A basic example of the type of query in question is the following (see below for the actual query): select w, array_agg(x) from ( select v, v / 10 as w from pg_catalog.generate_series(25, 0, -1) as t(v) order by v ) as t(x) group by w This query will return an ordered array as specified by the order by clause.in the subquery. Can this behaviour be relied upon? >From what I could find from searching in SQL the order by in a subquery could be ignored by the engines, but I've found that postgresql will always respect it. The context of the question is the updated reflection logic that will be introduced in version 2 of SQLAlchemy, that makes use of orderby in subqueries to, for example, match column index of a constraint with the column name of a table. This query and other similar one return the correct result, and they seem stable in their output (ie the CI is not randomly failing because the order has changed). For more information this potential issue with the current query is traket in the issue https://github.com/sqlalchemy/sqlalchemy/issues/8561 Below is the full query that will be used in sqlalchemy to reflect constraints given the constraint type and on a list of table oids: select attr.conrelid, array_agg(attr.attname) as cols, attr.conname, min(attr.description) as description from ( select con.conrelid as conrelid, con.conname as conname, con.description as description, pg_catalog.pg_attribute.attname as attname from pg_catalog.pg_attribute join ( select pg_catalog.pg_constraint.conrelid as conrelid, pg_catalog.pg_constraint.conname as conname, unnest(pg_catalog.pg_constraint.conkey) as attnum, generate_subscripts(pg_catalog.pg_constraint.conkey, %(generate_subscripts_1)s) as ord, pg_catalog.pg_description.description as description from pg_catalog.pg_constraint left outer join pg_catalog.pg_description on pg_catalog.pg_description.objoid = pg_catalog.pg_constraint.oid where pg_catalog.pg_constraint.contype = :contype and pg_catalog.pg_constraint.conrelid in (:oids) ) as con on pg_catalog.pg_attribute.attnum = con.attnum and pg_catalog.pg_attribute.attrelid = con.conrelid order by con.conname, con.ord ) as attr group by attr.conrelid, attr.conname order by attr.conrelid, attr.conname The other reflection queries that use order by in subqueries are similar to the above, I can post them here if they may prove useful. Thank you Federico
Re: Order by in a sub query when aggregating the main query
Understood, thanks for the explanation. I'll work on updating the queries used by sqlalchemy to do array_agg(x order by x) instead of the order by in the subquery. > I think that right now that'd > incur additional sorting overhead, which is annoying. But work is > ongoing to recognize when the input is already correctly sorted > for an aggregate, so it should get better in PG 16 or so. Nice to know, hopefully it's too bad for this use case Thanks, Federico Caselli On Sun, 25 Sept 2022 at 00:20, Tom Lane wrote: > > Federico writes: > > A basic example of the type of query in question is the following (see > > below for the actual query): > > > select w, array_agg(x) > > from ( > > select v, v / 10 as w > > from pg_catalog.generate_series(25, 0, -1) as t(v) > > order by v > > ) as t(x) > > group by w > > > This query will return an ordered array as specified by the order by > > clause.in the subquery. > > Can this behaviour be relied upon? > > No, not really. It might always work given a particular set of > circumstances. As long as the planner chooses to do the outer > query's grouped aggregation as a HashAgg, there'd be no reason > for it to reshuffle the subquery output before feeding that to > array_agg. However, if it decided that sort-group-and-aggregate > was better, it'd insert a sort by w above the subquery, and then > you'd lose any certainty of the ordering by v continuing to hold. > (Maybe the sort by w would be stable for equal keys, but that's > not guaranteed.) > > What you really ought to do is write > > select w, array_agg(x order by x) > from ... > > to be in the clear per SQL standard. I think that right now that'd > incur additional sorting overhead, which is annoying. But work is > ongoing to recognize when the input is already correctly sorted > for an aggregate, so it should get better in PG 16 or so. > > regards, tom lane
Re: Order by in a sub query when aggregating the main query
I've changed the code to use order by in the aggregate and it seems there are no noticeable changes in the query performance. Thanks for the help. Best, Federico Caselli On Sun, 25 Sept 2022 at 00:30, Federico wrote: > > Understood, thanks for the explanation. > I'll work on updating the queries used by sqlalchemy to do array_agg(x > order by x) instead of the order by in the subquery. > > > I think that right now that'd > > incur additional sorting overhead, which is annoying. But work is > > ongoing to recognize when the input is already correctly sorted > > for an aggregate, so it should get better in PG 16 or so. > > Nice to know, hopefully it's too bad for this use case > > Thanks, Federico Caselli > > On Sun, 25 Sept 2022 at 00:20, Tom Lane wrote: > > > > Federico writes: > > > A basic example of the type of query in question is the following (see > > > below for the actual query): > > > > > select w, array_agg(x) > > > from ( > > > select v, v / 10 as w > > > from pg_catalog.generate_series(25, 0, -1) as t(v) > > > order by v > > > ) as t(x) > > > group by w > > > > > This query will return an ordered array as specified by the order by > > > clause.in the subquery. > > > Can this behaviour be relied upon? > > > > No, not really. It might always work given a particular set of > > circumstances. As long as the planner chooses to do the outer > > query's grouped aggregation as a HashAgg, there'd be no reason > > for it to reshuffle the subquery output before feeding that to > > array_agg. However, if it decided that sort-group-and-aggregate > > was better, it'd insert a sort by w above the subquery, and then > > you'd lose any certainty of the ordering by v continuing to hold. > > (Maybe the sort by w would be stable for equal keys, but that's > > not guaranteed.) > > > > What you really ought to do is write > > > > select w, array_agg(x order by x) > > from ... > > > > to be in the clear per SQL standard. I think that right now that'd > > incur additional sorting overhead, which is annoying. But work is > > ongoing to recognize when the input is already correctly sorted > > for an aggregate, so it should get better in PG 16 or so. > > > > regards, tom lane
Guidance on INSERT RETURNING order
Hello list, I have a few clarification questions regarding using insert with returning. The use case is SQLAlchemy development, where the orm wants to insert a list of rows, get back the generated ids, defaults, etc, and match the returned values with the original list of orm objects. The following assumes a table like this CREATE TABLE t( id SERIAL, data TEXT -- type here can be anything ) On PostgreSQL, since SQLAlchemy version 1.4 (Mar 2021) this operation has used INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING id but we were recently made aware that there is no guarantee on the order of the returned columns. Looking at the documentation there is no mention of the order of the RETURNING clause, but searching past discussion there are multiple indication that the order is not guaranteed, like https://www.postgresql.org/message-id/19445.1350482182%40sss.pgh.pa.us . I think the docs should mention this, similar to what the sqlite docs do at https://www.sqlite.org/lang_returning.html#limitations_and_caveats Searching the archive seems that a using the INSERT SELECT ORDER BY form should be a better solution, so the above insert should be rewritten as INSERT INTO t(data) SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num) ORDER BY num RETURNING id to ensure that the id are created in the order specified by num. The returned id can again be in arbitrary order, but sorting them should enable correctly matching the orm object so that they can be properly updated. Is this correct? The documentation does not say anything about this, and looking at the archive it seems that it's mostly correct but not 100% guaranteed, as stated here https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us . The MSSQL docs, for example, clearly state that this is the case https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver16#limitations-and-restrictions , so it would be helpful if something similar were mentioned in the PostgreSQL docs. The above insert form (INSERT SELECT ORDER BY) can be used when the primary key is an auto incrementing value, in case it isn't (such as when it's an UUID), another solution must be used. Since there does not seem to be any way of getting the position of the original row inside the VALUES clause with RETURNING, the solution SQLAlchemy is implementing is to either degrade to inserts with a single value or to optionally allow the user to add a "sentinel" column to the table, so that a sequential value can be inserted into it and then returned allowing the ordering of the RETURNING clause rows: ALTER TABLE t ADD COLUMN sentinel SMALLINT INSERT INTO t(data, sentinel) VALUES ('a', 1), ('b', 2), ('c', 3) RETURNING id, sentinel Is there any better solution to achieve this? (For reference this feature is tracked in SQLAlchemy by https://github.com/sqlalchemy/sqlalchemy/issues/9618) >From an ORM standpoint it would be very useful having a way of forcing the order of RETURNING to be the same as the one in VALUES, maybe with an additional keyword. Alternatively having a system column or other function that can be placed into the returning clause to return the output row position wrt the input values list, similar to what the sentinel column above does. At the very least I think the documentation could do a better job at mentioning that RETURNING order is arbitrary, and documenting that INSERT SELECT ORDER BY precesses the default in select order (if that's indeed the case) Sorry for the long email, Thanks Federico
Re: Guidance on INSERT RETURNING order
Thanks for the ansers > 2) What would you order by, id or data or both? by values order, (that incidentally seems to be what PG does) > with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3) > returning id) > select i.id from i order by id; The problem here is not having the auto increment id in a particular order, is that there is apparently no correlation with the position of an element in the values clause with the id generated. That's the reason for using the sentinel column in the general solution in the previous message. The extend on the use case, SQLAlchemy has 3 objects T that have T(data='a'), T(data='b'), T(data='c') but no value for the id column. The objective is to insert the 3 data values, get back the ids and correctly match them with the correct 3 objects. > No. Sadly, adding that ORDER BY is just voodoo programming, because > it applies to the result of the SELECT while promising nothing about > the order in which INSERT/RETURNING will act on those rows. I wasn't probably clear, it's fine if INSERT/RETURNING order is arbitrary, what matters is that the autoincementing values is executed in the same order as select, like mentioned in this previous message https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us Is that not the case? > Re-reading that 2012 thread, the main new observation I'd make today > is that parallel operation is a thing now, and it's not hard to foresee > that sometime soon we'll want to parallelize INSERTs. Which'd make it > *really* hard to promise anything about the order of RETURNING output. I think it's fine not promising anything about the order of RETURNING, but it would be very helpful having a way of tracking what input row generated a particular output row. Basically the sentinel case in the original post, without actually having to insert the sentinel into the table. > I think if you want to use RETURNING with multi-row inserts, the > thing to do is more like > > INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id > > and then explicitly match up the returned "data" values rather than > presuming they appear in the same order you wrote them in in VALUES. > Admittedly this might be problematic if some of the VALUES rows > are identical, but how much should you care? Well, the example is very easy, but it's hard to generalize when inserting multiple columns with possible complex values in them, since it would mean matching on possibly large json values, arrays, etc. So definitely not ideal Thanks, Federico On Tue, 11 Apr 2023 at 22:06, Adrian Klaver wrote: > > On 4/11/23 12:47, Federico wrote: > > Hello list, > > > https://www.sqlite.org/lang_returning.html#limitations_and_caveats > > > > Searching the archive seems that a using the INSERT SELECT ORDER BY > > form should be a better solution, > > so the above insert should be rewritten as > > > > INSERT INTO t(data) > > SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, > > num) ORDER BY num > > RETURNING id > > Or > > with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3) > returning id) > select i.id from i order by id; > > > Sorry for the long email, > > Thanks > > > > Federico > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 22:59, Thorsten Glaser wrote: > > On Tue, 11 Apr 2023, Federico wrote: > > >The problem here is not having the auto increment id in a particular > > The id might not even be auto-increment but UUID or something… > (I am surprised you would even try to insert multiple rows at once.) Well the documentation makes no mention of any limitation on returning and the observed behaviour has consistently been that returning is in values order. Again, that was SQLAlchemy's fault for assuming this (but the docs surely did not help). Also re-reading my reply, I've made a typo there, sorry. What it should have read is: The problem here is not having the returned ids in a particular order, is that there is apparently no correlation with the position of an element in the values clause with the id generated. Of course sorting the returned ids is only viable when using a serial or identity column, that's why in the general case I've mentioned the insert with sentinel column to ask if there are better or alternative solutions. Thanks for the reply, best Federico > > bye, > //mirabilos > -- > 15:41⎜ Somebody write a testsuite for helloworld :-) > > > >
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:22, Thorsten Glaser wrote: > > On Tue, 11 Apr 2023, Federico wrote: > > >Of course sorting the returned ids is only viable when using a serial > > Yes, which is why I pointed out it doesn’t have to be. > > >or identity column, that's why in the general case I've mentioned the > >insert with sentinel column > > But it was pointed out that that’s not guaranteed either, unless you add > that sentinel column to the table itself… I was under the impression that when using INSERT SELECT ORDER BY the sequence ids were generated using the select order. That has been mentioned in multiple other previous messages, like https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us The above does not cover all cases, but in practice serial or identity are very common, so it would be nice if at least in these cases a sentinel is not needed Thanks, Federico > bye, > //mirabilos > -- > 15:41⎜ Somebody write a testsuite for helloworld :-) > > > >
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:31, Rob Sargent wrote: > > On 4/11/23 14:37, Federico wrote: > > > > The problem here is not having the auto increment id in a particular > > order, is that there > > is apparently no correlation with the position of an element in the > > values clause with the > > id generated. That's the reason for using the sentinel column in the > > general solution in the previous message. > > > > The extend on the use case, SQLAlchemy has 3 objects T that have > > T(data='a'), T(data='b'), T(data='c') but no > > value for the id column. The objective is to insert the 3 data values, > > get back the ids and correctly match them with > > the correct 3 objects. > > > >> No. Sadly, adding that ORDER BY is just voodoo programming, because > >> it applies to the result of the SELECT while promising nothing about > >> the order in which INSERT/RETURNING will act on those rows. > > I wasn't probably clear, it's fine if INSERT/RETURNING order is > > arbitrary, what matters is that the > > autoincementing values is executed in the same order as select, like > > mentioned in this > > previous message > > https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us > > > > Is that not the case? > > > >> Re-reading that 2012 thread, the main new observation I'd make today > >> is that parallel operation is a thing now, and it's not hard to foresee > >> that sometime soon we'll want to parallelize INSERTs. Which'd make it > >> *really* hard to promise anything about the order of RETURNING output. > > I think it's fine not promising anything about the order of RETURNING, but > > it would be very helpful having a way of tracking what input row > > generated a particular > > output row. Basically the sentinel case in the original post, > > without actually having to insert the sentinel into the table. > > > >> I think if you want to use RETURNING with multi-row inserts, the > >> thing to do is more like > >> > >>INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id > >> > >> and then explicitly match up the returned "data" values rather than > >> presuming they appear in the same order you wrote them in in VALUES. > >> Admittedly this might be problematic if some of the VALUES rows > >> are identical, but how much should you care? > > Well, the example is very easy, but it's hard to generalize when > > inserting multiple columns > > with possible complex values in them, since it would mean matching on > > possibly large json values, > > arrays, etc. So definitely not ideal > > > > Thanks, > > Federico > > > Can your client retain a hashmap of md5,data pairings, allowing the > lookup on the way back using the returned data and supplied id? > When using unique columns or similar, that's something that is done, but if there are no unique columns in the value no match can be done reliably with the source data, since sqlalchemy is a library that allows arbitrary schemas to be generated. Thanks for the reply, Federico
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:44, Thorsten Glaser wrote: > > On Tue, 11 Apr 2023, Federico wrote: > > >I was under the impression that when using INSERT SELECT ORDER BY the > >sequence > >ids were generated using the select order. > > But someone said that’s not guaranteed, especially when INSERT will > be parallelised later. It was Tom Lane's message that said > Re-reading that 2012 thread, the main new observation I'd make today > is that parallel operation is a thing now, and it's not hard to foresee > that sometime soon we'll want to parallelize INSERTs. Which'd make it > *really* hard to promise anything about the order of RETURNING output. My reading of it is that we are talking about RETURNING, not about the order in which the serial ids are generated. My understanding was that they are generated in select order, then the rows are inserted in any arbitrary order the planner may choose and returned again in any arbitrary order. If my understanding is incorrect, would this alternative guarantee the above (that nextval is called in the order set by ORDER BY), again re-using the table in the original message? INSERT INTO t(id, data) SELECT nextval(pg_get_serial_sequence('t', 'id')) data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num) ORDER BY num RETURNING id best, Federico > bye, > //mirabilos > -- > 15:41⎜ Somebody write a testsuite for helloworld :-) > > > >
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:46, Rob Sargent wrote: > > > >> Can your client retain a hashmap of md5,data pairings, allowing the > >> lookup on the way back using the returned data and supplied id? > >> > > When using unique columns or similar, that's something that is done, > > but if there are no unique columns in the value no match can be done > > reliably with the source data, since sqlalchemy is a library that > > allows arbitrary schemas to be generated. > > > > Thanks for the reply, > >Federico > So you're returned data is not what was sent to the server? Otherwise it > should generate the same md5, as I understand it. Identical data would > of course be a problem. > That should be the case, yes. If a table has a non-nullable unique key, it should be possible to use a hashmap and perform that lockup. We are planning on implementing something like this to cover the cases where it can be used. Thanks for the reply, Federico
Re: Guidance on INSERT RETURNING order
On Wed, 12 Apr 2023 at 11:46, Thorsten Glaser wrote: > > On Tue, 11 Apr 2023, Federico wrote: > > >My understanding was that they are generated in select order > > But are they? (I don’t know, but I’d not assume that.) That's kind of the point for this question, to see if that's correct or not. > >If my understanding is incorrect, would this alternative guarantee the above > > >INSERT INTO t(id, data) > >SELECT nextval(pg_get_serial_sequence('t', 'id')) data > >FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num) > >ORDER BY num > >RETURNING id > > Wouldn’t, at that point, it be better to just send multiple > individual INSERT statements? The overhead (on both sides) > for all mentioned… workarounds… surely is larger than that? No, not by a long shot. Sending thousands of single inserts sequentially over the network requires a lot more time even when doing that on localhost. Using a single statement is many times faster. Federico > bye, > //mirabilos > -- > 15:41⎜ Somebody write a testsuite for helloworld :-) > > > >
Re: Guidance on INSERT RETURNING order
On Wed, 12 Apr 2023 at 23:49, Adrian Klaver wrote: > > On 4/12/23 2:35 PM, Kirk Wolak wrote: > > On Tue, Apr 11, 2023 at 4:38 PM Federico > > > > A couple of comments. For the more generic, I prefer RETURNING * > > you get back all the columns for matching. To me, this solves the > > problem in a very generic way. > > From what I gather from the conversation RETURNING is the red herring. > > The request is that for: > > INSERT INTO some_table(char_fld) VALUES('a'), ('b'), ('c') > > where some_table has an auto increment field that the values created for > said field will always be done in the order that VALUES data was > presented so: > > SELECT id, char_fld from some_table will always return: > > (1, 'a') > (2, 'b') > (3, 'c') It's actually for an insert query like this INSERT INTO t(data) SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num) ORDER BY num Or even (but the above would be nicer) INSERT INTO t(id, data) SELECT nextval(pg_get_serial_sequence('t', 'id')) data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num) ORDER BY num Federico > > > But SQL (and SET THEORY) basically imply you cannot trust the sequencing > > of a set of transactions. Parallel execution is just a great simple > > example. > > > > Secondarily, many frameworks I've worked with (and custom ones > > developed) would actually call the SEQUENCE.NEXTVAL, and assign the IDs, > > in memory, accepting that we would have gaping holes if some > > transactions were never actually sent to the server. We did this a lot > > in master-detail GUI type stuff. It's just easier. The children knew > > their parent ID, and all the children ID's were effectively known before > > committing. It made for simple code that never failed. > > (for large datasets we would want one query that returned a set of IDs, > > we could order that. And apply it to the records we were about to > > insert). [Be Careful with GENERATED ALWAYS pks to OVERRIDE] > > > > HTH > > > -- > Adrian Klaver > adrian.kla...@aklaver.com
Re: Guidance on INSERT RETURNING order
On Fri, 14 Apr 2023 at 21:37, David G. Johnston wrote: > > On Fri, Apr 14, 2023 at 11:42 AM John Howroyd > wrote: >> >> @PostgreSQL: Might I ask if this is still being actively considered or >> should we repost to another mailing list (perhaps pgsql-hackers or any other >> you might suggest)? > > > This is the right place for such a discussion. Unless you think you've come > up with a design that you want to propose and that, if accepted, you would > then write up a patch for. For that you'd probably want to move the > discussion to -hackers with an outline of the patch you plan to write. > > David J. > Would something like what was proposed by Mike Bayer be considered? > A new token called "tuple_order" or something > > INSERT INTO table (a, b, c) VALUES ((1, 2, 3), (4, 5, 6), ...) RETURNING > table.id, inserted.tuple_order > > tuple_order would be incrementing values 1, 2, 3, 4, 5, ... which correlate > the each row delivered by RETURNING to each entry in the VALUES clause, in > the order they were stated in that VALUES clause, that is entry (1, 2, 3) > would be tuple_order 1, entry (4, 5, 6) would be tuple order 2, etc. This would allow easy reordering of the RETURNING clause, either client side or moving the INSERT into a CTE and ordering the outside select. I also don't think it would have any impact on parallel processing of the INSERT, since RETURNING could output rows in any arbitrary order. Best, Federico
Re: Guidance on INSERT RETURNING order
On Sat, 15 Apr 2023 at 05:17, Tom Lane wrote: > > Federico writes: > > Would something like what was proposed by Mike Bayer be considered? > > >> A new token called "tuple_order" or something > >> > >> INSERT INTO table (a, b, c) VALUES ((1, 2, 3), (4, 5, 6), ...) RETURNING > >> table.id, inserted.tuple_order > >> > >> tuple_order would be incrementing values 1, 2, 3, 4, 5, ... which > >> correlate the each row delivered by RETURNING to each entry in the VALUES > >> clause, in the order they were stated in that VALUES clause, that is entry > >> (1, 2, 3) would be tuple_order 1, entry (4, 5, 6) would be tuple order 2, > >> etc. > > As proposed, I don't think so. Something over in the RETURNING clause has > exactly no connection to VALUES. What do you do if it's INSERT ... SELECT > and there are several VALUES clauses down inside the SELECT? I think the tuple order would not be connected to the values, but be determined by the input order of the rows in the insert. So when using INSERT ... SELECT the tuple_order value would be determined by the output of the select, using the same logic as ROW_NUMBER. > There is some prior art in this area, though. See the more-or-less > SQL-standard WITH ORDINALITY option for functions-in-FROM. It seems to me > that it could be plausible to attach WITH ORDINALITY to a VALUES clause, > which would give you a rock-solid connection between the VALUES rows and > the ordinality-column values, and then you could include that column in > RETURNING. Well if returning could output rows that are not in the inserted table, like mentioned also by Devid Johnston, it would open to alternatives to tuple_order, like using INSERT INTO tbl(data) SELECT v.d FROM (VALUES ('x', 1), ('y', 2)) AS v(d, num) RETURNING tbl.id, v.num > regards, tom lane I'm not familiar with the internals of Postgresql, so I don't know what would be more complex to implement in this case. I think both solutions would work equally well from the client point of view. Thanks for the reply, Federico
Re: Guidance on INSERT RETURNING order
On Sat, 15 Apr 2023 at 15:40, Tom Lane wrote: > > Federico writes: > > I think the tuple order would not be connected to the values, but be > > determined by the input order of the rows in the insert. So when using > > INSERT ... SELECT the tuple_order value would be determined by the > > output of the select, using the same logic as ROW_NUMBER. > > We already rejected the idea that INSERT must preserve the order of > the incoming tuples. Please don't re-propose it with different > wording. > > regards, tom lane That was not my intention, what I meant is that the hypothetical inserted.tuple_order in case of insert .. select would be determined by the output order of the select. Insert would be free to insert the row as it pleases. Sorry if it wasn't clear. Best Federico
Re: Guidance on INSERT RETURNING order
On Tue, 18 Apr 2023 at 00:21, John Howroyd wrote: > > May I clarify the ideas being discussed so far, perhaps with a view to make a > relevant proposal. My apologies if I get anything wrong or go too far. > > As I understand it the proposal is to supplement the syntax to something like: > > INSERT INTO table (a, b, c) > VALUES ((1,2,3), (4,5,6), ...) > WITH ORDINALITY > RETURNING table.id, ordinality > ; > > The meaning of which is to adjoin an ordinality column to the output > reflecting the declaration order in the values clause. So an output of (not > necessarily in any order): > (1001, 1) > (1003, 2) > means that table.id = 1001 was assigned to the inserted row from tuple > (1,2,3) (from VALUES, because that table.id is associated to ordinality = 1) > and table.id = 1003 was assigned to the inserted row from tuple (4,5,6). > The output being ordered as determined by the internals of query execution > (not necessarily the one shown). > > Is that correct? That would work as syntax for the task of tracking what id or other server default is generated by a value clause tuple. > I presume (although, not quite so clear) that one would have: > > INSERT INTO table (a, b, c) > SELECT a_val, b_val, c_val > FROM joined_tables > WHERE some_condition > ORDER BY something_relevant > WITH ORDINALITY > RETURNING table.id, ordinality > ; > > The meaning being very much as before replacing 'declaration order' by 'row > order of the SELECT statement as defined by the ORDER BY clause'; so pretty > much like a row_number() but in the output of the RETURNING clause (and > without an OVER modification). I added the ORDER BY clause as I don't really > see what this would mean without it; but this (presumably) does not affect > output order only the order of the incoming rows (and hence the generation of > the ordinality output). > > Is that correct? This would not be needed if the syntax with VALUES WITH ORDINALITY is added in sqlalchemy. So fine either way. If "WITH ORDINALITY" is a feature of VALUES this syntax would not be allowed though. I'm personally ok limiting WITH ORDINALITY only to VALUES. > Might there be a natural syntax to label the 'ordinality' output column? > Perhaps something like: > > ... > WITH ORDINALITY (col_name) > RETURNING table.id, col_name > ; > > I don't want to clash with the syntax for Table Functions. > > Is it a step too far to propose allowing an additional ORDER BY clause after > the RETURNING clause (a specific declaration for the query execution to > assign cpu cycles; especially if the WITH ORDINALITY is not tied to output > order)? > > Personally, I didn't see Frederico's comment as anything to do with order; > just how one could output additional values in the RETURNING clause (namely, > v.num from a subexpression of the SELECT but in whatever order it comes). On > the other hand, that seems a lot more complicated to me because it is not an > expression in the overall SELECT feeding the INSERT, whereas the WITH > ORDINALITY is a specific declaration to match input order with output order > by inserting a counter. I didn't mean to suggest any particular order should be kept by insert or by returning. I was merely commenting on the David G. Johnston reply I suppose breaking the restriction that only columns present on the insertion-table can be returned is a possible option that also solves another infrequent request. > Apologies, if I have misunderstood or invented something that's not possible! Thanks for the recap. I'm hoping this can become a proposal. Best, Federico
Re: Guidance on INSERT RETURNING order
On Tue, 18 Apr 2023 at 11:53, John Howroyd wrote: > > Sorry, I may have jumped to a conclusion that's not quite correct. Not at all, thanks for moving this along > On Mon, 17 Apr 2023 at 23:58, Federico wrote: >> >> On Tue, 18 Apr 2023 at 00:21, John Howroyd wrote: >> > ... >> > >> > Personally, I didn't see Frederico's comment as anything to do with order; >> > just how one could output additional values in the RETURNING clause >> > (namely, v.num from a subexpression of the SELECT but in whatever order it >> > comes). On the other hand, that seems a lot more complicated to me >> > because it is not an expression in the overall SELECT feeding the INSERT, >> > whereas the WITH ORDINALITY is a specific declaration to match input order >> > with output order by inserting a counter. >> >> I didn't mean to suggest any particular order should be kept by insert >> or by returning. I was merely commenting on the David G. Johnston >> reply >> >> I suppose breaking the restriction that only columns present on >> the insertion-table can be returned is a possible option that also >> solves another infrequent request. >> >> >> ... >> Best, >> Federico > > > This might be a possibility. The v.num (from the original example) is > accessible in the outer select, so one can a envisage a system to handle > this, but at (presumably) much greater expense: preparation of the SELECT, > orchestration of the output rows (some fields for INSERT some for RETURNING) > and (presumably) a whole load of mem copies to RETURNING. Is this something > to consider (perhaps just for feasibility while writing an initial patch > based on WITH ORDINALITY)? > > To put it another way, v.num is a valid expression in the overall SELECT, but > still a lot more complicated (and presumably expensive). I think it's probably best to focus only on INSERT VALUES WITH ORDINALITY. Further enhancements can be added later if they seem useful. Best, Federico
R: Re: License Question
Steve Crawford ha scritto > On Thu, May 31, 2018 at 7:35 AM Jamie Specter wrote: > > ... > > I would love to use it in an Apache-licensed project but unfortunately, LGPL > licenses are not compatible with Apache 2.0. > ... > > Is there any chance the project can be dual-licensed with a permissive option > like MIT or BSD? ... > > > That is a question better suited for the psycopg2 developers: The psycopg developers are on this list. :) And we asked the OP to write here so that everybody could follow the discussion (initially he opened a bug report). federico
R: Re: License Question
Federico Di Gregorio ha scritto > Steve Crawford ha scritto > > > > On Thu, May 31, 2018 at 7:35 AM Jamie Specter > > wrote: > > > > > > ... > > > > > > I would love to use it in an Apache-licensed project but unfortunately, > > LGPL licenses are not compatible with Apache 2.0. > > > ... > > > > > > Is there any chance the project can be dual-licensed with a permissive > > option like MIT or BSD? ... > > > > > > > > > That is a question better suited for the psycopg2 developers: > > > The psycopg developers are on this list. :) > > > And we asked the OP to write here so that everybody could follow the > discussion (initially he opened a bug report). > > > federico Obviously not on "general". Sorry for the noiose. :/ federico