Order by in a sub query when aggregating the main query

2022-09-24 Thread Federico
Hi all,

I have a question related to the order by clause used in a subquery of
a main query that uses one or more aggregation functions with a group
by.
A basic example of the type of query in question is the following (see
below for the actual query):

  select w, array_agg(x)
  from (
select v, v / 10 as w
from pg_catalog.generate_series(25, 0, -1) as t(v)
order by v
  ) as t(x)
  group by w

This query will return an ordered array as specified by the order by
clause.in the subquery.
Can this behaviour be relied upon?
>From what I could find from searching in SQL the order by in a
subquery could be ignored by the engines, but I've found that
postgresql will always respect it.

The context of the question is the updated reflection logic that will
be introduced in version 2 of SQLAlchemy, that makes use of orderby in
subqueries to, for example, match column index of a constraint with
the column name of a table. This query and other similar one return
the correct result, and they seem stable in their output (ie the CI is
not randomly failing because the order has changed). For more
information this potential issue with the current query is traket in
the issue https://github.com/sqlalchemy/sqlalchemy/issues/8561
Below is the full query that will be used in sqlalchemy to reflect
constraints given the constraint type and on a list of table oids:

  select
attr.conrelid,
array_agg(attr.attname) as cols,
attr.conname,
min(attr.description) as description
  from (
select
  con.conrelid as conrelid,
  con.conname as conname,
  con.description as description,
  pg_catalog.pg_attribute.attname as attname
from pg_catalog.pg_attribute
join (
  select
pg_catalog.pg_constraint.conrelid as conrelid,
pg_catalog.pg_constraint.conname as conname,
unnest(pg_catalog.pg_constraint.conkey) as attnum,
generate_subscripts(pg_catalog.pg_constraint.conkey,
%(generate_subscripts_1)s) as ord,
pg_catalog.pg_description.description as description
  from pg_catalog.pg_constraint
  left outer join pg_catalog.pg_description on
pg_catalog.pg_description.objoid = pg_catalog.pg_constraint.oid
  where
pg_catalog.pg_constraint.contype = :contype
and pg_catalog.pg_constraint.conrelid in (:oids)
  ) as con on
  pg_catalog.pg_attribute.attnum = con.attnum
  and pg_catalog.pg_attribute.attrelid = con.conrelid
order by con.conname, con.ord
  ) as attr
  group by attr.conrelid, attr.conname
  order by attr.conrelid, attr.conname

The other reflection queries that use order by in subqueries are
similar to the above, I can post them here if they may prove useful.

Thank you
Federico




Re: Order by in a sub query when aggregating the main query

2022-09-24 Thread Federico
Understood, thanks for the explanation.
I'll work on updating the queries used by sqlalchemy to do array_agg(x
order by x) instead of the order by in the subquery.

> I think that right now that'd
> incur additional sorting overhead, which is annoying.  But work is
> ongoing to recognize when the input is already correctly sorted
> for an aggregate, so it should get better in PG 16 or so.

Nice to know, hopefully it's too bad for this use case

Thanks, Federico Caselli

On Sun, 25 Sept 2022 at 00:20, Tom Lane  wrote:
>
> Federico  writes:
> > A basic example of the type of query in question is the following (see
> > below for the actual query):
>
> >   select w, array_agg(x)
> >   from (
> > select v, v / 10 as w
> > from pg_catalog.generate_series(25, 0, -1) as t(v)
> > order by v
> >   ) as t(x)
> >   group by w
>
> > This query will return an ordered array as specified by the order by
> > clause.in the subquery.
> > Can this behaviour be relied upon?
>
> No, not really.  It might always work given a particular set of
> circumstances.  As long as the planner chooses to do the outer
> query's grouped aggregation as a HashAgg, there'd be no reason
> for it to reshuffle the subquery output before feeding that to
> array_agg.  However, if it decided that sort-group-and-aggregate
> was better, it'd insert a sort by w above the subquery, and then
> you'd lose any certainty of the ordering by v continuing to hold.
> (Maybe the sort by w would be stable for equal keys, but that's
> not guaranteed.)
>
> What you really ought to do is write
>
>   select w, array_agg(x order by x)
>   from ...
>
> to be in the clear per SQL standard.  I think that right now that'd
> incur additional sorting overhead, which is annoying.  But work is
> ongoing to recognize when the input is already correctly sorted
> for an aggregate, so it should get better in PG 16 or so.
>
> regards, tom lane




Re: Order by in a sub query when aggregating the main query

2022-09-27 Thread Federico
I've changed the code to use order by in the aggregate and it seems
there are no noticeable changes in the query performance.
Thanks for the help.

Best,
Federico Caselli

On Sun, 25 Sept 2022 at 00:30, Federico  wrote:
>
> Understood, thanks for the explanation.
> I'll work on updating the queries used by sqlalchemy to do array_agg(x
> order by x) instead of the order by in the subquery.
>
> > I think that right now that'd
> > incur additional sorting overhead, which is annoying.  But work is
> > ongoing to recognize when the input is already correctly sorted
> > for an aggregate, so it should get better in PG 16 or so.
>
> Nice to know, hopefully it's too bad for this use case
>
> Thanks, Federico Caselli
>
> On Sun, 25 Sept 2022 at 00:20, Tom Lane  wrote:
> >
> > Federico  writes:
> > > A basic example of the type of query in question is the following (see
> > > below for the actual query):
> >
> > >   select w, array_agg(x)
> > >   from (
> > > select v, v / 10 as w
> > > from pg_catalog.generate_series(25, 0, -1) as t(v)
> > > order by v
> > >   ) as t(x)
> > >   group by w
> >
> > > This query will return an ordered array as specified by the order by
> > > clause.in the subquery.
> > > Can this behaviour be relied upon?
> >
> > No, not really.  It might always work given a particular set of
> > circumstances.  As long as the planner chooses to do the outer
> > query's grouped aggregation as a HashAgg, there'd be no reason
> > for it to reshuffle the subquery output before feeding that to
> > array_agg.  However, if it decided that sort-group-and-aggregate
> > was better, it'd insert a sort by w above the subquery, and then
> > you'd lose any certainty of the ordering by v continuing to hold.
> > (Maybe the sort by w would be stable for equal keys, but that's
> > not guaranteed.)
> >
> > What you really ought to do is write
> >
> >   select w, array_agg(x order by x)
> >   from ...
> >
> > to be in the clear per SQL standard.  I think that right now that'd
> > incur additional sorting overhead, which is annoying.  But work is
> > ongoing to recognize when the input is already correctly sorted
> > for an aggregate, so it should get better in PG 16 or so.
> >
> > regards, tom lane




Guidance on INSERT RETURNING order

2023-04-11 Thread Federico
Hello list,

I have a few clarification questions regarding using insert with returning.
The use case is SQLAlchemy development, where the orm wants to insert
a list of rows,
get back the generated ids, defaults, etc, and match the returned values with
the original list of orm objects.

The following assumes a table like this

CREATE TABLE t(
id SERIAL,
data TEXT -- type here can be anything
)

On PostgreSQL, since SQLAlchemy version 1.4 (Mar 2021) this operation has used

  INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING id

but we were recently made aware that there is no guarantee on the
order of the returned columns.
Looking at the documentation there is no mention of the order of the
RETURNING clause, but searching
past discussion there are multiple indication that the order is not
guaranteed, like
https://www.postgresql.org/message-id/19445.1350482182%40sss.pgh.pa.us
. I think the docs
should mention this, similar to what the sqlite docs do at
https://www.sqlite.org/lang_returning.html#limitations_and_caveats

Searching the archive seems that a using the INSERT SELECT ORDER BY
form should be a better solution,
so the above insert should be rewritten as

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num
RETURNING id

to ensure that the id are created in the order specified by num. The
returned id can again be in
arbitrary order, but sorting them should enable correctly matching the
orm object so that they can
be properly updated.
Is this correct?
The documentation does not say anything about this, and looking at the
archive it seems that
it's mostly correct but not 100% guaranteed, as stated here
https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us .
The MSSQL docs, for example, clearly state that this is the case
https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver16#limitations-and-restrictions
,
so it would be helpful if something similar were mentioned in the
PostgreSQL docs.

The above insert form (INSERT SELECT ORDER BY) can be used when the
primary key is an auto incrementing value,
in case it isn't (such as when it's an UUID), another solution must be used.
Since there does not seem to be any way of getting the position of the
original row inside
the VALUES clause with RETURNING, the solution SQLAlchemy is
implementing is to either degrade to
inserts with a single value or to optionally allow the user to add a
"sentinel" column to the table,
so that a sequential value can be inserted into it and then returned
allowing the ordering of the
RETURNING clause rows:

ALTER TABLE t ADD COLUMN sentinel SMALLINT

INSERT INTO t(data, sentinel) VALUES ('a', 1), ('b', 2), ('c', 3)
RETURNING id, sentinel

Is there any better solution to achieve this? (For reference this
feature is tracked in SQLAlchemy by
https://github.com/sqlalchemy/sqlalchemy/issues/9618)

>From an ORM standpoint it would be very useful having a way of forcing
the order of RETURNING
to be the same as the one in VALUES, maybe with an additional keyword.
Alternatively having a system column
or other function that can be placed into the returning clause to
return the output row position wrt the
input values list, similar to what the sentinel column above does.

At the very least I think the documentation could do a better job at
mentioning that RETURNING order is
arbitrary, and documenting that INSERT SELECT ORDER BY precesses the
default in select order
(if that's indeed the case)

Sorry for the long email,
Thanks

Federico




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Federico
Thanks for the ansers

> 2) What would you order by, id or data or both?

by values order, (that incidentally seems to be what PG does)

> with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
> returning id)
> select i.id from i order by id;

The problem here is not having the auto increment id in a particular
order, is that there
is apparently no correlation with the position of an element in the
values clause with the
id generated. That's the reason for using the sentinel column in the
general solution in the previous message.

The extend on the use case, SQLAlchemy has 3 objects T that have
T(data='a'), T(data='b'), T(data='c') but no
value for the id column. The objective is to insert the 3 data values,
get back the ids and correctly match them with
the correct 3 objects.

> No.  Sadly, adding that ORDER BY is just voodoo programming, because
> it applies to the result of the SELECT while promising nothing about
> the order in which INSERT/RETURNING will act on those rows.

I wasn't probably clear, it's fine if INSERT/RETURNING order is
arbitrary, what matters is that the
autoincementing values is executed in the same order as select, like
mentioned in this
previous message
https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us

Is that not the case?

> Re-reading that 2012 thread, the main new observation I'd make today
> is that parallel operation is a thing now, and it's not hard to foresee
> that sometime soon we'll want to parallelize INSERTs.  Which'd make it
> *really* hard to promise anything about the order of RETURNING output.

I think it's fine not promising anything about the order of RETURNING, but
it would be very helpful having a way of tracking what input row
generated a particular
output row. Basically the sentinel case in the original post,
without actually having to insert the sentinel into the table.

> I think if you want to use RETURNING with multi-row inserts, the
> thing to do is more like
>
>   INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
>
> and then explicitly match up the returned "data" values rather than
> presuming they appear in the same order you wrote them in in VALUES.
> Admittedly this might be problematic if some of the VALUES rows
> are identical, but how much should you care?

Well, the example is very easy, but it's hard to generalize when
inserting multiple columns
with possible complex values in them, since it would mean matching on
possibly large json values,
arrays, etc. So definitely not ideal

Thanks,
Federico

On Tue, 11 Apr 2023 at 22:06, Adrian Klaver  wrote:
>
> On 4/11/23 12:47, Federico wrote:
> > Hello list,
> >
> https://www.sqlite.org/lang_returning.html#limitations_and_caveats
> >
> > Searching the archive seems that a using the INSERT SELECT ORDER BY
> > form should be a better solution,
> > so the above insert should be rewritten as
> >
> >  INSERT INTO t(data)
> >  SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
> > num) ORDER BY num
> >  RETURNING id
>
> Or
>
> with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
> returning id)
> select i.id from i order by id;
>
> > Sorry for the long email,
> > Thanks
> >
> >  Federico
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Federico
On Tue, 11 Apr 2023 at 22:59, Thorsten Glaser  wrote:
>
> On Tue, 11 Apr 2023, Federico wrote:
>
> >The problem here is not having the auto increment id in a particular
>
> The id might not even be auto-increment but UUID or something…
> (I am surprised you would even try to insert multiple rows at once.)

Well the documentation makes no mention of any limitation on returning
and the observed behaviour has consistently been that returning is in
values order.
Again, that was SQLAlchemy's fault for assuming this (but the docs
surely did not help).

Also re-reading my reply, I've made a typo there, sorry. What it
should have read is:
  The problem here is not having the returned ids in a particular
order, is that there is apparently no correlation with the position of
an element in the values clause with the id generated.

Of course sorting the returned ids is only viable when using a serial
or identity column, that's why in the general case I've mentioned the
insert with sentinel column to ask if there are better or alternative
solutions.

Thanks for the reply, best
Federico

>
> bye,
> //mirabilos
> --
> 15:41⎜ Somebody write a testsuite for helloworld :-)
>
>
>
>




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Federico
On Tue, 11 Apr 2023 at 23:22, Thorsten Glaser  wrote:
>
> On Tue, 11 Apr 2023, Federico wrote:
>
> >Of course sorting the returned ids is only viable when using a serial
>
> Yes, which is why I pointed out it doesn’t have to be.
>
> >or identity column, that's why in the general case I've mentioned the
> >insert with sentinel column
>
> But it was pointed out that that’s not guaranteed either, unless you add
> that sentinel column to the table itself…

I was under the impression that when using INSERT SELECT ORDER BY the sequence
ids were generated using the select order.
That has been mentioned in multiple other previous messages, like
https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us

The above does not cover all cases, but in practice serial or identity
are very common,
so it would be nice if at least in these cases a sentinel is not needed

Thanks,
  Federico

> bye,
> //mirabilos
> --
> 15:41⎜ Somebody write a testsuite for helloworld :-)
>
>
>
>




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Federico
On Tue, 11 Apr 2023 at 23:31, Rob Sargent  wrote:
>
> On 4/11/23 14:37, Federico wrote:
> >
> > The problem here is not having the auto increment id in a particular
> > order, is that there
> > is apparently no correlation with the position of an element in the
> > values clause with the
> > id generated. That's the reason for using the sentinel column in the
> > general solution in the previous message.
> >
> > The extend on the use case, SQLAlchemy has 3 objects T that have
> > T(data='a'), T(data='b'), T(data='c') but no
> > value for the id column. The objective is to insert the 3 data values,
> > get back the ids and correctly match them with
> > the correct 3 objects.
> >
> >> No.  Sadly, adding that ORDER BY is just voodoo programming, because
> >> it applies to the result of the SELECT while promising nothing about
> >> the order in which INSERT/RETURNING will act on those rows.
> > I wasn't probably clear, it's fine if INSERT/RETURNING order is
> > arbitrary, what matters is that the
> > autoincementing values is executed in the same order as select, like
> > mentioned in this
> > previous message
> > https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us
> >
> > Is that not the case?
> >
> >> Re-reading that 2012 thread, the main new observation I'd make today
> >> is that parallel operation is a thing now, and it's not hard to foresee
> >> that sometime soon we'll want to parallelize INSERTs.  Which'd make it
> >> *really* hard to promise anything about the order of RETURNING output.
> > I think it's fine not promising anything about the order of RETURNING, but
> > it would be very helpful having a way of tracking what input row
> > generated a particular
> > output row. Basically the sentinel case in the original post,
> > without actually having to insert the sentinel into the table.
> >
> >> I think if you want to use RETURNING with multi-row inserts, the
> >> thing to do is more like
> >>
> >>INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
> >>
> >> and then explicitly match up the returned "data" values rather than
> >> presuming they appear in the same order you wrote them in in VALUES.
> >> Admittedly this might be problematic if some of the VALUES rows
> >> are identical, but how much should you care?
> > Well, the example is very easy, but it's hard to generalize when
> > inserting multiple columns
> > with possible complex values in them, since it would mean matching on
> > possibly large json values,
> > arrays, etc. So definitely not ideal
> >
> > Thanks,
> > Federico
> >
> Can your client retain a hashmap of md5,data pairings, allowing the
> lookup on the way back using the returned data and supplied id?
>

When using unique columns or similar, that's something that is done,
but if there are no unique columns in the value no match can be done
reliably with the source data, since sqlalchemy is a library that
allows arbitrary schemas to be generated.

Thanks for the reply,
  Federico




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Federico
On Tue, 11 Apr 2023 at 23:44, Thorsten Glaser  wrote:
>
> On Tue, 11 Apr 2023, Federico wrote:
>
> >I was under the impression that when using INSERT SELECT ORDER BY the 
> >sequence
> >ids were generated using the select order.
>
> But someone said that’s not guaranteed, especially when INSERT will
> be parallelised later.

It was Tom Lane's message that said
> Re-reading that 2012 thread, the main new observation I'd make today
> is that parallel operation is a thing now, and it's not hard to foresee
> that sometime soon we'll want to parallelize INSERTs.  Which'd make it
> *really* hard to promise anything about the order of RETURNING output.

My reading of it is that we are talking about RETURNING, not about the order
in which the serial ids are generated. My understanding was that they
are generated
in select order, then the rows are inserted in any arbitrary order the
planner may choose
and returned again in any arbitrary order.

If my understanding is incorrect, would this alternative guarantee the above
(that nextval is called in the order set by ORDER BY),
again re-using the table in the original message?

INSERT INTO t(id, data)
SELECT nextval(pg_get_serial_sequence('t', 'id')) data
FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
ORDER BY num
RETURNING id

best,
  Federico


> bye,
> //mirabilos
> --
> 15:41⎜ Somebody write a testsuite for helloworld :-)
>
>
>
>




Re: Guidance on INSERT RETURNING order

2023-04-11 Thread Federico
On Tue, 11 Apr 2023 at 23:46, Rob Sargent  wrote:
>
>
> >> Can your client retain a hashmap of md5,data pairings, allowing the
> >> lookup on the way back using the returned data and supplied id?
> >>
> > When using unique columns or similar, that's something that is done,
> > but if there are no unique columns in the value no match can be done
> > reliably with the source data, since sqlalchemy is a library that
> > allows arbitrary schemas to be generated.
> >
> > Thanks for the reply,
> >Federico
> So you're returned data is not what was sent to the server? Otherwise it
> should generate the same md5, as I understand it. Identical data would
> of course be a problem.
>

That should be the case, yes.
If a table has a non-nullable unique key, it should be possible to use
a hashmap and perform that lockup. We are planning on implementing
something like this to cover the cases where it can be used.

Thanks for the reply,
  Federico




Re: Guidance on INSERT RETURNING order

2023-04-12 Thread Federico
On Wed, 12 Apr 2023 at 11:46, Thorsten Glaser  wrote:
>
> On Tue, 11 Apr 2023, Federico wrote:
>
> >My understanding was that they are generated in select order
>
> But are they? (I don’t know, but I’d not assume that.)

That's kind of the point for this question, to see if that's correct or not.

> >If my understanding is incorrect, would this alternative guarantee the above
>
> >INSERT INTO t(id, data)
> >SELECT nextval(pg_get_serial_sequence('t', 'id')) data
> >FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
> >ORDER BY num
> >RETURNING id
>
> Wouldn’t, at that point, it be better to just send multiple
> individual INSERT statements? The overhead (on both sides)
> for all mentioned… workarounds… surely is larger than that?

No, not by a long shot. Sending thousands of single inserts
sequentially over the network requires a lot more time even when doing
that on localhost.
Using a single statement is many times faster.

Federico

> bye,
> //mirabilos
> --
> 15:41⎜ Somebody write a testsuite for helloworld :-)
>
>
>
>




Re: Guidance on INSERT RETURNING order

2023-04-12 Thread Federico
On Wed, 12 Apr 2023 at 23:49, Adrian Klaver  wrote:
>
> On 4/12/23 2:35 PM, Kirk Wolak wrote:
> > On Tue, Apr 11, 2023 at 4:38 PM Federico 
> >
> > A couple of comments.  For the more generic, I prefer RETURNING *
> > you get back all the columns for matching.  To me, this solves the
> > problem in a very generic way.
>
>  From what I gather from the conversation RETURNING is the red herring.
>
> The request is that for:
>
> INSERT INTO some_table(char_fld) VALUES('a'), ('b'), ('c')
>
> where some_table has an auto increment field that the values created for
> said field will always be done in the order that VALUES data was
> presented so:
>
> SELECT id, char_fld from some_table will always return:
>
> (1, 'a')
> (2, 'b')
> (3, 'c')

It's actually for an insert query like this

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
num) ORDER BY num

Or even (but the above would be nicer)

INSERT INTO t(id, data)
SELECT nextval(pg_get_serial_sequence('t', 'id')) data
FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
ORDER BY num

  Federico
>
> > But SQL (and SET THEORY) basically imply you cannot trust the sequencing
> > of a set of transactions.  Parallel execution is just a great simple
> > example.
> >
> > Secondarily, many frameworks I've worked with (and custom ones
> > developed) would actually call the SEQUENCE.NEXTVAL, and assign the IDs,
> > in memory, accepting that we would have gaping holes if some
> > transactions were never actually sent to the server.  We did this a lot
> > in master-detail GUI type stuff.  It's just easier.  The children knew
> > their parent ID, and all the children ID's were effectively known before
> > committing.  It made for simple code that never failed.
> > (for large datasets we would want one query that returned a set of IDs,
> > we could order that.  And apply it to the records we were about to
> > insert). [Be Careful with GENERATED ALWAYS pks to OVERRIDE]
> >
> > HTH
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com




Re: Guidance on INSERT RETURNING order

2023-04-14 Thread Federico
On Fri, 14 Apr 2023 at 21:37, David G. Johnston
 wrote:
>
> On Fri, Apr 14, 2023 at 11:42 AM John Howroyd  
> wrote:
>>
>> @PostgreSQL:  Might I ask if this is still being actively considered or 
>> should we repost to another mailing list (perhaps pgsql-hackers or any other 
>> you might suggest)?
>
>
> This is the right place for such a discussion.  Unless you think you've come 
> up with a design that you want to propose and that, if accepted, you would 
> then write up a patch for.  For that you'd probably want to move the 
> discussion to -hackers with an outline of the patch you plan to write.
>
> David J.
>

Would something like what was proposed by Mike Bayer be considered?

> A new token called "tuple_order" or something
>
>  INSERT INTO table (a, b, c) VALUES ((1, 2, 3), (4, 5, 6), ...) RETURNING 
> table.id, inserted.tuple_order
>
> tuple_order would be incrementing values 1, 2, 3, 4, 5, ... which correlate 
> the each row delivered by RETURNING to each entry in the VALUES clause, in 
> the order they were stated in that VALUES clause, that is entry (1, 2, 3) 
> would be tuple_order 1, entry (4, 5, 6) would be tuple order 2, etc.

This would allow easy reordering of the RETURNING clause, either
client side or moving the INSERT into a CTE and ordering the outside
select. I also don't think it would have any impact on parallel
processing of the INSERT, since RETURNING could output rows in any
arbitrary order.

Best,
Federico




Re: Guidance on INSERT RETURNING order

2023-04-14 Thread Federico
On Sat, 15 Apr 2023 at 05:17, Tom Lane  wrote:
>
> Federico  writes:
> > Would something like what was proposed by Mike Bayer be considered?
>
> >> A new token called "tuple_order" or something
> >>
> >> INSERT INTO table (a, b, c) VALUES ((1, 2, 3), (4, 5, 6), ...) RETURNING 
> >> table.id, inserted.tuple_order
> >>
> >> tuple_order would be incrementing values 1, 2, 3, 4, 5, ... which 
> >> correlate the each row delivered by RETURNING to each entry in the VALUES 
> >> clause, in the order they were stated in that VALUES clause, that is entry 
> >> (1, 2, 3) would be tuple_order 1, entry (4, 5, 6) would be tuple order 2, 
> >> etc.
>
> As proposed, I don't think so.  Something over in the RETURNING clause has
> exactly no connection to VALUES.  What do you do if it's INSERT ... SELECT
> and there are several VALUES clauses down inside the SELECT?

I think the tuple order would not be connected to the values, but be
determined by the input order of the rows in the insert. So when using
INSERT ... SELECT the tuple_order value would be determined by the
output of the select, using the same logic as ROW_NUMBER.

> There is some prior art in this area, though.  See the more-or-less
> SQL-standard WITH ORDINALITY option for functions-in-FROM.  It seems to me
> that it could be plausible to attach WITH ORDINALITY to a VALUES clause,
> which would give you a rock-solid connection between the VALUES rows and
> the ordinality-column values, and then you could include that column in
> RETURNING.

Well if returning could output rows that are not in the inserted
table, like mentioned also by Devid Johnston, it would open to
alternatives to tuple_order, like using

  INSERT INTO tbl(data)
  SELECT v.d FROM (VALUES ('x', 1), ('y', 2)) AS v(d, num)
  RETURNING tbl.id, v.num

> regards, tom lane

I'm not familiar with the internals of Postgresql, so I don't know
what would be more complex to implement in this case. I think both
solutions would work equally well from the client point of view.

Thanks for the reply,
 Federico




Re: Guidance on INSERT RETURNING order

2023-04-15 Thread Federico
On Sat, 15 Apr 2023 at 15:40, Tom Lane  wrote:
>
> Federico  writes:
> > I think the tuple order would not be connected to the values, but be
> > determined by the input order of the rows in the insert. So when using
> > INSERT ... SELECT the tuple_order value would be determined by the
> > output of the select, using the same logic as ROW_NUMBER.
>
> We already rejected the idea that INSERT must preserve the order of
> the incoming tuples.  Please don't re-propose it with different
> wording.
>
> regards, tom lane

That was not my intention, what I meant is that the hypothetical
inserted.tuple_order in case of insert .. select would be determined
by the output order of the select. Insert would be free to insert the
row as it pleases.
Sorry if it wasn't clear.

Best
  Federico




Re: Guidance on INSERT RETURNING order

2023-04-17 Thread Federico
On Tue, 18 Apr 2023 at 00:21, John Howroyd  wrote:
>
> May I clarify the ideas being discussed so far, perhaps with a view to make a 
> relevant proposal.  My apologies if I get anything wrong or go too far.
>
> As I understand it the proposal is to supplement the syntax to something like:
>
> INSERT INTO table (a, b, c)
> VALUES ((1,2,3), (4,5,6), ...)
> WITH ORDINALITY
> RETURNING table.id, ordinality
> ;
>
> The meaning of which is to adjoin an ordinality column to the output 
> reflecting the declaration order in the values clause.  So an output of (not 
> necessarily in any order):
> (1001, 1)
> (1003, 2)
> means that table.id = 1001 was assigned to the inserted row from tuple 
> (1,2,3) (from VALUES, because that table.id is associated to ordinality = 1) 
> and table.id = 1003 was assigned to the inserted row  from tuple (4,5,6).  
> The output being ordered as determined by the internals of query execution 
> (not necessarily the one shown).
>
> Is that correct?

That would work as syntax for the task of tracking what id or other
server default is generated by a value clause tuple.

> I presume (although, not quite so clear) that one would have:
>
> INSERT INTO table (a, b, c)
> SELECT a_val, b_val, c_val
> FROM joined_tables
> WHERE some_condition
> ORDER BY something_relevant
> WITH ORDINALITY
> RETURNING table.id, ordinality
> ;
>
> The meaning being very much as before replacing 'declaration order' by 'row 
> order of the SELECT statement as defined by the ORDER BY clause'; so pretty 
> much like a row_number() but in the output of the RETURNING clause (and 
> without an OVER modification).  I added the ORDER BY clause as I don't really 
> see what this would mean without it; but this (presumably) does not affect 
> output order only the order of the incoming rows (and hence the generation of 
> the ordinality output).
>
> Is that correct?

This would not be needed if the syntax with VALUES WITH ORDINALITY is
added in sqlalchemy. So fine either way.
If "WITH ORDINALITY" is a feature of VALUES this syntax would not be
allowed though. I'm personally ok limiting WITH ORDINALITY only to
VALUES.

> Might there be a natural syntax to label the 'ordinality' output column?  
> Perhaps something like:
>
> ...
> WITH ORDINALITY (col_name)
> RETURNING table.id, col_name
> ;
>
> I don't want to clash with the syntax for Table Functions.
>
> Is it a step too far to propose allowing an additional ORDER BY clause after 
> the RETURNING clause (a specific declaration for the query execution to 
> assign cpu cycles; especially if the WITH ORDINALITY is not tied to output 
> order)?
>
> Personally, I didn't see Frederico's comment as anything to do with order; 
> just how one could output additional values in the RETURNING clause (namely, 
> v.num from a subexpression of the SELECT but in whatever order it comes).  On 
> the other hand, that seems a lot more complicated to me because it is not an 
> expression in the overall SELECT feeding the INSERT, whereas the WITH 
> ORDINALITY is a specific declaration to match input order with output order 
> by inserting a counter.

I didn't mean to suggest any particular order should be kept by insert
or by returning. I was merely commenting on the David G. Johnston
reply

 I suppose breaking the restriction that only columns present on
the insertion-table can be returned is a possible option that also
solves another infrequent request.

> Apologies, if I have misunderstood or invented something that's not possible!

Thanks for the recap. I'm hoping this can become a proposal.

Best,
  Federico




Re: Guidance on INSERT RETURNING order

2023-04-19 Thread Federico
On Tue, 18 Apr 2023 at 11:53, John Howroyd  wrote:
>
> Sorry, I may have jumped to a conclusion that's not quite correct.

Not at all, thanks for moving this along

> On Mon, 17 Apr 2023 at 23:58, Federico  wrote:
>>
>> On Tue, 18 Apr 2023 at 00:21, John Howroyd  wrote:
>> > ...
>> >
>> > Personally, I didn't see Frederico's comment as anything to do with order; 
>> > just how one could output additional values in the RETURNING clause 
>> > (namely, v.num from a subexpression of the SELECT but in whatever order it 
>> > comes).  On the other hand, that seems a lot more complicated to me 
>> > because it is not an expression in the overall SELECT feeding the INSERT, 
>> > whereas the WITH ORDINALITY is a specific declaration to match input order 
>> > with output order by inserting a counter.
>>
>> I didn't mean to suggest any particular order should be kept by insert
>> or by returning. I was merely commenting on the David G. Johnston
>> reply
>>
>>  I suppose breaking the restriction that only columns present on
>> the insertion-table can be returned is a possible option that also
>> solves another infrequent request.
>>
>>
>> ...
>> Best,
>>   Federico
>
>
> This might be a possibility.  The v.num (from the original example) is 
> accessible in the outer select, so one can a envisage a system to handle 
> this, but at (presumably) much greater expense: preparation of the SELECT, 
> orchestration of the output rows (some fields for INSERT some for RETURNING) 
> and (presumably) a whole load of mem copies to RETURNING.  Is this something 
> to consider (perhaps just for feasibility while writing an initial patch 
> based on WITH ORDINALITY)?
>
> To put it another way, v.num is a valid expression in the overall SELECT, but 
> still a lot more complicated (and presumably expensive).

I think it's probably best to focus only on INSERT VALUES WITH
ORDINALITY. Further enhancements can be added later if they seem
useful.

Best,
  Federico




R: Re: License Question

2018-05-31 Thread Federico Di Gregorio
 Steve Crawford ha scritto 

> On Thu, May 31, 2018 at 7:35 AM Jamie Specter  wrote:
> 
> ...
> 
> I would love to use it in an Apache-licensed project but unfortunately, LGPL 
> licenses are not compatible with Apache 2.0.  
> ...
> 
> Is there any chance the project can be dual-licensed with a permissive option 
> like MIT or BSD? ...
> 
> 
> That is a question better suited for the psycopg2 developers:

The psycopg developers are on this list. :) 

And we asked the OP to write here so that everybody could follow the discussion 
(initially he opened a bug report). 

federico


R: Re: License Question

2018-05-31 Thread Federico Di Gregorio
 Federico Di Gregorio ha scritto 

>  Steve Crawford ha scritto 
> 
> 
> > On Thu, May 31, 2018 at 7:35 AM Jamie Specter  
> > wrote:
> 
> > 
> 
> > ...
> 
> > 
> 
> > I would love to use it in an Apache-licensed project but unfortunately, 
> > LGPL licenses are not compatible with Apache 2.0.  
> 
> > ...
> 
> > 
> 
> > Is there any chance the project can be dual-licensed with a permissive 
> > option like MIT or BSD? ...
> 
> > 
> 
> > 
> 
> > That is a question better suited for the psycopg2 developers:
> 
> 
> The psycopg developers are on this list. :) 
> 
> 
> And we asked the OP to write here so that everybody could follow the 
> discussion (initially he opened a bug report). 
> 
> 
> federico

Obviously not on "general". Sorry for the noiose. :/

federico