Re: Interval in hours but not in days Leap second not taken into account

2023-02-27 Thread Laurenz Albe
On Mon, 2023-02-27 at 07:26 +, PALAYRET Jacques wrote:
> # An interval in " years months ... seconds " given in seconds by 
> EXTRACT(EPOCH ...) cast to INTERVAL :
> SELECT (EXTRACT(EPOCH FROM ('3 years 2 months 1 day 10 hours 11 minutes 12 
> seconds'::interval) ) || ' seconds')::interval ;
>   interval
> -
>  27772:11:12
> 
> # The same interval in seconds formatted with TO_CHAR() :
> SELECT TO_CHAR((EXTRACT(EPOCH FROM ('3 years 2 months 1 day 10 hours 11 
> minutes 12 seconds'::interval) ) || ' seconds')::interval, ' yyyy mm dd_ 
> hh24-mi-ss ') ;
>           to_char
> -------------------------
>   0000 00 00_ 27754-11-12
> 
> => The result is given in hours ... (not in days ...).
> 
> It is logical that there are neither years nor months because they are not 
> constant
> (leap year or not; a month can contain 31 30 ... days).
> I thought that days were eliminated because of the leap second (extra seconds
> inserted in the UTC time scale); obviously, this is not the case.
> 
> # PostgreSQL does not take into account the additional second (leap second) 
> in some calendar days ; eg. 2016, 31 dec. :
> SELECT to_timestamp('20170102 10:11:12','yyyymmdd hh24:mi:ss') - 
> to_timestamp('20161230 00:00:00','yyyymmdd hh24:mi:ss') intervalle ;
>    intervalle
> -
>  3 days 10:11:12
> --> With postgreSQL, a calendar day is always 86,400 seconds long.
> 
> So, is there a reason for this (interval in hours ...) ?

The best explanation I have is "daylight savings time".
One day is not always 24 hours long.
If you keep the interval in hours, the result is always correct (if you
ignore leap seconds, which PostgreSQL doesn't account for).
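
If a days-and-hours rendering is still wanted and fixed 24-hour days are
acceptable for your purpose, justify_hours() can produce one; a quick sketch
using the interval from above:

  SELECT justify_hours(interval '27772:11:12');
  -- result: 1157 days 04:11:12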

Yours,
Laurenz Albe




Re: Interval in hours but not in days Leap second not taken into account

2023-02-27 Thread PALAYRET Jacques


Does PostgreSQL take into account daylight saving time in its calendar?

For the spring switch to summer time (Daylight Saving Time) on Sunday, March 
27, 2022:
SELECT to_timestamp('20220329 00:00:00','yyyymmdd hh24:mi:ss') - 
to_timestamp('20220320 00:00:00','yyyymmdd hh24:mi:ss') intervalle ;
 intervalle
------------
 9 days


Regards
- Original Message -
From: "Laurenz Albe" 
To: "PALAYRET Jacques" , pgsql-gene...@postgresql.org
Sent: Monday, February 27, 2023 09:23:37
Subject: Re: Interval in hours but not in days  Leap second not taken into account

On Mon, 2023-02-27 at 07:26 +, PALAYRET Jacques wrote:
> # An interval in " years months ... seconds " given in seconds by 
> EXTRACT(EPOCH ...) cast to INTERVAL :
> SELECT (EXTRACT(EPOCH FROM ('3 years 2 months 1 day 10 hours 11 minutes 12 
> seconds'::interval) ) || ' seconds')::interval ;
>   interval
> -
>  27772:11:12
> 
> # The same interval in seconds formatted with TO_CHAR() :
> SELECT TO_CHAR((EXTRACT(EPOCH FROM ('3 years 2 months 1 day 10 hours 11 
> minutes 12 seconds'::interval) ) || ' seconds')::interval, ' yyyy mm dd_ 
> hh24-mi-ss ') ;
>           to_char
> -------------------------
>   0000 00 00_ 27754-11-12
> 
> => The result is given in hours ... (not in days ...).
> 
> It is logical that there are neither years nor months because they are not 
> constant
> (leap year or not; a month can contain 31 30 ... days).
> I thought that days were eliminated because of the leap second (extra seconds
> inserted in the UTC time scale); obviously, this is not the case.
> 
> # PostgreSQL does not take into account the additional second (leap second) 
> in some calendar days ; eg. 2016, 31 dec. :
> SELECT to_timestamp('20170102 10:11:12','yyyymmdd hh24:mi:ss') - 
> to_timestamp('20161230 00:00:00','yyyymmdd hh24:mi:ss') intervalle ;
>    intervalle
> -
>  3 days 10:11:12
> --> With postgreSQL, a calendar day is always 86,400 seconds long.
> 
> So, is there a reason for this (interval in hours ...) ?

The best explanation I have is "daylight savings time".
One day is not always 24 hours long.
If you keep the interval in hours, the result is always correct (if you
ignore leap seconds, which PostgreSQL doesn't account for).

Yours,
Laurenz Albe




Re: Interval in hours but not in days Leap second not taken into account

2023-02-27 Thread PALAYRET Jacques


Laurenz Albe, you are right, thank you; actually, it depends (of course) on the 
time zone:

# With GMT (no Daylight Saving Time):

SHOW timezone ;
 TimeZone
--
 GMT

SELECT timestamp with time zone '2022-03-29 12:00:00' - timestamp with time 
zone '2022-03-26 12:00:00' ;
 ?column?
--
 3 days

# With a time zone that observes DST (Daylight Saving Time):
SET timezone='Africa/Casablanca' ;

SHOW timezone ;
 TimeZone
---
 Africa/Casablanca

SELECT timestamp with time zone '2022-03-29 12:00:00' - timestamp with time 
zone '2022-03-26 12:00:00' ;
?column?
-
 3 days 01:00:00
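
The extra hour appears because the literals are interpreted in the session's
local time, whose UTC offset changed between those two dates; pinning both
timestamps to an explicit offset removes the effect (a quick sketch):

SELECT timestamptz '2022-03-29 12:00:00+00' - timestamptz '2022-03-26 12:00:00+00' ;
-- 3 days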


Regards
- Original Message -
From: "PALAYRET Jacques" 
To: "Laurenz Albe" 
Cc: pgsql-gene...@postgresql.org
Sent: Monday, February 27, 2023 09:50:02
Subject: Re: Interval in hours but not in days  Leap second not taken into account

Does PostgreSQL take into account daylight saving time in its calendar?

For the spring switch to summer time (Daylight Saving Time) on Sunday, March 
27, 2022:
SELECT to_timestamp('20220329 00:00:00','yyyymmdd hh24:mi:ss') - 
to_timestamp('20220320 00:00:00','yyyymmdd hh24:mi:ss') intervalle ;
 intervalle
------------
 9 days


Regards
- Original Message -
From: "Laurenz Albe" 
To: "PALAYRET Jacques" , pgsql-gene...@postgresql.org
Sent: Monday, February 27, 2023 09:23:37
Subject: Re: Interval in hours but not in days  Leap second not taken into account

On Mon, 2023-02-27 at 07:26 +, PALAYRET Jacques wrote:
> # An interval in " years months ... seconds " given in seconds by 
> EXTRACT(EPOCH ...) cast to INTERVAL :
> SELECT (EXTRACT(EPOCH FROM ('3 years 2 months 1 day 10 hours 11 minutes 12 
> seconds'::interval) ) || ' seconds')::interval ;
>   interval
> -
>  27772:11:12
> 
> # The same interval in seconds formatted with TO_CHAR() :
> SELECT TO_CHAR((EXTRACT(EPOCH FROM ('3 years 2 months 1 day 10 hours 11 
> minutes 12 seconds'::interval) ) || ' seconds')::interval, ' yyyy mm dd_ 
> hh24-mi-ss ') ;
>           to_char
> -------------------------
>   0000 00 00_ 27754-11-12
> 
> => The result is given in hours ... (not in days ...).
> 
> It is logical that there are neither years nor months because they are not 
> constant
> (leap year or not; a month can contain 31 30 ... days).
> I thought that days were eliminated because of the leap second (extra seconds
> inserted in the UTC time scale); obviously, this is not the case.
> 
> # PostgreSQL does not take into account the additional second (leap second) 
> in some calendar days ; eg. 2016, 31 dec. :
> SELECT to_timestamp('20170102 10:11:12','yyyymmdd hh24:mi:ss') - 
> to_timestamp('20161230 00:00:00','yyyymmdd hh24:mi:ss') intervalle ;
>    intervalle
> -
>  3 days 10:11:12
> --> With postgreSQL, a calendar day is always 86,400 seconds long.
> 
> So, is there a reason for this (interval in hours ...) ?

The best explanation I have is "daylight savings time".
One day is not always 24 hours long.
If you keep the interval in hours, the result is always correct (if you
ignore leap seconds, which PostgreSQL doesn't account for).

Yours,
Laurenz Albe




Repeat operations on 50 tables of the same schema?

2023-02-27 Thread celati Laurent
Good morning,
I am new to Postgresql. I have 50 tables into a "ign" schema (schema other
than public).
I would like for each of these 50 tables:

- Add a prefix to the name of the table: "IGN_bdTopo_"
- Add a suffix to the table name: "_V1"
- create a new "date" column of date type. And populate this field with the
value: 06/15/2021
- create a new "source" column of type varchar (length 50). And populate
this field with the value: 'ign'.
- move everything in these 50 tables (data, constraints, indexes) from the "ign"
schema to the "ign_v2" schema.

Could someone help me? Thank you so much.


Re: ERROR: unsupported Unicode escape sequence - in JSON-type column

2023-02-27 Thread Laurenz Albe
On Mon, 2023-02-27 at 06:28 +, Jan Bilek wrote:
> Our customer was able to sneak in an Unicode data into a column of a JSON 
> Type and now that record fails on select.
> Would you be able to suggest any way out of this? E.g. finding infringing 
> row, updating its data ... ?

I'd be curious to know how the customer managed to do that.
Perhaps there is a loophole in PostgreSQL that needs to be fixed.

First, find the table that contains the column.
Then you can try something like

  DO
  $$DECLARE
 pkey bigint;
  BEGIN
 FOR pkey IN SELECT id FROM jsontab LOOP
BEGIN  -- starts block with exception handler
   PERFORM jsoncol -> 'creationDateTime'
   FROM jsontab
   WHERE id = pkey;
EXCEPTION
   WHEN untranslatable_character THEN
  RAISE NOTICE 'bad character in line with id = %', pkey;
END;
 END LOOP;
  END;$$;

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Event Triggers unable to capture the DDL script executed

2023-02-27 Thread Laurenz Albe
On Mon, 2023-02-27 at 03:52 +, Neethu P wrote:
> Is it possible to access the pg_ddl_command using C function? In the shared 
> links,
> I couldn't find one. Can you please share an example for the same?

No.  You could hire a professional.

Yours,
Laurenz Albe




Re: Repeat operations on 50 tables of the same schema?

2023-02-27 Thread Ron

On 2/27/23 05:53, celati Laurent wrote:


Good morning,
I am new to Postgresql. I have 50 tables into a "ign" schema (schema other 
than public).

I would like for each of these 50 tables:

- Add a prefix to the name of the table: "IGN_bdTopo_"
- Add a suffix to the table name: "_V1"
- create a new "date" column of date type. And populate this field with 
the value: 06/15/2021
- create a new "source" column of type varchar (length 50). And populate 
this field with the value: 'ign'.
- move all the elements of these 50 tables (including all). from the "ign" 
schema to the "ign_v2" schema. Whether data, constraints, indexes.


If someone could  help me? Thank you so much.


This is what I'd do, just to get it done.  It presumes you know bash 
scripting, and how to use psql.


https://www.postgresql.org/docs/13/sql-createschema.html
https://www.postgresql.org/docs/13/sql-altertable.html

First, CREATE SCHEMA ign_v2;

Write a bash script that uses psql to query information_schema.tables.  
There'd be a for loop over all the tables.


Inside the loop, still using bash, and using psql:
1. create the new table name then execute "ALTER TABLE ... RENAME TO ...;",
2. ALTER TABLE (new_name) ADD COLUMN some_date DATE, ADD COLUMN some_source 
varchar(50);

3. UPDATE (new_name) SET some_date = '2021-06-15'::date, some_source = 'ign';
4. ALTER TABLE (new_name) SET SCHEMA ign_v2;
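
If you'd rather stay inside the database, here is a minimal anonymous-block 
sketch of the same four steps (it assumes the ign_v2 schema already exists and 
that 06/15/2021 means the date 2021-06-15):

DO $$
DECLARE
    t  text;
    nn text;
BEGIN
    FOR t IN SELECT table_name
             FROM information_schema.tables
             WHERE table_schema = 'ign' AND table_type = 'BASE TABLE'
    LOOP
        nn := 'IGN_bdTopo_' || t || '_V1';
        -- rename, add and populate the two new columns, then move the table
        EXECUTE format('ALTER TABLE ign.%I RENAME TO %I', t, nn);
        EXECUTE format('ALTER TABLE ign.%I ADD COLUMN "date" date, ADD COLUMN source varchar(50)', nn);
        EXECUTE format('UPDATE ign.%I SET "date" = DATE ''2021-06-15'', source = ''ign''', nn);
        EXECUTE format('ALTER TABLE ign.%I SET SCHEMA ign_v2', nn);
    END LOOP;
END$$;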


--
Born in Arizona, moved to Babylonia.

Re: ERROR: unsupported Unicode escape sequence - in JSON-type column

2023-02-27 Thread Erik Wienhold
> On 27/02/2023 13:13 CET Laurenz Albe  wrote:
>
> I'd be curious to know how the customer managed to do that.
> Perhaps there is a loophole in PostgreSQL that needs to be fixed.

Probably via some data access layer and not directly via Postgres.  It's easy
to reproduce with psycopg:

import psycopg

with psycopg.connect() as con:
con.execute('create temp table jsontab (jsoncol json)')
con.execute(
'insert into jsontab (jsoncol) values (%s)',
[psycopg.types.json.Json('\0')],
)

with con.execute('select jsoncol from jsontab') as cur:
print(cur.fetchall())

try:
with con.execute('select jsoncol::jsonb from jsontab') as cur:
pass
raise AssertionError("jsonb should fail")
except psycopg.errors.UntranslatableCharacter:
pass

Another reason to prefer jsonb over json to reject such inputs right away.
The documentation states that json does not validate inputs in contrast to
jsonb.

Of course the OP now has to deal with json.  The data can be sanitized by
replacing all null character escape sequences:

update jsontab
set jsoncol = replace(jsoncol::text, '\u0000', '')::json
where strpos(jsoncol::text, '\u0000') > 0;

But the data access layer (or whatever got the json into the database) must be
fixed as well to reject or sanitize those inputs in the future.
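
A quick follow-up check, with the same (hypothetical) table and column names
as above, to confirm nothing is left behind:

select count(*)
from jsontab
where strpos(jsoncol::text, '\u0000') > 0;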

--
Erik




pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Dávid Suchan
Hello, I tried upgrading pg db from version 9.6 to 14 by using
pg_upgradecluster command. I freshly installed pg 14 -> ran pg_dropcluster
14 main --stop -> and then upgraded using pg_upgradecluster 9.6 main.
After it finished successfully, I checked the database: the size went from
originally 20 GB (in 9.6) to roughly 700 MB (in 14), while the available disk
space shrank by about 2 GB, meaning that the 20 GB of data is still there.
I tried the entire process twice (since I had created an AWS EC2 snapshot
for this) and the result was the same.
Is my solution to migrating old pg version to the new one wrong? Before
this I tried the same process with around 300mb of data and all of that
transferred successfully. If I did not understand the pg_upgradecluster
command, what would be the best practice when upgrading pg version with
huge amounts of data(could be a terabyte)?


Re: ERROR: unsupported Unicode escape sequence - in JSON-type column

2023-02-27 Thread Tom Lane
Erik Wienhold  writes:
>> On 27/02/2023 13:13 CET Laurenz Albe  wrote:
>> I'd be curious to know how the customer managed to do that.
>> Perhaps there is a loophole in PostgreSQL that needs to be fixed.

> Another reason to prefer jsonb over json to reject such inputs right away.
> The documentation states that json does not validate inputs in constrast to
> jsonb.

It's not that it doesn't validate, it's that the validation rules are
different.  Per the manual [1]:

RFC 7159 permits JSON strings to contain Unicode escape sequences
denoted by \uXXXX. In the input function for the json type, Unicode
escapes are allowed regardless of the database encoding, and are
checked only for syntactic correctness (that is, that four hex digits
follow \u). However, the input function for jsonb is stricter: it
disallows Unicode escapes for characters that cannot be represented in
the database encoding. The jsonb type also rejects \u0000 (because
that cannot be represented in PostgreSQL's text type), and it insists
that any use of Unicode surrogate pairs to designate characters
outside the Unicode Basic Multilingual Plane be correct.

You can certainly quibble with our decisions here, but I think they
are reasonably consistent.  json is for data that you'd like a syntax
check on (else you might as well store it as "text"), but no more than
a syntax check, because you're going to do the actual JSON processing
elsewhere and you don't want Postgres opining on what semi-standard
JSON constructs mean.  If you're actually going to process the data
inside the database, jsonb is a better choice.  The extra restrictions
in jsonb are to ensure that a string value represented in JSON can be
extracted into a valid string of our text datatype.

Storing data in json and then casting to jsonb on-the-fly seems like
about the worst possible combination of choices.
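
A two-line illustration of the difference (a sketch; the exact error wording
may vary by version):

SELECT '{"k": "a\u0000b"}'::json;   -- accepted: json only checks the syntax
SELECT '{"k": "a\u0000b"}'::jsonb;  -- fails: \u0000 cannot be converted to text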

regards, tom lane

[1] https://www.postgresql.org/docs/current/datatype-json.html




Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Adrian Klaver

On 2/27/23 07:44, Dávid Suchan wrote:
Hello, I tried upgrading pg db from version 9.6 to 14 by using 
pg_upgradecluster command. I freshly installed pg 14 -> ran 
pg_dropcluster 14 main --stop -> and then upgraded using 
pg_upgradecluster 9.6 main.
After a successful prompt finished, I checked the database and the size 
went from originally 20gb (in 9.6) to 700~ mb (in 14) while the disk 
space available shrank by about 2gb meaning that there is still the 20gb 
of data. I tried the entire process twice (since I had created an AWS 


1) How did you measure the size of the database clusters?

2) pg_upgrade will not remove the old cluster automatically so it not 
surprising the overall disk usage increased.


3) Did you see any messages at end of upgrade mentioning issues?

4) Have you connected to new cluster to see if everything is there?

5) Does the Postgres log provide any relevant information?


EC2 snapshot for this) and the result was the same.
Is my solution to migrating old pg version to the new one wrong? Before 
this I tried the same process with around 300mb of data and all of that 
transferred successfully. If I did not understand the pg_upgradecluster 
command, what would be the best practice when upgrading pg version with 
huge amounts of data(could be a terabyte)?


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Tom Lane
Adrian Klaver  writes:
> On 2/27/23 07:44, Dávid Suchan wrote:
>> After a successful prompt finished, I checked the database and the size 
>> went from originally 20gb (in 9.6) to 700~ mb (in 14) while the disk 
>> space available shrank by about 2gb meaning that there is still the 20gb 
>> of data. I tried the entire process twice (since I had created an AWS 

> 1) How did you measure the size of the database clusters?

If it was based on something like "du", perhaps the measurement
was fooled by the fact that most of the data files will be hard-linked
between the old and new clusters.
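
For comparison, a per-database size check done from inside the server,
independent of how the files are linked on disk:

SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database
ORDER BY pg_database_size(datname) DESC;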

regards, tom lane




Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Adrian Klaver

On 2/27/23 08:36, Tom Lane wrote:

Adrian Klaver  writes:

On 2/27/23 07:44, Dávid Suchan wrote:

After a successful prompt finished, I checked the database and the size
went from originally 20gb (in 9.6) to 700~ mb (in 14) while the disk
space available shrank by about 2gb meaning that there is still the 20gb
of data. I tried the entire process twice (since I had created an AWS



1) How did you measure the size of the database clusters?


If it was based on something like "du", perhaps the measurement
was fooled by the fact that most of the data files will be hard-linked
between the old and new clusters.


Does that happen without the --link option?



regards, tom lane


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Tom Lane
Adrian Klaver  writes:
> On 2/27/23 08:36, Tom Lane wrote:
>> If it was based on something like "du", perhaps the measurement
>> was fooled by the fact that most of the data files will be hard-linked
>> between the old and new clusters.

> Does that happen without the --link option?

No, but the OP didn't mention whether he used that, and even if he
didn't say it explicitly the pg_upgradecluster wrapper might've
supplied it.

regards, tom lane




Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Adrian Klaver

On 2/27/23 08:48, Tom Lane wrote:

Adrian Klaver  writes:

On 2/27/23 08:36, Tom Lane wrote:

If it was based on something like "du", perhaps the measurement
was fooled by the fact that most of the data files will be hard-linked
between the old and new clusters.



Does that happen without the --link option?


No, but the OP didn't mention whether he used that, and even if he
didn't say it explicitly the pg_upgradecluster wrapper might've
supplied it.


Alright the OP's command was:

pg_upgradecluster 9.6 main

pg_upgradecluster needs the --link(-k) option to use link instead of copy.



regards, tom lane


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Adrian Klaver

On 2/27/23 08:49, Dávid Suchan wrote:

Reply to list
Ccing list for real this time.


1) I used \l+ in psql and then counted rows- millions were missing


\l lists databases.

Are you saying there are millions of databases?

Otherwise, what rows were you counting?


3) nothing at all, everything was "success"
4) I did not, I presume it is there, the question is why only 700 mb was 
transferred


If you have not connected how could you do the \l and row count?


5) would it be inside main pg log? Or some special one?

On Mon, Feb 27, 2023, 17:14 Adrian Klaver > wrote:


On 2/27/23 07:44, Dávid Suchan wrote:
 > Hello, I tried upgrading pg db from version 9.6 to 14 by using
 > pg_upgradecluster command. I freshly installed pg 14 -> ran
 > pg_dropcluster 14 main --stop -> and then upgraded using
 > pg_upgradecluster 9.6 main.
 > After a successful prompt finished, I checked the database and
the size
 > went from originally 20gb (in 9.6) to 700~ mb (in 14) while the disk
 > space available shrank by about 2gb meaning that there is still
the 20gb
 > of data. I tried the entire process twice (since I had created an
AWS

1) How did you measure the size of the database clusters?

2) pg_upgrade will not remove the old cluster automatically so it not
surprising the overall disk usage increased.

3) Did you see any messages at end of upgrade mentioning issues?

4) Have you connected to new cluster to see if everything is there?

5) Does the Postgres log provide any relevant information?

 > EC2 snapshot for this) and the result was the same.
 > Is my solution to migrating old pg version to the new one wrong?
Before
 > this I tried the same process with around 300mb of data and all
of that
 > transferred successfully. If I did not understand the
pg_upgradecluster
 > command, what would be the best practice when upgrading pg
version with
 > huge amounts of data(could be a terabyte)?

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Adrian Klaver

On 2/27/23 09:05, Dávid Suchan wrote:

Please use Reply All
Ccing list


My bad,
\l+ lists databases and their respective sizes - I used that and also 
pg_size_pretty(); the result was the same - before, the biggest db was 20 GB, 
after, it was 700 MB.
I also counted rows before the upgrade in one of the biggest and most 
important tables that I was watching and comparing - before, there were 
hundreds of thousands or millions of rows (not sure about the exact 
number, just a lot), after the upgrade only about 15, but those first 15 
rows matched when I checked with a SELECT on that table.






--
Adrian Klaver
adrian.kla...@aklaver.com





Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Dávid Suchan
I did not use the -k/--link argument while upgrading, as I presume that does
not copy the data

On Mon, Feb 27, 2023, 18:10 Adrian Klaver 
wrote:

> On 2/27/23 09:05, Dávid Suchan wrote:
>
> Please use Reply All
> Ccing list
>
> > My bad,
> > \l+ lists databases and their respective sizes- I used that and also
> > pg_size_pretty(), the result size was the same - before it was 20gb for
> > the biggest db, after it was 700mb.
> > I counted rows before the upgrade in one of the biggest and most
> > important table that I was watching and comparing - before there were
> > hundreds of thousands or millions of rows(not sure about the exact
> > number, just a lot), after the upgrade only like 15, but the first 15
> > rows matched after the upgrade when I checked with select of that table.
> >
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Laurenz Albe
On Mon, 2023-02-27 at 18:18 +0100, Dávid Suchan wrote:
> I did not use the -k --link argument while upgrading as that I presume does 
> not copy the data 

It would be great if you shared the exact command line you used.

The man page of "pg_upgradecluster" says:

 -m, --method=dump|upgrade|link|clone
 Specify the upgrade method.  dump uses pg_dump(1) and pg_restore(1), 
upgrade uses pg_upgrade(1).  The default is dump.
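
For a cluster of that size, the pg_upgrade-based method is usually much faster
than the default dump; roughly (a sketch - the exact option spelling varies
between postgresql-common versions, so check your local man page, and the
--link option discussed earlier can be added where supported):

  pg_upgradecluster -m upgrade 9.6 main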

Yours,
Laurenz Albe




Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Adrian Klaver

On 2/27/23 09:10, Adrian Klaver wrote:

On 2/27/23 09:05, Dávid Suchan wrote:

Please use Reply All
Ccing list


My bad,
\l+ lists databases and their respective sizes- I used that and also 
pg_size_pretty(), the result size was the same - before it was 20gb 
for the biggest db, after it was 700mb.
I counted rows before the upgrade in one of the biggest and most 
important table that I was watching and comparing - before there were 
hundreds of thousands or millions of rows(not sure about the exact 
number, just a lot), after the upgrade only like 15, but the first 15 
rows matched after the upgrade when I checked with select of that table.


Hmm, I can't see how you got that state without there being some sort of 
error messages.


Just to be clear:

1) The 9.6 and 14 instances were installed from the same source?

2) Are you using tablespaces other than the default?


Scan the Postgres log for the 14 instance at /var/log/postgresql for 
error messages.




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Dávid Suchan
I used "pg_upgradecluster 9.6 main", all commands are in my first message:
I freshly installed pg 14 -> ran pg_dropcluster 14 main --stop -> and then
upgraded using pg_upgradecluster 9.6 main.

On Mon, Feb 27, 2023 at 18:40 Laurenz Albe  wrote:

> On Mon, 2023-02-27 at 18:18 +0100, Dávid Suchan wrote:
> > I did not use the -k --link argument while upgrading as that I presume
> does not copy the data
>
> It would be great if you shared the exact command line you used.
>
> The man page of "pg_upgradecluster" says:
>
>  -m, --method=dump|upgrade|link|clone
>  Specify the upgrade method.  dump uses pg_dump(1) and pg_restore(1),
> upgrade uses pg_upgrade(1).  The default is dump.
>
> Yours,
> Laurenz Albe
>


Re: pg_upgradecluster transfering only a portion of the data

2023-02-27 Thread Dávid Suchan
1) I downloaded both versions using apt-get install postgres
2) I will check the tablespace and log files tomorrow; I don't have access
to the workstation right now.

On Mon, Feb 27, 2023 at 18:44 Adrian Klaver  wrote:

> On 2/27/23 09:10, Adrian Klaver wrote:
> > On 2/27/23 09:05, Dávid Suchan wrote:
> >
> > Please use Reply All
> > Ccing list
> >
> >> My bad,
> >> \l+ lists databases and their respective sizes- I used that and also
> >> pg_size_pretty(), the result size was the same - before it was 20gb
> >> for the biggest db, after it was 700mb.
> >> I counted rows before the upgrade in one of the biggest and most
> >> important table that I was watching and comparing - before there were
> >> hundreds of thousands or millions of rows(not sure about the exact
> >> number, just a lot), after the upgrade only like 15, but the first 15
> >> rows matched after the upgrade when I checked with select of that table.
>
> Hmm, I can't see how you got that state without there being some sort of
> error messages.
>
> Just to be clear:
>
> 1) The 9.6 and 14 instances where installed from the same source?
>
> 2) Are you using tablespaces other then the default?
>
>
> Scan the Postgres log for the 14 instance at /var/log/postgresql for
> error messages.
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Temporary table is not getting created inside Function in postgres.

2023-02-27 Thread nikhil raj
Hi Team,

I have created the function below successfully, but while executing it, it
throws an error that the temp table doesn't exist.

However, when I execute the same statements outside the function (from the
**drop temp table** down to the final **insert ... select**), it works fine.

Can anyone please help me understand why I am not able to create the temp
table inside the function, and what the alternative is?


`-- FUNCTION: api.post_publish_Roster()

-- DROP FUNCTION IF EXISTS api."post_publish_Roster"();

CREATE OR REPLACE FUNCTION api."post_publish_Roster"(
)
RETURNS void
LANGUAGE 'sql'
COST 100
VOLATILE PARALLEL UNSAFE
AS $BODY$

DROP TABLE IF EXISTS ROSTER_TABLE;


CREATE TEMP TABLE ROSTER_TABLE AS
SELECT ROSTER_ID,
LINK_ID,
PAYNUMBER,
USERNAME,
LINE_POSITION,
CREWNAME,
WEEKNUMBER,
WEEKSTARTDATE,
WEEKENDDATE
FROM CREW_LINKS.LINKS_MAP
CROSS JOIN LATERAL GET_WEEKS('2023-02-12',

'2023-03-04') AS WEEKDATA
WHERE ROSTER_ID = 234
AND WEEKDATA.WEEKNUMBER in
(SELECT MIN(WEEKNUMBER)
FROM GET_WEEKS('2023-02-12',

'2023-03-04'));

DO $$
DECLARE
   weekstart INTEGER;
   weekend INTEGER ;
BEGIN
   select min(weeknumber) into weekstart  from  get_weeks('2023-02-12',
'2023-03-04');
   select max(weeknumber) into weekend  from  get_weeks('2023-02-12',
'2023-03-04') ;

   WHILE weekstart < weekend LOOP
  INSERT INTO roster_table
  SELECT roster_id, link_id, paynumber, username, line_position+1
AS line_position ,  crewname,rt.weeknumber+1 AS weeknumber
,w.weekstartdate,w.weekenddate
FROM roster_table rt
INNER JOIN
(select  * from  get_weeks('2023-02-12', '2023-03-04'))w
ON w.weeknumber=rt.weeknumber+1
WHERE rt.weeknumber=weekstart;

  update roster_table rw
  set line_position=(select min(line_position) from roster_table )
  where weeknumber=weekstart+1 and line_position =(select
MAX(line_position) from roster_table ) ;

  weekstart := weekstart + 1;
   END LOOP;
END $$;

WITH COMBIN AS
(SELECT R.DEPOT,
R.GRADE,
R.VALID_FROM,
R.VALID_TO,
RD.ROWNUMBER,
RD.SUNDAY,
RD.MONDAY,
RD.TUESDAY,
RD.WEDNESDAY,
RD.THURSDAY,
RD.FRIDAY,
RD.SATURDAY,
RD.TOT_DURATION
FROM CREW_ROSTER.ROSTER_NAME R
JOIN CREW_ROSTER.DRAFT RD ON R.R_ID = RD.R_ID
WHERE R.R_ID = 234),
div AS
(SELECT DEPOT,
GRADE,
VALID_FROM,
VALID_TO,
ROWNUMBER,
UNNEST('{sunday,
monday,
tuesday,
wednesday,
thursday,
friday,
saturday }'::text[]) AS COL,
UNNEST(ARRAY[ SUNDAY :: JSON,

MONDAY :: JSON,
TUESDAY :: JSON,
WEDNESDAY :: JSON,
THURSDAY :: JSON,
FRIDAY :: JSON,
SATURDAY:: JSON]) AS COL1
FROM COMBIN),
DAY AS
(SELECT date::date,
TRIM (BOTH TO_CHAR(date, 'day'))AS DAY
FROM GENERATE_SERIES(date '2023-02-12', date '2023-03-04',interval '1
day') AS T(date)), FINAL AS
(SELECT *
FROM div C
JOIN DAY D ON D.DAY = C.COL
ORDER BY date,ROWNUMBER ASC), TT1 AS
(SELECT ROWNUMBER,date,COL,
(C ->> 'dia_id') :: UUID AS DIA_ID,
(C ->> 'book_on') ::TIME AS BOOK_ON,
(C ->> 'turn_no') ::VARCHAR(20) AS TURN_NO,
(C ->> 'Turn_text') ::VARCHAR(20) AS TURN_TEXT,
(C ->> 'book_off') :: TIME AS BOOK_OFF,
(C ->> 'duration') ::interval AS DURATION
FROM FINAL,
JSON_ARRAY_ELEMENTS((COL1)) C),
T1 AS
(SELECT ROW_NUMBER() OVER (ORDER BY F.DATE,F.ROWNUMBER)AS R_NO,
F.DEPOT,
F.GRADE,
F.VALID_FROM,
F.VALID_TO,
F.ROWNUMBER,
F.COL,
F.COL1,
F.DATE,
F.DAY,
T.DIA_ID,
T.BOOK_ON,
T.TURN_NO,
T.TURN_TEXT,
T.BOOK_OFF,
T.DURATION
FROM TT1 T
FULL JOIN FINAL F ON T.ROWNUMBER = F.ROWNUMBER
AND T.DATE = F.DATE
AND T.COL = F.COL),
T2 AS
(SELECT *,
GENERATE_SERIES(WEEKSTARTDATE,

WEEKENDDATE, interval '1 day')::date AS D_DATE
FROM ROSTER_TABLE
ORDER BY D_DATE,
LINE_POSITION)
INSERT INTO CREW_ROSTER.PUBLISH_ROSTER
(PAYNUMBER,DEPOT,GRADE,R_ID,ROSTER_DATE,DAY,TURNNO,TURNNO_TEXT,BOOK_ON,BOOK_OFF,DURATION,DIAGRAM_ID,INSERTION_TIME)
SELECT PAYNUMBER,DEPOT,GRADE,ROSTER_ID, date, DAY,TURN_NO, TURN_TEXT,
BOOK_ON, BOOK_OFF, DURATION, DIA_ID,NOW()
FROM T1
INNER JOIN T2 ON T2.D_DATE = T1.DATE
AND T2.LINE_POSITION = T1.ROWNUMBER
ORDER BY D_DATE,
LINE_POSITION ASC$BODY$;

ALTER FUNCTION api."post_publish_Roster"()
OWNER TO postgres;

GRANT EXECUTE ON FUNCTION api."post_publish_Roster"() TO PUBLIC;

GRANT EXECUTE ON FUNCTION api."post_publish_Roster"() TO postgres;

GRANT EXECUTE ON FUNCTION api."post_publish_Roster"() TO readonlyrole;

GRANT EXECUTE ON FUNCTION api."post_publish_Roster"() TO readwriterole;

`
It throws this error

[image: pgAdmin.png]


pg_get_functiondef(), trailing spaces and + sign

2023-02-27 Thread Ron
Is there any direct way in Postgresql to get rid of the frankly anti-useful 
junk at the end of each line (which also infects pg_stat_activity.query), or 
must I resort to sed post-processing?



test=# select pg_get_functiondef(oid)
test-# from pg_proc
test-# where proname = 'foo';
 pg_get_functiondef

 CREATE OR REPLACE FUNCTION public.foo(bar integer)+
  RETURNS double precision +
  LANGUAGE plpgsql +
 AS $function$ +
 begin +
 return bar * 1.0; +
 end;  +
 $function$    +

(1 row)


--
Born in Arizona, moved to Babylonia.




Re: pg_get_functiondef(), trailing spaces and + sign

2023-02-27 Thread Adrian Klaver

On 2/27/23 11:34 AM, Ron wrote:
Is there any direct way in Postgresql to get rid of the frankly 
anti-useful junk at the end of each line (which also infects 
pg_stat_activity.query), or must I resort to sed post-processing?


\pset format unaligned
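
Combined with tuples-only mode that gives a clean, copy-pasteable definition;
a short sketch using the foo(integer) function from the example:

\pset format unaligned
\t on
SELECT pg_get_functiondef('public.foo(integer)'::regprocedure);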



test=# select pg_get_functiondef(oid)
test-# from pg_proc
test-# where proname = 'foo';
  pg_get_functiondef

  CREATE OR REPLACE FUNCTION public.foo(bar integer)+
   RETURNS double precision +
   LANGUAGE plpgsql +
  AS $function$ +
  begin +
  return bar * 1.0; +
  end;  +
  $function$    +

(1 row)





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: pg_get_functiondef(), trailing spaces and + sign

2023-02-27 Thread Adrian Klaver

On 2/27/23 11:34 AM, Ron wrote:
Is there any direct way in Postgresql to get rid of the frankly 
anti-useful junk at the end of each line (which also infects 
pg_stat_activity.query), or must I resort to sed post-processing?





Or

\ef foo


test=# select pg_get_functiondef(oid)
test-# from pg_proc
test-# where proname = 'foo';
  pg_get_functiondef

  CREATE OR REPLACE FUNCTION public.foo(bar integer)+
   RETURNS double precision +
   LANGUAGE plpgsql +
  AS $function$ +
  begin +
  return bar * 1.0; +
  end;  +
  $function$    +

(1 row)





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Temporary table is not getting created inside Function in postgres.

2023-02-27 Thread Adrian Klaver

On 2/27/23 11:10 AM, nikhil raj wrote:

HI Team,

This is the Function I have created successfully but while executing it 
throughs an error temp table doesn't exist.


But the same when I execute it not inside the function from **drop temp 
table to end insert select ** it work fine


Please can any one help me why in the function i am not able to create 
the temp table. what is alternative



You are running this in pgAdmin4 Query Tool, correct?

Is Autocommit set?

Have you tried this in psql?


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Temporary table is not getting created inside Function in postgres.

2023-02-27 Thread Adrian Klaver

On 2/27/23 11:52 AM, nikhil raj wrote:

Reply to list also
Ccing list.


Hi Adrian,

  Yes, I have tried it; it throws the same error.

Orion_db=> select api."post_publish_Roster"()
Orion_db -> ;
ERROR:  relation "roster_table" does not exist
LINE 94: ...           interval '1 day')::date as d_date FROM roster_tab...
                                                               ^
QUERY:

drop table if exists roster_table;
create temp table roster_table as
SELECT roster_id, link_id, paynumber, username, line_position, 
  crewname,weeknumber,weekstartdate,weekenddate

         FROM crew_links.links_map
         CROSS JOIN LATERAL get_weeks( '2023-02-12', '2023-03-04') AS 
weekdata
         WHERE roster_id=234 and weekdata.weeknumber in (select 
min(weeknumber) from  get_weeks('2023-02-12', '2023-03-04') );


         DO $$
DECLARE


I missed it the first time, you are embedding a DO inside the function.

1) Pretty sure that is not going to work. Especially as you are changing 
languages.


2) Why are you doing that?

3) Either incorporate everything into one function or create separate 
stand alone function for the DO portion and use that in the 
post_publish_Roster



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Temporary table is not getting created inside Function in postgres.

2023-02-27 Thread Tom Lane
nikhil raj  writes:
> This is the Function I have created successfully but while executing it
> throughs an error temp table doesn't exist.

You won't be able to do it like that in a SQL-language function, because
the whole function body is parsed and parse-analyzed in one go.  So the
later query referencing ROSTER_TABLE fails because it's parsed before
the CREATE TABLE executes.  (Improving that has been on the to-do list
for a couple of decades, so don't hold your breath...)  I suggest putting
the whole thing, not just part of it, into plpgsql.
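
A minimal sketch of the shape that does work (hypothetical names), with
everything inside one plpgsql body:

CREATE OR REPLACE FUNCTION demo_roster() RETURNS bigint
LANGUAGE plpgsql AS
$$
BEGIN
   DROP TABLE IF EXISTS roster_table;
   CREATE TEMP TABLE roster_table AS
      SELECT g AS roster_id FROM generate_series(1, 10) AS g;
   -- plpgsql plans each statement only when it is first executed,
   -- so referencing the just-created temp table here is fine
   RETURN (SELECT count(*) FROM roster_table);
END;
$$;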

regards, tom lane




Re: Temporary table is not getting created inside Function in postgres.

2023-02-27 Thread nikhil raj
Hi Tom,

The same query works properly when executed outside the function (just the
statements from the drop temp table through the insert), but when I keep the
query inside a function it throws me that error. Even though I only use the
select statement after the temp table creation, it still does not execute the
create statement, or does not keep the temp table.

On Tue, 28 Feb, 2023, 1:34 am Tom Lane,  wrote:

> nikhil raj  writes:
> > This is the Function I have created successfully but while executing it
> > throughs an error temp table doesn't exist.
>
> You won't be able to do it like that in a SQL-language function, because
> the whole function body is parsed and parse-analyzed in one go.  So the
> later query referencing ROSTER_TABLE fails because it's parsed before
> the CREATE TABLE executes.  (Improving that has been on the to-do list
> for a couple of decades, so don't hold your breath...)  I suggest putting
> the whole thing, not just part of it, into plpgsql.
>
> regards, tom lane
>


Re: Temporary table is not getting created inside Function in postgres.

2023-02-27 Thread David G. Johnston
On Mon, Feb 27, 2023 at 1:17 PM nikhil raj  wrote:

>
> The same query  is executed outside the function its working properly
> means just the qurey from drop temp table to insert but when i keep the
> query inside a function its through me the error an after temp ra how
> creation only i am using the select statment still its not exectuing the
> create  statment or not storing the temp table.
>
>>
>>
What Tom said is that in the following you should change 'sql' to plpgsql and
move on.  Your function is not capable of being executed as an SQL-language
function.

CREATE OR REPLACE FUNCTION api."post_publish_Roster"(
)
RETURNS void
LANGUAGE 'sql'
COST 100
VOLATILE PARALLEL UNSAFE
AS $BODY$

David J.


Re: pg_get_functiondef(), trailing spaces and + sign

2023-02-27 Thread Kirk Wolak
On Mon, Feb 27, 2023 at 2:40 PM Adrian Klaver 
wrote:

> On 2/27/23 11:34 AM, Ron wrote:
> > Is there any direct way in Postgresql to get rid of the frankly
> > anti-useful junk at the end of each line (which also infects
> > pg_stat_activity.query), or must I resort to sed post-processing?
> >
> >
>
> Or
>
> \ef foo
>

that will edit the function, I think he meant

\sf foo

which will show you the content of the function!



> > test=# select pg_get_functiondef(oid)
> > test-# from pg_proc
> > test-# where proname = 'foo';
> >   pg_get_functiondef
> > 
> >   CREATE OR REPLACE FUNCTION public.foo(bar integer)+
> >RETURNS double precision +
> >LANGUAGE plpgsql +
> >   AS $function$ +
> >   begin +
> >   return bar * 1.0; +
> >   end;  +
> >   $function$+
> >
> > (1 row)
> >
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>


DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread mirabilos
Hi,

I’ve got a… rather large query (see below), in which I join a complex
data structure (whose exact contents do not, at this point, matter)
together to get some auxiliary data to expose as JSON field.

In this query I can use, for example…

jsonb_build_object('user_permissions',
jsonb_agg(DISTINCT ap.name ORDER BY ap.name))

… to get a distinct, sorted, list of “user permissions” from a table
ap which I joined to the user table which is the main subject of the
query. (For some reason I need to add DISTINCT because else duplica‐
tes are shown.)

Wrapping this as…

jsonb_build_object('user_permissions',
COALESCE(
jsonb_agg(DISTINCT ap.name ORDER BY ap.name)
FILTER (WHERE ap.id IS NOT NULL)))

… gets me the JSON object’s value for the user_permissions set to
null if there’s nothing in the m:n intermediate table for the user
in question.

This works well. However, what I seem to be not allowed to do is
(without the extra COALESCE, to simplify):

jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)
ORDER BY cot.weekday, cot.from_hour, cot.to_hour)
)

This is because, when I use DISTINCT (but only then‽), the ORDER BY
arguments must be… arguments to the function, or something. In the
above case, the sort key is ap.name which is also the argument to
the jsonb_agg function, so no problem there, but here, the jsonb_agg
argument is the return value of a function so… it has no name.

What I’m looking for is something like…

jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour) AS jbo
ORDER BY jbo->>'weekday', jbo->>'from_hour', jbo->>'to_hour')
)

… except I cannot define aliases in that place. Any other syntax
would also work.

The suggested solution for this is apparently to do…

CREATE OR REPLACE FUNCTION core_openingtime_jsonb_sort(JSONB)
RETURNS JSONB AS $$
SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'from_hour', e->>'to_hour')
FROM jsonb_array_elements($1) AS e
$$ LANGUAGE SQL IMMUTABLE;

… and then query as…

jsonb_build_object('opening_times',
core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)))
)

… which involves internally subquerying for each output row (i.e.
row of the user table) times amount of sub-JSONArrays that need
to be sorted like this, which is currently 3.

All other solutions I can find involve subqueries in the first
place; I am somewhat proud I even managed to write this with
JOINs and without any subqueries in the first place so I’m
hesitant to go that route.

Any advice here?

And, somewhat related: when outputting JSONB, the order of
JSONObject elements is indeterminate, which is… okay, but
forcing sorted (ASCIIbetically, i.e. by codepoint order)
keys would be very very welcome, for reproducibility of
the output. (I’m sure the reproducible-builds project would
also love if this could be changed, or at least added, in a
way it can be enabled for queries, as session parameter
perhaps?)

The query in its current incarnanation is as follows:

-- -BEGIN SQL-
CREATE OR REPLACE FUNCTION core_openingtime_jsonb_sort(JSONB)
RETURNS JSONB AS $$
SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'from_hour', e->>'to_hour')
FROM jsonb_array_elements($1) AS e
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION core_generalworkavailability_jsonb_sort(JSONB)
RETURNS JSONB AS $$
SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'forenoon', e->>'afternoon', 
e->>'evening')
FROM jsonb_array_elements($1) AS e
$$ LANGUAGE SQL IMMUTABLE;

COPY (SELECT cp.email, cp.first_name, cp.last_name, cp.street, cp.number,
jsonb_build_object('groups', COALESCE(jsonb_agg(DISTINCT ag.name ORDER 
BY ag.name)
FILTER (WHERE ag.id IS NOT NULL))) ||
jsonb_build_object('help_operations', COALESCE(jsonb_agg(DISTINCT 
cho.name ORDER BY cho.name)
FILTER (WHERE cho.id IS NOT NULL))) ||
jsonb_build_object('emergency_opening_times', COALESCE(
core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object(
'weekday', ceot.weekday,
'from_hour', ceot.from_hour,
'to_hour', ceot.to_hour))
FILTER (WHERE ceot.id IS NOT NULL)))) ||
jsonb_build_object('opening_times', COALESCE(
core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour))
FILTER (WHERE cot.id IS NOT NULL)))) ||
jsonb_build_object('possible_work_times', COALESCE(
core_generalworkavailability_jsonb_sort(jsonb_agg(DISTINCT 
jsonb_build_object(
 

Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread Tom Lane
mirabilos  writes:
> This works well. However, what I seem to be not allowed to do is
> (without the extra COALESCE, to simplify):
> ...
> This is because, when I use DISTINCT (but only then‽), the ORDER BY
> arguments must be… arguments to the function, or something.

Well, yeah.  Simplify it to

  SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable;

If there are several rows containing the same value of x and different
values of y, which y value are we supposed to sort the unique-ified x
value by?  It's an ill-defined query.
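
A two-row illustration with made-up data:

  SELECT array_agg(DISTINCT x ORDER BY y)
  FROM (VALUES ('a', 1), ('a', 9), ('b', 5)) AS mytable(x, y);
  -- ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
  -- once the two 'a' rows are unified, there is no single y left to sort by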

For the specific example you give, it's true that any specific
possible output of jsonb_build_object() would correspond to
a unique set of cot.weekday, cot.from_hour, cot.to_hour values.
But the parser can't be expected to know that.  Many functions
can produce the same output for different sets of inputs.

I'd suggest moving the distinct-ification into an earlier
processing step (i.e. a sub-select), or maybe thinking harder
about why you're getting duplicates in the first place.

regards, tom lane




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread David G. Johnston
On Mon, Feb 27, 2023 at 4:11 PM mirabilos  wrote:

>
> jsonb_build_object('opening_times',
> jsonb_agg(DISTINCT jsonb_build_object(
> 'weekday', cot.weekday,
> 'from_hour', cot.from_hour,
> 'to_hour', cot.to_hour)
> ORDER BY cot.weekday, cot.from_hour, cot.to_hour)
> )
>
> This is because, when I use DISTINCT (but only then‽), the ORDER BY
> arguments must be… arguments to the function, or something.
>

So long as the function call itself is at least stable...:

DISTINCT func_call(...) ORDER BY func_call(...)

Order By is evaluated AFTER and over the contents of the distinct-ified
expression


> All other solutions I can find involve subqueries in the first
> place; I am somewhat proud I even managed to write this with
> JOINs and without any subqueries in the first place so I’m
> hesitant to go that route.
>

That pride seems misplaced.  Related to Tom's comment, the presence of the
DISTINCTs is telling you that what you did is not good.  DISTINCT is almost
always a code smell, and given the prevalence of direct table joins in your
query, it is indeed a valid signal.


Lastly, if you do need to care about normalizing the output of JSON you
should consider writing a function that takes arbitrary json input and
reformats it, rather than trying to build up json from scratch where every
individual component needs to be aware and take action.  i.e., get rid of
the ORDER BY also.  Maybe this belongs in an application layer with tooling
that already provides this capability.

David J.


Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread David G. Johnston
On Mon, Feb 27, 2023 at 5:22 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
> Lastly, if you do need to care about normalizing the output of JSON you
> should consider writing a function that takes arbitrary json input and
> reformats it, rather than trying to build up json from scratch where every
> individual component needs to be aware and take action.  i.e., get rid of
> the ORDER BY also.  Maybe this belongs in an application layer with tooling
> that already provides this capability.
>
>
Just to clarify/correct myself - if building up json arrays you'll just
remove the distinct and then do a normal aggregate order by based upon
whatever ordering the base data presents.  I was thinking you were doing
order by to get keys in order (i.e., jsonb_object_agg) but that isn't the
case here.

David J.


Re: ERROR: unsupported Unicode escape sequence - in JSON-type column

2023-02-27 Thread Jan Bilek
On 2/27/23 22:13, Laurenz Albe wrote:

On Mon, 2023-02-27 at 06:28 +, Jan Bilek wrote:


Our customer was able to sneak in an Unicode data into a column of a JSON Type 
and now that record fails on select.
Would you be able to suggest any way out of this? E.g. finding infringing row, 
updating its data ... ?



I'd be curious to know how the customer managed to do that.
Perhaps there is a loophole in PostgreSQL that needs to be fixed.

First, find the table that contains the column.
Then you can try something like

  DO
  $$DECLARE
 pkey bigint;
  BEGIN
 FOR pkey IN SELECT id FROM jsontab LOOP
BEGIN  -- starts block with exception handler
   PERFORM jsoncol -> 'creationDateTime'
   FROM jsontab
   WHERE id = pkey;
EXCEPTION
   WHEN untranslatable_character THEN
  RAISE NOTICE 'bad character in line with id = %', pkey;
END;
 END LOOP;
  END;$$;

Yours,
Laurenz Albe


Hi Laurenz,

Thank you, and yes, that's how we managed to get through it - one of our devs 
found a similar approach described here: 
https://stackoverflow.com/questions/31671634/handling-unicode-sequences-in-postgresql
 (see the null_if_invalid_string function there + credits to Hendrik) and we 
reapplied it. FYI, with a bit of tinkering we've been able to retrieve the 
following (corrupted) data:

[inline image of the corrupted record]

(It comes from a PROD system, so I don't have it in text form for you to 
experiment with.)

Anyway, your solution points in exactly same direction.

How did the customer manage to do that? Still no idea ... it looks like they 
restarted the TCP connection on our middleware, but any partial packets should 
be dropped for not matching the TCP length header. Also, records are 
deserialized on receive, so that would fail. Still, that record had to make it 
into PostgreSQL somehow. We are still looking.

Thanks & Cheers,
Jan

--
Jan Bilek - CTO at EFTlab Pty Ltd.


Re: ERROR: unsupported Unicode escape sequence - in JSON-type column

2023-02-27 Thread Jan Bilek
On 2/28/23 01:17, Erik Wienhold wrote:
>> On 27/02/2023 13:13 CET Laurenz Albe  wrote:
>>
>> I'd be curious to know how the customer managed to do that.
>> Perhaps there is a loophole in PostgreSQL that needs to be fixed.
> Probably via some data access layer and not directly via Postgres.  It's easy
> to reproduce with psycopg:
>
>   import psycopg
>
>   with psycopg.connect() as con:
>   con.execute('create temp table jsontab (jsoncol json)')
>   con.execute(
>   'insert into jsontab (jsoncol) values (%s)',
>   [psycopg.types.json.Json('\0')],
>   )
>
>   with con.execute('select jsoncol from jsontab') as cur:
>   print(cur.fetchall())
>
>   try:
>   with con.execute('select jsoncol::jsonb from jsontab') as cur:
>   pass
>   raise AssertionError("jsonb should fail")
>   except psycopg.errors.UntranslatableCharacter:
>   pass
>
> Another reason to prefer jsonb over json to reject such inputs right away.
> The documentation states that json does not validate inputs in constrast to
> jsonb.
>
> Of course the OP now has to deal with json.  The data can be sanitized by
> replacing all null character escape sequences:
>
>   update jsontab
>   set jsoncol = replace(jsoncol::text, '\u0000', '')::json
>   where strpos(jsoncol::text, '\u0000') > 0;
>
> But the data access layer (or whatever got the json into the database) must be
> fixed as well to reject or sanitize those inputs in the future.
>
> --
> Erik

Hi Erik,

No, it didn't go through any foreign data access layer - it went in 
straight through a PostgreSQL variable bind using a pre-cached insert 
statement, on PostgreSQL 14.5, connected over UNIX sockets.

A strange thing happened afterwards - locating that record was on & 
off - I couldn't pin-point it in the DB, as it seemed to be failing in 
multiple places ... until using that trick from Laurenz. It felt like 
PostgreSQL memory corruption, but the system remained stable without any 
complaints.

Thanks & Cheers,
Jan

-- 
Jan Bilek - CTO at EFTlab Pty Ltd.



Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread Thorsten Glaser
On Mon, 27 Feb 2023, Tom Lane wrote:

>Well, yeah.  Simplify it to
>
>  SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable;

That’s… a bit too simple for this case.

>If there are several rows containing the same value of x and different
>values of y, which y value are we supposed to sort the unique-ified x
>value by?  It's an ill-defined query.

The problem here is that I do not have an ‘x’.

>For the specific example you give, it's true that any specific
>possible output of jsonb_build_object() would correspond to
>a unique set of cot.weekday, cot.from_hour, cot.to_hour values.

Not necessarily (see directly below), but why would that matter?
It should sort the generated JSON objects within the array.

>, or maybe thinking harder
>about why you're getting duplicates in the first place.

The application developer informed me that it’s entirely possible
that some user entered matching information twice. I don’t have
the exact query that produced duplicates easily in the history
(the downside of working with \i) and tests on other users didn’t
produce duplicates.

So, yes, filtering them out is indeed part of the task here.


On Mon, 27 Feb 2023, David G. Johnston wrote:

>So long as the function call itself is at least stable...:
>
>DISTINCT func_call(...) ORDER BY func_call(...)

aieee really?

(I’d hope jsonb_build_object to be.)

Is that better or worse than using the extra functions to sort…?

>Order By is evaluated AFTER and over the contents of the distinct-ified
>expression

That’s right and good, but the problem is that I do not seem to
have a syntax with which to refer to the distinct-ified expression
to use in the ORDER BY clause.

>> place; I am somewhat proud I even managed to write this with
>> JOINs and without any subqueries in the first place so I’m
>> hesitant to go that route.
>
>That pride seems misplaced.  Related to Tom's comment, the presence of the
>DISTINCTs is telling you that what you did is not good.  DISTINCT is almost
>always a code smell

Not about the DISTINCTs. I haven’t used JOIN much (usually WHERE
as inner join) nor on tables this massive, and this is my second
foray into aggregate functions only.

>Lastly, if you do need to care about normalizing the output of JSON you
>should consider writing a function that takes arbitrary json input and
>reformats it, rather than trying to build up json from scratch where every
>individual component needs to be aware and take action.  i.e., get rid of
>the ORDER BY also.  Maybe this belongs in an application layer with tooling
>that already provides this capability.

Funnily enough, both here and in the other place where I tried to
use JSON output, PostgreSQL (with COPY TO STDOUT) *is* the application
layer. Here I’m generating a CSV file; in the other situation I was
going to put the generated JSON directly into an HTTP result filehandle.

In the latter, I dropped that approach, output CSV and converted that
(by replacing newlines with “],[” and prepending “[[” and appending
“]]”) to JSON myself, which worked there as it was all-numeric. But
the frustration there was about unnecessary whitespace instead.

Both cases have in common that a, possibly huge, result set can be
directly streamed from PostgreSQL to the consumer, but the former
lacks just that tiny bit of functionality that would make it really
rock :/

I was asking here because perhaps either that missing functionality
can be considered, or to find out if there’s better ways to produce
that output, due to my inexperience with SQL. The ways I’m using do
work, and I’m relatively happy, but…

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread Tom Lane
Thorsten Glaser  writes:
> On Mon, 27 Feb 2023, Tom Lane wrote:
>> Well, yeah.  Simplify it to
>> SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable;

> That’s… a bit too simple for this case.

Sure, I was just trying to explain the rule.

>> For the specific example you give, it's true that any specific
>> possible output of jsonb_build_object() would correspond to
>> a unique set of cot.weekday, cot.from_hour, cot.to_hour values.

> Not necessarily (see directly below), but why would that matter?
> It should sort the generated JSON objects within the array.

Well, that may be what you want, but it's not what you wrote in
the query.  Follow David's advice and do

jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)
  ORDER BY jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour))

I'm pretty sure that this will only incur one evaluation of the
common subexpression, so even though it's tedious to type it's not
inefficient.

regards, tom lane




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

2023-02-27 Thread David G. Johnston
On Mon, Feb 27, 2023 at 6:22 PM Thorsten Glaser  wrote:

>
> Not about the DISTINCTs. I haven’t used JOIN much (usually WHERE
> as inner join) nor on tables this massive, and this is my second
> foray into aggregate functions only.
>
>
Fair.

Consider this then as a jumping point to a more precise query form:

https://dbfiddle.uk/wz2MKtuF

create table base (base_id integer primary key, base_val text not null);
insert into base values (1, 'one');

create table subone (subone_id serial primary key, base_id integer,
subone_value text not null);
insert into subone (base_id, subone_value) values (1, 'subone-one'), (1,
'subone-two');

create table subtwo (subtwo_id serial primary key, base_id integer,
subtwo_value text not null);
insert into subtwo (base_id, subtwo_value) values (1, 'subtwo-one'), (1,
'subtwo-two');

--BAD cross joining going on with multiple one-to-many relationships
select * from base join subone using (base_id) join subtwo using (base_id);
-- not good

--GOOD, only joining one-to-one relationships

select jsonb_build_object('base_id', base_id, 'subone_arr', subone_arr,
'subtwo_arr', subtwo_arr)
from base
join lateral (select json_agg(jsonb_build_object('key', subone_value) order
by subone_value)
   from subone where subone.base_id = base.base_id) as so
(subone_arr) on true
join (select base_id, json_agg(subtwo_value order by subtwo_value desc)
   from subtwo group by base_id) as st (base_id, subtwo_arr)
using (base_id)

I used a mix of forms in the two joins, and there are other variants, but
the basic concept holds - produce single rows in subqueries then join those
various single rows together to produce your desired json output.

David J.


Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-02-27 Thread Thorsten Glaser
On Mon, 27 Feb 2023, David G. Johnston wrote:

>Consider this then as a jumping point to a more precise query form:
[…]
>the basic concept holds - produce single rows in subqueries then join those
>various single rows together to produce your desired json output.

Ouch. I’ll have to read up and experiment with that, I guess.
But wouldn’t I rather do a CTE for each referenced table
that does the aggregating and GROUP BY person_id, then join that
(which has single rows for each user row) to users?

>--BAD cross joining going on with multiple one-to-many relationships
>--GOOD, only joining one-to-one relationships

Not quite. This is many-to-many created by Django…


On Mon, 27 Feb 2023, Tom Lane wrote:

>Sure, I was just trying to explain the rule.

Aaah, okay. Sorry, I misunderstood that, and the… general direction
of the replies, then.

>Well, that may be what you want, but it's not what you wrote in
>the query.  Follow David's advice and do
[…]
>I'm pretty sure that this will only incur one evaluation of the
>common subexpression, so even though it's tedious to type it's not
>inefficient.

Thanks. But I fear it’s not as simple as you wrote. More like:

jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)
ORDER BY
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'weekday',
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'from_hour',
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'to_hour')
)

Isn’t that more like it?

(Though I guess at that point I could just drop at least
the to_hour fallback sort, hoping nobody inserts overlapping
hours with identical start times…)

Now that I see that, it sort of re-raises a question I had
during developing this but didn’t pursue.

How about, instead of creating a JSONB object here, I create
a (not-JSON) ARRAY['weekday', cot.weekday, 'from_hour',
cot.from_hour, 'to_hour', cot.to_hour] in the aggregate function.
The array is something I could naturally use to sort (its elements
already are in sort key order), and its elements *also* are in
the order jsonb_build_object expects its arguments (assuming I can
do something like passing the array to it instead of “spreading”
the arguments). Pseudocode:

… hmm, not that easy. The aggregate would return e.g. this…

ARRAY[['weekday',1,…],['weekday',2,…]]

… and there’s no array_map function that could be used to pass
each inner array, one by one, to jsonb_build_object; converting
the outer array to JSON gets us json_array_elements (same problem),
other ways I can think of also don’t seem to have matching functions
(jq has… interesting ones).

As I write this, I fear that won’t fly because nōn-JSON arrays
cannot contain mixed types (text and numbers) in PostgreSQL… at
least I ran into that at some point in the past vaguely remembering…

Oh well,
//mirabilos
-- 
As long as you don't pull any dirty tricks, and I mean *really* dirty
tricks, like XORing both pointers of a doubly linked list and storing
them in a single word, Boehm works absolutely
splendidly.   -- Andreas Bogk on boehm-gc in d.a.s.r