Re: Help needed with XMLTABLE

2020-04-15 Thread Pavel Stehule
On Wed, 15 Apr 2020 at 7:32, Dilip Kumar wrote:

> One of our customers tried to use XMLTABLE syntax without
> row_expression, which works fine with ORACLE but doesn't work with
> PostgreSQL.  So can anyone suggest what alternative we can use?
>
> CREATE TABLE user_pool_clean (
> fk_user_pool_pk bytea NOT NULL,
> user_id character varying(255) NOT NULL,
> email_address character varying(250),
> is_mil numeric,
> is_civ numeric,
> is_ctr numeric,
> is_gov numeric,
> is_edu numeric,
> role_id character varying(50),
> user_profile_id character varying(50),
> service_branch_id character varying(50),
> mil_pay_grade_id character varying(50),
> my_auds character varying(4000),
> my_orgs character varying(4000),
> processed character(1) DEFAULT 'N'::bpchar NOT NULL
> );
>
> insert into user_pool_clean
> values('995CECDC1881375DE05312A270C7CF56','10015706','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','4','76','3','1706882','1707720','Y');
>
> insert into user_pool_clean
> values('995CECDC1905375DE05312A270C7CF56','10015848','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','3','700015','11','
> 1705562,1708486','1710621','Y');
>
>
> SQL> SELECT upc.is_mil,TRIM(column_value) src
>  FROM   user_pool_clean upc
>,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));
>
> IS_MIL SRC
> ------ ---------
>      0 1705562    <-- O/P from the Oracle database
>      0 1708486
>      0 1706882
>
> postgres[7604]=# SELECT upc.is_mil,TRIM(column_value) src
> postgres-#  FROM   user_pool_clean upc
> postgres-#,xmltable(('"'|| REPLACE( upc.my_auds, ',',
> '","')|| '"'));
> ERROR:  syntax error at or near ")"
> LINE 3: ... ,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));
>
>
this example is a classic - I never see ugly code and dirty data anywhere
other than on Oracle

SELECT upc.is_mil,
       TRIM(column_value) src
  FROM user_pool_clean upc,
       unnest(string_to_array(replace(my_auds, e'\n', ''), ',')) column_value;
┌────────┬─────────┐
│ is_mil │   src   │
╞════════╪═════════╡
│      0 │ 1706882 │
│      0 │ 1705562 │
│      0 │ 1708486 │
└────────┴─────────┘
(3 rows)

SELECT upc.is_mil,
       TRIM(column_value) src
  FROM user_pool_clean upc,
       regexp_split_to_table(replace(my_auds, e'\n', ''), ',') column_value;
┌────────┬─────────┐
│ is_mil │   src   │
╞════════╪═════════╡
│      0 │ 1706882 │
│      0 │ 1705562 │
│      0 │ 1708486 │
└────────┴─────────┘
(3 rows)
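
If one wants to keep the query shaped like the original XMLTABLE call, something
along these lines should also work on Postgres 10 and later - a sketch only; the
wrapper element names r and v are made up, and the inner replace() strips the
stray newline from the data:

SELECT upc.is_mil, TRIM(x.src) AS src
  FROM user_pool_clean upc,
       XMLTABLE('/r/v'
                PASSING ('<r><v>' || replace(replace(upc.my_auds, e'\n', ''), ',', '</v><v>') || '</v></r>')::xml
                COLUMNS src text PATH '.') AS x;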



Regards

Pavel


> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>
>
>


Re: Help needed with XMLTABLE

2020-04-15 Thread Pavel Stehule
On Wed, 15 Apr 2020 at 7:32, Dilip Kumar wrote:

> One of our customers tried to use XMLTABLE syntax without
> row_expression, which works fine with ORACLE but doesn't work with
> PostgreSQL.  So can anyone suggest what alternative we can use?
>
> CREATE TABLE user_pool_clean (
> fk_user_pool_pk bytea NOT NULL,
> user_id character varying(255) NOT NULL,
> email_address character varying(250),
> is_mil numeric,
> is_civ numeric,
> is_ctr numeric,
> is_gov numeric,
> is_edu numeric,
> role_id character varying(50),
> user_profile_id character varying(50),
> service_branch_id character varying(50),
> mil_pay_grade_id character varying(50),
> my_auds character varying(4000),
> my_orgs character varying(4000),
> processed character(1) DEFAULT 'N'::bpchar NOT NULL
> );
>
> insert into user_pool_clean
> values('995CECDC1881375DE05312A270C7CF56','10015706','
> noem...@fter.dbrefresh.mil
> ',0,0,0,0,0,'1','4','76','3','1706882','1707720','Y');
>
> insert into user_pool_clean
> values('995CECDC1905375DE05312A270C7CF56','10015848','
> noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','3','700015','11','
> 1705562,1708486','1710621','Y');
>
>
> SQL> SELECT upc.is_mil,TRIM(column_value) src
>  FROM   user_pool_clean upc
>,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));
>

It's remarkable how creative people can be - when xmltable has only one
argument, it is evaluated as an XQuery expression.

https://stewashton.wordpress.com/2016/08/01/splitting-strings-surprise/

Unfortunately, Postgres does not support XQuery, so nothing similar is
possible. Fortunately, Postgres has much better tools for parsing strings.
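
For example, a minimal sketch of the string-splitting alternative, assuming the
same user_pool_clean.my_auds column as in the test case above:

SELECT upc.is_mil,
       TRIM(val) AS src
  FROM user_pool_clean upc,
       unnest(string_to_array(replace(upc.my_auds, e'\n', ''), ',')) AS val;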

Regards

Pavel


> IS_MIL SRC
> ------ ---------
>      0 1705562    <-- O/P from the Oracle database
>      0 1708486
>      0 1706882
>
> postgres[7604]=# SELECT upc.is_mil,TRIM(column_value) src
> postgres-#  FROM   user_pool_clean upc
> postgres-#,xmltable(('"'|| REPLACE( upc.my_auds, ',',
> '","')|| '"'));
> ERROR:  syntax error at or near ")"
> LINE 3: ... ,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>
>
>


Re: Help needed with XMLTABLE

2020-04-15 Thread Dilip Kumar
On Wed, Apr 15, 2020 at 12:56 PM Pavel Stehule  wrote:
> On Wed, 15 Apr 2020 at 7:32, Dilip Kumar wrote:
>>
>> One of our customers tried to use XMLTABLE syntax without
>> row_expression, which works fine with ORACLE but doesn't work with
>> PostgreSQL.  So can anyone suggest what alternative we can use?
>>
>> CREATE TABLE user_pool_clean (
>> fk_user_pool_pk bytea NOT NULL,
>> user_id character varying(255) NOT NULL,
>> email_address character varying(250),
>> is_mil numeric,
>> is_civ numeric,
>> is_ctr numeric,
>> is_gov numeric,
>> is_edu numeric,
>> role_id character varying(50),
>> user_profile_id character varying(50),
>> service_branch_id character varying(50),
>> mil_pay_grade_id character varying(50),
>> my_auds character varying(4000),
>> my_orgs character varying(4000),
>> processed character(1) DEFAULT 'N'::bpchar NOT NULL
>> );
>>
>> insert into user_pool_clean
>> values('995CECDC1881375DE05312A270C7CF56','10015706','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','4','76','3','1706882','1707720','Y');
>>
>> insert into user_pool_clean
>> values('995CECDC1905375DE05312A270C7CF56','10015848','noem...@fter.dbrefresh.mil',0,0,0,0,0,'1','3','700015','11','
>> 1705562,1708486','1710621','Y');
>>
>>
>> SQL> SELECT upc.is_mil,TRIM(column_value) src
>>  FROM   user_pool_clean upc
>>,xmltable(('"'|| REPLACE( upc.my_auds, ',', '","')|| '"'));
>
>
> It's remarkable how creative people can be - when xmltable has only one
> argument, it is evaluated as an XQuery expression.
>
> https://stewashton.wordpress.com/2016/08/01/splitting-strings-surprise/
>
> Unfortunately, Postgres does not support XQuery, so nothing similar is
> possible. Fortunately, Postgres has much better tools for parsing strings.

Thanks for the help, Pavel.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




pg_restore: could not close data file: Success

2020-04-15 Thread Peter J. Holzer
I'm trying to restore a backup on a different machine and it terminates
with the not really helpful messages:

pg_restore: [directory archiver] could not close data file: Success
pg_restore: [parallel archiver] a worker process died unexpectedly

The backup was made with

pg_dump --compress=5 -v -Fd -f "$dirname" -j 4 $db

(so it's in directory format)

The restore command was

pg_restore -c --if-exists -d $db -j 4 -v  $dirname

(I would use -C, but due to suboptimal partitioning I have to use a
different tablespace, so I need to create $db before the restore)

Both machines are running Ubuntu 18.04 and PostgreSQL is version 11.7
from the pgdg repo.

The error happens while restoring the data for the tables.

My guess is that maybe one of the data files is damaged ("Success"
probably means that errno is 0, so it wasn't a system call that failed,
but something in the application). Does that sound plausible or should I
look somewhere else? A web search returned nothing relevant.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: pg_restore: could not close data file: Success

2020-04-15 Thread Peter J. Holzer
On 2020-04-15 12:01:46 +0200, Peter J. Holzer wrote:
> I'm trying to restore a backup on a different machine and it terminates
> with the not really helpful messages:
> 
> pg_restore: [directory archiver] could not close data file: Success
> pg_restore: [parallel archiver] a worker process died unexpectedly
[...]
> My guess is that maybe one of the data files is damaged

As is often the case, the matter became obvious a few minutes after
writing the mail.

There were indeed two files with length 0 in the dump. That happened
because the backup had failed when it couldn't obtain a lock on a table.

A nicer error message (something like "cannot decompress '13503.dat.gz':
Empty file") would have helped.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: pg_basebackup && long time storage of wal_archive/ content

2020-04-15 Thread Adrian Klaver

On 4/14/20 11:16 PM, Matthias Apitz wrote:

On Tuesday, April 14, 2020 at 08:28:35 -0700, Adrian Klaver wrote:


On 4/14/20 8:00 AM, Matthias Apitz wrote:


Hello,

Running the following cmd (as user 'postgres') on the server:

pg_basebackup -U ${DBSUSER} -Ft -z -D 
/data/postgresql11/backup-wal_archive-20200414-1/

correctly produces the following situation on the server:

srap32dxr1:/data/postgresql11 # ls -ltrh backup-20200414-1
total 1.7G
-rw--- 1 postgres postgres 1.7G Apr 14 16:20 base.tar.gz
-rw--- 1 postgres postgres  17K Apr 14 16:20 pg_wal.tar.gz
srap32dxr1:/data/postgresql11 # ls -ltrh data/pg_wal
total 81M
-rw--- 1 postgres postgres 16M Apr 14 15:11 0001000C009F
-rw--- 1 postgres postgres 16M Apr 14 15:20 0001000C00A0
-rw--- 1 postgres postgres 16M Apr 14 16:10 0001000C00A1
-rw--- 1 postgres postgres 16M Apr 14 16:20 0001000C00A2
-rw--- 1 postgres postgres 342 Apr 14 16:20 
0001000C009D.0028.backup
drwx-- 2 postgres postgres 36K Apr 14 16:25 archive_status
-rw--- 1 postgres postgres 16M Apr 14 16:50 0001000C009E
srap32dxr1:/data/postgresql11 # ls -ltrh wal_archive/
total 49M
-rw--- 1 postgres postgres 16M Apr 14 15:20 0001000C009B
-rw--- 1 postgres postgres 16M Apr 14 16:10 0001000C009C
-rw--- 1 postgres postgres 16M Apr 14 16:20 0001000C009D
-rw--- 1 postgres postgres 342 Apr 14 16:20 
0001000C009D.0028.backup

The content of the dir wal_archive/ keeps growing over time.

The pg_wal.tar.gz contains:

srap32dxr1:/data/postgresql11 # tar tzf backup-20200414-1/pg_wal.tar.gz
0001000C009D

i.e. the files 0001000C009B ... 0001000C009D
could be removed from wal_archive/

Correct? Or what is the recommended long-term handling of this dir?


pg_basebackup is a point in time backup tool.


Adrian,

Thanks for the answer. What do you mean by 'point in time backup tool'?


That pg_basebackup captures the state of the cluster at some point in 
time. As you have found out keeping up with the current state of the 
cluster requires further work. Laurenz provided one built in tool for 
doing that. The other tools I posted earlier also provide means of 
keeping backups consistent.



And, coming back to my question, can older WAL in wal_archive/ be
removed without any risks if the daily

pg_wal.tar.gz
base.tar.gz

are moved to a safe place?


If you want something that
continuously archives and prunes as it goes then you probably want to look
at the tools below:

https://pgbackrest.org/

https://www.pgbarman.org/

https://postgrespro.github.io/pg_probackup/



Thanks, I will have a look at the tools.

matthias




--
Adrian Klaver
adrian.kla...@aklaver.com




GENERATED STORED columns and table rewrites?

2020-04-15 Thread Maciek Sakrejda
Hello,

Does adding a GENERATED STORED column need to rewrite the table (like
adding a column with a DEFAULT before 11)? Neither the ALTER TABLE docs [1]
nor the generated column docs [2] discuss this. The former has a very nice
Tip regarding DEFAULT columns--maybe we should clarify GENERATED STORED
behavior either way? It seems like a very similar issue. Experimenting with
this, \timing suggests that there is work proportional to the size of the
table, but I'd like to understand the behavior better (and other users
probably would, too).

Thanks,
Maciek

[1]:
https://www.postgresql.org/docs/current/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN
[2]: https://www.postgresql.org/docs/current/ddl-generated-columns.html


timestamp and timestamptz

2020-04-15 Thread Niels Jespersen
Hello all

We have some data that was loaded from a csv into a timestamp column. The data in 
the csv are in utc. We want to access the data in our native timezone (CET).

I am considering a few alternatives:


1.   Early in the process, convert to timestamptz and keep this datatype.

2.   Early in the process, convert to timestamp as understood in CET.  This 
will imply by convention that the data in the timestamp column represents CET. 
Users will need to be told that data represents CET, even if data is somewhere 
in the future kept in another country in another timezone.

I probably should choose 1 over 2. But I am a bit hesitant, probably because we 
almost never have used timestamptz.

Can we agree that the below query is selecting both the original utc timestamps 
and 2 and 1 (as described above)?

set timezone to 'cet';
select read_time read_time_utc, (read_time at time zone 'utc')::timestamp 
read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t 
limit 10;

We are on Postgres 12.

Regards Niels Jespersen


Re: timestamp and timestamptz

2020-04-15 Thread Magnus Hagander
On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen  wrote:

> Hello all
>
>
>
> We have some data that was loaded from a csv into a timestamp column. The
> data in the csv are in utc. We want to access the data in our native
> timezone (CET).
>
>
>
> I am considering a few alternatives:
>
>
>
> 1.   Early in the process, convert to timestamptz and keep this
> datatype.
>
> 2.   Early in the process, convert to timestamp as understood in
> CET.  This will imply by convention that the data in the timestamp column
> represents CET. Users will need to be told that data represents CET, even
> if data is somewhere in the future kept in another country in another
> timezone.
>
>
>
> I probably should choose 1 over 2. But I am a bit hesitant, probably
> because we almost never have used timestamptz.
>

Yes, you should.



> Can we agree that the below query is selecting both the original utc
> timestamps and 2 and 1 (as described above)?
>
>
>
> set timezone to 'cet';
>
> select read_time read_time_utc, (read_time at time zone 'utc')::timestamp
> read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz
> from t limit 10;
>
>
>
As long as you use option 1:

SELECT read_time
will return the time in CET (as a timestamptz) after you've set timezone to
'cet'. If you set timezone to 'utc' it will directly return utc.

SELECT read_time AT TIME ZONE 'utc'
will return the time in UTC (as a timestamp)


And just make sure you have done a "set time zone 'utc'" before you *load*
the data, and everything should just work automatically.
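
A minimal sketch of that load path (the table and file names here are made up):

SET timezone TO 'utc';
CREATE TABLE t (read_time timestamptz);
-- naive timestamps in the file are interpreted as UTC on input
COPY t (read_time) FROM '/tmp/readings.csv' WITH (FORMAT csv);

SET timezone TO 'cet';
SELECT read_time FROM t;  -- the same instants, now displayed in CET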

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: timestamp and timestamptz

2020-04-15 Thread Steve Crawford
What is the exact format of the timestamp in the CSV? As long as it is in a
"fully qualified" format, i.e. includes the time-zone offset, then you will
have no problem as the data represents a point in time.

It is easier to conceptualize "time stamp with time zone" (timestamptz) as
actually representing a "point in time". If your client is set to CET then
the data will display as desired.

-Steve

On Wed, Apr 15, 2020 at 10:50 AM Niels Jespersen  wrote:

> Hello all
>
>
>
> We have some data that was loaded from a csv into a timestamp column. The
> data in the csv are in utc. We want to access the data in our native
> timezone (CET).
>
>
>
> I am considering a few alternatives:
>
>
>
> 1.   Early in the process, convert to timestamptz and keep this
> datatype.
>
> 2.   Early in the process, convert to timestamp as understood in
> CET.  This will imply by convention that the data in the timestamp column
> represents CET. Users will need to be told that data represents CET, even
> if data is somewhere in the future kept in another country in another
> timezone.
>
>
>
> I probably should choose 1 over 2. But I am a bit hesitant, probably
> because we almost never have used timestamptz.
>
>
>
> Can we agree that the below query is selecting both the original utc
> timestamps and 2 and 1 (as described above)?
>
>
>
> set timezone to 'cet';
>
> select read_time read_time_utc, (read_time at time zone 'utc')::timestamp
> read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz
> from t limit 10;
>
>
>
> We are on Postgres 12.
>
>
>
> Regards Niels Jespersen
>


Re: timestamp and timestamptz

2020-04-15 Thread Steve Crawford
On Wed, Apr 15, 2020 at 11:06 AM Steve Crawford <
scrawf...@pinpointresearch.com> wrote:

> What is the exact format of the timestamp in the CSV? As long as it is in
> a "fully qualified" format, i.e. includes the time-zone offset, then you
> will have no problem as the data represents a point in time.
>
> It is easier to conceptualize "time stamp with time zone" (timestamptz) as
> actually representing a "point in time". If your client is set to CET then
> the data will display as desired.
>
> -Steve
>
> On Wed, Apr 15, 2020 at 10:50 AM Niels Jespersen  wrote:
>
>> Hello all
>>
>>
>>
>> We have some data that was loaded from a csv into a timestamp column. The
>> data in the csv are in utc. We want to access the data in our native
>> timezone (CET).
>>
>>
>>
>> I am considering a few alternatives:
>>
>>
>>
>> 1.   Early in the process, convert to timestamptz and keep this
>> datatype.
>>
>> 2.   Early in the process, convert to timestamp as understood in
>> CET.  This will imply by convention that the data in the timestamp column
>> represents CET. Users will need to be told that data represents CET, even
>> if data is somewhere in the future kept in another country in another
>> timezone.
>>
>>
>>
>> I probably should choose 1 over 2. But I am a bit hesitant, probably
>> because we almost never have used timestamptz.
>>
>>
>>
>> Can we agree that the below query is selecting both the original utc
>> timestamps and 2 and 1 (as described above)?
>>
>>
>>
>> set timezone to 'cet';
>>
>> select read_time read_time_utc, (read_time at time zone 'utc')::timestamp
>> read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz
>> from t limit 10;
>>
>>
>>
>> We are on Postgres 12.
>>
>>
>>
>> Regards Niels Jespersen
>>
>
Oops - sorry for top-post. Darn GMail.

-Steve


Re: GENERATED STORED columns and table rewrites?

2020-04-15 Thread Michael Lewis
Yes, the system will do a full table rewrite to compute the value and store
it. Unfortunately, I believe it is an access exclusive lock during that
entire time.
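
One way to see the rewrite for yourself is to compare pg_class.relfilenode
before and after the ALTER - a sketch with a throwaway table:

CREATE TABLE demo (a int);
INSERT INTO demo SELECT generate_series(1, 100000);

SELECT relfilenode FROM pg_class WHERE relname = 'demo';  -- note the value

ALTER TABLE demo ADD COLUMN b int GENERATED ALWAYS AS (a * 2) STORED;

-- a different relfilenode here means the table was rewritten
SELECT relfilenode FROM pg_class WHERE relname = 'demo';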


Re: timestamp and timestamptz

2020-04-15 Thread Tim Cross


Niels Jespersen  writes:

> Hello all
>
>  
>
> We have some data that was loaded from a csv into a timestamp column. The data 
> in the csv are in utc. We want to access the data in
> our native timezone (CET). 
>
>  
>
> I am considering a few alternatives: 
>
>  
>
> 1.   Early in the process, convert to timestamptz and keep this datatype. 
>
> 2.   Early in the process, convert to timestamp as understood in CET.  
> This will imply by convention that the data in the timestamp
> column represents CET. Users will need to be told that data represents CET, 
> even if data is somewhere in the future kept in another
> country in another timezone. 
>
>  
>
> I probably should choose 1 over 2. But I am a bit hesitant, probably because 
> we almost never have used timestamptz. 
>
>  
>
> Can we agree that the below query is selecting both the original utc 
> timestamps and 2 and 1 (as described above)?
>
>  
>
> set timezone to 'cet';
>
> select read_time read_time_utc, (read_time at time zone 'utc')::timestamp 
> read_time_cet, (read_time at time zone 'utc')::timestamptz
> read_time_tz from t limit 10;
>
>  
>
> We are on Postgres 12. 
>
>  
>

Keep your life simple - just go with option 1. Keep all timestamps in
UTC and let clients deal with it in whatever way they need to. This will
also help deal with issues associated with daylight savings time (DST
can be a real pain as different locations have it and others don't and
the 'switchover' date is subject to political whims and can change).
Your option 2 will cause all sorts of issues and keep in mind that most
larger countries have multiple timezones, so even if your data is all
associated with a single country, you can potentially have multiple
conversion routines required. On most *nix systems, clock time is UTC as
well, so having everything in UTC really helps when you want to do
diagnosis across your database and system log files etc. 

-- 
Tim Cross




Re: timestamp and timestamptz

2020-04-15 Thread raf
Tim Cross wrote:

> Niels Jespersen  writes:
> 
> > Hello all
> >
> > We have some data that was loaded from a csv into a timestamp column. The data 
> > in the csv are in utc. We want to access the data in
> > our native timezone (CET). 
> >
> > I am considering a few alternatives: 
> >
> > 1.   Early in the process, convert to timestamptz and keep this 
> > datatype. 
> >
> > 2.   Early in the process, convert to timestamp as understood in CET.  
> > This will imply by convention that the data in the timestamp
> > column represents CET. Users will need to be told that data represents CET, 
> > even if data is somewhere in the future kept in another
> > country in another timezone. 
> >
> > I probably should choose 1 over 2. But I am a bit hesitant, probably 
> > because we almost never have used timestamptz. 
> >
> > Can we agree that the below query is selecting both the original utc 
> > timestamps and 2 and 1 (as described above)?
> >
> > set timezone to 'cet';
> >
> > select read_time read_time_utc, (read_time at time zone 'utc')::timestamp 
> > read_time_cet, (read_time at time zone 'utc')::timestamptz
> > read_time_tz from t limit 10;
> >
> > We are on Postgres 12. 
> 
> Keep your life simple - just go with option 1. Keep all timestamps in
> UTC and let clients deal with it in whatever way they need to.

That suggestion (keep timestamps in UTC) doesn't sound like option 1 to me
(i.e. convert timestamps-that-are-in-UTC-but-don't-record-the-fact to 
timestamptz).
Option 1 could be done using any timezone. The point is to record what the
timezone is. It doesn't matter whether it's left as UTC or converted to CET
as long as that decision is recorded in the data by using timestamptz (and
as long as you get postgres to perform the time zone conversion because it
will get it right).

But presumably there is some reason why the data is already in UTC
and there's probably no reason to change that. But its timezone should
be recorded.

> This will
> also help deal with issues associated with daylight savings time (DST
> can be a real pain as different locations have it and others don't and
> the 'switchover' date is subject to political whims and can change).

That's a dim view of the tzdata database which gets updated regularly
to take such political decisions into account. As long as postgres uses
the same tzdata as all good UNIX-based operating systems do, and it's kept
up to date, it should be fine (as long as you never trust what a Windows
host says the timezone is for arbitrary timestamps).

This is required even if you leave data in UTC if it ever needs to be
displayed in any other time zone. Postgres still needs up to date tzdata
to perform the conversions later for users.

> Your option 2 will cause all sorts of issues and keep in mind that most
> larger countries have multiple timezones, so even if your data is all
> associated with a single country, you can potentially have multiple
> conversion routines required. On most *nix systems, clock time is UTC as
> well, so having everything in UTC really helps when you want to do
> diagnosis across your database and system log files etc. 

I don't see much difference in storing a timestamptz in UTC or a timestamptz
in CET. As long as the intended offset from UTC is recorded (which it is
in a timestamptz) it should be fine. If the timestamp is CET, then that fact
is in the data and the user doesn't need to be told it separately. It's obvious
when they see the data because the timezone is part of the data.

And you don't need conversion routines. Postgres can compare timestamptz
values and convert from one time zone to another for display purposes.
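
For illustration (the literals are made up):

-- two literals denoting the same instant compare equal:
SELECT timestamptz '2020-04-15 10:00+00' = timestamptz '2020-04-15 12:00+02';  -- true

-- the same stored value rendered in two zones:
SELECT now() AT TIME ZONE 'UTC' AS in_utc,
       now() AT TIME ZONE 'CET' AS in_cet;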

However, option 2 seems to be converting the data to CET but not using the
timestamptz datatype. That would be a bad idea. I think timestamptz should
always be preferred to timestamp. Not recording the timezone is where the
problems come from.

Although having a single timezone for log files is a great idea. They hardly
ever include timezone information so keeping everything in the same timezone
is important. Mind you, they hardly ever even include the year. What's with
that? (yes, I'm looking at you /var/log).

> -- 
> Tim Cross

cheers,
raf





Re: timestamp and timestamptz

2020-04-15 Thread David G. Johnston
On Wed, Apr 15, 2020 at 4:53 PM raf  wrote:

> I don't see much difference in storing a timestamptz in UTC or a
> timestamptz
> in CET. As long as the intended offset from UTC is recorded (which it is
> in a timestamptz) it should be fine.
>

I only really skimmed the entire response but this framing of how
timestamptz stores results is wrong.  Once you've stored a timestamptz in
PostgreSQL you no longer have any knowledge of the timezone.  If you truly
need that you need to record that in a different field.  What you do know
is that PostgreSQL has a known point-in-time in UTC and can give you back
the same value expressed in any other timezone according to the rules in
the timezone database.

Or, as written verbatim in the documentation:
"""
For timestamp with time zone, the internally stored value is always in UTC
(Universal Coordinated Time, traditionally known as Greenwich Mean Time,
GMT). An input value that has an explicit time zone specified is converted
to UTC using the appropriate offset for that time zone. If no time zone is
stated in the input string, then it is assumed to be in the time zone
indicated by the system's TimeZone parameter, and is converted to UTC using
the offset for the timezone zone.
"""

https://www.postgresql.org/docs/12/datatype-datetime.html
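
The quoted behavior is easy to demonstrate - a short sketch (the displayed
offsets follow the session TimeZone):

SET timezone TO 'UTC';
SELECT '2020-04-15 12:00:00+02'::timestamptz;  -- 2020-04-15 10:00:00+00
SET timezone TO 'CET';
SELECT '2020-04-15 12:00:00+02'::timestamptz;  -- 2020-04-15 12:00:00+02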

David J.


Re: pg_restore: could not close data file: Success

2020-04-15 Thread Kyotaro Horiguchi
Hello.

Added -hackers.

At Wed, 15 Apr 2020 12:14:25 +0200, "Peter J. Holzer"  wrote 
in 
> On 2020-04-15 12:01:46 +0200, Peter J. Holzer wrote:
> > I'm trying to restore a backup on a different machine and it terminates
> > with the not really helpful messages:
> > 
> > pg_restore: [directory archiver] could not close data file: Success
> > pg_restore: [parallel archiver] a worker process died unexpectedly
> [...]
> > My guess is that maybe one of the data files is damaged
> 
> As is often the case, the matter became obvious a few minutes after
> writing the mail.
> 
> There were indeed two files with length 0 in the dump. That happened
> because the backup had failed when it couldn't obtain a lock on a table.
> 
> A nicer error message (something like "cannot decompress '13503.dat.gz':
> Empty file") would have helped.

Unfortunately, just emptying the .dat.gz file didn't work for me.
Anyway, the message is emitted in the following way.

pg_backup_directory.c:
>  if (cfclose(cfp) != 0)
>      fatal("could not close data file: %m");

%m doesn't work for some kinds of errors involving compressed files, but
cfclose conceals the true cause.

I'm surprised to find an old thread about the same issue.

https://www.postgresql.org/message-id/20160307.174354.251049100.horiguchi.kyotaro%40lab.ntt.co.jp

I don't think it's acceptable to use a fake errno for gzclose; cfclose
properly passes through the error code from gzclose, so it is enough for
the caller to recognize the difference.

Please find the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 1417401086..3a8394d7f2 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -645,6 +645,13 @@ cfgets(cfp *fp, char *buf, int len)
 		return fgets(buf, len, fp->uncompressedfp);
 }
 
+/*
+ * cfclose closes the stream
+ *
+ * Returns 0 if the cfp was successfully closed. Most errors are reported as -1
+ * and errno is set.  Otherwise the return value is the return value from
+ * gzclose and errno doesn't hold a meaningful value.
+ */
 int
 cfclose(cfp *fp)
 {
@@ -665,6 +672,11 @@ cfclose(cfp *fp)
 #endif
 	{
 		result = fclose(fp->uncompressedfp);
+
+		/* normalize error return, just in case EOF is not -1 */
+		if (result != 0)
+			result = -1;
+
 		fp->uncompressedfp = NULL;
 	}
 	free_keep_errno(fp);
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index c9cce5ed8a..ecc6aa5fbb 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -108,6 +108,7 @@ void
 InitArchiveFmt_Directory(ArchiveHandle *AH)
 {
 	lclContext *ctx;
+	int			ret;
 
 	/* Assuming static functions, this can be copied for each format. */
 	AH->ArchiveEntryPtr = _ArchiveEntry;
@@ -218,8 +219,14 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 		ReadToc(AH);
 
 		/* Nothing else in the file, so close it again... */
-		if (cfclose(tocFH) != 0)
-			fatal("could not close TOC file: %m");
+		ret = cfclose(tocFH);
+		if (ret < 0)
+		{
+			if (ret == -1)
+fatal("could not close TOC file: %m");
+			else
+fatal("could not close TOC file: zlib error (%d)", ret);
+		}
 		ctx->dataFH = NULL;
 	}
 }
@@ -378,6 +385,7 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
 	char	   *buf;
 	size_t		buflen;
 	cfp		   *cfp;
+	int			ret;
 
 	if (!filename)
 		return;
@@ -396,8 +404,15 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
 	}
 
 	free(buf);
-	if (cfclose(cfp) !=0)
-		fatal("could not close data file: %m");
+
+	ret = cfclose(cfp);
+	if (ret < 0)
+	{
+		if (ret == -1)
+			fatal("could not close data file: %m");
+		else
+			fatal("could not close data file: zlib error (%d)", ret);
+	}
 }
 
 /*
@@ -429,6 +444,7 @@ _LoadBlobs(ArchiveHandle *AH)
 	lclContext *ctx = (lclContext *) AH->formatData;
 	char		fname[MAXPGPATH];
 	char		line[MAXPGPATH];
+	int			ret;
 
 	StartRestoreBlobs(AH);
 
@@ -460,9 +476,16 @@ _LoadBlobs(ArchiveHandle *AH)
 		fatal("error reading large object TOC file \"%s\"",
 			  fname);
 
-	if (cfclose(ctx->blobsTocFH) != 0)
-		fatal("could not close large object TOC file \"%s\": %m",
-			  fname);
+	ret = cfclose(ctx->blobsTocFH);
+	if (ret < 0)
+	{
+		if (ret == -1)
+			fatal("could not close large object TOC file \"%s\": %m",
+				  fname);
+		else
+			fatal("could not close large object TOC file \"%s\": zlib error (%d)",
+				  fname, ret);
+	}
 
 	ctx->blobsTocFH = NULL;
 
@@ -555,6 +578,7 @@ _CloseArchive(ArchiveHandle *AH)
 	{
 		cfp		   *tocFH;
 		char		fname[MAXPGPATH];
+		int			ret;
 
 		setFilePath(AH, fname, "toc.dat");
 
@@ -576,8 +600,14 @@ _CloseArchive(ArchiveHandle *AH)
 		WriteHead(AH);
 		AH->format = archDirectory;
 		WriteToc(AH);
-		if (cfclose(tocFH) != 0)
-			fatal("could not close TOC file: %m");
+		ret = cfclose(tocFH);
+		if (ret < 0)
+		{
+			if (ret == -1)
+fatal("could not close TOC file: %m");
+			else
+fatal("could not close TOC file: zlib

SV: timestamp and timestamptz

2020-04-15 Thread Niels Jespersen


From: Magnus Hagander 
Sent: 15 April 2020 20:05
To: Niels Jespersen 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: timestamp and timestamptz



On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen <n...@dst.dk> wrote:
Hello all

We have some data that was loaded from a csv into a timestamp column. The data in 
the csv are in utc. We want to access the data in our native timezone (CET).

I am considering a few alternatives:


1.   Early in the process, convert to timestamptz and keep this datatype.

2.   Early in the process, convert to timestamp as understood in CET.  This 
will imply by convention that the data in the timestamp column represents CET. 
Users will need to be told that data represents CET, even if data is somewhere 
in the future kept in another country in another timezone.

I probably should choose 1 over 2. But I am a bit hesitant, probably because we 
almost never have used timestamptz.

Yes, you should.


Can we agree that the below query is selecting both the original utc timestamps 
and 2 and 1 (as described above)?

set timezone to 'cet';
select read_time read_time_utc, (read_time at time zone 'utc')::timestamp 
read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t 
limit 10;


As long as you use option 1:

SELECT read_time
will return the time in CET (as a timestamptz) after you've set timezone to 
'cet'. If you set timezone to 'utc' it will directly return utc.

SELECT read_time AT TIME ZONE 'utc'
will return the time in UTC (as a timestamp)


And just make sure you have done a "set time zone 'utc'" before you *load* the 
data, and everything should just work automatically.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Thank you Magnus (and others) for your replies.

The raw input data are in this slightly strange format: 2019.05.01 00:00:00. 
No timezone indicator, just an informal guarantee from the supplier that it is 
indeed utc. And no real chance of changing the format. We know, from experience.

The data volume is a bit high, a few billion rows per month. So, table 
partitioning is very helpful (aka really indispensable). Data will be 
aggregated in several ways for analytics. Time aggregations must be according 
to our local timezone (CET). We do not want data from one day being aggregated 
into the wrong date because of timezone issues. This means that partition 
boundaries (monthly partitions most often, sometimes day partitions) must be on 
CET boundaries so that partition pruning will pull data from the relevant 
CET month, not the UTC month.

Now, if I load data into a timestamptz with timezone set to utc, partition on 
CET boundaries, and query and aggregate with timezone set to cet, everything 
will be ok, I think. My small testcase below shows that the row goes into the 
April partition (as it should). The planner does the correct partition pruning 
according to the specified filtering and set timezone. All good.

create table t (t_id bigserial, ts timestamptz) partition by range (ts);
create table t_2020_02 partition of t for values from ('2020-02-01 
00:00:00+01') to ('2020-03-01 00:00:00+01');
create table t_2020_03 partition of t for values from ('2020-03-01 
00:00:00+01') to ('2020-04-01 00:00:00+02');
create table t_2020_04 partition of t for values from ('2020-04-01 
00:00:00+02') to ('2020-05-01 00:00:00+02');

set timezone to 'utc';
insert into t (ts) values('2020-03-31 23:30:00');
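
To double-check where the row landed and that pruning behaves as intended,
something like the following should do (tableoid reveals the partition):

set timezone to 'cet';

-- the row inserted as UTC 2020-03-31 23:30 is 2020-04-01 01:30 CEST,
-- so it should show up in t_2020_04:
select tableoid::regclass as partition, ts from t;

-- and plan-time pruning should touch only the April partition:
explain select * from t
 where ts >= '2020-04-01 00:00:00+02' and ts < '2020-05-01 00:00:00+02';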

Once again, thank you for invaluable feedback.

Niels Jespersen



Re: pg_restore: could not close data file: Success

2020-04-15 Thread Michael Paquier
On Thu, Apr 16, 2020 at 12:08:09PM +0900, Kyotaro Horiguchi wrote:
> I'm surprised to find an old thread about the same issue.
> 
> https://www.postgresql.org/message-id/20160307.174354.251049100.horiguchi.kyotaro%40lab.ntt.co.jp
> 
> I don't think it's acceptable to use a fake errno for gzclose; cfclose
> properly passes through the error code from gzclose, so it is enough for
> the caller to recognize the difference.

A problem with this patch is that we may forget again to add this
special error handling if more code paths use cfclose().

As of HEAD, there are three code paths where cfclose() is called but
it does not generate an error: two when ending a blob and one when
ending a data file.  Perhaps it would make sense to just move all this
error within the routine itself?  Note that it would also mean
registering file names in lclContext or equivalent as that's an
important piece of the error message.
--
Michael

