Storage Inefficiency In PostgreSQL

2019-04-15 Thread Ray Cheung
Hi,

We are currently contemplating switching from MySQL to PostgreSQL, the main
attraction being the use of the TimescaleDB extension. Having done much of
the ground investigation, there is one area of significant concern: the
storage requirement of PostgreSQL. Put simply, comparing like for like for a
set of tables, PostgreSQL consumes far more storage space than MySQL:

- MySQL (5.6): 156 MB
- PostgreSQL (11.2): 246 MB
- PostgreSQL + TimescaleDB (partitioned/chunked data): 324 MB

I've also posted this on Stack Overflow:
https://stackoverflow.com/questions/55655272/how-to-reduce-postgresql-database-size

I can rearrange the table columns to improve alignment and save 6 bytes per
row of the main table, a saving of a few megabytes. Not enough to make any
real difference (a sketch of how to inspect column alignment follows the
list below). Does anyone know:

- Why is PostgreSQL so storage-inefficient in comparison?
- What existing methods can be used to reduce the storage consumption (I've
already tried realignment and VACUUM FULL)?
- Are there any plans to address this storage inefficiency (in comparison to
MySQL)?
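
For reference, column alignment and size can be inspected with a catalog
query along these lines (a sketch; 'my_table' is a placeholder for the real
table name):

-- List each column's type, alignment requirement and size, in table order.
SELECT a.attname, t.typname, t.typalign, t.typlen
  FROM pg_attribute a
  JOIN pg_type t ON t.oid = a.atttypid
 WHERE a.attrelid = 'my_table'::regclass
   AND a.attnum > 0
   AND NOT a.attisdropped
 ORDER BY a.attnum;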

Many thanks,

sps-ray







Re: Storage Inefficiency In PostgreSQL

2019-04-15 Thread Chris Travers
On Mon, Apr 15, 2019 at 10:43 AM Ray Cheung
<ray.che...@silverpowersystems.com> wrote:

> Hi,
>
> We are currently contemplating switching from MySQL to PostgreSQL, the main
> attraction being the use of the TimescaleDB extension. Having done much of
> the ground investigation, there is one area of significant concern: the
> storage requirement of PostgreSQL. Put simply, comparing like for like for
> a set of tables, PostgreSQL consumes far more storage space than MySQL:
>
> - MySQL (5.6): 156 MB
> - PostgreSQL (11.2): 246 MB
> - PostgreSQL + TimescaleDB (partitioned/chunked data): 324 MB
>
> I've also posted this on Stack Overflow:
> https://stackoverflow.com/questions/55655272/how-to-reduce-postgresql-database-size
>
> I can rearrange the table columns to improve alignment and save 6 bytes per
> row of the main table, a saving of a few megabytes. Not enough to make any
> real difference. Does anyone know:
>
> - Why is PostgreSQL so storage-inefficient in comparison?
>

The storage strategies are different enough that you can't really make
direct comparisons.

Long story short, MySQL is optimized for two things: primary key lookups,
and reducing disk I/O from updates to heavily indexed tables.
PostgreSQL is optimized for a lot of things, including access through
secondary indexes and sequential scans. This means that both tables and
indexes are structured differently.
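
As a starting point it helps to see where the space actually goes; a minimal
sketch ('my_table' is a placeholder):

-- Heap size vs. index size vs. total size for one table.
SELECT pg_size_pretty(pg_relation_size('my_table'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('my_table'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('my_table')) AS total_size;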


> - What existing methods can be used to reduce the storage consumption (I've
> already tried realignment and VACUUM FULL)?
>

You could take a look at extensions that give you foreign data wrappers for
columnar stores, but note this has a number of important tradeoffs in
performance and is not recommended for OLTP systems. However, if space is
your primary concern, I assume you are trying to set up some sort of
OLAP system?
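
For example, with the cstore_fdw extension (one such columnar store; the
table and columns below are hypothetical), the setup is roughly:

-- Rough sketch: a compressed, columnar foreign table via cstore_fdw.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE FOREIGN TABLE measurements (
    device_id   int,
    recorded_at timestamptz,
    reading     double precision
) SERVER cstore_server OPTIONS (compression 'pglz');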


> - Are there any plans to address this storage inefficiency (in comparison
> to MySQL)?
>

In the long run, pluggable storage should give people a different set of
options and choices to make here.

>
> Many thanks,
>
> sps-ray
>

-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: SQLSTATE when PostgreSQL crashes during COMMIT statement

2019-04-15 Thread Francisco Olarte
On Mon, Apr 15, 2019 at 4:11 AM Takahashi, Ryohei wrote:
> If the application executes a COMMIT statement and the COMMIT fails because
> of a PostgreSQL crash, it is unknown whether the transaction is really
> committed.
> Therefore, I think the application should check whether the transaction
> really committed, after a while, when a certain SQLSTATE is returned.
> Which SQLSTATE should the application check? Or, is there any document
> written about this?
> In my observation, ecpg returns '57P02' and libpq returns 'null' when
> PostgreSQL crashes during a COMMIT statement.
> Any other SQLSTATE?

My math is rusty, but I doubt you can reliably detect whether a commit
failed. Detecting full success is easy: you get a success code. Commit
failure without a server crash is easy too: you get a failure code. But if
the library has sent the commit message and not received the ok/fail code,
then, as the server has to first persist the transaction to disk and second
send the result (OK/FAIL) back, you never know what has happened. So, if the
library gives you an error ("I have not received the commit confirmation",
not something like "commit failed"), you will still need to test the data,
if you can, to know how the server came back up.

Francisco Olarte.




Re: SQLSTATE when PostgreSQL crashes during COMMIT statement

2019-04-15 Thread Andreas Kretschmer




On 15.04.19 at 12:41, Francisco Olarte wrote:
> On Mon, Apr 15, 2019 at 4:11 AM Takahashi, Ryohei wrote:
>
>> If the application executes a COMMIT statement and the COMMIT fails because
>> of a PostgreSQL crash, it is unknown whether the transaction is really
>> committed.
>> Therefore, I think the application should check whether the transaction
>> really committed, after a while, when a certain SQLSTATE is returned.
>> Which SQLSTATE should the application check? Or, is there any document
>> written about this?
>> In my observation, ecpg returns '57P02' and libpq returns 'null' when
>> PostgreSQL crashes during a COMMIT statement.
>> Any other SQLSTATE?
>
> My math is rusty, but I doubt you can reliably detect whether a commit
> failed. Detecting full success is easy: you get a success code. Commit
> failure without a server crash is easy too: you get a failure code. But if
> the library has sent the commit message and not received the ok/fail code,
> then, as the server has to first persist the transaction to disk and second
> send the result (OK/FAIL) back, you never know what has happened. So, if
> the library gives you an error ("I have not received the commit
> confirmation", not something like "commit failed"), you will still need to
> test the data, if you can, to know how the server came back up.




Since PG10 you can check the status of a transaction; here is an
explanation:

https://blog.2ndquadrant.com/postgresql-10-transaction-traceability/
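
A minimal sketch of that approach (the table and the xid value below are
made up):

BEGIN;
INSERT INTO some_table (payload) VALUES ('x');
SELECT txid_current();  -- record this value on the client before COMMIT
COMMIT;                 -- if the connection drops here, the outcome is in doubt

-- After reconnecting, look up the recorded xid (say it was 12345):
SELECT txid_status(12345);  -- 'committed', 'aborted', 'in progress', or NULL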

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Inexplicable UPDATE...RETURNING behaviour

2019-04-15 Thread Joe Wildish
Hello all,

We are seeing an inexplicable behaviour when issuing an "UPDATE...RETURNING"
statement. I am unsure if it is a Postgres bug. Additional eyes-on would be
much appreciated.

When issuing the following statement we are seeing multiple rows UPDATE'd 
despite the use of LIMIT 1 and despite the "uid" column in the "some_queue" 
table having a PRIMARY KEY constraint on it:

UPDATE queue.some_queue AS q
   SET (state, awaiting) = ('executing', FALSE)
  FROM (SELECT uid
  FROM queue.some_queue
 WHERE awaiting
   AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
 ORDER BY process_after ASC
   FOR UPDATE SKIP LOCKED
 LIMIT 1)
AS dq(uid)
 WHERE q.uid = dq.uid
 RETURNING q.uid;

However, when using the following statement, which (AFAIK) is semantically 
equivalent, we see only a single row being updated/dequeued:

UPDATE queue.some_queue AS q
   SET (state, awaiting) = ('executing', FALSE)
 WHERE uid = (SELECT uid
FROM queue.some_queue
   WHERE awaiting
 AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
   ORDER BY process_after ASC
 FOR UPDATE SKIP LOCKED
   LIMIT 1)
 RETURNING uid;

IMO the two statements should yield the same result. But we see the first one
updating multiple rows and therefore dequeuing multiple uids, yet the second
one functions as intended (i.e. a single item is dequeued).

We can replicate this locally in tests but I can't explain it. Is this a bug, 
or am I overlooking something?

Cheers,
-Joe

PS. Postgres 10.6 in production, and the same behaviour with 10.5 and 11.2 in dev.



Re: Inexplicable UPDATE...RETURNING behaviour

2019-04-15 Thread Tom Lane
Joe Wildish writes:
> We are seeing an inexplicable behaviour when issuing an "UPDATE...RETURNING"
> statement. I am unsure if it is a Postgres bug. Additional eyes-on would be
> much appreciated.

> When issuing the following statement we are seeing multiple rows UPDATE'd 
> despite the use of LIMIT 1 and despite the "uid" column in the "some_queue" 
> table having a PRIMARY KEY constraint on it:

> UPDATE queue.some_queue AS q
>SET (state, awaiting) = ('executing', FALSE)
>   FROM (SELECT uid
>   FROM queue.some_queue
>  WHERE awaiting
>AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
>  ORDER BY process_after ASC
>FOR UPDATE SKIP LOCKED
>  LIMIT 1)
> AS dq(uid)
>  WHERE q.uid = dq.uid
>  RETURNING q.uid;

Yeah, there was another similar complaint a few weeks ago --- has this
suddenly gotten to be a popular coding idea?

The basic problem with what you have here is that FOR UPDATE (especially
with SKIP LOCKED) makes the sub-select's output unstable by definition.
If it's executed more than once then you might get different rows back,
allowing the outer UPDATE's join to potentially match multiple rows from
the outer instance of queue.some_queue.  Typically, since it's LIMIT 1,
I'd think that the planner would put dq on the outside of a nestloop plan
and you'd escape seeing any problem --- but if it gets put on the inside
of a nestloop, it's going to misbehave.

There are (at least) two different ways that the sub-select's output
might change when re-executed, even though it's still using the same
snapshot as before:

1. If some other transaction releases a row lock in between, SKIP LOCKED
might not skip that row any more.

2. The row returned the first time will absolutely not get chosen the
second time, given this particular query formulation, because its latest
updated version will have awaiting = false thanks to the action of the
outer UPDATE, so it'll fail the inner WHERE test.


The way I'd recommend fixing it is to put the FOR UPDATE into a WITH
to guarantee single execution:

WITH dq(uid) AS (SELECT uid ... LIMIT 1)
UPDATE queue.some_queue q SET ...
FROM dq
WHERE q.uid = dq.uid
RETURNING q.uid;
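
Filled in with the sub-select from your original statement, that's:

WITH dq(uid) AS (SELECT uid
   FROM queue.some_queue
  WHERE awaiting
    AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
  ORDER BY process_after ASC
    FOR UPDATE SKIP LOCKED
  LIMIT 1)
UPDATE queue.some_queue AS q
   SET (state, awaiting) = ('executing', FALSE)
  FROM dq
 WHERE q.uid = dq.uid
 RETURNING q.uid;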

regards, tom lane




PG10 declarative partitioning that allow null value

2019-04-15 Thread Soni

Hello All,
Currently we are planning to migrate from inheritance partitioning to
declarative partitioning by range values of int on pg10. But the key
values can contain null, and I know that range partitioning does not allow
null values.
The workaround we are looking at is to create a list partitioning first,
followed by range sub-partitioning, with the list partitioning on the
number of digits of the int key column.



CREATE TABLE partitioned_table_name (like table_name) partition by list
(floor(log(int_key_column)+1));


create table partitions_1 partition of partitioned_table_name for values
in (null);
create table partitions_2 partition of partitioned_table_name for values
in (1);

.
.
create table partitions_10 partition of partitioned_table_name for values
in (9) partition by range(int_key_column);


Question is:
Is this good practice? Will performance drop significantly due to
the computation of the number of digits?

Or is there any better workaround?

Thanks,

Soni.





Re: PG10 declarative partitioning that allow null value

2019-04-15 Thread Soni
I think it's better to have a list partitioning on true/false, based on
(int_key_column is null):
CREATE TABLE partitioned_table_name (like table_name) partition by list
((int_key_column is null));
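
The two child partitions would then be (the partition names here are just
placeholders):

create table partitions_isnull partition of partitioned_table_name for
values in (true);
create table partitions_notnull partition of partitioned_table_name for
values in (false) partition by range(int_key_column);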


On 16/04/2019 9:37, Soni wrote:

> Hello All,
> Currently we are planning to migrate from inheritance partitioning to
> declarative partitioning by range values of int on pg10. But the key
> values can contain null, and I know that range partitioning does not allow
> null values.
> The workaround we are looking at is to create a list partitioning first,
> followed by range sub-partitioning, with the list partitioning on the
> number of digits of the int key column.
>
> CREATE TABLE partitioned_table_name (like table_name) partition by
> list (floor(log(int_key_column)+1));
>
> create table partitions_1 partition of partitioned_table_name for
> values in (null);
> create table partitions_2 partition of partitioned_table_name for
> values in (1);
> .
> .
> create table partitions_10 partition of partitioned_table_name for
> values in (9) partition by range(int_key_column);
>
> Question is:
> Is this good practice? Will performance drop significantly due to
> the computation of the number of digits?
>
> Or is there any better workaround?
>
> Thanks,
>
> Soni.






Re: PG10 declarative partitioning that allow null value

2019-04-15 Thread Tom Lane
Soni writes:
>> Currently we are planning to migrate from inheritance partitioning to
>> declarative partitioning by range values of int on pg10. But the key
>> values can contain null, and I know that range partitioning does not
>> allow null values.

In v11 you could use a default partition ...
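
A minimal sketch (the table and bounds are hypothetical); rows with a NULL
partition key are routed to the default partition:

CREATE TABLE t (k int) PARTITION BY RANGE (k);
CREATE TABLE t_small PARTITION OF t FOR VALUES FROM (0) TO (1000000);
CREATE TABLE t_default PARTITION OF t DEFAULT;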

regards, tom lane




RE: SQLSTATE when PostgreSQL crashes during COMMIT statement

2019-04-15 Thread Takahashi, Ryohei
Hi Francisco and Andreas,


Thank you for your replies.

> So, if the library gives you an error ("I have not received the commit
> confirmation", not something like "commit failed"), you will still
> need to test the data, if you can, to know how the server came back up.

Yes.
I want to know what error the library gives to the application in such a case.
I supposed it would be a certain SQLSTATE.


> Since PG10 you can check the status of a transaction; here is an
> explanation:
> https://blog.2ndquadrant.com/postgresql-10-transaction-traceability/

Thank you for the information.
According to that page and the PostgreSQL documentation of txid_status(),
the application should look up the transaction's completion status when the
connection is lost, not when a certain SQLSTATE is returned, right?


Regards,
Ryohei Takahashi