RLS not using index scan but seq scan when condition gets a bit complicated

2022-03-01 Thread Charles Huang
Hi Postgres community,

We are experiencing performance issues when RLS is enabled on large
tables. Here is a simplified example:

We have a table:

CREATE TABLE emp.employees (
  employee_id INTEGER PRIMARY KEY,
  -- the companies table is defined in a different schema, not accessible to the emp service
  company_id INTEGER NOT NULL,
  employee_name TEXT NOT NULL
);

Index for employees table:

CREATE INDEX employees_company_id_idx ON emp.employees (company_id);

And the table has an RLS SELECT policy:

CREATE POLICY employee_select_policy ON emp.employees FOR SELECT
  USING (
    company_id = ANY (coalesce(
      string_to_array(current_setting('emp.authorized_read_company_ids', TRUE), ',')::INTEGER[],
      ARRAY[]::INTEGER[]))
  );


When a very simple query is executed, for instance:

SET emp.authorized_read_company_ids = '1, 2, 3, ..., 200';
SELECT count(*) FROM emp.employees WHERE TRUE;   -- 68091 rows

The query plan for this query reads:

Aggregate  (cost=1096.02..1096.03 rows=1 width=8) (actual time=8.740..8.740 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=778
  ->  Index Only Scan using employees_company_id_idx on emp.employees  (cost=0.35..970.78 rows=50099 width=0) (actual time=0.124..4.976 rows=49953 loops=1)
        Output: company_id
        Index Cond: (employees.company_id = ANY (COALESCE((string_to_array(current_setting('emp.authorized_read_company_ids'::text, true), ','::text))::integer[], '{}'::integer[])))
        Heap Fetches: 297
        Buffers: shared hit=778
Planning:
  Buffers: shared hit=12
Planning Time: 0.824 ms
Execution Time: 8.768 ms

The problem arises when we make the RLS select policy condition a bit more
complicated by adding an admin check inside it:

CREATE POLICY employee_select_policy ON emp.employees FOR SELECT
  USING (
    coalesce(nullif(current_setting('emp.is_admin', TRUE), ''), 'false')::BOOLEAN
    OR company_id = ANY (coalesce(
      string_to_array(current_setting('emp.authorized_read_company_ids', TRUE), ',')::INTEGER[],
      ARRAY[]::INTEGER[]))
  );

When the same simple query is executed:

SET emp.is_admin = TRUE;
SET emp.authorized_read_company_ids = '1, 2, 3, ..., 200';
SELECT count(*) FROM emp.employees WHERE TRUE;   -- 68091 rows

The query plan now reads:

Aggregate  (cost=6238.51..6238.52 rows=1 width=8) (actual time=2156.271..2156.272 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=367
  ->  Index Only Scan using employees_company_id_idx on emp.employees  (cost=0.29..6099.16 rows=55740 width=0) (actual time=0.065..2151.939 rows=49953 loops=1)
        Output: company_id
        Filter: ((COALESCE(NULLIF(current_setting('emp.is_admin'::text, true), ''::text), 'false'::text))::boolean OR (employees.company_id = ANY (COALESCE((string_to_array(current_setting('emp.authorized_read_company_ids'::text, true), ','::text))::integer[], '{}'::integer[]))))
        Rows Removed by Filter: 11430
        Heap Fetches: 392
        Buffers: shared hit=367
Planning Time: 0.744 ms
Execution Time: 2156.302 ms

We can see that performance has deteriorated horribly: the planner no longer
uses an index condition for the company ids. Instead, the whole policy
expression is evaluated as a filter, checking every single row in the table
against every single company id in the session setting.

As the table and the number of company ids in the session setting grow, the
execution time becomes longer and longer.
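For completeness, here is a minimal self-contained script that should reproduce the behaviour. The generate_series test data is made up; the table, index, and policy match the definitions above:

```sql
CREATE SCHEMA IF NOT EXISTS emp;

CREATE TABLE emp.employees (
  employee_id   INTEGER PRIMARY KEY,
  company_id    INTEGER NOT NULL,
  employee_name TEXT NOT NULL
);

CREATE INDEX employees_company_id_idx ON emp.employees (company_id);

-- Made-up test data: ~68k rows spread over 250 companies
INSERT INTO emp.employees
SELECT i, (i % 250) + 1, 'emp_' || i
FROM generate_series(1, 68091) AS i;

ALTER TABLE emp.employees ENABLE ROW LEVEL SECURITY;
ALTER TABLE emp.employees FORCE ROW LEVEL SECURITY;  -- apply RLS to the table owner too

CREATE POLICY employee_select_policy ON emp.employees FOR SELECT
  USING (
    coalesce(nullif(current_setting('emp.is_admin', TRUE), ''), 'false')::BOOLEAN
    OR company_id = ANY (coalesce(
         string_to_array(current_setting('emp.authorized_read_company_ids', TRUE), ',')::INTEGER[],
         ARRAY[]::INTEGER[]))
  );

ANALYZE emp.employees;

SET emp.authorized_read_company_ids = '1,2,3';
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM emp.employees;
```

With the OR'ed admin branch present, the `= ANY` test shows up as a Filter rather than an Index Cond.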

To summarise: we would like admin users to run without any RLS
restrictions, and normal users to have RLS enforced via the index on
company_id. Unfortunately, we cannot have queries executed by admin users
connect to the database as a different database user.

Is there anything you could suggest?

Thanks,
Charles

Re: Never Ending query in PostgreSQL

2022-03-01 Thread Tomas Vondra
On 2/27/22 18:20, Jeff Janes wrote:
> 
> On Sun, Feb 27, 2022 at 7:09 AM Kumar, Mukesh wrote:
> 
> Hi Team, 
> 
> Can you please help in tuning the attached query? I am trying
> to run it, and it runs for several hours without giving
> any output.
> 
> 
> Several hours is not all that long.  Without an EXPLAIN ANALYZE, we
> could easily spend several hours scratching our heads and still get
> nowhere.  So unless having this running cripples the rest of your
> system, please queue up another one and let it go longer.  But first, do
> an ANALYZE (and preferably a VACUUM ANALYZE) on all the tables.  If you
> have a test db which is a recent clone of production, you could do it
> there so as not to slow down production.  The problem is that the row
> estimates must be way off (otherwise, it shouldn't take long) and if
> that is the case, we can't use the plan to decide much of anything,
> since we don't trust it.
> 

I'd bet Jeff is right and poor estimates are the root cause. The pattern
with a cascade of "nested loop" in the explain is fairly typical. This
is likely due to the complex join conditions and correlation.


> In parallel you could start evicting table joins from the query to
> simplify it until it gets to the point where it will run, so you can
> then see the actual row counts.  To do that it does help if you know
> what the intent of the query is (or for that matter, the text of the
> query--you attached the plan twice).
> 

Right, simplify the query. Or maybe do it the other way around - start
with the simplest query (the inner-most part of the explain) and add
joins one by one (by following the explains) until it suddenly starts
being much slower.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: Never Ending query in PostgreSQL

2022-03-01 Thread Kumar, Mukesh
Hi Tomas,

Thanks for replying. We have identified a join condition which is creating a
problem for that query.

Accept my apologies for pasting the plan twice. I am attaching the query again
in this mail.

We have found that after evicting the view paymenttransdetails_view, the
attached query runs in approx. 10 secs; the view contains multiple conditions
and one join as well.

I am attaching the view definition as well.

Please suggest whether there is a workaround to make this query run faster
without evicting the above view from the query.



Thanks and Regards, 
Mukesh Kumar

-Original Message-
From: Tomas Vondra  
Sent: Tuesday, March 1, 2022 7:35 PM
To: Jeff Janes ; Kumar, Mukesh 
Cc: [email protected]
Subject: Re: Never Ending query in PostgreSQL

On 2/27/22 18:20, Jeff Janes wrote:
> 
> On Sun, Feb 27, 2022 at 7:09 AM Kumar, Mukesh 
> mailto:[email protected]>> wrote:
> 
> Hi Team,
> 
> Can you please help in tunning the attached query as , i am trying
> to run this query and it runs for several hours and it did not give
> any output.
> 
> 
> Several hours is not all that long.  Without an EXPLAIN ANALYZE, we 
> could easily spend several hours scratching our heads and still get 
> nowhere.  So unless having this running cripples the rest of your 
> system, please queue up another one and let it go longer.  But first, 
> do an ANALYZE (and preferably a VACUUM ANALYZE) on all the tables.  If 
> you have a test db which is a recent clone of production, you could do 
> it there so as not to slow down production.  The problem is that the 
> row estimates must be way off (otherwise, it shouldn't take long) and 
> if that is the case, we can't use the plan to decide much of anything, 
> since we don't trust it.
> 

I'd bet Jeff is right and poor estimates are the root cause. The pattern with a 
cascade of "nested loop" in the explain is fairly typical. This is likely due 
to the complex join conditions and correlation.


> In parallel you could start evicting table joins from the query to 
> simplify it until it gets to the point where it will run, so you can 
> then see the actual row counts.  To do that it does help if you know 
> what the intent of the query is (or for that matter, the text of the 
> query--you attached the plan twice).
> 

Right, simplify the query. Or maybe do it the other way around - start with the 
simplest query (the inner-most part of the explain) and add joins one by one 
(by following the explains) until it suddenly starts being much slower.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Tuned_Query (002).sql
Description: Tuned_Query (002).sql


View.sql
Description: View.sql


Simple task with partitioning which I can't realize

2022-03-01 Thread Andrew Zakharov
Hello all -

I have a task which looks simple at first glance. I have a table which
contains a hierarchy of address objects, starting with macro regions and
ending with individual buildings. You can imagine how big it is.

Here is short sample of table declaration:

 

create table region_hierarchy(
  gid uuid not null default uuid_generate_v1mc(),
  parent_gid uuid null,
  region_code int2,
  ...
  constraint pk_region_hierarchy primary key (gid),
  constraint fk_region_hierarchy_region_hierarchy_parent foreign key
    (parent_gid) references region_hierarchy(gid)
);

 

Being an Oracle specialist, I planned to use the same declarative partitioning
by list on the region_code field as I did in Oracle. I have carefully looked
through docs/FAQs/Google/communities and found out that I must include the
"gid" field in the partition key because it is a primary key column. Thus the
"by list" partition method is not appropriate in this case, and neither is "by
range". What partition methods do I have left? Hash? How can I create
partitions by gid & region_code by hash? Is that feasible? Will partition
pruning work when the search criteria are by region_code only? The same
problem appears when a simple serial "id" is used as the primary identifier.
Removing all constraints is not an option. I understand that PostgreSQL's
partitioning implementation is this way for many reasons, but how can I
implement partitioning for my EASY case? As I see it, only legacy inheritance
is left, right? Very sad if it's true.

Your advices are very important.

Thanks in advance.

Andrew.

 



Re: Never Ending query in PostgreSQL

2022-03-01 Thread Tomas Vondra
On 3/1/22 16:01, Kumar, Mukesh wrote:
> Hi Tomas ,
> 
> Thanks for replying , We have identified a Join condition which is
> creating a problem for that query.
> 
> Accept my apologies for pasting the plan twice. I am attaching the
> query again in this mail
> 

Queries without explain (or even better "explain analyze") are useless.
We don't have the data, we don't know what the executed plan is, we
don't know what plan might be a better one.

There's a wiki page about reporting slow queries (what info to include,
etc):

https://wiki.postgresql.org/wiki/Slow_Query_Questions

> We have found that by evicting the View paymenttransdetails_view from
> the attached query runs in approx. 10 secs and the view contains
> multiple conditions and 1 join as well.
> 

You need to add individual tables, not a view which is itself a join of
10+ tables. The idea is that you start with a fast query, add tables one
by one (in the join order from the explain). You'll be able to do
EXPLAIN ANALYZE and watch estimate accuracy, and then at some point it
gets much slower, which is the join that causes trouble. But you might
still be able to do explain analyze.

So looking at the explain plan you shared before, you'd start with a
join of so_vendor_address_base + so_vendor_base, and then you'd add

- sapecc_lfa1_assoc
- lms_payment_item_vendor_base
- lms_payment_line_item_base
- lms_payment_check_request
- lms_pay_line_item_acct_base
- ...

(in this order). I'd bet "lms_payment_check_request" is where things
start to go south.
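For illustration, the first couple of steps might look like this; the join columns are guesses, since we only have the plan, so substitute the real ones from the query:

```sql
-- Step 1: the innermost join from the plan; compare estimated vs. actual rows
EXPLAIN ANALYZE
SELECT count(*)
FROM so_vendor_address_base addr
JOIN so_vendor_base vend ON vend.vendor_id = addr.vendor_id;  -- join column assumed

-- Step 2: add the next table in the plan's join order, re-check the estimates
EXPLAIN ANALYZE
SELECT count(*)
FROM so_vendor_address_base addr
JOIN so_vendor_base vend ON vend.vendor_id = addr.vendor_id   -- join column assumed
JOIN sapecc_lfa1_assoc lfa ON lfa.vendor_id = vend.vendor_id; -- join column assumed

-- ...continue with lms_payment_item_vendor_base, lms_payment_line_item_base, etc.
```

The step where estimated rows diverge sharply from actual rows, or where runtime jumps, is the join to investigate.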


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Simple task with partitioning which I can't realize

2022-03-01 Thread David G. Johnston
On Tue, Mar 1, 2022 at 8:37 AM Andrew Zakharov  wrote:

> create table region_hierarchy(
>
>   gid uuid not null default uuid_generate_v1mc(),
>
>   parent_gid uuid null,
>
>   region_code int2,
>
>
>


> I’ve carefully looked thru docs/faqs/google/communities and found out that
> I must include “gid” field into partition key because a primary key field.
>

Yes, you are coming up against the following limitation:

"Unique constraints (and hence primary keys) on partitioned tables must
include all the partition key columns. This limitation exists because the
individual indexes making up the constraint can only directly enforce
uniqueness within their own partitions; therefore, the partition structure
itself must guarantee that there are not duplicates in different
partitions."

https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

That limitation is independent of partitioning; i.e., the legacy
inheritance option doesn't bypass it.

Thus, your true "key" is composite: (region, identifier).  So you need to
add a "parent_region_code" column as well, redefine the PK as (region_code,
gid), and change the REFERENCES clause to link the two paired columns.

You can decide whether that is sufficient or if you want some added comfort
in ensuring that a gid cannot appear in multiple regions by creating a
single non-partitioned table containing all gid values and add a unique
constraint there.

Or maybe allow for duplicates across region codes and save space by using a
smaller data type (int or bigint - while renaming the column to "rid" or
some such) - combined with having the non-partitioned reference table being
defined as (region_code, rid, gid).
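A sketch of what that could look like, assuming the uuid-ossp extension and PostgreSQL 12+ (needed for a foreign key that references a partitioned table); names not taken from the thread are illustrative:

```sql
-- The PK now includes the partition key column (region_code)
CREATE TABLE region_hierarchy (
  gid                uuid NOT NULL DEFAULT uuid_generate_v1mc(),
  parent_region_code int2 NULL,
  parent_gid         uuid NULL,
  region_code        int2 NOT NULL,
  CONSTRAINT pk_region_hierarchy PRIMARY KEY (region_code, gid),
  CONSTRAINT fk_region_hierarchy_parent
    FOREIGN KEY (parent_region_code, parent_gid)
    REFERENCES region_hierarchy (region_code, gid)
) PARTITION BY LIST (region_code);

-- One partition per region; pruning applies to WHERE region_code = 77
CREATE TABLE region_hierarchy_r77 PARTITION OF region_hierarchy
  FOR VALUES IN (77);
```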

David J.


RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-01 Thread [email protected]


   >  -Original Message-
   >  From: Justin Pryzby 
   >  Sent: Monday, February 28, 2022 17:05
   >  To: [email protected]
   >  Cc: [email protected]
   >  Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
   >  
   >  On Mon, Feb 28, 2022 at 09:43:09PM +, [email protected]
   >  wrote:
   >  >On Wed, Feb 23, 2022 at 07:04:15PM -0600, Justin Pryzby wrote:
   >  >>  > And the aforementioned network trace.  You could set a capture
   >  filter on TCP
   >  >>  > SYN|RST so it's not absurdly large.  From my notes, it might 
look like
   >  this:
   >  >>  > (tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)
   >  >>
   >  >>  I'd also add '|| icmp'.  My hunch is that you'll see some ICMP 
(not
   >  "ping")
   >  >>  being sent by an intermediate gateway, resulting in the connection
   >  being
   >  >>  reset.
   >  >
   >  > I am so sorry but I do not understand what you are asking me to do. I am
   >  unfamiliar with these commands. Is this a postgres configuration file? Is 
this
   >  something I just do once or something I leave on to hopefully catch it 
when
   >  the issue occurs? Is this something to do on the DB machine or the ETL
   >  machine? FYI:
   >  
   >  It's no problem.
   >  
   >  I suggest that you run wireshark with a capture filter to try to show 
*why*
   >  the connections are failing.  I think the capture filter might look like:
   >  
   >  (icmp || (tcp[tcpflags] & (tcp-rst|tcp-syn|tcp-fin)!=0)) && host
   >  10.64.17.211
   >  
   >  With the "host" filtering for the IP address of the *remote* machine.
   >  
   >  You could run that on whichever machine is more convenient and leave it
   >  running for however long it takes for that error to happen.  You'll be 
able to
   >  save a .pcap file for inspection.  I suppose it'll show either a TCP RST 
or an
   >  ICMP.
   >  Whichever side sent that is where the problem is.  I still suspect the 
issue
   >  isn't in postgres.
   >  
   >  >   - My ETL machine is on 10.64.17.211
   >  >   - My DB machine is on 10.64.17.210
   >  >   - Both on Windows Server 2012 R2, x64
   >  
   >  These network details make my theory unlikely.
   >  
   >  They're on the same subnet with no intermediate gateways, and
   >  communicate directly via a hub/switch/crossover cable.  If that's true, 
then
   >  both will have each other's hardware address in ARP after pinging from one
   >  to the other.
   >  
   >  --
   >  Justin

Yes, the machines ARE on the same subnet. They are actually even on the same
physical rack, from what I have been told. When I run a tracert, I get this:

Tracing route to PRODDB.xxx.int [10.64.17.210] over a maximum of 30 hops:
  1     1 ms    <1 ms    <1 ms  PRODDB.xxx.int [10.64.17.210]
Trace complete.

Now, there is an additional component, I think... Storage is on an array and I
am not getting a clear answer as to where it is 😊 Is it possible that something
is happening at the storage layer? Could that be reported by Postgres as a
network issue rather than a storage issue?

Also, both machines are actually VMs. I forgot to mention that and not sure if 
that's relevant.

Thank you,
Laurent.




Re: Simple task with partitioning which I can't realize

2022-03-01 Thread Marc Millas
Hi,

is there any chance (risk ?) that a given gid be present in more than one
region ?
if not (or if you implement it via a dedicated, non partition table),

you may create a simple table partitioned by region, and create unique
indexes for each partition.
this is NOT equivalent to a unique constraint at global table level, of
course.
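In DDL terms, the idea is roughly the following (table and column names reused from the thread; the rest is an illustrative sketch):

```sql
-- No global PK on the parent; uniqueness is enforced per partition only
CREATE TABLE region_hierarchy (
  gid         uuid NOT NULL DEFAULT uuid_generate_v1mc(),
  parent_gid  uuid NULL,
  region_code int2 NOT NULL
) PARTITION BY LIST (region_code);

CREATE TABLE region_hierarchy_r01 PARTITION OF region_hierarchy
  FOR VALUES IN (1);

-- Unique within this partition only; NOT a table-wide guarantee
CREATE UNIQUE INDEX region_hierarchy_r01_gid_ux
  ON region_hierarchy_r01 (gid);
```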

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov  wrote:

> Hello all –
>
> I have a task which is simple at the first look. I have a table which
> contains hierarchy of address objects starting with macro region end ends
> with particular buildings. You can imagine how big is it.
>
> Here is short sample of table declaration:
>
>
>
> create table region_hierarchy(
>
>   gid uuid not null default uuid_generate_v1mc(),
>
>   parent_gid uuid null,
>
>   region_code int2,
>
>   …
>
> constraint pk_region_hierarchy primary key (gid),
>
> constraint fk_region_hierarchy_region_hierarchy_parent foreign key
> (parent_gid) references region_hierarchy(gid)
>
> );
>
>
>
> Being an Oracle specialist, I planned to using same declarative
> partitioning by list on the region_code field as I did in Oracle database.
> I’ve carefully looked thru docs/faqs/google/communities and found out that
> I must include “gid” field into partition key because a primary key field.
> Thus partition method “by list” is not appropriate method in this case and
> “by range” either. What I have left from partition methods? Hash? How can I
> create partitions by gid & region_code by hash? Feasible? Will it be
> working properly (with partition pruning) when search criteria is by
> region_code only? Same problem appears when there is simple serial “id”
> used as primary identifier. Removing all constraints is not considered. I
> understand that such specific PostgreSQL partitioning implementation has
> done by tons of reasons but how I can implement partitioning for my EASY
> case? I see the only legacy inheritance is left, right? Very sad if it’s
> true.
>
> Your advices are very important.
>
> Thanks in advance.
>
> Andrew.
>
>
>


RE: Simple task with partitioning which I can't realize

2022-03-01 Thread Andrew Zakharov
David - yes, creating a composite foreign/primary key is not a problem. But
the main question remains: which method should I use for partitioning by the
composite key (gid, region_code)? Partitioning exists not only for faster data
access but also for easier administration. Administration such as
truncate/insert is the main reason why I split the data in my DWH case. If
only the hash method is left, I cannot administer the partitions separately
that way. But anyway, could you please share your vision of a brief
declaration for the main table and a partition?

Thanks.

Andrew.

 

From: David G. Johnston  
Sent: Tuesday, March 01, 2022 6:54 PM
To: Andrew Zakharov 
Cc: Pgsql Performance 
Subject: Re: Simple task with partitioning which I can't realize

 

On Tue, Mar 1, 2022 at 8:37 AM Andrew Zakharov wrote:

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

 

 

I’ve carefully looked thru docs/faqs/google/communities and found out that I 
must include “gid” field into partition key because a primary key field.

 

Yes, you are coming up against the following limitation:

 

"Unique constraints (and hence primary keys) on partitioned tables must include 
all the partition key columns. This limitation exists because the individual 
indexes making up the constraint can only directly enforce uniqueness within 
their own partitions; therefore, the partition structure itself must guarantee 
that there are not duplicates in different partitions."

 

https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

 

That limitation is independent of partitioning; i.e., the legacy inheritance 
option doesn't bypass it.

 

Thus, your true "key" is composite: (region, identifier).  Thus you need to add 
a "parent_region_code" column as well, redefine the PK as (region_code, gid), 
and the REFERENCES clause to link the two paired fields.

 

You can decide whether that is sufficient or if you want some added comfort in 
ensuring that a gid cannot appear in multiple regions by creating a single 
non-partitioned table containing all gid values and add a unique constraint 
there.

 

Or maybe allow for duplicates across region codes and save space by using a 
smaller data type (int or bigint - while renaming the column to "rid" or some 
such) - combined with having the non-partitioned reference table being defined 
as (region_code, rid, gid).

 

David J.

 



Re: Simple task with partitioning which I can't realize

2022-03-01 Thread David G. Johnston
On Tue, Mar 1, 2022 at 9:37 AM Andrew Zakharov  wrote:

> David, - yes, creation composite foreign/primary key is not a problem. But
> the main question is what method should I use for partitioning by composite
> key gid, region_code?
>

The convention here is to inline or bottom-post responses.

Your original plan - list partitions by region_code.  You couldn't do that
before because you weren't seeing the region_code as being part of your PK
and all partition columns must be part of the PK.  My suggestion is that
instead of figuring out how to work around that limitation (not that I
think there is a good one to be had) you accept it and just add region_code
to the PK.

David J.


RE: Simple task with partitioning which I can't realize

2022-03-01 Thread Andrew Zakharov
Hi Marc –

Since the DWH is fed by ETL, there is no risk of having the same gids in
different region partitions. I considered a simple partitioned table without
any keys, but I believed there was a solution with keys; that's why I'm
seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas  
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov 
Cc: [email protected]
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one 
region ?

if not (or if you implement it via a dedicated, non partition table), 

 

you may create a simple table partitioned by region, and create unique indexes 
for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.




Marc MILLAS

Senior Architect

+33607850334

www.mokadb.com  

 

 

 

On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains 
hierarchy of address objects starting with macro region end ends with 
particular buildings. You can imagine how big is it. 

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

constraint pk_region_hierarchy primary key (gid),

constraint fk_region_hierarchy_region_hierarchy_parent foreign key 
(parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by 
list on the region_code field as I did in Oracle database. I’ve carefully 
looked thru docs/faqs/google/communities and found out that I must include 
“gid” field into partition key because a primary key field. Thus partition 
method “by list” is not appropriate method in this case and “by range” either. 
What I have left from partition methods? Hash? How can I create partitions by 
gid & region_code by hash? Feasible? Will it be working properly (with 
partition pruning) when search criteria is by region_code only? Same problem 
appears when there is simple serial “id” used as primary identifier. Removing 
all constraints is not considered. I understand that such specific PostgreSQL 
partitioning implementation has done by tons of reasons but how I can implement 
partitioning for my EASY case? I see the only legacy inheritance is left, 
right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 



Re: Simple task with partitioning which I can't realize

2022-03-01 Thread Marc Millas
Andrew,

contrary to Oracle, in postgres you can add the indexes and/or the
constraints which are meaningful to you at partition level.
I was not saying NOT to create keys, but I was saying to create them at
partition level.


Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Tue, Mar 1, 2022 at 5:45 PM Andrew Zakharov  wrote:

> Hi Marc –
>
> Since there is a DWH fed by ETL there no risks to have same gids in
> different region partitions. I considered simple partitioned table w/o any
> keys but I’d believed there is a solutions with keys that’s why I’m seeking
> the clue.
>
> Thanks.
>
> Andrew.
>
>
>
> *From:* Marc Millas 
> *Sent:* Tuesday, March 01, 2022 7:29 PM
> *To:* Andrew Zakharov 
> *Cc:* [email protected]
> *Subject:* Re: Simple task with partitioning which I can't realize
>
>
>
> Hi,
>
>
>
> is there any chance (risk ?) that a given gid be present in more than one
> region ?
>
> if not (or if you implement it via a dedicated, non partition table),
>
>
>
> you may create a simple table partitioned by region, and create unique
> indexes for each partition.
>
> this is NOT equivalent to a unique constraint at global table level, of
> course.
>
>
> Marc MILLAS
>
> Senior Architect
>
> +33607850334
>
> www.mokadb.com
>
>
>
>
>
>
>
> On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov  wrote:
>
> Hello all –
>
> I have a task which is simple at the first look. I have a table which
> contains hierarchy of address objects starting with macro region end ends
> with particular buildings. You can imagine how big is it.
>
> Here is short sample of table declaration:
>
>
>
> create table region_hierarchy(
>
>   gid uuid not null default uuid_generate_v1mc(),
>
>   parent_gid uuid null,
>
>   region_code int2,
>
>   …
>
> constraint pk_region_hierarchy primary key (gid),
>
> constraint fk_region_hierarchy_region_hierarchy_parent foreign key
> (parent_gid) references region_hierarchy(gid)
>
> );
>
>
>
> Being an Oracle specialist, I planned to using same declarative
> partitioning by list on the region_code field as I did in Oracle database.
> I’ve carefully looked thru docs/faqs/google/communities and found out that
> I must include “gid” field into partition key because a primary key field.
> Thus partition method “by list” is not appropriate method in this case and
> “by range” either. What I have left from partition methods? Hash? How can I
> create partitions by gid & region_code by hash? Feasible? Will it be
> working properly (with partition pruning) when search criteria is by
> region_code only? Same problem appears when there is simple serial “id”
> used as primary identifier. Removing all constraints is not considered. I
> understand that such specific PostgreSQL partitioning implementation has
> done by tons of reasons but how I can implement partitioning for my EASY
> case? I see the only legacy inheritance is left, right? Very sad if it’s
> true.
>
> Your advices are very important.
>
> Thanks in advance.
>
> Andrew.
>
>
>
>


RE: Simple task with partitioning which I can't realize

2022-03-01 Thread Andrew Zakharov
Yes, Marc –

I understood you properly and totally. I was just expressing the hope that
there is a trick to keep the constraints at the base-table level in my case.

Thanks a bunch.

Andrew.

 

 

On Tue, Mar 01, 2022 at 9:00 PM Marc Millas wrote:



Andrew,

 

contrary to Oracle, in postgres you can add the indexes and/or the constraints 
which are meaningful to you at partition level.

I was not saying NOT to create keys, but I was saying to create them at 
partition level.

 




Marc MILLAS

Senior Architect

+33607850334

www.mokadb.com  

 

 

 

On Tue, Mar 1, 2022 at 5:45 PM Andrew Zakharov wrote:

Hi Marc –

Since there is a DWH fed by ETL there no risks to have same gids in different 
region partitions. I considered simple partitioned table w/o any keys but I’d 
believed there is a solutions with keys that’s why I’m seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov
Cc: [email protected] 
 
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one 
region ?

if not (or if you implement it via a dedicated, non partition table), 

 

you may create a simple table partitioned by region, and create unique indexes 
for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.




Marc MILLAS

Senior Architect

+33607850334

www.mokadb.com  

 

 

 

On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains 
hierarchy of address objects starting with macro region end ends with 
particular buildings. You can imagine how big is it. 

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

constraint pk_region_hierarchy primary key (gid),

constraint fk_region_hierarchy_region_hierarchy_parent foreign key 
(parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by 
list on the region_code field as I did in Oracle database. I’ve carefully 
looked thru docs/faqs/google/communities and found out that I must include 
“gid” field into partition key because a primary key field. Thus partition 
method “by list” is not appropriate method in this case and “by range” either. 
What I have left from partition methods? Hash? How can I create partitions by 
gid & region_code by hash? Feasible? Will it be working properly (with 
partition pruning) when search criteria is by region_code only? Same problem 
appears when there is simple serial “id” used as primary identifier. Removing 
all constraints is not considered. I understand that such specific PostgreSQL 
partitioning implementation has done by tons of reasons but how I can implement 
partitioning for my EASY case? I see the only legacy inheritance is left, 
right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 



Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-01 Thread Justin Pryzby
On Tue, Mar 01, 2022 at 04:28:31PM +, [email protected] wrote:
> Now, there is an additional component I think... Storage is on an array and I 
> am not getting a clear answer as to where it is 😊 Is it possible that 
> something is happening at the storage layer? Could that be reported as a 
> network issue vs a storage issue for Postgres?

No.  If there were an error with storage, it'd be reported as a local error,
and the query would fail locally, rather than as a client-server communication failure.

> Also, both machines are actually VMs. I forgot to mention that and not sure 
> if that's relevant.

Are they running on the same hypervisor?  Is that Hyper-V?
Lacking other good hypotheses, that does seem relevant.

-- 
Justin




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-01 Thread Ranier Vilela
On Mon, Feb 28, 2022 at 13:50, [email protected] <
[email protected]> wrote:

>
>
> >From: Ranier Vilela 
> >Sent: Thursday, February 24, 2022 08:46
> >To: Justin Pryzby 
> >Cc: [email protected]; [email protected]
> >Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
> >
> >On Thu, Feb 24, 2022 at 09:59, Justin Pryzby <[email protected]> wrote:
> >On Thu, Feb 24, 2022 at 08:50:45AM -0300, Ranier Vilela wrote:
> >> I can't understand why you are still using 13.4?
> >> [1] There is a long discussion about the issue with 13.4, the project
> was
> >> made to fix a DLL bottleneck.
> >>
> >> Why you not use 13.6?
> >
> >That other problem (and its fix) were in the windows build environment,
> and not
> >an issue in some postgres version.
> >Yeah, correct.
> >But I think that it was very clear in the other thread that version 13.4,
> >on Windows, may have a slowdown, because of the DLL problem.
> >So it would be better to use the latest available version
> >that has this specific fix and many others.
> >
> >regards,
> >Ranier Vilela
>
>
> OK, absolutely. I was thinking about even moving to 14. I know migrations
> within a release are painless, but my experience with upgrading across
> releases has also been quite good (short of bugs that were found of
> course). Any opinion on 14.2?
>
Of course, 14.2 would be better than 13.6, but there is a chance that upgrading
will not help with this specific problem.
And both 13.6 and 14.2 still suffer from a Windows-specific issue [1].
A fix has been proposed but has not yet been accepted.

But in general terms, you will benefit from adopting 14.2 for sure.

regards,
Ranier Vilela

[1]
https://www.postgresql.org/message-id/CAEudQAovOEM0haC4NbWZaYGW4ESmAE1j6_yr93tS8Xo8i7%2B54A%40mail.gmail.com


Re: Advice needed: query performance deteriorates by 2000% within 1 minute

2022-03-01 Thread Michael Lewis
If you expect to have high cache hits and/or have ssd or similar fast
storage, random page cost should be more like 1-2 rather than the default
4. When using jsonb, you'd normally have estimates based solely on the
constants for the associated datatype (1/3 or 2/3 for a nullable boolean
for instance, and I think half a percent for an int column) but because you
are using an index on a function, you should be getting custom stats
related to that. They just don't seem to be helping you a ton.

With gin indexes, there is also the pending list to consider. I haven't had
to deal with that much, but just know of it from the documentation.
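As an illustrative sketch of both suggestions (the index name is hypothetical), the cost setting can be tried per session before changing it globally, and the GIN pending list can be flushed or disabled explicitly:

```sql
-- Try a lower random_page_cost for this session only,
-- then re-run EXPLAIN ANALYZE to see whether the plan improves.
SET random_page_cost = 1.1;

-- Flush a GIN index's pending list by hand
-- (gin_clean_pending_list is in core since PostgreSQL 9.6).
SELECT gin_clean_pending_list('my_gin_index'::regclass);

-- Or trade some insert speed for predictable lookups
-- by disabling the pending list for that index.
ALTER INDEX my_gin_index SET (fastupdate = off);
```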


Re: Advice needed: query performance deteriorates by 2000% within 1 minute

2022-03-01 Thread Michael Lewis
And I would absolutely crank up autovacuum and analyze settings. Turn up
the cost limits, turn down the cost delays, decrease the scale factor.
Whatever you need to do such that autovacuum runs often. No need to
schedule a manual vacuum at all. Just don't wait until 20% of the table is
dead before an autovacuum is triggered like the default behavior. The cost
to gather new stats and do garbage collection is rather minimal compared to
the benefit to queries that rely on the data in many cases.
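A sketch of such per-table overrides (the table name is hypothetical, and the exact values depend on the workload): trigger autovacuum at roughly 1% dead rows instead of the 20% default, and let it run with a higher cost limit and shorter delay.

```sql
ALTER TABLE my_big_table SET (
  -- vacuum at ~1% dead rows instead of the 20% default
  autovacuum_vacuum_scale_factor = 0.01,
  -- refresh stats more often as well
  autovacuum_analyze_scale_factor = 0.005,
  -- raise the cost limit and lower the delay so each run finishes quickly
  autovacuum_vacuum_cost_limit = 2000,
  autovacuum_vacuum_cost_delay = 1
);
```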


RE: Simple task with partitioning which I can't realize

2022-03-01 Thread Michel SALAIS
From: Marc Millas 
Sent: Tuesday, March 1, 2022 19:00
To: Andrew Zakharov 
Cc: [email protected]
Subject: Re: Simple task with partitioning which I can't realize

 

Andrew,

 

contrary to Oracle, in postgres you can add the indexes and/or the constraints 
which are meaningful to you at partition level.

I was not saying NOT to create keys, but I was saying to create them at 
partition level.

 




Marc MILLAS

Senior Architect

+33607850334

  www.mokadb.com

 

 

 

On Tue, Mar 1, 2022 at 5:45 PM Andrew Zakharov <[email protected]> wrote:

Hi Marc –

Since there is a DWH fed by ETL, there is no risk of having the same gids in 
different region partitions. I considered a simple partitioned table without any 
keys, but I believed there was a solution with keys; that's why I'm seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas <[email protected]> 
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov <[email protected]>
Cc: [email protected]
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one 
region ?

if not (or if you implement it via a dedicated, non-partitioned table), 
 

you may create a simple table partitioned by region, and create unique indexes 
for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.
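A minimal sketch of that approach, based on Andrew's declaration (column list abbreviated; the self-referencing foreign key is omitted, since a foreign key referencing a partitioned table needs a unique constraint that includes the partition key):

```sql
create table region_hierarchy(
  gid uuid not null default uuid_generate_v1mc(),
  parent_gid uuid null,
  region_code int2 not null
) partition by list (region_code);

create table region_hierarchy_r77
  partition of region_hierarchy for values in (77);

-- uniqueness is enforced per partition, not globally
create unique index region_hierarchy_r77_gid_uidx
  on region_hierarchy_r77 (gid);
```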





 

 

 

_

Hi

To say it using Oracle vocabulary, PostgreSQL doesn’t offer GLOBAL INDEXES. 
Even when we create an index on the partitioned table, which is now possible, 
PostgreSQL creates LOCAL indexes on each partition separately.

There are no global indexes on partitioned tables in PostgreSQL, so it is not 
simple to enforce uniqueness at the global level using indexes. That is why it is 
required that partition key columns be part of the primary key AND of any other 
UNIQUE constraint.
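Concretely, the declarative-partitioning form PostgreSQL does accept for Andrew's table is a composite primary key that includes the partition column (a sketch, columns abbreviated):

```sql
create table region_hierarchy(
  gid uuid not null default uuid_generate_v1mc(),
  parent_gid uuid null,
  region_code int2 not null,
  -- the partition key column must be part of the primary key
  constraint pk_region_hierarchy primary key (gid, region_code)
) partition by list (region_code);
```

Since each list partition holds a single region_code, this still makes gid unique within every partition, though not across the whole table.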

 

Michel SALAIS