Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Hi all, especially Jim Jarvie,

I saw your email to me only now on my related issue. My issue remains this one ("Well this got interesting - the already moved error showed up"), and I have already gone through all that index pruning and all that good stuff. I remain with my original question from the 30th of June; to me it feels like a bug of some sort:

"tuple to be locked was already moved to another partition due to concurrent update"

This would not exactly look like a bug, because the message says "to be locked", so at least it's not allowing two workers to lock the same tuple. But it seems that the skip-locked mode should not make an error out of this, but treat it as if the tuple were already locked. Why would it want to lock the tuple (representing the job) if another worker has already finished its UPDATE of the job to mark it as "done" (which is what makes the tuple move to the "completed" partition)?

Either the SELECT for jobs to do returned a wrong tuple, which was already updated, or there is some lapse in the locking. Either way it would seem to be a waste of time throwing all these errors when the tuple should not even have been selected for update and locking. I wonder if anybody knows anything about that issue? Of course you'll want to see the DDL and SQL queries, etc., but you can't really try it out unless you do some massively parallel magic.

I still think that it should simply not happen. Don't waste time on old tuples trying to fetch and lock something that's no longer there. It's a waste of resources.

regards,
-Gunther

On 8/20/2020 6:39 PM, Jim Jarvie wrote:

On 20-Aug.-2020 17:42, Michael Lewis wrote:

Can you share an explain analyze for the query that does the select for update? I wouldn't assume that partition pruning is possible at all with hash, and it would be interesting to see how it is finding those rows.

Well this got interesting - the already moved error showed up. Note, the actual process partitions are regular table partitions; these are not hashed. Only the incoming and completed are hashed due to row counts at either end of the processing; in flight (where the issue shows up) is quite small:

[queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked;
ERROR: 40001: tuple to be locked was already moved to another partition due to concurrent update
LOCATION: heapam_tuple_lock, heapam_handler.c:405
Time: 579.131 ms

[queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked;
ERROR: 40001: tuple to be locked was already moved to another partition due to concurrent update
LOCATION: heapam_tuple_lock, heapam_handler.c:405
Time: 568.008 ms

[queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit (cost=0.00..25.71 rows=250 width=34) (actual time=1306.041..1306.338 rows=250 loops=1)
   ->  LockRows (cost=0.00..7934.38 rows=77150 width=34) (actual time=1306.040..1306.315 rows=250 loops=1)
         ->  Append (cost=0.00..7162.88 rows=77150 width=34) (actual time=520.685..1148.347 rows=31500 loops=1)
               ->  Seq Scan on queue_tx_active (cost=0.00..6764.50 rows=77000 width=34) (actual time=520.683..1145.258 rows=31500 loops=1)
                     Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
               ->  Seq Scan on queue_tx_fail_retryable (cost=0.00..12.62 rows=150 width=34) (never executed)
                     Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
 Planning Time: 0.274 ms
 Execution Time: 1306.380 ms
(9 rows)

Time: 1317.150 ms (00:01.317)
[queuedb] #
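Since the error is reported as SQLSTATE 40001 (the serialization_failure class), the usual client-side answer is simply to retry the dequeue when it fires. A minimal sketch of such a worker function, using the table and predicates from the query above; the function name, batch-size parameter, and retry policy are illustrative assumptions, not anything from this thread:

CREATE OR REPLACE FUNCTION mq.dequeue_tickets(batch int DEFAULT 250)
RETURNS SETOF mq.queue
LANGUAGE plpgsql AS $$
DECLARE
    attempts int := 0;
BEGIN
    LOOP
        attempts := attempts + 1;
        BEGIN
            RETURN QUERY
                SELECT *
                  FROM mq.queue
                 WHERE (state = 'tx_active' OR state = 'tx_fail_retryable')
                   AND txobject = 'ticket'
                 LIMIT batch
                 FOR UPDATE SKIP LOCKED;
            RETURN;
        EXCEPTION WHEN serialization_failure THEN  -- SQLSTATE 40001, as in the error above
            IF attempts >= 3 THEN
                RAISE;  -- give up after a few attempts and let the caller decide
            END IF;
        END;
    END LOOP;
END;
$$;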
Re: LIMIT OFFSET with DB view vs plain SQL
+ pgsql-performance

On Thu, Mar 28, 2019 at 6:41 PM Raj Gandhi wrote:
> Hi everyone,
>
> I'm using LIMIT offset with DB view. Looks like query planner is applying
> the LIMIT for DB view at the end after processing all rows.
>
> When running same SQL that was used to create the DB view, LIMIT is
> applied earlier so the query is much faster.
>
> Explain plan using DB view
> https://explain.depesz.com/s/gzjQ
>
> Explain plan using raw SQL
> https://explain.depesz.com/s/KgwO
>
> In both tests LIMIT was 100 with offset = 0.
>
> Is there any way to force DB view to apply limit earlier?
>
> Thanks,
> Raj
Re: LIMIT OFFSET with DB view vs plain SQL
Merlin, I tried the hack you suggested but that didn't work. Planner used
the same path.
The same query works much faster when using the raw SQL instead of DB view:
Here is the definition of DB View ‘job’
SELECT w.id,
w.parent_id,
w.status AS state,
w.percent_complete AS progress_percentage,
w.start_time,
w.end_time,
w.est_completion_time AS estimated_completion_time,
w.root_id,
w.internal AS is_internal,
w.order_id AS step_order,
c.resource_type,
c.resource_id,
c.id AS command_id,
c.client_cookie,
c.user_name AS "user",
c.metadata,
c.client_address,
response_body(r.*, w.*) AS response_body
FROM work_unit w
LEFT JOIN command c ON c.work_unit_id = w.id
LEFT JOIN command_response r ON r.command_id::text = c.id::text;
*Query that uses the DB view:*
SELECT id, start_time
FROM job
order by id LIMIT 101 OFFSET 0;
Explain plan: https://explain.depesz.com/s/gzjQ
*Query using the raw SQL*
ORDER BY id LIMIT 101 OFFSET 0;
Explain plan: https://explain.depesz.com/s/KgwO
On Fri, Mar 29, 2019 at 11:26 AM Merlin Moncure wrote:
> On Thu, Mar 28, 2019 at 5:44 PM Raj Gandhi wrote:
> >
> > + pgsql-performance
> >
> > On Thu, Mar 28, 2019 at 6:41 PM Raj Gandhi
> wrote:
> >>
> >> Hi everyone,
> >>
> >>
> >>
> >> I’m using LIMIT offset with DB view. Looks like query planner is
> applying the LIMIT for DB view at the end after processing all rows.
> >>
> >> When running same SQL that was used to create the DB view, LIMIT is
> applied earlier so the query is much faster.
> >>
> >>
> >>
> >> Explain plan using DB view
> >>
> >> https://explain.depesz.com/s/gzjQ
> >>
> >>
> >>
> >> Explain plan using raw SQL
> >>
> >> https://explain.depesz.com/s/KgwO
> >>
> >>
> >>
> >> In both tests LIMIT was 100 with offset = 0.
> >>
> >> Is there any way to force DB view to apply limit earlier?
>
> huh. OFFSET does indeed force a materialize plan. This is a widely
> used tactic to hack the planner ('OFFSET 0').
>
> Maybe try converting your query from something like:
>
> SELECT * FROM foo LIMIT m OFFSET N;
> to
> WITH data AS
> (
> SELECT * FROM foo LIMIT m + n
> )
> SELECT * FROM data OFFSET n;
>
> I didn't try this, and it may not help, but it's worth a shot.
>
> merlin
>
Re: LIMIT OFFSET with DB view vs plain SQL
Any other idea how to resolve the performance issue with the database view?
On Fri, Mar 29, 2019 at 7:38 PM Raj Gandhi wrote:
> Merlin, I tried the hack you suggested but that didn't work. Planner used
> the same path.
>
> The same query works much faster when using the raw SQL instead of DB
> view:
>
> Here is the definition of DB View ‘job’
>
> SELECT w.id,
>
> w.parent_id,
>
> w.status AS state,
>
> w.percent_complete AS progress_percentage,
>
> w.start_time,
>
> w.end_time,
>
> w.est_completion_time AS estimated_completion_time,
>
> w.root_id,
>
> w.internal AS is_internal,
>
> w.order_id AS step_order,
>
> c.resource_type,
>
> c.resource_id,
>
> c.id AS command_id,
>
> c.client_cookie,
>
> c.user_name AS "user",
>
> c.metadata,
>
> c.client_address,
>
> response_body(r.*, w.*) AS response_body
>
>FROM work_unit w
>
> LEFT JOIN command c ON c.work_unit_id = w.id
>
> LEFT JOIN command_response r ON r.command_id::text = c.id::text;
>
>
>
>
>
> *Query that uses the DB view:*
>
> SELECT id, start_time
>
> FROM job
>
> order by id LIMIT 101 OFFSET 0;
>
>
>
> Explain plan: https://explain.depesz.com/s/gzjQ
>
>
> *Query using the raw SQL*
>
>
>
> ORDER BY id LIMIT 101 OFFSET 0;
>
>
>
> Explain plan:https://explain.depesz.com/s/KgwO
>
>
>
>
>
> On Fri, Mar 29, 2019 at 11:26 AM Merlin Moncure
> wrote:
>
>> On Thu, Mar 28, 2019 at 5:44 PM Raj Gandhi wrote:
>> >
>> > + pgsql-performance
>> >
>> > On Thu, Mar 28, 2019 at 6:41 PM Raj Gandhi
>> wrote:
>> >>
>> >> Hi everyone,
>> >>
>> >>
>> >>
>> >> I’m using LIMIT offset with DB view. Looks like query planner is
>> applying the LIMIT for DB view at the end after processing all rows.
>> >>
>> >> When running same SQL that was used to create the DB view, LIMIT is
>> applied earlier so the query is much faster.
>> >>
>> >>
>> >>
>> >> Explain plan using DB view
>> >>
>> >> https://explain.depesz.com/s/gzjQ
>> >>
>> >>
>> >>
>> >> Explain plan using raw SQL
>> >>
>> >> https://explain.depesz.com/s/KgwO
>> >>
>> >>
>> >>
>> >> In both tests LIMIT was 100 with offset = 0.
>> >>
>> >> Is there any way to force DB view to apply limit earlier?
>>
>> huh. OFFSET does indeed force a materialize plan. This is a widely
>> used tactic to hack the planner ('OFFSET 0').
>>
>> Maybe try converting your query from something like:
>>
>> SELECT * FROM foo LIMIT m OFFSET N;
>> to
>> WITH data AS
>> (
>> SELECT * FROM foo LIMIT m + n
>> )
>> SELECT * FROM data OFFSET n;
>>
>> I didn't try this, and it may not help, but it's worth a shot.
>>
>> merlin
>>
>
Re: LIMIT OFFSET with DB view vs plain SQL
Thanks Rui. The performance of using function is close to the plain SQL.
Why is the query planner choosing a different path with the DB view?
explain analyze select foo(101,0);
                            QUERY PLAN
--------------------------------------------------------------------------------
 ProjectSet (cost=0.00..5.27 rows=1000 width=32) (actual time=10.340..10.374 rows=101 loops=1)
   ->  Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning time: 0.035 ms
 Execution time: 10.436 ms
(4 rows)
On Mon, Apr 1, 2019 at 4:14 PM Rui DeSousa wrote:
> Try using a function that returns the result set.
>
> i.e.
>
> create or replace function foo(_limit int, _offset int)
> returns setof sample_table
> as $$
> begin
> return query
> select *
> from sample_table
> order by created_date
> limit _limit
> offset _offset
> ;
> end;
> $$ language plpgsql
> volatile
> ;
>
>
> Given your query; return a table instead of a set. i.e.:
>
> returns table (
>id int
> , parent_id int
> .
> .
> .
> , response_body text
> )
> as $$
>
>
> Query example:
>
> select * from foo(100, 50);
>
>
> On Apr 1, 2019, at 9:56 AM, Raj Gandhi wrote:
>
> Any other idea how to resolve the performance issue with the database view?
>
> On Fri, Mar 29, 2019 at 7:38 PM Raj Gandhi wrote:
>
>> Merlin, I tried the hack you suggested but that didn't work. Planner
>> used the same path.
>>
>> The same query works much faster when using the raw SQL instead of DB
>> view:
>>
>> Here is the definition of DB View ‘job’
>> SELECT w.id,
>> w.parent_id,
>> w.status AS state,
>> w.percent_complete AS progress_percentage,
>> w.start_time,
>> w.end_time,
>> w.est_completion_time AS estimated_completion_time,
>> w.root_id,
>> w.internal AS is_internal,
>> w.order_id AS step_order,
>> c.resource_type,
>> c.resource_id,
>> c.id AS command_id,
>> c.client_cookie,
>> c.user_name AS "user",
>> c.metadata,
>> c.client_address,
>> response_body(r.*, w.*) AS response_body
>>FROM work_unit w
>> LEFT JOIN command c ON c.work_unit_id = w.id
>> LEFT JOIN command_response r ON r.command_id::text = c.id::text;
>>
>>
>>
>>
>> *Query that uses the DB view:*
>> SELECT id, start_time
>> FROM job
>> order by id LIMIT 101 OFFSET 0;
>>
>>
>> Explain plan: https://explain.depesz.com/s/gzjQ
>>
>> *Query using the raw SQL*
>>
>> ORDER BY id LIMIT 101 OFFSET 0;
>>
>>
>> Explain plan:https://explain.depesz.com/s/KgwO
>>
>>
>>
>>
>> On Fri, Mar 29, 2019 at 11:26 AM Merlin Moncure
>> wrote:
>>
>>> On Thu, Mar 28, 2019 at 5:44 PM Raj Gandhi
>>> wrote:
>>> >
>>> > + pgsql-performance
>>> >
>>> > On Thu, Mar 28, 2019 at 6:41 PM Raj Gandhi
>>> wrote:
>>> >>
>>> >> Hi everyone,
>>> >>
>>> >>
>>> >>
>>> >> I’m using LIMIT offset with DB view. Looks like query planner is
>>> applying the LIMIT for DB view at the end after processing all rows.
>>> >>
>>> >> When running same SQL that was used to create the DB view, LIMIT is
>>> applied earlier so the query is much faster.
>>> >>
>>> >>
>>> >>
>>> >> Explain plan using DB view
>>> >>
>>> >> https://explain.depesz.com/s/gzjQ
>>> >>
>>> >>
>>> >>
>>> >> Explain plan using raw SQL
>>> >>
>>> >> https://explain.depesz.com/s/KgwO
>>> >>
>>> >>
>>> >>
>>> >> In both tests LIMIT was 100 with offset = 0.
>>> >>
>>> >> Is there any way to force DB view to apply limit earlier?
>>>
>>> huh. OFFSET does indeed force a materialize plan. This is a widely
>>> used tactic to hack the planner ('OFFSET 0').
>>>
>>> Maybe try converting your query from something like:
>>>
>>> SELECT * FROM foo LIMIT m OFFSET N;
>>> to
>>> WITH data AS
>>> (
>>> SELECT * FROM foo LIMIT m + n
>>> )
>>> SELECT * FROM data OFFSET n;
>>>
>>> I didn't try this, and it may not help, but it's worth a shot.
>>>
>>> merlin
>>>
>>
>
Re: Slow query on V12.
Hi,

Can you check by running VACUUM ANALYZE on the database, and then run the query again? Remember: don't use VACUUM FULL.

On Tue, 24 Sep 2019, 12:07 am Luís Roberto Weck, <[email protected]> wrote:
> Hi!
>
> Recently I've been looking for bloat in my databases and found a query to
> show which tables are more bloated and by how much.
>
> This is the explain plan on v12.3: https://explain.depesz.com/s/8dW8C
> And this is with v11: https://explain.depesz.com/s/diXY
>
> Both databases have approx. the same size and have the same schema, but on
> v12 the query takes much longer to run.
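For reference, a minimal sketch of the suggested maintenance; the table name below is a placeholder, not one from this thread:

-- Reclaim dead tuples and refresh planner statistics without the exclusive
-- lock that VACUUM FULL would take; the table name here is hypothetical.
VACUUM (ANALYZE, VERBOSE) public.some_bloated_table;

-- Or run it database-wide:
VACUUM ANALYZE;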
Query went slow all of sudden. ON V 11.3
Hi All,

All of a sudden the query went slow. Before, the query was executing in 30-35 seconds; now, even after 30 minutes, I am not getting any result. Later I dropped a table (t_meners) and recreated it, and again it started working very fast.

Is there a way to find out what happened there, and whether there is any issue in the table? I have the same issue on the other databases also, so I could check on those too.

SELECT ((UID-1)/1) AS BatchNo, *
INTO "temp_tt1"
FROM (SELECT ROW_NUMBER() OVER (ORDER BY a."rno") AS UID, a.*
      FROM "temp_10032019020721_4470" AS a
      INNER JOIN "t_ages" AS b ON LOWER(a."cr") = LOWER(b."c_pagealias")
      LEFT JOIN "t_meners" AS c ON LOWER(a."cr") = LOWER(c."c_id")
      WHERE c."c_id" IS NULL
     ) AS TempTable
Re: Query went slow all of sudden. ON V 11.3
Hi Justin,

It has been executing for 35+ minutes; due to the statement timeout it gets cancelled. Yes, on the temp_10032019020721_4470 table there is an index on the cr column.

On Fri, Oct 4, 2019 at 6:50 PM Justin Pryzby wrote:
> On Fri, Oct 04, 2019 at 03:52:26PM +0530, nikhil raj wrote:
> > Hi All,
> >
> > All of sudden the query went slow before the query was executing in 30-35
> > sec now even after 30 mins i am not getting any result.
>
> Can you show "explain(analyze,buffers)" when it's running fast, and at least
> "explain" when it's slow ?
>
> > later I have dropped a table ( t_meners) and recreated it and again it
> > started working very fast.
>
> What indexes exist on that table and on temp_10032019020721_4470 ?
>
> Justin
Re: DB running out of memory issues after upgrade
Below are the same configurations in the .conf file before and after the upgrade:

show max_connections = 1743
show shared_buffers = "4057840kB"
show effective_cache_size = "8115688kB"
show maintenance_work_mem = "259MB"
show checkpoint_completion_target = "0.9"
show wal_buffers = "16MB"
show default_statistics_target = "100"
show random_page_cost = "1.1"
show effective_io_concurrency = "200"
show work_mem = "4MB"
show min_wal_size = "256MB"
show max_wal_size = "2GB"
show max_worker_processes = "8"
show max_parallel_workers_per_gather = "2"

Here are some sys logs:

2020-02-16 21:01:17 UTC [-]The database process was killed by the OS due to excessive memory consumption.
2020-02-16 13:41:16 UTC [-]The database process was killed by the OS due to excessive memory consumption.

I identified one simple select which is consuming more memory; here is the query plan:

"Result (cost=0.00..94891854.11 rows=3160784900 width=288)"
"  ->  Append (cost=0.00..47480080.61 rows=3160784900 width=288)"
"        ->  Seq Scan on msghist (cost=0.00..15682777.12 rows=312949 width=288)"
"              Filter: (((data -> 'info'::text) ->> 'status'::text) = 'CLOSE'::text)"
"        ->  Seq Scan on msghist msghist_1 (cost=0.00..189454.50 rows=31294900 width=288)"
"              Filter: (((data -> 'info'::text) ->> 'status'::text) = 'CLOSE'::text)"

Thanks,

On Tuesday, February 18, 2020, 09:59:37 AM PST, Tomas Vondra wrote:

On Tue, Feb 18, 2020 at 05:46:28PM +, Nagaraj Raj wrote:
>after upgrade Postgres to v9.6.11 from v9.6.9 DB running out of memory issues
>no world load has changed before and after upgrade.
>
>spec: RAM 16gb,4vCore
>Any bug reported like this or suggestions on how to fix this issue? I
>appreciate the response..!!

This bug report (in fact, we don't know if it's a bug, but OK) is woefully incomplete :-(

The server log is mostly useless, unfortunately - it just says a bunch of processes were killed (by OOM killer, most likely) so the server has to restart. It tells us nothing about why the backends consumed so much memory etc.

What would help us is knowing how much memory was the backend (killed by OOM) consuming, which should be in dmesg.

And then MemoryContextStats output - you need to connect to a backend consuming a lot of memory using gdb (before it gets killed) and do

(gdb) p MemoryContextStats(TopMemoryContext)
(gdb) q

and show us the output printed into server log. If it's a backend running a query, it'd help knowing the execution plan. It would also help knowing the non-default configuration, i.e. stuff tweaked in postgresql.conf.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: DB running out of memory issues after upgrade
Hi Merlin,

It is configured with a high value for max_connections, but active and idle sessions have never crossed a count of 50.

DB size: 20 GB
Table size: 30 MB
RAM: 16 GB
vCore: 4

Yes, it's the view I posted earlier, and here is the query plan for the new actual view:

"Append (cost=0.00..47979735.57 rows=3194327000 width=288)"
"  ->  Seq Scan on msghist (cost=0.00..15847101.30 rows=316270 width=288)"
"  ->  Seq Scan on msghist msghist_1 (cost=0.00..189364.27 rows=31627000 width=288)"

Thanks,
Rj

On Tuesday, February 18, 2020, 10:51:02 AM PST, Merlin Moncure wrote:

On Tue, Feb 18, 2020 at 12:40 PM Justin Pryzby wrote:
> This is almost certainly unrelated. It looks like that query did a seq scan
> and accessed a large number of tuples (and pages from "shared_buffers"), which
> the OS then shows as part of that process's memory, even though *shared*
> buffers are not specific to that one process.

Yeah. This server looks highly overprovisioned; I'm particularly suspicious of the high max_connections setting. To fetch this out I'd be tracking connections in the database, both idle and not idle, continuously. The solution is most likely to install a connection pooler such as pgbouncer.

merlin
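For the "tracking connections, both idle and not idle, continuously" part, a minimal sketch that can be run periodically (plain pg_stat_activity, which works on 9.6; the pgbouncer installation itself is outside SQL):

-- Snapshot of session counts per database and state; if the totals stay far
-- below max_connections=1743, a pooler with a small server pool is safe.
SELECT datname, state, count(*)
FROM pg_stat_activity
GROUP BY datname, state
ORDER BY count(*) DESC;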
Performance issue
efault" NOT NULL, orphan_ind character(1) COLLATE pg_catalog."default" NOT NULL, CONSTRAINT pk_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id) ); User query: explain analyze select COUNT(*) as "DII_1" from ( select distinct table0."individual_entity_proxy_id" as "INDIVIDUAL_ENTITY_PROXY_ID" from test1 table0 inner join test2 table1 on table0."individual_entity_proxy_id" = table1."individual_entity_proxy_id" where ((table0."shared_paddr_with_customer_ind" = 'N') and (table0."profane_wrd_ind" = 'N') and (table0."tmo_ofnsv_name_ind" = 'N') and ((table0."last_contacted_dm_datetime" is null) or (table0."last_contacted_dm_datetime" < TIMESTAMP '2020-03-15 0:00:00.00')) and (table0."has_individual_address" = 'Y') and (table0."has_last_name" = 'Y') and (table0."has_first_name" = 'Y') and (table0."direct_mail_preference" is null)) and ((table1."tax_bnkrpt_dcsd_ind" = 'N') and (table1."cstmr_prspct_ind" = 'Prospect') and (table1."govt_prison_ind" = 'N')) ) TXT_1; Explain Analyze : "Aggregate (cost=5345632.91..5345632.92 rows=1 width=8) (actual time=442688.462..442688.462 rows=1 loops=1)" " -> Unique (cost=150.13..4943749.39 rows=32150682 width=8) (actual time=0.022..439964.214 rows=32368180 loops=1)" "-> Merge Join (cost=150.13..4863372.68 rows=32150682 width=8) (actual time=0.021..435818.276 rows=32368180 loops=1)" " Merge Cond: (table0.individual_entity_proxy_id = table1.individual_entity_proxy_id)" " -> Index Scan using indx_prospect_indv_entty_id on test1 table0 (cost=0.56..2493461.92 rows=32233405 width=8) (actual time=0.011..63009.551 rows=32368180 loops=1)" "Filter: ((direct_mail_preference IS NULL) AND ((last_contacted_dm_datetime IS NULL) OR (last_contacted_dm_datetime < '2020-03-15 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind = 'N'::bpchar) AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address = 'Y'::bpchar) AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))" "Rows Removed by Filter: 7709177" " -> Index Scan using pk_entity_proxy_id on test2 table1 (cost=0.56..1867677.94 rows=40071417 width=8) (actual time=0.008..363534.437 rows=40077727 loops=1)" "Filter: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (govt_prison_ind = 'N'::bpchar) AND ((cstmr_prspct_ind)::text = 'Prospect'::text))" "Rows Removed by Filter: 94756" "Planning time: 0.400 ms" "Execution time: 442688.523 ms" Server config: PostgreSQL v10.11 RAM: 380GB vCore: 32 Shared_buffers: 65GB work_mem:104857kB maintenance_work_mem:256MB effective_cache_size: 160GB https://dba.stackexchange.com/questions/269138/postgresql-server-running-very-slow-at-minimal-work-load Thanks, Raj
simple query running for ever
I wrote a simple query, and it is taking too long; I'm not sure what is wrong with it. It's not even giving EXPLAIN ANALYZE output.

select T0."physical_address_sid", T0."individual_entity_proxy_id", T2."infrrd_hh_rank_nbr"
from "cms_prospects".PROSPECT T0
--inner join "sas_prs_tmp".DEDUPE3583E3F18 T1 on T0."individual_entity_proxy_id" = T1."individual_entity_proxy_id"
left join "cms_prospects".INDIVIDUAL_DEMOGRAPHIC T2 on T0."individual_entity_proxy_id" = T2."individual_entity_proxy_id"

"Merge Left Join (cost=55.96..18147747.08 rows=213620928 width=20)"
"  Merge Cond: (t0.individual_entity_proxy_id = t2.individual_entity_proxy_id)"
"  ->  Index Scan using pk_prospect on prospect t0 (cost=0.57..10831606.89 rows=213620928 width=16)"
"  ->  Index Only Scan using indxp_individual_demo_infrrd_hh_rank_nbr on individual_demographic t2 (cost=0.57..5013756.93 rows=260652064 width=12)"

Any suggestions or help would be highly appreciated.

Best regards,
Rj
Re: simple query running for ever
Hi Justin,

My apologies, I missed that. Yes, I changed work_mem to 2GB but didn't see any difference. So, per your suggestion, I removed the DISTINCT on the PK and added a multi-column index so the query planner could do an index-only scan; that fixed the issue and the query completed in 1 minute.

Best regards,
Rj

On Tuesday, June 16, 2020, 01:47:21 PM PDT, Justin Pryzby wrote:

On Tue, Jun 16, 2020 at 08:35:31PM +, Nagaraj Raj wrote:
> I wrote a simple query, and it is taking too long, not sure what is wrong in
> it, even its not giving EXPLAIN ANALYZE.

Is this related to last week's question ?
https://www.postgresql.org/message-id/1211705382.726951.1592174752720%40mail.yahoo.com
Was that issue resolved ? I didn't see answers to a few questions I asked there.

> select T0."physical_address_sid", T0."individual_entity_proxy_id",
> T2."infrrd_hh_rank_nbr"
> from "cms_prospects".PROSPECT T0
> --inner join "sas_prs_tmp".DEDUPE3583E3F18 T1 on
> T0."individual_entity_proxy_id" = T1."individual_entity_proxy_id"
> left join "cms_prospects".INDIVIDUAL_DEMOGRAPHIC T2 on
> T0."individual_entity_proxy_id" = T2."individual_entity_proxy_id"
>
> "Merge Left Join (cost=55.96..18147747.08 rows=213620928 width=20)"
> "  Merge Cond: (t0.individual_entity_proxy_id = t2.individual_entity_proxy_id)"
> "  ->  Index Scan using pk_prospect on prospect t0 (cost=0.57..10831606.89 rows=213620928 width=16)"
> "  ->  Index Only Scan using indxp_individual_demo_infrrd_hh_rank_nbr on individual_demographic t2 (cost=0.57..5013756.93 rows=260652064 width=12)"
>
> Any suggestions or help would be highly appreciated.
>
> Best regards,
> Rj
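The exact index the poster created isn't shown in the thread; a sketch of the kind of multi-column index described, with column names taken from the query and a made-up index name:

-- Hypothetical covering index: the left join can probe individual_demographic
-- by its join key and return infrrd_hh_rank_nbr straight from the index
-- (an index-only scan), instead of visiting the heap.
CREATE INDEX individual_demographic_proxy_rank_idx
    ON cms_prospects.individual_demographic (individual_entity_proxy_id, infrrd_hh_rank_nbr);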
Re: simple query running for ever
_code character varying(40) COLLATE pg_catalog."default",
    taco_bell_diner_propnsty_code character varying(40) COLLATE pg_catalog."default",
    auto_insrnc_byr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    load_dttm timestamp without time zone NOT NULL,
    updt_dttm timestamp without time zone,
    md5_chk_sum character varying(200) COLLATE pg_catalog."default",
    deld_from_src_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    orphan_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT pk_individual_demographic PRIMARY KEY (individual_entity_proxy_id)
);

Server config:
PostgreSQL v10.11
RAM: 380GB
vCore: 32
shared_buffers: 65GB
work_mem: 104857kB
maintenance_work_mem: 256MB
effective_cache_size: 160GB

On Tuesday, June 16, 2020, 01:44:09 PM PDT, Michael Lewis wrote:

On Tue, Jun 16, 2020 at 2:35 PM Nagaraj Raj wrote:

I wrote a simple query, and it is taking too long, not sure what is wrong in it, even its not giving EXPLAIN ANALYZE.

More context is needed. Please review-
https://wiki.postgresql.org/wiki/Slow_Query_Questions
Re: simple query running for ever
And here is the explain analyze: https://explain.depesz.com/s/uQGA

Thanks!

On Tuesday, June 16, 2020, 02:13:37 PM PDT, Nagaraj Raj wrote:

Hi Michael,

Sorry, I missed the table structure.

explain
select T0."physical_address_sid", T0."individual_entity_proxy_id", T2."infrrd_hh_rank_nbr"
from "cms_prospects".PROSPECT T0
inner join public.t1680035748gcccqqdpmrblxp33_bkp T1 on T0."individual_entity_proxy_id" = T1."individual_entity_proxy_id"
left join "cms_prospects".INDIVIDUAL_DEMOGRAPHIC T2 on T0."individual_entity_proxy_id" = T2."individual_entity_proxy_id";

"Hash Join (cost=1417.48..21353422.52 rows=213620928 width=20)"
"  Hash Cond: ((t0.individual_entity_proxy_id)::numeric = t1.individual_entity_proxy_id)"
"  ->  Merge Left Join (cost=55.96..18147747.08 rows=213620928 width=20)"
"        Merge Cond: (t0.individual_entity_proxy_id = t2.individual_entity_proxy_id)"
"        ->  Index Scan using pk_prospect on prospect t0 (cost=0.57..10831606.89 rows=213620928 width=16)"
"        ->  Index Only Scan using indxp_individual_demo_infrrd_hh_rank_nbr on individual_demographic t2 (cost=0.57..5013756.93 rows=260652064 width=12)"
"  ->  Hash (cost=741.79..741.79 rows=49579 width=8)"
"        ->  Seq Scan on t1680035748gcccqqdpmrblxp33_bkp t1 (cost=0.00..741.79 rows=49579 width=8)"

--T0
CREATE TABLE cms_prospects.prospect
(
    individual_entity_proxy_id bigint NOT NULL, household_entity_proxy_id bigint,
    individual_personal_link_sid bigint NOT NULL,
    city_name character varying(100) COLLATE pg_catalog."default",
    state_prov_cd character varying(40) COLLATE pg_catalog."default",
    pstl_code character varying(40) COLLATE pg_catalog."default",
    npa integer, nxx integer,
    email_domain character varying(400) COLLATE pg_catalog."default",
    email_preference character varying(40) COLLATE pg_catalog."default",
    direct_mail_preference character varying(40) COLLATE pg_catalog."default",
    profane_wrd_ind character(1) COLLATE pg_catalog."default",
    tmo_ofnsv_name_ind character(1) COLLATE pg_catalog."default",
    census_block_id character varying(40) COLLATE pg_catalog."default",
    has_first_name character(1) COLLATE pg_catalog."default",
    has_middle_name character(1) COLLATE pg_catalog."default",
    has_last_name character(1) COLLATE pg_catalog."default",
    has_email_address character(1) COLLATE pg_catalog."default",
    has_individual_address character(1) COLLATE pg_catalog."default",
    email_address_sid bigint, person_name_sid bigint, physical_address_sid bigint, telephone_number_sid bigint,
    last_contacted_email_datetime timestamp without time zone,
    last_contacted_dm_datetime timestamp without time zone,
    last_contacted_digital_datetime timestamp without time zone,
    last_contacted_anychannel_dttm timestamp without time zone,
    hard_bounce_ind integer,
    closest_store_site_id1 character varying(40) COLLATE pg_catalog."default",
    distance_1 numeric(5,2),
    load_dttm timestamp without time zone NOT NULL,
    updt_dttm timestamp without time zone,
    md5_chk_sum character varying(200) COLLATE pg_catalog."default",
    deld_from_src_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    orphan_ind character(1) COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT pk_prospect PRIMARY KEY (individual_entity_proxy_id)
);

--T1
CREATE TABLE public.t1680035748gcccqqdpmrblxp33_bkp
(
    individual_entity_proxy_id numeric(20,0)
);

-- T2
CREATE TABLE cms_prospects.individual_demographic
(
    individual_entity_proxy_id bigint NOT NULL,
    cstmr_prspct_ind character varying(40) COLLATE pg_catalog."default",
    last_appnd_dttm timestamp without time zone,
    last_sprsn_dttm timestamp without time zone,
    infrrd_gender_code character varying(40) COLLATE pg_catalog."default",
    govt_prison_ind character(1) COLLATE pg_catalog."default",
    tax_bnkrpt_dcsd_ind character(1) COLLATE pg_catalog."default",
    underbank_rank_nbr integer, hvy_txn_rank_nbr integer, prominence_nbr integer,
    ocptn_code character varying(40) COLLATE pg_catalog."default",
    educ_lvl_nbr integer,
    gender_code character varying(40) COLLATE pg_catalog."default",
    infrrd_hh_rank_nbr integer, econmc_stable_nbr integer,
    directv_sbscrbr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    amazon_prm_propnsty_code character varying(40) COLLATE pg_catalog."default",
    iphone_user_propnsty_code character varying(40) COLLATE pg_catalog."default",
    smrt_hm_devc_propnsty_code character varying(40) COLLATE pg_catalog."default",
    dog_ownr_propnsty_code character varying(40) COLLATE pg_catalog."default",
    cat_ownr_propnsty_code character varying(40)
Query performance issue
I have a query which runs on the DB quite often and is very slow; it is doing a 'seqscan'. I was trying to optimize it by adding indexes in different ways, but nothing helps. Any suggestions?

Query:

EXPLAIN ANALYZE
select serial_no,receivingplant,sku,r3_eventtime
from (select serial_no,receivingplant,sku,eventtime as r3_eventtime,
             row_number() over (partition by serial_no order by eventtime desc) as mpos
      from receiving_item_delivered_received
      where eventtype='LineItemdetailsReceived'
        and replenishmenttype = 'DC2SWARRANTY'
        and coalesce(serial_no,'') <> ''
     ) Rec
where mpos = 1;

Query plan:

"Subquery Scan on rec (cost=70835.30..82275.49 rows=1760 width=39) (actual time=2322.999..3451.783 rows=333451 loops=1)"
"  Filter: (rec.mpos = 1)"
"  Rows Removed by Filter: 19900"
"  ->  WindowAgg (cost=70835.30..77875.42 rows=352006 width=47) (actual time=2322.997..3414.384 rows=353351 loops=1)"
"        ->  Sort (cost=70835.30..71715.31 rows=352006 width=39) (actual time=2322.983..3190.090 rows=353351 loops=1)"
"              Sort Key: receiving_item_delivered_received.serial_no, receiving_item_delivered_received.eventtime DESC"
"              Sort Method: external merge Disk: 17424kB"
"              ->  Seq Scan on receiving_item_delivered_received (cost=0.00..28777.82 rows=352006 width=39) (actual time=0.011..184.677 rows=353351 loops=1)"
"                    Filter: (((COALESCE(serial_no, ''::character varying))::text <> ''::text) AND ((eventtype)::text = 'LineItemdetailsReceived'::text) AND ((replenishmenttype)::text = 'DC2SWARRANTY'::text))"
"                    Rows Removed by Filter: 55953"
"Planning Time: 0.197 ms"
"Execution Time: 3466.985 ms"

Table DDL:

CREATE TABLE receiving_item_delivered_received
(
    load_dttm timestamp with time zone, iamuniqueid character varying(200), batchid character varying(200),
    eventid character varying(200), eventtype character varying(200), eventversion character varying(200),
    eventtime timestamp with time zone, eventproducerid character varying(200), deliverynumber character varying(200),
    activityid character varying(200), applicationid character varying(200), channelid character varying(200),
    interactionid character varying(200), sessionid character varying(200), receivingplant character varying(200),
    deliverydate date, shipmentdate date, shippingpoint character varying(200),
    replenishmenttype character varying(200), numberofpackages character varying(200), carrier_id character varying(200),
    carrier_name character varying(200), billoflading character varying(200), pro_no character varying(200),
    partner_id character varying(200), deliveryitem character varying(200), ponumber character varying(200),
    poitem character varying(200), tracking_no character varying(200), serial_no character varying(200),
    sto_no character varying(200), sim_no character varying(200), sku character varying(200),
    quantity numeric(15,2), uom character varying(200)
);

-- Index: receiving_item_delivered_rece_eventtype_replenishmenttype_c_idx
-- DROP INDEX receiving_item_delivered_rece_eventtype_replenishmenttype_c_idx;
CREATE INDEX receiving_item_delivered_rece_eventtype_replenishmenttype_c_idx
    ON receiving_item_delivered_received USING btree
    (eventtype, replenishmenttype, COALESCE(serial_no, ''::character varying));

-- Index: receiving_item_delivered_rece_serial_no_eventtype_replenish_idx
-- DROP INDEX receiving_item_delivered_rece_serial_no_eventtype_replenish_idx;
CREATE INDEX receiving_item_delivered_rece_serial_no_eventtype_replenish_idx
    ON receiving_item_delivered_received USING btree
    (serial_no, eventtype, replenishmenttype)
    WHERE eventtype::text = 'LineItemdetailsReceived'::text
      AND replenishmenttype::text = 'DC2SWARRANTY'::text
      AND COALESCE(serial_no, ''::character varying)::text <> ''::text;

-- Index: receiving_item_delivered_recei_eventtype_replenishmenttype_idx1
-- DROP INDEX receiving_item_delivered_recei_eventtype_replenishmenttype_idx1;
CREATE INDEX receiving_item_delivered_recei_eventtype_replenishmenttype_idx1
    ON receiving_item_delivered_received USING btree
    (eventtype, replenishmenttype)
    WHERE eventtype::text = 'LineItemdetailsReceived'::text
      AND replenishmenttype::text = 'DC2SWARRANTY'::text;

-- Index: receiving_item_delivered_receiv_eventtype_replenishmenttype_idx
-- DROP INDEX receiving_item_delivered_receiv_eventtype_replenishmenttype_idx;
CREATE INDEX receiving_item_delivered_receiv_eventtype_replenishmenttype_idx
    ON receiving_item_delivered_received USING btree
    (eventtype, replenishmenttype);

-- Index: receiving_item_delivered_received_eventtype_idx
-- DROP INDEX receiving_item_delivered_received_eventtype_idx;
CREATE INDEX receiving_item_delivered_received_eventtype_idx
    ON receiving_item_delivered_received USIN
Re: Query performance issue
Query plan: https://explain.depesz.com/s/SPJe

On Friday, September 4, 2020, 02:19:06 PM PDT, Nagaraj Raj wrote:

I have a query which runs on the DB quite often and is very slow; it is doing a 'seqscan'. I was trying to optimize it by adding indexes in different ways, but nothing helps. Any suggestions?

Query:

EXPLAIN ANALYZE
select serial_no,receivingplant,sku,r3_eventtime
from (select serial_no,receivingplant,sku,eventtime as r3_eventtime,
             row_number() over (partition by serial_no order by eventtime desc) as mpos
      from receiving_item_delivered_received
      where eventtype='LineItemdetailsReceived'
        and replenishmenttype = 'DC2SWARRANTY'
        and coalesce(serial_no,'') <> ''
     ) Rec
where mpos = 1;

Query plan:

"Subquery Scan on rec (cost=70835.30..82275.49 rows=1760 width=39) (actual time=2322.999..3451.783 rows=333451 loops=1)"
"  Filter: (rec.mpos = 1)"
"  Rows Removed by Filter: 19900"
"  ->  WindowAgg (cost=70835.30..77875.42 rows=352006 width=47) (actual time=2322.997..3414.384 rows=353351 loops=1)"
"        ->  Sort (cost=70835.30..71715.31 rows=352006 width=39) (actual time=2322.983..3190.090 rows=353351 loops=1)"
"              Sort Key: receiving_item_delivered_received.serial_no, receiving_item_delivered_received.eventtime DESC"
"              Sort Method: external merge Disk: 17424kB"
"              ->  Seq Scan on receiving_item_delivered_received (cost=0.00..28777.82 rows=352006 width=39) (actual time=0.011..184.677 rows=353351 loops=1)"
"                    Filter: (((COALESCE(serial_no, ''::character varying))::text <> ''::text) AND ((eventtype)::text = 'LineItemdetailsReceived'::text) AND ((replenishmenttype)::text = 'DC2SWARRANTY'::text))"
"                    Rows Removed by Filter: 55953"
"Planning Time: 0.197 ms"
"Execution Time: 3466.985 ms"

Table DDL:

CREATE TABLE receiving_item_delivered_received
(
    load_dttm timestamp with time zone, iamuniqueid character varying(200), batchid character varying(200),
    eventid character varying(200), eventtype character varying(200), eventversion character varying(200),
    eventtime timestamp with time zone, eventproducerid character varying(200), deliverynumber character varying(200),
    activityid character varying(200), applicationid character varying(200), channelid character varying(200),
    interactionid character varying(200), sessionid character varying(200), receivingplant character varying(200),
    deliverydate date, shipmentdate date, shippingpoint character varying(200),
    replenishmenttype character varying(200), numberofpackages character varying(200), carrier_id character varying(200),
    carrier_name character varying(200), billoflading character varying(200), pro_no character varying(200),
    partner_id character varying(200), deliveryitem character varying(200), ponumber character varying(200),
    poitem character varying(200), tracking_no character varying(200), serial_no character varying(200),
    sto_no character varying(200), sim_no character varying(200), sku character varying(200),
    quantity numeric(15,2), uom character varying(200)
);

-- Index: receiving_item_delivered_rece_eventtype_replenishmenttype_c_idx
-- DROP INDEX receiving_item_delivered_rece_eventtype_replenishmenttype_c_idx;
CREATE INDEX receiving_item_delivered_rece_eventtype_replenishmenttype_c_idx
    ON receiving_item_delivered_received USING btree
    (eventtype, replenishmenttype, COALESCE(serial_no, ''::character varying));

-- Index: receiving_item_delivered_rece_serial_no_eventtype_replenish_idx
-- DROP INDEX receiving_item_delivered_rece_serial_no_eventtype_replenish_idx;
CREATE INDEX receiving_item_delivered_rece_serial_no_eventtype_replenish_idx
    ON receiving_item_delivered_received USING btree
    (serial_no, eventtype, replenishmenttype)
    WHERE eventtype::text = 'LineItemdetailsReceived'::text
      AND replenishmenttype::text = 'DC2SWARRANTY'::text
      AND COALESCE(serial_no, ''::character varying)::text <> ''::text;

-- Index: receiving_item_delivered_recei_eventtype_replenishmenttype_idx1
-- DROP INDEX receiving_item_delivered_recei_eventtype_replenishmenttype_idx1;
CREATE INDEX receiving_item_delivered_recei_eventtype_replenishmenttype_idx1
    ON receiving_item_delivered_received USING btree
    (eventtype, replenishmenttype)
    WHERE eventtype::text = 'LineItemdetailsReceived'::text
      AND replenishmenttype::text = 'DC2SWARRANTY'::text;

-- Index: receiving_item_de
Re: Query performance issue
Sorry, I had attached the wrong query plan, which was executed in a lower environment with fewer resources. Updated one: https://explain.depesz.com/s/eVFiF

Thanks,
Rj

On Friday, September 4, 2020, 02:39:57 PM PDT, Michael Lewis wrote:

CREATE INDEX receiving_item_delivered_received ON receiving_item_delivered_received USING btree (eventtype, replenishmenttype, serial_no, eventtime DESC);

More work_mem as Tomas suggests, but also, the above index should find the candidate rows by the first two keys, and then be able to skip the sort by reading just that portion of the index that matches eventtype='LineItemdetailsReceived' and replenishmenttype = 'DC2SWARRANTY'
Re: Query performance issue
Hi Michael,

I added the index as you suggested and the planner is going through a bitmap index scan and heap scan. The new plan is: https://explain.depesz.com/s/HaOx

Mem config:
Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit
vCPU = 64
RAM = 512
show shared_buffers = 355 GB
show work_mem = 214 MB
show maintenance_work_mem = 8363MB
show effective_cache_size = 355 GB

Thanks,
Rj

On Friday, September 4, 2020, 02:55:50 PM PDT, Michael Lewis wrote:

"Subquery Scan on rec (cost=1628601.89..1676580.92 rows=7381 width=41) (actual time=22171.986..23549.079 rows=1236042 loops=1)"
"  Filter: (rec.mpos = 1)"
"  Rows Removed by Filter: 228737"
"  Buffers: shared hit=45 read=1166951"
"  I/O Timings: read=29.530"
"  ->  WindowAgg (cost=1628601.89..1658127.45 rows=1476278 width=49) (actual time=22171.983..23379.219 rows=1464779 loops=1)"
"        Buffers: shared hit=45 read=1166951"
"        I/O Timings: read=29.530"
"        ->  Sort (cost=1628601.89..1632292.58 rows=1476278 width=41) (actual time=22171.963..22484.044 rows=1464779 loops=1)"
"              Sort Key: receiving_item_delivered_received.serial_no, receiving_item_delivered_received.eventtime DESC"
"              Sort Method: quicksort Memory: 163589kB"
"              Buffers: shared hit=45 read=1166951"
"              I/O Timings: read=29.530"
"              ->  Gather (cost=1000.00..1477331.13 rows=1476278 width=41) (actual time=1.296..10428.060 rows=1464779 loops=1)"
"                    Workers Planned: 2"
"                    Workers Launched: 2"
"                    Buffers: shared hit=39 read=1166951"
"                    I/O Timings: read=29.530"
"                    ->  Parallel Seq Scan on receiving_item_delivered_received (cost=0.00..1328703.33 rows=615116 width=41) (actual time=1.262..10150.325 rows=488260 loops=3)"
"                          Filter: (((COALESCE(serial_no, ''::character varying))::text <> ''::text) AND ((eventtype)::text = 'LineItemdetailsReceived'::text) AND ((replenishmenttype)::text = 'DC2SWARRANTY'::text))"
"                          Rows Removed by Filter: 6906258"
"                          Buffers: shared hit=39 read=1166951"
"                          I/O Timings: read=29.530"
"Planning Time: 0.375 ms"
"Execution Time: 23617.348 ms"

That is doing a lot of reading from disk. What do you have shared_buffers set to? I'd expect better cache hits unless it is quite low or this is a query that differs greatly from the typical work. Also, did you try adding the index I suggested? That lowest node has 488k rows coming out of it after throwing away 6.9 million. I would expect an index on only eventtype, replenishmenttype to be quite helpful. I don't assume you have tons of rows where serial_no is null.
Re: Query performance issue
Hi Michael,

I created an index as suggested and it improved. I tried with a partial index, but the planner is not using it. Also, there is no difference even with timing OFF.

ktbv (optimization for plan HaOx): https://explain.depesz.com/s/ktbv

Thanks,
Rj

On Saturday, September 5, 2020, 06:42:31 AM PDT, Michael Lewis wrote:

On Fri, Sep 4, 2020, 4:20 PM Nagaraj Raj wrote:

Hi Michael,

I added the index as you suggested and the planner is going through a bitmap index scan and heap scan. The new plan is: https://explain.depesz.com/s/HaOx

Mem config:
Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit
vCPU = 64
RAM = 512
show shared_buffers = 355 GB
show work_mem = 214 MB
show maintenance_work_mem = 8363MB
show effective_cache_size = 355 GB

I'm not very familiar with Aurora, but I would certainly try the explain analyze with timing OFF and verify that the total time is similar. If the system clock is slow to read, execution plans can be significantly slower just because of the cost to measure each step. That sort being so slow is perplexing. Did you do the two column or four column index I suggested? Obviously it depends on your use case and how much you want to tune this specific query, but you could always try a partial index matching the where condition and just index the other two columns to avoid the sort.
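A sketch of the partial index Michael describes at the end (an interpretation of "a partial index matching the where condition, indexing the other two columns to avoid the sort"; the index name and column order are assumptions):

-- Hypothetical: the constant predicates move into the index's WHERE clause,
-- and (serial_no, eventtime DESC) matches the window function's
-- PARTITION BY serial_no ORDER BY eventtime DESC, so no separate sort is needed.
CREATE INDEX receiving_item_dlvrd_rcvd_serial_eventtime_pidx
    ON receiving_item_delivered_received (serial_no, eventtime DESC)
    WHERE eventtype = 'LineItemdetailsReceived'
      AND replenishmenttype = 'DC2SWARRANTY'
      AND COALESCE(serial_no, '') <> '';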
Single column vs composite partial index
Hi,
I'm running one query, and I created two types of index: one composite and one single-column. The query planner shows almost the same cost for both index bitmap scans, and I'm not sure which is appropriate to keep in the production tables.
explain analyze
SELECT BAN, SUBSCRIBER_NO, ACTV_CODE, ACTV_RSN_CODE, EFFECTIVE_DATE, TRX_SEQ_NO, LOAD_DTTM, rnk AS RNK
FROM (
    SELECT CT.BAN, CT.SUBSCRIBER_NO, CT.ACTV_CODE, CT.ACTV_RSN_CODE, CT.EFFECTIVE_DATE, CT.TRX_SEQ_NO, CT.LOAD_DTTM,
           row_number() over (partition by CT.BAN, CT.SUBSCRIBER_NO, CT.ACTV_CODE, CT.ACTV_RSN_CODE
                              order by CT.TRX_SEQ_NO DESC, CT.LOAD_DTTM DESC) rnk
    FROM SAM_T.L_CSM_TRANSACTIONS CT
    WHERE CT.ACTV_CODE in ('NAC', 'CAN', 'RSP', 'RCL')
      AND LOAD_DTTM::DATE >= CURRENT_DATE - 7
) S
WHERE RNK = 1

1st index, with a single column:

CREATE INDEX l_csm_transactions_load_dttm_idx1 ON sam_t.l_csm_transactions
    USING btree (load_dttm ASC NULLS LAST)
/*"Subquery Scan on s (cost=32454.79..33555.15 rows=129 width=61) (actual
time=56.473..56.473 rows=0 loops=1)"" Filter: (s.rnk = 1)"" -> WindowAgg
(cost=32454.79..33231.52 rows=25891 width=61) (actual time=56.472..56.472
rows=0 loops=1)"" -> Sort (cost=32454.79..32519.51 rows=25891
width=53) (actual time=56.470..56.470 rows=0 loops=1)"" Sort Key:
ct.ban, ct.subscriber_no, ct.actv_code, ct.actv_rsn_code, ct.trx_seq_no DESC,
ct.load_dttm DESC"" Sort Method: quicksort Memory: 25kB""
-> Bitmap Heap Scan on l_csm_transactions ct (cost=1271.13..30556.96
rows=25891 width=53) (actual time=56.462..56.462 rows=0 loops=1)""
Recheck Cond: ((actv_code)::text = ANY ('{NAC,CAN,RSP,RCL}'::text[]))""
Filter: ((load_dttm)::date >= (CURRENT_DATE - 7))""
Rows Removed by Filter: 79137"" Heap Blocks:
exact=23976"" -> Bitmap Index Scan on
l_csm_transactions_actv_code_idx1 (cost=0.00..1264.66 rows=77673 width=0)
(actual time=6.002..6.002 rows=79137 loops=1)""Planning Time: 0.270
ms""Execution Time: 56.639 ms"*/
2nd index, composite and partial:

CREATE INDEX l_csm_transactions_actv_code_load_dttm_idx1 ON sam_t.l_csm_transactions
    USING btree (actv_code COLLATE pg_catalog."default" ASC NULLS LAST, (load_dttm::date) DESC NULLS FIRST)
    WHERE actv_code::text = ANY (ARRAY['NAC'::character varying, 'CAN'::character varying,
                                       'RSP'::character varying, 'RCL'::character varying]::text[]);
/*"Subquery Scan on s (cost=32023.15..33123.52 rows=129 width=61) (actual
time=2.256..2.256 rows=0 loops=1)"" Filter: (s.rnk = 1)"" -> WindowAgg
(cost=32023.15..32799.88 rows=25891 width=61) (actual time=2.255..2.255 rows=0
loops=1)"" -> Sort (cost=32023.15..32087.88 rows=25891 width=53)
(actual time=2.254..2.254 rows=0 loops=1)"" Sort Key: ct.ban,
ct.subscriber_no, ct.actv_code, ct.actv_rsn_code, ct.trx_seq_no DESC,
ct.load_dttm DESC"" Sort Method: quicksort Memory: 25kB""
-> Bitmap Heap Scan on l_csm_transactions ct (cost=1449.32..30125.32
rows=25891 width=53) (actual time=2.247..2.247 rows=0 loops=1)""
Recheck Cond: (((load_dttm)::date >= (CURRENT_DATE - 7)) AND
((actv_code)::text = ANY ('{NAC,CAN,RSP,RCL}'::text[])))""
-> Bitmap Index Scan on l_csm_transactions_actv_code_load_dttm_idx1
(cost=0.00..1442.85 rows=25891 width=0) (actual time=2.244..2.245 rows=0
loops=1)"" Index Cond: ((load_dttm)::date >=
(CURRENT_DATE - 7))""Planning Time: 0.438 ms""Execution Time: 2.303 ms"*/
Please suggest the best choice. Any response is appreciated.

Thanks,
Rj
Re: Single column vs composite partial index
> You're asking whether to keep one index or the other?
My question is which index should be kept for the mentioned query in production, for better IO.
> It depends on *all* the queries you'll run, not just this one.
I'm more concerned about this specific query; it is used in one stored procedure block, so it will run quite often against the table.
explain(ANALYZE, BUFFERS) output:
"Subquery Scan on s (cost=32023.15..33123.52 rows=129 width=61) (actual
time=2.615..2.615 rows=0 loops=1)"" Filter: (s.rnk = 1)"" Buffers: shared
hit=218"" -> WindowAgg (cost=32023.15..32799.88 rows=25891 width=61) (actual
time=2.614..2.615 rows=0 loops=1)"" Buffers: shared hit=218"" ->
Sort (cost=32023.15..32087.88 rows=25891 width=53) (actual time=2.613..2.613
rows=0 loops=1)"" Sort Key: ct.ban, ct.subscriber_no,
ct.actv_code, ct.actv_rsn_code, ct.trx_seq_no DESC, ct.load_dttm DESC""
Sort Method: quicksort Memory: 25kB"" Buffers: shared
hit=218"" -> Bitmap Heap Scan on l_csm_transactions ct
(cost=1449.32..30125.32 rows=25891 width=53) (actual time=2.605..2.605 rows=0
loops=1)"" Recheck Cond: (((load_dttm)::date >=
(CURRENT_DATE - 7)) AND ((actv_code)::text = ANY
('{NAC,CAN,RSP,RCL}'::text[])))"" Buffers: shared hit=218""
-> Bitmap Index Scan on
l_csm_transactions_actv_code_load_dttm_idx1 (cost=0.00..1442.85 rows=25891
width=0) (actual time=2.602..2.602 rows=0 loops=1)""
Index Cond: ((load_dttm)::date >= (CURRENT_DATE - 7))""
Buffers: shared hit=218""Planning Time: 0.374 ms""Execution Time: 2.661 ms"
>The actual performance might change based on thing like maintenance like
>reindex, cluster, vacuum, hardware, and DB state (like cached blocks).
Note: Stats are up to date
> And Postgres version.
PostgreSQL 11.7 running on RedHat
Thanks,Rj
On Tuesday, September 15, 2020, 09:18:55 PM PDT, Justin Pryzby
wrote:
On Tue, Sep 15, 2020 at 10:33:24PM +, Nagaraj Raj wrote:
> Hi,
> I'm running one query, and I created two types of index one is composite and
> the other one with single column one and query planner showing almost the
> same cost for both index bitmap scan, I'm not sure which is appropriate to
> keep in production tables.
You're asking whether to keep one index or the other ?
It depends on *all* the queries you'll run, not just this one.
The most general thing to do would be to make multiple, single column indexes,
and let the planner figure out which is best (it might bitmap-AND or -OR them
together).
However, for this query, you can see the 2nd query is actually faster (2ms vs
56ms) - the cost is an estimate based on a model.
The actual performance might change based on thing like maintenance like
reindex, cluster, vacuum, hardware, and DB state (like cached blocks).
And postgres version.
The rowcount estimates are bad. Maybe you need to ANALYZE the table (or adjust
the autoanalyze thresholds), or evaluate if there's a correlation between
columns. Bad rowcount estimates beget bad plans and poor performance.
Also: you could use explain(ANALYZE,BUFFERS).
I think the fast plan would be possible with a tiny BRIN index on load_dttm.
(Possibly combined indexes on actv_code or others).
If you also have a btree index on time, then you can CLUSTER on it (and
analyze) and it might improve that plan further (but would affect other
queries, too).
> explain analyze SELECT BAN, SUBSCRIBER_NO, ACTV_CODE, ACTV_RSN_CODE,
> EFFECTIVE_DATE, TRX_SEQ_NO, LOAD_DTTM, rnk AS RNK FROM ( SELECT CT.BAN,
> CT.SUBSCRIBER_NO, CT.ACTV_CODE, CT.ACTV_RSN_CODE, CT.EFFECTIVE_DATE,
> CT.TRX_SEQ_NO, CT.LOAD_DTTM, row_number() over (partition by CT.BAN,
> CT.SUBSCRIBER_NO, CT.ACTV_CODE, CT.ACTV_RSN_CODE order by CT.TRX_SEQ_NO DESC,
> CT.LOAD_DTTM DESC) rnk FROM SAM_T.L_CSM_TRANSACTIONS CT WHERE CT.ACTV_CODE in
> ( 'NAC', 'CAN', 'RSP', 'RCL') AND LOAD_DTTM::DATE >= CURRENT_DATE - 7 ) S
> WHERE RNK = 1
> 1st Index with single column:
> CREATE INDEX l_csm_transactions_load_dttm_idx1 ON sam_t.l_csm_transactions
> USING btree (load_dttm ASC NULLS LAST)
> /*"Subquery Scan on s (cost=32454.79..33555.15 rows=129 width=61) (actual
>time=56.473..56.473 rows=0 loops=1)
> Filter: (s.rnk = 1)
> -> WindowAgg (cost=32454.79..33231.52 rows=25891 width=61) (actual
>time=56.472..56.472 rows=0 loops=1)
> -> Sort (cost=32454.79..32519.51 rows=25891 width=53) (actual
>time=56.470..56.470 rows=0 loops=1)
> Sort Key: ct.b
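Following up on the BRIN and CLUSTER suggestions in Justin's message quoted above, a minimal sketch against the table from this thread (the BRIN index name is made up; whether it helps depends on load_dttm correlating with the physical row order):

-- Hypothetical: a tiny BRIN index on the load timestamp, useful when rows are
-- appended roughly in load_dttm order so block ranges stay well correlated.
CREATE INDEX l_csm_transactions_load_dttm_brin
    ON sam_t.l_csm_transactions USING brin (load_dttm);

-- Optionally, physically order the table by the existing btree index on
-- load_dttm and refresh statistics (CLUSTER takes an exclusive lock).
CLUSTER sam_t.l_csm_transactions USING l_csm_transactions_load_dttm_idx1;
ANALYZE sam_t.l_csm_transactions;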
Query performance
Hi,

I have a long-running query, and its planner always performs a sequential scan on table2. My goal is to reduce read IO on disk, because this query runs quite often (it is used in a function for ETL).

table1: transfer_order_header (records: 2782678)
table2: transfer_order_item (records: 15995697)

Here is the query:

set work_mem = '688552kB';
explain (analyze,buffers)
select COALESCE(itm.serialnumber,'') AS SERIAL_NO,
       COALESCE(itm.ITEM_SKU,'') AS SKU,
       COALESCE(itm.receivingplant,'') AS RECEIVINGPLANT,
       COALESCE(itm.STO_ID,'') AS STO,
       supplyingplant,
       COALESCE(itm.deliveryitem,'') AS DELIVERYITEM,
       min(eventtime) as eventtime
FROM sor_t.transfer_order_header hed, sor_t.transfer_order_item itm
where hed.eventid = itm.eventid
group by 1,2,3,4,5,6

Query plan [2]:

"Finalize GroupAggregate (cost=1930380.06..4063262.11 rows=16004137 width=172) (actual time=56050.500..83268.566 rows=15891873 loops=1)"
"  Group Key: (COALESCE(itm.serialnumber, ''::character varying)), (COALESCE(itm.item_sku, ''::character varying)), (COALESCE(itm.receivingplant, ''::character varying)), (COALESCE(itm.sto_id, ''::character varying)), hed.supplyingplant, (COALESCE(itm.deliveryitem, ''::character varying))"
"  Buffers: shared hit=712191 read=3, temp read=38232 written=38233"
"  ->  Gather Merge (cost=1930380.06..3669827.09 rows=13336780 width=172) (actual time=56050.488..77106.993 rows=15948520 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        Buffers: shared hit=2213081 read=12, temp read=142840 written=142843"
"        ->  Partial GroupAggregate (cost=1929380.04..2129431.74 rows=6668390 width=172) (actual time=50031.458..54888.828 rows=5316173 loops=3)"
"              Group Key: (COALESCE(itm.serialnumber, ''::character varying)), (COALESCE(itm.item_sku, ''::character varying)), (COALESCE(itm.receivingplant, ''::character varying)), (COALESCE(itm.sto_id, ''::character varying)), hed.supplyingplant, (COALESCE(itm.deliveryitem, ''::character varying))"
"              Buffers: shared hit=2213081 read=12, temp read=142840 written=142843"
"              ->  Sort (cost=1929380.04..1946051.01 rows=6668390 width=172) (actual time=50031.446..52823.352 rows=5332010 loops=3)"
"                    Sort Key: (COALESCE(itm.serialnumber, ''::character varying)), (COALESCE(itm.item_sku, ''::character varying)), (COALESCE(itm.receivingplant, ''::character varying)), (COALESCE(itm.sto_id, ''::character varying)), hed.supplyingplant, (COALESCE(itm.deliveryitem, ''::character varying))"
"                    Sort Method: external merge Disk: 305856kB"
"                    Worker 0: Sort Method: external merge Disk: 436816kB"
"                    Worker 1: Sort Method: external merge Disk: 400048kB"
"                    Buffers: shared hit=2213081 read=12, temp read=142840 written=142843"
"                    ->  Parallel Hash Join (cost=133229.66..603743.97 rows=6668390 width=172) (actual time=762.925..3901.133 rows=5332010 loops=3)"
"                          Hash Cond: ((itm.eventid)::text = (hed.eventid)::text)"
"                          Buffers: shared hit=2213027 read=12"
"                          ->  Parallel Seq Scan on transfer_order_item itm (cost=0.00..417722.90 rows=6668390 width=68) (actual time=0.005..524.359 rows=5332010 loops=3)"
"                                Buffers: shared hit=351039"
"                          ->  Parallel Hash (cost=118545.68..118545.68 rows=1174718 width=35) (actual time=755.590..755.590 rows=926782 loops=3)"
"                                Buckets: 4194304 Batches: 1 Memory Usage: 243808kB"
"                                Buffers: shared hit=1861964 read=12"
"                                ->  Parallel Index Only Scan using transfer_order_header_eventid_supplyingplant_eventtime_idx1 on transfer_order_header hed (cost=0.56..118545.68 rows=1174718 width=35) (actual time=0.128..388.436 rows=926782 loops=3)"
"                                      Heap Fetches: 18322"
"                                      Buffers: shared hit=1861964 read=12"
"Planning Time: 1.068 ms"
"Execution Time: 84274.004 ms"

Tables [1]: the CREATE TABLE DDLs are in dbfiddle.

PG Server: PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit.
RAM: 456
Mem settings:
"maintenance_work_mem" "8563712" "kB"
"work_mem" "688552" "kB"
"wal_buffers" "2048" "8kB"
"shared_buffers" "44388442" "8kB"

Any suggestions would be greatly appreciated.

Thanks,
Rj
Temporarily disable not null constraints
Hi,

Can we disable NOT NULL constraints temporarily in a session-based transaction, the way we disable FK constraints? For example:

SET session_replication_role = 'replica';
ALTER TABLE table_name DISABLE TRIGGER USER;

The above two options work for unique-constraint violation exceptions.

Thanks,
Rj
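As far as I know there is no session-level switch for NOT NULL: it is a column property, not a trigger, so session_replication_role does not affect it. It has to be dropped and re-added with ALTER TABLE. A minimal sketch (table and column names are placeholders):

BEGIN;
ALTER TABLE table_name ALTER COLUMN some_column DROP NOT NULL;
-- ... load rows that temporarily violate the constraint ...
-- Re-adding the constraint validates every row, so fix the data first, e.g.:
-- UPDATE table_name SET some_column = 'n/a' WHERE some_column IS NULL;
ALTER TABLE table_name ALTER COLUMN some_column SET NOT NULL;
COMMIT;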
Query performance issue
Hi,
I have a query performance issue: it takes a long time, and I'm not even getting EXPLAIN ANALYZE output. This query joins 3 tables, which have around:
a - 176223509
b - 286887780
c - 214219514
explain
select Count(a."individual_entity_proxy_id")
from "prospect" a
inner join "individual_demographic" b
        on a."individual_entity_proxy_id" = b."individual_entity_proxy_id"
inner join "household_demographic" c
        on a."household_entity_proxy_id" = c."household_entity_proxy_id"
where (((a."last_contacted_anychannel_dttm" is null)
     or (a."last_contacted_anychannel_dttm" < TIMESTAMP '2020-11-23 0:00:00.00'))
   and (a."shared_paddr_with_customer_ind" = 'N')
   and (a."profane_wrd_ind" = 'N')
   and (a."tmo_ofnsv_name_ind" = 'N')
   and (a."has_individual_address" = 'Y')
   and (a."has_last_name" = 'Y')
   and (a."has_first_name" = 'Y'))
  and ((b."tax_bnkrpt_dcsd_ind" = 'N')
   and (b."govt_prison_ind" = 'N')
   and (b."cstmr_prspct_ind" = 'Prospect'))
  and ((c."hspnc_lang_prfrnc_cval" in ('B', 'E', 'X'))
    or (c."hspnc_lang_prfrnc_cval" is null));

-- Explain output

"Finalize Aggregate (cost=32813309.28..32813309.29 rows=1 width=8)"
"  ->  Gather (cost=32813308.45..32813309.26 rows=8 width=8)"
"        Workers Planned: 8"
"        ->  Partial Aggregate (cost=32812308.45..32812308.46 rows=1 width=8)"
"              ->  Merge Join (cost=23870130.00..32759932.46 rows=20950395 width=8)"
"                    Merge Cond: (a.individual_entity_proxy_id = b.individual_entity_proxy_id)"
"                    ->  Sort (cost=23870127.96..23922503.94 rows=20950395 width=8)"
"                          Sort Key: a.individual_entity_proxy_id"
"                          ->  Hash Join (cost=13533600.42..21322510.26 rows=20950395 width=8)"
"                                Hash Cond: (a.household_entity_proxy_id = c.household_entity_proxy_id)"
"                                ->  Parallel Seq Scan on prospect a (cost=0.00..6863735.60 rows=22171902 width=16)"
"                                      Filter: (((last_contacted_anychannel_dttm IS NULL) OR (last_contacted_anychannel_dttm < '2020-11-23 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind = 'N'::bpchar) AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address = 'Y'::bpchar) AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))"
"                                ->  Hash (cost=10801715.18..10801715.18 rows=166514899 width=8)"
"                                      ->  Seq Scan on household_demographic c (cost=0.00..10801715.18 rows=166514899 width=8)"
"                                            Filter: (((hspnc_lang_prfrnc_cval)::text = ANY ('{B,E,X}'::text[])) OR (hspnc_lang_prfrnc_cval IS NULL))"
"                    ->  Index Only Scan using indx_individual_demographic_prxyid_taxind_prspctind_prsnind on individual_demographic b (cost=0.57..8019347.13 rows=286887776 width=8)"
"                          Index Cond: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (cstmr_prspct_ind = 'Prospect'::text) AND (govt_prison_ind = 'N'::bpchar))"
Tables DDL are attached in dbfiddle: Postgres 11 | db<>fiddle
Server configuration:
Version: 10.11
RAM: 320 GB
vCPU: 32
maintenance_work_mem = 256MB
work_mem = 1GB
shared_buffers = 64GB
Any suggestions?
Thanks,Rj
tables meta data collection
Hi Everyone,
I was trying to collect table metadata along with column descriptions; the use case is that I need to show all columns of the tables whether or not they have a description. I tried the query below, but it only returns columns that have a description and ignores the rest.
Postgres 11 | db<>fiddle
create table test(id int);
create table test1(id int Primary key);
comment on column test.id is 'Test descr';

SELECT c.table_schema, c.table_name, c.column_name,
       case when c.domain_name is not null then c.domain_name
            when c.data_type='character varying' THEN 'character varying('||c.character_maximum_length||')'
            when c.data_type='character' THEN 'character('||c.character_maximum_length||')'
            when c.data_type='numeric' THEN 'numeric('||c.numeric_precision||','||c.numeric_scale||')'
            else c.data_type
       end as data_type,
       c.is_nullable,
       (select 'Y'
          from information_schema.table_constraints tco
          join information_schema.key_column_usage kcu
            on kcu.constraint_name = tco.constraint_name
           and kcu.constraint_schema = tco.constraint_schema
           and kcu.constraint_schema = c.table_schema
           and kcu.table_name = c.table_name
           and kcu.column_name = c.column_name
         where tco.constraint_type = 'PRIMARY KEY') as is_in_PK,
       (select distinct 'Y'
          from information_schema.table_constraints tco
          join information_schema.key_column_usage kcu
            on kcu.constraint_name = tco.constraint_name
           and kcu.constraint_schema = tco.constraint_schema
           and kcu.constraint_schema = c.table_schema
           and kcu.table_name = c.table_name
           and kcu.column_name = c.column_name
         where tco.constraint_type = 'FOREIGN KEY') as is_in_FK,
       pgd.description
FROM pg_catalog.pg_statio_all_tables as st
Left outer join pg_catalog.pg_description pgd on (pgd.objoid = st.relid)
left outer join information_schema.columns c
       on (pgd.objsubid = c.ordinal_position
           and c.table_schema = st.schemaname
           and c.table_name = st.relname)
where c.table_name = 'test'
order by c.table_schema, c.table_name, c.ordinal_position;
The expected format is:
| table_schema | table_name | column_name | data_type | is_nullable | is_in_pk | is_in_fk | description |
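The direction I was thinking of trying next (a sketch; it assumes ordinal_position matches the column number that col_description() expects, which should hold when no columns have been dropped) is to drive from information_schema.columns and left-join the description, so columns without a comment still show up:

SELECT c.table_schema,
       c.table_name,
       c.column_name,
       c.data_type,
       c.is_nullable,
       col_description((quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass,
                       c.ordinal_position) AS description
FROM information_schema.columns c
WHERE c.table_name = 'test'
ORDER BY c.table_schema, c.table_name, c.ordinal_position;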
any suggestions?
Thanks,Rj
Users grants with setting options
Hi, I have a quick question: will a role's custom parameter settings be granted to its member users as well? Will user c_role get the same settings as m_role?

CREATE ROLE m_role;
CREATE ROLE c_role;
ALTER ROLE m_role SET configuration_parameter TO 'VALUE';
GRANT m_role TO c_role;
Re: Users grants with setting options
Thank you for confirmation. On Monday, March 8, 2021, 03:46:28 PM PST, David G. Johnston wrote: On Mon, Mar 8, 2021 at 4:30 PM Nagaraj Raj wrote: I have a quick question, does role custom parameters settings will be granted to users well? Parameters are not inherited - the role credentials that are logging in are the ones that are used to check for defaults. This "no" is not explicitly documented that I can find; though easy enough to test. David J.
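A quick way to verify this for yourself (a sketch; work_mem stands in for the placeholder parameter above):

CREATE ROLE m_role LOGIN;
CREATE ROLE c_role LOGIN;
ALTER ROLE m_role SET work_mem TO '256MB';
GRANT m_role TO c_role;
-- connect as c_role and run:  SHOW work_mem;   -- returns the server default, not 256MB
-- the per-role setting is attached only to m_role:
SELECT r.rolname, s.setconfig
FROM pg_db_role_setting s
JOIN pg_roles r ON r.oid = s.setrole;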
INSERTS waiting with wait_event is "transactionid"
Hi,

We are trying to load around 1 billion records into one table with INSERT statements (we are not able to use the COPY command). They have been waiting on a lock with wait_event = "transactionid", and I didn't find any information about it in the documentation. Queries have been waiting for hours.

Table DDL:

CREATE TABLE test_load(
    billg_acct_cid_hash character varying(50) COLLATE pg_catalog."default" NOT NULL,
    accs_mthd_cid_hash character varying(50) COLLATE pg_catalog."default" NOT NULL,
    soc character varying(10) COLLATE pg_catalog."default" NOT NULL,
    soc_desc character varying(100) COLLATE pg_catalog."default",
    service_type_cd character varying(10) COLLATE pg_catalog."default",
    soc_start_dt date,
    soc_end_dt date,
    product_eff_dt date,
    product_exp_dt date,
    curr_ind character varying(1) COLLATE pg_catalog."default",
    load_dttm timestamp without time zone NOT NULL,
    updt_dttm timestamp without time zone,
    md5_chk_sum character varying(100) COLLATE pg_catalog."default",
    deld_from_src_ind character(1) COLLATE pg_catalog."default",
    orphan_ind character(1) COLLATE pg_catalog."default",
    CONSTRAINT test_load_pk PRIMARY KEY (billg_acct_cid_hash, accs_mthd_cid_hash, soc)
);

Query results from pg_locks:

SELECT COALESCE(blockingl.relation::regclass::text, blockingl.locktype) AS locked_item,
       now() - blockeda.query_start AS waiting_duration,
       blockeda.pid AS blocked_pid,
       left(blockeda.query,7) AS blocked_query,
       blockedl.mode AS blocked_mode,
       blockinga.pid AS blocking_pid,
       left(blockinga.query,7) AS blocking_query,
       blockingl.mode AS blocking_mode
FROM pg_locks blockedl
JOIN pg_stat_activity blockeda ON blockedl.pid = blockeda.pid
JOIN pg_locks blockingl ON (blockingl.transactionid = blockedl.transactionid
                            OR blockingl.relation = blockedl.relation AND blockingl.locktype = blockedl.locktype)
                           AND blockedl.pid <> blockingl.pid
JOIN pg_stat_activity blockinga ON blockingl.pid = blockinga.pid AND blockinga.datid = blockeda.datid
WHERE NOT blockedl.granted
ORDER BY blockeda.query_start;

"transactionid" "18:20:06.068154" 681216 "INSERT " "ShareLock" 679840 "INSERT " "ExclusiveLock"
"transactionid" "18:19:05.504781" 679688 "INSERT " "ShareLock" 679856 "INSERT " "ExclusiveLock"
"transactionid" "18:18:17.30099" 679572 "INSERT " "ShareLock" 679612 "INSERT " "ShareLock"
"transactionid" "18:18:17.30099" 679572 "INSERT " "ShareLock" 679580 "INSERT " "ShareLock"
"transactionid" "18:18:17.30099" 679572 "INSERT " "ShareLock" 681108 "INSERT " "ExclusiveLock"
"transactionid" "18:14:17.969603" 681080 "INSERT " "ShareLock" 681204 "INSERT " "ExclusiveLock"
"transactionid" "18:13:41.531575" 681112 "INSERT " "ShareLock" 679636 "INSERT " "ExclusiveLock"
"transactionid" "18:04:16.195069" 679556 "INSERT " "ShareLock" 679776 "INSERT " "ExclusiveLock"
"transactionid" "17:58:54.284211" 679696 "INSERT " "ShareLock" 678940 "INSERT " "ExclusiveLock"
"transactionid" "17:57:54.220879" 681144 "INSERT " "ShareLock" 679792 "INSERT " "ExclusiveLock"
"transactionid" "17:57:28.736147" 679932 "INSERT " "ShareLock" 679696 "INSERT " "ExclusiveLock"
"transactionid" "17:53:48.701858" 679580 "INSERT " "ShareLock" 679572 "INSERT " "ShareLock"

Query results from pg_stat_activity:

SELECT pg_stat_activity.pid,
       pg_stat_activity.usename,
       pg_stat_activity.state,
       now() - pg_stat_activity.query_start AS runing_time,
       LEFT(pg_stat_activity.query,7),
       pg_stat_activity.wait_event
FROM pg_stat_activity
ORDER BY (now() - pg_stat_activity.query_start) DESC;

681216 | postgres | active | 07:32.7 | INSERT | transactionid
679688 | postgres | active | 06:32.2 | INSERT | transactionid
679572 | postgres | active | 05:44.0 | INSERT | transactionid
681080 | postgres | active | 01:44.6 | INSERT | transactionid
681112 | postgres | active | 01:08.2 | INSERT | transactionid
679556 | postgres | active | 51:42.9 | INSERT | transactionid
679696 | postgres | active | 46:20.9 | INSERT | transactionid
681144 | postgres | active | 45:20.9 | INSERT | transactionid
679932 | postgres | active | 44:55.4 | INSERT | transactionid
679580 | postgres | active | 41:15.4 | INSERT | transactionid
679400 | postgres | active | 39:51.2 | INSERT | transactionid
679852 | postgres | active | 37:05.3 | INSERT | transactionid
681188 | postgres | active | 36:23.2 | INSERT | transactionid
679544 | postgres | active | 35:33.4 | INSERT | transactionid
675460 | postgres | active | 26:06.8 | INSERT | transactionid

select version();
PostgreSQL 11.6, compiled by Visual C++ build 1800, 64-bit

CPU: v32
RAM: 320 GB
shared_buffers = 64GB
effective_cache_size = 160 GB

Any comments on the issue?

Thanks,Rj
Re: INSERTS waiting with wait_event is "transactionid"
Hi Laurenz, Thanks for the response. Yeah understand that, but I'm trying to figure out why it is taking too long. there is foreign key relation to this table. Thanks,RjOn Friday, April 9, 2021, 02:16:08 AM PDT, Laurenz Albe wrote: On Thu, 2021-04-08 at 20:14 +, Nagaraj Raj wrote: > We are trying to load data around 1Bil records into one table with INSERT > statements > (not able to use COPY command) and they are been waiting for a lock and the >wait_event > is "transactionid", I didn't find any information in the documents. Queries >have been > waiting for hours. That means that your statement is stuck behind a row lock. Row locks are stored on the table row itself and contain the transaction ID. So the process has to wait until the transaction goes away, which is implemented as waiting for a lock on the transaction ID. There must be a long running transaction that locks a row that is needed for the INSERT. It could be a row in a different table that is referenced by a foreign key. Make that long running transaction go away. Transactions should never last that long. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
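A quick way to spot the long-running transactions described above (a sketch; run it as a superuser or the owning role so the query text is visible):

SELECT pid,
       now() - xact_start AS xact_age,
       state,
       wait_event_type,
       wait_event,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;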
Partition with check constraint with "like"
Hi,
I am trying to create partitions on a table based on the first letter of the column value, using an inheritance relation and a check constraint.
I managed to create the child tables and load the data into them as per my requirement.
But the problem is that when querying on that partitioned column, the planner scans all of the child tables instead of only the matching one.
create table t1(id int, name text);

CREATE TABLE partition_tab.t1_name_null(
    CONSTRAINT null_check CHECK (name IS NULL)
) INHERITS (t1);

CREATE or replace FUNCTION partition_tab.func_t1_insert_trigger()
RETURNS trigger
LANGUAGE 'plpgsql'
COST 100 VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE
    chk_cond text;
    c_table TEXT;
    c_table1 text;
    new_name text;
    m_table1 text;
BEGIN
    if ( NEW.name is null) THEN
        INSERT into partition_tab.t1_name_null VALUES (NEW.*);
    elseif ( NEW.name is not null) THEN
        new_name := substr(NEW.name,1,1);
        raise info 'new_name %', new_name;
        c_table := TG_TABLE_NAME || '_' || new_name;
        c_table1 := 'partition_tab.' || c_table;
        m_table1 := '' || TG_TABLE_NAME;
        IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname = lower(c_table)) THEN
            RAISE NOTICE 'values out of range partition, creating partition table: partition_tab.%', c_table;
            chk_cond := new_name || '%';
            raise info 'chk_cond %', chk_cond;
            EXECUTE 'CREATE TABLE partition_tab.' || c_table || '(check ( name like ''' || chk_cond || ''')) INHERITS (' || TG_TABLE_NAME || ');';
        END IF;
        EXECUTE 'INSERT INTO ' || c_table1 || ' SELECT(' || m_table1 || ' ' || quote_literal(NEW) || ').* RETURNING id;';
    END IF;
    RETURN NULL;
END;
$BODY$;

CREATE TRIGGER t1_trigger
BEFORE INSERT OR UPDATE ON t1
FOR EACH ROW EXECUTE PROCEDURE partition_tab.func_t1_insert_trigger();
examples: Postgres 11 | db<>fiddle
Any suggestions.
Thanks,Rj
Re: Partition with check constraint with "like"
Thank you. This is a great help.
But "a" have some records with alpha and numeric.
example :
insert into mytable values('alpha'),('bravo');
insert into mytable values('1lpha'),('2ravo');
On Thursday, May 20, 2021, 06:23:14 PM PDT, David Rowley
wrote:
On Fri, 21 May 2021 at 12:32, Nagaraj Raj wrote:
> I am trying to create partitions on the table based on first letter of the
> column record value using inherit relation & check constraint.
You'll get much better performance out of native partitioning than you
will with the old inheritance method of doing it.
> EXECUTE 'CREATE TABLE partition_tab.' || c_table || '(check ( name like
>'''|| chk_cond||''')) INHERITS (' ||TG_TABLE_NAME|| ');';
This is a bad idea. There's a lock upgrade hazard here that could end
up causing deadlocks on INSERT. You should just create all the tables
you need beforehand.
I'd recommend you do this using RANGE partitioning. For example:
create table mytable (a text not null) partition by range (a);
create table mytable_a partition of mytable for values from ('a') to
('b'); -- note the upper bound of the range is non-inclusive.
create table mytable_b partition of mytable for values from ('b') to ('c');
insert into mytable values('alpha'),('bravo');
explain select * from mytable where a = 'alpha';
QUERY PLAN
---
Seq Scan on mytable_a mytable (cost=0.00..27.00 rows=7 width=32)
Filter: (a = 'alpha'::text)
(2 rows)
The mytable_b is not scanned.
David
Re: Partition with check constraint with "like"
So what about 'Z' or 'z' and 9?
I created the partition tables FROM (A) TO (B); FROM (B) TO (C); ... FROM (Y) TO (Z);
Then what would be the range for Z: FROM (Z) TO (?);
And the same for 9.
On Thursday, May 20, 2021, 07:38:50 PM PDT, Justin Pryzby
wrote:
On Fri, May 21, 2021 at 02:36:14AM +, Nagaraj Raj wrote:
> Thank you. This is a great help.
> But "a" have some records with alpha and numeric.
So then you should make one or more partitions FROM ('1')TO('9').
> example :
> insert into mytable values('alpha'),('bravo');
> insert into mytable values('1lpha'),('2ravo');
Re: Partition with check constraint with "like"
Hi David,
I am trying to create partitions on a table which has around 2 billion records, and users will always look up rows by "name". It's not possible to create a list partition, so we are trying to partition on the first letter of the name column. The name column contains a mix of alphanumeric values.
> postgres=# select chr(ascii('z')+1) ;
> chr
> -
> {
> (1 row)
I tried as below. I'm able to create a partition table for 'Z', but it's not identifying the partition table.

postgres=# select chr(ascii('Z')+1) ;
chr
-
[
(1 row)

create table mytable_z of mytable for values from ('Z') to ('Z[');
CREATE TABLE
insert into mytable values(4,'ZAR83NB');
ERROR: no partition of relation "mytable" found for row
DETAIL: Partition key of the failing row contains (name) = (ZAR83NB).
SQL state: 23514
On Friday, May 21, 2021, 01:24:13 AM PDT, David Rowley
wrote:
On Fri, 21 May 2021 at 19:02, Nagaraj Raj wrote:
> then what would be the range of Z
> FROM (Z) to (?) ;
postgres=# select chr(ascii('z')+1) ;
chr
-
{
(1 row)
> same way for 9
postgres=# select chr(ascii('9')+1) ;
chr
-
:
(1 row)
https://en.wikipedia.org/wiki/ASCII
You can also use MINVALUE and MAXVALUE to mean unbounded at either end
of the range.
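For example, a catch-all final partition could be declared like this (continuing the mytable example; the partition name is arbitrary):

create table mytable_tail partition of mytable for values from ('z') to (MAXVALUE);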
But is there a particular need that you want to partition this way? It
seems like it might be a bit painful to maintain, especially if you're
not limiting yourself to ASCII or ANSI characters.
You might want to consider HASH partitioning if you're just looking
for a way to keep your tables and indexes to a more manageable size.
You've not really mentioned your use case here, so it's hard to give
any advice.
There are more details about partitioning in
https://www.postgresql.org/docs/current/ddl-partitioning.html
David
Re: Partition with check constraint with "like"
> select * from t1 where name = somename ? == equality match // if yes, hash
> partitioning may be helpful to a have reasonably balanced distribution
Yes, it's an equality check.
On Friday, May 21, 2021, 12:08:25 PM PDT, Vijaykumar Jain
wrote:
Just out of curiosity, what would a typical query be?

select * from t1 where name = somename;   -- equality match: if yes, hash partitioning may be helpful to get a reasonably balanced distribution
or
select * from t1 where name like 'some%'; -- what would be the distribution of rows for such queries? It can return 1 row, all rows, or anything in between, which may result in unbalanced partitioning.

Then why partition at all? 2B rows, if I go with 100KB size per row, would be around 200GB.
Also, queries may benefit from trigram matching: Index Columns for `LIKE` in PostgreSQL | Niall Burkley's Developer Blog
--
Thanks, Vijay
Mumbai, India
Re: Partition with check constraint with "like"
Hi,
This is also not working:

create table mytable_z partition of mytable for values from ('Z') to ('[') partition by range(id);
ERROR: empty range bound specified for partition "mytable_z"
DETAIL: Specified lower bound ('Z') is greater than or equal to upper bound ('[').
SQL state: 42P17

DB running on version PostgreSQL 11.6, compiled by Visual C++ build 1800, 64-bit
On Friday, May 21, 2021, 02:00:38 PM PDT, Michel SALAIS
wrote:
Hi
I don’t discuss here the choice itself but this is not correct:
create table mytable_z of mytable for values from ('Z') to ('Z[');
It should be
create table mytable_z of mytable for values from ('Z') to ('[')
Michel SALAIS
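A note on the error above: whether '[' sorts after 'Z' depends on the collation of the partition key column, which would explain why the corrected bound is still rejected. A quick check, and an unbounded alternative that sidesteps the question (a sketch, reusing the mytable example):

-- under the C collation '[' follows 'Z'; under the database default it may not
select ('Z' collate "C") < '[' as c_collation,
       'Z' < '['              as default_collation;

create table mytable_z partition of mytable for values from ('Z') to (MAXVALUE);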
Re: Partition with check constraint with "like"
sorry, forgot to attach the test cases: Postgres 13 | db<>fiddle
issue partition scan
I have a table 'sub_soc' with 3 billion records; it is partitioned and indexed on the soc column. When the user runs a query that left-joins this table to some other tables, the query planner does a full table scan instead of pruning to the matching partitions and using the index.

SELECT t2.cid_hash AS BILLG_ACCT_CID_HASH,
       t2.proxy_id AS INDVDL_ENTITY_PROXY_ID,
       t2.accs_mthd AS ACCS_MTHD_CID_HASH
FROM public.sub t2
Inner join acc t3 on t3.cid_hash = t2.cid_hash
Left join sub_soc t4 on (t2.accs_mthd = t4.accs_mthd AND t2.cid_hash = t4.cid_hash)
WHERE ( ( (t3.acct = 'I' AND t3.acct_sub IN ( '4', '5' ) ) OR t2.ban IN ( '00','01','02','03','04','05' ) )
        OR (t4.soc = 'NFWJYW0' AND t4.curr_ind = 'Y') );

If I use AND instead of OR, it does partition pruning and an index scan; otherwise it's a full scan. Can you please provide suggestions? DDL structure: Postgres 11 | db<>fiddle

Thanks,Raj
Re: issue partition scan
Apologies, I didn't understand you completely.

> 1. Those that have sub_soc.soc = 'NFWJYW0' and sub_soc.curr_ind = 'Y'
> It can use constraint exclusion on these to only scan applicable partitions.
> 2. Those that have (acc.acct = 'I' AND acc.acct_sub IN ( '4', '5' ) ) OR sub.ban IN ( '00','01','02','03','04','05' )
> It can't use constraint exclusion on these since results can come from any partition.

Why is it not using constraint exclusion on the above two conditions (1 and 2) included in the where clause? Both sets point to different tables.

On Tuesday, May 25, 2021, 04:01:53 PM PDT, Christophe Pettus wrote:

> On May 25, 2021, at 15:50, Nagaraj Raj wrote:
>
> SELECT
>     t2.cid_hash AS BILLG_ACCT_CID_HASH,
>     t2.proxy_id AS INDVDL_ENTITY_PROXY_ID,
>     t2.accs_mthd AS ACCS_MTHD_CID_HASH
> FROM
>     public.sub t2
>     Inner join acc t3 on t3.cid_hash = t2.cid_hash
>     Left join sub_soc t4 on (t2.accs_mthd = t4.accs_mthd AND t2.cid_hash = t4.cid_hash)
> WHERE
>     ( ( (t3.acct = 'I' AND t3.acct_sub IN ( '4', '5' ) ) OR t2.ban IN ( '00','01','02','03','04','05' ) )
>       OR (t4.soc = 'NFWJYW0' AND t4.curr_ind = 'Y') );

As written, with the OR, it cannot exclude any partitions from the query. The records returned will be from two merged sets:

1. Those that have sub_soc.soc = 'NFWJYW0' and sub_soc.curr_ind = 'Y'
It can use constraint exclusion on these to only scan applicable partitions.

2. Those that have (acc.acct = 'I' AND acc.acct_sub IN ( '4', '5' ) ) OR sub.ban IN ( '00','01','02','03','04','05' )
It can't use constraint exclusion on these, since results can come from any partition.
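One workaround that is sometimes used for this pattern (a sketch only; note that UNION also removes duplicate rows, so it is not strictly equivalent to the original LEFT JOIN form) is to split the OR into two branches, so the branch that filters on sub_soc can prune partitions:

SELECT t2.cid_hash, t2.proxy_id, t2.accs_mthd
FROM public.sub t2
JOIN acc t3 ON t3.cid_hash = t2.cid_hash
WHERE (t3.acct = 'I' AND t3.acct_sub IN ('4', '5'))
   OR t2.ban IN ('00','01','02','03','04','05')
UNION
SELECT t2.cid_hash, t2.proxy_id, t2.accs_mthd
FROM public.sub t2
JOIN acc t3 ON t3.cid_hash = t2.cid_hash
JOIN sub_soc t4 ON t4.accs_mthd = t2.accs_mthd AND t4.cid_hash = t2.cid_hash
WHERE t4.soc = 'NFWJYW0' AND t4.curr_ind = 'Y';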
Partition column should be part of PK
We have some partitioned tables using inheritance and are planning to migrate them to declarative partitioning.

Table DDL:

CREATE TABLE c_account_p
(
    billing_account_guid character varying(40) NOT NULL,
    ingestion_process_id bigint NOT NULL DEFAULT '-1'::integer,
    load_dttm timestamp(6) without time zone NOT NULL,
    ban integer NOT NULL,
    CONSTRAINT billing_account_pkey PRIMARY KEY (billing_account_guid, ban)
) PARTITION by RANGE(load_dttm);

When I try to create the table, it throws the error below:

ERROR: insufficient columns in the PRIMARY KEY constraint definition
DETAIL: PRIMARY KEY constraint on table "l_billing_account_p" lacks column "load_dttm" which is part of the partition key.
SQL state: 0A000

Is it mandatory/necessary that the partition column be part of the primary key? If I include load_dttm in the PK, it works fine. db<>fiddle

If the partition column is supposed to be in the PK, it's challenging to create a range partition on the date column, because load_dttm can have duplicates when data is loaded with COPY or INSERT.

INSERT INTO c_account_p SELECT * from c_account_p_bkp
ON CONFLICT (billing_account_guid, ban, load_dttm) DO UPDATE SET 'some stuff..'

If I receive a billing_account_guid, ban combination with a different load_dttm, then it will end up with duplicate keys. Could someone please help me understand this scenario? Thanks.
ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
Hello everyone,
I have a scenario where I want to add a PK on a partitioned table to enforce unique values for two columns. But PG requires the partition column to be part of the PK. How can we make sure just those two columns are unique?
And also, while inserting into the table we need to use 'on conflict'.
create table t (id int, pid int, name name, dt date) partition by range(dt);
--create unique index on t(id,pid);
--alter table t add constraint uk unique (id);
--create unique index on t(id,pid);
alter table t add constraint uk unique (id,pid,dt);

create table t1 partition of t for values from ('2020-01-01') to ('2020-02-01');
alter table t1 add constraint uk1 unique (id,pid);
create table t2 partition of t for values from ('2020-02-01') to ('2020-03-01');
alter table t2 add constraint uk2 unique (id,pid);
create table t4 partition of t for values from ('2020-03-01') to ('2020-04-01');
alter table t4 add constraint uk3 unique (id,pid);
create table t3 partition of t for values from ('2020-04-01') to ('2020-05-01');
alter table t3 add constraint uk4 unique (id,pid);

insert into t(id,pid,name,dt) values (1,2,'raj','2020-01-01')
on conflict (id,pid) do nothing;
ERROR: there is no unique or exclusion constraint matching the ON CONFLICT
specification
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=36b3eb0d51f8bff4b5d445a77d688d88
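For reference, the conflict target that does match the unique constraint declared on the partitioned table above is the full (id, pid, dt) key, so this form is accepted (sketch):

insert into t(id,pid,name,dt) values (1,2,'raj','2020-01-01')
on conflict (id,pid,dt) do nothing;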
Thanks,Rj
Re: Partition column should be part of PK
My apologies for the confusion with the new thread. Yes, it's the same issue as the earlier post.

I was trying to figure out how to ensure unique values for the columns (billing_account_guid, ban). If I add the partition key to the constraint, it won't do what I'm looking for. My use case is as below:

INSERT INTO t1 SELECT * from t2 ON CONFLICT (billing_account_guid, ban) DO UPDATE SET something…

Or

INSERT INTO t1 SELECT * from t2 ON CONFLICT constraint (pk or uk) (billing_account_guid, ban) DO UPDATE SET something…

Thanks
Sent from Yahoo Mail for iPhone

On Thursday, July 8, 2021, 7:12 PM, Mike Sofen wrote:

I believe this thread qualifies for the funniest thread of 2021 (so far). And yes, this is a top post. :-) Mike Sofen

-----Original Message-----
From: Alvaro Herrera
Sent: Thursday, July 08, 2021 3:29 PM
To: Justin Pryzby
Cc: Nagaraj Raj; [email protected]
Subject: Re: Partition column should be part of PK

On 2021-Jul-08, Justin Pryzby wrote:
> If I'm not wrong, this is the same thing you asked 2 weeks ago.
> If so, why not continue the conversation on the same thread, and why not reference the old thread?
> I went to the effort to find the old conversation.
> https://www.postgresql.org/message-id/20210625042228.GJ29179@telsasoft.com

Actually, it looks like you're replying to the same email you replied to two weeks ago.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: Partition column should be part of PK
Personally, I feel this design is very bad compared to other DB servers.

> If the goal is to make sure there is only one (billing_account_uid, ban) in any partition regardless of date, you'll need to do something more
> sophisticated to make sure that two sessions don't insert an (billing_account_uid, ban) value into two different partitions. This isn't a great fit
> for table partitioning, and you might want to reconsider if partitioning the table is the right answer here. If you *must* have table partitioning, a
> possible algorithm is:

Yes, this is my use case. Can I use a trigger on the partition tables that, before inserting, calls a function like this one to handle the conflict?

CREATE or replace FUNCTION insert_trigger()
RETURNS trigger
LANGUAGE 'plpgsql'
COST 100 VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE
    conn_name text;
    c_table TEXT;
    t_schema text;
    c_table1 text;
    m_table1 text;
BEGIN
    c_table1 := TG_TABLE_NAME;
    t_schema := TG_TABLE_SCHEMA;
    m_table1 := t_schema || '.' || TG_TABLE_NAME;
    SELECT conname FROM pg_constraint
     WHERE conrelid = TG_TABLE_NAME::regclass::oid and contype = 'u'
      into conn_name;
    execute 'insert into ' || m_table1 || ' values ' || new.* || ' on conflict on constraint ' || conn_name || ' do nothing -- or something';
    RETURN null;
end;
$BODY$;

CREATE TRIGGER insert BEFORE INSERT ON t4
FOR EACH ROW WHEN (pg_trigger_depth() < 1)
EXECUTE FUNCTION insert_trigger();

CREATE TRIGGER insert BEFORE INSERT ON t3
FOR EACH ROW WHEN (pg_trigger_depth() < 1)
EXECUTE FUNCTION insert_trigger();
.. and so on ..

https://dbfiddle.uk/?rdbms=postgres_11&fiddle=bcfdfc26685ffb498bf82e6d50da95e3

Please suggest.

Thanks,Rj

On Thursday, July 8, 2021, 08:52:35 PM PDT, Christophe Pettus wrote:

Right now, PostgreSQL does not support unique indexes on partitioned tables (that operate across all partitions) unless the partition key is included in the index definition. If it didn't have that requirement, it would have to independently (and in a concurrency-supported way) scan every partition individually to see if there is a duplicate key violation in any of the partitions, and the machinery to do that does not exist right now.

If the goal is to make sure there is only one (billing_account_guid, ban, date) combination across the entire partition set, you can create a unique index on the partitioned set as (billing_account_guid, ban, date), and INSERT ... ON CONFLICT DO NOTHING works properly then.

If the goal is to make sure there is only one (billing_account_uid, ban) in any partition regardless of date, you'll need to do something more sophisticated to make sure that two sessions don't insert an (billing_account_uid, ban) value into two different partitions. This isn't a great fit for table partitioning, and you might want to reconsider if partitioning the table is the right answer here.

If you *must* have table partitioning, a possible algorithm is:

-- Start a transaction.
-- Hash the (billing_account_uid, ban) key into a 64-bit value.
-- Use that 64-bit value as a key to a call to pg_advisory_xact_lock() [1] to, in essence, create a signal to any other transaction attempting to insert that pair that it is being modified.
-- SELECT on that pair to make sure one does not already exist.
-- If one does not, do the INSERT.
-- Commit, which releases the advisory lock.

This doesn't provide quite the same level of uniqueness that a cross-partition index would, but if this is the only code path that does the INSERT, it should keep duplicates from showing up in different partitions.

[1] https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
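A rough translation of that algorithm into SQL (a sketch; 'GUID-123' and 42 are placeholder values, and hashtextextended() is just one convenient way to derive a 64-bit advisory-lock key):

BEGIN;
-- serialize all writers of this (billing_account_guid, ban) pair
SELECT pg_advisory_xact_lock(hashtextextended('GUID-123' || '|' || '42', 0));
-- check whether the pair already exists in any partition
SELECT 1 FROM c_account_p
WHERE billing_account_guid = 'GUID-123' AND ban = 42;
-- only if the previous SELECT returned no row:
INSERT INTO c_account_p (billing_account_guid, ingestion_process_id, load_dttm, ban)
VALUES ('GUID-123', -1, now(), 42);
COMMIT;  -- the advisory lock is released here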
pg_restore schema dump to schema with different name
Hi, I know I can alter the schema name after restoring, but the problem is the name already exists and I don't want to touch that existing schema. The dump type is "custom". So effectively I want something like:

pg_dump -U postgres --schema "source_schema" --format "c" --create --file "source_schema.bak" my_db
pg_restore -U postgres --exit-on-error --dbname "my_db" --destination-schema "destination_schema"

Currently this is not something pg_restore can do; this functionality is there in Oracle. Is this a feature you are considering adding? (It would really help to create test schemas without disturbing the current schema.) Thanks,Rj
Re: pg_restore schema dump to schema with different name
Wouldn't it be easier if we had an option like to_schema? Absolutely, I should not alter the current schema, as it is live 24/7. Thanks,Rj

On Monday, August 23, 2021, 06:39:03 AM PDT, Jean-Christophe Boggio wrote:

> The only way to do that is to create a new database, import the data there, rename the schema and dump again.
> Then import that dump into the target database.

Or maybe (if you can afford to have source_schema unavailable for some time):
* rename source_schema to tmp_source
* import (that will re-create source_schema)
* rename source_schema to destination_schema
* rename back tmp_source to source_schema
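The rename-based sequence described above would look roughly like this (a sketch; it assumes source_schema.bak is the custom-format dump from the first message and that a short outage on source_schema is acceptable):

-- 1. move the live schema out of the way
ALTER SCHEMA source_schema RENAME TO tmp_source;
-- 2. restore the dump; it recreates source_schema
--    (shell)  pg_restore -U postgres --dbname "my_db" "source_schema.bak"
-- 3. give the restored copy its new name
ALTER SCHEMA source_schema RENAME TO destination_schema;
-- 4. put the live schema back under its original name
ALTER SCHEMA tmp_source RENAME TO source_schema;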
Re: pg_restore schema dump to schema with different name
I agree with that. But it's probably a good idea to add this feature, as many people are migrating from Oracle to Postgres and want to clone/restore schemas into an existing cluster for test cases: a sandbox schema, a temp schema, a live backup schema, etc. Thanks,Rj

On Tuesday, August 24, 2021, 07:56:20 AM PDT, David G. Johnston wrote:

On Mon, Aug 23, 2021 at 2:46 AM Nagaraj Raj wrote:

> Currently this is not something pg_restore can do; this functionality is there in Oracle. Is this a feature you are considering adding? (It would really help to create test schemas without disturbing the current schema.)

I find this to be not all that useful. Current practice is to avoid relying on search_path and, in general, to schema-qualify object references (yes, attaching a local SET search_path to a function works, not sure how it would play out in this context). Performing a dependency and contextual rename of one schema name to another is challenging given all of that, and impossible if the schema name is hard-coded into a function body.

I won't say we wouldn't accept such a patch, but as this isn't exactly a new problem or realization, and the feature doesn't presently exist, that for whatever reasons individuals may have no one has chosen to volunteer or fund such development. I don't even remember seeing a proposal in the past 5 or so years.

David J.
