vacuum after truncate
Hi,

I have a big function that includes many TRUNCATEs on different tables. The documentation says that TRUNCATE creates a new file, reassigns the table to the new filenode, and deletes the old file (the old data of the table) at commit.

In order to execute my function I run:

psql -d 'aa' -U 'bb' -c "select function()";

I have a few questions about it:

1. When I start the function, the whole function runs as one big transaction, right?

2. Because the TRUNCATEs I perform are part of a transaction, the space is freed only at the end of the transaction, i.e. only when the function finishes?

3. Will running VACUUM at the end of the function, on the tables that were truncated and then repopulated with data, have any impact, or is it better to just ANALYZE them?

Thanks.
Re: vacuum after truncate
Mariel Cherkassky wrote:
> I have a big function that includes many TRUNCATEs on different tables.
> The documentation says that TRUNCATE creates a new file, reassigns the
> table to the new filenode, and deletes the old file (the old data of the
> table) at commit.
>
> 1. When I start the function, the whole function runs as one big
> transaction, right?

Right.

> 2. Because the TRUNCATEs I perform are part of a transaction, the space
> is freed only at the end of the transaction, i.e. only when the function
> finishes?

Exactly. The old file has to be retained, because there could be a ROLLBACK.

> 3. Will running VACUUM at the end of the function, on the tables that
> were truncated and then repopulated with data, have any impact, or is it
> better to just ANALYZE them?

For up-to-date statistics, ANALYZE is enough. If you want to set hint bits so that the first reader doesn't have to do it, VACUUM will help, but that is not necessary.

Yours,
Laurenz Albe
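A minimal sketch of the behavior described above; the table names are hypothetical and the statements only illustrate what the answers state:

-- Hypothetical tables, for illustration only.
CREATE TABLE incoming_events (payload text);
CREATE TABLE staging_events  (payload text);

BEGIN;
TRUNCATE staging_events;   -- a new, empty file (relfilenode) is assigned
INSERT INTO staging_events SELECT * FROM incoming_events;
ROLLBACK;                  -- the old file was retained, so nothing is lost

-- Inside a function, all TRUNCATEs commit (and the old files are deleted)
-- only when the surrounding transaction commits.

-- After the function commits, refresh statistics on the repopulated tables:
ANALYZE staging_events;          -- enough for up-to-date planner statistics
VACUUM ANALYZE staging_events;   -- optional: also sets hint bits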
Extremely slow DELETE with cascade foreign keys
Hi, I think something changed recently in my development environment, as I don't recall deletes being so slow before. I've created a new dump and restored it to a new database, ran VACUUM FULL ANALYZE, and a simple delete takes forever, as you can see here:

explain analyze delete from field_values where transaction_id=226;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Delete on field_values  (cost=0.43..257.93 rows=481 width=6) (actual time=367375.805..367375.805 rows=0 loops=1)
   ->  Index Scan using index_field_values_on_transaction_id on field_values  (cost=0.43..257.93 rows=481 width=6) (actual time=0.223..4.216 rows=651 loops=1)
         Index Cond: (transaction_id = 226)
 Planning time: 0.234 ms
 Execution time: 367375.882 ms
(5 rows)

Time: 367377.085 ms (06:07.377)

Any ideas on what could be causing this? Could it be an issue with my hard drive? There aren't that many records to delete from the other tables referencing field_values. I've done this sort of operation earlier this year and it was quite fast. Any clues?

Thanks in advance,
Rodrigo.
Re: Extremely slow DELETE with cascade foreign keys
Rodrigo Rosenfeld Rosas writes:
> Hi, I think something changed recently in my development environment as
> I don't recall deletes being so slow before.
> I've created a new dump and restored to a new database, ran VACUUM FULL
> ANALYZE and a simple delete takes forever as you can see here:

The usual suspect for this is not having an index on some FK referencing column, thus forcing the FK check trigger to seq-scan the entire referencing table for each referenced row that is to be deleted.

regards, tom lane
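A catalog query along these lines can list candidate problem columns; this is a simplified sketch (it only checks the first column of each foreign key against the leading column of existing indexes):

-- Find FK constraints whose referencing column is not the leading
-- column of any index on the referencing table.
SELECT c.conname            AS constraint_name,
       c.conrelid::regclass AS referencing_table,
       a.attname            AS referencing_column
FROM   pg_constraint c
JOIN   pg_attribute  a ON a.attrelid = c.conrelid
                      AND a.attnum   = c.conkey[1]   -- first FK column
WHERE  c.contype = 'f'
AND    NOT EXISTS (
         SELECT 1
         FROM   pg_index i
         WHERE  i.indrelid  = c.conrelid
         AND    i.indkey[0] = a.attnum               -- leading index column
       );

Any row it returns is a candidate for a CREATE INDEX on the listed column before running large cascading deletes.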
Re: Extremely slow DELETE with cascade foreign keys
On 05-12-2017 14:27, Tom Lane wrote:
> Rodrigo Rosenfeld Rosas writes:
>> Hi, I think something changed recently in my development environment as
>> I don't recall deletes being so slow before.
>> I've created a new dump and restored to a new database, ran VACUUM FULL
>> ANALYZE and a simple delete takes forever as you can see here:
>
> The usual suspect for this is not having an index on some FK referencing
> column, thus forcing the FK check trigger to seq-scan the entire
> referencing table for each referenced row that is to be deleted.
>
> regards, tom lane
Thanks, indeed that was the case. I manually inspected about a dozen tables referencing field_values; the last one ("references") was in turn referenced by another table ("highlighted_texts"), whose reference_id column, which has a foreign key on "references"(id), was missing an index.
Good job :)
Best,
Rodrigo.
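For posterity, the fix amounts to a one-line DDL statement. A sketch (the names below are taken from the \d output shown later in this thread):

CREATE INDEX highlighted_text_reference_id_idx
    ON highlighted_text (reference_id);

On a live system, CREATE INDEX CONCURRENTLY avoids blocking writes while the index is built.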
Re: Extremely slow DELETE with cascade foreign keys
Rodrigo Rosenfeld Rosas wrote:
> explain analyze delete from field_values where transaction_id=226;
>                               QUERY PLAN
> ---------------------------------------------------------------------------
>  Delete on field_values  (cost=0.43..257.93 rows=481 width=6)
>  (actual time=367375.805..367375.805 rows=0 loops=1)
>    ->  Index Scan using index_field_values_on_transaction_id on
>  field_values  (cost=0.43..257.93 rows=481 width=6)
>  (actual time=0.223..4.216 rows=651 loops=1)
>          Index Cond: (transaction_id = 226)
>  Planning time: 0.234 ms
>  Execution time: 367375.882 ms
> (5 rows)
>
> Time: 367377.085 ms (06:07.377)

Normally this is because you lack indexes on the referencing columns, so the query that scans the table to find the referencing rows is a seqscan.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Extremely slow DELETE with cascade foreign keys
On 05-12-2017 14:43, Alvaro Herrera wrote:
> Rodrigo Rosenfeld Rosas wrote:
>> explain analyze delete from field_values where transaction_id=226;
>> [...]
> Normally this is because you lack indexes on the referencing columns, so
> the query that scans the table to find the referencing rows is a
> seqscan.

Thank you, Álvaro, that was indeed the case, just like Tom Lane suggested as well. I found the missing index and fixed it. Thanks :)
Re: Extremely slow DELETE with cascade foreign keys
Alvaro Herrera writes:
> Rodrigo Rosenfeld Rosas wrote:
>> explain analyze delete from field_values where transaction_id=226;
>> [...]
> Normally this is because you lack indexes on the referencing columns, so
> the query that scans the table to find the referencing rows is a
> seqscan.

Actually though ... the weird thing about this is that I'd expect to see a separate line in the EXPLAIN output for time spent in the FK trigger. Where'd that go?

regards, tom lane
Re: Extremely slow DELETE with cascade foreign keys
On 05-12-2017 15:25, Tom Lane wrote:
> Actually though ... the weird thing about this is that I'd expect to
> see a separate line in the EXPLAIN output for time spent in the FK
> trigger. Where'd that go?

Yes, I was also hoping to get more insights through the EXPLAIN output :)
Re: Extremely slow DELETE with cascade foreign keys
Rodrigo Rosenfeld Rosas wrote:
>> Actually though ... the weird thing about this is that I'd expect to
>> see a separate line in the EXPLAIN output for time spent in the FK
>> trigger. Where'd that go?
>
> Yes, I was also hoping to get more insights through the EXPLAIN output :)

It normally does. Can you show \d of the table containing the FK?

alvherre=# begin; explain analyze delete from pk where a = 505; rollback;
BEGIN
Time: 0.207 ms
                                         QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
 Delete on pk  (cost=0.00..8.27 rows=1 width=6) (actual time=0.023..0.023 rows=0 loops=1)
   ->  Index Scan using pk_pkey on pk  (cost=0.00..8.27 rows=1 width=6) (actual time=0.012..0.013 rows=1 loops=1)
         Index Cond: (a = 505)
 Trigger for constraint fk_a_fkey: time=201.580 calls=1
 Total runtime: 201.625 ms
(5 rows)

alvherre=# \d fk
     Table "public.fk"
 Column │  Type   │ Modifiers
────────┼─────────┼───────────
 a      │ integer │
Foreign-key constraints:
    "fk_a_fkey" FOREIGN KEY (a) REFERENCES pk(a) ON DELETE CASCADE

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Different plan chosen when in lateral subquery
I get a very different plan chosen when my query is in a lateral subquery vs.
standalone -- it doesn't use a key when joining on a table, instead opting
to do a hash join. Here is the query:
select distinct on (sub.entity_id, sub.note_id, sub.series_id)
entity_id, note_id, series_id
from
(
select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount,
inv.name
from public.portfolio_allocations alloc
JOIN contributions contrib on contrib.id = alloc.note_id
JOIN investments inv on inv.id = contrib.investment_id
where entity_id = '\x5787f132f50f7b03002cf835' and
alloc.allocated_on <= dates.date
) sub
And wrapped inside the lateral:
explain analyze
select *
from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,
current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
LATERAL (
... ...
) lat
Run by itself, injecting a hard-coded value for dates.date, I get the
expected plan, which uses a key index on contributions:
Unique (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053
rows=2 loops=1)
-> Sort (cost=14.54..14.54 rows=2 width=39) (actual
time=0.052..0.052 rows=2 loops=1)
Sort Key: alloc.note_id, alloc.series_id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.25..14.53 rows=2 width=39) (actual
time=0.030..0.042 rows=2 loops=1)
-> Nested Loop (cost=0.17..14.23 rows=2 width=52)
(actual time=0.022..0.028 rows=2 loops=1)
-> Index Scan using
portfolio_allocations_entity_id_allocated_on_idx on
portfolio_allocations alloc (cost=0.09..6.05 rows=2 width=39) (actual
time=0.012..0.014 rows=2 loops=1)
Index Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= '2017-03-14
20:59:59.999+00'::timestamp with time zone))
-> Index Scan using
contributions_id_accrue_from_idx on contributions contrib
(cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005 rows=1
loops=2)
Index Cond: (id = alloc.note_id)
-> Index Only Scan using investments_pkey on
investments inv (cost=0.08..0.15 rows=1 width=13) (actual
time=0.005..0.006 rows=1 loops=2)
Index Cond: (id = contrib.investment_id)
Heap Fetches: 2
Planning time: 0.617 ms
Execution time: 0.100 ms
(15 rows)
But run in the lateral, it doesn't use the index:
Nested Loop (cost=14.54..24.55 rows=2000 width=47) (actual
time=0.085..0.219 rows=534 loops=1)
-> Function Scan on generate_series dates (cost=0.00..3.00
rows=1000 width=8) (actual time=0.031..0.043 rows=267 loops=1)
-> Materialize (cost=14.54..14.55 rows=2 width=39) (actual
time=0.000..0.000 rows=2 loops=267)
-> Unique (cost=14.54..14.54 rows=2 width=39) (actual
time=0.052..0.053 rows=2 loops=1)
-> Sort (cost=14.54..14.54 rows=2 width=39) (actual
time=0.051..0.052 rows=2 loops=1)
Sort Key: alloc.note_id, alloc.series_id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.25..14.53 rows=2
width=39) (actual time=0.029..0.041 rows=2 loops=1)
-> Nested Loop (cost=0.17..14.23 rows=2
width=52) (actual time=0.021..0.027 rows=2 loops=1)
-> Index Scan using
portfolio_allocations_entity_id_allocated_on_idx on
portfolio_allocations alloc (cost=0.09..6.05 rows=2 width=39) (actual
time=0.012..0.014 rows=2 loops=1)
Index Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= '2017-03-14
20:59:59.999+00'::timestamp with time zone))
-> Index Scan using
contributions_id_accrue_from_idx on contributions contrib
(cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005 rows=1 loops=2)
Index Cond: (id =
alloc.note_id)
-> Index Only Scan using investments_pkey
on investments inv (cost=0.08..0.15 rows=1 width=13) (actual
time=0.005..0.006 rows=1 loops=2)
Index Cond: (id =
contrib.investment_id)
Heap Fetches: 2
Planning time: 0.718 ms
Execution time: 0.296 ms
(18 rows)
For reference, here are the indexes on the relevant tables:
Indexes:
"portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id,
allocated_on DESC)
"portfolio_allocations_note_id_allocated_on_idx" btree (note_id,
allocated_on DESC)
"portfolio_allocations_pnsa" btree (entity_id, note_id, series_id,
allocated_on DESC)
Indexes:
"contributions_pkey" PRIMARY KEY, b
Re: Different plan chosen when in lateral subquery
Weird -- when I deleted an erroneous index, it started picking a reasonable
plan. It now works as expected; for posterity, here is the bad plan:
Nested Loop (cost=21281.50..21323812.82 rows=5621000 width=47) (actual
time=171.648..7233.298 rows=85615 loops=1)
-> Function Scan on generate_series dates (cost=0.00..3.00 rows=1000
width=8) (actual time=0.031..0.252 rows=267 loops=1)
-> Unique (cost=21281.50..21290.08 rows=5621 width=39) (actual
time=25.730..27.050 rows=321 loops=267)
-> Sort (cost=21281.50..21284.36 rows=5724 width=39) (actual
time=25.728..26.242 rows=6713 loops=267)
Sort Key: alloc.note_id, alloc.series_id
Sort Method: quicksort Memory: 2220kB
-> Nested Loop (cost=10775.92..21210.05 rows=5724
width=39) (actual time=1.663..21.938 rows=6713 loops=267)
-> Hash Join (cost=10775.83..20355.61 rows=5724
width=52) (actual time=1.657..5.980 rows=6713 loops=267)
Hash Cond: (alloc.note_id = contrib.id)
-> Bitmap Heap Scan on portfolio_allocations
alloc (cost=69.82..9628.13 rows=5724 width=39) (actual time=1.010..2.278
rows=6713 loops=267)
Recheck Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
date(dates.dates)))
Heap Blocks: exact=118074
-> Bitmap Index Scan on
portfolio_allocations_entity_id_allocated_on_idx (cost=0.00..69.53
rows=5724 width=0) (actual time=0.956..0.956 rows=6713 loops=267)
Index Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
date(dates.dates)))
-> Hash (cost=9464.85..9464.85 rows=354617
width=26) (actual time=169.792..169.792 rows=354617 loops=1)
Buckets: 524288 Batches: 1 Memory Usage:
24296kB
-> Seq Scan on contributions contrib
(cost=0.00..9464.85
rows=354617 width=26) (actual time=0.007..83.246 rows=354617 loops=1)
-> Index Only Scan using investments_pkey on
investments inv (cost=0.08..0.15 rows=1 width=13) (actual
time=0.002..0.002 rows=1 loops=1792457)
Index Cond: (id = contrib.investment_id)
Heap Fetches: 1792457
Planning time: 0.721 ms
Execution time: 7236.507 ms
Re: Extremely slow DELETE with cascade foreign keys
On 05-12-2017 15:49, Alvaro Herrera wrote:
>>> Actually though ... the weird thing about this is that I'd expect to
>>> see a separate line in the EXPLAIN output for time spent in the FK
>>> trigger. Where'd that go?
>>
>> Yes, I was also hoping to get more insights through the EXPLAIN output :)
>
> It normally does. Can you show \d of the table containing the FK?

\d highlighted_text
                                  Table "public.highlighted_text"
    Column    |            Type             | Collation | Nullable |                   Default
--------------+-----------------------------+-----------+----------+----------------------------------------------
 id           | integer                     |           | not null | nextval('highlighted_text_id_seq'::regclass)
 date_created | timestamp without time zone |           | not null |
 last_updated | timestamp without time zone |           | not null |
 reference_id | integer                     |           | not null |
 highlighting | text                        |           |          |
Indexes:
    "highlighted_text_pkey" PRIMARY KEY, btree (id)
    "highlighted_text_reference_id_idx" btree (reference_id)
Foreign-key constraints:
    "fk_highlighted_text_reference" FOREIGN KEY (reference_id) REFERENCES "references"(id) ON DELETE CASCADE

The highlighted_text_reference_id_idx was previously missing.

begin; explain analyze delete from "references" where id=966539; rollback;
BEGIN
Time: 0.466 ms
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Delete on "references"  (cost=0.43..8.45 rows=1 width=6) (actual time=2.683..2.683 rows=0 loops=1)
   ->  Index Scan using references_pkey on "references"  (cost=0.43..8.45 rows=1 width=6) (actual time=2.609..2.612 rows=1 loops=1)
         Index Cond: (id = 966539)
 Planning time: 0.186 ms
 Trigger for constraint fk_highlighted_text_reference: time=0.804 calls=1
 Execution time: 3.551 ms
(6 rows)

Time: 4.791 ms
ROLLBACK
Time: 0.316 ms

drop index highlighted_text_reference_id_idx;
DROP INDEX
Time: 35.938 ms

begin; explain analyze delete from "references" where id=966539; rollback;
BEGIN
Time: 0.494 ms
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Delete on "references"  (cost=0.43..8.45 rows=1 width=6) (actual time=0.112..0.112 rows=0 loops=1)
   ->  Index Scan using references_pkey on "references"  (cost=0.43..8.45 rows=1 width=6) (actual time=0.071..0.074 rows=1 loops=1)
         Index Cond: (id = 966539)
 Planning time: 0.181 ms
 Trigger for constraint fk_highlighted_text_reference: time=2513.816 calls=1
 Execution time: 2513.992 ms
(6 rows)

Time: 2514.801 ms (00:02.515)
ROLLBACK
Time: 0.291 ms

It displayed the time spent on the trigger this time. How about deleting the field values?

begin; explain analyze delete from field_values where transaction_id=2479; rollback;
BEGIN
Time: 0.461 ms
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Delete on field_values  (cost=0.43..364.98 rows=453 width=6) (actual time=4.732..4.732 rows=0 loops=1)
   ->  Index Scan using index_field_values_on_transaction_id on field_values  (cost=0.43..364.98 rows=453 width=6) (actual time=0.137..0.949 rows=624 loops=1)
         Index Cond: (transaction_id = 2479)
 Planning time: 0.210 ms
 Trigger for constraint field_value_booleans_field_value_id_fkey on field_values: time=7.953 calls=624
 Trigger for constraint field_value_currencies_field_value_id_fkey on field_values: time=5.548 calls=624
 Trigger for constraint field_value_jurisdictions_field_value_id_fkey on field_values: time=6.376 calls=624
 Trigger for constraint fk_field_value_date_range_field_value_id on field_values: time=5.735 calls=624
 Trigger for constraint fk_field_value_dates_field_value_id on field_values: time=6.316 calls=624
 Trigger for constraint fk_field_value_numerics_field_value_id on field_values: time=6.368 calls=624
 Trigger for constraint fk_field_value_options_field_value_id on field_values: time=6.503 calls=624
 Trigger for constraint fk_field_value_strings_field_value_id on field_values: time=6.794 calls=624
 Trigger for constraint fk_field_value_time_spans_field_value_id on field_values: time=6.332 calls=624
 Trigger for constraint fk_references_field_value_id on field_values: time=7.382 calls=624
 Trigger for constraint fk_highlighted_text_reference on references: tim
Re: Different plan chosen when in lateral subquery
Argh, so sorry for the repeated posts; I'll be more careful to review them
before posting. The "good plan" was the result of me hard-coding
'2017-03-14 20:59:59.999+00'::timestamp instead of using dates.date inside
the lateral subquery. When I correctly use dates.date, it takes 7000 ms
instead of 0.3 ms. My questions still remain:

- Why doesn't it use the primary key on contributions in either case,
  preferring contributions_id_accrue_from_idx or none at all?
- Why isn't it choosing portfolio_allocations_pnsa, which seems like it
  would prevent it from having to sort?
- What information can I gather to answer these questions on my own?
~Alex
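On the last question, two low-risk diagnostics are worth mentioning; this is a sketch that assumes the table and column names from the plans above:

-- 1. What the planner believes about the filter/join columns:
SELECT attname, n_distinct, correlation
FROM   pg_stats
WHERE  schemaname = 'public'
AND    tablename  = 'portfolio_allocations'
AND    attname IN ('entity_id', 'note_id', 'series_id', 'allocated_on');

-- 2. Rule plan alternatives in or out one at a time (session-local,
--    for diagnosis only), re-running EXPLAIN ANALYZE after each SET:
SET enable_hashjoin   = off;  -- does it go back to the nested-loop plan?
SET enable_bitmapscan = off;  -- does it now pick portfolio_allocations_pnsa?
RESET enable_hashjoin;
RESET enable_bitmapscan;

Comparing estimated against actual row counts in the two plans is usually the quickest way to see which estimate pushed the planner toward the hash join.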
Re: Bitmap scan is undercosted? - boolean correlation
Jeff Janes writes:
> On Dec 3, 2017 15:31, "Tom Lane" wrote:
>> Jeff Janes writes:
>>> But I do see that ties within the logical order of the column values are
>>> broken to agree with the physical order. That is wrong, right? Is there
>>> any argument that this is desirable?
>> Uh ... what do you propose doing instead? We'd have to do something with
>> ties, and it's not so obvious this way is wrong.
> Let them be tied. If there are 10 distinct values, number the values 0 to
> 9, and all rows of a given distinct value get the same number for the
> logical order axis.
> Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to
> me. Although if we switched btree to store duplicate values with tid as a
> tie breaker, then maybe it wouldn't be as obviously wrong.

I thought some more about this. What we really want the correlation stat to do is help us estimate how randomly an index-ordered scan will access the heap. If the values we've sampled are all unequal, then there's no particular issue. However, if we have some group of equal values, we do not really know what order an indexscan will visit them in. The existing correlation calculation is making the *most optimistic possible* assumption, that such a group will be visited exactly in heap order --- and that assumption isn't too defensible. IIRC, a freshly built b-tree will behave that way, because the initial sort of a btree breaks ties using heap TIDs; but we don't maintain that property during later insertions.

In any case, given that we do this calculation without regard to any specific index, we can't reasonably expect to model exactly what the index will do. It would be best to adopt some middle-of-the-road assumption about what the heap visitation order will be for a set of duplicate values: not exactly heap order, but I think we should not use a worst-case assumption either, since the btree may retain some amount of its initial ordering.

BTW, I disagree that "correlation = zero" is the right answer for this particular example. If the btree is freshly built, then an index-order scan would visit all the heap pages in sequence to fetch "f" rows, and then visit them all in sequence again to fetch "t" rows, which is a whole lot better than the purely random access that zero correlation implies. So I think 0.8 or so is actually a perfectly reasonable answer when the index is fresh. The trouble is just that it'd behoove us to derate that answer somewhat for the probability that the index isn't fresh.

My first thought for a concrete fix was to use the mean position of a group of duplicates for the purposes of the correlation calculation, but on reflection that's clearly wrong. We do have an idea, from the data we have, whether the duplicates are close together in the heap or spread all over. Using only the mean position would fail to distinguish those cases, but really we'd better penalize the spread-all-over case. I'm not sure how to do that.

Or we could leave this calculation alone and try to move towards keeping equal values in TID order in btrees. Not sure about the downsides of that, though.

regards, tom lane
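The scenario under discussion is easy to reproduce. A small self-contained sketch (the table name is made up) that lays equal boolean values out in heap order and then shows what ANALYZE stores:

-- All 'f' rows first, then all 't' rows, i.e. perfectly clustered groups.
CREATE TABLE corr_demo AS
SELECT g > 500000 AS flag
FROM   generate_series(1, 1000000) AS g;

ANALYZE corr_demo;

SELECT tablename, attname, correlation
FROM   pg_stats
WHERE  tablename = 'corr_demo';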
Re: Half billion records in one table? RDS
Why not store metadata in pg and the payload in S3?

On Mon, Nov 27, 2017 at 11:58 AM Jean Baro wrote:
> Hi there,
>
> We are creating a new DB which will behave mostly like a file system; I
> mean, there will be no complex queries or joins running in the DB. The
> idea is to grab the WHOLE set of messages for a particular user and then
> filter, order, combine or full-text search in the function itself (AWS
> Lambda). The maximum number of messages is limited to 1,000 messages per
> user. So we expect Postgres to have an amazing performance for this
> scenario.
>
> As I am not really familiar with PG (9.6, or 10, in case RDS releases it
> before February) I would like to share what we are planning to do for
> this DB. So if you guys could share your thoughts, that would be great! :)
>
> Table structure:
> - MessageID (UUID) - PK
> - UserCountry (ISO)
> - UserRole (TEXT 15)
> - UserID (TEXT 30) - FK (although there is no constraint)
> - LifeCycle (RANGE DATE? Or 2 TimeStampWithTZ? start_date and end_date?)
> - Channel (TEXT 15)
> - Tags (TEXT 2000)
> - Menu (TEXT 200)
> - Icon (TEXT 500) - URL to an image which will be used as an icon
> - Title (TEXT 150)
> - Body (JSON - up to 10K) - metadata describing all the data for a
>   specific type of message. The JSON changes according to the type of
>   message. We are assuming most messages will use less than 1K for this
>   field.
> - Delete (BOOLEAN) - soft delete
> - Created (Timestamp - with TZ)
> - CreatedBy (TEXT 50)
>
> Only 1 table:
> - Messages
>
> 3 indexes:
> - MessageID PK (UUID)
> - Main fetch key (UserCountry + UserID)
> - End_date (to locate old messages that can be moved to another DB,
>   which will hold the old messages)
>
> Sizing and worst-case scenario:
> - 500MM messages in the main DB
> - 4K queries per second (by UserID) - max time of 500 ms per query.
>   Simple SELECT, with no ORDER, WHERE or GROUP BY. Just grab all the
>   messages for a particular user. MAX 1,000 messages per user.
> - 1K inserts per second on average (so that in 1 hour we can insert
>   around 3MM messages)
> - 1K deletes per second on average (so that in 1 hour we can remove
>   around 3MM messages)
>
> My questions are:
> - Can we use any kind of compression for PostgreSQL which would result
>   in reduced IO and disk size?
> - We are not relying on any kind of table partitioning; is that the
>   best approach for this scenario?
> - Is PG on RDS capable of delivering this type of performance while
>   requiring low maintenance?
> - What about autovacuum? Any suggestion how to optimize it for such a
>   workload (we will insert and delete millions of rows every day)?
>
> P.S.: We are going to test all this, but if we don't get the performance
> we are expecting, all optimization tips from you guys will be really
> appreciated. :)
>
> Thanks

-- 
Regards,
/Aaron
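For concreteness, one hedged reading of the proposed schema as DDL; every definition below is an interpretation of the post, not a confirmed design. Note that PostgreSQL already TOASTs and compresses large text/jsonb values out of the box, which partly answers the compression question:

CREATE TABLE messages (
    message_id   uuid PRIMARY KEY,
    user_country text NOT NULL,
    user_role    text,
    user_id      text NOT NULL,
    life_cycle   tstzrange,           -- or two timestamptz columns
    channel      text,
    tags         text,
    menu         text,
    icon         text,                -- URL of the icon image
    title        text,
    body         jsonb,               -- large values are TOASTed/compressed
    deleted      boolean NOT NULL DEFAULT false,  -- soft delete
    created      timestamptz NOT NULL DEFAULT now(),
    created_by   text
);

-- The two secondary indexes described in the post:
CREATE INDEX messages_user_fetch_idx ON messages (user_country, user_id);
CREATE INDEX messages_end_date_idx   ON messages (upper(life_cycle));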
Re: insert and query performance on big string table with pg_trgm
On Nov 21, 2017, at 12:05 AM, Matthew Hall wrote:
>> Do you really need the artificial primary key, when you already have
>> another column that would be used as the primary key? If you need to use
>> it as a foreign key in another table, then you very well might. But
>> maintaining two unique indexes doesn't come free.
>
> OK, fair enough, I'll test with it removed and see what happens.

With the integer primary key removed, it still takes ~9 hours to load the table, so it didn't seem to make a big difference.

> Fixed it; I will re-test w/ 1 GB. Have you got any recommended values so I
> don't screw it up? I also took this step for maintenance_work_mem.

Queries on the table still take a long time with the PK removed:

# explain (analyze, buffers) select * from huge_table where value ilike '%yahoo%';

 Bitmap Heap Scan on huge_table  (cost=593.72..68828.97 rows=18803 width=25) (actual time=3224.100..70059.839 rows=20909 loops=1)
   Recheck Cond: ((value)::text ~~* '%yahoo%'::text)
   Rows Removed by Index Recheck: 17
   Heap Blocks: exact=6682
   Buffers: shared hit=544 read=6760 dirtied=4034
   I/O Timings: read=69709.611
   ->  Bitmap Index Scan on huge_table_value_trgm_idx  (cost=0.00..589.02 rows=18803 width=0) (actual time=3216.545..3216.545 rows=20926 loops=1)
         Index Cond: ((value)::text ~~* '%yahoo%'::text)
         Buffers: shared hit=352 read=270
         I/O Timings: read=3171.872
 Planning time: 0.283 ms
 Execution time: 70065.157 ms
(12 rows)

The slow process during inserts is:

postgres: username dbname [local] INSERT

A slow statement example is:

2017-12-06 04:27:11 UTC [16085-10378] username@dbname LOG:  duration: 5028.190 ms  plan:
    Query Text: INSERT INTO huge_table (value) VALUES ... (5000 values at once) ... ON CONFLICT (value) DO NOTHING
    Insert on huge_table  (cost=0.00..75.00 rows=5000 width=40)
      Conflict Resolution: NOTHING
      Conflict Arbiter Indexes: huge_table_value_idx
      ->  Values Scan on "*VALUES*"  (cost=0.00..75.00 rows=5000 width=40)

> What is the size of the table and the gin index?

The table is 10 GB. The GIN index is 5.8 GB.

> [From Gabor Szucs] [H]ow about adding a hash value column and creating the
> unique index on that one? May block some false duplicates but the unique
> index would be way smaller, speeding up inserts.

The mean length of the input items is about 18 bytes. The max length of the input items is about 67 bytes. The size of the MD5 would of course be 16 bytes. I'm testing it now, and I'll write another update.

Matthew.
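A sketch of what is being tested (index name assumed): the unique index is built on md5(value), and ON CONFLICT must then arbitrate on the same expression:

-- md5() returns 32 hex characters (16 raw bytes), typically much smaller
-- than indexing long strings directly.
CREATE UNIQUE INDEX huge_table_value_md5_idx
    ON huge_table (md5(value));

INSERT INTO huge_table (value)
VALUES ('example string')
ON CONFLICT (md5(value)) DO NOTHING;

As Gabor noted, a hash collision would make two genuinely different strings look like duplicates, so this trades a tiny correctness risk for a much smaller index.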
Re: insert and query performance on big string table with pg_trgm
> Buffers: shared hit=544 read=6760 dirtied=4034
> I/O Timings: read=69709.611

You have very slow (or busy) disks; this is not a PostgreSQL issue. Reading 6760 * 8 KB in 70 seconds is a very bad result. For better performance you need better disks, at least RAID 10 (not RAID 5). Much more memory in shared_buffers can help with read performance and thus reduce disk utilization, but write operations will still be slow.

Sergei
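For illustration, raising shared_buffers looks like this; the value is an assumption and should be sized against the machine's RAM (a common starting point is about 25% of it):

ALTER SYSTEM SET shared_buffers = '8GB';  -- takes effect after a restart
-- After restarting, verify:
SHOW shared_buffers;

-- track_io_timing is what produces the "I/O Timings" lines quoted above;
-- it can be enabled without a restart:
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();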
