Slow query in table where many rows were deleted. VACUUM FULL fixes it
Hi all,

I have the following table:

CREATE TABLE IF NOT EXISTS public.shortened_url
(
    id character varying(12) COLLATE pg_catalog."default" NOT NULL,
    created_at timestamp without time zone,
    expires_at timestamp without time zone,
    url text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT shortened_url_pkey PRIMARY KEY (id)
)

The table contains only the following index, on the PRIMARY KEY:

CREATE UNIQUE INDEX IF NOT EXISTS shortened_url_pkey
    ON public.shortened_url USING btree
    (id COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

This table has approximately 5 million rows of expired URLs (expires_at < now()) and 5 thousand rows of non-expired URLs (expires_at > now()).

I deleted all expired URLs with this query:

DELETE FROM shortened_url WHERE expires_at < now();

Then, I tried to query the table for expired URLs:

SELECT * FROM shortened_url WHERE expires_at < now();

This query was very slow. It took around 1-2 minutes to run, even though it only had to fetch 5000 rows (the non-expired URLs, since the other ones were deleted).

After that, I tried running VACUUM ANALYZE and REINDEX on the table. The query was still slow.

Finally, I ran VACUUM FULL and re-executed the query. Only then did it start running fast (1-2 seconds).

Have you observed similar behavior with VACUUM ANALYZE / VACUUM FULL, and why can this happen?
Is this because data is compacted after VACUUM FULL and sequential disk reads are faster?
Shouldn't VACUUM ANALYZE reclaim the disk space and make the query run fast?
Is this because RDS might do some magic? Is there something I am missing?

*Additional details*
PostgreSQL version: 14.7 on db.t3.micro RDS
PG configuration: Default of RDS

Kind Regards,
Pavlos
Re: Slow query in table where many rows were deleted. VACUUM FULL fixes it
On Tue, 2024-01-30 at 11:40 +0200, Pavlos Kallis wrote:
> I have the following table:
>
> CREATE TABLE IF NOT EXISTS public.shortened_url
> (
>     id character varying(12) COLLATE pg_catalog."default" NOT NULL,
>     created_at timestamp without time zone,
>     expires_at timestamp without time zone,
>     url text COLLATE pg_catalog."default" NOT NULL,
>     CONSTRAINT shortened_url_pkey PRIMARY KEY (id)
> )
>
> The table contains only the following index, on the PRIMARY KEY:
>
> CREATE UNIQUE INDEX IF NOT EXISTS shortened_url_pkey
>     ON public.shortened_url USING btree
>     (id COLLATE pg_catalog."default" ASC NULLS LAST)
>     TABLESPACE pg_default;
>
> This table has approximately 5 million rows of expired URLs (expires_at < now()),
> and 5 thousand rows of non-expired URLs (expires_at > now())
>
> I deleted all expired URLs with this query:
>
> DELETE FROM shortened_url WHERE expires_at < now();
>
> Then, I tried to query the table for expired URLs:
>
> SELECT * FROM shortened_url WHERE expires_at < now();
>
> This query was very slow. It took around 1-2 minutes to run, while it had to
> fetch only 5000 rows (the non-expired URLs, since the other ones were
> deleted).
>
> After that, I tried to run VACUUM ANALYZE and REINDEX to the table.
> The query was still slow.
>
> Finally, I ran VACUUM FULL and re-executed the query. Only then, it started
> running fast (1-2 seconds).
>
> Have you observed similar behavior with VACUUM ANALYZE / VACUUM FULL, and
> why can this happen?
> Is this because data is compacted after VACUUM FULL and sequential disk reads
> are faster?
> Shouldn't VACUUM ANALYZE reclaim the disk space and make the query run fast?
> Is this because RDS might do some magic? Is there something I am missing?

There are too many unknowns here. Please enable "track_io_timing" and
send us the output of EXPLAIN (ANALYZE, BUFFERS) for the slow statements.

One theory could be that there was a long-running transaction or something
else that prevented VACUUM from cleaning up.
For that, the output of "VACUUM (VERBOSE) shortened_url" would be interesting.

> Additional details
> PostgreSQL version: 14.7 on db.t3.micro RDS
> PG configuration: Default of RDS

We can only speak about real PostgreSQL...

Yours,
Laurenz Albe
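For anyone following along, the diagnostics Laurenz asks for could be gathered roughly like this (a sketch; note that track_io_timing normally requires superuser or, on RDS, rds_superuser / a parameter-group change):

```sql
-- Enable per-buffer I/O timing for this session, then capture the plan
-- with buffer and I/O statistics for the slow statement:
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM shortened_url WHERE expires_at < now();

-- And the verbose vacuum report, which shows removed vs. remaining
-- dead tuples and whether cleanup was held back:
VACUUM (VERBOSE) shortened_url;
```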
Re: Slow query in table where many rows were deleted. VACUUM FULL fixes it
> On Jan 30, 2024, at 4:40 AM, Pavlos Kallis wrote:
>
> Shouldn't VACUUM ANALYZE reclaim the disk space?

Hi Pavlos,

The short answer to this is "no". That's an important difference between VACUUM (also known as "plain" VACUUM) and VACUUM FULL. In some special cases plain VACUUM can reclaim disk space, but I think both the circumstances under which it can do so and the amount it can reclaim are pretty limited. An oversimplified but "mostly correct" way to think about it is that plain VACUUM can't reclaim disk space, whereas VACUUM FULL can.

This is covered in the 4th paragraph of the doc of the VACUUM command --
https://www.postgresql.org/docs/current/sql-vacuum.html

So in your case those 5m rows that you deleted were probably still clogging up your table until you ran VACUUM FULL.

Hope this helps,
Philip
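One way to see this difference for yourself (a sketch using standard catalog views and size functions; the numbers shown will depend on your data):

```sql
-- Live vs. dead tuple counts as tracked by the statistics collector:
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'shortened_url';

-- On-disk size of the table's main fork. After a bulk DELETE plus plain
-- VACUUM this typically stays large (the space is only marked reusable);
-- after VACUUM FULL the table is rewritten and the size shrinks:
SELECT pg_size_pretty(pg_relation_size('shortened_url'));
```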
Re: Slow query in table where many rows were deleted. VACUUM FULL fixes it
On Wed, 31 Jan 2024 at 09:09, Philip Semanchuk wrote:
> So in your case those 5m rows that you deleted were probably still clogging
> up your table until you ran VACUUM FULL.

It seems more likely to me that the VACUUM removed the rows and just left empty pages in the table. Since there's no index on expires_at, the only way to answer that query is a Seq Scan, and the Seq Scan will need to process those empty pages. While that processing is very fast if the page's item pointer array is empty, it could still be slow if the page needs to be read from disk.

Laurenz's request for the EXPLAIN (ANALYZE, BUFFERS) output with track_io_timing on will help confirm this.

If it is just reading empty pages that's causing this issue, then adding that missing index would improve the situation after running just plain VACUUM each time there's a bulk delete.

David
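The missing index David refers to could be created along these lines (a sketch; the index name is illustrative, and CONCURRENTLY avoids blocking writes at the cost of a slower build):

```sql
-- A b-tree on expires_at lets the expiry queries use an index scan
-- instead of a Seq Scan over the whole (possibly mostly-empty) heap:
CREATE INDEX CONCURRENTLY IF NOT EXISTS shortened_url_expires_at_idx
    ON shortened_url (expires_at);
```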
