Why doesn't Postgres use a TID scan?

2018-12-17 Thread Vladimir Ryabtsev
I want to clean a large log table in chunks. I wrote a query like this:

delete from categorization.log
where ctid in (
select ctid from categorization.log
where timestamp < now() - interval '2 month'
limit 1000
)

But I am getting the following weird plan:

[Plan 1]
Delete on log  (cost=74988058.17..77101421.77 rows=211334860 width=36)
  ->  Merge Semi Join  (cost=74988058.17..77101421.77 rows=211334860 width=36)
        Merge Cond: (log.ctid = "ANY_subquery".ctid)
        ->  Sort  (cost=74987967.33..76044641.63 rows=422669720 width=6)
              Sort Key: log.ctid
              ->  Seq Scan on log  (cost=0.00..8651368.20 rows=422669720 width=6)
        ->  Sort  (cost=90.83..93.33 rows=1000 width=36)
              Sort Key: "ANY_subquery".ctid
              ->  Subquery Scan on "ANY_subquery"  (cost=0.00..41.00 rows=1000 width=36)
                    ->  Limit  (cost=0.00..31.00 rows=1000 width=6)
                          ->  Seq Scan on log log_1  (cost=0.00..11821391.10 rows=381284367 width=6)
                                Filter: ("timestamp" < (now() - '2 mons'::interval))

And it takes forever to complete (with any LIMIT value from 1 to 1000).

However, if I extract the CTIDs manually:

select array_agg(ctid) from (
select ctid from s.log
where timestamp < now() - interval '2 month'
limit 5
) v

and substitute the result into the DELETE query, it does a basic TID scan
and completes in just milliseconds:

explain
delete from s.log
where ctid =
any('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[])

[Plan 2]
Delete on log  (cost=0.01..20.06 rows=5 width=6)
  ->  Tid Scan on log  (cost=0.01..20.06 rows=5 width=6)
        TID Cond: (ctid = ANY ('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[]))

In case the table's definition helps:

CREATE TABLE s.log
(
article_id bigint NOT NULL,
topic_id integer NOT NULL,
weight double precision NOT NULL,
cat_system character varying(50) NOT NULL,
lang character varying(5) NOT NULL,
is_final boolean NOT NULL,
comment character varying(50),
"timestamp" timestamp without time zone DEFAULT now()
)

Number of rows ~ 423M
n_live_tup = 422426725
last_vacuum = 2018-10-22
Postgres version(): PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4,
64-bit

Why does this query want to use a Seq Scan and Sort on a 423M-row table?
How can I fix this (i.e., reduce it to Plan 2)?

--
Vlad


Re: Why doesn't Postgres use a TID scan?

2018-12-17 Thread Tom Lane
Vladimir Ryabtsev writes:
> I want to clean a large log table in chunks. I wrote a query like this:
> delete from categorization.log
> where ctid in (
> select ctid from categorization.log
> where timestamp < now() - interval '2 month'
> limit 1000
> )

> Why does this query want to use a Seq Scan and Sort on a 423M-row table?

There's no support for using ctid as a join key in this way; specifically,
nodeTidscan.c doesn't have support for being a parameterized inner scan,
nor does tidpath.c have code to generate such a plan.  The header comments
for the latter say

 * There is currently no special support for joins involving CTID; in
 * particular nothing corresponding to best_inner_indexscan().  Since it's
 * not very useful to store TIDs of one table in another table, there
 * doesn't seem to be enough use-case to justify adding a lot of code
 * for that.

Queries like yours are kinda sorta counterexamples to that, but pretty
much all the ones I've seen seem like crude hacks (and this one is not
an exception).  Writing a bunch of code to support them feels like
solving the wrong problem.  Admittedly, it's not clear to me what the
right problem to solve instead would be.

(It's possible that I'm overestimating the amount of new code that would
be needed to implement this, however.  indxpath.c is pretty huge, but
that's mostly because there are so many cases to consider.  There'd only
be one interesting case for an inner TID scan.  Also, this comment is
ancient, predating the current approach with parameterized paths ---
in fact best_inner_indexscan doesn't exist as such anymore.  So maybe
that old judgment that it'd take a lot of added code is wrong.)

regards, tom lane



Re: Why doesn't Postgres use a TID scan?

2018-12-17 Thread Vladimir Ryabtsev
I can't believe it.
I've seen recommendations on the Internet to do it exactly this way (e.g.
https://stackoverflow.com/questions/5170546/how-do-i-delete-a-fixed-number-of-rows-with-sorting-in-postgresql).
Did it really work in 2011? Are you saying it has since been broken? It's a shame...

Anyway, I think the problem is pretty clear: I want to eventually clear the
table based on the predicate, but I don't want to lock it for a long time.
The table does not have a primary key.
What would a proper solution be?
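
One workaround I am considering, given that Plan 2 shows ctid = ANY(array) does
use a TID scan: collapse the subquery into an array, so the planner sees a
single parameter instead of a join. This is only a sketch, assuming the planner
treats ARRAY(SELECT ...) as a parameter it can feed into a TID scan; I would
check the plan with EXPLAIN before trusting it:

delete from categorization.log
where ctid = any (array(
    -- one chunk of victim ctids; the ARRAY(...) subquery is evaluated
    -- once up front, so the outer delete should be able to use
    -- "ctid = ANY" with a TID scan, as in Plan 2 above
    select ctid from categorization.log
    where timestamp < now() - interval '2 month'
    limit 1000
));

Repeated from the client until it deletes zero rows, this would clear the table
in chunks without holding anything locked for long. Would that be the expected
approach?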

--
Vlad

On Mon, Dec 17, 2018 at 17:40, Tom Lane wrote:

> Vladimir Ryabtsev writes:
> > Why does this query want to use a Seq Scan and Sort on a 423M-row table?
>
> There's no support for using ctid as a join key in this way; specifically,
> nodeTidscan.c doesn't have support for being a parameterized inner scan,
> nor does tidpath.c have code to generate such a plan.
> [...]


Re: Why doesn't Postgres use a TID scan?

2018-12-17 Thread Tom Lane
Vladimir Ryabtsev writes:
> I've seen recommendations on the Internet to do it exactly this way (e.g.
> https://stackoverflow.com/questions/5170546/how-do-i-delete-a-fixed-number-of-rows-with-sorting-in-postgresql).
> Did it really work in 2011?

No, or at least not any better than today.  (For context, "git blame"
says I wrote the comment I just quoted to you in 2005.  The feature it
says isn't there wasn't there before that, either.)

> Anyway, I think the problem is pretty clear: I want to eventually clear the
> table based on the predicate, but I don't want to lock it for a long time.

DELETE doesn't lock the whole table.  What problem are you actually
facing?
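
If you want to see what a DELETE actually holds, a quick sketch (using the
table name from upthread) is to start the DELETE in one session and inspect
its locks from another; a plain DELETE takes only ROW EXCLUSIVE on the table,
which blocks neither readers nor other row-level writers:

-- run in a second session while the DELETE is in progress
select locktype, mode, granted
from pg_locks
where relation = 'categorization.log'::regclass;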

regards, tom lane



Re: Why doesn't Postgres use a TID scan?

2018-12-17 Thread Vladimir Ryabtsev
OK, good to know.
I saw some timeout errors in the code that writes to the log table during my
DELETE and assumed they were related. They probably had nothing to do with my
actions; I need to investigate.
Thanks anyway.

Best regards,
Vlad

On Mon, Dec 17, 2018 at 18:32, Tom Lane wrote:

> DELETE doesn't lock the whole table.  What problem are you actually
> facing?


Re: pgbench results aren't accurate

2018-12-17 Thread Mark Kirkwood

Hi, I can see two issues here that would make your results variable:

1/ Number of clients > scale factor

Using -c16 and -s 6 means you are largely benchmarking lock contention 
for a row in the branches table (it has 6 rows in your case). So 
randomness in *which* rows each client tries to lock will make for 
unwanted variation.



2/ Short run times

Your 1st run is only 5s in duration, so it will be massively influenced by the
point above about the randomness of branch-row locking.
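
For context on point 1: the statement in pgbench's built-in TPC-B-like script
that the clients pile up on is essentially this row-level update (quoting the
builtin script; treat the exact text as a sketch):

UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

With -s 6 there are only six possible bid values, so 16 clients spend much of
their time queueing on each other's row locks for the same six rows.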



I'd recommend:

- always run at least -T600

- use -s of at least 1.5x your largest -c setting (I usually use -s 100 
for testing 1-32 clients).
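
For example, something along these lines (a sketch; substitute your own port,
user, and database name):

pgbench -i -s 100 -U postgres pgbench
pgbench -c 16 -j 4 -T 600 -U postgres pgbench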


regards

Mark

On 17/12/18 12:58 AM, Mariel Cherkassky wrote:
As Greg suggested, I'm updating you all: each VM has its own dedicated ESX
host, and every ESX host has its own local disks.
I ran it once on two different servers that have the same hardware and the
same PostgreSQL db (version and conf). The results:

pgbench -i -s 6 pgbench -p 5432 -U postgres
pgbench -c 16 -j 4 -T 5 -U postgres pgbench
MACHINE 1
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 6
query mode: simple
number of clients: 16
number of threads: 4
duration: 5 s
number of transactions actually processed: 669
latency average = 122.633 ms
tps = 130.470828 (including connections establishing)
tps = 130.620286 (excluding connections establishing)

MACHINE 2

pgbench -c 16 -j 4 -T 600 -U postgres -p 5433 pgbench
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 6
query mode: simple
number of clients: 16
number of threads: 4
duration: 600 s
number of transactions actually processed: 2393723
latency average = 4.011 ms
tps = 3989.437514 (including connections establishing)
tps = 3989.473036 (excluding connections establishing)

Any idea what can cause such a difference? Both machines have
20 cores and 65GB of RAM.


On Thu, Dec 13, 2018 at 15:54, Mariel Cherkassky
<[email protected]> wrote:


Ok, I'll do that. Thanks.

On Thu, Dec 13, 2018 at 15:54, Greg Clough
<[email protected]> wrote:

Hmmm... sounds like you’ve got most of it covered.  It may be
a good idea to send that last message back to the list, as
maybe others will have better ideas.

Greg.

*From:* Mariel Cherkassky <[email protected]>
*Sent:* Thursday, December 13, 2018 1:45 PM
*To:* Greg Clough <[email protected]>
*Subject:* Re: pgbench results aren't accurate

Both machines are the only VMs on their own dedicated ESX hosts. Each ESX
host has local disks.

On Thu, Dec 13, 2018, 3:05 PM Greg Clough
<[email protected]> wrote:

> I installed a new postgres 9.6 on both of my machines.

Where is your storage?  Is it local, or on a SAN?  A SAN
will definitely have a cache, so possibly there is another
layer of cache that you’re not accounting for.

Greg Clough.




