Re: Why Postgres doesn't use TID scan?

2018-12-18 Thread Alvaro Herrera
On 2018-Dec-17, Tom Lane wrote:

> Queries like yours are kinda sorta counterexamples to that, but pretty
> much all the ones I've seen seem like crude hacks (and this one is not
> an exception).  Writing a bunch of code to support them feels like
> solving the wrong problem.  Admittedly, it's not clear to me what the
> right problem to solve instead would be.

Yeah, over the years I've been confronted several times with situations
where a deletion by ctid (and sometimes an update, IIRC) was the most
convenient way out.  It's not the kind of thing you'd do with any
frequency, just one-offs.  It's always been a bit embarrassing that this
doesn't "work properly".  There's always been some way around it, but
much slower and less convenient ...
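
For concreteness, the sort of one-off I mean looks like this sketch
(hypothetical table t with a duplicated key column k):

-- Remove duplicate rows, keeping one physical copy per key value;
-- comparing ctids picks which copy survives.
DELETE FROM t a
 USING t b
 WHERE a.k = b.k
   AND a.ctid < b.ctid;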

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Increasing parallelism of queries while using file fdw and partitions

2018-12-18 Thread Patrick Mulrooney
Wondering if anyone has any thoughts on how to tweak my setup to get it
to read many files at once, instead of one at a time, when using a file
FDW and partitions. We have a bunch of data tied up in files (each file
> 4M rows, 5,000+ files per year) that I would like to be able to query
directly using an FDW. The files are in genomic VCF format, and I find
that vcf_fdw ( https://github.com/ergo70/vcf_fdw ) works really well for
reading the data. We only want to be able to search the data as quickly
as possible; no updates / deletes / ...


I gave an example below of the basic setup and the output of EXPLAIN
ANALYZE. I get the same performance whether I set things up so that the
thousands of files end up in one non-partitioned table, or set up each
file as its own partition of the table.


I have tried increasing (and decreasing) the number of workers and
worker processes, but I don't see any change in the number of files open
at any given time. I also tried reducing the cost of parallel queries to
force them to run, but can't get them to kick in.
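
Concretely, the settings I was experimenting with were along these lines
(a session-level sketch, assuming the standard parallel-query GUCs; the
values are just what I tried, not recommendations):

SET max_parallel_workers_per_gather = 8;
SET max_parallel_workers = 16;   -- PostgreSQL 10+
SET parallel_setup_cost = 0;     -- make parallel plans look cheap
SET parallel_tuple_cost = 0;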


Any ideas or anything I can try?

Thanks!

Pat

PostgreSQL: PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
Multicorn: 1.3.5
VCF_FDW ( https://github.com/ergo70/vcf_fdw ) : 1.0.0


CREATE DATABASE variants;

CREATE EXTENSION multicorn;

CREATE SERVER multicorn_vcf FOREIGN DATA WRAPPER multicorn OPTIONS (wrapper 'vcf_fdw.VCFForeignDataWrapper');

CREATE SCHEMA vcf;

CREATE TABLE vcf.variants ( ..., species text, ... ) PARTITION BY LIST ( species );

CREATE FOREIGN TABLE vcf.human ( ... ) SERVER multicorn_vcf OPTIONS (basedir '/path', species 'human', suffix '.vcf.gz');
ALTER TABLE vcf.variants ATTACH PARTITION vcf.human FOR VALUES IN ( 'human' );

CREATE FOREIGN TABLE vcf.dog ( ... ) SERVER multicorn_vcf OPTIONS (basedir '/path', species 'dog', suffix '.vcf.gz');
ALTER TABLE vcf.variants ATTACH PARTITION vcf.dog FOR VALUES IN ( 'dog' );

CREATE FOREIGN TABLE vcf.cat ( ... ) SERVER multicorn_vcf OPTIONS (basedir '/path', species 'cat', suffix '.vcf.gz');
ALTER TABLE vcf.variants ATTACH PARTITION vcf.cat FOR VALUES IN ( 'cat' );

* My real data repeats this 1000+ more times

EXPLAIN ( ANALYZE, BUFFERS ) SELECT * FROM vcf.variants WHERE chrom = '1' AND pos = 10120 LIMIT 1000;

On my real data I get the following results:
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=20.00..352020.00 rows=1000 width=347) (actual time=445.548..101709.307 rows=20 loops=1)
   ->  Append  (cost=20.00..355520.00 rows=1010 width=347) (actual time=445.547..101709.285 rows=20 loops=1)
         ->  Foreign Scan on dog  (cost=20.00..352.00 rows=1 width=352) (actual time=198.653..198.654 rows=0 loops=1)
               Filter: ((chrom = '1'::text) AND (pos = 10120))
         ->  Foreign Scan on cat  (cost=20.00..352.00 rows=1 width=352) (actual time=111.840..111.840 rows=0 loops=1)
               Filter: ((chrom = '1'::text) AND (pos = 10120))
         ->  Foreign Scan on human  (cost=20.00..352.00 rows=1 width=352) (actual time=135.050..138.534 rows=1 loops=1)
               Filter: ((chrom = '1'::text) AND (pos = 10120))
         ... repeats many more times, once for each remaining partition
 Planning time: 613.815 ms
 Execution time: 101873.880 ms
(2024 rows)




SQL Performance during autovacuum

2018-12-18 Thread anand086
Hi All,

I am looking into a performance issue and needed your input and thoughts.

We have a non-partitioned table of ~500 GB with 11 indexes:

+--------------+---------------+--------------+-------------+--------------+---------+--------+------------+--------+
| row_estimate |  total_bytes  | index_bytes  | toast_bytes | table_bytes  |  total  | index  |   toast    | table  |
+--------------+---------------+--------------+-------------+--------------+---------+--------+------------+--------+
|  1.28611e+09 | 1400081645568 | 858281418752 |        8192 | 541800218624 | 1304 GB | 799 GB | 8192 bytes | 505 GB |
+--------------+---------------+--------------+-------------+--------------+---------+--------+------------+--------+
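
(The summary above is the output of a relation-size query along these
lines; a sketch, assuming the table is demo.test_table:)

select c.reltuples::bigint as row_estimate,
       pg_total_relation_size(c.oid) as total_bytes,
       pg_indexes_size(c.oid) as index_bytes,
       pg_total_relation_size(nullif(c.reltoastrelid, 0)) as toast_bytes,
       pg_total_relation_size(c.oid)
         - pg_indexes_size(c.oid)
         - coalesce(pg_total_relation_size(nullif(c.reltoastrelid, 0)), 0)
           as table_bytes
  from pg_class c
 where c.oid = 'demo.test_table'::regclass;
-- the pretty columns are just pg_size_pretty() over the byte counts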


The application runs a simple SQL statement:

select distinct testtbl_.id as col_0_0_
  from demo.test_table testtbl_
 where testtbl_.entity_id = '10001'
   and testtbl_.last_updated >= to_date('22-10-2018', 'dd-MM-yyyy')
   and testtbl_.last_updated0
   and testtbl_.src_name = 'distribute_item'
   and (testtbl_.item not like 'SHIP%')
 order by testtbl_.id
 limit 1;

The execution time for the above SQL is 17841.467 ms during normal
operation, but while autovacuum was running on test_table the same SQL
took 1628495.850 ms (from the postgres log).

We have noticed this increase in execution time only while autovacuum
runs on test_table, in "to prevent wraparound" mode. I suspect that during
the autovacuum the "Buffers: shared hit" counts increase, causing the
longer execution time.

I need help with an approach to debug this issue. Is it expected behaviour
for SQL execution times to increase during autovacuum? If so, what is the
reason?
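
In case it helps correlate, a running wraparound autovacuum can also be
watched live, alongside the log entries, with something like this
(PostgreSQL 9.6+):

select p.pid, p.relid::regclass as table_name, p.phase, a.query
  from pg_stat_progress_vacuum p
  join pg_stat_activity a using (pid);
-- a.query shows e.g. "autovacuum: VACUUM ... (to prevent wraparound)"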







Re: SQL Performance during autovacuum

2018-12-18 Thread David Rowley
On Wed, 19 Dec 2018 at 19:04, anand086  wrote:
> We have noticed this increase in execution time only while autovacuum
> runs on test_table, in "to prevent wraparound" mode. I suspect that during
> the autovacuum the "Buffers: shared hit" counts increase, causing the
> longer execution time.
>
> I need help with an approach to debug this issue. Is it expected behaviour
> for SQL execution times to increase during autovacuum? If so, what is the
> reason?

This is unsurprising. There are various GUC settings designed to
throttle vacuum to help minimise this problem. The auto-vacuum process
is competing for the same resources as your query, and is likely
loading many new buffers, thereby flushing buffers out of cache that
might otherwise be useful for your query.

Showing the output of:

select name, setting from pg_settings where name like '%vacuum%';

may be of use here.

You'll particularly want to pay attention to the settings of
autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit and
vacuum_cost_limit.  The settings of vacuum_cost_page_dirty,
vacuum_cost_page_hit, vacuum_cost_page_miss matter too, but these are
less often changed by users.
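
If those show autovacuum running essentially unthrottled, then throttling
it harder would look something like this (the values are purely
illustrative, not recommendations):

alter system set autovacuum_vacuum_cost_delay = '20ms';
alter system set autovacuum_vacuum_cost_limit = 200;
select pg_reload_conf();  -- reload so the new settings take effect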

You may be able to learn exactly what's going on with the query by doing:

set track_io_timing = on;
explain (analyze, buffers, timing) <your query>

both during the auto-vacuum run, and at a time when it's not running.

If the query plans of each match, then pay attention to the number of
buffers read and how long they took to read. If you find that these
don't explain the variation then something else is at fault, perhaps
CPU contention, or perhaps swapping due to high memory usage.

It also seems pretty strange that you should need to use DISTINCT on a
column that's named "id".

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Increasing parallelism of queries while using file fdw and partitions

2018-12-18 Thread Justin Pryzby
On Tue, Dec 18, 2018 at 08:39:36PM -0800, Patrick Mulrooney wrote:
> Wondering if anyone has any thoughts on how to tweak my setup to get it to
> read many files at once, instead of one at a time, when using a file FDW
> and partitions.

I found this:

https://www.postgresql.org/docs/current/parallel-safety.html
| The following operations are always parallel restricted.
| Scans of foreign tables, unless the foreign data wrapper has an
| IsForeignScanParallelSafe API which indicates otherwise.

https://github.com/ergo70/vcf_fdw/blob/master/vcf_fdw/__init__.py
=> has no such API marker, since it's a couple of years old; the same goes for multicorn.

Justin



Re: Increasing parallelism of queries while using file fdw and partitions

2018-12-18 Thread Patrick Mulrooney
Justin,

Thanks for the idea. I pulled down the source for multicorn and added that
flag to it. I do not see parallel queries in the EXPLAIN ANALYZE output
(unless I force it, and then it only gets one worker), but it does look
like it is reading more than one file at once if I go with a
non-partitioned table that looks at all the files. It is not any better if
I have the table split up into partitions.

So it’s better, but I’m still curious whether this would work with partitions.
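
For reference, “forcing it” above meant roughly the following, and
force_parallel_mode only ever adds a single worker, which likely explains
that part:

SET force_parallel_mode = on;   -- testing aid only
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM vcf.variants WHERE chrom = '1' AND pos = 10120;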

Thanks again. 

Pat

> On Dec 18, 2018, at 22:51, Justin Pryzby  wrote:
> 
> On Tue, Dec 18, 2018 at 08:39:36PM -0800, Patrick Mulrooney wrote:
>> Wondering if anyone has any thoughts on how to tweak my setup to get it to
>> read many files at once, instead of one at a time, when using a file FDW
>> and partitions.
> 
> I found this:
> 
> https://www.postgresql.org/docs/current/parallel-safety.html
> | The following operations are always parallel restricted.
> | Scans of foreign tables, unless the foreign data wrapper has an
> | IsForeignScanParallelSafe API which indicates otherwise.
> 
> https://github.com/ergo70/vcf_fdw/blob/master/vcf_fdw/__init__.py
> => has no such API marker, since it's a couple of years old; the same goes for multicorn.
> 
> Justin