Re: Postgresql Sort cost Poor performance?

2019-04-02 Thread tank.zhang
1. Without DISTINCT, the response time is fast:
 
qis3_dp2=# SELECT COUNT(*) AS CHECKCARNUM FROM QIS_CARPASSEDSTATION A WHERE
1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD');
 checkcarnum 
-------------
 2142215
(1 row)

*Time: 2237.970 ms (00:02.238)*
qis3_dp2=# 

2. With DISTINCT added, the response time was very slow:

qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD');
 checkcarnum 
-------------
 1071367
(1 row)

*Time: 38979.246 ms (00:38.979)*





--
Sent from: 
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html




Re: Postgresql Sort cost Poor performance?

2019-04-02 Thread David Rowley
On Tue, 2 Apr 2019 at 20:00, tank.zhang <[email protected]> wrote:
> 2. With DISTINCT added, the response time was very slow:
>
> qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
> QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
> IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
> TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
> TO_DATE('2019-03-11','YYYY-MM-DD');
>  checkcarnum
> -------------
>  1071367
> (1 row)

That's because of how DISTINCT is implemented within an aggregate
function in PostgreSQL. Internally within the aggregate code in the
executor, a sort is performed on the entire input to the aggregate
node.  The planner is currently unable to make use of any indexes that
provide pre-sorted input.

One way to work around this would be to perform the DISTINCT and
COUNT(*) in separate stages using a subquery.

From your original query, something like:

SELECT COUNT(SVIN) AS CHECKCARNUM, SMTOC
FROM (
SELECT SMTOC, SVIN
FROM QIS_CARPASSEDSTATION A
WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG'
AND A.SLINENO IN ( '1F' , '2F' , '3F' )
AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD')
AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')
GROUP BY SMTOC, SVIN
) A GROUP BY SMTOC;

An index something like:
CREATE INDEX ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE);

Should help speed up the subquery and provide pre-sorted input to the
outer aggregate.  If you like, you could add SLINENO to the end of the
index to allow an index-only scan which may result in further
performance improvements.

Without the index, you're forced to sort, but at least it's just one
sort instead of two.
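
For reference, the extended variant of that index might look like this (the
trailing column order is an assumption; what matters for pre-sorted input is
that SMTOC and SVIN lead the index):

```sql
-- Covering-index sketch: SLINENO appended so every column the subquery
-- touches is present in the index, allowing an index-only scan.
CREATE INDEX qis_carpassedstation_smtoc_svin_idx
    ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE, SLINENO);
```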

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Postgresql Sort cost Poor performance?

2019-04-02 Thread tank.zhang
Thank you for the reply!

I tried the subquery (TEMP) approach and it is very fast. Thank you.


qis3_dp2=# explain analyze SELECT COUNT(*),SMTOC FROM ( SELECT
DISTINCT(SVIN) AS CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1
AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP group by SMTOC;

 
                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=691386.41..691388.41 rows=200 width=30) (actual time=4090.951..4091.027 rows=410 loops=1)
   Group Key: a.smtoc
   ->  HashAggregate  (cost=666561.44..676491.43 rows=992999 width=40) (actual time=3481.712..3794.213 rows=1071367 loops=1)
         Group Key: a.svin, a.smtoc
         ->  Gather  (cost=1000.00..656098.93 rows=2092501 width=40) (actual time=0.657..1722.814 rows=2142215 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Parallel Seq Scan on qis_carpassedstation a  (cost=0.00..445848.83 rows=523125 width=40) (actual time=65.187..2287.739 rows=428443 loops=5)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND ((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >= to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_date('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1862173
 Planning Time: 0.513 ms
 Execution Time: 4147.542 ms
(12 rows)

Time: 4148.852 ms (00:04.149)
qis3_dp2=# 


qis3_dp2=# SELECT COUNT(*),SMTOC FROM ( SELECT DISTINCT(SVIN) AS
CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD =
'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP group by SMTOC;

*Time: 3223.935 ms (00:03.224)*


2. Before (the original COUNT(DISTINCT) query):

qis3_dp2=# explain analyze SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC
FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND
A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by SMTOC;

 
                                  QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=875778.02..891475.55 rows=377 width=30) (actual time=6400.991..33314.132 rows=410 loops=1)
   Group Key: smtoc
   ->  Sort  (cost=875778.02..881009.28 rows=2092501 width=40) (actual time=6399.993..6626.151 rows=2142215 loops=1)
         Sort Key: smtoc
         Sort Method: quicksort  Memory: 265665kB
         ->  Gather  (cost=1000.00..656098.93 rows=2092501 width=40) (actual time=0.557..2467.778 rows=2142215 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Parallel Seq Scan on qis_carpassedstation a  (cost=0.00..445848.83 rows=523125 width=40) (actual time=66.908..2428.397 rows=428443 loops=5)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND ((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >= to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_date('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1862173
 Planning Time: 0.457 ms
 Execution Time: 33335.429 ms
(13 rows)
*Time: 33336.720 ms (00:33.337)*
qis3_dp2=# 















parallel query

2019-04-02 Thread Mariel Cherkassky
Hey,
I wanted to ask a few questions regarding the parallel parameters
max_worker_processes and max_parallel_workers_per_gather.

1) Basically, should max_worker_processes be set to the number of CPUs
the machine has?
2) If I set max_worker_processes to X and max_parallel_workers_per_gather
to Y (X > Y), does that mean I can have at most X/2 queries running in
parallel? For example, with max_worker_processes = 8 and
max_parallel_workers_per_gather = 4, can at most 4 queries run in
parallel, and at minimum 2 (or none)?
3) So if I calculate my work_mem based on the number of sessions I have,
(TOTAL_MEM/2/NUM_OF_CONNECTIONS), should I add 8 to NUM_OF_CONNECTIONS to
get a new work_mem value that accounts for queries running in parallel?

Thanks.


Re: parallel query

2019-04-02 Thread Laurenz Albe
Mariel Cherkassky wrote:
> I wanted to ask a few questions regarding the parallel parameters
> max_worker_processes and max_parallel_workers_per_gather.
> 
> 1) Basically, should max_worker_processes be set to the number of CPUs
> the machine has?

Setting it higher would not be smart.
Setting it lower can also be a good idea; it depends
on your workload.

> 2) If I set max_worker_processes to X and max_parallel_workers_per_gather
> to Y (X > Y), does that mean I can have at most X/2 queries running in
> parallel? For example, with max_worker_processes = 8 and
> max_parallel_workers_per_gather = 4, can at most 4 queries run in
> parallel, and at minimum 2 (or none)?

That is correct, but unless you set "max_parallel_workers_per_gather" to 1, one
query can use more than one parallel worker, and then you can have fewer
concurrent queries.

How many workers PostgreSQL will use also depends on the size of the
table or index.
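
As a concrete illustration of these two settings (the numbers are
hypothetical, for an 8-CPU machine):

```sql
-- Hypothetical starting point for an 8-CPU machine:
-- at most 8 background workers overall, at most 4 per Gather node,
-- so two fully parallel queries can run at the same time.
ALTER SYSTEM SET max_worker_processes = 8;            -- needs a restart
ALTER SYSTEM SET max_parallel_workers_per_gather = 4; -- reload suffices
SELECT pg_reload_conf();
```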

> 3) So if I calculate my work_mem based on the number of sessions I have,
> (TOTAL_MEM/2/NUM_OF_CONNECTIONS), should I add 8 to NUM_OF_CONNECTIONS to
> get a new work_mem value that accounts for queries running in parallel?

Yes, but don't forget that one query can use "work_mem" several times if the
execution plan has several memory intensive nodes.
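
Plugging hypothetical numbers into that formula, say 64 GB of RAM, 100
connections, and max_worker_processes = 8:

```sql
-- work_mem ~ TOTAL_MEM / 2 / (NUM_OF_CONNECTIONS + max_worker_processes)
--          ~ 65536 MB / 2 / (100 + 8) ~ 303 MB per sort/hash node.
-- Round down, since a single plan can consume work_mem several times.
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();
```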

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com