Re: Postgresql Sort cost Poor performance?
1、DISTINCT response time is fast without being added
qis3_dp2=# SELECT COUNT(*) AS CHECKCARNUM FROM QIS_CARPASSEDSTATION A WHERE
1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','-MM-DD');
checkcarnum
-
2142215
(1 row)
*Time: 2237.970 ms (00:02.238)*
qis3_dp2=#
2、 Adding a DISTINCT response time was very slow
qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','-MM-DD');
checkcarnum
-
1071367
(1 row)
*Time: 38979.246 ms (00:38.979)*
--
Sent from:
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
Re: Postgresql Sort cost Poor performance?
On Tue, 2 Apr 2019 at 20:00, tank.zhang <[email protected]> wrote: > 2、 Adding a DISTINCT response time was very slow > > qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM > QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO > IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= > TO_DATE('2017-02-11','-MM-DD') AND A.DWORKDATE <= > TO_DATE('2019-03-11','-MM-DD'); > checkcarnum > - > 1071367 > (1 row) That's because of how DISTINCT is implemented within an aggregate function in PostgreSQL. Internally within the aggregate code in the executor, a sort is performed on the entire input to the aggregate node. The planner is currently unable to make use of any indexes that provide pre-sorted input. One way to work around this would be to perform the DISTINCT and COUNT(*) in separate stages using a subquery. >From your original query, something like: SELECT COUNT(SVIN) AS CHECKCARNUM,SMTOC FROM ( SELECT SMTOC,SVIN FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','-MM-DD') GROUP BY SMTOC,SVIN ) A GROUP BY SMTOC; An index something like: CREATE INDEX ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE); Should help speed up the subquery and provide pre-sorted input to the outer aggregate. If you like, you could add SLINENO to the end of the index to allow an index-only scan which may result in further performance improvements. Without the index, you're forced to sort, but at least it's just one sort instead of two. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Postgresql Sort cost Poor performance?
Thank you replay!
I tried to use the TMP table is very fast .thank you
qis3_dp2=# explain analyze SELECT COUNT(*),SMTOC FROM ( SELECT
DISTINCT(SVIN) AS CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1
AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','-MM-DD')) AS TEMP group by SMTOC;
QUERY PLAN
--
-
HashAggregate (cost=691386.41..691388.41 rows=200 width=30) (actual
time=4090.951..4091.027 rows=410 loops=1)
Group Key: a.smtoc
-> HashAggregate (cost=666561.44..676491.43 rows=992999 width=40)
(actual time=3481.712..3794.213 rows=1071367 loops=1)
Group Key: a.svin, a.smtoc
-> Gather (cost=1000.00..656098.93 rows=2092501 width=40) (actual
time=0.657..1722.814 rows=2142215 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..445848.83 rows=523125 width=40) (actual time=65.187..2287.739
rows=428443 loops=5)
Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, '-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, '-MM-DD'::text)))
Rows Removed by Filter: 1862173
Planning Time: 0.513 ms
Execution Time: 4147.542 ms
(12 rows)
Time: 4148.852 ms (00:04.149)
qis3_dp2=#
qis3_dp2=# SELECT COUNT(*),SMTOC FROM ( SELECT DISTINCT(SVIN) AS
CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD =
'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','-MM-DD')) AS TEMP group by SMTOC;
**Time: 3223.935 ms (00:03.224)**
2、 Before
qis3_dp2=# explain analyze SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC
FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND
A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','-MM-DD') group by SMTOC;
QUERY PLAN
--
-
GroupAggregate (cost=875778.02..891475.55 rows=377 width=30) (actual
time=6400.991..33314.132 rows=410 loops=1)
Group Key: smtoc
-> Sort (cost=875778.02..881009.28 rows=2092501 width=40) (actual
time=6399.993..6626.151 rows=2142215 loops=1)
Sort Key: smtoc
Sort Method: quicksort Memory: 265665kB
-> Gather (cost=1000.00..656098.93 rows=2092501 width=40) (actual
time=0.557..2467.778 rows=2142215 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..445848.83 rows=523125 width=40) (actual time=66.908..2428.397
rows=428443 loops=5)
Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, '-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, '-MM-DD'::text)))
Rows Removed by Filter: 1862173
Planning Time: 0.457 ms
Execution Time: 5.429 ms
(13 rows)
*
Time: 6.720 ms (00:33.337)*
qis3_dp2=#
--
Sent from:
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
[no subject]
Hey, I wanted to a few questions regarding the parallel parameters : max_worker_processes and max_parallel_workers_per_gather. 1)Basically, max_worker_processes should be set to the number of cpus I have in the machine ? 2)If I set max_worker_processes to X and max_parallel_workers_per_gather to Y (X>Y) it means that I will have at max (X/2) queries that can run in parallel. Am I right ? For example, max_worker_processes =8,max_parallel_workers_per_gather =4, it means that at max I can have 4 queries that are running in parallel ? and at min 2 queries (or none) can run in parallel ? 3)So If I calculate my work_mem based on the number of sessions I have : (TOTAL_MEM/2/NUM_OF_CONNECTIONS) I should add 8 to the NUM_OF_CONNECTIONS to have a new value for the work_mem in order to consider queries that run in parallel.. Thanks.
Re: parallel query
Mariel Cherkassky wrote: > I wanted to a few questions regarding the parallel parameters : > max_worker_processes and max_parallel_workers_per_gather. > > 1)Basically, max_worker_processes should be set to the number of cpus I have > in the machine ? Setting it higher would not be smart. Setting it lower can also be a good idea; it depends on your workload. > 2)If I set max_worker_processes to X and max_parallel_workers_per_gather to Y > (X>Y) > it means that I will have at max (X/2) queries that can run in parallel. Am I > right ? > For example, max_worker_processes =8,max_parallel_workers_per_gather =4, it > means > that at max I can have 4 queries that are running in parallel ? and at min 2 > queries > (or none) can run in parallel ? That is correct, but unless you set "max_parallel_workers_per_gather" to 1, one query can use more than one parallel worker, and then you can have fewer concurrent queries. It also depends on the size of the table or index how many workers PostgreSQL will use. > 3)So If I calculate my work_mem based on the number of sessions I have : > (TOTAL_MEM/2/NUM_OF_CONNECTIONS) > I should add 8 to the NUM_OF_CONNECTIONS to have a new value for the work_mem > in order to consider queries that run in parallel.. Yes, but don't forget that one query can use "work_mem" several times if the execution plan has several memory intensive nodes. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
