Increasing work_mem slows down query, why?

2020-03-29 Thread Silvio Moioli
Dear list,

here is a pretty contrived case where increasing work_mem produces a worse 
plan, with much worse overall query time. I wonder why that is the case.


Problem: INSERTing a thousand new rows in a table which can easily have one 
million rows. PK is "id", which comes from a table, and we have two columns 
(called "name" and "version") which do not admit duplicates.

Schema here: 
https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/common/tables/rhnPackageCapability.sql
Indices here: 
https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/postgres/tables/rhnPackageCapability_index.sql


We want one command that returns IDs given (name, version) couples. If they are 
already in the table, they should be SELECTed, if they are not, they should be 
INSERTed.

Version is NULLable and NULL should be treated as a value.

We use:

WITH wanted_capability(ordering, name, version) AS (
  VALUES (1, 'first_name', '1.0.0'), (2, 'first_name', '1.0.1'), (1, 
'second_name', '1.0.0'), ...998 more...
)
missing_capability AS (
  SELECT wanted_capability.*
FROM wanted_capability LEFT JOIN rhnPackageCapability
  ON wanted_capability.name = rhnPackageCapability.name
AND wanted_capability.version IS NOT DISTINCT FROM 
rhnPackageCapability.version
WHERE rhnPackageCapability.id IS NULL
),
inserted_capability AS (
  INSERT INTO rhnPackageCapability(id, name, version)
SELECT nextval('rhn_pkg_capability_id_seq'), name, version FROM 
missing_capability ON CONFLICT DO NOTHING
RETURNING id, name, version
)
SELECT wanted_capability.ordering, inserted_capability.id
  FROM wanted_capability JOIN inserted_capability
ON wanted_capability.name = inserted_capability.name
  AND wanted_capability.version IS NOT DISTINCT FROM 
inserted_capability.version
UNION (
  SELECT wanted_capability.ordering, rhnPackageCapability.id
FROM wanted_capability JOIN rhnPackageCapability
  ON wanted_capability.name = rhnPackageCapability.name
AND wanted_capability.version IS NOT DISTINCT FROM 
rhnPackageCapability.version
)
  ORDER BY ordering
;


Behavior at work_mem = 5 MB is pretty good, query finishes in 200ms. Plan: 
https://explain.dalibo.com/plan/4u

Behavior at work_mem = 80 MB seems not equally good, query takes more than 13s. 
Two expensive SORTs and MERGE JOINs are done instead of HASH JOINs. Plan: 
thttps://explain.dalibo.com/plan/ORd

Adding one more INDEX on rhnCapability.name fixes the issue.

My question is: why are SORTs chosen if more work_mem is available, and why 
can't the planner predict query will be slower that way?

All of the above is reproducible on openSUSE Leap and PostgreSQL 10.12.

Ideas welcome, and thanks in advance!

Regards,
--
Silvio Moioli
SUSE Manager Development Team




Re: Increasing work_mem slows down query, why?

2020-03-29 Thread Pavel Stehule
po 30. 3. 2020 v 8:47 odesílatel Silvio Moioli  napsal:

> Dear list,
>
> here is a pretty contrived case where increasing work_mem produces a worse
> plan, with much worse overall query time. I wonder why that is the case.
>
>
> Problem: INSERTing a thousand new rows in a table which can easily have
> one million rows. PK is "id", which comes from a table, and we have two
> columns (called "name" and "version") which do not admit duplicates.
>
> Schema here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/common/tables/rhnPackageCapability.sql
> Indices here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/postgres/tables/rhnPackageCapability_index.sql
>
>
> We want one command that returns IDs given (name, version) couples. If
> they are already in the table, they should be SELECTed, if they are not,
> they should be INSERTed.
>
> Version is NULLable and NULL should be treated as a value.
>
> We use:
>
> WITH wanted_capability(ordering, name, version) AS (
>   VALUES (1, 'first_name', '1.0.0'), (2, 'first_name', '1.0.1'), (1,
> 'second_name', '1.0.0'), ...998 more...
> )
> missing_capability AS (
>   SELECT wanted_capability.*
> FROM wanted_capability LEFT JOIN rhnPackageCapability
>   ON wanted_capability.name = rhnPackageCapability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
> WHERE rhnPackageCapability.id IS NULL
> ),
> inserted_capability AS (
>   INSERT INTO rhnPackageCapability(id, name, version)
> SELECT nextval('rhn_pkg_capability_id_seq'), name, version FROM
> missing_capability ON CONFLICT DO NOTHING
> RETURNING id, name, version
> )
> SELECT wanted_capability.ordering, inserted_capability.id
>   FROM wanted_capability JOIN inserted_capability
> ON wanted_capability.name = inserted_capability.name
>   AND wanted_capability.version IS NOT DISTINCT FROM
> inserted_capability.version
> UNION (
>   SELECT wanted_capability.ordering, rhnPackageCapability.id
> FROM wanted_capability JOIN rhnPackageCapability
>   ON wanted_capability.name = rhnPackageCapability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
> )
>   ORDER BY ordering
> ;
>
>
> Behavior at work_mem = 5 MB is pretty good, query finishes in 200ms. Plan:
> https://explain.dalibo.com/plan/4u
>
> Behavior at work_mem = 80 MB seems not equally good, query takes more than
> 13s. Two expensive SORTs and MERGE JOINs are done instead of HASH JOINs.
> Plan: thttps://explain.dalibo.com/plan/ORd


please, can you send explain in text form?

Probably, there is a problem in wrong estimation. What can be expected
because CTE is optimization fence in this version


Regards

Pavel



>
> Adding one more INDEX on rhnCapability.name fixes the issue.
>
> My question is: why are SORTs chosen if more work_mem is available, and
> why can't the planner predict query will be slower that way?
>
> All of the above is reproducible on openSUSE Leap and PostgreSQL 10.12.
>
> Ideas welcome, and thanks in advance!
>
> Regards,
> --
> Silvio Moioli
> SUSE Manager Development Team
>
>
>