PostgreSQL optimizer use seq scan instead of pkey index only scan (in queries with postgres_fdw)

2019-05-06 Thread Vitaly Baranovsky
Hello all,

I faced strange behavior of PostgreSQL during the query execution.

So, I have to databases: local and foreign. There are foreign server
definitions in the local database (via postgres_fdw). The local database
has table 'local_table'. The foreign database has table 'foreign_table'.
Both of them have only 1 column: 'primary_uuid'. This column in both
databases is a primary key column. Schema on a local server that stores
remote server definitions is 'foreign_server'. Each table has 100K rows.
Vacuum analyze has been run for both servers.

When I run a query:
SELECT *
FROM
(
SELECT foreign_table.primary_uuid
   FROM foreign_server.foreign_table
UNION ALL
 SELECT local_table.primary_uuid
   FROM local_table
)
join_view
WHERE
 join_view.primary_uuid in (select
'19b2db7e-db89-48eb-90b1-0bd468a2346b'::uuid)

I expect that the server will use the pkey index for the local table. But
it uses seq scan instead!

"Hash Semi Join  (cost=100.03..3346.23 rows=51024 width=16) (actual
time=482.235..482.235 rows=0 loops=1)"
"  Output: foreign_table.primary_uuid"
"  Hash Cond: (foreign_table.primary_uuid =
('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid))"
"  ->  Append  (cost=100.00..2510.68 rows=102048 width=16) (actual
time=0.529..463.563 rows=20 loops=1)"
"->  Foreign Scan on foreign_server.foreign_table
(cost=100.00..171.44 rows=2048 width=16) (actual time=0.528..446.715
rows=10 loops=1)"
"  Output: foreign_table.primary_uuid"
"  Remote SQL: SELECT primary_uuid FROM public.foreign_table"
"->  Seq Scan on public.local_table  (cost=0.00..1829.00
rows=10 width=16) (actual time=0.021..6.358 rows=10 loops=1)"
"  Output: local_table.primary_uuid"
"  ->  Hash  (cost=0.02..0.02 rows=1 width=16) (actual time=0.006..0.006
rows=1 loops=1)"
"Output: ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid)"
"Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"->  Result  (cost=0.00..0.01 rows=1 width=16) (actual
time=0.001..0.001 rows=1 loops=1)"
"  Output: 'ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid"
"Planning Time: 0.126 ms"
"Execution Time: 482.572 ms""Execution Time: 509.315 ms"

So, as you can see, the execution time is 509 ms! It could be very fast if
PostgreSQL used primary key index!

Also, please, note, that SQL without WHERE clause has been set to the
foreign server:
"  Remote SQL: SELECT primary_uuid FROM public.foreign_table"

So, the optimizer doesn't select optimal plans for such executions :(

Looks like it's an optimizer inadequacy.

Does someone know, how to optimize this query without query rewriting
(queries like this are generated from the Data Access layer and it's hard
to rebuild that layer)?

Thank you

P.S.: Answers to standard questions:
> PostgreSQL version number you are running:
PostgreSQL 11.2, compiled by Visual C++ build 1914, 64-bit

> How you installed PostgreSQL:
By downloaded standard Windows 64 installer

> Changes made to the settings in the postgresql.conf file:
shared_preload_libraries = '$libdir/pg_stat_statements'

> Operating system and version:
Windows 10 Enterprise 64-bit

> What program you're using to connect to PostgreSQL:
pgAdmin III

> Is there anything relevant or unusual in the PostgreSQL server logs?:
Nope


P.P.S.: DDL scripts:
for the foreign database:
CREATE TABLE public.foreign_table
(
  primary_uuid uuid NOT NULL,
  CONSTRAINT "PKEY" PRIMARY KEY (primary_uuid)
)

for local database:
CREATE TABLE public.local_table
(
  primary_uuid uuid NOT NULL,
  CONSTRAINT local_table_pkey PRIMARY KEY (primary_uuid)
)

ᐧ


Re: PostgreSQL optimizer use seq scan instead of pkey index only scan (in queries with postgres_fdw)

2019-05-06 Thread Jeff Janes
On Mon, May 6, 2019 at 10:44 AM Vitaly Baranovsky 
wrote:

> Hello all,
>
> I faced strange behavior of PostgreSQL during the query execution.
>

 ...


> Also, please, note, that SQL without WHERE clause has been set to the
> foreign server:
> "  Remote SQL: SELECT primary_uuid FROM public.foreign_table"
>
> So, the optimizer doesn't select optimal plans for such executions :(
>

It works the way you want in version 12, which is currently under
development and should be released in 5 months or so.

Cheers,

Jeff

>


Re: PostgreSQL optimizer use seq scan instead of pkey index only scan (in queries with postgres_fdw)

2019-05-06 Thread Vitaly Baranovsky
Thank you, Jeff!

We'll be looking forward to the next version of Postgres in this case.

As far as I understand, you've answered about sending filtering condition
to a foreign server... Could you, please, clarify about another (the first)
part of my question? Why the server choose seq scan instead of pk key index
only scan for the local table?

Thank you
ᐧ

On Mon, May 6, 2019 at 6:32 PM Jeff Janes  wrote:

> On Mon, May 6, 2019 at 10:44 AM Vitaly Baranovsky <
> [email protected]> wrote:
>
>> Hello all,
>>
>> I faced strange behavior of PostgreSQL during the query execution.
>>
>
>  ...
>
>
>> Also, please, note, that SQL without WHERE clause has been set to the
>> foreign server:
>> "  Remote SQL: SELECT primary_uuid FROM public.foreign_table"
>>
>> So, the optimizer doesn't select optimal plans for such executions :(
>>
>
> It works the way you want in version 12, which is currently under
> development and should be released in 5 months or so.
>
> Cheers,
>
> Jeff
>
>>


Re: PostgreSQL optimizer use seq scan instead of pkey index only scan (in queries with postgres_fdw)

2019-05-06 Thread Tom Lane
Jeff Janes  writes:
> It works the way you want in version 12, which is currently under
> development and should be released in 5 months or so.

Even in older versions, the OP would get a significantly smarter
plan after setting use_remote_estimate = on.  I think the core
issue here is that we won't generate remote parameterized paths
without that:

/*
 * If we're not using remote estimates, stop here.  We have no way to
 * estimate whether any join clauses would be worth sending across, so
 * don't bother building parameterized paths.
 */
if (!fpinfo->use_remote_estimate)
return;

regards, tom lane




Re: PostgreSQL optimizer use seq scan instead of pkey index only scan (in queries with postgres_fdw)

2019-05-06 Thread Jeff Janes
On Mon, May 6, 2019 at 11:38 AM Vitaly Baranovsky 
wrote:

> Thank you, Jeff!
>
> We'll be looking forward to the next version of Postgres in this case.
>
> As far as I understand, you've answered about sending filtering condition
> to a foreign server... Could you, please, clarify about another (the first)
> part of my question? Why the server choose seq scan instead of pk key index
> only scan for the local table?
>
> Thank you
>
>
Aren't those the same thing?  The foreign server can't use the where
clause, if it doesn't get sent.

Cheers,

Jeff


Re: PostgreSQL optimizer use seq scan instead of pkey index only scan (in queries with postgres_fdw)

2019-05-06 Thread Vitaly Baranovsky
Ough, I believed I had use_remote_estimate = true in my database, but it
was false :(

With use_remote_estimate = true everything works well!

Here is explain analyze with use_remote_estimate = true:
"Nested Loop  (cost=100.45..108.97 rows=10 width=16) (actual
time=1.037..1.037 rows=0 loops=1)"
"  Output: foreign_table.primary_uuid"
"  ->  HashAggregate  (cost=0.02..0.03 rows=1 width=16) (actual
time=0.004..0.004 rows=1 loops=1)"
"Output: ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid)"
"Group Key: 'ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid"
"->  Result  (cost=0.00..0.01 rows=1 width=16) (actual
time=0.001..0.001 rows=1 loops=1)"
"  Output: 'ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid"
"  ->  Append  (cost=100.43..108.92 rows=2 width=16) (actual
time=1.032..1.032 rows=0 loops=1)"
"->  Foreign Scan on foreign_server.foreign_table
(cost=100.43..104.47 rows=1 width=16) (actual time=0.994..0.994 rows=0
loops=1)"
"  Output: foreign_table.primary_uuid"
"  Remote SQL: SELECT primary_uuid FROM public.foreign_table
WHERE (($1::uuid = primary_uuid))"
"->  Index Only Scan using local_table_pkey on public.local_table
(cost=0.42..4.44 rows=1 width=16) (actual time=0.035..0.035 rows=0 loops=1)"
"  Output: local_table.primary_uuid"
"  Index Cond: (local_table.primary_uuid =
('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid))"
"  Heap Fetches: 0"
"Planning Time: 100.619 ms"
"Execution Time: 1.243 ms"

I tried this with use_remote_estimate = true for different real queries
with a lot of joins and everything works well!
ᐧ

On Mon, May 6, 2019 at 6:53 PM Tom Lane  wrote:

> Jeff Janes  writes:
> > It works the way you want in version 12, which is currently under
> > development and should be released in 5 months or so.
>
> Even in older versions, the OP would get a significantly smarter
> plan after setting use_remote_estimate = on.  I think the core
> issue here is that we won't generate remote parameterized paths
> without that:
>
> /*
>  * If we're not using remote estimates, stop here.  We have no way
> to
>  * estimate whether any join clauses would be worth sending
> across, so
>  * don't bother building parameterized paths.
>  */
> if (!fpinfo->use_remote_estimate)
> return;
>
> regards, tom lane
>


Re: PostgreSQL optimizer use seq scan instead of pkey index only scan (in queries with postgres_fdw)

2019-05-06 Thread Jeff Janes
On Mon, May 6, 2019 at 11:53 AM Jeff Janes  wrote:

> On Mon, May 6, 2019 at 11:38 AM Vitaly Baranovsky <
> [email protected]> wrote:
>
>> Thank you, Jeff!
>>
>> We'll be looking forward to the next version of Postgres in this case.
>>
>> As far as I understand, you've answered about sending filtering condition
>> to a foreign server... Could you, please, clarify about another (the first)
>> part of my question? Why the server choose seq scan instead of pk key index
>> only scan for the local table?
>>
>> Thank you
>>
>>
> Aren't those the same thing?  The foreign server can't use the where
> clause, if it doesn't get sent.
>

Nevermind. When you said local table, I had some tunnel vision and was
thinking of the foreign table as viewed from the perspective of the foreign
server (to which it is local), not the actual local table.   That too is
"fixed" in the same commit to the 12dev branch as the other issue is:

commit 4be058fe9ec5e630239b656af21fc083371f30ed
Date:   Mon Jan 28 17:54:10 2019 -0500

In the planner, replace an empty FROM clause with a dummy RTE.


My tests are all done with empty, unanalyzed tables as I just took you DDL
without inventing my own DML, so may be different than what what you were
seeing with your populated tables.

Cheers,

Jeff

>