Hi Ingolf,
On 2. Feb 2021, at 13:05, Markhof, Ingolf <[email protected]> wrote:
> Hi!
> My PostgreSQL version is 11.8.
> The query I am running is referring to a number of foreign tables. The first
> one (table1) has two IDs, let's say ID_A and ID_B. While ID_A is unique, ID_B
> is not. In my case, I am pulling information for a value of ID_B for which
> about 800 rows (with unique ID_A) exist. I found:
> While
>   select * from my_view where id_b='some value';
> seemingly runs "forever" (I cancelled execution after a few hours), the
> following completes in about 1 hr:
>   select * from my_view where id_a in (
>     select id_a from table1 where id_b='some value'
>   );
> So, I tried smaller chunks of ID_A and found the execution time is non-linear
> with respect to the number of IDs. For e.g. 50 ID_A's, it completed in about
> 12 sec.
> [...]
> I then found the option fetch_size, e.g. ALTER SERVER some_server OPTIONS
> (fetch_size '50000'). A chunk of 50 now executes in 2 seconds (instead of 12
> before).
Thanks for the additional info. I tried to replicate this; you can find the
GitHub Gist at [1], and I am happy to hear your feedback about it.
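For context, a minimal sketch of such a postgres_fdw setup looks roughly like
this (the exact schema is in the Gist; host, credentials, column types and the
single-table my_view below are just placeholders, your real view spans several
foreign tables):

  CREATE EXTENSION IF NOT EXISTS postgres_fdw;

  -- Foreign server pointing at the remote database.
  CREATE SERVER some_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', dbname 'remote_db', port '5432');

  CREATE USER MAPPING FOR CURRENT_USER
    SERVER some_server
    OPTIONS (user 'remote_user', password 'secret');

  -- Foreign table mirroring public.a on the remote side.
  CREATE FOREIGN TABLE table1 (
    id_a bigint,
    id_b bigint
  ) SERVER some_server OPTIONS (schema_name 'public', table_name 'a');

  CREATE VIEW my_view AS SELECT * FROM table1;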
What I can see from the execution plans in my example is that postgres_fdw
pushes down this part to the remote:

Remote SQL: SELECT id_a FROM public.a WHERE ((id_b = 1))

On the remote DB this query can result in either an index-only scan, an index
scan or a full table scan. Which method is chosen depends on table size and
indexes. Given the nature of postgres_fdw, this will in any case be executed as
a single sequential query. For this part I would claim:
- If the planner expects few rows, it will choose an index lookup, which is a
good thing because it effectively reduces the amount of data that needs to be
queried. This keeps it fast even though the query runs sequentially.
- If the planner expects many rows, it might choose a sequential scan, which can
be slow depending on the overall size of the table and on whether it is cached
or not. You can check which method is actually chosen with the sketch below.
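To see which of these happens in your case, you could run the pushed-down query
directly on the remote server (table and column names are from my example,
adjust them to yours):

  -- On the remote server, not through the FDW:
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT id_a FROM public.a WHERE id_b = 1;

  -- If this reports a Seq Scan, an index such as the following could enable
  -- an index(-only) scan instead; whether that helps depends on your data:
  CREATE INDEX IF NOT EXISTS a_id_b_id_a_idx ON public.a (id_b, id_a);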
> So, I found the "size" of the query has a serious impact on the execution
> time. I don't really understand why executing 16*50 takes 16*2 secs only, but
> executing 1*800 takes about 3000 seconds...
The mentioned fetch_size parameter has a positive effect because one can grab
many more rows per round trip and return them at once. Worst case (and this is
pure assumption), on each new fetch the query might be re-executed, and thus
the runtime grows much larger.
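For completeness, fetch_size can be set both per server and per foreign table
(names are placeholders again; the right value depends on your row width and
memory):

  -- Larger fetches for everything on this server ...
  ALTER SERVER some_server OPTIONS (ADD fetch_size '50000');
  -- ... or, once the option already exists, change it:
  -- ALTER SERVER some_server OPTIONS (SET fetch_size '100000');

  -- ... or override it for a single foreign table:
  ALTER FOREIGN TABLE table1 OPTIONS (ADD fetch_size '50000');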
Further up in the plan, I see
Remote SQL: SELECT id_a, id_b FROM public.a
which is the "SELECT * FROM my_view" part. Meaning, here it will definitely do
a full table scan on the remote, since the IN condition cannot be pushed down.
I don't really see right now why your original form should be slower than this
query at all. In my experiment it is not, but maybe I am doing something wrong
in the schema. One thought, however, would be that the full table scan on the
remote is more efficient than pushing down the filter, and thus the IN variant
returns faster.
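If you want to check what gets sent to the remote in your setup, EXPLAIN
(VERBOSE) on the local side prints the generated "Remote SQL" for each foreign
scan without executing the query (queries as in your mail; what actually gets
pushed down will depend on your view definition):

  EXPLAIN (VERBOSE)
  SELECT * FROM my_view WHERE id_b = 'some value';

  EXPLAIN (VERBOSE)
  SELECT * FROM my_view
  WHERE id_a IN (SELECT id_a FROM table1 WHERE id_b = 'some value');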
To really figure out more, I would suggest increasing the logging level on
your remote server in order to see which queries are actually executed. Even
better, maybe use auto_explain to capture the plans and see whether these
claims hold; a possible configuration is sketched below.
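For example, something along these lines in postgresql.conf on the remote (the
thresholds are just a guess of what could be useful; auto_explain has to be
loaded via shared_preload_libraries, which requires a restart):

  shared_preload_libraries = 'auto_explain'
  log_min_duration_statement = 0            # log every statement with duration
  auto_explain.log_min_duration = '1s'      # plans for anything slower than 1s
  auto_explain.log_analyze = on             # include actual row counts/timings
  auto_explain.log_buffers = on
  auto_explain.log_nested_statements = on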
Best,
Sebastian
--
[1]: https://gist.github.com/sdressler/9a93d66b7052dc75ec45c0a4bf5c61de
Sebastian Dressler, Solution Architect, Swarm64
+49 30 994 0496 72 | [email protected]