Re: FDW, too long to run explain
On Monday, 4 February 2019 09:14:14 EET Vijaykumar Jain wrote:
> Hi,

Hi,

> with pg v10.1
> we use writes directly to shards, and reads via FDW from all shards (RO)
> our DB size is ~500GB each shard, and tables are huge too.
> 1 table ~200GB, 1 ~55GB, 1 ~40GB and a lot of small tables, but large
> indices on the large tables.
>
> the sharding was done based on a key to enable shard isolation at the app
> layer using a fact table.
> select id,shard from fact_table;
>
> server resources are:
> 32GB mem, 8 vcpu, 500GB SSD.
>
> the shards connect to each other via FDW fronted by haproxy -> pgbouncer
> -> postgresql.
> Hope this is good enough background :)
>
> now we have some long-running queries via FDW that take minutes and get
> killed; explain runs as idle in transaction on the remote servers. (we set
> use_remote_estimate = true)
> when the query is run on individual shards directly, it runs pretty
> quickly, but when run via FDW, it takes very long.
> i even altered fetch_size to 1, so that in case some filters do not get
> pushed down, those can be applied on the FDW side quickly.

In general, the plans via FDW are not the same as the ones for queries running locally. We're having similar issues, and the reason seems to be that queries via FDW are optimized for startup cost or for returning few rows.

> Regards,
> Vijay

--
Regards,
Peter
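For reference, the two postgres_fdw options mentioned above can be set per foreign server or per foreign table. A minimal sketch, assuming a foreign server named shard2 and a foreign table named big_table (both placeholders):

```sql
-- use_remote_estimate makes the local planner run EXPLAIN on the remote
-- side for costing (use ADD instead of SET if the option was never set).
ALTER SERVER shard2
    OPTIONS (SET use_remote_estimate 'true');

-- fetch_size controls how many rows each remote FETCH pulls (default 100);
-- raising it usually helps when most filtering is pushed down, while very
-- small values only help when filters must be applied locally.
ALTER FOREIGN TABLE big_table
    OPTIONS (ADD fetch_size '10000');
```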
Shared hosting with FDW on AWS RDS
Hi,

We'd like to configure an RDS server for shared hosting. The idea is that every customer will use a different database, with FDW configured so that the remote tables have access to the full data, while materialized views pull customer-specific data from them.

So far everything seems to work and to be secure, as we've revoked the customer users' access to the remote tables, but I'm feeling a bit uneasy considering that the credentials for full access are stored in each database. My understanding is that remote user mappings are designed so that this is not an issue, but I was wondering whether access to the metadata schema might allow this restriction to be circumvented.

Also, I was wondering if anyone has experience hardening databases on RDS, as the so-called superuser does not have the right to revoke access to the metadata schema.

Comments and suggestions are welcome.

--
Regards,
Peter
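A minimal sketch of the setup described above, with placeholder names throughout (central_srv, app_owner, customer1, customer_view):

```sql
-- In each customer database: a foreign server plus a user mapping that
-- holds the full-access credentials.
CREATE SERVER central_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'central-db.example.internal', dbname 'central');

CREATE USER MAPPING FOR app_owner
    SERVER central_srv
    OPTIONS (user 'full_access', password 'secret');

-- Only app_owner may use the server; the customer role sees only the
-- materialized view, not the foreign tables or the mapping.
REVOKE ALL ON FOREIGN SERVER central_srv FROM PUBLIC;
GRANT SELECT ON customer_view TO customer1;
```

Note that the pg_user_mappings view masks the password from roles that lack USAGE on the server, which is the mechanism the user-mapping design relies on to keep the stored credentials from leaking to other roles in the same database.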
Re: FDW, too long to run explain
Related to this question: PostgreSQL cursors are extremely slow in most cases I've tried. The cause, in my experience, is as described in my previous message: cursor plans are optimized for fast startup rather than total runtime. Is there any plan or way to improve this situation? For example, for FDW one would expect the plan on the remote side to be similar, if not identical, to the local one, with the exception of the setup cost.

--
Regards,
Peter
Re: FDW, too long to run explain
On Sunday, 17 February 2019 20:58:47 EET Jeff Janes wrote:
> A question for the PostgreSQL hackers would be: is it necessary and
> desirable that the EXPLAIN be issued in the same transaction as the
> eventual DECLARE and FETCHes? I don't think it is. I guess if the foreign
> side table definition got changed between EXPLAIN and DECLARE it would
> cause problems, but changing the foreign-side definition out of sync with
> the local side can cause problems anyway, so is that important to preserve?

Won't separate transactions cause issues if the statistics of the table change in the meantime in a way that affects the plan?

> Cheers,
>
> Jeff

--
Regards,
Peter
FDW and remote cursor_tuple_fraction
Hi,

One reason the remote plans via FDW differ from the plans of the same queries run directly on the remote server is that a cursor is used, which is optimized for a low number of rows according to cursor_tuple_fraction.

Is there any way to set parameters that affect the queries run on the remote side via FDW? If not, is there any plan to introduce such a setting?

--
Regards,
Peter
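To make the mechanism concrete: postgres_fdw fetches rows by declaring a cursor on the remote side, so the remote planner applies cursor_tuple_fraction (default 0.1), favoring fast-startup plans. One workaround, assuming the FDW connects as a dedicated role (fdw_reader is a placeholder), is to change that setting per role on the remote server:

```sql
-- On the remote server: for the role postgres_fdw connects as, make
-- cursor plans assume the whole result set will be fetched, so the
-- remote plan matches the one a plain query would get.
ALTER ROLE fdw_reader SET cursor_tuple_fraction = 1.0;
```

This only changes new sessions for that role, so pooled connections (e.g. through pgbouncer) may need to be recycled before it takes effect.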
Re: Design of a reliable task processing queue
On Saturday, 18 January 2025 12:44:07 EET Alex Burkhart wrote:
> Hey team,
>
> I'm looking for help to organize locks and transactions for a reliable task
> queue.
>
> REQUIREMENTS
>
> 1. Pending actions are persisted to a database. There's a trace once they
> are done.
> 2. Application workers pick actions one by one. At any given time, each
> action can be assigned to at most one worker (transaction).
> 3. If multiple actions have the same "lock_id", only one of them is
> processed at a time. That has to be the action with the smallest id.

Why reinvent the wheel instead of using production-ready code from projects such as que (Ruby) or pgqueuer (Python)?

--
Regards,
Peter
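That said, the three requirements map fairly directly onto the SELECT ... FOR UPDATE SKIP LOCKED pattern those projects use internally. A sketch, with a hypothetical tasks table:

```sql
-- Hypothetical schema for the queue described above.
CREATE TABLE tasks (
    id      bigserial PRIMARY KEY,
    lock_id integer NOT NULL,
    payload text,
    done_at timestamptz      -- the "trace" once an action is done
);

-- Inside a transaction: claim the smallest-id pending action whose
-- lock_id is not already being worked on.  DISTINCT ON keeps only the
-- smallest id per lock_id (requirement 3); SKIP LOCKED skips lock_ids
-- already claimed by other workers (requirement 2).  The row lock is
-- held until COMMIT, so do the work and record completion in the same
-- transaction.
BEGIN;
SELECT t.*
FROM tasks t
JOIN (
    SELECT DISTINCT ON (lock_id) id
    FROM tasks
    WHERE done_at IS NULL
    ORDER BY lock_id, id
) c USING (id)
ORDER BY t.id
FOR UPDATE OF t SKIP LOCKED
LIMIT 1;
-- ... process the claimed action, then:
UPDATE tasks SET done_at = now() WHERE id = $1;  -- the claimed id
COMMIT;
```

One caveat with this sketch: holding the lock for the duration of the work means long-running actions pin a connection; the queue libraries above add heartbeats and retry bookkeeping on top of the same core pattern.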