Re: PostgreSQL Replication and question on maintenance
Any reply on this, please?

On Fri, Nov 15, 2019 at 9:10 AM github kran wrote:
> On Thu, Nov 14, 2019 at 11:42 PM Pavel Stehule wrote:
>> these numbers look crazy high - how much memory has your server - more
>> than 1TB?
>
> The cluster has 244 GB of RAM, and its storage capacity is 64 TB.
>
>> pá 15. 11. 2019 v 6:26 odesílatel github kran napsal:
>>>
>>> Hello PostgreSQL community, hope everyone is doing great!
>>>
>>> *Background*
>>> We use PostgreSQL version 10.6 and rely on it heavily for our
>>> day-to-day reads and writes. We have 2 clusters running the PostgreSQL
>>> engine: one cluster keeps data up to 60 days and the other retains
>>> data beyond 1 year. The data is partitioned roughly weekly (~every 5
>>> days a partition), giving around 5 partitions per month per table; we
>>> primarily have 2 tables, so about 10 partitions a month in total.
>>> Cluster-1 currently has around 20 partitions and cluster-2 around 160
>>> (data from 2018). We also want to keep the data for up to 2 years in
>>> cluster-2 to serve customer needs; so far we have reached up to 1 year.
>>>
>>> *Current activity*
>>> We have a custom weekly migration DB script job that moves data from
>>> one cluster to the other. It does the following:
>>> 1) A COPY command copies the data from cluster-1 and splits it into
>>>    binary files.
>>> 2) The binary data is written into the cluster-2 table.
>>> 3) Indexes are created after the data is copied.
>>>
>>> *Problem we have right now*
>>> The last 2 times the weekly migration ran, the cluster's read-replica
>>> instance restarted because it had fallen behind the master (writer
>>> instance). Everything after that worked seamlessly, but we want to
>>> avoid the replica getting restarted. To avoid the restart we started
>>> producing smaller binary files and copying those to cluster-2 instead
>>> of writing 1 big file of 450 million records. The most recent
>>> migration succeeded in that the reader instance didn't restart after
>>> we split the 1 big file into multiple files, but it did restart after
>>> the indexes were created on the new table, presumably because that
>>> step is write-intensive.
>>>
>>> *DB parameters set for the migration job*
>>> work_mem = 8 GB, maintenance_work_mem = 32 GB
>>> Indexes per table = 3; total indexes for the 2 tables = 5
>>>
>>> *DB size*
>>> Cluster-2 = 8.6 TB, cluster-1 = 3.6 TB
>>> Peak table rows = 400-480 million; average table rows = 300-350 million
>>> Per-table size = 90-95 GB; per-table index size is about 45 GB
>>>
>>> *Questions*
>>> 1) Can we decrease maintenance_work_mem to 16 GB, and would that slow
>>>    down the writes to the cluster so that the reader instance can sync
>>>    more gradually?
>>> 2) Based on the above use case, what are your recommendations for
>>>    keeping the data for up to 2 years?
>>> 3) What other recommendations do you have?
>>>
>>> Appreciate your replies. Thanks, githubkran
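For reference, the batched export/import flow described above might look
roughly like this in psql. All table, column, and file names here are
illustrative placeholders, not taken from the original post:

    -- On cluster-1: export one partition in binary format, one slice at a
    -- time, so no single COPY produces a 450-million-row file.
    \copy (SELECT * FROM events_2018_w01 WHERE id BETWEEN 1 AND 50000000) TO 'slice_01.bin' WITH (FORMAT binary)

    -- On cluster-2: load each slice, then build the indexes only after all
    -- slices are in, since the index build is itself write-intensive.
    \copy events_2018_w01 FROM 'slice_01.bin' WITH (FORMAT binary)
    CREATE INDEX events_2018_w01_created_at_idx ON events_2018_w01 (created_at);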
Re: access to original-statement predicates in an INSTEAD-OF row trigger
On 11/15/19 17:38, Adrian Klaver wrote:
> On 11/15/19 1:54 PM, John Lumby wrote:
>> Adrian Klaver wrote:
>>
>> We need to run with Read Committed.
>>
>> I am looking for a solution which does not alter the application or
>> overall behaviour, but just addresses detecting which predicates to
>> apply in some way.
>
> Not sure how that could be pulled off with Read Committed as it would
> involve predicting the future from the POV of the transaction.

No need for time travel! Let me re-phrase the question in a simpler
fashion: how can a row trigger access the original SQL statement at the
root of the current operation? Either in string form or any other form
(e.g. parse tree).

Cheers, John
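One partial avenue, sketched below: a trigger function can at least read
the top-level statement's text with current_query(). That yields only the
raw query string, so recovering predicates from it means parsing SQL
yourself, which is fragile (and, as noted downthread, not really reliable
or maintainable):

    -- Minimal sketch: surface the root statement from inside a row trigger.
    CREATE OR REPLACE FUNCTION show_root_statement() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- current_query() returns the text of the client's top-level
        -- statement, not any parsed representation of its predicates.
        RAISE NOTICE 'root statement: %', current_query();
        RETURN NEW;
    END;
    $$;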
Function performance degrades after repeated execution
Hi,

We've been troubleshooting a slow-running function in our Postgres
database. I've been able to boil it down to the simplest function
possible. It looks like this:

    FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
        join table3 t3 on t3.col = t2.col
        WHERE t1.col = id
    LOOP
        IF rec.matchval > 0 THEN
            co := co + 1;
        END IF;
        if co % 100 = 0 then
            raise notice 'match value %', co;
        end if;
    END LOOP;

Here are the interesting parts:
- The query returns about 13,000 rows.
- If I open a psql session and execute the function, it returns almost
  immediately.
- If I execute the same function 4 more times in the same session (a
  total of 5 times), it returns immediately.
- On the 6th execution it slows down. It processes 100 records every 1.5
  minutes.
- Every subsequent execution from the same session (after 5 times) is
  slow.
- It reliably slows down after 5 consecutive executions.
- If I exit the psql session and open a new one, the function returns
  immediately (up to the 6th execution).
- If I replace the function from a separate session after executing it 5
  times, it again returns immediately for up to 5 executions.
- The CPU spikes to 100% after the 5th execution.

I'm attempting to understand what is causing the slowdown after 5
consecutive executions, but I'm having a hard time getting insight. We
are on PostgreSQL 9.6.15. We've tried:
- Increasing logging to debug5, but we don't get any helpful feedback
  there.
- Reviewing the execution plan of the query. It seems fine when running
  it outside of the function.
- Turning on temp file logging -- but no temp files are logged.

Any ideas for where we might get insight? Or clues as to what is
happening?

Thank you.
Re: Function performance degrades after repeated execution
so 16. 11. 2019 v 16:06 odesílatel Dave Roberge napsal:
> Hi,
>
> We've been troubleshooting a slow-running function in our Postgres
> database. I've been able to boil it down to the simplest function
> possible. It looks like this:
>
> FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
>     join table3 t3 on t3.col = t2.col
>     WHERE t1.col = id
> LOOP
>     IF rec.matchval > 0 THEN
>         co := co + 1;
>     END IF;
>     if co % 100 = 0 then
>         raise notice 'match value %', co;
>     end if;
> END LOOP;
>
> Here are the interesting parts:
> - The query returns about 13,000 rows.
> - If I open a psql session and execute the function, it returns almost
>   immediately.
> - If I execute the same function 4 more times in the same session (a
>   total of 5 times), it returns immediately.
> - On the 6th execution it slows down. It processes 100 records every
>   1.5 minutes.
> - Every subsequent execution from the same session (after 5 times) is
>   slow.
> - It reliably slows down after 5 consecutive executions.
> - If I exit the psql session and open a new one, the function returns
>   immediately (up to the 6th execution).
> - If I replace the function from a separate session after executing it
>   5 times, it again returns immediately for up to 5 executions.
> - The CPU spikes to 100% after the 5th execution.
>
> I'm attempting to understand what is causing the slowdown after 5
> consecutive executions, but I'm having a hard time getting insight. We
> are on PostgreSQL 9.6.15.

The slow execution from the 6th time on is a known issue - it is an
unwanted effect of the query plan cache. See this part of the docs,
https://www.postgresql.org/docs/9.6/sql-prepare.html :

    Prepared statements can use generic plans rather than re-planning
    with each set of supplied EXECUTE values. This occurs immediately for
    prepared statements with no parameters; otherwise it occurs only
    after five or more executions produce plans whose estimated cost
    average (including planning overhead) is more expensive than the
    generic plan cost estimate. Once a generic plan is chosen, it is used
    for the remaining lifetime of the prepared statement. Using EXECUTE
    values which are rare in columns with many duplicates can generate
    custom plans that are so much cheaper than the generic plan, even
    after adding planning overhead, that the generic plan might never be
    used.

In PostgreSQL 9.6 you can use a dynamic query as a workaround - it
generates only one-shot plans, so it should be OK every time:

    FOR rec IN EXECUTE 'SELECT ...

https://www.postgresql.org/docs/9.6/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING

Regards

Pavel

> We've tried:
> - Increasing logging to debug5, but we don't get any helpful feedback
>   there.
> - Reviewing the execution plan of the query. It seems fine when running
>   it outside of the function.
> - Turning on temp file logging -- but no temp files are logged.
>
> Any ideas for where we might get insight? Or clues as to what is
> happening?
>
> Thank you.
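A minimal sketch of that EXECUTE workaround, keeping the query shape and
the placeholder names exactly as posted in the original question:

    -- EXECUTE builds a fresh, one-shot plan on every call, so the generic
    -- plan that kicks in after five executions is never used.
    FOR rec IN EXECUTE
        'SELECT 1 AS matchval FROM table1 t1, table2 t2
           JOIN table3 t3 ON t3.col = t2.col
          WHERE t1.col = $1'
        USING id
    LOOP
        IF rec.matchval > 0 THEN
            co := co + 1;
        END IF;
    END LOOP;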
Re: access to original-statement predicates in an INSTEAD-OF row trigger
John Lumby writes:
> How can a row trigger access the original SQL statement at the root of
> the current operation?

It can't; at least not in any way that'd be reliable or maintainable.

I concur with the upthread recommendation that switching to serializable
mode would be a more manageable way of dealing with concurrent-update
problems.

			regards, tom lane
Re: Function performance degrades after repeated execution
On 11/16/19 8:22 AM, Dave Roberge wrote:
> Hi,
>
> We've been troubleshooting a slow-running function in our Postgres
> database. I've been able to boil it down to the simplest function
> possible. It looks like this:
>
> FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
>     join table3 t3 on t3.col = t2.col
>     WHERE t1.col = id
> LOOP
>     IF rec.matchval > 0 THEN
>         co := co + 1;
>     END IF;
>     if co % 100 = 0 then
>         raise notice 'match value %', co;
>     end if;
> END LOOP;

Where are you joining table1 to either table2 or table3?

-- 
Angular momentum makes the world go 'round.
Re: Function performance degrades after repeated execution
so 16. 11. 2019 v 16:46 odesílatel Ron napsal:
> On 11/16/19 8:22 AM, Dave Roberge wrote:
> > FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
> >     join table3 t3 on t3.col = t2.col
> >     WHERE t1.col = id
> > LOOP
> >     IF rec.matchval > 0 THEN
> >         co := co + 1;
> >     END IF;
> >     if co % 100 = 0 then
> >         raise notice 'match value %', co;
> >     end if;
> > END LOOP;
>
> Where are you joining table1 to either table2 or table3?

Good shot - maybe there is an unwanted cartesian product.

Pavel

> --
> Angular momentum makes the world go 'round.
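If that cartesian product is unintended, the fix is an explicit join
between table1 and the rest. The join column below is an assumption,
since only the original poster knows the real relationship between the
tables:

    -- Hypothetical corrected query: t1 now joins t2 instead of being
    -- cross-joined against every (t2, t3) pair.
    SELECT 1 AS matchval
      FROM table1 t1
      JOIN table2 t2 ON t2.t1_col = t1.col   -- assumed join condition
      JOIN table3 t3 ON t3.col = t2.col
     WHERE t1.col = id;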
Re: PostgreSQL Replication and question on maintenance
On Thu, Nov 14, 2019 at 12:23 PM github kran wrote:

>> *Problem we have right now*
>>
>> The last 2 times the weekly migration ran, the cluster's read-replica
>> instance restarted because it had fallen behind the master (writer
>> instance).

I can't figure out what your setup is here. You must be using logical
replication (not physical) or you wouldn't be able to write to the
replica at all. But if you are using logical replication, why do you
also need these weekly jobs? Why isn't logical replication taking care
of it?

>> Everything after that worked seamlessly but we want to avoid the
>> replica getting restarted. To avoid the restart we started doing
>> smaller binary files and copying those files to cluster-2

Who restarted it? I am not aware of any case where the replica responds
to falling behind by restarting itself. With physical replication, it
can start cancelling queries, but you don't seem to be using physical
replication.

Cheers,

Jeff
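For whoever digs into this: if it does turn out to be physical streaming
replication, the standby's lag can be watched from the primary while the
migration job runs, which would show whether the index-build step is what
pushes the replica over. A sketch, assuming physical replication on
PostgreSQL 10 or later:

    -- Run on the primary; replay_lag_bytes grows as the standby falls behind.
    SELECT application_name, state, replay_lag,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;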
Re: Weird ranking results with ts_rank
On Fri, Nov 15, 2019 at 1:31 AM Javier Ayres wrote:
> Hi everybody.
>
> I'm implementing a solution that uses PostgreSQL's full text search
> capabilities and I have come across a particular set of results for
> ts_rank that don't seem to make sense according to the documentation.

While the documentation doesn't come out and say so, my interpretation
is that ts_rank assumes there is a match in the first place, and by
implication is undefined/unspecified if there is no match.

select to_tsvector('search for a text box') @@ to_tsquery('circle | lot <-> box');
 ?column?
----------
 f
(1 row)

Cheers,

Jeff
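In other words, only lean on ts_rank for rows that the @@ operator has
already accepted. A sketch with illustrative table and column names:

    -- ts_rank is only computed for rows the query actually matches,
    -- so its value is always meaningful here.
    SELECT title, ts_rank(tsv, q) AS rank
    FROM docs, to_tsquery('circle | lot <-> box') AS q
    WHERE tsv @@ q
    ORDER BY rank DESC;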
REINDEX VERBOSE unknown option
Hello,

According to https://www.postgresql.org/docs/11/sql-reindex.html, the
VERBOSE option is valid for the REINDEX command on a PostgreSQL 11.3
server. Anyway, I'm getting an error when using the VERBOSE option:

project_production=# REINDEX VERBOSE TABLE sales;
ERROR:  syntax error at or near "VERBOSE"
LINE 1: REINDEX VERBOSE TABLE sales;

Time: 0.235 ms

I'm wondering if I'm doing anything wrong or whether the documentation
is wrong. Any ideas?
Re: REINDEX VERBOSE unknown option
Ahh, I just tried to do the same with the reindexdb CLI tool, and the
actual syntax is:

    REINDEX (VERBOSE) TABLE sales;

Sorry for the unnecessary question. Anyway, maybe we can add this to the
documentation as an example. I can prepare a patch for this if welcome.

so 16. 11. 2019 v 18:40 odesílatel Josef Šimánek napsal:
> Hello,
>
> According to https://www.postgresql.org/docs/11/sql-reindex.html, the
> VERBOSE option is valid for the REINDEX command on a PostgreSQL 11.3
> server. Anyway, I'm getting an error when using the VERBOSE option:
>
> project_production=# REINDEX VERBOSE TABLE sales;
> ERROR:  syntax error at or near "VERBOSE"
> LINE 1: REINDEX VERBOSE TABLE sales;
>
> Time: 0.235 ms
>
> I'm wondering if I'm doing anything wrong or whether the documentation
> is wrong. Any ideas?
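For the record, the parenthesized option list is the only accepted
placement for VERBOSE in the SQL command. A successful run looks roughly
like this (the exact INFO lines vary by version and index):

    project_production=# REINDEX (VERBOSE) TABLE sales;
    INFO:  index "sales_pkey" was reindexed
    REINDEX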
Re: where does postgres take its timezone information from?
On Sat, Nov 16, 2019 at 7:13 PM Tom Lane wrote:
> Palle Girgensohn writes:
> > 15 nov. 2019 kl. 21:32 skrev Thomas Munro:
> >> Ugh. It doesn't have the old backward compatibility names like
> >> US/Pacific installed by default, which is a problem if that's what
> >> initdb picked for your cluster (or you've stored references to any
> >> of those names in other ways).
>
> > One quick fix is to revert the change. Tom thinks this is not reason
> > to revert. Would it be enough to edit the postgresql.conf to use the
> > correct "modern" name for US/Pacific (PST?)? In that case, an update
> > note might be sufficient?
>
> I think the "official" name of that zone is America/Los_Angeles.
> But initdb might seize on the US/Pacific alias, if available,
> because it's shorter. We've seen related problems with other
> time zone names, though usually it was just cosmetic and not a
> reason for the postmaster to fail to start.
>
> Yes, changing the zone name in postgresql.conf should be a sufficient
> fix. In theory, a FreeBSD user ought to know the "official" alias
> for their zone, since the rest of the system would expect that.
> So this is slightly tedious if initdb chose a non-official alias,
> but I don't think it's reason to panic.

Perhaps the best thing would be to revert this for the older PostgreSQL
releases so that people doing minor version upgrades aren't
inconvenienced by a system that can't start up after "pkg upgrade", but
do it for 12, since not many people will be using that yet?
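For anyone hitting the failed startup: the fix Tom describes is a
one-line edit in postgresql.conf, swapping the alias for the canonical
zone name (America/Los_Angeles here is just the example from this thread;
substitute the canonical name for your own zone):

    # postgresql.conf - replace the backward-compatibility alias
    # timezone = 'US/Pacific'
    timezone = 'America/Los_Angeles'
    log_timezone = 'America/Los_Angeles'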
Re: ERROR: COPY escape must be a single one-byte character (multi-delimiter appears to work on Postgres 9.0 but does not on Postgres 9.2)
> I am unable to edit this Talend job, as it's very old and we do not
> have the source code for the job anymore. I am unable to see what the
> actual delimiter

Compiled Talend jobs produce jar files containing Java .class files in
which the SQL statements are in plain text. You should at least be able
to get the COPY statement (which is plain-text SQL), and you should also
be able to modify it.

On Wed, Nov 13, 2019 at 07:40:59PM -0500, Brandon Ragland wrote:
> Hello,
>
> I have a Talend enterprise job that loads data into a PostgreSQL
> database via the COPY command. When migrating to a new server this
> command fails with the following error message:
> org.postgresql.util.PSQLException: ERROR: COPY escape must be a single
> one-byte character
>
> The thing is, I looked over the documentation for both Postgres 9.0 and
> 9.2. Both documentations say that multi-byte delimiters are not
> allowed. So I'm very confused about why this job works perfectly on
> Postgres 9.0 but not on 9.2.
>
> I am unable to edit this Talend job, as it's very old and we do not
> have the source code for the job anymore. I am unable to see what the
> actual delimiter is. I am also unable to see exactly how the COPY
> command is being run, such as whether it's pushing directly to the
> server via the Postgres driver, or if it's creating a temporary CSV
> file somewhere and then loading the data into the server. I believe the
> reason we have multi-byte delimiters set up is due to the use of
> various special characters in a few of the columns for multiple tables.
>
> I am not aware of any edits to the source code of the old 9.0 Postgres
> server.
>
> The reason we are migrating servers is due to the end of life for
> CentOS 5. The new server runs CentOS 7. I believe that both servers are
> using the default Postgres versions that come in the default CentOS
> repositories. I know for sure that the CentOS 7 server is indeed
> running the default Postgres version, as I installed it myself through
> yum.
>
> Any help would be greatly appreciated.
>
> Also, is there a way to copy the old Postgres server, dependencies, and
> executables to our new server, in case the source was modified?
>
> Brandon Ragland
> Software Engineer
> BREAKFRONT SOFTWARE
> Office: 704.688.4085 | Mobile: 240.608.9701 | Fax: 704.973.0607

-- 
nicolas
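Once the statement is recovered from the jar, the ESCAPE (and DELIMITER
and QUOTE) values just need to be reduced to single bytes. A sketch of a
form both 9.0 and 9.2 accept; the table name, file path, and characters
are illustrative only:

    -- CSV mode: DELIMITER, QUOTE and ESCAPE must each be one single-byte
    -- character; multi-byte values are rejected with the error above.
    COPY my_table FROM '/tmp/data.csv'
    WITH (FORMAT csv, DELIMITER ',', QUOTE '"', ESCAPE E'\\');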
Re: pg12 rc1 on CentOS8 depend python2
Hi,

On Fri, 2019-09-27 at 10:50 +0900, keisuke kuroda wrote:
> CentOS8 does not have python2 installed by default, but PostgreSQL is
> dependent on python2.
>
> Do we need to install python2 when we use PostgreSQL on CentOS8?

For the archives: I fixed this in the 12.1 packages. The core packages
do not depend on Python (2.x or 3.x) anymore.

Regards,

-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: pg12 rc1 on CentOS8 depend python2
Hi,

On Fri, 2019-09-27 at 09:38 -0400, Tom Lane wrote:
> Another idea might be to bundle them into the plpython package
> instead of contrib (and similarly for the plperl transforms).

This went into last week's minor updates.

Regards,

-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: REINDEX VERBOSE unknown option
so 16. 11. 2019 v 18:43 odesílatel Josef Šimánek napsal:
> Ahh, I just tried to do the same with the reindexdb CLI tool, and the
> actual syntax is:
>
>     REINDEX (VERBOSE) TABLE sales;
>
> Sorry for the unnecessary question. Anyway, maybe we can add this to
> the documentation as an example. I can prepare a patch for this if
> welcome.

A documentation patch is a good idea.

Pavel

> so 16. 11. 2019 v 18:40 odesílatel Josef Šimánek napsal:
>> Hello,
>>
>> According to https://www.postgresql.org/docs/11/sql-reindex.html, the
>> VERBOSE option is valid for the REINDEX command on a PostgreSQL 11.3
>> server. Anyway, I'm getting an error when using the VERBOSE option:
>>
>> project_production=# REINDEX VERBOSE TABLE sales;
>> ERROR:  syntax error at or near "VERBOSE"
>> LINE 1: REINDEX VERBOSE TABLE sales;
>>
>> Time: 0.235 ms
>>
>> I'm wondering if I'm doing anything wrong or whether the documentation
>> is wrong. Any ideas?
Reducing function execution overhead
Hi all!

The best way to describe my question is to show the code first:

create table tst(
    id int primary key,
    j1 jsonb,
    j2 jsonb
);

insert into tst
select ser,
       jsonb_build_object(
           floor(random() * 10 + 1), floor(random() * 1000 + 1),
           floor(random() * 10 + 1), floor(random() * 1000 + 1),
           floor(random() * 10 + 1), floor(random() * 1000 + 1)
       ),
       jsonb_build_object(
           floor(random() * 10 + 1), floor(random() * 1000 + 1),
           floor(random() * 10 + 1), floor(random() * 1000 + 1),
           floor(random() * 10 + 1), floor(random() * 1000 + 1)
       )
from generate_series(1, 50) ser;

analyze tst;

-- the original func is a bit more complicated, but it doesn't matter here
create or replace function tst_func(a jsonb, b jsonb)
returns bigint stable language sql as $$
    select sum(
        ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
    )
    from jsonb_each(a) _a
$$;

-- get plain data
explain analyze select id, j1, j2 from tst;

-- use a subquery (the same code as in the function)
explain analyze
select id, j1, j2,
       (select sum(
            ((_a.value::text)::int - (coalesce(j2->>_a.key, '0'))::int)::int
        )
        from jsonb_each(j1) _a)
from tst;

-- use the function
explain analyze select id, j1, j2, tst_func(j1, j2) from tst;

select version();

And after running it I got the following results:

CREATE TABLE
INSERT 0 50
ANALYZE
CREATE FUNCTION

                                 QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..13558.00 rows=50 width=108) (actual time=0.009..40.348 rows=50 loops=1)
 Planning time: 0.189 ms
 Execution time: 56.356 ms
(3 rows)

                                 QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..1644808.00 rows=50 width=116) (actual time=0.021..1966.190 rows=50 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=3.25..3.26 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=50)
           ->  Function Scan on jsonb_each _a  (cost=0.00..1.00 rows=100 width=64) (actual time=0.002..0.002 rows=3 loops=50)
 Planning time: 0.072 ms
 Execution time: 1982.192 ms
(6 rows)

                                 QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..138558.00 rows=50 width=116) (actual time=0.072..5308.897 rows=50 loops=1)
 Planning time: 0.067 ms
 Execution time: 5328.196 ms
(3 rows)

                                 version
----------------------------------------------------------------------------
 PostgreSQL 9.6.13 on x86_64-pc-linux-gnu (Ubuntu 9.6.13-1.pgdg16.04+1),
 compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
(1 row)

As you can see, the subquery version needs 2 seconds while the function
version needs more than 5. And that's sad to see. I love functions, and
this func (its production version) is actually widely used across our
project. I tried to mark the function as immutable and even parallel
safe, but it doesn't help.

I understand that function execution is not free, but why so much? So,
my question is: is there any way to make this function work faster?
Something similar to a prepared statement, or maybe rules (create rule),
or whatever else is available.

PS: The current situation could be solved by denormalization -
precalculating the function and storing the results along with the data -
but that's not the route I would like to take, because it leads to more
issues to solve (invalidation, for instance).
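One workaround worth trying, sketched below under the assumption that the
production function is as inline-resistant as this one (a scalar SQL
function whose body contains an aggregate is not inlined by the planner):
move the computation into a LATERAL subquery. That should give the same
plan as the fast subquery version above while keeping the logic in one
place in the query:

    -- LATERAL form: behaves like the inline subquery (~2 s above) instead
    -- of paying the per-row function-call overhead (~5 s above).
    explain analyze
    select t.id, t.j1, t.j2, s.diff_sum
    from tst t
    left join lateral (
        select sum(
            ((_a.value::text)::int - (coalesce(t.j2->>_a.key, '0'))::int)::int
        ) as diff_sum
        from jsonb_each(t.j1) _a
    ) s on true;

The downside compared to tst_func is that the expression is spelled out
at every call site, which is exactly the duplication the function was
meant to avoid, so it only helps for the hottest queries.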