slow insert speeds with bytea
Is there anything I can do to increase insert speeds for bytea? I'm currently running Postgres 9.6.15.

I have a few tables without a bytea column and a few with one, and there is a large performance difference between the two on inserts. I'm inserting a byte[] that's usually less than 1MB of content; the content itself is actually just UTF-8 string data. For the non-bytea table, inserts can be as high as 40k rows/sec, whereas the bytea table is closer to 4k rows/sec or less.

If this is just a limitation of Postgres, then that's fine, but the performance delta is so significant that I feel like I'm missing something.
Conditional return of aggregated data
Hi,

I have some aggregated statistics which are currently being queried as follows:

    create view stats_age as
    SELECT a.category,
           a.statcount,
           b.a_desc
    FROM reg_aggregated_stats a, regs_select_age b
    where a.category='age' and a.datapoint::smallint=b.a_val
    order by a.datapoint asc;

However, as these relate to event registrations, a suggestion has been made that the database really should return nothing until a certain number of registrations has been met (in order to avoid privacy-infringing inference from what should be an otherwise fully, irreversibly aggregated dataset).

Now, the queries are wrapped in PL/pgSQL functions anyway, so I could make a second call to Postgres to find out sum(statcount) and then conditionally return based on that. But is there a smarter way to do this in a single SQL query?

My initial idea was something along the lines of:

    select (select sum(statcount) from stats_residence) as aggstat,
           statcount, short_name_en
    from stats_residence
    where aggstat > some_number;

But as I soon discovered, that's not valid syntax! Hence, ideas welcome from those smarter than me.

Thanks!

Laura
Re: Conditional return of aggregated data
Hello Laura,

I don't know if I understand your question fully, but FILTER might be helpful:

    SELECT count(*) AS unfiltered,
           count(*) FILTER (WHERE i < 5) AS filtered
    FROM generate_series(1,10) AS s(i);

     unfiltered | filtered
    ------------+----------
             10 |        4
    (1 row)

https://www.postgresql.org/docs/current/sql-expressions.html

hth,
Wim

Laura Smith wrote on Mon 02-12-2019 at 11:10 [+]:
> I have some aggregated statistics which are currently being queried
> as follows: [...]
> But is there a smarter way to do this out of a single SQL query ?

--
kind regards,
Wim Bertels
--
Lector
UC Leuven-Limburg
--
My only love sprung from my only hate!
Too early seen unknown, and known too late!
		-- William Shakespeare, "Romeo and Juliet"
Re: Conditional return of aggregated data
> My initial idea was something along the lines of :
>
>     select (select sum(statcount) from stats_residence) as aggstat,
>            statcount, short_name_en
>     from stats_residence
>     where aggstat > some_number;

Am I missing something basic? The above can be done using the GROUP BY and HAVING clauses.
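A minimal sketch of that pattern, assuming the same hypothetical stats_residence(short_name_en, statcount) columns from the original post. Note that HAVING filters on each group's sum rather than on the overall total, so this matches the requirement only if the threshold is meant per category:

    SELECT short_name_en, sum(statcount) AS aggstat
    FROM stats_residence
    GROUP BY short_name_en
    HAVING sum(statcount) > some_number;  -- per-group threshold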
Re: Conditional return of aggregated data
> > My initial idea was something along the lines of :
> >
> >     select (select sum(statcount) from stats_residence) as aggstat,
> >            statcount, short_name_en
> >     from stats_residence
> >     where aggstat > some_number;
>
> Am I missing something basic? The above can be done using the
> GROUP BY and HAVING clauses.

Or this:

    with t as (
        select (select sum(statcount) from stats_residence) as aggstat,
               statcount, short_name_en
        from stats_residence
    )
    select * from t where aggstat > some_number;

Apologies if I did not understand the question correctly.
Re: Conditional return of aggregated data
> Or this:
>
>     with t as (
>         select (select sum(statcount) from stats_residence) as aggstat,
>                statcount, short_name_en
>         from stats_residence
>     )
>     select * from t where aggstat > some_number;
>
> Apologies if I did not understand the question correctly.

Hi Ravi,

Thanks for helping show me the way. You're quite right, a CTE did the trick.

Laura
Re: Conditional return of aggregated data
On Mon, 2 Dec 2019 at 12:11, Laura Smith wrote:
>
> My initial idea was something along the lines of :
>
>     select (select sum(statcount) from stats_residence) as aggstat,
>            statcount, short_name_en
>     from stats_residence
>     where aggstat > some_number;
>

One option is to move the aggregate into the WHERE clause. If you also need the value in your select list, you can just repeat the subselect there; the planner is usually smart enough to figure out that it can re-use the result.

    select short_name_en
    from stats_residence
    where (select sum(statcount) from stats_residence) > some_number;

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: Connection terminated but client didn't realise
David Wheeler writes:
> We have a query that our system runs nightly to refresh materialised views.
> This takes some time to execute (~25 minutes) and then it will usually
> return the results to the application and everything is golden. However
> occasionally we see something like the below, where the query finishes,
> but the connection gets unexpectedly closed from Postgres' perspective.
> From the application's perspective the connection is still alive, and it
> sits there forever waiting for the result.

Is the application remote from the database server? My gut reaction to this type of report is "something timed out the network connection", but there would have to be a router or firewall or the like in between to make that a tenable explanation.

If that is the issue, you should be able to fix it by making the server's TCP keepalive settings more aggressive.

			regards, tom lane
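For illustration, a hedged sketch of tightening those server-side keepalive settings from SQL; the GUC names are standard, but the values here are only examples, not recommendations:

    -- Probe idle connections sooner and give up on dead peers faster.
    ALTER SYSTEM SET tcp_keepalives_idle = 60;      -- seconds idle before the first probe
    ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- seconds between probes
    ALTER SYSTEM SET tcp_keepalives_count = 6;      -- unanswered probes before the connection is dropped
    SELECT pg_reload_conf();                        -- apply without a restart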
Re: pgbackrest concerns and doubts.
Hi Ajay,

On 12/2/19 1:52 AM, Ajay Pratap wrote:
> I am using PostgreSQL 10 and pgBackRest 2.18 on CentOS 7.6, and I have a
> few doubts regarding pgBackRest.
>
> a) What is the log rotation policy for pgBackRest logs? I see it logs to
> the default path /var/log/pgbackrest/-.log; what is the rotation policy
> for each log? Use case: if I am writing a script to parse the logs and
> gather some info, I should be aware of log rotation; if the logs don't
> rotate, a single file could be huge to parse. Or should I simply use
> logrotate?

pgBackRest does not have any built-in log rotation policies since this is best implemented per OS. Some packages have logrotate scripts and others don't. RHEL doesn't, but you can see a logrotate example in the Debian/Ubuntu package at:

https://salsa.debian.org/postgresql/pgbackrest/blob/master/debian/pgbackrest.logrotate

> b) Since pgBackRest takes physical backups, what is the impact if I
> upgrade the minor PostgreSQL version (10.5 to 10.10), and what is the
> impact of a major version upgrade (10.10 to 12.X)?

Minor PostgreSQL upgrades require no special action in pgBackRest. We test with each minor upgrade to ensure there are no regressions. Unless you have a specific reason not to, it is always best to be running the most recent PostgreSQL minor version.

Major version upgrades will require a pgBackRest stanza-upgrade to be run after the PostgreSQL upgrade is complete. For more information see:

https://pgbackrest.org/user-guide-centos7.html#upgrade-stanza

Regards,
--
-David
da...@pgmasters.net
Re: slow insert speeds with bytea
On 12/1/19 9:59 AM, Alex O'Ree wrote:
> Is there anything I can do to increase insert speeds for bytea? Currently
> running postgres 9.6.15. [...] For the non-bytea table, inserts can be as
> high as 40k rows/sec, whereas the bytea table is closer to 4k/sec or less.
> If this is just a limitation of postgres, then that's fine, but the
> performance delta is so significant that I feel like I'm missing something.

It would help to have more information:

1) The schema of the table, e.g. the output of \d in psql.

2) The actual INSERT query.

3) An EXPLAIN ANALYZE of the INSERT query (a sketch of this follows below).

--
Adrian Klaver
adrian.kla...@aklaver.com
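For item 3, a minimal sketch of how that might look (the table and column names are hypothetical); wrapping the statement in a transaction lets you roll the test row back:

    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS)
        INSERT INTO bytea_table (id, payload)
        VALUES (1, decode('deadbeef', 'hex'));  -- sample bytea literal
    ROLLBACK;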
Re: slow insert speeds with bytea
On 01/12/2019 17:59, Alex O'Ree wrote:
> Is there anything I can do to increase insert speeds for bytea? [...] For
> the non-bytea table, inserts can be as high as 40k rows/sec, whereas the
> bytea table is closer to 4k/sec or less.

4k rows/sec at 1MB each would be 4GB a second. It would need to be a fairly decent IO system to manage that speed, let alone ten times that.

What's the typical row size of the tables with bytea fields vs the other tables? What are your IO stats during the two sorts of insert?

Cheers,
Steve
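One hedged way to check the typical row size (the table name is hypothetical):

    -- pg_column_size() reports the bytes used to store a value; applied to
    -- the whole-row expression it gives a rough per-row size.
    SELECT avg(pg_column_size(t.*)) AS avg_row_bytes
    FROM bytea_table AS t;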
Re: using replace function
Hi Osvaldo, Rod, David and Tom,

Thanks for your attention. In the end it is not that difficult. For me the more natural way is Rod's suggestion, to use replace() with a WHERE clause (see the sketch after this message); I don't know why I didn't think of that solution. The usage of regexp_replace is very nice too and I will take some time to learn it.

Thanks a lot to you all,
Antonio

On Fri, 29 Nov 2019 at 01:22, Tom Lane wrote:
> "David G. Johnston" writes:
> > On Thursday, November 28, 2019, Rob Sargent wrote:
> >> I want to replace a string (not a substring) in a field but making
> >> sure that the string is the full field.
>
> > I'd probably do something like:
> > Select case when a.col = 'value' then 'new value' else a.col end from a;
>
> Yeah, this. You could wrap it up in a SQL function if you want the same
> level of notational convenience as replace().
>
> Another possibility is regexp_replace with an anchored pattern, but that
> would potentially require escaping regexp metacharacters in the pattern,
> so the PITA factor is high. And I doubt it'd be faster than the CASE
> solution.
>
> regards, tom lane

--
Antônio Olinto Ávila da Silva
Biólogo / Oceanógrafo
Instituto de Pesca (Fisheries Institute)
São Paulo, Brasil
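A minimal sketch of the replace-with-WHERE idea, assuming a hypothetical table a(col text); the WHERE clause restricts the change to rows where the string is the entire field, so replace() rewrites the whole value:

    UPDATE a
    SET col = replace(col, 'value', 'new value')
    WHERE col = 'value';  -- only whole-field matches are touched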
Performance on upcoming Linux kernel
Just wanted to point out this article:

https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.5-Early-Performance

It looks like something changed between 5.4 and 5.5 that degrades PostgreSQL performance.

--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
						-- Benjamin Franklin
Upgrading from V11 to V12 on Debian install
I have several machines that have version 11 instances on them. These are all Debian or Ubuntu machines. All of them are pointed to the Postgres repositories for the Postgres binaries:

deb http://apt.postgresql.org/pub/repos/apt/ buster-pgdg main

dpkg -l shows:

ii  postgresql-11             11.6-1.pgdg100+1
ii  postgresql-12             12.1-1.pgdg100+1
ii  postgresql-client-11      11.6-1.pgdg100+1
ii  postgresql-client-12      12.1-1.pgdg100+1
ii  postgresql-client-common  210.pgdg100+1
ii  postgresql-common         210.pgdg100+1
ii  postgresql-contrib        12+210.pgdg100+1
ii  postgresql-server-dev-11  11.6-1.pgdg100+1

Googling around, I found this link:

https://scottlinux.com/2015/11/14/upgrade-postgresql-on-debian-jessie-with-pg_upgradecluster/

I plan on upgrading one of the test instances right away. Is the information in this link the best way to do this in a Debian environment? I did look at the pg_upgrade documentation, and it looks like I could really make a mess if I misunderstand even one step.

--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
						-- Benjamin Franklin
Re: Upgrading from V11 to V12 on Debian install
On 12/2/19 11:28 AM, stan wrote:
> I have several machines that have version 11 instances on them. These are
> all Debian or Ubuntu machines. [...] I plan on upgrading one of the test
> instances right away. Is the information in this link the best way to do
> this in a Debian environment?

Yes, as pointed out previously:

https://www.postgresql.org/message-id/20191107161401.ga24...@hjp.at

For more info:

man pg_upgradecluster

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Connection terminated but client didn't realise
> Is the application remote from the database server? My gut reaction to
> this type of report is "something timed out the network connection", but
> there would have to be a router or firewall or the like in between to
> make that a tenable explanation.
>
> If that is the issue, you should be able to fix it by making the server's
> TCP keepalive settings more aggressive.

Yes, the application server is separate from the database server, and the application is running within Docker, which I suspect adds some complexity too.

I had suspicions about something in the middle closing the connection too, but your email has clarified my thinking a bit. TCP keepalive appears to be enabled on the application server and within Docker, and the client holds the allegedly dead connection for much longer (24h) than the keepalive should take to kill it (<3h), so I think the next step is to identify the connection at the OS level with netstat to see what state it's in.

Thanks for your help.

Regards,
David