slow insert speeds with bytea

2019-12-02 Thread Alex O'Ree
Is there anything I can do to increase insert speeds for bytea? Currently
running Postgres 9.6.15

I have a few tables without a bytea column and a few with one. There is a large
performance difference between the two for inserts. I'm inserting a byte[]
that's usually less than 1MB of content. The content itself is actually
just UTF-8 string data.

For the non-bytea table, inserts can be as high as 40k rows/sec, whereas
the bytea table is closer to 4k/sec or less.

If this is just a limitation of Postgres, then that's fine, but the
performance delta is so significant that I feel like I'm missing something.


Conditional return of aggregated data

2019-12-02 Thread Laura Smith
Hi,

I have some aggregated statistics which are currently being queried as follows:

create view stats_age as
SELECT a.category,
   a.statcount,
   b.a_desc
FROM reg_aggregated_stats a, regs_select_age b
where a.category='age' and a.datapoint::smallint=b.a_val
order by a.datapoint asc;

However, as these relate to event registrations, a suggestion has been made 
that the database really should return nothing until a certain number of 
registrations has been met (in order to avoid privacy-infringing inference 
from what should otherwise be a fully, irreversibly aggregated dataset).

Now, the queries are wrapped in PL/pgSQL functions anyway, so I could make a 
second call to Postgres to find out sum(statcount) and then conditionally 
return based on that.

But is there a smarter way to do this in a single SQL query?

My initial idea was something along the lines of:

select (select sum(statcount) from stats_residence) as aggstat,
       statcount, short_name_en
from stats_residence where aggstat > some_number;

But as I soon discovered, that's not valid syntax! Hence, ideas welcome from 
those smarter than me.

Thanks!

Laura




Re: Conditional return of aggregated data

2019-12-02 Thread Wim Bertels
Hallo Laura,

I don't know if I understand your question fully,
but this might be helpful: FILTER

SELECT
    count(*) AS unfiltered,
    count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);
 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

https://www.postgresql.org/docs/current/sql-expressions.html
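
Applied to the schema from the quoted post below, a hypothetical variant
(column names assumed from your view definition) might be:

SELECT sum(statcount) AS total,
       sum(statcount) FILTER (WHERE category = 'age') AS age_total
FROM reg_aggregated_stats;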

hth,
Wim

Laura Smith wrote on Mon 02-12-2019 at 11:10 [+]:
> Hi,
> 
> I have some aggregated statistics which are currently being queried
> as follows:
> 
> create view stats_age as
> SELECT a.category,
>a.statcount,
>b.a_desc
> FROM reg_aggregated_stats a,regs_select_age b where a.category='age'
> and a.datapoint::smallint=b.a_val order by a.datapoint asc;
> 
> However, as these relate to event registrations, a suggestion has
> been made that the database really should return nothing until
> a certain number of registrations has been met (in order to avoid
> privacy-infringing inference from what should otherwise be a fully,
> irreversibly aggregated dataset).
> 
> Now, the queries are wrapped in PL/PGSQL functions anyway, so I could
> make a second call to Postgres to find out sum(statcount) and then
> conditionally return based on that.
> 
> But is there a smarter way to do this in a single SQL query?
> 
> My initial idea was something along the lines of:
>  select (select sum(statcount) from stats_residence) as
> aggstat,statcount,short_name_en from stats_residence where
> aggstat>some_number;
> 
> But as I soon discovered, that's not valid syntax! Hence, ideas welcome
> from those smarter than me.
> 
> Thanks!
> 
> Laura
> 
> 
-- 
Kind regards,
Wim Bertels
--
Lector
UC Leuven-Limburg
--
My only love sprung from my only hate!
Too early seen unknown, and known too late!
-- William Shakespeare, "Romeo and Juliet"



Re: Conditional return of aggregated data

2019-12-02 Thread Ravi Krishna


> My initial idea was something along the lines of:
>  select (select sum(statcount) from stats_residence) as 
> aggstat,statcount,short_name_en from stats_residence where 
> aggstat>some_number;

Am I missing something basic? The above can be done using
a GROUP BY and HAVING clause.
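
For illustration, a minimal sketch of that idea against the column names from
the original post (the threshold of 20 is hypothetical):

select short_name_en, sum(statcount) as statcount
from stats_residence
group by short_name_en
-- keep the groups only when the overall total clears the threshold
having (select sum(statcount) from stats_residence) > 20;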




Re: Conditional return of aggregated data

2019-12-02 Thread Ravi Krishna


>
> > My initial idea was something along the lines of:
> >  select (select sum(statcount) from stats_residence) as 
> > aggstat,statcount,short_name_en from stats_residence where 
> > aggstat>some_number;
>
> Am I missing something basic? The above can be done using
> a GROUP BY and HAVING clause.
or this:
with t as
(select (select sum(statcount) from stats_residence) as aggstat,
 statcount, short_name_en from stats_residence
)
select * from t where aggstat > some_number;

Apologies if I did not understand the question correctly.




Re: Conditional return of aggregated data

2019-12-02 Thread Laura Smith
>
> or this
> with t as
> (select (select sum(statcount) from stats_residence) as aggstat,
> statcount,short_name_en from stats_residence
> )
> select * from t where aggstat > some_number
>
> Apologies if I did not understand the question correctly.


Hi Ravi,

Thanks for helping show me the way.  You're quite right, a CTE did the trick.
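
For the record, a sketch of how the CTE might be folded back into a view
(the view name and the threshold of 20 are hypothetical; columns as in the
earlier post):

create or replace view stats_residence_guarded as
with t as (
    select (select sum(statcount) from stats_residence) as aggstat,
           statcount, short_name_en
    from stats_residence
)
-- the view returns no rows at all until the total clears the threshold
select statcount, short_name_en
from t
where aggstat > 20;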

Laura




Re: Conditional return of aggregated data

2019-12-02 Thread Alban Hertroys
On Mon, 2 Dec 2019 at 12:11, Laura Smith wrote:

>
> My initial idea was something along the lines of:
>  select (select sum(statcount) from stats_residence) as
> aggstat,statcount,short_name_en from stats_residence where
> aggstat>some_number;
>

One option is to move the aggregate to the where-clause. If you also need
the value in your select-list, you can just repeat the subselect there;
the planner is usually smart enough to figure out that it can re-use
the result.

select short_name_en from stats_residence
where (select sum(statcount) from stats_residence) > some_number;
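
If the total is also wanted in the output, a sketch with the subselect
repeated in the select-list (some_number is the placeholder from the
original post):

select (select sum(statcount) from stats_residence) as aggstat,
       statcount, short_name_en
from stats_residence
where (select sum(statcount) from stats_residence) > some_number;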

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Connection terminated but client didn't realise

2019-12-02 Thread Tom Lane
David Wheeler  writes:
> We have a query that our system runs nightly to refresh materialised views. 
> This takes some time to execute (~25 minutes) and then it will usually return 
> the results to the application and everything is golden. However occasionally 
> we see something like the below, where the query finishes, but the connection 
> gets unexpectedly closed from Postgres’ perspective. From the application’s 
> perspective the connection is still alive, and it sits there forever waiting 
> for the result.

Is the application remote from the database server?  My gut reaction
to this type of report is "something timed out the network connection",
but there would have to be a router or firewall or the like in between
to make that a tenable explanation.

If that is the issue, you should be able to fix it by making the
server's TCP keepalive settings more aggressive.
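
For illustration, a hedged sketch of what "more aggressive" might look like
(the values are only illustrative, not recommendations; by default these
settings inherit the operating system's keepalive timings):

ALTER SYSTEM SET tcp_keepalives_idle = 60;      -- idle seconds before the first probe
ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- seconds between probes
ALTER SYSTEM SET tcp_keepalives_count = 6;      -- unanswered probes before the connection drops
SELECT pg_reload_conf();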

regards, tom lane




Re: pgbackrest concerns and doubts.

2019-12-02 Thread David Steele

Hi Ajay,

On 12/2/19 1:52 AM, Ajay Pratap wrote:


I am using PostgreSQL 10 and pgbackrest 2.18 on CentOS 7.6.
I have a few doubts regarding pgbackrest.

a) What is the log rotation policy for pgbackrest logs? I see it logs to the 
default path /var/log/pgbackrest/-.log; what is 
the rotation policy for each log?
Use case: if I am writing a script to parse the logs and gather some 
info, I should be aware of log rotation; if the logs don't rotate, a 
single file could be huge to parse.

Or should I simply use logrotate?


pgBackRest does not have any built-in log rotation policies since this 
is best implemented per OS.  Some packages have logrotate scripts and 
others don't.  RHEL doesn't, but you can see a logrotate example in the 
Debian/Ubuntu package at: 
https://salsa.debian.org/postgresql/pgbackrest/blob/master/debian/pgbackrest.logrotate



b) Since pgbackrest takes physical backups,
what is the impact if I upgrade the PostgreSQL minor version (10.5 to 10.10),
and what is the impact of a major version upgrade (10.10 to 12.X)?


Minor PostgreSQL upgrades require no special action in pgBackRest.  We 
test with each minor upgrade to ensure there are no regressions.  Unless 
you have a specific reason not to, it is always best to be running the 
most recent PostgreSQL minor version.


Major version upgrades will require a pgBackRest stanza-upgrade to be 
run after the PostgreSQL upgrade is complete.  For more information see: 
https://pgbackrest.org/user-guide-centos7.html#upgrade-stanza.


Regards,
--
-David
da...@pgmasters.net




Re: slow insert speeds with bytea

2019-12-02 Thread Adrian Klaver

On 12/1/19 9:59 AM, Alex O'Ree wrote:
Is there anything I can do to increase insert speeds for bytea? Currently 
running Postgres 9.6.15

I have a few tables without a bytea column and a few with one. There is a 
large performance difference between the two for inserts. I'm inserting 
a byte[] that's usually less than 1MB of content. The content itself is 
actually just UTF-8 string data.


For the non-bytea table, inserts can be as high as 40k rows/sec, whereas 
the bytea table is closer to 4k/sec or less.


If this is just a limitation of Postgres, then that's fine, but the 
performance delta is so significant that I feel like I'm missing something.


It would help to have more information:

1) The schema of the table e.g. the output of \d in psql.

2) The actual INSERT query.

3) An EXPLAIN ANALYZE of the INSERT query.
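
For example, items 2 and 3 might look like this for a hypothetical table
docs(id int, content bytea):

EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO docs (id, content)
VALUES (1, decode(repeat('ab', 1000), 'hex'));  -- a 1000-byte bytea value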


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: slow insert speeds with bytea

2019-12-02 Thread Steve Atkins



On 01/12/2019 17:59, Alex O'Ree wrote:
Is there anything I can do to increase insert speeds for bytea? Currently 
running Postgres 9.6.15

I have a few tables without a bytea column and a few with one. There is a 
large performance difference between the two for inserts. I'm 
inserting a byte[] that's usually less than 1MB of content. The 
content itself is actually just UTF-8 string data.


For the non-bytea table, inserts can be as high as 40k rows/sec, 
whereas the bytea table is closer to 4k/sec or less.


4k rows/sec at 1MB each would be 4GB a second. You would need a fairly 
decent IO system to manage that speed, let alone ten times that.


What's the typical row size of the tables with bytea fields vs the other 
tables?


What are your IO stats during the two sorts of insert?

Cheers,
  Steve




Re: using replace function

2019-12-02 Thread Antonio Silva
Hi Osvaldo, Rod, David and Tom

Thanks for your attention.  In the end it is not that difficult.

For me the most natural way is Rod's suggestion: use replace with a WHERE
clause. I don't know why I didn't think of this solution.

The regexp_replace approach is very nice too, and I will take some time to
learn it.
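
For anyone landing here later, minimal sketches of the approaches discussed,
against a hypothetical table t(col text):

-- Rod's suggestion: replace with a WHERE clause
update t set col = 'new value' where col = 'old value';

-- the CASE form from the quoted reply below
select case when col = 'old value' then 'new value' else col end from t;

-- anchored regexp_replace (regexp metacharacters in the pattern would need escaping)
select regexp_replace(col, '^old value$', 'new value') from t;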

Thanks a lot to you all

Antonio

On Fri, Nov 29, 2019 at 01:22, Tom Lane wrote:

> "David G. Johnston"  writes:
> > On Thursday, November 28, 2019, Rob Sargent wrote:
> >> I want to replace a string (not a substring) in a field, but making
> >> sure that the string is the full field.
>
> > I’d probably do something like:
> > Select case when a.col = ‘value’ then ‘new value’ else a.col end from a;
>
> Yeah, this.  You could wrap it up in a SQL function if you want the
> same level of notational convenience as replace().
>
> Another possibility is regexp_replace with an anchored pattern, but
> that would potentially require escaping regexp metacharacters in the
> pattern, so the PITA factor is high.  And I doubt it'd be faster than
> the CASE solution.
>
> regards, tom lane
>


-- 
Antônio Olinto Ávila da Silva
Biólogo / Oceanógrafo
Instituto de Pesca (Fisheries Institute)
São Paulo, Brasil


Performance on upcoming Linux kernel

2019-12-02 Thread stan
Just wanted to point out this article:


https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.5-Early-Performance

Looks like something changed between 5.4 and 5.5 that is degrading
PostgreSQL performance.


-- 
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
-- Benjamin Franklin




Upgrading from V11 to V12 on Debian install

2019-12-02 Thread stan
I have several machines that have version 11 instances on them. These are
all Debian or Ubuntu machines. All of them are pointed at the Postgres
repositories for the Postgres binaries:

deb http://apt.postgresql.org/pub/repos/apt/ buster-pgdg main

dpkg -l shows:

ii  postgresql-11             11.6-1.pgdg100+1
ii  postgresql-12             12.1-1.pgdg100+1
ii  postgresql-client-11      11.6-1.pgdg100+1
ii  postgresql-client-12      12.1-1.pgdg100+1
ii  postgresql-client-common  210.pgdg100+1
ii  postgresql-common         210.pgdg100+1
ii  postgresql-contrib        12+210.pgdg100+1
ii  postgresql-server-dev-11  11.6-1.pgdg100+1


Googling around, I found this link:

https://scottlinux.com/2015/11/14/upgrade-postgresql-on-debian-jessie-with-pg_upgradecluster/

I plan on upgrading one of the test instances right away. Is the
information in this link the best way to do this in a Debian environment? I
did look at the  pg_upgrade documentation, and it looks like I could really
make a mess if I misunderstand even one step.



 
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
-- Benjamin Franklin




Re: Upgrading from V11 to V12 on Debian install

2019-12-02 Thread Adrian Klaver

On 12/2/19 11:28 AM, stan wrote:

I have several machines that have version 11 instances on them. These are
all Debian or Ubuntu machines. All of them are pointed at the Postgres
repositories for the Postgres binaries:

deb http://apt.postgresql.org/pub/repos/apt/ buster-pgdg main

dpkg -l shows:

ii  postgresql-11             11.6-1.pgdg100+1
ii  postgresql-12             12.1-1.pgdg100+1
ii  postgresql-client-11      11.6-1.pgdg100+1
ii  postgresql-client-12      12.1-1.pgdg100+1
ii  postgresql-client-common  210.pgdg100+1
ii  postgresql-common         210.pgdg100+1
ii  postgresql-contrib        12+210.pgdg100+1
ii  postgresql-server-dev-11  11.6-1.pgdg100+1


Googling around, I found this link:

https://scottlinux.com/2015/11/14/upgrade-postgresql-on-debian-jessie-with-pg_upgradecluster/

I plan on upgrading one of the test instances right away. Is the
information in this link the best way to do this in a Debian environment? I
did look at the  pg_upgrade documentation, and it looks like I could really
make a mess if I misunderstand even one step.


Yes, as pointed out previously:

https://www.postgresql.org/message-id/20191107161401.ga24...@hjp.at

For more info:

man pg_upgradecluster





  
"They that would give up essential liberty for temporary safety deserve

neither liberty nor safety."
-- Benjamin Franklin






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Connection terminated but client didn't realise

2019-12-02 Thread David Wheeler
>  Is the application remote from the database server?  My gut reaction to this 
> type of report is "something timed out the network connection", but there 
> would have to be a router or firewall or the like in between to make that a 
> tenable explanation.
>  If that is the issue, you should be able to fix it by making the server's 
> TCP keepalive settings more aggressive.

Yes, the application server is separate from the database server, and the 
application is running within Docker, which I suspect adds some complexity too. 
I had suspicions about something in the middle closing the connection as well, 
but your email has clarified my thinking a bit. 

TCP keepalive appears to be enabled on the application server and within 
Docker, and the client holds the allegedly dead connection for much longer 
(24h) than the keepalive should take to kill it (<3h), so I think the next step 
is to identify the connection at the OS level with netstat to see what 
state it's in. 
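
A complementary server-side check might be (pg_stat_activity is standard;
the one-day cutoff here is only illustrative):

-- list long-lived backends as the server sees them
SELECT pid, client_addr, state, state_change
FROM pg_stat_activity
WHERE backend_start < now() - interval '1 day';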

Thanks for your help. 


Regards, 
 
David

On 2/12/19, 11:17 pm, "Tom Lane"  wrote:

David Wheeler  writes:
> We have a query that our system runs nightly to refresh materialised
> views. This takes some time to execute (~25 minutes) and then it will
> usually return the results to the application and everything is golden.
> However occasionally we see something like the below, where the query
> finishes, but the connection gets unexpectedly closed from Postgres’
> perspective. From the application’s perspective the connection is still
> alive, and it sits there forever waiting for the result.

Is the application remote from the database server?  My gut reaction
to this type of report is "something timed out the network connection",
but there would have to be a router or firewall or the like in between
to make that a tenable explanation.

If that is the issue, you should be able to fix it by making the
server's TCP keepalive settings more aggressive.

regards, tom lane