Re: Could not resolve host name error in psycopg2

2020-04-17 Thread Paul Förster
Hi Adrian,

> On 17. Apr, 2020, at 03:00, Adrian Klaver  wrote:
> 
> Huh? Leaving open connections is not considered a good thing. In other words, 
> a connection should last for as long as it takes to get its task done, and 
> then it should close.

I basically agree on this, but there are two big "but"s:

- recurring monitoring connections flood the logs unless they connect once and 
never disconnect again.

- applications with hundreds or thousands of users may flood the logs, even 
though a pool may be used. If said pool doesn't keep its connections open most 
of the time, you will notice that the database cluster is very busy logging 
connections.

Do you really want that?

Cheers,
Paul



Pgbackrest restore options

2020-04-17 Thread Konireddy Rajashekar
Hi Team,

A doubt regarding the pgBackRest restore options:

Is there any way to ensure that backups are good, as expected? Instead of
restoring the whole cluster or database, is there any other possibility to
verify them, e.g. by restoring a single file or some other such mechanism?
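For example, is the selective restore option the closest thing, or is there
something lighter-weight? I mean something like this (stanza and database
names are just placeholders here):

  pgbackrest --stanza=main --delta --db-include=mydb restore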

Thanks,
Raj


Is there a script and sample data to test the performance of postgresql?

2020-04-17 Thread Kiwoong Choi
Hi,
Is there a script and sample data to test the performance of postgresql?

Regards,
Kiwoong






Re: Using unlogged tables for web sessions

2020-04-17 Thread Steve Atkins



On 16/04/2020 19:39, Stephen Carboni wrote:

Hello.

I was wondering if anyone was using unlogged tables for website
sessions in production. I'm interested if it breaks the prevailing
opinion that you don't put sessions in PG.


I generally put sessions in PostgreSQL, with regular tables, when I'm 
using persistent server-side sessions, rather than just stashing all the 
user data in a signed cookie or local storage.


It ... works fine? It drastically simplifies app deployment to have a 
single point of persistent storage, and having one that you can easily 
interrogate by hand simplifies development and debugging. Reads are a 
single indexed query, writes are typically HOT, so the IO and overhead 
aren't drastically different from any other persistent store.


A lot of webapp development advice is based on "Ah, but how will you 
scale it up to Facebook levels of traffic?" and then skipping over the 
answer "Um, I won't ever need to. And if I did I'd have enough revenue 
to hire someone very, very good to rearchitect my session storage.".
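If you do want to try the unlogged variant, it's a one-word change. A trivial 
sketch, with a hypothetical schema:

  CREATE UNLOGGED TABLE session (
      id         text PRIMARY KEY,
      data       jsonb NOT NULL,
      expires_at timestamptz NOT NULL
  );
  -- UNLOGGED skips WAL, so writes are cheaper, but the table is truncated
  -- after a crash and is not replicated, which is acceptable for sessions.

(Regular tables, as above, are what I actually run.)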


Cheers,
  Steve






Re: Is there a script and sample data to test the performance of postgresql?

2020-04-17 Thread David Rowley
On Fri, 17 Apr 2020 at 20:13, Kiwoong Choi  wrote:
> Is there a script and sample data to test the performance of postgresql?

There is pgbench.  https://www.postgresql.org/docs/current/pgbench.html
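For a quick start, something like this (standard pgbench flags; scale factor
and duration are up to you, and "testdb" is just a placeholder):

  pgbench -i -s 50 testdb          # initialise the built-in test tables
  pgbench -c 10 -j 2 -T 60 testdb  # 10 clients, 2 threads, 60 seconds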

David




easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

2020-04-17 Thread Achilleas Mantzios

Hello Dear List,

we have a table holding email attachments as bytea, and we would like to 
filter out images of small dimensions, which are not of any value to our 
logic.


I took a look at the pg_image extension and tested it, and it proved 
problematic: it killed my FreeBSD box, which had 200+ days of uptime :( . I 
dropped the extension and uninstalled it as soon as fsck finally finished.


So I would like to ask you: we basically have PNGs and JPEGs; is there an 
easy way of parsing their headers and getting info about their dimensions?


I could write a C function for that. For PNG it is quite easy, but for JPEG 
it gets a little bit complicated, albeit doable; I am just asking for 
something out of the box. Currently we load the images (in our Java 
enterprise system) and filter them in Java, but this brings WildFly to its 
knees pretty easily and quickly.
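For PNG, in fact, it might even be doable in plain SQL, since width and 
height sit at fixed big-endian offsets in the IHDR chunk right after the 
8-byte signature. An untested sketch (PNG only, no validation of the 
signature):

  CREATE FUNCTION png_width(img bytea) RETURNS int
  LANGUAGE sql IMMUTABLE STRICT AS $$
    -- bytes 16..19 of a PNG file: IHDR width, big-endian
    SELECT (get_byte(img, 16) << 24) | (get_byte(img, 17) << 16) |
           (get_byte(img, 18) << 8)  |  get_byte(img, 19)
  $$;

  CREATE FUNCTION png_height(img bytea) RETURNS int
  LANGUAGE sql IMMUTABLE STRICT AS $$
    -- bytes 20..23 of a PNG file: IHDR height, big-endian
    SELECT (get_byte(img, 20) << 24) | (get_byte(img, 21) << 16) |
           (get_byte(img, 22) << 8)  |  get_byte(img, 23)
  $$;

JPEG would still need real header walking, of course.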



Thank you and happy Easter.





Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

2020-04-17 Thread Adam Brusselback
Why not extract and store that metadata with the image rather than trying
to extract it to filter on at query time? That way you can index your
height and width columns to speed up that filtering if necessary.
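Something like this, for example (table and column names hypothetical, 
populated by whatever already parses the email):

  ALTER TABLE attachment
    ADD COLUMN img_width  int,
    ADD COLUMN img_height int;
  CREATE INDEX attachment_dims_idx ON attachment (img_width, img_height);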

You may be able to write a wrapper for a command-line tool like ImageMagick
or something, so you can call that from a function to determine the size, if
you did want to stick with extracting that at query time.


Re: performance of first exec of prepared statement

2020-04-17 Thread Ted Toth
On Thu, Apr 16, 2020 at 8:09 PM Rob Sargent  wrote:

> On 4/16/20 6:15 PM, Adrian Klaver wrote:
> > On 4/16/20 4:59 PM, Ted Toth wrote:
> >>
> >>
> >> On Thu, Apr 16, 2020 at 6:29 PM Ted Toth wrote:
> >>
> >> I've noticed that the first exec of an INSERT prepared statement
> >> takes ~5 times longer (I'm using libpq in C and wrapping the calls to
> >> time them) than subsequent execs. Is this the expected behavior, and
> >> if so, is there anything I can do to mitigate this effect?
> >>
> >> Ted
> >>
> >>
> >> For example (in my environment) I'm seeing the prepare take ~10ms,
> >> the first exec take ~30 ms and subsequent execs take ~4 ms.
> >>
> >
> > I don't have an answer. I believe though that to help those that might
> > it would be helpful to show the actual code.
> >
> >
> You expect the subsequent calls to benefit from the cached query parse
> and planning.  What does your query cost without being wrapped in a
> prepared statement (preferably from a cold start)?
>
>
>
I thought that's what the PQprepare call was supposed to do, i.e.
parsing/planning.

It's a bit difficult to get an unprepared query cost since there are a lot
of columns :(
#define INSERT_SQL "INSERT INTO t () VALUES
($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87);"


Replication issue

2020-04-17 Thread Sonam Sharma
I have set up DB replication and added the below parameter in the JDBC file.
The connection is failing and it's throwing an error that the DB doesn't
exist on the secondary server.

jdbc:postgresql://:50001,172.29.195.216:50001/crowd?targetServerType=master&reWriteBatchedInserts=true



.PSQLException: FATAL: database ",172.29.195.216:50001/crowd" does not exist


Can someone help on this issue?


Re: performance of first exec of prepared statement

2020-04-17 Thread Ted Toth
On Fri, Apr 17, 2020 at 8:28 AM Ted Toth  wrote:

>
>
> On Thu, Apr 16, 2020 at 8:09 PM Rob Sargent  wrote:
>
>> On 4/16/20 6:15 PM, Adrian Klaver wrote:
>> > On 4/16/20 4:59 PM, Ted Toth wrote:
>> >>
>> >>
>> >> On Thu, Apr 16, 2020 at 6:29 PM Ted Toth wrote:
>> >>
>> >> I've noticed that the first exec of an INSERT prepared statement
>> >> takes ~5 times longer (I'm using libpq in C and wrapping the calls to
>> >> time them) than subsequent execs. Is this the expected behavior, and
>> >> if so, is there anything I can do to mitigate this effect?
>> >>
>> >> Ted
>> >>
>> >>
>> >> For example (in my environment) I'm seeing the prepare take ~10ms,
>> >> the first exec take ~30 ms and subsequent execs take ~4 ms.
>> >>
>> >
>> > I don't have an answer. I believe though that to help those that might
>> > it would be helpful to show the actual code.
>> >
>> >
>> You expect the subsequent calls to benefit from the cached query parse
>> and planning.  What does your query cost without being wrapped in a
>> prepared statement (preferably from a cold start)?
>>
>>
>>
> I thought that's what the PQprepare call was supposed to do, i.e.
> parsing/planning.
>
> It's a bit difficult to get an unprepared query cost since there are a lot
> of columns :(
> #define INSERT_SQL "INSERT INTO t () VALUES
> ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87);"
>
Ah, prepare does the parsing and execute does the planning.
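At the SQL level, the equivalent sequence would be something like this
(hypothetical one-column table, just to illustrate where the work happens):

  PREPARE ins (int) AS INSERT INTO demo (c1) VALUES ($1);  -- parse/analyze only
  EXECUTE ins(42);  -- the plan is built here, at first execution
  EXECUTE ins(43);  -- later executions reuse the cached statement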

Another related question: my implementation uses strings for all values; is
there a performance benefit to using actual values and specifying their
datatypes?


Re: performance of first exec of prepared statement

2020-04-17 Thread Rob Sargent


> On Apr 17, 2020, at 7:34 AM, Ted Toth  wrote:
> 
> 
> 
> 
>> On Fri, Apr 17, 2020 at 8:28 AM Ted Toth  wrote:
>> 
>> 
>>> On Thu, Apr 16, 2020 at 8:09 PM Rob Sargent  wrote:
>>> On 4/16/20 6:15 PM, Adrian Klaver wrote:
>>> > On 4/16/20 4:59 PM, Ted Toth wrote:
>>> >>
>>> >>
>>> >> On Thu, Apr 16, 2020 at 6:29 PM Ted Toth wrote:
>>> >>
>>> >> I've noticed that the first exec of an INSERT prepared statement
>>> >> takes ~5 times longer (I'm using libpq in C and wrapping the calls to
>>> >> time them) than subsequent execs. Is this the expected behavior, and
>>> >> if so, is there anything I can do to mitigate this effect?
>>> >>
>>> >> Ted
>>> >>
>>> >>
>>> >> For example (in my environment) I'm seeing the prepare take ~10ms, 
>>> >> the first exec take ~30 ms and subsequent execs take ~4 ms.
>>> >>
>>> >
>>> > I don't have an answer. I believe though that to help those that might 
>>> > it would be helpful to show the actual code.
>>> >
>>> >
>>> You expect the subsequent calls to benefit from the cached query parse 
>>> and planning.  What does your query cost without being wrapped in a 
>>> prepared statement (preferably from a cold start)?
>>> 
>>> 
>>> 
>> I thought that's what the PQprepare call was supposed to do, i.e. 
>> parsing/planning.
>> 
>> It's a bit difficult to get an unprepared query cost since there are a lot 
>> of columns :(
>> #define INSERT_SQL "INSERT INTO t () VALUES 
>> ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87);"
>>  
>> 
> Ah, prepare does the parsing and execute does the planning.
> 
> Another related question: my implementation uses strings for all values; is 
> there a performance benefit to using actual values and specifying their 
> datatypes?

The cost in this example is 87-N toString() calls on the client plus 87-N 
valueOf(String) calls on the server, where N is the number of textual columns.


Re: Replication issue

2020-04-17 Thread Adrian Klaver

On 4/17/20 6:31 AM, Sonam Sharma wrote:
I have set up DB replication and added the below parameter in the JDBC file. 
The connection is failing and it's throwing an error that the DB doesn't 
exist on the secondary server.


jdbc:postgresql://:50001,172.29.195.216:50001/crowd?targetServerType=master&reWriteBatchedInserts=true 



.PSQLException: FATAL: database ",172.29.195.216:50001/crowd" does not exist



Can someone help on this issue?



Set up replication using what?

The URL above does not look correct to me. Pretty sure the ',' does not 
belong there.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Could not resolve host name error in psycopg2

2020-04-17 Thread Adrian Klaver

On 4/17/20 12:02 AM, Paul Förster wrote:

Hi Adrian,


On 17. Apr, 2020, at 03:00, Adrian Klaver  wrote:

Huh? Leaving open connections is not considered a good thing. In other words, 
a connection should last for as long as it takes to get its task done, and 
then it should close.


I basically agree on this, but there are two big "but"s:

- recurring monitoring connections flood the logs unless they connect once and 
never disconnect again.

- applications with hundreds or thousands of users may flood the logs, even 
though a pool may be used. If said pool doesn't keep its connections open most 
of the time, you will notice that the database cluster is very busy logging 
connections.


But most pools can grow and shrink in response to demand, so at some 
point there are connect/disconnect cycles.




Do you really want that?


No. The issue at hand though was the idea that an application (Django in 
this case) would open a connection once and never reconnect. That is 
unrealistic.




Cheers,
Paul




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

2020-04-17 Thread Achilleas Mantzios

On 17/4/20 4:09 p.m., Adam Brusselback wrote:

Why not extract and store that metadata with the image rather than 
trying to extract it to filter on at query time? That way you can 
index your height and width columns to speed up that filtering if 
necessary.


Yes, I thought of that, but those are coming automatically from our mail 
server (via synonym); we have written an alias: a program that parses 
and stores emails. This is generic, and I wouldn't like to add specific 
code (or specific columns) just for image attachments. However, I dig the 
idea of the indexes.
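For example, with an immutable helper such as the png_width(bytea) sketch 
from earlier in the thread, an expression index could avoid adding columns 
altogether (table and column names hypothetical):

  CREATE INDEX attachment_png_width_idx ON attachment (png_width(data));
  -- queries of the form ... WHERE png_width(data) >= 100 can then use it
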
You may be able to write a wrapper for a command-line tool like 
ImageMagick or something, so you can call that from a function to 
determine the size, if you did want to stick with extracting that at 
query time.
As I described above, those attachments exist nowhere as files. They are 
email attachments. Also, we have about half a TB of them.


Re: Replication issue

2020-04-17 Thread Julien Rouhaud
On Fri, Apr 17, 2020 at 4:02 PM Adrian Klaver  wrote:
>
> On 4/17/20 6:31 AM, Sonam Sharma wrote:
> > I have set up DB replication and added the below parameter in the JDBC
> > file. The connection is failing and it's throwing an error that the DB
> > doesn't exist on the secondary server.
> >
> > jdbc:postgresql://:50001,172.29.195.216:50001/crowd?targetServerType=master&reWriteBatchedInserts=true
> >
> > .PSQLException: FATAL: database ",172.29.195.216:50001/crowd" does not exist
> >
> >
> > Can someone help on this issue?
> >
>
> Set up replication using what?
>
> The URL above does not look correct to me. Pretty sure the ',' does not
> belong there.

This was probably intended to be the "connection failover" feature
(https://jdbc.postgresql.org/documentation/head/connect.html).  I'm
wondering if the issue is just that the driver is too old to support
that, but I have no idea when this was introduced in jdbc.
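For reference, the documented multi-host form looks like this (the first host 
below is a placeholder, since it appears to be empty in the URL above, which 
could itself be the problem):

  jdbc:postgresql://primaryhost:50001,172.29.195.216:50001/crowd?targetServerType=master&reWriteBatchedInserts=true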




Re: Could not resolve host name error in psycopg2

2020-04-17 Thread Paul Förster
Hi Adrian,

> On 17. Apr, 2020, at 16:10, Adrian Klaver  wrote:
> 
> But most pools can grow and shrink in response to demand, so at some point 
> there are connect/disconnect cycles.

yes, but there is a difference between seeing occasional growing and shrinking 
pool behavior and having the logs flooded with connect/disconnect messages from 
short client connects. A relatively small number of connect/disconnect messages 
coming from a pool is mostly acceptable, while huge numbers are not.

And as for monitoring applications, I would be a big fan of a parameter like 
log_exclude_users='user1,user2,...' to list usernames that should not appear 
in the log files.
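As far as I know, the only existing knobs are cluster-wide, i.e. in 
postgresql.conf:

  log_connections = off
  log_disconnections = off

which is all or nothing, hence the wish.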

Cheers,
Paul



Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

2020-04-17 Thread Steve Atkins



On 17/04/2020 13:37, Achilleas Mantzios wrote:

Hello Dear List,

we have a table holding email attachments as bytea, and we would like 
to filter out images of small dimensions, which are not of any value 
to our logic.


I took a look at the pg_image extension and tested it, and it proved 
problematic: it killed my FreeBSD box, which had 200+ days of uptime :( . I 
dropped the extension and uninstalled it as soon as fsck finally finished.


If running an extension crashed your server, you should look at how and 
why, especially if it corrupted your filesystem.


That shouldn't happen on a correctly configured system, so the 
underlying issue might cause you other problems. Crashing PostgreSQL, 
sure, but not anything that impacts the rest of the server.


Cheers,
  Steve





Re: Replication issue

2020-04-17 Thread Sonam Sharma
Yes, you are right. It's for connection failover from the application end. I
have set up DB replication using repmgr. Version: 9.5



On Fri, Apr 17, 2020, 8:15 PM Julien Rouhaud  wrote:

> On Fri, Apr 17, 2020 at 4:02 PM Adrian Klaver 
> wrote:
> >
> > On 4/17/20 6:31 AM, Sonam Sharma wrote:
> > > I have set up DB replication and added the below parameter in the JDBC
> > > file. The connection is failing and it's throwing an error that the DB
> > > doesn't exist on the secondary server.
> > >
> > > jdbc:postgresql://:50001,172.29.195.216:50001/crowd?targetServerType=master&reWriteBatchedInserts=true
> > >
> > > .PSQLException: FATAL: database ",172.29.195.216:50001/crowd" does not
> > > exist
> > >
> > >
> > > Can someone help on this issue?
> > >
> >
> > Set up replication using what?
> >
> > The URL above does not look correct to me. Pretty sure the ',' does not
> > belong there.
>
> This was probably intended to be the "connection failover" feature
> (https://jdbc.postgresql.org/documentation/head/connect.html).  I'm
> wondering if the issue is just that the driver is too old to support
> that, but I have no idea when this was introduced in jdbc.
>


Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

2020-04-17 Thread Achilleas Mantzios

On 17/4/20 5:47 p.m., Steve Atkins wrote:



If running an extension crashed your server, you should look at how and 
why, especially if it corrupted your filesystem.
That shouldn't happen on a correctly configured system, so the 
underlying issue might cause you other problems. Crashing PostgreSQL, 
sure, but not anything that impacts the rest of the server.


Hello. This machine runs several extensions with no issues (even pljava, 
for Christ's sake, our heavily modified version of DBMirror, and lots of 
our own C functions, among others), two bhyve VMs running Ubuntu, and 
one jail. Plus it functions as my workstation as well (WildFly, Eclipse, 
etc.). And it can run for years without a reboot.


Apparently lousy memory management by pg_image (it consumed all 32 GB of RAM 
plus 8 GB of swap) didn't crash PostgreSQL but brought the system to its 
knees. Plus, this extension was last touched in 2013, go figure.











Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

2020-04-17 Thread Imre Samu
> it killed my 200+ days uptime FreeBSD box :( .
> As I described above, those attachments exist nowhere as files.
> They are email attachments. Also, we have about half a TB of them.

Is it possible that some image is a "decompression bomb"?

*"Because of the efficient compression method used in Portable Network
Graphics (PNG) files, a small PNG file can expand tremendously, acting as a
"decompression bomb". Malformed PNG chunks can consume a large amount of
CPU and wall-clock time and large amounts of memory, up to all memory
available on a system, causing a Denial of Service (DoS). Libpng-1.4.1 has
been revised to use less CPU time and memory, and provides functions that
applications can use to further defend against such files."*
https://libpng.sourceforge.io/decompression_bombs.html
https://stackoverflow.com/questions/33602308/how-to-check-png-file-if-its-a-decompression-bomb

Regards,
 Imre




Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

2020-04-17 Thread Achilleas Mantzios


On 17/4/20 6:14 p.m., Imre Samu wrote:

> it killed my 200+ days uptime FreeBSD box :( .
> As I described above, those attachments exist nowhere as files.
> They are email attachments. Also, we have about half a TB of them.

Is it possible that some image is a "decompression bomb"?

/"Because of the efficient compression method used in Portable Network 
Graphics (PNG) files, a small PNG file can expand tremendously, acting 
as a "decompression bomb". *Malformed PNG chunks can consume a large 
amount of CPU and wall-clock time and large amounts of memory, up to 
all memory available on a system, causing a Denial of Service (DoS).* 
Libpng-1.4.1 has been revised to use less CPU time and memory, and 
provides functions that applications can use to further defend against 
such files."/

https://libpng.sourceforge.io/decompression_bombs.html
https://stackoverflow.com/questions/33602308/how-to-check-png-file-if-its-a-decompression-bomb

Regards,
 Imre



Thank you a lot, Imre. Great info.








Re: Using unlogged tables for web sessions

2020-04-17 Thread John DeSoi



> On Apr 16, 2020, at 1:39 PM, Stephen Carboni  
> wrote:
> 
> I was wondering if anyone was using unlogged tables for website
> sessions in production. I'm interested if it breaks the prevailing
> opinion that you don't put sessions in PG.

Popular CMS systems like Drupal use the database for web sessions, though 
probably only logged tables unless you did some extra customization of the 
installation. I'm not aware of any issues with it, performance or otherwise.

John DeSoi, Ph.D.