Re: Failing to allocate memory when I think it shouldn't

2024-09-17 Thread Thomas Ziegler

Hello Christoph,

Thanks for your answer and the suggestions, it already helped me out a lot!

On 2024-09-14 22:11, Christoph Moench-Tegeder wrote:

Hi,

## Thomas Ziegler (thomas.zieg...@holmsecurity.com):

There's a lot of information missing here. Let's start from the top.


I have had my database killed by the kernel oom-killer. After that I
turned off memory over-committing, and that is where things got weird.

What exactly did you set? When playing with vm.overcommit, did you
understand "Committed Address Space" and the workings of the
overcommit accounting? This is the document:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/mm/overcommit-accounting.rst
Hint: when setting overcommit_memory=2 you might end up with way
less available address space than you thought you would. Also keep
an eye on /proc/meminfo - it's sometimes hard to estimate off the
cuff what's in memory and how it's mapped. (Also, is there anything
else on that machine which might hog memory?)


I set overcommit_memory=2, but completely missed 'overcommit_ratio'. 
That is most probably why the database got denied the RAM a lot sooner 
than I expected.
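
(For anyone following along: as far as I understand the kernel's
overcommit-accounting document, with overcommit_memory=2 the limit is
roughly

CommitLimit = (MemTotal - huge pages) * overcommit_ratio / 100 + SwapTotal

so with the default overcommit_ratio=50 and little or no swap, only about
half of physical RAM can be committed - which would explain allocations
failing far earlier than I expected.)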



Finally, there's this:

2024-09-12 05:18:36.073 UTC [1932776] LOG:  background worker "parallel worker" 
(PID 3808076) exited with exit code 1
terminate called after throwing an instance of 'std::bad_alloc'
   what():  std::bad_alloc
2024-09-12 05:18:36.083 UTC [1932776] LOG:  background worker "parallel worker" 
(PID 3808077) was terminated by signal 6: Aborted

That "std::bad_alloc" sounds a lot like C++ and not like the C our
database is written in. My first suspicion would be that you're using
LLVM-JIT (unless you have other - maybe even your own - C++ extensions
in the database?) and that in itself can use a good chunk of memory.
And it looks like that exception bubbled up as a signal 6 (SIGABRT)
which made the process terminate immediately without any cleanup,
and after that the server has no other chance than to crash-restart.


Except for pgAudit, I don't have any extensions, so it is probably the
JIT. I had no idea there was a JIT, even though it should have been obvious.
Thanks for pointing this out!


Is the memory the JIT takes limited by 'work_mem' or will it just take 
as much memory as it needs?



I recommend starting with understanding the actual memory limits
as set by your configuration (personally I believe that memory
overcommit is less evil than some people think). Have a close look
at /proc/meminfo and if possible disable JIT and check if it changes
anything. Also if possible try starting with only a few active
connections and increase load carefully once a steady state (in
terms of memory usage) has been reached.


Yes, understanding the memory limits is what I was trying to do.
I was questioning my understanding, but it seems it was Linux that
tripped me up - or rather my lack of understanding there - not the
database. Memory management and /proc/meminfo still manage to confuse me.

Again, thanks for your help!

Cheers,
Thomas

p.s.: To anybody who stumbles upon this in the future,
if you set `overcommit_memory=2`, don't forget `overcommit_ratio`.





Re: update faster way

2024-09-17 Thread Alvaro Herrera
On 2024-Sep-14, yudhi s wrote:

> Hello,
> We have to update a column value (from numbers like '123' to codes like
> 'abc', by looking into a reference table) in a partitioned table with
> billions of rows in it, each partition having hundreds of millions of rows.

Another option is to not update anything, and instead create a view on
top of the partitioned table (joined to the reference table) that
returns the reference value instead of the original number value from
the column; when the application wants to receive those reference
values, it queries the view instead of the partitioned table directly.
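
A minimal sketch, with hypothetical names (big_table is the partitioned
table with a numeric column col_num, and ref_codes(num, code) is the
reference table):

CREATE VIEW big_table_codes AS
SELECT t.*,                    -- all original columns
       r.code AS col_code      -- the reference value instead of the number
FROM   big_table t
       JOIN ref_codes r ON r.num = t.col_num;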

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)




question on plain pg_dump file usage

2024-09-17 Thread Zwettler Markus (OIZ)
I have to do an out-of-place Postgres migration from PG12 to PG16 using:

pg_dump -F p -f dump.sql ...
sed -i "s/old_name/new_name/g"
psql -f dump.sql ...

Both databases are on UTF-8.

I wonder if there could be character set conversion errors here, as the data is 
temporarily written to a plain text file.

Thanks, Markus



Re: load fom csv

2024-09-17 Thread Andy Hartman
I have bad data in an Int field...

 Thanks for your help.

On Tue, Sep 17, 2024 at 1:55 AM Muhammad Usman Khan 
wrote:

> Hi,
> Try the following options:
>
>
>- Check if psql is working independently:
>psql -h $pgServer -d $pgDatabase -U $pgUser -c "SELECT 1;"
>- Check for permission issues on the CSV file
>- Run the command manually without variables
>psql -h your_host -d your_db -U your_user -c "\COPY your_table FROM
>'C:/path/to/your/file.csv' DELIMITER ',' CSV HEADER;"
>- set a timeout using the PGCONNECT_TIMEOUT environment variable:
>$env:PGCONNECT_TIMEOUT=30
>
>
>
> On Mon, 16 Sept 2024 at 20:35, Andy Hartman 
> wrote:
>
>> I'm trying to run this piece of code from Powershell and it just sits
>> there and never comes back. There are only 131 records in the csv.
>>
>> $connectionString =
>> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
>> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>>
>> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>>
>>
>> how can I debug this?
>>
>> Table layout
>>
>> [image: image.png]
>>
>


Re: question on plain pg_dump file usage

2024-09-17 Thread Ron Johnson
On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ) <
markus.zwett...@zuerich.ch> wrote:

> I have to do an out-of-place Postgres migration from PG12 to PG16 using:
>
>
>
> pg_dump -F p -f dump.sql …
>
> sed -i "s/old_name/new_name/g"
>
> psql -f dump.sql …
>
>
>
> Both databases are on UTF-8.
>
>
>
> I wonder if there could be character set conversion errors here, as the
> data is temporarily written to a plain text file.
>
>
Why must it be a plain text dump instead of a custom or directory dump?
Restoring to a new (and differently-named) database is perfectly doable.



Re: IO related waits

2024-09-17 Thread Greg Sabino Mullane
On Mon, Sep 16, 2024 at 11:56 PM veem v  wrote:

> So what can be the caveats in this approach, considering transactions
> meant to be ACID compliant as financial transactions.
>

Financial transactions need to be handled with care. Only you know your
business requirements, but as Christophe pointed out, disabling
synchronous commit means your application may think a particular
transaction has completed when it has not. Usually that's a big red flag
for financial applications.

we are using row by row transaction processing for inserting data into the
> postgres database and commit is performed for each row.


This is a better place to optimize. Batch many rows per transaction. Remove
unused indexes.
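
For example, instead of one INSERT and one COMMIT per row, something along
these lines (hypothetical table and columns):

BEGIN;
INSERT INTO payments (id, amount) VALUES
  (1, 10.00),
  (2, 20.00),
  (3, 30.00);          -- many rows per statement
-- ... more multi-row INSERTs ...
COMMIT;                -- one commit for the whole batch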

flushing of the WAL to the disk has to happen anyway(just that it will be
> delayed now), so can this method cause contention in the database storage
> side if the speed in which the data gets ingested from the client is not
> getting written to the disk , and if it can someway impact the data
> consistency for the read queries?
>

Not quite clear what you are asking here re data consistency. The data will
always be consistent, even if synchronous_commit is disabled. The only
danger window is on a server crash.

(Keep in mind that RDS is not Postgres, so take tuning recommendations and
advice with a grain of salt.)

Cheers,
Greg


WG: [Extern] Re: question on plain pg_dump file usage

2024-09-17 Thread Zwettler Markus (OIZ)

From: Ron Johnson
Sent: Tuesday, September 17, 2024 14:44
To: PG-General Mailing List
Subject: [Extern] Re: question on plain pg_dump file usage

On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ)
<markus.zwett...@zuerich.ch> wrote:
I have to do an out-of-place Postgres migration from PG12 to PG16 using:

pg_dump -F p -f dump.sql …
sed -i "s/old_name/new_name/g"
psql -f dump.sql …

Both databases are on UTF-8.

I wonder if there could be character set conversion errors here, as the data is 
temporarily written to a plain text file.

Why must it be a plain text dump instead of a custom or directory dump?  
Restoring to a new (and differently-named) database is perfectly doable.



Because that way I can simply switch the application from the old structure
to the new one.

-Regards, Markus


Re: question on plain pg_dump file usage

2024-09-17 Thread Greg Sabino Mullane
On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ) <
markus.zwett...@zuerich.ch> wrote:

> pg_dump -F p -f dump.sql …
>
> sed -i "s/old_name/new_name/g"
>
> psql -f dump.sql …
>

Why not rename afterwards? Just "pg_dump mydb | psql -h newhost -f -" and
rename things via ALTER. Certainly much safer than a global replace via sed.
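
Depending on what old_name/new_name actually refers to (a database or a
schema, I am guessing), the rename afterwards can be a single statement,
e.g.:

ALTER DATABASE old_name RENAME TO new_name;  -- run while connected to another DB
-- or, if it is a schema:
ALTER SCHEMA old_name RENAME TO new_name;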

Cheers,
Greg


WG: question on plain pg_dump file usage

2024-09-17 Thread Zwettler Markus (OIZ)

From: Zwettler Markus (OIZ)
Sent: Tuesday, September 17, 2024 14:22
To: PG-General Mailing List
Subject: question on plain pg_dump file usage

I have to do an out-of-place Postgres migration from PG12 to PG16 using:

pg_dump -F p -f dump.sql ...
sed -i "s/old_name/new_name/g"
psql -f dump.sql ...

Both databases are on UTF-8.

I wonder if there could be character set conversion errors here, as the data is 
temporarily written to a plain text file.

Thanks, Markus




I found that the temporary plain text file is also encoded in UTF-8.

So there should not be any problem. Agreed?

$ file -bi dump.sql
text/plain; charset=utf-8



Re: question on plain pg_dump file usage

2024-09-17 Thread Tom Lane
"Zwettler Markus (OIZ)"  writes:
> I have to do an out-of-place Postgres migration from PG12 to PG16 using:
> pg_dump -F p -f dump.sql ...
> sed -i "s/old_name/new_name/g"
> psql -f dump.sql ...

> Both databases are on UTF-8.

> I wonder if there could be character set conversion errors here, as the data 
> is temporarily written to a plain text file.

The dump script will include a "SET client_encoding" command to
prevent that.
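
The preamble of a plain-format dump looks roughly like this (abridged;
exact lines vary by version):

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;

so psql restores the data in the dump's own encoding, regardless of
whatever client encoding would otherwise be in effect.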

regards, tom lane




Re: question on plain pg_dump file usage

2024-09-17 Thread Adrian Klaver

On 9/17/24 05:43, Ron Johnson wrote:
On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ)
<markus.zwett...@zuerich.ch> wrote:





Why must it be a plain text dump instead of a custom or directory dump?  
Restoring to a new (and differently-named) database is perfectly doable.


Because of this:

sed -i "s/old_name/new_name/g"





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-17 Thread Adrian Klaver

On 9/17/24 08:13, Andy Hartman wrote:

Still when I try to run from my powershell script it hangs...



And the Postgres log shows?

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-17 Thread Andy Hartman
I don't see anything in LOG

On Tue, Sep 17, 2024 at 11:23 AM Adrian Klaver 
wrote:

> On 9/17/24 08:13, Andy Hartman wrote:
> > Still when I try to run from my powershell script it hangs...
> >
>
> And the Postgres log shows?
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: load fom csv

2024-09-17 Thread Adrian Klaver

On 9/17/24 08:35, Andy Hartman wrote:

I don't see anything in LOG


1) Are you logging connections/disconnections per:

https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT

If not, do so, as it will show you whether a connection is being attempted
(see the example after point 2).

2) Do the commands work when run in psql or supplied directly to psql 
outside of PowerShell?
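
For point 1, a minimal sketch of turning that on without a restart:

ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();   -- takes effect for new connections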





On Tue, Sep 17, 2024 at 11:23 AM Adrian Klaver
<adrian.kla...@aklaver.com> wrote:


On 9/17/24 08:13, Andy Hartman wrote:
 > Still when I try to run from my powershell script it hangs...
 >

And the Postgres log shows?

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: IO related waits

2024-09-17 Thread veem v
On Tue, 17 Sept 2024 at 18:43, Greg Sabino Mullane 
wrote:

>
> This is a better place to optimize. Batch many rows per transaction.
> Remove unused indexes.
>
> flushing of the WAL to the disk has to happen anyway(just that it will be
>> delayed now), so can this method cause contention in the database storage
>> side if the speed in which the data gets ingested from the client is not
>> getting written to the disk , and if it can someway impact the data
>> consistency for the read queries?
>>
>
> Not quite clear what you are asking here re data consistency. The data
> will always be consistent, even if synchronous_commit is disabled. The only
> danger window is on a server crash.
>
> (Keep in mind that RDS is not Postgres, so take tuning recommendations and
> advice with a grain of salt.)
>
>
>
Thank you Greg.

Yes, our Java application was doing row-by-row commits. We saw that in
pg_stat_database, where the "xact_commit" column was close to the sum of the
tup_inserted, tup_updated and tup_deleted columns. We also verified in
pg_stat_statements that the "calls" column matches the "rows" column for the
INSERT queries, so we are inserting exactly as many rows as we make DB calls,
which also suggests that we are doing row-by-row operations/DMLs.
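
The check was roughly the following (a sketch; it assumes the
pg_stat_statements extension is installed):

SELECT calls,
       rows,
       round(rows::numeric / calls, 2) AS rows_per_call,  -- ~1 means row by row
       left(query, 60) AS query
FROM   pg_stat_statements
WHERE  query ILIKE 'insert%'
ORDER  BY calls DESC;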

We then asked the application team to make the inserts in batches, but the
figures in those two views are not changing much: "xact_commit" stays almost
the same, and the "calls" and "rows" columns in pg_stat_statements also stay
almost the same. Does that mean the application code is somehow still doing
the same row-by-row processing, or are we doing something wrong in the above
analysis?

Another thing we noticed: even after the data load finished, "xact_commit"
kept increasing along with "tup_fetched". Does that mean some kind of
implicit commit happens even for fetch-type queries, which should be SELECTs
I believe? Also, not sure if this is expected, but in Postgres I have seen
that unless we put code inside a begin/end block, it gets committed by
default even when just run on the console; it doesn't ask for an explicit
commit/rollback, so I'm not sure if that is somehow playing a role here.

Regards
Veem





Re: IO related waits

2024-09-17 Thread Adrian Klaver

On 9/16/24 20:55, veem v wrote:



On Tue, 17 Sept 2024 at 03:41, Adrian Klaver wrote:



Are you referring to this?:


https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/
 


If not then you will need to be more specific.


Yes, I was referring to this one. So what can be the caveats in this 
approach, considering transactions meant to be ACID compliant as 
financial transactions.Additionally I was not aware of the parameter 
"synchronous_commit" in DB side which will mimic the synchronous commit.


Would both of these mimic the same asynchronous behaviour and achieves 
the same, which means the client data load throughput will increase 
because the DB will not wait for those data to be written to the WAL and 
give a confirmation back to the client and also the client will not wait 
for the DB to give a confirmation back on the data to be persisted in 
the DB or not?. Also, as in the backend the flushing of the WAL to the 
disk has to happen anyway(just that it will be delayed now), so can this 
method cause contention in the database storage side if the speed in 
which the data gets ingested from the client is not getting written to 
the disk , and if it can someway impact the data consistency for the 
read queries?


This is not something that I am that familiar with. I suspect though 
this is more complicated than you think. From the link above:


" Prerequisites #

As illustrated in the section above, implementing proper asynchronous 
I/O to a database (or key/value store) requires a client to that 
database that supports asynchronous requests. Many popular databases 
offer such a client.


In the absence of such a client, one can try and turn a synchronous 
client into a limited concurrent client by creating multiple clients and 
handling the synchronous calls with a thread pool. However, this 
approach is usually less efficient than a proper asynchronous client.

"

Which means you need to on Flink end:

1) Use Flink async I/O .

2) Find a client that supports async or fake it by using multiple 
synchronous clients.


On Postgres end there is this:

https://www.postgresql.org/docs/current/wal-async-commit.html

That will return a success signal to the client quicker if 
synchronous_commit is set to off. Though the point of the Flink async 
I/O is not to wait for the response before moving on, so I am not sure 
how much synchronous_commit = off would help.
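
If you do experiment with it, note that it can be scoped per session or
even per transaction rather than set globally - a sketch:

SET synchronous_commit = off;        -- this session only
-- or limited to a single transaction:
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... INSERTs ...
COMMIT;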


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-17 Thread Andy Hartman
The command works outside of PowerShell, yes.

On Tue, Sep 17, 2024 at 11:39 AM Adrian Klaver 
wrote:

> On 9/17/24 08:35, Andy Hartman wrote:
> > I don't see anything in LOG
>
> 1) Are you logging connections/disconnection per?:
>
>
> https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
>
> If not do so as it will show you if a connection is being attempted.
>
> 2) Do the commands work when run in psql or supplied directly to psql
> outside of PowerShell?
>
>
> >
> > On Tue, Sep 17, 2024 at 11:23 AM Adrian Klaver
> > <adrian.kla...@aklaver.com> wrote:
> >
> > On 9/17/24 08:13, Andy Hartman wrote:
> >  > Still when I try to run from my powershell script it hangs...
> >  >
> >
> > And the Postgres log shows?
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com 
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: load fom csv

2024-09-17 Thread Adrian Klaver

On 9/17/24 09:21, Andy Hartman wrote:

The command work outside of powershell  yes


Then you are going to need to use whatever debugging tools PowerShell 
has available to step through the script to figure out where the problem is.



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-17 Thread Andy Hartman
Still when I try to run from my powershell script it hangs...

On Tue, Sep 17, 2024 at 8:31 AM Andy Hartman 
wrote:

> I have bad data in an Int field...
>
>  Thanks for your help.
>
> On Tue, Sep 17, 2024 at 1:55 AM Muhammad Usman Khan 
> wrote:
>
>> Hi,
>> Try the following options:
>>
>>
>>- Check if psql is working independently:
>>psql -h $pgServer -d $pgDatabase -U $pgUser -c "SELECT 1;"
>>- Check for permission issues on the CSV file
>>- Run the command manually without variables
>>psql -h your_host -d your_db -U your_user -c "\COPY your_table FROM
>>'C:/path/to/your/file.csv' DELIMITER ',' CSV HEADER;"
>>- set a timeout using the PGCONNECT_TIMEOUT environment variable:
>>$env:PGCONNECT_TIMEOUT=30
>>
>>
>>
>> On Mon, 16 Sept 2024 at 20:35, Andy Hartman 
>> wrote:
>>
>>> I'm trying to run this piece of code from Powershell and it just sits
>>> there and never comes back. There are only 131 records in the csv.
>>>
>>> $connectionString =
>>> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
>>> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>>>
>>> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>>>
>>>
>>> how can I debug this?
>>>
>>> Table layout
>>>
>>> [image: image.png]
>>>
>>


Re: Failing to allocate memory when I think it shouldn't

2024-09-17 Thread Christoph Moench-Tegeder
Hi,

## Thomas Ziegler (thomas.zieg...@holmsecurity.com):

> Except for pgAudit, I don't have any extensions, so it is probably the
> JIT. I had no idea there was a JIT, even it should have been obvious.
> Thanks for pointing this out!

There is - it even has its own chapter in the documentation:
https://www.postgresql.org/docs/current/jit.html
Most importantly, you can disable JIT per session ("SET jit=off")
or globally in the configuration file (jit=off, reload is
sufficient) or with any of the other usual configuration mechanisms.
If that fixes your problem, congratulations (and the problem is
somewhere down between bytecode generation and what and how llvm
(in its particular version) generates from that).
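
Concretely, something like:

SET jit = off;                  -- this session only
-- or cluster-wide, no restart required:
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();
SHOW jit;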

> Is the memory the JIT takes limited by 'work_mem' or will it just take
> as much memory as it needs?

The latter.

Regards,
Christoph

-- 
Spare Space




Re: IO related waits

2024-09-17 Thread veem v
On Tue, 17 Sept 2024 at 21:24, Adrian Klaver 
wrote:

>
> Which means you need to on Flink end:
>
> 1) Use Flink async I/O .
>
> 2) Find a client that supports async or fake it by using multiple
> synchronous clients.
>
> On Postgres end there is this:
>
> https://www.postgresql.org/docs/current/wal-async-commit.html
>
> That will return a success signal to the client quicker if
> synchronous_commit is set to off. Though the point of the Flink async
> I/O is not to wait for the response before moving on, so I am not sure
> how much synchronous_commit = off would help.
>
>
Got it. So their suggestion was to set the async I/O at the Flink level but
not at the DB level, so that the application will not wait for the commit
response from the database. But in that case, won't it overload the DB with
more and more requests, if the database keeps committing (with
synchronous_commit=ON) and waiting for its storage to confirm that the WAL
has been flushed to disk, while the application does not wait for the
response to those inserts and keeps flooding the database with more and more
incoming INSERT requests?

Additionally, as I mentioned before, we see in "pg_stat_database" that the
"xact_commit" column almost matches the sum of the "tup_inserted",
"tup_updated" and "tup_deleted" columns. We also verified in
pg_stat_statements that the "calls" column equals the "rows" column for the
INSERT queries, so we are inserting exactly as many rows as we make DB calls;
doesn't that suggest we are doing row-by-row operations/DMLs?

Also, after seeing the above and asking the application team to do batch
commits, we are still seeing similar figures in pg_stat_database and
pg_stat_statements. Does that mean we are looking at the wrong stats, or
that the application code change has not been done correctly? And even when
no inserts are running from the application side, we see "xact_commit" keep
increasing along with "tup_fetched" - why is that?

Finally, we see that in Postgres, even if we just run a single DML
statement, it gets committed by default unless we explicitly put it in a
"begin ... end" block. Can that be the difference in how a "commit" gets
handled in Postgres vs other databases?


Re: load fom csv

2024-09-17 Thread Rob Sargent



> On Sep 17, 2024, at 12:25 PM, Adrian Klaver  wrote:
> 
> On 9/17/24 09:21, Andy Hartman wrote:
>> The command work outside of powershell  yes
> 
> Then you are going to need to use whatever debugging tools PowerShell has 
> available to step through the script to figure out where the problem is.
> 
> 

Visual Studio can run/debug .ps1 files. I am not at my desk, but I have done
ps1 one-liner queries against MSSQL.

Suggest echoing ALL vars used in the psql command.

> 
> 




Re: load fom csv

2024-09-17 Thread Andy Hartman
I'll echo vars and see if something looks strange.

Thanks.

On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent  wrote:

>
>
> > On Sep 17, 2024, at 12:25 PM, Adrian Klaver 
> wrote:
> >
> > On 9/17/24 09:21, Andy Hartman wrote:
> >> The command work outside of powershell  yes
> >
> > Then you are going to need to use whatever debugging tools PowerShell
> has available to step through the script to figure out where the problem is.
> >
> >
>
> Visual Studio can run/debug PS 1 files. I am not at my desk but have done
> ps1 oneliner queries against mssql
>
> Suggest echoing ALL vars used in psql command
>
> >
> >
>


Re: IO related waits

2024-09-17 Thread Adrian Klaver

On 9/17/24 12:34, veem v wrote:


On Tue, 17 Sept 2024 at 21:24, Adrian Klaver wrote:



Which means you need to on Flink end:

1) Use Flink async I/O .

2) Find a client that supports async or fake it by using multiple
synchronous clients.

On Postgres end there is this:

https://www.postgresql.org/docs/current/wal-async-commit.html


That will return a success signal to the client quicker if
synchronous_commit is set to off. Though the point of the Flink async
I/O is not to wait for the response before moving on, so I am not sure
how much synchronous_commit = off would help.


  Got it. So it means their suggestion was to set the asynch_io at flink 
level but not DB level, so that the application will not wait for the 
commit response from the database. But in that case , won't it overload 
the DB with more and more requests if database will keep doing the 
commit ( with synchronous_commit=ON)  and waiting for getting the 
response back from its storage for the WAL's to be flushed to the disk, 
while the application will not wait for its response back(for those 
inserts) and keep flooding the database with more and more incoming 
Insert requests?


My point is this is a multi-layer cake with layers:

1) Flink async I/O

2) Database client async/sync

3) Postgres sync status.

That is a lot of moving parts and determining whether it is suitable is 
going to require rigorous testing over a representative data load.



See more below.



Additionally as I mentioned before, we see that from "pg_stat_database" 
from the column "xact_commit" , it's almost matching with the sum of 
"tup_inserted", "tup_updated", "tup_deleted" column. And also we 
verified in pg_stats_statements the  "calls" column is same as the 
"rows" column for the INSERT queries, so it means also we are inserting 
exactly same number of rows as the number of DB calls, so doesn't it 
suggest that we are doing row by row operations/dmls.


Also after seeing above and asking application team to do the batch 
commit ,we are still seeing the similar figures from pg_stat_database 
and pg_stat_statements, so does it mean that we are looking into wrong 
stats? or the application code change has not been done accurately? and 
we see even when no inserts are running from the application side, we do 
see "xact_commit" keep increasing along with "tup_fetched" , why so?


Finally we see in postgres here, even if we just write a DML statement 
it does commit that by default, until we explicitly put it in a 
"begin... end" block. Can that be the difference between how a "commit" 
gets handled in postgres vs other databases?


It does if autocommit is set in the client; that is common to other 
databases also:


https://dev.mysql.com/doc/refman/8.4/en/commit.html

https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
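
In psql, for example, that per-statement commit is just client-side
autocommit; a sketch (t is a hypothetical table):

\set AUTOCOMMIT off
-- psql now opens a transaction implicitly before the next statement:
INSERT INTO t VALUES (1);   -- not visible to other sessions yet
COMMIT;                     -- or ROLLBACK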

You probably need to take a closer look at the client/driver you are 
using and the code that is interacting with it.


In fact I would say you need to review the entire data transfer process 
to see if there are performance gains that can be obtained without 
adding an entirely new async component.







--
Adrian Klaver
adrian.kla...@aklaver.com