Re: Can't change tcp_keepalives_idle

2024-08-06 Thread Muhammad Imtiaz
Hi,

This parameter is supported only on systems that support TCP_KEEPIDLE (or an
equivalent socket option). Also, please ensure that you are not accidentally
connecting through a Unix-domain socket, since keepalive settings are ignored
for such connections. If the result of this query is null, the connection is
Unix-socket based.

SELECT client_addr FROM pg_stat_activity WHERE pid = pg_backend_pid();
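
A related check (illustrative): look at where the effective value of the
setting comes from, since a value set at the session or connection level will
mask a change made in postgresql.conf:

SELECT name, setting, source, sourcefile
FROM pg_settings
WHERE name = 'tcp_keepalives_idle';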


Regards,
Muhammad Imtiaz

On Sun, Aug 4, 2024 at 4:16 PM Abraham, Danny  wrote:

> Hi,
>
> Changed tcp_keepalives_idle in postgresql.conf from 0 to 300, then pg_ctl
> reload, but it stays with the old 0 value.
> Tried a restart and it did not work. The PG log says it was changed.
> reset_val marked OK, but no real change (in SHOW).
>
> -[ RECORD 1 ]---+-
> name| tcp_keepalives_idle
> setting | 0
> unit| s
> category| Connections and Authentication / Connection Settings
> short_desc  | Time between issuing TCP keepalives.
> extra_desc  | A value of 0 uses the system default.
> context | user
> vartype | integer
> source  | session
> min_val | 0
> max_val | 2147483647
> enumvals|
> boot_val| 0
> reset_val   | 1800
> sourcefile  |
> sourceline  |
> pending_restart | f
>
> Thanks
>
> Danny
>
>
>


Re: Building v17 Beta2 on Windows

2024-08-06 Thread Dominique Devienne
On Mon, Aug 5, 2024 at 2:26 PM David Rowley  wrote:
> Meson is now the only available method for Visual Studio builds."

Thanks David. My colleague figured it out, thanks to your pointers. Cheers, --DD




Re: Windows installation problem at post-install step

2024-08-06 Thread Sandeep Thakkar
On Tue, Aug 6, 2024 at 10:57 AM Thomas Munro  wrote:

> On Mon, Aug 5, 2024 at 8:50 PM Sandeep Thakkar
>  wrote:
> > This issue is seen only on v16 and not the back branches (tested on 15
> and 14) and also confirmed by @Ertan Küçükoglu at
> https://github.com/EnterpriseDB/edb-installers/issues/127#issuecomment-2268371442
>
> Does that mean you can reproduce the problem with initdb.exe directly
> in a shell?  That is, remove the EDB installer from the picture and
> compare v15 and v16 with the exact command line options that
> initcluster.vbs is using, or perhaps just:
>
> initdb.exe --locale="Turkish,Türkiye" --encoding=UTF-8 -D pgdata
>
> yes, here is the output:

> c:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A
> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale
> "Turkish,Türkiye" -W
> The files belonging to this database system will be owned by user
> "sandeep".
> This user must also own the server process.
>
> The database cluster will be initialized with locale
> "Turkish_Türkiye.1254".
> The default text search configuration will be set to "turkish".
>
> Data page checksums are disabled.
>
> Enter new superuser password:
> Enter it again:
>
> fixing permissions on existing directory c:/Program
> Files/PostgreSQL/16/data ... ok
> creating subdirectories ... ok
> selecting dynamic shared memory implementation ... windows
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting default time zone ... UTC
> creating configuration files ... ok
> running bootstrap script ... ok
> performing post-bootstrap initialization ... child process was terminated
> by exception 0xC409
> initdb: removing contents of data directory "c:/Program
> Files/PostgreSQL/16/data"
>



> . o O (Why does that locale name have a comma?)  If v15 works and v16
> breaks, perhaps you could try comparing the output with the attached
> patch?  It will give a hex dump of the contents of the locale name at
> various points in the program, to see if/where it was corrupted, which
> might also be a bit less confusing than looking at script output via
> email (I don't even know how many onion layers of transcoding are
> involved...)
>

here is the output:
v15:

c:\Program Files\PostgreSQL\15\bin>initdb.exe --encoding=UTF-8 -A
> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\15\data" --locale
> "Turkish,Türkiye" -W
> XXX debug raw: getopt optarg  = "Turkish,Türkiye"
> XXX debug hex: getopt optarg  = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69
> 79 65 }
> XXX debug txt: getopt optarg  = { T  u  r  k  i  s  h  ,  T  ?  r  k  i  y
>  e  }
> XXX debug raw: getopt optarg  = "Turkish,Türkiye"
> XXX debug hex: getopt optarg  = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69
> 79 65 }
> XXX debug txt: getopt optarg  = { T  u  r  k  i  s  h  ,  T  ?  r  k  i  y
>  e  }
> The files belonging to this database system will be owned by user
> "sandeep".
> This user must also own the server process.
>
> XXX debug raw: setlocales lc_ctype  = "Turkish,Türkiye"
> XXX debug hex: setlocales lc_ctype  = { 54 75 72 6b 69 73 68 2c 54 fc 72
> 6b 69 79 65 }
> XXX debug txt: setlocales lc_ctype  = { T  u  r  k  i  s  h  ,  T  ?  r  k
>  i  y  e  }
> XXX debug raw: setlocales cannonname  = "Turkish_Türkiye.1254"
> XXX debug hex: setlocales cannonname  = { 54 75 72 6b 69 73 68 5f 54 fc 72
> 6b 69 79 65 2e 31 32 35 34 }
> XXX debug txt: setlocales cannonname  = { T  u  r  k  i  s  h  _  T  ?  r
>  k  i  y  e  .  1  2  5  4  }
> XXX debug raw: setup_locale_encoding  = "Turkish_Türkiye.1254"
> XXX debug hex: setup_locale_encoding  = { 54 75 72 6b 69 73 68 5f 54 fc 72
> 6b 69 79 65 2e 31 32 35 34 }
> XXX debug txt: setup_locale_encoding  = { T  u  r  k  i  s  h  _  T  ?  r
>  k  i  y  e  .  1  2  5  4  }
> The database cluster will be initialized with locale
> "Turkish_Türkiye.1254".
> The default text search configuration will be set to "turkish".
>
> Data page checksums are disabled.
>
> Enter new superuser password:
> Enter it again:
>
> fixing permissions on existing directory c:/Program
> Files/PostgreSQL/15/data ... ok
> creating subdirectories ... ok
> selecting dynamic shared memory implementation ... windows
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting default time zone ... UTC
> creating configuration files ... ok
> running bootstrap script ... ok
> performing post-bootstrap initialization ... ok
> syncing data to disk ... ok
>
> Success. You can now start the database server using:
>
> pg_ctl -D ^"c^:^\Program^ Files^\PostgreSQL^\15^\data^" -l logfile
> start
>



 v16:

> C:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A
> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale
> "Turkish,Türkiye" -W XXX debug raw: getopt optarg = "Turkish,Türkiye" XXX
> debug hex: getopt optarg = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 79 65 }
> XXX debug txt: getopt optarg = { T u r k i s 

Re: Windows installation problem at post-install step

2024-08-06 Thread Sandeep Thakkar
On Tue, Aug 6, 2024 at 4:06 PM Sandeep Thakkar <
sandeep.thak...@enterprisedb.com> wrote:

>
>
> On Tue, Aug 6, 2024 at 10:57 AM Thomas Munro 
> wrote:
>
>> On Mon, Aug 5, 2024 at 8:50 PM Sandeep Thakkar
>>  wrote:
>> > This issue is seen only on v16 and not the back branches (tested on 15
>> and 14) and also confirmed by @Ertan Küçükoglu at
>> https://github.com/EnterpriseDB/edb-installers/issues/127#issuecomment-2268371442
>>
>> Does that mean you can reproduce the problem with initdb.exe directly
>> in a shell?  That is, remove the EDB installer from the picture and
>> compare v15 and v16 with the exact command line options that
>> initcluster.vbs is using, or perhaps just:
>>
>> initdb.exe --locale="Turkish,Türkiye" --encoding=UTF-8 -D pgdata
>>
>> yes, here is the output:
>
>> c:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A
>> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale
>> "Turkish,Türkiye" -W
>> The files belonging to this database system will be owned by user
>> "sandeep".
>> This user must also own the server process.
>>
>> The database cluster will be initialized with locale
>> "Turkish_Türkiye.1254".
>> The default text search configuration will be set to "turkish".
>>
>> Data page checksums are disabled.
>>
>> Enter new superuser password:
>> Enter it again:
>>
>> fixing permissions on existing directory c:/Program
>> Files/PostgreSQL/16/data ... ok
>> creating subdirectories ... ok
>> selecting dynamic shared memory implementation ... windows
>> selecting default max_connections ... 100
>> selecting default shared_buffers ... 128MB
>> selecting default time zone ... UTC
>> creating configuration files ... ok
>> running bootstrap script ... ok
>> performing post-bootstrap initialization ... child process was terminated
>> by exception 0xC409
>> initdb: removing contents of data directory "c:/Program
>> Files/PostgreSQL/16/data"
>>
>
>
>
>> . o O (Why does that locale name have a comma?)  If v15 works and v16
>> breaks, perhaps you could try comparing the output with the attached
>> patch?  It will give a hex dump of the contents of the locale name at
>> various points in the program, to see if/where it was corrupted, which
>> might also be a bit less confusing than looking at script output via
>> email (I don't even know how many onion layers of transcoding are
>> involved...)
>>
>
> here is the output:
> v15:
>
> c:\Program Files\PostgreSQL\15\bin>initdb.exe --encoding=UTF-8 -A
>> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\15\data" --locale
>> "Turkish,Türkiye" -W
>> XXX debug raw: getopt optarg  = "Turkish,Türkiye"
>> XXX debug hex: getopt optarg  = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69
>> 79 65 }
>> XXX debug txt: getopt optarg  = { T  u  r  k  i  s  h  ,  T  ?  r  k  i
>>  y  e  }
>> XXX debug raw: getopt optarg  = "Turkish,Türkiye"
>> XXX debug hex: getopt optarg  = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69
>> 79 65 }
>> XXX debug txt: getopt optarg  = { T  u  r  k  i  s  h  ,  T  ?  r  k  i
>>  y  e  }
>> The files belonging to this database system will be owned by user
>> "sandeep".
>> This user must also own the server process.
>>
>> XXX debug raw: setlocales lc_ctype  = "Turkish,Türkiye"
>> XXX debug hex: setlocales lc_ctype  = { 54 75 72 6b 69 73 68 2c 54 fc 72
>> 6b 69 79 65 }
>> XXX debug txt: setlocales lc_ctype  = { T  u  r  k  i  s  h  ,  T  ?  r
>>  k  i  y  e  }
>> XXX debug raw: setlocales cannonname  = "Turkish_Türkiye.1254"
>> XXX debug hex: setlocales cannonname  = { 54 75 72 6b 69 73 68 5f 54 fc
>> 72 6b 69 79 65 2e 31 32 35 34 }
>> XXX debug txt: setlocales cannonname  = { T  u  r  k  i  s  h  _  T  ?  r
>>  k  i  y  e  .  1  2  5  4  }
>> XXX debug raw: setup_locale_encoding  = "Turkish_Türkiye.1254"
>> XXX debug hex: setup_locale_encoding  = { 54 75 72 6b 69 73 68 5f 54 fc
>> 72 6b 69 79 65 2e 31 32 35 34 }
>> XXX debug txt: setup_locale_encoding  = { T  u  r  k  i  s  h  _  T  ?  r
>>  k  i  y  e  .  1  2  5  4  }
>> The database cluster will be initialized with locale
>> "Turkish_Türkiye.1254".
>> The default text search configuration will be set to "turkish".
>>
>> Data page checksums are disabled.
>>
>> Enter new superuser password:
>> Enter it again:
>>
>> fixing permissions on existing directory c:/Program
>> Files/PostgreSQL/15/data ... ok
>> creating subdirectories ... ok
>> selecting dynamic shared memory implementation ... windows
>> selecting default max_connections ... 100
>> selecting default shared_buffers ... 128MB
>> selecting default time zone ... UTC
>> creating configuration files ... ok
>> running bootstrap script ... ok
>> performing post-bootstrap initialization ... ok
>> syncing data to disk ... ok
>>
>> Success. You can now start the database server using:
>>
>> pg_ctl -D ^"c^:^\Program^ Files^\PostgreSQL^\15^\data^" -l logfile
>> start
>>
>
>
>
>  v16:
>
>> C:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A
>> scram-sha-256 -U postgres -D "c:\Program Files\P

Re: Windows installation problem at post-install step

2024-08-06 Thread Peter J. Holzer
On 2024-08-06 16:06:51 +0530, Sandeep Thakkar wrote:
> On Tue, Aug 6, 2024 at 10:57 AM Thomas Munro  wrote:
> 
> On Mon, Aug 5, 2024 at 8:50 PM Sandeep Thakkar
>  wrote:
> > This issue is seen only on v16 and not the back branches (tested on 15
> and 14) and also confirmed by @Ertan Küçükoglu at https://github.com/
> EnterpriseDB/edb-installers/issues/127#issuecomment-2268371442
> 
> Does that mean you can reproduce the problem with initdb.exe directly
> in a shell?  That is, remove the EDB installer from the picture and
> compare v15 and v16 with the exact command line options that
> initcluster.vbs is using, or perhaps just:
> 
> initdb.exe --locale="Turkish,Türkiye" --encoding=UTF-8 -D pgdata
> 
> 
> yes, here is the output:
> 
> c:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A
> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" 
> --locale
> "Turkish,Türkiye" -W
> The files belonging to this database system will be owned by user
> "sandeep".
> This user must also own the server process.
> 
> The database cluster will be initialized with locale
> "Turkish_Türkiye.1254".

I assume that "1254" here is the code page.
But you specified --encoding=UTF-8 above, so your default locale uses a
different encoding than the template databases. I would expect that to
cause problems if the template databases contain any characters where
the encodings differ (such as "ü" in the locale name).


> c:\Program Files\PostgreSQL\15\bin>initdb.exe --encoding=UTF-8 -A
> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\15\data" 
> --locale
> "Turkish,Türkiye" -W
> XXX debug raw: getopt optarg  = "Turkish,Türkiye"
> XXX debug hex: getopt optarg  = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 
> 79
> 65 }
> XXX debug txt: getopt optarg  = { T  u  r  k  i  s  h  ,  T  ?  r  k  i  y
>  e  }

This also looks like windows-1254 (or at least some ISO-8859 variant) to
me.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



libpq version macro to use or not PQsocketPoll

2024-08-06 Thread Dominique Devienne
Hi. Now that v17beta2 is part of my dev-env, I'm trying out the new API.
And I discover there's no version macro for conditional compilation in LIBPQ...

I found https://www.postgresql.org/message-id/968815.1623936849%40sss.pgh.pa.us
where Tom thinks that's a horrible idea, and instead proposes new
_HAS_ flags.
Except of course there's no such flag for the new PQsocketPoll in beta2...
I see only 3 such _HAS_ flags.

I'm not sure what's so wrong with a version macro as such.
- GCC has one.
- [SQLite has one][1].
- Why can't LIBPQ have one?

That's exactly the kind of issue they are designed to solve for code
that needs to be portable across versions. What am I missing?

Thanks, --DD

[1]: https://www.sqlite.org/c3ref/c_source_id.html
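
For context, the kind of guard being asked for would look roughly like this.
A sketch only: the LIBPQ_HAS_SOCKET_POLL name is an assumption (no such macro
exists in beta2, which is the point of this thread), and the v17 branch is
assumed to provide PQsocketPoll/PQgetCurrentTimeUSec as documented for 17beta.

#include <libpq-fe.h>
#ifndef LIBPQ_HAS_SOCKET_POLL
#include <sys/select.h>
#endif

/* Wait until the connection's socket is readable, or until the timeout. */
static int
wait_readable(PGconn *conn, int timeout_secs)
{
#ifdef LIBPQ_HAS_SOCKET_POLL
    /* v17: PQsocketPoll takes an absolute end time in microseconds. */
    return PQsocketPoll(PQsocket(conn), 1, 0,
                        PQgetCurrentTimeUSec() + timeout_secs * 1000000LL);
#else
    /* Older libpq: fall back to select() on PQsocket(conn). */
    fd_set readfds;
    struct timeval tv = {timeout_secs, 0};

    FD_ZERO(&readfds);
    FD_SET(PQsocket(conn), &readfds);
    return select(PQsocket(conn) + 1, &readfds, NULL, NULL, &tv);
#endif
}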




Re: libpq version macro to use or not PQsocketPoll

2024-08-06 Thread Tom Lane
Dominique Devienne  writes:
> Hi. Now that v17beta2 is part of my dev-env, I'm trying out the new API.
> And I discover there's no version macro for conditional compilation in 
> LIBPQ...

Indeed, that's an oversight, and there's a number of other things
we added to libpq-fe.h between 16 and 17 that probably deserve
their own LIBPQ_HAS symbols.

> I'm not sure what's so wrong with version macro as such.

A couple of things:
* It doesn't help if we forget to update it, so it's not really
  better than HAS_ symbols on that score.
* Usage is error-prone (you might test for the wrong cutoff version)
  and not very readable.
* We can't retroactively make such a symbol appear in old copies
  of libpq-fe.h.

regards, tom lane




Re: libpq version macro to use or not PQsocketPoll

2024-08-06 Thread Dominique Devienne
On Tue, Aug 6, 2024 at 4:31 PM Tom Lane  wrote:
> Dominique Devienne  writes:
> > Hi. Now that v17beta2 is part of my dev-env, I'm trying out the new API.
> > And I discover there's no version macro for conditional compilation in 
> > LIBPQ...
>
> Indeed, that's an oversight, and there's a number of other things
> we added to libpq-fe.h between 16 and 17 that probably deserve
> their own LIBPQ_HAS symbols.

OK, thanks. Glad to know they'll be in beta3 or the final release.

> > I'm not sure what's so wrong with version macro as such.
> A couple of things:
> * It doesn't help if we forget to update it, so it's not really
>   better than HAS_ symbols on that score.

I think it's easier to track/spot as part of the release process,
and probably even automate it (or its checking).

It's also a very common practice.
If DRH does it for SQLite, then it has to be OK :)

> * Usage is error-prone (you might test for the wrong cutoff version)
>   and not very readable.

That's the client's responsibility.
The "vendor" just has to increment/update the macros on every release.

> * We can't retroactively make such a symbol appear in old copies
>   of libpq-fe.h.

Sure. But we can start at some point. And eventually assume
the macros exist, because older versions w/o them are no longer
supported by your client code. And each client code base decides that.

Again, the "vendor" just has to start defining the macros,
and keep updating them. That's all. The rest is up to the client.

My $0.02. I can use _HAS_ flags too.

They are orthogonal in the first place, in fact.
We can have both. Also, version macros don't
multiply the way _HAS_ macros do, over time.

It's never too late to start adding them :). --DD




data checksums

2024-08-06 Thread bruno vieira da silva
Hello.
I've been dealing with some database corruption events, so I've been
considering enabling data checksums on my deployments.

So my question is: why aren't data checksums enabled by default in PG? The
PG docs mention a considerable performance penalty; how considerable is it?

Thanks
-- 
Bruno Vieira da Silva


Re: data checksums

2024-08-06 Thread Ron Johnson
On Tue, Aug 6, 2024 at 11:12 AM bruno vieira da silva 
wrote:

> Hello.
> I've been dealing with some database corruption events, so i've been
> wondering to enable data checksums on my deployments.
>
> so my question is why data checksums aren't enabled by default on pg? the
> pg doc
> mentions a considerable performance penality, how considerable it is?
>

I'm going to hazard a guess that the relevant line in the docs was written
many years ago, back when servers were much slower.

We certainly didn't notice any speed degradation when going from PG 9.6.24
to PG 14.12.  Or... we just use enough hardware and flash-only SAN disks to
make the overhead minuscule.

-- 
Death to America, and butter sauce.
Iraq lobster!


Re: data checksums

2024-08-06 Thread Christophe Pettus



> On Aug 6, 2024, at 08:11, bruno vieira da silva  wrote:
> 
> so my question is why data checksums aren't enabled by default on pg?

At this point, mostly historical reasons.  They're also superfluous if your 
underlying file system or storage hardware does storage-level corruption checks 
(which most don't).

> the pg doc 
> mentions a considerable performance penality, how considerable it is?

That line is probably somewhat out of date at this point.  We haven't seen a 
significant slowdown in enabling them on any modern hardware.  I always turn 
them on, except on the type of filesystems/hardware mentioned above.
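
For reference, checksums can be enabled at initdb time, or added later to an
existing cluster with pg_checksums (available since PG 12); the cluster must
be cleanly shut down first. Paths below are illustrative:

initdb --data-checksums -D /path/to/data     # at cluster creation time

pg_ctl stop -D /path/to/data                 # or, on an existing cluster
pg_checksums --enable -D /path/to/data
pg_ctl start -D /path/to/data
psql -c "SHOW data_checksums;"               # verify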



Re: data checksums

2024-08-06 Thread Greg Sabino Mullane
The penalty is not "considerable", but it is not completely trivial either.
But it's more on the trivial side. Part of the problem is that it is hard
to measure, as it is very workload dependent. As to why it is not the
default, Postgres is very careful and conservative by default, and not
everybody was convinced that enabling checksums is worth the tradeoff,
especially (IIRC) with the lack of real-world examples of people
discovering issues thanks to these checksums. So yes, please enable and
share with us if the checksums catch anything.

I think the argument is worth re-opening, because (as far as I know),
all of the companies providing Postgres support, whether completely
cloud-managed, setting up a complex HA cluster, or just providing tuning
advice, have enabled checksums by default for many, many years. So a big +1
from me to finally making it the default. It's a de-facto default anyway at
this point.

Cheers,
Greg


Re: Windows installation problem at post-install step

2024-08-06 Thread Thomas Munro
On Tue, Aug 6, 2024 at 10:38 PM Sandeep Thakkar
 wrote:
> On Tue, Aug 6, 2024 at 4:06 PM Sandeep Thakkar 
>  wrote:

[v15]

>>> XXX debug raw: setup_locale_encoding  = "Turkish_Türkiye.1254"
>>> XXX debug hex: setup_locale_encoding  = { 54 75 72 6b 69 73 68 5f 54 fc 72 
>>> 6b 69 79 65 2e 31 32 35 34 }
>>> XXX debug txt: setup_locale_encoding  = { T  u  r  k  i  s  h  _  T  ?  r  
>>> k  i  y  e  .  1  2  5  4  }
>>> The database cluster will be initialized with locale "Turkish_Türkiye.1254".

[v16]

>>> XXX debug raw: setup_locale_encoding  = "Turkish_Türkiye.1254"
>>> XXX debug hex: setup_locale_encoding  = { 54 75 72 6b 69 73 68 5f 54 fc 72 
>>> 6b 69 79 65 2e 31 32 35 34 }
>>> XXX debug txt: setup_locale_encoding  = { T  u  r  k  i  s  h  _  T  ?  r  
>>> k  i  y  e  .  1  2  5  4  }
>>> The database cluster will be initialized with locale "Turkish_Türkiye.1254".

OK so we see that the "Turkish,Türkiye" -> "Turkish_Türkiye.1254"
transformation happens in the "cannonname" step, and then the final
values are identical between the versions.

>>> performing post-bootstrap initialization ... child process was terminated 
>>> by exception 0xC409

If I understand correctly that's abort(), so can you please run it
with -d so we can maybe see some information about that?  Also -n
might be useful to stop it from deleting the data directory at the end
in case something useful for diagnosis is in there.
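
For example (illustrative; same paths and options as your earlier run, plus
the two extra switches):

c:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A
scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale
"Turkish,Türkiye" -W -d -n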




Standard of data storage and transformation

2024-08-06 Thread yudhi s
Hi All,
We have a use case in which we hold transaction data for multiple customers
in one of our Postgres databases (version 15.4), consumed from multiple
sources (batch file processing, Kafka event processing, etc.). It is
currently stored in Postgres in normalized form, with constraints, indexes
and partitions defined. This Postgres database holds the transaction data
for around a month or so. The use cases include online transaction search
reports, which are mostly real-time reporting, plus daily transaction batch
reports per customer and month-end reports for customers. In the target
state it will hold approx. ~400 million transactions/day, which can be
billions of rows across multiple related parent/child tables.

There is another requirement to send this customer transaction data to an
OLAP system, which is a Snowflake database, where it will be persisted for
many years. The lag between the data in Postgres/OLTP and in Snowflake will
be ~1 hr. Any reporting API can query Postgres for <1 month's worth of
transaction data; if it needs to scan >1 month's worth, it will point to
the Snowflake database.

The question we are wondering about is: should we send the data as is, in
normalized table form, to Snowflake and transform/flatten it there to
support the reporting use case, or should we first flatten or transform the
data in Postgres itself into another structure (for example, materialized
views on top of the base tables) and only then move it to Snowflake? What
is the appropriate standard, and what is the downside if we do anything
different?

Regards
Yudhi


Re: Windows installation problem at post-install step

2024-08-06 Thread Thomas Munro
On Tue, Aug 6, 2024 at 11:44 PM Peter J. Holzer  wrote:
> I assume that "1254" here is the code page.
> But you specified --encoding=UTF-8 above, so your default locale uses a
> different encoding than the template databases. I would expect that to
> cause problems if the template databases contain any charecters where
> the encodings differ (such as "ü" in the locale name).

It's weird, but on Windows, PostgreSQL allows UTF-8 encoding with any
locale, and thus apparent contradictions:

/* See notes in createdb() to understand these tests */
if (!(locale_enc == user_enc ||
  locale_enc == PG_SQL_ASCII ||
  locale_enc == -1 ||
#ifdef WIN32
  user_enc == PG_UTF8 ||
#endif
  user_enc == PG_SQL_ASCII))
{
pg_log_error("encoding mismatch");

... and createdb's comments say that is acceptable because:

 * 3. selected encoding is UTF8 and platform is win32. This is because
 * UTF8 is a pseudo codepage that is supported in all locales since it's
 * converted to UTF16 before being used.

At the time PostgreSQL was ported to Windows, UTF-8 was not a
supported encoding in "char"-based system interfaces like strcoll_l(),
and the port had to convert to "wchar_t" interfaces and call (in that
example) wcscoll_l().  On modern Windows it is, and there are two
locale names, with and without ".UTF-8" suffix (cf. glibc systems that
have "en_US" and "en_US.UTF-8" where the suffix-less version uses
whatever traditional encoding was used for that language before UTF-8
ate the world).

If we were doing the Windows port today, we'd probably not have that
special case for Windows, and we wouldn't have the wchar_t
conversions.  Then I think we'd allow only:

--locale=tr-TR (defaults to --encoding=WIN1254)
--locale=tr-TR --encoding=WIN1254
--locale=tr-TR.UTF-8
--locale=tr-TR.UTF-8 --encoding=UTF-8

If we come up with an automated (or even manual but documented) way to
perform the "Turkish_Türkiye.1254" -> "tr-TR" upgrade as Dave was
suggesting upthread, we'll probably want to be careful to tidy up
these contradictory settings.  For example I guess that American
databases initialised by EDB's installer must be using
--locale="English_United States.1252" and --encoding=UTF-8, and should
be changed to "en-US.UTF-8", while those initialised by letting
initdb.exe pick the encoding must be using --locale="English_United
States.1252" and --encoding=WIN1252 (implicit) and should be changed
to "en-US" to match the WIN1252 encoding.

Only if we did that update would we be able to consider removing the
extra UTF-16 conversions that are happening very frequently inside
PostgreSQL code, which is a waste of CPU cycles and programmer sanity.
(But that's all just speculation from studying the locale code -- I've
never really used Windows.)




Re: Standard of data storage and transformation

2024-08-06 Thread Ron Johnson
On Tue, Aug 6, 2024 at 5:07 PM yudhi s  wrote:

> Hi All,
> We are having a use case in which we are having transaction data for
> multiple customers in one of the Postgres databases(version 15.4) and we
> are consuming it from multiple sources(batch file processing, kafka event
> processing etc). It's currently stored in normalized form postgres
> with constraints, indexes, partitions defined. This postgres database is
> holding the transaction data for around a month or so. There are use cases
> of running online transaction search reports which will be mostly real time
> reporting and also some daily transaction batch reports based on customers
> and also month end reports for customers. In target state it will hold
> Approx. ~400 million transactions/day which can be billions of rows across
> multiple related parent/child tables.
>
> There is another requirement to send these customer transaction data to an
> olap system which is in a snowflake database and there it will be persisted
> for many years. The lag between the data in postgres/oltp and in snowflake
> will be ~1hr. And any reporting api can query postgres for <1 month worth
> of transaction data and if it needs to scan for >1month worth of
> transaction data, it will point to the snowflake database.
>
> Now the question which we are wondering is , should we send the data as is
> in normalized table form to snowflake and then there we transform/flatten
> the data to support the reporting use case or should we first flatten or
> transform the data in postgres itself and make it as another structure( for
> example creating materialized views on top of base table) and only then
> move that data to the snowflake? What is the appropriate standard and
> downside if we do anything different.
>

Some thoughts:
0) How big are the records?
1) Materialized views add disk space overhead.
2) Materialized views are for when you query the same static data over and
over again.
3) IIUC, you'll be moving the data from PG to Snowflake just once.
4) Writing an MV to disk and then reading it only once doubles the IO
requirements.
5) Thus, my first thought would be to extract the data from PG using a
denormalizing "plain" view (see the sketch after this list).
5a) If you can't make that Fast Enough, then obviously you must pull the
normalized data from PG and denorm it elsewhere.  You know your situation
better than us.
6) Indices will be critical: not too many, but not too few.
7) Obviously consider partitioning, but note that too many partitions can
make query planning MUCH slower.
7a) 31 days cut into hours means 744 partitions.  That's a LOT.
7b) Partitioning every THREE hours means only 248 child tables.  A lot, but
much more manageable.
7c) That might well kill reporting performance, though, if it's for example
one customer across many partitions.
8) You (hopefully) know what kind of queries will be run.  Maybe partition
by customer (or whatever) range and THEN by an hour range.
8a) You'd have to simultaneously run multiple extract jobs (one for each
"customer" range), but that might not be too onerous, since then each hour
partition would be smaller.
9) Testing.  Nothing beats full-scale testing.
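
A minimal sketch of what point 5 means, assuming hypothetical table and
column names (adapt to the real parent/child tables):

CREATE VIEW transactions_denorm AS
SELECT t.transaction_id,
       t.transaction_ts,
       c.customer_name,
       l.line_no,
       l.product_code,
       l.amount
FROM transactions t
JOIN customers c USING (customer_id)
JOIN transaction_lines l USING (transaction_id);

-- The hourly extract job then reads from the view, e.g.:
--   COPY (SELECT * FROM transactions_denorm
--         WHERE transaction_ts >= now() - interval '1 hour') TO STDOUT;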

-- 
Death to America, and butter sauce!
Iraq lobster...


Streaming replication issue post upgrade from version 11 to 14 on windows 2016 Server

2024-08-06 Thread Vamsi Chava
Hi Team,

We have upgraded PostgreSQL from version 11 to 14 on Windows Server 2016 and,
post upgrade, configured streaming replication, but the data is not getting
replicated. Compared to version 11, configuring replication in 14 is very
different. We tried the documentation from the Postgres site with no luck,
and also tried many other sites for Windows-side replication configuration
notes. No luck. Any help appreciated.


Thankyou
vamshi.chava


Re: data checksums

2024-08-06 Thread Laurenz Albe
On Tue, 2024-08-06 at 09:29 -0700, Christophe Pettus wrote:
> 
> > On Aug 6, 2024, at 08:11, bruno vieira da silva  
> > wrote:
> > 
> > so my question is why data checksums aren't enabled by default on pg?
> 
> At this point, mostly historical reasons.  They're also superfluous if your 
> underlying
> file system or storage hardware does storage-level corruption checks (which 
> most don't).

I am surprised by that.  Would you say that most storage systems will happily 
give you a
garbage block if there was a hardware problem somewhere?

> > the pg doc 
> > mentions a considerable performance penality, how considerable it is?
> 
> That line is probably somewhat out of date at this point.  We haven't seen a 
> significant
> slowdown in enabling them on any modern hardware.  I always turn them on, 
> except on the
> type of filesystems/hardware mentioned above.

Turning data checksums on will write WAL for hint bits, which can significantly 
increase
the amount of WAL written.

Yours,
Laurenz Albe




Re: data checksums

2024-08-06 Thread Christophe Pettus



> On Aug 6, 2024, at 19:45, Laurenz Albe  wrote:
> I am surprised by that.  Would you say that most storage systems will happily 
> give you a
> garbage block if there was a hardware problem somewhere?

"Most" is hard for me to judge.  HDDs can have uncorrected and undetected 
errors, definitely.  ZFS, for example, can correct those (within limits); XFS 
doesn't try.  I have been told that SSDs can have uncorrected/undetected errors 
as well, but I don't know details on that.

> Turning data checksums on will write WAL for hint bits, which can 
> significantly increase
> the amount of WAL written.

I was curious about that, so I just did a quick experiment using pgbench, with 
identical databases except for checksums.  They both generated the same amount 
of WAL within 10% or so, so I don't think the impact is huge.  (And you need 
the hint bits for pg_rewind, which is a nice thing to have in your back pocket 
if required.)
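
For anyone wanting to repeat that kind of check, a rough sketch (ports,
scale and duration are arbitrary):

initdb -D data_ck --data-checksums
initdb -D data_nock
# start each cluster on its own port, then against each one:
pgbench -i -s 100 postgres
psql -Atc "SELECT pg_current_wal_lsn()" postgres   # note the starting LSN
pgbench -c 8 -j 4 -T 600 postgres
psql -Atc "SELECT pg_current_wal_lsn()" postgres   # compare the WAL generated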



ANALYZE on partitioned tables vs on individual partitions

2024-08-06 Thread Michael Harris
Hello Experts,

Our application has a database with a large number of partitioned tables
used to store time series data.

It is partitioned by time: new data is populated into the current
partition, old partitions eventually get dropped, and the partitions in
between are largely static.

I had not realized until recently that the partitioned tables themselves
are not analyzed by the autovacuum processes. I am now wondering if we
should be manually analyzing those.

The problem is they are quite enormous and take a long time to analyze - I
tested one yesterday and it took ~30mins, and we have hundreds of them. It
might not be practical to regularly analyze them.

My questions are:

  1. What is the consequence of not having good statistics on partitioned
table level, if you do have good statistics on the partition level?

  2. Which planning operations rely on partitioned table level statistics?

My reason for asking is I'm trying to understand the tradeoff between the
IO cost associated with frequently analyzing as opposed to possibly
non-optimal plans being chosen.

Thanks in advance, and sorry if this has all been covered before. I could
not find anything definitive in the docs, and while I did find a few
references by googling, none really explained the answers to those
questions too well.

Regards
Mike


Re: Streaming replication issue post upgrade from version 11 to 14 on windows 2016 Server

2024-08-06 Thread Muhammad Ikram
Hi Vamsi,

We have been doing these steps to set up streaming replication:


--Primary-Server

CREATE ROLE repl REPLICATION LOGIN PASSWORD 'your_password';

postgresql.conf

listen_addresses = '*'
wal_level = replica
archive_mode = on
archive_command = 'copy %p \\path_to_archive\\%f'
max_wal_senders = 3
wal_keep_size = 1024

--pg_hba.conf

host replication repl standby_server_ip/32 auth_method


Creating standby

PGPASSWORD=repl pg_basebackup -F p -C -S pslot1 -h localhost -p 5333 -D
path_to_standby_data_directory   -U repl -P -v -R -X stream


--standby-postgresql.conf

primary_conninfo = 'host=primary_server_ip port=5432 user=repl
password=your_password'
restore_command = 'copy \\path_to_archive\\%f %p'  # if you are using
archive logs
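
Once the standby is running, these views help confirm that streaming is
actually happening (illustrative queries):

-- On the primary: one row per connected standby.
SELECT application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;

-- On the standby: shows the WAL receiver and the primary it streams from.
SELECT status, conninfo FROM pg_stat_wal_receiver;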


Hope I did not forget any step.

Regards,
Muhammad Ikram


On Wed, Aug 7, 2024 at 6:11 AM Vamsi Chava 
wrote:

> Hi Team,
>
> we have upgraded postgresql version 11 to 14 on windows server 2016, post
> upgrade we configured streaming replication. the data is not getting
> replicated. but compared to version 11, 14 is very different in
> configuring replication. Tried from documentation from postgres site. no
> luck, Any help appreciated. Tried many other sites for Windows side
> replication configuration notes. No luck
>
>
> Thankyou
> vamshi.chava
>


-- 
Muhammad Ikram


Re: ANALYZE on partitioned tables vs on individual partitions

2024-08-06 Thread Christophe Pettus



> On Aug 6, 2024, at 21:13, Michael Harris  wrote:
> 
>   1. What is the consequence of not having good statistics on partitioned 
> table level, if you do have good statistics on the partition level?

The child partitions are just tables, and all of the same issues that can arise 
from bad statistics on a table can arise on them: specifically, bad query 
plans.  (There are also some consequences to leaving a table unvacuumed, even 
an append-only table.)  The root table in declarative partitioning has no rows, 
so there really is no such thing as "statistics at the partitioned-table level."

>   2. Which planning operations rely on partitioned table level statistics?

Just about every query uses the information that ANALYZE gathers in some way.  
If the query can't be executed in any other conceivable way than a sequential 
scan, or if it is a simple lookup on a unique column, the statistics will not 
contribute to the plan, but essentially every other plan will use them in one 
way or another.

Child partitions should be autovacuumed and autoanalyzed just like any other 
table; they are not prohibited from autovacuum in any way by default.  It's 
probably a good idea to investigate why they are not being picked up by 
autovacuum.  If they are created by a bulk load process, it's not a bad idea to 
do a VACUUM ANALYZE on them once the bulk load is complete.
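
For example (partition name is hypothetical):

VACUUM (ANALYZE, VERBOSE) transactions_2024_08_06;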



Re: ANALYZE on partitioned tables vs on individual partitions

2024-08-06 Thread David Rowley
On Wed, 7 Aug 2024 at 16:44, Christophe Pettus  wrote:
> Child partitions should be autovacuumed and autoanalyzed just like any other 
> table; they are not prohibited from autovacuum in any way by default.  It's 
> probably a good idea to investigate why they are not being picked up by 
> autovacuum.  If they are created by a bulk load process, it's not a bad idea 
> to do a VACUUM ANALYZE on them once the bulk load is complete.

I think the complaint was about no autovacuum on the partitioned
table, not the partitions.  This is expected as we don't track the
counters (in particular n_mod_since_analyze) shown in
pg_stat_all_tables at the partitioned table level, so the trigger
points that normally cause autovacuum to analyze or vacuum a table
just won't be triggered for a partitioned table.  For VACUUM, that's
fine as, as you mentioned, no rows are stored. But for analyze, that
does present a problem.

To name the aspects of planning that rely on statistics of the
partitioned table: basically anything above the Append or MergeAppend
which combines the partition results together. So that doesn't include
the scans of each partition and any quals that are pushed down to the
scan level as those are able to use the partition level statistics.
However, it does include things like joins, group by, distinct as
those require n_distinct estimates for the partitioned table. It's not
all bad though as the row estimates for each individual partition will
be totalled up through the Append / MergeAppend simply by adding up
the row estimates for each Append / MergeAppend child plan. So, it's
really only an estimation problem for any node that comes after a join
node or a group by node as the output rows for those nodes will depend
on a good n_distinct estimate for the partitioned table.

Partition-wise joins and aggregates do change things a bit as those
features do permit moving those operations below the Append / Merge
Append, in which case the statistics for the individual partition can
be used.

You could consider manually setting the n_distinct_inherited estimates
for the columns that you join on or group by in the partitioned table.
You might find that you're able to choose a suitable value for that if
you review the documentation for that setting. In particular, please
review what is mentioned about using negative numbers for that
setting. You may be able to choose a value that scales correctly with
the row estimate that doesn't get outdated as you add more rows to the
partitions. You'll need to determine that based on the data you're
storing.
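
As an illustration (table and column names are hypothetical; the value must
be chosen from your own data, and negative values scale with the row count):

ALTER TABLE measurements
  ALTER COLUMN customer_id SET (n_distinct_inherited = -0.02);
ANALYZE measurements;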

David