Re: Can't change tcp_keepalives_idle
Hi,

This parameter is supported only on systems that support TCP_KEEPIDLE (or an equivalent socket option). Please also ensure that you are not accidentally connecting through a Unix-domain socket, since keepalive settings have no effect there. If the result of this query is null, the connection is over a Unix socket:

    SELECT client_addr FROM pg_stat_activity WHERE pid = pg_backend_pid();

Regards,
Muhammad Imtiaz

On Sun, Aug 4, 2024 at 4:16 PM Abraham, Danny wrote:
> Hi,
>
> Changed tcp_keepalives_idle in postgresql.conf from 0 to 300, then pg_ctl reload,
> but it stays with the old 0 value.
> Tried restart and it did not work. The PG log says it was changed.
> reset_val is marked OK, but there is no real change (in SHOW).
>
> -[ RECORD 1 ]---+------------------------------------------------------
> name            | tcp_keepalives_idle
> setting         | 0
> unit            | s
> category        | Connections and Authentication / Connection Settings
> short_desc      | Time between issuing TCP keepalives.
> extra_desc      | A value of 0 uses the system default.
> context         | user
> vartype         | integer
> source          | session
> min_val         | 0
> max_val         | 2147483647
> enumvals        |
> boot_val        | 0
> reset_val       | 1800
> sourcefile      |
> sourceline      |
> pending_restart | f
>
> Thanks
>
> Danny
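A quick way to see where the active value is coming from (a sketch; note that source = "session" in the quoted output would mean a session-level SET is overriding whatever postgresql.conf supplies until it is reset):

    -- Check the active value and where it was set from
    SELECT name, setting, source, sourcefile, context
    FROM pg_settings
    WHERE name = 'tcp_keepalives_idle';

    -- If source is 'session', clear the override and re-check
    RESET tcp_keepalives_idle;
    SHOW tcp_keepalives_idle;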
Re: Building v17 Beta2 on Windows
On Mon, Aug 5, 2024 at 2:26 PM David Rowley wrote: > Meson is now the only available method for Visual Studio builds." Thanks David. My colleague figured it out, thanks to your pointers. Cheers, --DD
Re: Windows installation problem at post-install step
On Tue, Aug 6, 2024 at 10:57 AM Thomas Munro wrote: > On Mon, Aug 5, 2024 at 8:50 PM Sandeep Thakkar > wrote: > > This issue is seen only on v16 and not the back branches (tested on 15 > and 14) and also confirmed by @Ertan Küçükoglu at > https://github.com/EnterpriseDB/edb-installers/issues/127#issuecomment-2268371442 > > Does that mean you can reproduce the problem with initdb.exe directly > in a shell? That is, remove the EDB installer from the picture and > compare v15 and v16 with the exact command line options that > initcluster.vbs is using, or perhaps just: > > initdb.exe --locale="Turkish,Türkiye" --encoding=UTF-8 -D pgdata > > yes, here is the output: > c:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A > scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale > "Turkish,Türkiye" -W > The files belonging to this database system will be owned by user > "sandeep". > This user must also own the server process. > > The database cluster will be initialized with locale > "Turkish_Türkiye.1254". > The default text search configuration will be set to "turkish". > > Data page checksums are disabled. > > Enter new superuser password: > Enter it again: > > fixing permissions on existing directory c:/Program > Files/PostgreSQL/16/data ... ok > creating subdirectories ... ok > selecting dynamic shared memory implementation ... windows > selecting default max_connections ... 100 > selecting default shared_buffers ... 128MB > selecting default time zone ... UTC > creating configuration files ... ok > running bootstrap script ... ok > performing post-bootstrap initialization ... child process was terminated > by exception 0xC409 > initdb: removing contents of data directory "c:/Program > Files/PostgreSQL/16/data" > > . o O (Why does that locale name have a comma?) If v15 works and v16 > breaks, perhaps you could try comparing the output with the attached > patch? It will give a hex dump of the contents of the locale name at > various points in the program, to see if/where it was corrupted, which > might also be a bit less confusing than looking at script output via > email (I don't even know how many onion layers of transcoding are > involved...) > here is the output: v15: c:\Program Files\PostgreSQL\15\bin>initdb.exe --encoding=UTF-8 -A > scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\15\data" --locale > "Turkish,Türkiye" -W > XXX debug raw: getopt optarg = "Turkish,Türkiye" > XXX debug hex: getopt optarg = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 > 79 65 } > XXX debug txt: getopt optarg = { T u r k i s h , T ? r k i y > e } > XXX debug raw: getopt optarg = "Turkish,Türkiye" > XXX debug hex: getopt optarg = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 > 79 65 } > XXX debug txt: getopt optarg = { T u r k i s h , T ? r k i y > e } > The files belonging to this database system will be owned by user > "sandeep". > This user must also own the server process. > > XXX debug raw: setlocales lc_ctype = "Turkish,Türkiye" > XXX debug hex: setlocales lc_ctype = { 54 75 72 6b 69 73 68 2c 54 fc 72 > 6b 69 79 65 } > XXX debug txt: setlocales lc_ctype = { T u r k i s h , T ? r k > i y e } > XXX debug raw: setlocales cannonname = "Turkish_Türkiye.1254" > XXX debug hex: setlocales cannonname = { 54 75 72 6b 69 73 68 5f 54 fc 72 > 6b 69 79 65 2e 31 32 35 34 } > XXX debug txt: setlocales cannonname = { T u r k i s h _ T ? r > k i y e . 
1 2 5 4 } > XXX debug raw: setup_locale_encoding = "Turkish_Türkiye.1254" > XXX debug hex: setup_locale_encoding = { 54 75 72 6b 69 73 68 5f 54 fc 72 > 6b 69 79 65 2e 31 32 35 34 } > XXX debug txt: setup_locale_encoding = { T u r k i s h _ T ? r > k i y e . 1 2 5 4 } > The database cluster will be initialized with locale > "Turkish_Türkiye.1254". > The default text search configuration will be set to "turkish". > > Data page checksums are disabled. > > Enter new superuser password: > Enter it again: > > fixing permissions on existing directory c:/Program > Files/PostgreSQL/15/data ... ok > creating subdirectories ... ok > selecting dynamic shared memory implementation ... windows > selecting default max_connections ... 100 > selecting default shared_buffers ... 128MB > selecting default time zone ... UTC > creating configuration files ... ok > running bootstrap script ... ok > performing post-bootstrap initialization ... ok > syncing data to disk ... ok > > Success. You can now start the database server using: > > pg_ctl -D ^"c^:^\Program^ Files^\PostgreSQL^\15^\data^" -l logfile > start > v16: > C:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A > scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale > "Turkish,Türkiye" -W XXX debug raw: getopt optarg = "Turkish,Türkiye" XXX > debug hex: getopt optarg = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 79 65 } > XXX debug txt: getopt optarg = { T u r k i s
Re: Windows installation problem at post-install step
On Tue, Aug 6, 2024 at 4:06 PM Sandeep Thakkar < sandeep.thak...@enterprisedb.com> wrote: > > > On Tue, Aug 6, 2024 at 10:57 AM Thomas Munro > wrote: > >> On Mon, Aug 5, 2024 at 8:50 PM Sandeep Thakkar >> wrote: >> > This issue is seen only on v16 and not the back branches (tested on 15 >> and 14) and also confirmed by @Ertan Küçükoglu at >> https://github.com/EnterpriseDB/edb-installers/issues/127#issuecomment-2268371442 >> >> Does that mean you can reproduce the problem with initdb.exe directly >> in a shell? That is, remove the EDB installer from the picture and >> compare v15 and v16 with the exact command line options that >> initcluster.vbs is using, or perhaps just: >> >> initdb.exe --locale="Turkish,Türkiye" --encoding=UTF-8 -D pgdata >> >> yes, here is the output: > >> c:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A >> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale >> "Turkish,Türkiye" -W >> The files belonging to this database system will be owned by user >> "sandeep". >> This user must also own the server process. >> >> The database cluster will be initialized with locale >> "Turkish_Türkiye.1254". >> The default text search configuration will be set to "turkish". >> >> Data page checksums are disabled. >> >> Enter new superuser password: >> Enter it again: >> >> fixing permissions on existing directory c:/Program >> Files/PostgreSQL/16/data ... ok >> creating subdirectories ... ok >> selecting dynamic shared memory implementation ... windows >> selecting default max_connections ... 100 >> selecting default shared_buffers ... 128MB >> selecting default time zone ... UTC >> creating configuration files ... ok >> running bootstrap script ... ok >> performing post-bootstrap initialization ... child process was terminated >> by exception 0xC409 >> initdb: removing contents of data directory "c:/Program >> Files/PostgreSQL/16/data" >> > > > >> . o O (Why does that locale name have a comma?) If v15 works and v16 >> breaks, perhaps you could try comparing the output with the attached >> patch? It will give a hex dump of the contents of the locale name at >> various points in the program, to see if/where it was corrupted, which >> might also be a bit less confusing than looking at script output via >> email (I don't even know how many onion layers of transcoding are >> involved...) >> > > here is the output: > v15: > > c:\Program Files\PostgreSQL\15\bin>initdb.exe --encoding=UTF-8 -A >> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\15\data" --locale >> "Turkish,Türkiye" -W >> XXX debug raw: getopt optarg = "Turkish,Türkiye" >> XXX debug hex: getopt optarg = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 >> 79 65 } >> XXX debug txt: getopt optarg = { T u r k i s h , T ? r k i >> y e } >> XXX debug raw: getopt optarg = "Turkish,Türkiye" >> XXX debug hex: getopt optarg = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 >> 79 65 } >> XXX debug txt: getopt optarg = { T u r k i s h , T ? r k i >> y e } >> The files belonging to this database system will be owned by user >> "sandeep". >> This user must also own the server process. >> >> XXX debug raw: setlocales lc_ctype = "Turkish,Türkiye" >> XXX debug hex: setlocales lc_ctype = { 54 75 72 6b 69 73 68 2c 54 fc 72 >> 6b 69 79 65 } >> XXX debug txt: setlocales lc_ctype = { T u r k i s h , T ? 
r >> k i y e } >> XXX debug raw: setlocales cannonname = "Turkish_Türkiye.1254" >> XXX debug hex: setlocales cannonname = { 54 75 72 6b 69 73 68 5f 54 fc >> 72 6b 69 79 65 2e 31 32 35 34 } >> XXX debug txt: setlocales cannonname = { T u r k i s h _ T ? r >> k i y e . 1 2 5 4 } >> XXX debug raw: setup_locale_encoding = "Turkish_Türkiye.1254" >> XXX debug hex: setup_locale_encoding = { 54 75 72 6b 69 73 68 5f 54 fc >> 72 6b 69 79 65 2e 31 32 35 34 } >> XXX debug txt: setup_locale_encoding = { T u r k i s h _ T ? r >> k i y e . 1 2 5 4 } >> The database cluster will be initialized with locale >> "Turkish_Türkiye.1254". >> The default text search configuration will be set to "turkish". >> >> Data page checksums are disabled. >> >> Enter new superuser password: >> Enter it again: >> >> fixing permissions on existing directory c:/Program >> Files/PostgreSQL/15/data ... ok >> creating subdirectories ... ok >> selecting dynamic shared memory implementation ... windows >> selecting default max_connections ... 100 >> selecting default shared_buffers ... 128MB >> selecting default time zone ... UTC >> creating configuration files ... ok >> running bootstrap script ... ok >> performing post-bootstrap initialization ... ok >> syncing data to disk ... ok >> >> Success. You can now start the database server using: >> >> pg_ctl -D ^"c^:^\Program^ Files^\PostgreSQL^\15^\data^" -l logfile >> start >> > > > > v16: > >> C:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A >> scram-sha-256 -U postgres -D "c:\Program Files\P
Re: Windows installation problem at post-install step
On 2024-08-06 16:06:51 +0530, Sandeep Thakkar wrote:
> On Tue, Aug 6, 2024 at 10:57 AM Thomas Munro wrote:
> > On Mon, Aug 5, 2024 at 8:50 PM Sandeep Thakkar wrote:
> > This issue is seen only on v16 and not the back branches (tested on 15
> > and 14) and also confirmed by @Ertan Küçükoglu at
> > https://github.com/EnterpriseDB/edb-installers/issues/127#issuecomment-2268371442
> >
> > Does that mean you can reproduce the problem with initdb.exe directly
> > in a shell? That is, remove the EDB installer from the picture and
> > compare v15 and v16 with the exact command line options that
> > initcluster.vbs is using, or perhaps just:
> >
> > initdb.exe --locale="Turkish,Türkiye" --encoding=UTF-8 -D pgdata
>
> yes, here is the output:
>
> c:\Program Files\PostgreSQL\16\bin>initdb.exe --encoding=UTF-8 -A
> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\16\data" --locale
> "Turkish,Türkiye" -W
> The files belonging to this database system will be owned by user "sandeep".
> This user must also own the server process.
>
> The database cluster will be initialized with locale
> "Turkish_Türkiye.1254".

I assume that "1254" here is the code page. But you specified --encoding=UTF-8 above, so your default locale uses a different encoding than the template databases. I would expect that to cause problems if the template databases contain any characters where the encodings differ (such as "ü" in the locale name).

> c:\Program Files\PostgreSQL\15\bin>initdb.exe --encoding=UTF-8 -A
> scram-sha-256 -U postgres -D "c:\Program Files\PostgreSQL\15\data" --locale
> "Turkish,Türkiye" -W
> XXX debug raw: getopt optarg = "Turkish,Türkiye"
> XXX debug hex: getopt optarg = { 54 75 72 6b 69 73 68 2c 54 fc 72 6b 69 79 65 }
> XXX debug txt: getopt optarg = { T u r k i s h , T ? r k i y e }

This also looks like windows-1254 (or at least some ISO-8859 variant) to me.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
libpq version macro to use or not PQsocketPoll
Hi. Now that v17beta2 is part of my dev-env, I'm trying out the new API.
And I discover there's no version macro for conditional compilation in LIBPQ...

I found https://www.postgresql.org/message-id/968815.1623936849%40sss.pgh.pa.us where Tom thinks that's a horrible idea, and instead proposes a new _HAS_ flag. Except of course there's no such flag for the new PQsocketPoll in beta2... I see only 3 such _HAS_ flags.

I'm not sure what's so wrong with version macro as such.
- GCC has one.
- [SQLite has one][1].
- Why can't LIBPQ have one?

That's exactly the kind of issue they are designed to solve for code that needs to be portable across versions. What am I missing?

Thanks, --DD

[1]: https://www.sqlite.org/c3ref/c_source_id.html
Re: libpq version macro to use or not PQsocketPoll
Dominique Devienne writes:
> Hi. Now that v17beta2 is part of my dev-env, I'm trying out the new API.
> And I discover there's no version macro for conditional compilation in
> LIBPQ...

Indeed, that's an oversight, and there's a number of other things we added to libpq-fe.h between 16 and 17 that probably deserve their own LIBPQ_HAS symbols.

> I'm not sure what's so wrong with version macro as such.

A couple of things:

* It doesn't help if we forget to update it, so it's not really better than HAS_ symbols on that score.
* Usage is error-prone (you might test for the wrong cutoff version) and not very readable.
* We can't retroactively make such a symbol appear in old copies of libpq-fe.h.

regards, tom lane
Re: libpq version macro to use or not PQsocketPoll
On Tue, Aug 6, 2024 at 4:31 PM Tom Lane wrote:
> Dominique Devienne writes:
> > Hi. Now that v17beta2 is part of my dev-env, I'm trying out the new API.
> > And I discover there's no version macro for conditional compilation in
> > LIBPQ...
>
> Indeed, that's an oversight, and there's a number of other things
> we added to libpq-fe.h between 16 and 17 that probably deserve
> their own LIBPQ_HAS symbols.

OK, thanks. Glad to know they'll be in beta3 or the final release.

> > I'm not sure what's so wrong with version macro as such.
>
> A couple of things:
> * It doesn't help if we forget to update it, so it's not really
> better than HAS_ symbols on that score.

I think it's easier to track/spot as part of the release process, and probably even to automate it (or its checking). It's also a very common practice. If DRH does it for SQLite, then it has to be OK :)

> * Usage is error-prone (you might test for the wrong cutoff version)
> and not very readable.

That's the client's responsibility. The "vendor" just has to increment/update the macros on every release.

> * We can't retroactively make such a symbol appear in old copies
> of libpq-fe.h.

Sure. But we can start at some point. And eventually assume the macros exist, because older versions w/o them are no longer supported by your client code. And each client code base decides that. Again, the "vendor" just has to start defining the macros, and keep updating them. That's all. The rest is up to the client.

My $0.02. I can use _HAS_ flags too; they are orthogonal in the first place, in fact. We can have both. Also, version macros don't multiply the way _HAS_ macros do over time. It's never too late to start adding them :). --DD
data checksums
Hello. I've been dealing with some database corruption events, so I've been wondering whether to enable data checksums on my deployments.

So my question is: why aren't data checksums enabled by default on pg? The pg doc mentions a considerable performance penalty; how considerable is it?

Thanks
--
Bruno Vieira da Silva
Re: data checksums
On Tue, Aug 6, 2024 at 11:12 AM bruno vieira da silva wrote:
> Hello.
> I've been dealing with some database corruption events, so I've been
> wondering whether to enable data checksums on my deployments.
>
> So my question is: why aren't data checksums enabled by default on pg? The
> pg doc mentions a considerable performance penalty; how considerable is it?

I'm going to hazard a guess that the relevant line from the docs was written many, many years ago, back when servers were much slower. We certainly didn't notice any speed degradation when going from PG 9.6.24 to PG 14.12. Or... we just use enough hardware and flash-only SAN disks to make the overhead minuscule.

--
Death to America, and butter sauce. Iraq lobster!
Re: data checksums
> On Aug 6, 2024, at 08:11, bruno vieira da silva wrote:
>
> So my question is: why aren't data checksums enabled by default on pg?

At this point, mostly historical reasons. They're also superfluous if your underlying file system or storage hardware does storage-level corruption checks (which most don't).

> The pg doc mentions a considerable performance penalty; how considerable is it?

That line is probably somewhat out of date at this point. We haven't seen a significant slowdown in enabling them on any modern hardware. I always turn them on, except on the type of filesystems/hardware mentioned above.
Re: data checksums
The penalty is not "considerable", but it is not completely trivial either. But it's more on the trivial side. Part of the problem is that it is hard to measure, as it is very workload dependent. As to why it is not the default, Postgres is very careful and conservative by default, and not everybody was convinced that enabling checksums is worth the tradeoff, especially (IIRC) with the lack of real-world examples of people discovering issues thanks to these checksums. So yes, please enable and share with us if the checksums catch anything. I think the argument is worth re-opening again, because (as far as I know), all of the companies providing Postgres support, whether completely cloud-managed, setting up a complex HA cluster, or just providing tuning advice, have enabled checksums by default for many, many years. So a big +1 from me to finally making it the default. It's a de-facto default anyway at this point. Cheers, Greg
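For anyone who wants to check their own clusters, these are quick ways to confirm whether checksums are on and whether any failures have been recorded (a sketch; the checksum_failures counters in pg_stat_database exist on v12 and later):

    -- Is the cluster running with data checksums?
    SHOW data_checksums;

    -- Have any checksum failures been detected since the stats were last reset?
    SELECT datname, checksum_failures, checksum_last_failure
    FROM pg_stat_database
    WHERE checksum_failures > 0;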
Re: Windows installation problem at post-install step
On Tue, Aug 6, 2024 at 10:38 PM Sandeep Thakkar wrote: > On Tue, Aug 6, 2024 at 4:06 PM Sandeep Thakkar > wrote: [v15] >>> XXX debug raw: setup_locale_encoding = "Turkish_Türkiye.1254" >>> XXX debug hex: setup_locale_encoding = { 54 75 72 6b 69 73 68 5f 54 fc 72 >>> 6b 69 79 65 2e 31 32 35 34 } >>> XXX debug txt: setup_locale_encoding = { T u r k i s h _ T ? r >>> k i y e . 1 2 5 4 } >>> The database cluster will be initialized with locale "Turkish_Türkiye.1254". [v16] >>> XXX debug raw: setup_locale_encoding = "Turkish_Türkiye.1254" >>> XXX debug hex: setup_locale_encoding = { 54 75 72 6b 69 73 68 5f 54 fc 72 >>> 6b 69 79 65 2e 31 32 35 34 } >>> XXX debug txt: setup_locale_encoding = { T u r k i s h _ T ? r >>> k i y e . 1 2 5 4 } >>> The database cluster will be initialized with locale "Turkish_Türkiye.1254". OK so we see that the "Turkish,Türkiye" -> "Turkish_Türkiye.1254" transformation happens in the "cannonname" step, and then the final values are identical between the versions. >>> performing post-bootstrap initialization ... child process was terminated >>> by exception 0xC409 If I understand correctly that's abort(), so can you please run it with -d so we can maybe see some information about that? Also -n might be useful to stop it from deleting the data directory at the end in case something useful for diagnosis is in there.
Standard of data storage and transformation
Hi All,

We have a use case in which we have transaction data for multiple customers in one of our Postgres databases (version 15.4), and we are consuming it from multiple sources (batch file processing, Kafka event processing, etc.). It's currently stored in normalized form in Postgres with constraints, indexes, and partitions defined. This Postgres database holds the transaction data for around a month or so. There are use cases of running online transaction search reports, which will be mostly real-time reporting, and also some daily transaction batch reports based on customers, and month-end reports for customers. In the target state it will hold approx. ~400 million transactions/day, which can be billions of rows across multiple related parent/child tables.

There is another requirement to send this customer transaction data to an OLAP system in a Snowflake database, where it will be persisted for many years. The lag between the data in Postgres/OLTP and in Snowflake will be ~1hr. Any reporting API can query Postgres for <1 month worth of transaction data; if it needs to scan >1 month worth of transaction data, it will point to the Snowflake database.

Now the question we are wondering about is: should we send the data as-is, in normalized table form, to Snowflake and then transform/flatten it there to support the reporting use cases, or should we first flatten or transform the data in Postgres itself into another structure (for example, creating materialized views on top of the base tables) and only then move that data to Snowflake? What is the appropriate standard, and what is the downside if we do anything different?

Regards
Yudhi
Re: Windows installation problem at post-install step
On Tue, Aug 6, 2024 at 11:44 PM Peter J. Holzer wrote:
> I assume that "1254" here is the code page.
> But you specified --encoding=UTF-8 above, so your default locale uses a
> different encoding than the template databases. I would expect that to
> cause problems if the template databases contain any characters where
> the encodings differ (such as "ü" in the locale name).

It's weird, but on Windows, PostgreSQL allows UTF-8 encoding with any locale, and thus apparent contradictions:

    /* See notes in createdb() to understand these tests */
    if (!(locale_enc == user_enc ||
          locale_enc == PG_SQL_ASCII ||
          locale_enc == -1 ||
    #ifdef WIN32
          user_enc == PG_UTF8 ||
    #endif
          user_enc == PG_SQL_ASCII))
    {
        pg_log_error("encoding mismatch");
        ...

and createdb's comments say that is acceptable because:

     * 3. selected encoding is UTF8 and platform is win32. This is because
     * UTF8 is a pseudo codepage that is supported in all locales since it's
     * converted to UTF16 before being used.

At the time PostgreSQL was ported to Windows, UTF-8 was not a supported encoding in "char"-based system interfaces like strcoll_l(), and the port had to convert to "wchar_t" interfaces and call (in that example) wcscoll_l(). On modern Windows it is, and there are two locale names, with and without the ".UTF-8" suffix (cf. glibc systems that have "en_US" and "en_US.UTF-8", where the suffix-less version uses whatever traditional encoding was used for that language before UTF-8 ate the world). If we were doing the Windows port today, we'd probably not have that special case for Windows, and we wouldn't have the wchar_t conversions. Then I think we'd allow only:

    --locale=tr-TR (defaults to --encoding=WIN1254)
    --locale=tr-TR --encoding=WIN1254
    --locale=tr-TR.UTF-8
    --locale=tr-TR.UTF-8 --encoding=UTF-8

If we come up with an automated (or even manual but documented) way to perform the "Turkish_Türkiye.1254" -> "tr-TR" upgrade as Dave was suggesting upthread, we'll probably want to be careful to tidy up these contradictory settings. For example, I guess that American databases initialised by EDB's installer must be using --locale="English_United States.1252" and --encoding=UTF-8, and should be changed to "en-US.UTF-8", while those initialised by letting initdb.exe pick the encoding must be using --locale="English_United States.1252" and --encoding=WIN1252 (implicit) and should be changed to "en-US" to match the WIN1252 encoding. Only if we did that update would we be able to consider removing the extra UTF-16 conversions that are happening very frequently inside PostgreSQL code, which is a waste of CPU cycles and programmer sanity.

(But that's all just speculation from studying the locale code -- I've never really used Windows.)
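As a side note, for anyone auditing an existing cluster for this kind of locale/encoding mismatch, something like the following lists each database's encoding and locale as stored in the catalog (a sketch; these columns are present in recent releases):

    SELECT datname,
           pg_encoding_to_char(encoding) AS encoding,
           datcollate,
           datctype
    FROM pg_database;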
Re: Standard of data storage and transformation
On Tue, Aug 6, 2024 at 5:07 PM yudhi s wrote:
> Hi All,
> We have a use case in which we have transaction data for multiple customers
> in one of our Postgres databases (version 15.4), and we are consuming it from
> multiple sources (batch file processing, Kafka event processing, etc.). It's
> currently stored in normalized form in Postgres with constraints, indexes,
> and partitions defined. This Postgres database holds the transaction data for
> around a month or so. There are use cases of running online transaction
> search reports, which will be mostly real-time reporting, and also some daily
> transaction batch reports based on customers, and month-end reports for
> customers. In the target state it will hold approx. ~400 million
> transactions/day, which can be billions of rows across multiple related
> parent/child tables.
>
> There is another requirement to send this customer transaction data to an
> OLAP system in a Snowflake database, where it will be persisted for many
> years. The lag between the data in Postgres/OLTP and in Snowflake will be
> ~1hr. Any reporting API can query Postgres for <1 month worth of transaction
> data; if it needs to scan >1 month worth of transaction data, it will point
> to the Snowflake database.
>
> Now the question we are wondering about is: should we send the data as-is, in
> normalized table form, to Snowflake and then transform/flatten it there to
> support the reporting use cases, or should we first flatten or transform the
> data in Postgres itself into another structure (for example, creating
> materialized views on top of the base tables) and only then move that data to
> Snowflake? What is the appropriate standard, and what is the downside if we
> do anything different?

Some thoughts:

0) How big are the records?
1) Materialized views add disk space overhead.
2) Materialized views are for when you query the same static data over and over again.
3) IIUC, you'll be moving the data from PG to Snowflake just once.
4) Writing an MV to disk and then reading it only once doubles the IO requirements.
5) Thus, my first thought would be to extract the data from PG using a denormalizing "plain" view (see the sketch after this list).
5a) If you can't make that Fast Enough, then obviously you must pull the normalized data from PG and denorm it elsewhere. You know your situation better than us.
6) Indices will be critical: not too many, but not too few.
7) Obviously consider partitioning, but note that too many partitions can make query planning MUCH slower.
7a) 31 days cut into hours means 744 partitions. That's a LOT.
7b) Partitioning every THREE hours means only 248 child tables. A lot, but much more manageable.
7c) That might well kill reporting performance, though, if it's for example one customer across many partitions.
8) You (hopefully) know what kind of queries will be run. Maybe partition by customer (or whatever) range and THEN by an hour range.
8a) You'd have to simultaneously run multiple extract jobs (one for each "customer" range), but that might not be too onerous, since then each hour partition would be smaller.
9) Testing. Nothing beats full-scale testing.

--
Death to America, and butter sauce! Iraq lobster...
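To make item 5 concrete, here is a minimal sketch of what a denormalizing "plain" view might look like, assuming hypothetical parent/child tables named transactions and transaction_lines with the columns shown (none of these names come from the original post):

    -- Hypothetical flattening view, used only by the extract job
    CREATE VIEW transactions_flat AS
    SELECT t.txn_id,
           t.customer_id,
           t.txn_ts,
           t.status,
           l.line_no,
           l.product_code,
           l.amount
    FROM transactions t
    JOIN transaction_lines l ON l.txn_id = t.txn_id;

The extract job can then read one slice at a time, e.g. WHERE t.txn_ts >= ... AND t.txn_ts < ..., so that each pull lines up with the partition ranges discussed above.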
Streaming replication issue post upgrade from version 11 to 14 on windows 2016 Server
Hi Team,

We have upgraded PostgreSQL from version 11 to 14 on Windows Server 2016. Post-upgrade we configured streaming replication, but the data is not getting replicated. Compared to version 11, configuring replication on 14 is quite different. We tried the documentation from the Postgres site with no luck, and many other sites' notes on Windows-side replication configuration, also with no luck. Any help is appreciated.

Thank you
vamshi.chava
Re: data checksums
On Tue, 2024-08-06 at 09:29 -0700, Christophe Pettus wrote:
> > On Aug 6, 2024, at 08:11, bruno vieira da silva wrote:
> >
> > So my question is: why aren't data checksums enabled by default on pg?
>
> At this point, mostly historical reasons. They're also superfluous if your
> underlying file system or storage hardware does storage-level corruption
> checks (which most don't).

I am surprised by that. Would you say that most storage systems will happily give you a garbage block if there was a hardware problem somewhere?

> > The pg doc mentions a considerable performance penalty; how considerable is it?
>
> That line is probably somewhat out of date at this point. We haven't seen a
> significant slowdown in enabling them on any modern hardware. I always turn
> them on, except on the type of filesystems/hardware mentioned above.

Turning data checksums on will write WAL for hint bits, which can significantly increase the amount of WAL written.

Yours,
Laurenz Albe
Re: data checksums
> On Aug 6, 2024, at 19:45, Laurenz Albe wrote: > I am surprised by that. Would you say that most storage systems will happily > give you a > garbage block if there was a hardware problem somewhere? "Most" is hard for me to judge. HDDs can have uncorrected and undetected errors, definitely. ZFS, for example, can correct those (within limits); XFS doesn't try. I have been told that SSDs can have uncorrected/undetected errors as well, but I don't know details on that. > Turning data checksums on will write WAL for hint bits, which can > significantly increase > the amount of WAL written. I was curious about that, so I just did a quick experiment using pgbench, with identical databases except for checksums. They both generated the same amount of WAL within 10% or so, so I don't think the impact is huge. (And you need the hint bits for pg_rewind, which is a nice thing to have in your back pocket if required.)
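For anyone who wants to repeat that comparison on their own workload, one rough way to measure the WAL generated by a test run is to note the WAL position before and after it (a sketch; the LSN literal below stands in for whatever the first query returned):

    -- Note the WAL position before the test run
    SELECT pg_current_wal_lsn();   -- suppose it returns 0/3000148

    -- ... run the same workload against each cluster ...

    -- Then compute how much WAL was written in between
    SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000148'));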
ANALYZE on partitioned tables vs on individual partitions
Hello Experts,

Our application has a database with a large number of partitioned tables used to store time series data. It is partitioned by time: new data is populated into the current partition, old partitions eventually get dropped, and the partitions in between are largely static.

I had not realized until recently that the partitioned tables themselves are not analyzed by the autovacuum processes. I am now wondering if we should be manually analyzing those. The problem is they are quite enormous and take a long time to analyze - I tested one yesterday and it took ~30mins, and we have hundreds of them. It might not be practical to regularly analyze them.

My questions are:

1. What is the consequence of not having good statistics on partitioned table level, if you do have good statistics on the partition level?
2. Which planning operations rely on partitioned table level statistics?

My reason for asking is I'm trying to understand the tradeoff between the IO cost associated with frequently analyzing as opposed to possibly non-optimal plans being chosen.

Thanks in advance, and sorry if this has all been covered before. I could not find anything definitive in the docs, and while I did find a few references by googling, none really explained the answers to those questions too well.

Regards
Mike
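One quick way to see how the parent and its partitions are currently being treated is to compare the analyze timestamps in pg_stat_user_tables (a sketch; "measurements" is a hypothetical partitioned table name):

    SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname LIKE 'measurements%'
    ORDER BY relname;

The leaf partitions should show recent last_autoanalyze times, while the parent's timestamps only move when someone runs a manual ANALYZE on it.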
Re: Streaming replication issue post upgrade from version 11 to 14 on windows 2016 Server
Hi Vamsi,

We have been doing these steps to set up streaming replication:

-- Primary server

    CREATE ROLE repl REPLICATION LOGIN PASSWORD 'your_password';

postgresql.conf:

    listen_addresses = '*'
    wal_level = replica
    archive_mode = on
    archive_command = 'copy %p \\path_to_archive\\%f'
    max_wal_senders = 3
    wal_keep_size = 1024

pg_hba.conf:

    host  replication  repl  standby_server_ip/32  auth_method

-- Creating the standby

    PGPASSWORD=repl pg_basebackup -F p -C -S pslot1 -h localhost -p 5333 -D path_to_standby_data_directory -U repl -P -v -R -X stream

-- Standby postgresql.conf

    primary_conninfo = 'host=primary_server_ip port=5432 user=repl password=your_password'
    restore_command = 'copy \\path_to_archive\\%f %p'   # if you are using archive logs

Hope I did not forget any step.

Regards,
Muhammad Ikram

On Wed, Aug 7, 2024 at 6:11 AM Vamsi Chava wrote:
> Hi Team,
>
> We have upgraded PostgreSQL from version 11 to 14 on Windows Server 2016.
> Post-upgrade we configured streaming replication, but the data is not
> getting replicated. Compared to version 11, configuring replication on 14
> is quite different. We tried the documentation from the Postgres site with
> no luck, and many other sites' notes on Windows-side replication
> configuration, also with no luck. Any help is appreciated.
>
> Thank you
> vamshi.chava

--
Muhammad Ikram
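Once the standby is up, these queries are a quick way to confirm whether WAL is actually flowing (a sketch; run the first on the primary and the second on the standby):

    -- On the primary: is the standby connected and streaming?
    SELECT client_addr, state, sent_lsn, replay_lsn
    FROM pg_stat_replication;

    -- On the standby: is it in recovery, and how far has it received/replayed?
    SELECT pg_is_in_recovery(),
           pg_last_wal_receive_lsn(),
           pg_last_wal_replay_lsn();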
Re: ANALYZE on partitioned tables vs on individual partitions
> On Aug 6, 2024, at 21:13, Michael Harris wrote: > > 1. What is the consequence of not having good statistics on partitioned > table level, if you do have good statistics on the partition level? The child partitions are just tables, and all of the same issues that can arise from bad statistics on a table can arise on them: specifically, bad query plans. (There are also some consequences to leaving a table unvacuumed, even an append-only table.) The root table in declarative partitioning has no rows, so there really is no such thing as "statistics at the partition level." > 2. Which planning operations rely on partitioned table level statistics? Just about every query uses the information that ANALYZE gathers in some way. If the query can't be executed in any other conceivable way than a sequential scan, or if it is a simple lookup on a unique column, the statistics will not contribute to the plan, but essentially every other plan will use them in one way or another. Child partitions should be autovacuumed and autoanalyzed just like any other table; they are not prohibited from autovacuum in any way by default. It's probably a good idea to investigate why they are not being picked up by autovacuum. If they are created by a bulk load process, it's not a bad idea to do a VACUUM ANALYZE on them once the bulk load is complete.
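As a minimal illustration of the post-load step suggested above (the partition name here is hypothetical):

    -- After the bulk load into the newest partition completes
    VACUUM (ANALYZE, VERBOSE) measurements_2024_08;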
Re: ANALYZE on partitioned tables vs on individual partitions
On Wed, 7 Aug 2024 at 16:44, Christophe Pettus wrote:
> Child partitions should be autovacuumed and autoanalyzed just like any other
> table; they are not prohibited from autovacuum in any way by default. It's
> probably a good idea to investigate why they are not being picked up by
> autovacuum. If they are created by a bulk load process, it's not a bad idea
> to do a VACUUM ANALYZE on them once the bulk load is complete.

I think the complaint was about no autovacuum on the partitioned table, not the partitions. This is expected, as we don't track the counters (in particular n_mod_since_analyze) shown in pg_stat_all_tables at the partitioned table level, so the trigger points that normally cause autovacuum to analyze or vacuum a table just won't be triggered for a partitioned table. For VACUUM, that's fine as, as you mentioned, no rows are stored. But for analyze, that does present a problem.

To name the aspects of planning that rely on statistics of the partitioned table: basically anything above the Append or MergeAppend which joins the partitioned results together. So that doesn't include the scans of each partition and any quals that are pushed down to the scan level, as those are able to use the partition-level statistics. However, it does include things like joins, group by, and distinct, as those require n_distinct estimates for the partitioned table.

It's not all bad though, as the row estimates for each individual partition will be totalled up through the Append / MergeAppend simply by adding up the row estimates for each Append / MergeAppend child plan. So, it's really only an estimation problem for any node that comes after a join node or a group by node, as the output rows for those nodes will depend on a good n_distinct estimate for the partitioned table.

Partition-wise joins and aggregates do change things a bit, as those features do permit moving those operations below the Append / MergeAppend, in which case the statistics for the individual partitions can be used.

You could consider manually setting the n_distinct_inherited estimates for the columns that you join on or group by in the partitioned table. You might find that you're able to choose a suitable value for that if you review the documentation for that setting. In particular, please review what is mentioned about using negative numbers for that setting. You may be able to choose a value that scales correctly with the row estimate that doesn't get outdated as you add more rows to the partitions. You'll need to determine that based on the data you're storing.

David
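A minimal sketch of the manual override described above, using a hypothetical partitioned table "measurements" joined/grouped on "device_id" (the -0.05 is only an illustration, meaning "assume one distinct value per 20 rows"; choose a value based on your own data, per the documentation mentioned):

    -- Have the planner estimate distinct device_id values across all partitions
    -- as 5% of the total row count (negative values scale with the row count).
    ALTER TABLE measurements
        ALTER COLUMN device_id SET (n_distinct_inherited = -0.05);

    -- A manual ANALYZE of the partitioned table is still needed to rebuild the
    -- inheritance-level statistics used above the Append / MergeAppend.
    ANALYZE measurements;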