Re: Unexpected behavior with transition tables in update statement trigger
On Tue, Feb 27, 2018 at 4:18 AM, Tom Kazimiers wrote:
> On Mon, Feb 26, 2018 at 11:15:44PM +1300, Thomas Munro wrote:
>> On Sat, Feb 24, 2018 at 4:47 PM, Tom Kazimiers wrote:
>> Thanks for the reproducer.  Yeah, that seems to be a bug.
>> nodeNamedTuplestorescan.c allocates a new read pointer for each
>> separate scan of the named tuplestore, but it doesn't call
>> tuplestore_select_read_pointer() so that the two scans that appear in
>> your UNION ALL plan are sharing the same read pointer.  At first
>> glance the attached seems to fix the problem, but I'll need to look
>> more carefully tomorrow.
>
> Thanks very much for investigating this. I can confirm that applying your
> patch results in the tuples I expected in both my test trigger and my
> actual trigger function.

Thanks for testing.

> It would be great if this or a similar fix would make it into the next
> official release.

Here's a new version with tuplestore_select_read_pointer() added in
another place where it was lacking, and commit message.  Moving to
-hackers, where patches go.

Here's a shorter repro.  On master it prints:

NOTICE:  count = 1
NOTICE:  count union = 1

With the patch the second number is 2, as it should be.

CREATE TABLE test (i int);
INSERT INTO test VALUES (1);

CREATE OR REPLACE FUNCTION my_trigger_fun() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
  RAISE NOTICE 'count = %', (SELECT COUNT(*) FROM new_test);
  RAISE NOTICE 'count union = %',
    (SELECT COUNT(*)
       FROM (SELECT * FROM new_test
             UNION ALL
             SELECT * FROM new_test) ss);
  RETURN NULL;
END;
$$;

CREATE TRIGGER my_trigger
  AFTER UPDATE ON test
  REFERENCING NEW TABLE AS new_test OLD TABLE AS old_test
  FOR EACH STATEMENT EXECUTE PROCEDURE my_trigger_fun();

UPDATE test SET i = i;

--
Thomas Munro
http://www.enterprisedb.com

Attachment: 0001-Fix-tuplestore-read-pointer-confusion-in-nodeNamedtu.patch (binary data)
Re: Unexpected behavior with transition tables in update statement trigger
On Wed, Feb 28, 2018 at 9:58 AM, Tom Lane wrote:
> Thomas Munro writes:
>> Here's a new version with tuplestore_select_read_pointer() added in
>> another place where it was lacking, and commit message.  Moving to
>> -hackers, where patches go.
>
> Pushed, along with a regression test based on your example.
> Unfortunately, this came in a bit too late for this week's releases :-(

Thanks!

Tom K, if you need a workaround before 10.4 comes out in May[1], you
could try selecting the whole transition table into a CTE up front.
Something like

  WITH my_copy AS (SELECT * FROM new_table)
  SELECT * FROM my_copy
  UNION ALL
  SELECT * FROM my_copy

should work.

[1] https://www.postgresql.org/developer/roadmap/

--
Thomas Munro
http://www.enterprisedb.com
Re: Concurrent CTE
On Wed, Apr 4, 2018 at 8:01 AM, Artur Formella wrote:
> Execution now:
> time-->
> Thread1: aa | bb | cc | dd | ee | ff | gg | hh | primary
>
> And the question: is it possible to achieve more concurrent execution plan
> to reduce the response time? For example:
> Thread1: aa | dd | ff | primary
> Thread2: bb | ee | gg
> Thread3: cc | -- | hh

Parallel query can't be used for CTE queries currently.  Other good
things also don't happen when you use CTEs -- a CTE is an "optimiser
fence" (though there is discussion of changing that eventually).
Maybe try rewriting your query as:

  SELECT ...
  FROM (SELECT ...) AS aa,
       (SELECT ...) AS bb,
       ...

Note that in the form of parallelism supported in PostgreSQL 10, every
process (we use processes instead of threads) runs the same execution
plan at the same time, but each worker is given only a part of the
problem at disk block granularity, so it looks more like this:

Process1: fragments of aa | fragments of bb | ...
Process2: fragments of aa | fragments of bb | ...

PostgreSQL 11 (not yet released) will introduce an exception that looks
more like what you showed: the Parallel Append operator (for unions and
scans of partitions) can give each worker a different part of the plan,
approximately as you showed, but IIUC that's used as a fallback strategy
when it can't use block granularity (because of technical restrictions).
The problem with sub-plan granularity is that the various sub-plans can
finish at different times, leaving some CPU cores with nothing to do
while others are still working, whereas block granularity keeps everyone
busy until the work is done and should finish faster.
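To make the suggested rewrite concrete, here's a minimal sketch with
made-up table and column names, showing the same aggregation written
first with CTEs and then as plain subqueries that the planner is free
to optimise and parallelise:

  -- CTE form: each WITH clause is an optimiser fence and can't use
  -- parallel workers
  WITH aa AS (SELECT id, sum(x) AS total FROM t1 GROUP BY id),
       bb AS (SELECT id, max(y) AS best FROM t2 GROUP BY id)
  SELECT aa.total, bb.best
  FROM aa JOIN bb USING (id);

  -- equivalent subquery form: no fence, and parallel plans are possible
  SELECT aa.total, bb.best
  FROM (SELECT id, sum(x) AS total FROM t1 GROUP BY id) AS aa
  JOIN (SELECT id, max(y) AS best FROM t2 GROUP BY id) AS bb USING (id);

--
Thomas Munro
http://www.enterprisedb.com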
Re: Concurrent CTE
On Thu, Apr 5, 2018 at 5:16 PM, David G. Johnston wrote: > On Wed, Apr 4, 2018 at 10:12 PM, Thomas Munro > wrote: >> >> Parallel query can't be used for CTE queries currently. > > A pointer to the location in the docs covering this limitation would be > appreciated. It isn't covered here: > > https://www.postgresql.org/docs/10/static/when-can-parallel-query-be-used.html https://www.postgresql.org/docs/10/static/parallel-safety.html -- Thomas Munro http://www.enterprisedb.com
Re: pg_multixact/members growing
On Wed, May 23, 2018 at 7:49 AM, Tom Lane wrote:
> Tiffany Thang writes:
>> Our pg_multixact/members directory has been growing to more than 18GB over
>> the last couple of months. According to the documentation, the files in
>> there are used to support row locking by multiple transactions and when all
>> tables in all databases are eventually scanned by VACUUM, the older
>> multixacts are removed. In our case, the files are not removed.
>
> Hmm.  What does pg_controldata tell you about NextMultiXactId,
> NextMultiOffset, oldestMultiXid, oldestMulti's DB?
> Are pg_clog/ or pg_subtrans/ or pg_multixact/offsets/ getting large?
> Is there anything at all in pg_twophase/?  Is this system a replication
> master, and if so are any of its slaves lagging behind?

Some thoughts:

There are MULTIXACT_MEMBERS_PER_PAGE = 1636 members for every 8KB page.
The reported directory size implies 18GB / 8KB * 1636 = 3,859,808,256
members.  Above MULTIXACT_MEMBER_SAFE_THRESHOLD = 2,147,483,647 we
should be triggering emergency autovacuums to try to reclaim space.
Only ~435 million more members can be created.  Is this system now
aggressively running "wraparound prevention" autovacuums?

There are MULTIXACT_OFFSETS_PER_PAGE = 2048 multixacts for every 8KB
page, so the default autovacuum_multixact_freeze_max_age should
soft-cap the size of pg_multixact/offsets at around 1.5GB ~=
400,000,000 / 2048 * 8KB.  Unfortunately
autovacuum_multixact_freeze_max_age doesn't impose any limit on the
number of members.

The totals can be quite explosive with high numbers of backends,
because when n backends share lock a row we make O(n) multixacts and
O(n^2) members.  First we make a multixact with 2 members, then a new
one with 3 members, etc... so that's n - 1 multixacts and
(n * (n + 1)) / 2 - 1 members.
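If it's useful, here's a rough way to keep an eye on this from SQL:
mxid_age() (available in 9.5+) shows how far each database's oldest
multixact ID is behind the current one.  It tracks multixact IDs rather
than member space, but it gives an idea of whether freezing is keeping
up:

  SELECT datname,
         datminmxid,
         mxid_age(datminmxid) AS multixacts_behind
  FROM pg_database
  ORDER BY 3 DESC;

--
Thomas Munro
http://www.enterprisedb.com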
Re: Can PostgreSQL create new WAL files instead of reusing old ones?
On Fri, Jun 22, 2018 at 11:22 AM, David Pacheco wrote:
> On Wed, Jun 20, 2018 at 10:35 AM, Jerry Jelinek wrote:
>> I have implemented a small change to allow WAL recycling to be disabled.
>> It is visible at:
>> https://cr.joyent.us/#/c/4263/
>>
>> I'd appreciate getting any feedback on this.
>> tl;dr: We've found that under many conditions, PostgreSQL's re-use of old
>> WAL files appears to significantly degrade query latency on ZFS.

I haven't tested it, but it looks reasonable to me.  It needs
documentation in doc/src/sgml/config.sgml.  It should be listed in
src/backend/utils/misc/postgresql.conf.sample.  We'd want a patch
against our master branch.  Could you please register it in
commitfest.postgresql.org so we don't lose track of it?

Hey, a question about PostgreSQL on ZFS: what do you guys think about
pg_flush_data() in fd.c?  It does mmap(), msync(), munmap() to try to
influence writeback.  I wonder if at least on some operating systems
that schlepps a bunch of data out of ZFS ARC into OS page cache, kinda
trashing the latter?

--
Thomas Munro
http://www.enterprisedb.com
Re: incorrect resource manager data checksum in record
On Fri, Jun 29, 2018 at 5:44 AM, Devin Christensen wrote: > The pattern is the same, regardless of ubuntu or postgresql versions. I'm > concerned this is somehow a ZFS corruption bug, because the error always > occurs downstream of the first ZFS node and ZFS is a recent addition. I > don't know enough about what this error means, and haven't found much > online. When I restart the nodes effected, replication resumes normally, > with no known side-effects that I've discovered so far, but I'm no longer > confident that the data downstream from the primary is valid. Really not > sure how best to start tackling this issue, and hoping to get some guidance. > The error is infrequent. We have 11 total replication chains, and this error > has occurred on 5 of those chains in approximately 2 months. It's possible and sometimes expected to see that error when there has been a crash, but you didn't mention that. From your description it sounds like it's happening in the middle of streaming, right? My first thought was that the filesystem change is surely a red herring. But... I did find this similar complaint that involves an ext4 primary and a btrfs replica: https://dba.stackexchange.com/questions/116569/postgresql-docker-incorrect-resource-manager-data-checksum-in-record-at-46f-6 I'm having trouble imagining how the filesystem could be triggering a problem though (unless ZoL is dramatically less stable than on other operating systems, "ZFS ate my bytes" seems like a super unlikely theory). Perhaps by being slower, it triggers a bug elsewhere? We did have a report recently of ZFS recycling WAL files very slowly (presumably because when it moves the old file to become the new file, it finishes up slurping it back into memory even though we're just going to overwrite it, and it can't see that because our writes don't line up with the ZFS record size, possibly unlike ye olde write-in-place 4k block filesystems, but that's just my guess). Does your machine have ECC RAM? -- Thomas Munro http://www.enterprisedb.com
Re: incorrect resource manager data checksum in record
On Fri, Jun 29, 2018 at 1:14 PM, Devin Christensen wrote:
>> From your description it sounds like it's happening in the middle of
>> streaming, right?
>
> Correct. None of the instances in the chain experience a crash. Most of the
> time I see the "incorrect resource manager data checksum in record" error,
> but I've also seen it manifested as:
>
> invalid magic number 8813 in log segment 0003AEC2009C, offset
> 15335424

I note that that isn't at a segment boundary.  Is that also the case
for the other error?

One theory would be that there is a subtle FS cache coherency problem
between writes and reads of a file from different processes
(causality), on that particular stack.  Maybe not too many programs
pass data through files with IPC to signal progress in this kinda funky
way, but that'd certainly be a violation of POSIX if it didn't work
correctly, and I think people would know about that, so I feel a bit
silly suggesting it.  To follow that hypothesis to the next step: I
suppose it succeeds after you restart because it requests the whole
segment again and gets a coherent copy all the way down the chain.
Another idea would be that our flush pointer tracking and IPC is
somehow subtly wrong and that's exposed by different timing leading to
incoherent reads, but I feel like we would know about that by now too.
I'm not really a replication expert, so I could be missing something
simple here.  Anyone?

>> I did find this similar complaint that involves an ext4 primary and a
>> btrfs replica:
>
> It is interesting that my issue occurs on the first hop from ZFS to ext4. I
> have not seen any instances of this happening going from the ext4 primary to
> the first ZFS replica.

I happen to have a little office server that uses ZFS, so I left it
chugging through a massive pgbench session with a chain of 3 replicas
while I worked on other stuff, and didn't see any problems (no ext4
involved though, this is a FreeBSD box).  I also tried
--wal-segsize=1MB (a feature of 11) to get some more frequent recycling
happening just in case it was relevant.

>> We did have a report recently of ZFS recycling WAL files very slowly
>
> Do you know what version of ZFS that effected? We're currently on 0.6.5.6,
> but could upgrade to 0.7.5 on Ubuntu 18.04

I think that issue is fundamental/all versions, and has something to do
with the record size (if you have 128KB ZFS records and someone writes
8KB, it probably needs to read a whole 128KB record in, whereas with
ext4 et al you have 4KB blocks and the OS can very often skip reading
it in because it can see you're entirely overwriting blocks), and
possibly the COW design too (I dunno).  Here's the recent thread, which
points back to an older one, from some Joyent guys who I gather are
heavy ZFS users:

https://www.postgresql.org/message-id/flat/CACPQ5FpEY9CfUF6XKs5sBBuaOoGEiO8KD4SuX06wa4ATsesaqg%40mail.gmail.com

There was a ZoL bug that made headlines recently but that was in 0.7.7
so not relevant to your case.

--
Thomas Munro
http://www.enterprisedb.com
Re: Why the sql is not executed in parallel mode
On Wed, Sep 19, 2018 at 1:53 PM jimmy wrote: > > Why the sql is not executed in parallel mode, does the sql has some problem? > with sql1 as Hello Jimmy, WITH is the problem. From the manual[1]: "The following operations are always parallel restricted. Scans of common table expressions (CTEs). ...". That means that these CTEs can only be scanned in the leader process. If you rewrite the query using sub selects it might do better. FWIW there is a project to make WITH work like subselects automatically in a future release of PostgreSQL: https://www.postgresql.org/message-id/flat/87sh48ffhb@news-spur.riddles.org.uk [1] https://www.postgresql.org/docs/10/static/parallel-safety.html -- Thomas Munro http://www.enterprisedb.com
Re: how to know whether query data from memory after pg_prewarm
On Wed, Sep 19, 2018 at 1:35 PM jimmy wrote:
> I use select pg_prewarm('table1','read','main') to load data of table1 into
> the memory.
> when I use select count(1) from table1 group by aa to query data.
> I find the speed of query is not fast, I wonder whether it query data from
> memory.
> And it is slower than Oracle, both of Oracle and Postgresql has same table
> and count of data.
> when pg_prewarm use 'read' mode, the data is put into the OS cache, how to
> examine the table which is pg_prewarmed into the OS cache.
> I know pg_buffercache, but it just examine the table in the shared buffer of
> Postgresql, not the table in the OS cache.

This is a quick and dirty hack, but it might do what you want:

https://github.com/macdice/pgdata_mincore

Tested on FreeBSD, not sure how well it'll travel.
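If the goal is just to confirm that the table is cached *somewhere*,
another angle is to warm PostgreSQL's own buffer cache instead and then
check residency with the pg_buffercache extension.  That doesn't answer
the OS page cache question, but it's easy to do from SQL; a sketch,
using 'table1' from your example:

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  -- load into shared_buffers rather than the OS cache
  SELECT pg_prewarm('table1', 'buffer', 'main');

  -- how much of the table is now resident in shared_buffers?
  -- (8192 = default block size)
  SELECT c.relname,
         count(*) AS buffers,
         pg_size_pretty(count(*) * 8192) AS bytes_cached
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
  WHERE c.relname = 'table1'
  GROUP BY c.relname;

--
Thomas Munro
http://www.enterprisedb.com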
Re: how to know whether query data from memory after pg_prewarm
On Wed, Sep 19, 2018 at 7:44 PM Cédric Villemain wrote: > Le 19/09/2018 à 05:29, Thomas Munro a écrit : > > On Wed, Sep 19, 2018 at 1:35 PM jimmy wrote: > >> I use select pg_prewarm('table1','read','main') to load data of table1 > >> into the memory. > >> when I use select count(1) from table1 group by aa to query data. > >> I find the speed of query is not fast, I wonder whether it query data from > >> memory. > >> And it is slower than Oracle, both of Oracle and Postgresql has same table > >> and count of data. > >> when pg_prewarm use 'read' mode, the data is put into the OS cache, how > >> to examine the table which is pg_prewarmed into the OS cache . > >> I know pg_buffercache ,but it just examine the table in the shared buffer > >> of Postgresql, not the table in the OS cache. > > > > This is a quick and dirty hack, but it might do what you want: > > > > https://github.com/macdice/pgdata_mincore > > > > Tested on FreeBSD, not sure how well it'll travel. > > You can use pgfincore extension for that purpose, and more. > > https://github.com/klando/pgfincore/blob/master/README.md Yes, if you only want to know *how many* pages are in the OS page cache. pgdata_mincore shows you which PG blocks are in the page cache in the same format as pg_buffercache, which is useful for studying double buffering effects. Maybe I should turn it into a patch for pgfincore... -- Thomas Munro http://www.enterprisedb.com
Re: We are facing "PANIC: could not read from control file:Success error while starting the database.
On Fri, Oct 5, 2018 at 4:29 AM Raghavendra Rao J S V wrote: > PANIC: could not read from control file:Success That means that the pg_control file is the wrong size. What size is it? What filesystem is this, that allowed an out-of-space condition to result in a file being truncated? Normally we only overwrite that file, so after creation it should stay the same size. -- Thomas Munro http://www.enterprisedb.com
Re: OOM with many sorts
On Tue, Jul 9, 2019 at 4:44 AM Justin Pryzby wrote:
> [...many partial GroupAggregate/Sort repeated
> here for ~200 child tables...]
>
> This isn't the first time I've had to reduce work_mem on a parallel query to
> avoid OOM, but it seems unreasonable to be concerned with 50MB sorts.

This is a fundamental problem with our model of memory budgets.  To
oversimplify massively, we can use about work_mem * ntables * the
number of parallel workers.  Core counts are going up, and now we have
a convenient way to get large values of ntables.  One of many -hackers
threads to discuss the problem:

https://www.postgresql.org/message-id/flat/CAH2-WzmNwV%3DLfDRXPsmCqgmm91mp%3D2b4FvXNF%3DcCvMrb8YFLfQ%40mail.gmail.com

> It looks like the results of each Sort node stay in RAM, during processing of
> additional sort nodes (is that required?)

That's a very interesting question linked to whole-query memory
budgets.  If you don't have a whole-query memory budget, then you have
nothing to lose by keeping hash and sort results in memory, and you
gain the ability to do cheap rescans (if the node happens to be in a
nested loop).  I discussed this in the context of hash joins over here:

https://www.postgresql.org/message-id/CAEepm%3D0N6DODN7nx6Zb93YOW-y%3DRftNNFZJRaLyG6jbJHJVjsA%40mail.gmail.com

The TL;DR is that some other databases throw out eg hash tables as soon
as possible, and consider changing the shape of their join nesting in
order to minimise the total number of hash tables in memory at once, in
order to come in under a certain budget for memory used at any one
point in time.  That is, they don't just divide query_work_mem up over
all the nodes, they understand things about when nodes will allocate
and free memory.  There are some less clever things we can do, though,
before we tackle the big problems involved.  We could probably still
opportunistically give back memory sooner, when we know there is no
chance of rescan, and other things along those lines.

> Thanks in advance for any advice.

I think it's impossible to choose a single value for work_mem if you
have a mixture of types of queries that hit wildly different numbers of
partitions and workers.  I think this is an ongoing topic for -hackers.
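To put rough numbers on that formula for a case like yours, here's a
back-of-the-envelope calculation; the ~200 partitions come from the
plan above, and the worker count is an assumption:

  -- work_mem * number of partitions sorted * (leader + workers)
  SELECT pg_size_pretty(
           pg_size_bytes('50MB')   -- work_mem
           * 200                   -- ~200 child tables
           * 3                     -- leader + 2 parallel workers
         ) AS rough_worst_case;

That comes out around 30GB in the worst case, which is why seemingly
modest per-sort memory can still add up to an OOM.

--
Thomas Munro
https://enterprisedb.com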
Re: PGSQL 11.4: shared_buffers and /dev/shm size
On Tue, Jul 9, 2019 at 10:15 PM Jean Louis wrote: > * Konstantin Malanchev [2019-07-09 12:10]: > > I have 8 GB RAM and /dev/shm size is 4GB, and there is no significant > > memory usage by other system processes. I surprised that Postgres uses more > > space in /dev/shm than sharred_buffers parameter allows, probably I don't > > understand what this parameter means. > > > > I have no opportunity to enlarge total RAM and probably this query requires > > too much RAM to execute. Should Postgres just use HDD as temporary storage > > in this case? > > That I cannot know. I know that /dev/shm could > grow as much as available free RAM. Hi, PostgreSQL creates segments in /dev/shm for parallel queries (via shm_open()), not for shared buffers. The amount used is controlled by work_mem. Queries can use up to work_mem for each node you see in the EXPLAIN plan, and for each process, so it can be quite a lot if you have lots of parallel worker processes and/or lots of tables/partitions being sorted or hashed in your query. -- Thomas Munro https://enterprisedb.com
Re: PGSQL 11.4: shared_buffers and /dev/shm size
On Tue, Jul 9, 2019 at 11:11 PM Konstantin Malanchev wrote: > Thank you for explanation. work_mem = 512MB and > max_parallel_workers_per_gather = 2 and I run only one Postgres instance and > only one query. EXPLAIN shows "Workers Planned: 2" for this query. Why it can > use more than 1GB of /dev/shm? For example, if you have one Parallel Hash Join in your plan, it could allocate up to 512MB * 3 of shared memory (3 = leader process + 2 workers). It sounds like you'll need to set work_mem smaller. If you run EXPLAIN ANALYZE you'll see how much memory is used by individual operations. Usually it's regular private anonymous memory, but for Parallel Hash it's /dev/shm memory. -- Thomas Munro https://enterprisedb.com
Re: PGSQL 11.4: shared_buffers and /dev/shm size
On Wed, Jul 10, 2019 at 12:27 AM Konstantin Malanchev wrote:
> I'm executing the query with smaller work_mem, it will take some time. But I
> still confused why it used all /dev/shm (4GB) and fails with "no space left"
> error while work_mem = 512MB.

I think it could fail that way for two reasons: the /dev/shm size limit
(a mount option, which I think you are saying you have set to 4GB?), or
your system ran out of RAM + swap.  The directory listing in your first
message only shows 1.4GB, not 4GB, so perhaps it's the second problem.

> ->  Parallel Hash Join  (cost=11030236131.39..255829470118.27 rows=18618961759219 width=31)
>       Hash Cond: (another_mat_view.oid = my_table.oid)
>       ->  Parallel Seq Scan on another_mat_view  (cost=0.00..652514.56 rows=31645556 width=8)
>       ->  Parallel Hash  (cost=636676233.38..636676233.38 rows=20353804801 width=31)
>             ->  Parallel Seq Scan on my_table  (cost=0.00..636676233.38 rows=20353804801 width=31)
>                   Filter: (flag = ''::bit(16))

It's strange that it's hashing the ~20B row table instead of the ~30M
row table.

--
Thomas Munro
https://enterprisedb.com
Re: ERROR: too many dynamic shared memory segments
On Wed, Sep 11, 2019 at 11:20 PM Nicola Contu wrote:
> If the error persist I will try to revert the work_mem.
> Thanks a lot

Hi Nicola,

It's hard to say exactly what the cause of the problem is in your case
and how to avoid it, without knowing what your query plans look like.
PostgreSQL allows 64 + 2 * max_connections segments to exist at a time,
and it needs a number of them that depends on work_mem (in the case of
Parallel Hash Join and Parallel Bitmap Index Scan), and also depends on
the number of Gather nodes that appear in the plan, which in some
unusual cases can result from partitioning.

I've seen people reaching this error by running a lot of parallel
queries concurrently.  If that's the cause, then you can definitely get
some relief by turning work_mem down, or by turning max_connections up
(even though you don't want to allow more connections -- because it
influences the formula for deciding on the DSM segment limit).  We
should probably adjust some of the internal constants to give us more
slots, to avoid that problem, as discussed here:

https://www.postgresql.org/message-id/flat/CA%2BhUKGL6H2BpGbiF7Lj6QiTjTGyTLW_vLR%3DSn2tEBeTcYXiMKw%40mail.gmail.com

I've also seen people reaching this error by somehow coming up with
plans that had a very large number of Gather nodes in them,
corresponding to partitions; that's probably a bad plan (it'd
presumably be better to terminate parallelism higher up in the plan,
but these plans do seem to exist in the wild; I don't recall exactly
why).  I think we need a bit of a redesign so that if there are
multiple Gather nodes, they share the same main DSM segment, instead of
blowing through this limit.
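In case it helps, the limit for a given server can be computed with a
trivial query that just mirrors the formula above (it isn't exposed in
any system view that I know of):

  SELECT current_setting('max_connections')::int AS max_connections,
         64 + 2 * current_setting('max_connections')::int AS approx_dsm_segment_slots;

--
Thomas Munro
https://enterprisedb.com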
Re: PostgreSQL - unrecognized win32 error code: 38
On Tue, Oct 29, 2019 at 9:23 PM ZhenHua Cai wrote:
> No, it doesn't call any in-core code.

I wondered if this could be coming from the new code in
src/port/pg_p{read,write}.c.  ERROR_HANDLE_EOF is a documented
GetLastError() return value after ReadFile() fails[1], but only for
asynchronous files.  We are using that interface ("overlapped" IO,
their name for asynchronous IO, but the same interface can also do
synchronous IO and happens to support an offset like Unix's pread()),
but we're not opening file handles with FILE_FLAG_OVERLAPPED so we have
a plain old synchronous handle here.

Searching the web for ERROR_HANDLE_EOF leads to at least one
discussion[2] of "mailslots" (like our syslog pipe?) and a couple of
possible underlying NT errors, and following those leads to some other
stuff about SMB filesystems and sockets.  Is your database running on a
local or SMB (or other remote) filesystem?  Are you using the
PostgreSQL syslog process?  I don't know anything about Windows at all,
but from these breadcrumbs I feel like there has to be a network or
asynchronous communication involved somewhere here, not a plain old
end-of-file on a regular synchronous file operation, which, as Tom
said, we have code paths for that don't look like this (because that
just shows up as 0 bytes read).

[1] https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile
[2] https://stackoverflow.com/questions/56510727/what-causes-writefile-to-return-error-38-error-handle-eof
Re: here does postgres take its timezone information from?
On Wed, Nov 6, 2019 at 12:02 PM Adrian Klaver wrote:
> On 11/5/19 3:00 PM, Chris Withers wrote:
> > Hmm. Is there any option to use the system timezone packages?
>
> https://www.postgresql.org/docs/11/install-procedure.html
>
> --with-system-tzdata=DIRECTORY

By the way, you can see if your installation of PostgreSQL was built to
use system-provided tzdata by running the pg_config program that was
installed alongside it.  That could be useful if it was built by a
package maintainer (Debian etc) and you want to see how they configured
it.  You'll see something like CONFIGURE = '...
--with-system-tzdata=/usr/share/... ' if it's using OS vendor tzdata
files.

I hope that most distributions do that*, because otherwise you could
finish up with lots of out-of-sync copies of the tzdata database inside
your database, your JVM, your libc, etc etc, and you want a single
source of truth for that stuff.  Once I was involved in rolling out a
last minute DST rule change that happened in Australia due to
politicians and an international sporting event, and we had to go
hunting for copies of tzdata hiding on our servers that had to agree on
when the financial markets were opening...  we found many copies, and
ever since then I complain wherever I see packages shipping their own
copies of this stuff...

Assuming you are using system tzdata, your other question was what you
need to do after the tzdata files have been updated.  I suspect that
new PostgreSQL database sessions (processes) will see the new rules,
but existing sessions may continue to see the old rules if they had
loaded them already, because we cache them in per-process memory (see
pg_tzset()).  It would probably be safest to restart the PostgreSQL
cluster.  If you're using PostgreSQL's built-in tzdata, then you'll
need to restart your cluster anyway once you install the version that
shipped with the new tzdata rules, and depending on your package
manager, that might happen automatically when you upgrade.

*It looks like FreeBSD's port uses the copy of tzdata from the
PostgreSQL source tree by default and thus that is what you get if you
install PostgreSQL with "pkg".  That's not a great default IMHO and
should be changed.
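If you want to double-check what rules a given session is actually
using after an update, a quick sanity check from SQL (just standard
views, nothing special) is to look the session's zone up by name:

  SELECT name, abbrev, utc_offset, is_dst
  FROM pg_timezone_names
  WHERE name = current_setting('TimeZone');

After a tzdata update, a freshly started session should show the new
offset/DST rules here.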
Re: here does postgres take its timezone information from?
On Wed, Nov 6, 2019 at 2:20 PM Tom Lane wrote: > Thomas Munro writes: > > On Wed, Nov 6, 2019 at 12:02 PM Adrian Klaver > > wrote: > >> On 11/5/19 3:00 PM, Chris Withers wrote: > >>> Hmm. Is there any option to use the system timezone packages? > > >> --with-system-tzdata=DIRECTORY > > > I hope that most distributions do that*, because otherwise you > > could finish up with lots of out-of-sync copies of the tzdata database > > inside your database, your JVM, your libc, etc etc, and you want a > > single source of truth for that stuff. > > Right. Our recommendation is to use --with-system-tzdata if you're on > a platform where the vendor updates their copy of tzdata regularly. > The fact that we supply a copy of tzdata at all is really just a fallback > for folks on poorly-supported platforms. (Naming no names here ...) Incidentally, that's also why I don't want to give up on libc collations quite as easily as some, despite their limitations. It should be possible to get all the software on your system to agree on the ordering of two strings and the current time! > > *It looks like FreeBSD's port uses the copy of tzdata from the > > PostgreSQL source tree by default and thus that is what you get if you > > install PostgreSQL with "pkg". That's not a great default IMHO and > > should be changed. > > Ugh. Who can we poke there? Maybe Palle? (CCed).
Re: here does postgres take its timezone information from?
On Wed, Nov 13, 2019 at 3:40 AM Palle Girgensohn wrote: > The decision to use postgresql's tzdata is quite old. It was based on the > assumption that postgres is updated more frequently than the operating > system, and that for that reason it was better to use postgresql's tzdata, > since it would be more accurate more often. This is probably not true > anymore, so I agree it should probably be changed to default=system-tzdata on > FreeBSD. > > I will commit an upgrade in Thursday, and unless anybody raise their voice > against it, I will change the default setting then. Thanks! FreeBSD users already have the choice between zoneinfo from base or the misc/zoneinfo port if for some reason they want to control tzdata updates separately. PostgreSQL and FreeBSD both track tzdata closely, and both pushed a commit for version 2019c into their stable branches within a couple of weeks of it being released, so I don't foresee any problem with this change, and staying in sync with libc seems to outweigh any other concerns IMHO. https://www.iana.org/time-zones https://github.com/postgres/postgres/tree/REL_12_STABLE/src/timezone/data https://github.com/freebsd/freebsd/tree/stable/12/contrib/tzdata
Re: here does postgres take its timezone information from?
On Sat, Nov 16, 2019 at 8:38 AM Tom Lane wrote: > Palle Girgensohn writes: > >> 6 nov. 2019 kl. 03:03 skrev Thomas Munro : > >>> *It looks like FreeBSD's port uses the copy of tzdata from the > >>> PostgreSQL source tree by default and thus that is what you get if you > >>> install PostgreSQL with "pkg". That's not a great default IMHO and > >>> should be changed. > > > The decision to use postgresql's tzdata is quite old. It was based on the > > assumption that postgres is updated more frequently than the operating > > system, and that for that reason it was better to use postgresql's tzdata, > > since it would be more accurate more often. This is probably not true > > anymore, so I agree it should probably be changed to default=system-tzdata > > on FreeBSD. > > I will commit an upgrade in Thursday, and unless anybody raise their voice > > against it, I will change the default setting then. > > So it seems that that change was not entirely without fallout: > > https://www.postgresql.org/message-id/flat/16118-ef1e45e342c52416%40postgresql.org > > I don't think this is reason to revert the change, exactly, > but it's a concern. I wonder why FreeBSD editorializes on > the set of zone names? Ugh. It doesn't have the old backward compatibility names like US/Pacific installed by default, which is a problem if that's what initdb picked for your cluster (or you've stored references to any of those names in other ways).
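If you want to check whether a cluster is at risk before switching to
system tzdata, you could look at what initdb picked and whether it's
one of those backward-compatibility aliases; a rough check (US/% used
just as an example pattern):

  -- what the cluster/session is currently set to
  SHOW TimeZone;

  -- backward-compatibility aliases known to the server's current tzdata
  SELECT name FROM pg_timezone_names WHERE name LIKE 'US/%' ORDER BY name;

If TimeZone (or log_timezone) is set to one of those aliases, it's
worth switching postgresql.conf to the official name (for example
America/Los_Angeles, as discussed downthread) before moving to a tzdata
installation that doesn't ship them.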
Re: here does postgres take its timezone information from?
On Sat, Nov 16, 2019 at 7:13 PM Tom Lane wrote:
> Palle Girgensohn writes:
> > 15 nov. 2019 kl. 21:32 skrev Thomas Munro :
> >> Ugh.  It doesn't have the old backward compatibility names like
> >> US/Pacific installed by default, which is a problem if that's what
> >> initdb picked for your cluster (or you've stored references to any of
> >> those names in other ways).
>
> > One quick fix is to revert the change. Tom thinks this is not reason to
> > revert. Would it be enough to edit the postgresql.conf to use the correct
> > "modern" name for US/Pacific (PST?)? In rhar case, an update note might be
> > sufficient?
>
> I think the "official" name of that zone is America/Los_Angeles.
> But initdb might seize on the US/Pacific alias, if available,
> because it's shorter.  We've seen related problems with other
> time zone names, though usually it was just cosmetic and not a
> reason for the postmaster to fail to start.
>
> Yes, changing the zone name in postgresql.conf should be a sufficient
> fix.  In theory, a FreeBSD user ought to know the "official" alias
> for their zone, since the rest of the system would expect that.
> So this is slightly tedious if initdb chose a non-official alias,
> but I don't think it's reason to panic.

Perhaps the best thing would be to revert this for the older PostgreSQL
releases so that people doing minor version upgrades aren't
inconvenienced by a system that can't start up after "pkg upgrade", but
do it for 12 since not many people will be using that yet?
Re: PostgreSQL - unrecognized win32 error code: 38
On Wed, Oct 30, 2019 at 12:13 AM Thomas Munro wrote: > On Tue, Oct 29, 2019 at 9:23 PM ZhenHua Cai wrote: > > No, it doesn't call any in-core code. > > I wondered if this could be coming from the new code in > src/port/pg_p{read,write}.c. ERROR_HANDLE_EOF is a documented > GetLastError() return value after ReadFile() fails[1], but only for > asynchronous files. We are using that interface ("overlapped" IO, > their name for asynchronous IO, but the same interface can also do > synchronous IO and happens to support an offset like Unix's pread()), > but we're not opening file handles with FILE_FLAG_OVERLAPPED so we > have a plain old synchronous handle here. > [1] > https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile Hi, With a little help we got to the bottom of this: https://www.postgresql.org/message-id/flat/CAC%2BAXB0_zNFUH1BrRGKUkFxvq3SxsojroJN70iTUA1gUNn_gag%40mail.gmail.com#cb324760c6a142ec4ed6abefa8e68f18 It should be fixed in the next release. Thanks for the report!
Re: tcp keep alive don't work when the backend is busy
On Wed, Dec 11, 2019 at 4:17 AM Fabio Ugo Venchiarutti wrote: > On 10/12/2019 15:06, Tom Lane wrote: > > =?utf-8?B?0J7Qu9C10LMg0KHQsNC80L7QudC70L7Qsg==?= writes: > >> According to the documentation > >> https://www.postgresql.org/docs/12/runtime-config-connection.html > >> A backend must check connection to the client by tcp_keepalive messages. > >> (Config option tcp_keepalives_idle). > > > >> But this is don't work if the backend is busy. > > > > You're reading something into the documentation that isn't there. > > > > The TCP keepalive mechanism is something that the OS does, independently > > of backend processing. The backend isn't going to notice loss of client > > connection until it tries to read or write on the connection. > > > > If it were free to improve this, we might do so. But it would be > > very much not free. > > At what points does the backend bite the bullet to test the state of > that file descriptor? > > I'd expect select() and poll() to return immediately when keepalive > probes timeout, so idling clients are covered (and that's the main use > case); does any other code path go out if its way to ensure that there's > still a client without actually needing to read()/write()/send()/recv()? > (obviously at the cost you mentioned) It has been proposed that busy backends should (optionally) periodically try to do a MSG_PEEK so they can learn about a client that has gone away some time before they eventually try to write: https://www.postgresql.org/message-id/flat/77def86b27e41f0efcba411460e929ae%40postgrespro.ru More work is needed to move that forward, though.
Re: When are Predicate Locks Freed
On Fri, Dec 13, 2019 at 5:00 AM Dave Halter wrote:
> I have worked quite a bit with serializable transactions. I'm trying
> to understand when predicate locks are freed. I read the whole
> README-SSI [1], but I'm still not sure when a predicate lock gets
> dropped.
>
> What I learned from that README is that predicate locks are not freed
> at the moment where a transaction passes, but only at a later stage.
> This makes sense, because some concurrent transactions might be
> invalidated by the predicate locks that are not needed anymore for a
> transaction that was just committed.

Right.  Even though T1 has committed, T2 and T3 can create a "dangerous
cycle" by touching data that T1 accessed, meaning that they can't all
be allowed to commit because what they've seen isn't consistent with
any serial ordering.  So the ghost of T1 continues to affect other
transactions, and predicate.c needs to figure out the correct
poltergeist duration.

> What I don't understand is: We have only short transaction (<1s), but a
>
> select count(*) from pg_locks where mode = 'SIReadLock';
>
> would vary by a lot. It typically went up all the way to 300k. At this
> point it would drop to almost zero (always <10k) and rise again. This
> cycle pretty consistently happened every three minutes. (I think it
> looked a bit more like 2 minutes 50 seconds, but I wasn't measuring it
> exactly). Is there a "garbage collection" happening every few minutes?
> Since we scaled up our service by quite a bit, we increased
> `max_pred_locks_per_transaction`. Now we are at 1.0m to 1.3m
> predicate locks with the same ~3 minute cleanup period.

Garbage collection happens in ClearOldPredicateLocks() when a
SERIALIZABLE transaction ends.  This is triggered every time the oldest
serializable transaction finishes (according to the "xmin"):

	/*
	 * Check whether it's time to clean up old transactions. This can only be
	 * done when the last serializable transaction with the oldest xmin among
	 * serializable transactions completes. We then find the "new oldest"
	 * xmin and purge any transactions which finished before this transaction
	 * was launched.
	 */

> Does anybody have any insights in why this might be happening? I would
> also be very interested in good explanations about when predicate
> locks are actually freed. I tried reading the C code, but couldn't
> really figure out where that was happening.

Do you have long running serializable transactions?  They would prevent
cleanup of any number of short serializable transactions that start
later.  Can you tell us a bit about your workload, number of concurrent
queries, range of query durations?  Do you see a lot of files under
pg_serial?  (This is where finished transactions get spilled to disk,
and with the large numbers you're reporting, perhaps there would be
some of that happening too.)
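If it's useful while investigating, a slightly more detailed version of
your monitoring query shows where the SIReadLocks are accumulating,
rather than just the total:

  SELECT relation::regclass AS relation,
         count(*) AS siread_locks
  FROM pg_locks
  WHERE mode = 'SIReadLock'
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 20;

Watching that alongside pg_stat_activity's xact_start for your
longest-running transactions should make the ~3 minute cleanup cycle
easier to correlate with transaction lifetimes.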
Re: Row locks, SKIP LOCKED, and transactions
On Wed, Dec 18, 2019 at 5:12 AM Steven Winfield wrote:
> * I observe this even if I crank up the transaction isolation level to
> repeatable read and serializable.

Huh.  SERIALIZABLE shouldn't allow two transactions to see no result
row for a given ID and then insert a result row for that ID.  One of
those transactions should have to roll back, because otherwise it'd be
incompatible with both serial orderings of the two transactions.

> I'm wondering if row locks are not obeying the same transactional semantics
> as row data,

They are indeed a bit weird.  They sometimes check whether the
condition still applies (since the row might have changed between the
scan and the LockRows node), which leads to some interesting effects,
but only if the row being locked was concurrently updated, and here
that isn't the case.  This is a source of a fair amount of confusion
about FOR UPDATE and joins/subselects.

> Perhaps this is a misuse of the locking system, since I'm locking a row "FOR
> UPDATE" but not actually updating it, but as row locks are released at the
> end of a transaction (according to the docs) then my expectation was for the
> unlocking and the visibility of newly committed rows to be atomic.
> I've tried FOR NO KEY UPDATE too, without luck.
>
> If I'm doing something forbidden (and the docs say so) then I'd be grateful
> if someone could point that out!

Conceptually, the thing you really need to lock for this to work is the
result row that isn't there yet, so that some overlapping transaction
doesn't try to lock the same absent thing.  Unfortunately, our system
for locking things that aren't there isn't there either.

Some articles on serializability talk about "materialising the
conflict", which means locking some other surrogate thing that "covers"
a gap you are interested in.  You might think the job row would do the
trick, but since we don't recheck the condition (that is, recheck that
there is no corresponding result because you don't update the job row),
no cigar.  You could also use plain old pg_try_advisory_xact_lock(id),
because it just locks integers, and they're always there.

SERIALIZABLE deals with that type of magic internally (it locks gaps in
key ranges by predicate-locking a physical btree or hash page that
you'd need to write on to insert a row with a matching key, which is
how it discovers a conflict between one transaction that went looking
for key=42 but didn't find it and another that later writes key=42),
but, as mentioned, SERIALIZABLE doesn't really allow concurrency with
this workload, and you specified that you wanted concurrency with SKIP
LOCKED (but I think you'd have the same problem without it; SKIP LOCKED
just gets you the wrong answer faster).

There are various ways you could deal with this, but I'd probably go
for a simple scheme where you only have to consult a single row to know
if you can claim it.  You could still put the results into a separate
table, but use job.state to find work, and set it to DONE when you
insert the result.  It may also be possible to add no new columns but
do a dummy update to the job row to get the join qual rechecked, but
I'm not sure if that'd work.  Another reason to add a state column to
the job table is so that you can put a conditional index on it so you
can find jobs to be done very quickly, if you're not planning to remove
the ones that are done.
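For what it's worth, here's a minimal sketch of that single-row scheme,
assuming a hypothetical job.state column with values such as 'PENDING',
'IN_PROGRESS' and 'DONE' (names invented for illustration):

  -- each worker claims one pending job, skipping rows other workers
  -- have already locked
  UPDATE job
     SET state = 'IN_PROGRESS'
   WHERE id = (SELECT id
                 FROM job
                WHERE state = 'PENDING'
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED)
  RETURNING id;

  -- a partial index makes finding pending work cheap
  CREATE INDEX ON job (id) WHERE state = 'PENDING';

The worker then inserts the result row and sets state = 'DONE' in the
same transaction, so a single row always tells you whether a job can be
claimed.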
Re: Writing Postgres Extensions in C on Windows
On Wed, Jan 8, 2020 at 4:32 AM İlyas Derse wrote: > I want to register C code to PostgreSql on Windows. So I think, I have to > make a extension for PostgreSql. But I did not find to written extension on > windows. Do you have an idea ? I don't do Windows myself but this blog from Craig Ringer looks like a good starting point: https://www.2ndquadrant.com/en/blog/compiling-postgresql-extensions-visual-studio-windows/
Re: ERROR: too many dynamic shared memory segments
On Wed, Jan 22, 2020 at 4:06 AM Nicola Contu wrote:
> after a few months, we started having this issue again.
> So we revert the work_mem parameter to 600MB instead of 2GB.
> But the issue is still there. A query went to segmentation fault, the DB went
> to recovery mode and our app went to read only for a few minutes.

Hi Nicola,

Hmm, a segmentation fault sounds like a different problem.  Can you
please share the exact error messages from PostgreSQL and OS logs?

> I understand we can increase max_connections so we can have many more
> segments.
>
> My question is : is there a way to understand the number of segments we
> reached?

If you're on Linux, you can probably see them with "ls /dev/shm".

> Currently we have 220 max_conn so as your formula is 64 + 2* 220 we have
> about 500 shared segments.
> We would like to increase that number to 300 or 400 but would be great to
> understand if there is a way to make sure we will solve the issue as it
> requires a restart of the service.
>
> I know you were also talking about a redesign this part in PostgreSQL. Do you
> know if anything has changed in any of the newer versions after 11.5?

It's possible that we should increase a couple of constants used in the
formula -- I'll look into that again.  But first I'd like to see if
we're even investigating the right problem here.
Re: ERROR: too many dynamic shared memory segments
On Wed, Jan 29, 2020 at 10:37 PM Nicola Contu wrote:
> This is the error on postgres log of the segmentation fault :
>
> 2020-01-21 14:20:29 GMT [] [4]: [108-1] db=,user= LOG:  server process
> (PID 2042) was terminated by signal 11: Segmentation fault
> 2020-01-21 14:20:29 GMT [] [4]: [109-1] db=,user= DETAIL:  Failed process
> was running: select pid from pg_stat_activity where query ilike 'REFRESH
> MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats'
> 2020-01-21 14:20:29 GMT [] [4]: [110-1] db=,user= LOG:  terminating any
> other active server processes

Ok, this is a bug.  Do you happen to have a core file?  I don't recall
where CentOS puts them.

> > If you're on Linux, you can probably see them with "ls /dev/shm".
>
> I see a lot of files there, and doing a cat they are empty. What can I do
> with them?

Not much, but it tells you approximately how many 'slots' are in use at
a given time (ie because of currently running parallel queries), if
they were created since PostgreSQL started up (if they're older ones
they could have leaked from a crashed server, but we try to avoid that
by trying to clean them up when you restart).

> Those are two different problems I guess, but they are related because right
> before the Segmentation Fault I see a lot of shared segment errors in the
> postgres log.

That gave me an idea...  I hacked my copy of PostgreSQL to flip a coin
to decide whether to pretend there are no slots free (see below), and I
managed to make it crash in the regression tests when doing a parallel
index build.  It's late here now, but I'll look into that tomorrow.
It's possible that the parallel index code needs to learn to cope with
that.

#2  0x00a096f6 in SharedFileSetInit (fileset=0x80b2fe14c, seg=0x0) at sharedfileset.c:71
#3  0x00c72440 in tuplesort_initialize_shared (shared=0x80b2fe140, nWorkers=2, seg=0x0) at tuplesort.c:4341
#4  0x005ab405 in _bt_begin_parallel (buildstate=0x7fffc070, isconcurrent=false, request=1) at nbtsort.c:1402
#5  0x005aa7c7 in _bt_spools_heapscan (heap=0x801ddd7e8, index=0x801dddc18, buildstate=0x7fffc070, indexInfo=0x80b2b62d0) at nbtsort.c:396
#6  0x005aa695 in btbuild (heap=0x801ddd7e8, index=0x801dddc18, indexInfo=0x80b2b62d0) at nbtsort.c:328
#7  0x00645b5c in index_build (heapRelation=0x801ddd7e8, indexRelation=0x801dddc18, indexInfo=0x80b2b62d0, isreindex=false, parallel=true) at index.c:2879
#8  0x00643e5c in index_create (heapRelation=0x801ddd7e8, indexRelationName=0x7fffc510 "pg_toast_24587_index", indexRelationId=24603, parentIndexRelid=0,

I don't know if that's the bug that you're hitting, but it definitely
could be: REFRESH MATERIALIZED VIEW could be rebuilding an index.

===
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 90e0d739f8..f0b49d94ee 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -468,6 +468,13 @@ dsm_create(Size size, int flags)
 	nitems = dsm_control->nitems;
 	for (i = 0; i < nitems; ++i)
 	{
+		/* BEGIN HACK */
+		if (random() % 10 > 5)
+		{
+			nitems = dsm_control->maxitems;
+			break;
+		}
+		/* END HACK */
 		if (dsm_control->item[i].refcnt == 0)
 		{
 			dsm_control->item[i].handle = seg->handle;
Re: ERROR: too many dynamic shared memory segments
On Wed, Jan 29, 2020 at 11:24 PM Julian Backes wrote: > we only had the "too many shared too many dynamic shared memory segments" > error but no segmentation faults. The error started occurring after upgrading > from postgres 10 to postgres 12 (server has 24 cores / 48 threads, i.e. many > parallel workers). The error itself was not that much of a problem but > /dev/shm started filling up with orphaned files which probably (?) had not > been cleaned up by postgres after the parallel workers died. In consequence, > after some time, /dev/shm was full and everything crashed. Oh, thanks for the report. I think see what was happening there, and it's a third independent problem. The code in dsm_create() does DSM_OP_DESTROY (ie cleans up) in the DSM_CREATE_NULL_IF_MAXSEGMENTS case, but in the case where you see "ERROR: too many dynamic shared memory segments" it completely fails to clean up after itself. I can reproduce that here. That's a terrible bug, and has been sitting in the tree for 5 years. > Unfortunately, the only "solution" we found so far was to increase max > connections from 100 to 1000. After that (about 2 months ago I think), the > error had gone. I'll take that as a vote for increasing the number of slots.
Re: ERROR: too many dynamic shared memory segments
On Wed, Jan 29, 2020 at 11:53 PM Thomas Munro wrote: > On Wed, Jan 29, 2020 at 10:37 PM Nicola Contu wrote: > > This is the error on postgres log of the segmentation fault : > > > > 2020-01-21 14:20:29 GMT [] [4]: [108-1] db=,user= LOG: server process > > (PID 2042) was terminated by signal 11: Segmentation fault > > 2020-01-21 14:20:29 GMT [] [4]: [109-1] db=,user= DETAIL: Failed > > process was running: select pid from pg_stat_activity where query ilike > > 'REFRESH MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats' > > 2020-01-21 14:20:29 GMT [] [4]: [110-1] db=,user= LOG: terminating any > > other active server processes > That gave me an idea... I hacked my copy of PostgreSQL to flip a coin > to decide whether to pretend there are no slots free (see below), and > I managed to make it crash in the regression tests when doing a > parallel index build. It's late here now, but I'll look into that > tomorrow. It's possible that the parallel index code needs to learn > to cope with that. Hi Nicola, Without more information I can't know if I found the same bug you experienced, but I think it's likely. I have committed a fix for that, which will be available in the next release (mid-February). https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74618e77b43cfce670b4725d5b9a300a2afd12d1
Re: ERROR: too many dynamic shared memory segments
On Thu, Jan 30, 2020 at 12:26 AM Thomas Munro wrote: > On Wed, Jan 29, 2020 at 11:24 PM Julian Backes wrote: > > we only had the "too many shared too many dynamic shared memory segments" > > error but no segmentation faults. The error started occurring after > > upgrading from postgres 10 to postgres 12 (server has 24 cores / 48 > > threads, i.e. many parallel workers). The error itself was not that much of > > a problem but /dev/shm started filling up with orphaned files which > > probably (?) had not been cleaned up by postgres after the parallel workers > > died. In consequence, after some time, /dev/shm was full and everything > > crashed. > > Oh, thanks for the report. I think see what was happening there, and > it's a third independent problem. The code in dsm_create() does > DSM_OP_DESTROY (ie cleans up) in the DSM_CREATE_NULL_IF_MAXSEGMENTS > case, but in the case where you see "ERROR: too many dynamic shared > memory segments" it completely fails to clean up after itself. I can > reproduce that here. That's a terrible bug, and has been sitting in > the tree for 5 years. I committed a fix for that. It'll be in the new releases that due out in a couple of weeks. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=93745f1e019543fe7b742d0c5e971aad8d08fd56 > > Unfortunately, the only "solution" we found so far was to increase max > > connections from 100 to 1000. After that (about 2 months ago I think), the > > error had gone. > > I'll take that as a vote for increasing the number of slots. I committed something to do this for 13 (due out end of year), but I'm not game to back-patch it to the release branches. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d061ea21fc1cc1c657bb5c742f5c4a1564e82ee2
Re: ERROR: too many dynamic shared memory segments
On Fri, Jan 31, 2020 at 11:05 PM Nicola Contu wrote: > Do you still recommend to increase max_conn? Yes, as a workaround of last resort. The best thing would be to figure out why you are hitting the segment limit, and see if there is something we could tune to fix that. If you EXPLAIN your queries, do you see plans that have a lot of "Gather" nodes in them, perhaps involving many partitions? Or are you running a lot of parallel queries at the same time? Or are you running queries that do very, very large parallel hash joins? Or something else?
Re: Unable to startup postgres: Could not read from file "pg_clog/00EC"
On Fri, Feb 7, 2020 at 1:47 AM Nick Renders wrote:
> Thank you for the feedback, Alvaro.
>
> Unfortunately, the database is no longer "dumpable". We were able to do
> a pg_dump yesterday morning (12 hours after the crash + purging the
> pg_clog) but if we try one now, we get the following error:
>
> unexpected chunk number 1 (expected 0) for toast value 8282331 in
> pg_toast_38651
>
> Looking at our data, there seem to be 6 tables that have corrupt
> records. Doing a SELECT * for one of those records, will return a
> similar error:
>
> missing chunk number 0 for toast value 8288522 in pg_toast_5572299
>
> What is the best way to go from here? Is tracking down these corrupt
> records and deleting them the best / only solution?
> Is there a way to determine of there are issues with new data (after the
> crash)?
>
> Any help and advice is very much appreciated.

This error indicates that the file did exist already, it was just
shorter than we expected:

2020-02-04 15:20:44 CET DETAIL:  Could not read from file "pg_clog/00EC"
at offset 106496: Undefined error: 0.

What was the length of the file before you overwrote it?  Are there
00EB and 00ED files, and if so what size?  When your server rebooted,
did crash recovery run or had it shut down cleanly?  Do you know if the
machine lost power, or the kernel crashed, or if it was a normal
reboot?  What are your settings for "fsync" and "wal_sync_method"?
What is the output of pg_controldata -D pgdata?

I wonder if that part of the clog file was supposed to be created
before the checkpoint (ie the checkpoint is somehow borked), or was
supposed to be created during recovery after that checkpoint (something
else is borked, but I don't know what), or if the xid is somehow
corrupted.

Here's a dirty trick that might help rescue some data.  Assuming you
have a copy of the original file before you zeroed it, you could write
a 256kb file full of 0x55 (that's 01010101 and represents 4 commits, so
if you fill the file up with that it means 'all transactions in this
range committed', which is probably closer to the truth than all
zeroes), and then copy the original shorter file over the top of it, so
that at least the range of transactions represented by the earlier part
of the file that did make it to disk is preserved, and we just have
bogus force-everything-to-look-committed data after that.  But as
Alvaro said, this is a pretty bad situation: this is key meta data used
to interpret all other data files, so all bets are off here, this is
restore-from-backups territory.
Re: Postgres 12.1 : UPPER() in WHERE clause restarts server
On Sun, Feb 9, 2020 at 11:46 AM Tom Lane wrote: > "Nick Renders" writes: > > When we do the following statement: > > SELECT * FROM f_gsxws_schedule WHERE UPPER(gwsc_dossier) = 'TEST' > > the Postgres service restarts. > > Hm. > > > Here is what is logged: > > 2020-02-08 20:21:19.942 CET [83892] LOG: server process (PID 85456) was > > terminated by signal 9: Killed: 9 > > 2020-02-08 20:21:19.942 CET [83892] DETAIL: Failed process was running: > > SELECT * FROM f_gsxws_schedule WHERE UPPER(gwsc_dossier) = 'TEST' > > That's mighty interesting, because signal 9 is an external SIGKILL, > not an internal-to-Postgres software fault. > > If you were running on Linux I would hypothesize that your process > was getting killed by the infamous OOM killer, in which case we could > guess that for some reason this query is consuming an unreasonable > amount of memory and thereby attracting the wrath of the OOM killer. > However, I'm not aware that any such mechanism exists on macOS. macOS's thing like that appears as "kernel[0]: memorystatus_thread: idle exiting pid XXX [some program]" in system.log, which seems like a bit of an understatement to me but that's what they call it. Some details here: http://newosxbook.com/articles/MemoryPressure.html Nick, did you see that?
Re: Postgres 12.1 : UPPER() in WHERE clause restarts server
On Mon, Feb 10, 2020 at 4:35 AM Marc wrote:
> We will keep the 12.1 in place so that we can run additional tests to assist
> to pin-point the issue.
>
> Feel free to ask but allow us to recover from these hectic days ;-)

Here's how to get a stack so we can see what it was doing, assuming you
have the Apple developer tools installed:

1.  Find the PID of the backend you're connected to with SELECT
    pg_backend_pid().
2.  "lldb -p PID" from a shell to attach to the process, then "cont" to
    let it continue running.
3.  Run the query in that backend and wait for the SIGKILL.
4.  In the lldb session, type "bt".

It'll only make sense if your PostgreSQL build has debug symbols, but
let's see.
Re: LDAP with TLS is taking more time in Postgresql 11.5
On Wed, Feb 26, 2020 at 7:37 AM Adrian Klaver wrote: > On 2/25/20 10:23 AM, Mani Sankar wrote: > > Hi Adrian, > > > > Both the machines are in same network and both are pointing towards the > > same LDAP server > > I don't see any errors in the Postgres logs. > > You probably should take a look at the LDAP server logs to see if there > is anything there. > > You could also turn up the logging detail in Postgres to see if it > reveals anything. A couple more ideas: If you take PostgreSQL out of the picture and run the equivalent LDAP queries with the ldapsearch command line tool, do you see the same difference in response time? If so, I'd trace that with strace etc with timings to see where the time is spent -- for example, is it simply waiting for a response from the LDAP (AD?) server? If not, I'd try tracing the PostgreSQL process and looking at the system calls (strace -tt -T for high res times and elapsed times), perhaps using PostgreSQL's pre_auth_delay setting to get time to attach strace. A wild stab in the dark: if it's slow from one computer and not from another, perhaps the problem has something to do with a variation in reverse DNS lookup speed on the LDAP server side when it's verifying the certificate. Or something like that.
Re: \COPY to accept non UTF-8 chars in CHAR columns
On Sat, Mar 28, 2020 at 4:46 AM Tom Lane wrote: > Matthias Apitz writes: > > In short, it there a way to let \COPY accept such broken ISO bytes, just > > complaining about, but not stopping the insert of the row? > > No. We don't particularly believe in the utility of invalid data. > > If you don't actually care about what encoding your data is in, > you could use SQL_ASCII as the database "encoding" and thereby > disable all UTF8-specific behavior. Otherwise, maybe this conversion > is a good time to clean up the mess? Something like this approach might be useful for fixing the CSV file: https://codereview.stackexchange.com/questions/185821/convert-a-mix-of-latin-1-and-utf-8-to-proper-utf-8 I haven't tested that program but it looks like the right sort of approach; I remember writing similar logic to untangle the strange mixtures of Latin 1, Windows 1252, and UTF-8 that late 90s browsers used to send. That sort of approach can't fix every theoretical problem (some valid Latin1 sequences are also valid UTF-8 sequences) but it's doable with text in European languages.
Re: EINTR while resizing dsm segment.
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi wrote: > I provided the subject, and added -hackers. > > > Hello, > > I am running postgres 11.5 and we were having issues with shared segments. > > So I increased the max_connection as suggested by you guys and reduced my > > work_mem to 600M. > > > > Right now instead, it is the second time I see this error : > > > > ERROR: could not resize shared memory segment "/PostgreSQL.2137675995" to > > 33624064 bytes: Interrupted system call > > The function posix_fallocate is protected against EINTR. > > | do > | { > | rc = posix_fallocate(fd, 0, size); > | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending)); > > But not for ftruncate and write. Don't we need to protect them from > ENTRI as the attached? We don't handle EINTR for write() generally because that's not supposed to be necessary on local files (local disks are not "slow devices", and we document that if you're using something like NFS you should use its "hard" mount option so that it behaves that way too). As for ftruncate(), you'd think it'd be similar, and I can't think of a more local filesystem than tmpfs (where POSIX shmem lives on Linux), but I can't seem to figure that out from reading man pages; maybe I'm reading the wrong ones. Perhaps in low memory situations, an I/O wait path reached by ftruncate() can return EINTR here rather than entering D state (non-interruptable sleep) or restarting due to our SA_RESTART flag... anyone know? Another thought: is there some way for the posix_fallocate() retry loop to exit because (ProcDiePending || QueryCancelPending), but then for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to reporting the EINTR?
Re: EINTR while resizing dsm segment.
On Tue, Apr 7, 2020 at 8:58 PM Nicola Contu wrote: > So that seems to be a bug, correct? > Just to confirm, I am not using NFS, it is directly on disk. > > Other than that, is there a particular option we can set in the postgres.conf > to mitigate the issue? Hi Nicola, Yeah, I think it's a bug. We're not sure exactly where yet.
Re: Transition tables for column-specific UPDATE triggers
On Wed, Oct 9, 2019 at 3:59 PM Guy Burgess wrote: > The manual says: > https://www.postgresql.org/docs/current/sql-createtrigger.html > > A column list cannot be specified when requesting transition relations. > > And (I think the same point): > > The standard allows transition tables to be used with column-specific UPDATE > triggers, but then the set of rows that should be visible in the transition > tables depends on the trigger's column list. This is not currently > implemented by PostgreSQL. > > Are there any plans to allow transition tables to be used with > column-specific UPDATE triggers? Or, is there another way for a trigger > function to see the rows changed by a column-specific UPDATE trigger? Hi Guy, Answering an ancient message that went unanswered... I'm not aware of anyone working on that, and there isn't another way: the transition tables feature simply won't let you create such a trigger. The last I've seen anyone say about that was in the following commit: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0f79440fb0b4c5a9baa9a95570c01828a9093802 Excerpt: "Also, forbid using transition tables with column-specific UPDATE triggers. The spec requires such transition tables to show only the tuples for which the UPDATE trigger would have fired, which means maintaining multiple transition tables or else somehow filtering the contents at readout. Maybe someday we'll bother to support that option, but it looks like a lot of trouble for a marginal feature." The code preventing it is here: /* * We currently don't allow column-specific triggers with * transition tables. Per spec, that seems to require * accumulating separate transition tables for each combination of * columns, which is a lot of work for a rather marginal feature. */ if (stmt->columns != NIL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("transition tables cannot be specified for triggers with column lists"))); In theory you could do some filtering in your trigger procedure, by comparing rows in OLD TABLE and NEW TABLE to see which row-pairs represent changes to the columns you care about, but that's slightly complicated: you can join OLD and NEW using whatever keys you have defined, but that only works if there's no danger of the keys themselves changing. I wondered about inventing something like WITH ORDINALITY so that you get unique ordered numbers in an extra column that can be used to join the two transition tables without knowing anything about the user defined keys, but among other problems I couldn't figure out how to fit it in syntax-wise. I suppose PostgreSQL could do this internally to make the feature you want work: whenever you scan either table, in an UPDATE OF trigger, it could scan both transition tables in sync and filter out rows that didn't change your columns of interest. Or it could do that filtering up front, before your trigger fires, to create two brand new tuplestores just for your trigger invocation. Or there could be a third spillable data structure, that records which triggers should be able to see each old/new-pair, or which columns changed, and is scanned in sync with the others. Just some first thoughts; I am not planning to work on this any time soon.
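For what it's worth, that filtering approach might look roughly like the following untested sketch (all of the table and column names here are invented, and it assumes the key "id" never changes):

CREATE TABLE t (id int PRIMARY KEY, status text, payload text);
CREATE TABLE audit_log (id int, old_status text, new_status text, changed_at timestamptz DEFAULT now());

CREATE OR REPLACE FUNCTION audit_status_changes() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- join old and new rows on the (assumed immutable) key and keep only
  -- the pairs where the column of interest actually changed
  INSERT INTO audit_log (id, old_status, new_status)
  SELECT o.id, o.status, n.status
  FROM old_table o
  JOIN new_table n USING (id)
  WHERE o.status IS DISTINCT FROM n.status;
  RETURN NULL;
END;
$$;

CREATE TRIGGER audit_status
AFTER UPDATE ON t  -- no "OF status" column list, per the restriction above
REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
FOR EACH STATEMENT EXECUTE PROCEDURE audit_status_changes();

The trigger fires for every UPDATE, but only the rows whose status actually changed end up in the log.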
Re: 12.2: Howto check memory-leak in worker?
On Tue, May 5, 2020 at 10:13 AM Peter wrote: > BTW, I would greatly appreciate if we would reconsider the need for > the server to read the postmaster.pid file every few seconds (probably > needed for something, I don't know). > That makes it necessary to set atime=off to get a spindown, and I > usually prefer to have atime=on so I can see what my stuff is > currently doing. That'd be this: /* * Once a minute, verify that postmaster.pid hasn't been removed or * overwritten. If it has, we force a shutdown. This avoids having * postmasters and child processes hanging around after their database * is gone, and maybe causing problems if a new database cluster is * created in the same place. It also provides some protection * against a DBA foolishly removing postmaster.pid and manually * starting a new postmaster. Data corruption is likely to ensue from * that anyway, but we can minimize the damage by aborting ASAP. */ if (now - last_lockfile_recheck_time >= 1 * SECS_PER_MINUTE) { if (!RecheckDataDirLockFile())
Re: Is there a significant difference in Memory settings between 9.5 and 12
On Tue, May 12, 2020 at 2:52 PM Tory M Blue wrote: > It took the change but didn't help. So 10GB of shared_buffers in 12 is still > a no go. I'm down to 5GB and it works, but this is the same hardware, the > same exact 9.5 configuration. So I'm missing something. WE have not had to > mess with kernel memory settings since 9.4, so this is an odd one. > > I'll keep digging, but i'm hesitant to do my multiple TB db's with half of > their shared buffer configs, until I understand what 12 is doing differently > than 9.5 Which exact version of 9.5.x are you coming from? What's the exact error message on 12 (you showed the shared_memory_type=sysv error, but with the default value (mmap) how does it look)? What's your huge_pages setting? Can you reproduce the problem with a freshly created test cluster? As a regular user, assuming regular RHEL packaging, something like /usr/pgsql-12/bin/initdb -D test_pgdata, and then /usr/pgsql-12/bin/postgres -D test_pgdata -c shared_buffers=10GB (then ^C to stop it). If that fails to start in the same way, it'd be interesting to see the output of the second command with strace in front of it, in the part where it allocates shared memory. And perhaps it'd be interesting to see the same output with /usr/pgsql-9.5/bin/XXX (if you still have the packages). For example, on my random dev laptop that looks like: openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 6 fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 read(6, "MemTotal: 16178852 kB\nMemF"..., 1024) = 1024 read(6, ":903168 kB\nShmemHugePages: "..., 1024) = 311 close(6)= 0 mmap(NULL, 11016339456, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = -1 ENOMEM (Cannot allocate memory) mmap(NULL, 11016003584, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7ff74e579000 shmget(0x52e2c1, 56, IPC_CREAT|IPC_EXCL|0600) = 3244038 shmat(3244038, NULL, 0) = 0x7ff9df5ad000 The output is about the same on REL9_5_STABLE and REL_12_STABLE for me, only slightly different sizes. If that doesn't fail in the same way on your system with 12, perhaps there are some more settings from your real clusters required to make it fail. You could add them one by one with -c foo=bar or in the throw away test_pgdata/postgresql.conf, and perhaps that process might shed some light? I was going to ask if it might be a preloaded extension that is asking for gobs of extra memory in 12, but we can see from your "Failed system call was shmget(key=5432001, size=11026235392, 03600)" that it's in the same ballpark as my total above for shared_buffers=10GB.
Re: Ubuntu 20.04: apt.postgresql.org/pub/repos/apt focal-pgdg InRelease' doesn't support architecture 'i386'
On Sun, May 17, 2020 at 10:45 AM Hugh wrote: > While this doesn't appear to be a bug that causes problems of any kind, I do > have a question about its cause. > > The "error" listed in the Subject: line is basically what I'm seeing. The > entire message is below, particularly the 'N:' at the end. Is there a repo > setting I should change to prevent the request for '386' architecture? Thank > you in advance for your assistance. I'm not sure, but it seems related to this complaint and the answer might be to tell your sources.list that that source has only amd64: https://www.postgresql.org/message-id/flat/16402-1f2d77e819f9e1f2%40postgresql.org
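If it's the same problem, the fix suggested there amounts to restricting that source to amd64, with something along these lines in your pgdg sources list entry (untested, adjust to match your existing line):

deb [arch=amd64] http://apt.postgresql.org/pub/repos/apt focal-pgdg main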
Re: Help understanding SIReadLock growing without bound on completed transaction
On Fri, May 22, 2020 at 7:48 AM Mike Klaas wrote: > locktype: page > relation::regclass::text: _pkey > virtualtransaction: 36/296299968 > granted:t > pid:2263461 That's an unusually high looking pid. Is that expected, for example did you crank Linux's pid_max right up, or is this AIX, or something?
Re: Help understanding SIReadLock growing without bound on completed transaction
On Fri, May 22, 2020 at 7:48 AM Mike Klaas wrote: > It's my understanding that these locks should be cleared when there are no > conflicting transactions. These locks had existed for > 1 week and we have > no transactions that last more than a few seconds (the oldest transaction in > pg_stat_activity is always < 1minute old). > Why would a transaction that is finished continue accumulating locks over > time? Predicate locks are released by ClearOldPredicateLocks(), which releases SERIALIZABLEXACTs once they are no longer interesting. It has a conservative idea of what is no longer interesting: it waits until the lowest xmin across active serializable snapshots is >= the transaction's finishedBefore xid, which was the system's next xid (an xid that hasn't been used yet*) at the time the SERIALIZABLEXACT committed. One implication of this scheme is that SERIALIZABLEXACTs are cleaned up in commit order. If you somehow got into a state where a few of them were being kept around for a long time, but others committed later were being cleaned up (which I suppose must be the case or your system would be complaining about running out of SERIALIZABLEXACTs), that might imply that there is a rare leak somewhere in this scheme. In the past I have wondered if there might be a problem with wraparound in the xid tracking for finished transactions, but I haven't worked out the details (transaction ID wraparound is both figuratively and literally the Ground Hog Day of PostgreSQL bug surfaces). *Interestingly, it takes an unlocked view of that value, but that doesn't seem relevant here; it could see a value that's too low, not too high.
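If you want to see how those SIReadLocks are spread across (possibly long-finished) transactions, a query along these lines might help narrow things down (just a starting point):

SELECT virtualtransaction, pid, count(*)
FROM pg_locks
WHERE mode = 'SIReadLock'
GROUP BY virtualtransaction, pid
ORDER BY count(*) DESC
LIMIT 20;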
Re: Can we get SQL Server-like cross database queries
On Thu, Jun 4, 2020 at 4:26 PM Ron wrote: > On 6/3/20 2:57 PM, Rob Sargent wrote: > >> On Jun 3, 2020, at 1:46 PM, Tom Lane wrote: > >> Guyren Howe writes: > >>> Is it practical to provide the SQL Server-like feature in Postgres? > >> No. > > That got me chuckling. > > I had just decided not to bother posting, but well, here goes. > > > > I call bs on the “cross db query” notion of tsql - but I admit I haven’t > > used it since it was a Sybase thing. > > > > Is db.owner.table (iirc) is really anything more than nuance on > > schema.table. Does a db for automotive parts need to be > > up-close-and-personal with a db payroll? > > Those aren't the only two databases that exist. Think of a federated system > where you've got a "reference" database full of lookup tables, and one > database for every 10,000 customers. For 45,000 customers you've got five > databases, and they all need to access the reference database, plus "all > customers" queries need to access all five databases. There's no doubt it's useful, and it's also part of the SQL spec, which says you can do catalog.schema.table. I would guess that we might get that as a byproduct of any project to make PostgreSQL multithreaded. That mountain moving operation will require us to get rid of all the global state that currently ties a whole process to one session and one database, and replace it with heap objects with names like Session and Database that can be passed around between worker threads.
Re: Shared memory error
On Thu, Jun 4, 2020 at 6:18 AM Sonam Sharma wrote: >>> 1) Postgres version : 11.2 FYI This is missing over a year's worth of bugfixes. That said, I don't know of anything fixed that has this symptom. >>> 4) Is this only with one query and if so what is it doing? : No , few >>> queries work, few fails .. mostly all are complex select queries. It's a long shot, because I expect you're running the usual packages under a user set up by the package to be a "system" user, but just in case: there is one known way for POSIX shm to be unlinked while you're still using it: systemd. For example, this might happen if you have cronjobs running as the same user. Some more info: https://www.postgresql.org/docs/11/kernel-resources.html#SYSTEMD-REMOVEIPC Do you see any other errors around this one, in the PostgreSQL logs?
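If it does turn out to be systemd's RemoveIPC behaviour, the fix described on that documentation page is along these lines, followed by restarting systemd-logind (or rebooting):

# in /etc/systemd/logind.conf
RemoveIPC=no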
Re: Shared memory error
>> Do you see any other errors around this one, in the PostgreSQL logs? > No , only this is the error from db and jdbc end .. and queries are failing If you need a workaround right now you could always set max_parallel_workers_per_gather=0 so that it doesn't try to use parallel query. That could make some queries run more slowly. When you say you extended the /dev/shm filesystem, what are the mount options, or how did you configure it? It's interesting that the problem went away for a while when you did that.
Re: Shared memory error
On Fri, Jun 5, 2020 at 1:00 AM Sonam Sharma wrote: > The dynamic_shared_memory_type was set to POSIX . Because of this it was > using tmpfs /dev/shm. When the query was running I saw the file system was > filling. So I extended the file system and luckily the query worked for that > time Oh, hmm. When you say "filling", maybe you hit the leaking shm problem that was fixed in 11.7, though that would come with messages somewhere in your log about a lack of DSM slots? I don't have an explanation for the exact error you're seeing, though. It's a bit tricky to speculate on older releases with so many bug fixes missing. Please let us know if you still see the problem after your next scheduled upgrade.
Re: troubleshooting postgresql ldap authentication
On Tue, Jun 9, 2020 at 9:05 AM Chris Stephens wrote: > hostssl all all 0.0.0.0/0 ldap > ldapserver="ldaps://xxx" ldapbasedn="yyy" ldaptls=1 > does anyone know what might be causing "LDAP: Bad parameter to an ldap > routine" You probably want ldapurl="ldaps://xxx" (note: ldapurl, not ldapserver). Or you could use ldapscheme="ldaps" and ldapserver="xxx".
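In other words, something like one of these lines (untested, keeping your placeholders; with ldaps you shouldn't also need ldaptls=1):

hostssl all all 0.0.0.0/0 ldap ldapurl="ldaps://xxx/yyy?uid?sub"
hostssl all all 0.0.0.0/0 ldap ldapscheme=ldaps ldapserver=xxx ldapbasedn="yyy"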
Re: Definition of REPEATABLE READ
On Mon, Jun 22, 2020 at 12:25 AM Peter J. Holzer wrote: > I've read http://jepsen.io/analyses/postgresql-12.3 which reports a > problem in PostgreSQL's SERIALIZABLE isolation leven (which has since been > fixed) and also shows an example of a violation of what they consider to > be the correct definition of REPEATABLE READ. In response to that report, we added a paragraph to the manual to state explicitly that what we offer is snapshot isolation: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1575fbf1ac3c8464b2dade245deff537a3aa2498 https://www.postgresql.org/docs/devel/transaction-iso.html#XACT-REPEATABLE-READ > Since those papers are now 25 and 20 years old, respectively, and there > have been several revisions of the SQL standard in the meantime, has the > SQL committee come around to that view (SQL/2003 seems unchanged, I > couldn't find a later revision online)? And if it has, did it follow > those definitions or come up with different ones (it seems to me that > G2-item is much stronger that warranted by the wording in the standard)? SQL:2016 is the latest, and I don't have it to hand right now, but SQL:2011 still defined four degrees of isolation in terms of the three phenomena (pages 133-135 of part II, "Foundations", at least in the draft copy you can find on the net). As for what else PostgreSQL should do about this historical confusion, I suspect that a patch to pgsql-hackers to accept the name SNAPSHOT ISOLATION as an alternative would be popular, and we could probably even spin it as the "true" name of the level (while still accepting REPEATABLE READ; there is no point in breaking that), and maybe add a second table to the manual's Transaction Isolation section to cover the later formalisations in a digested user friendly format (if that is possible).
Re: Same query taking less time in low configuration machine
On Tue, Jul 14, 2020 at 9:27 PM Vishwa Kalyankar wrote: > Function Scan on kseb_geometry_trace_with_barrier_partition > (cost=0.25..10.25 rows=1000 width=169) (actual time=11626.548..11626.568 > rows=254 loops=1) > Function Scan on kseb_geometry_trace_with_barrier_partition > (cost=0.25..10.25 rows=1000 width=169) (actual time=22304.425..22304.448 > rows=254 loops=1) I have no idea what that function does, but perhaps it runs more queries, and you can't see the plans for those here. If you set up auto_explain[1], and turn on auto_explain.log_nested_statements, then you'll be able to see the query plans for the internal stuff happening in that function. I'd look at EXPLAIN (BUFFERS) or auto_explain.log_buffers to get more information on cache hits and misses. I'd look for settings differences with EXPLAIN (SETTINGS) to see if there's anything accidentally set differently (maybe JIT or parallelism or something like that). I'd look at pg_stat_activity repeatedly while it runs to see what the processes are doing, especially the wait_event column. I'd also look at the CPU and I/O on the systems with operating system tools like iostat, htop, perf to try to find the difference. [1] https://www.postgresql.org/docs/current/auto-explain.html
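For example, a quick session-level auto_explain setup (run as a superuser, in the same session that then calls the function) could look roughly like this:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;             -- needed for buffer/timing details
SET auto_explain.log_buffers = on;
SET auto_explain.log_nested_statements = on;

Then run the slow function call and look for the nested plans in the server log.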
Re: PG 9.5.5 cores on AIX 7.1
On Sun, Jul 19, 2020 at 11:01 PM Abraham, Danny wrote: > Segmentation fault in _alloc_initial_pthread at 0x9521474 > 0x9521474 (_alloc_initial_pthread+0x1d4) e803 ld > r0,0x0(r3) > (dbx) where > _alloc_initial_pthread(??) at 0x9521474 > __pth_init(??) at 0x951f390 > uload(??, ??, ??, ??, ??, ??) at 0x9fff000ab70 load1(??, ??, ??, ??) at > 0x9000b74 load(??, ??, ??) at 0x9001ef0 loadAndInit(??, ??, > ??) at 0x905b38c dlopen(??, ??) at 0x909bfe0 > internal_load_library(??) at 0x10014c684 > RestoreLibraryState(??) at 0x10014d79c > ParallelWorkerMain(??) at 0x1000bb2d0 > StartBackgroundWorker() at 0x10026cd94 > maybe_start_bgworkers() at 0x10003834c > sigusr1_handler(??) at 0x10003902c > __fd_select(??, ??, ??, ??, ??) at 0x91567fc > ServerLoop() at 0x1004cec90 > PostmasterMain(??, ??) at 0x10003a4e8 > main(??, ??) at 0x108f8 FWIW there was a report a decade ago that looked at least superficially similar: https://www.postgresql.org/message-id/flat/09B23E7BF70425478C1330D893A722C602FEC019BD%40MailSVR.invera.com
Re: Check replication lag
On Thu, Aug 6, 2020 at 7:02 AM Sreejith P wrote: > IN SYNC mode of replication what would be the impact on Master DB server in > terms of over all performance ? The pg_stat_replication columns write_lag, flush_lag and replay_lag are designed to tell you how long to expect commits to take for synchronous standbys, based on recent history, if synchronous_commit is set to remote_write, on or remote_apply respectively. Those times tell you about commit latency, which limits sequential commit rate for each session.
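You can see the current estimates for each standby with something like:

SELECT application_name, sync_state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;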
Re: PostgreSQL on Windows' state
On Wed, Sep 23, 2020 at 10:53 PM Alessandro Dentella wrote: > Thanks Pavel, but psycopg2 (that I always use is just for Python). T > hey claim/complain that from c# there's no native solution. Maybe https://www.npgsql.org/?
Re: Unable to compile postgres 13.1 on Slackware current x64
On Mon, Nov 16, 2020 at 10:10 PM Laurenz Albe wrote: > On Mon, 2020-11-16 at 09:15 +0200, Condor wrote: > > collationcmds.c: In function ‘get_icu_language_tag’: > > collationcmds.c:467:51: error: ‘TRUE’ undeclared (first use in this > > function); did you mean ‘IS_TRUE’? > >467 | uloc_toLanguageTag(localename, buf, sizeof(buf), TRUE, &status); > >| ^~~~ > >| IS_TRUE > > > "UBool" and "TRUE" are defined in "umachine.h", which is a header file for the > "libicu" library. > > PostgreSQL includes "unicode/ucol.h", which will include "umachine.h" > (via "utypes.h"), so that should be fine. > > Are your libicu headers installed under /usr/include/unicode? > Do you get any messages about missing include files earlier? It looks like something happened to ICU's boolean macros . See this commit in FreeBSD ports: https://github.com/freebsd/freebsd-ports/commit/81a88b4506ec06d07be10d199170ef4003eb0e30 ... which references: https://github.com/unicode-org/icu/commit/c3fe7e09d844
Re: Unable to compile postgres 13.1 on Slackware current x64
On Tue, Nov 17, 2020 at 8:02 PM Condor wrote: > I try to compile postgres again with (cd src/backend/commands; sed > 's/TRUE/true/' collationcmds.c > collationcmds.c; ) and it's compiled > but get new error on linking: Doesn't that produce an empty file collationcmds.c? I think you want: sed 's/TRUE/true/' collationcmds.c > collationcmds.c.tmp && mv collationcmds.c.tmp collationcmds.c
Re: Unable To Drop Tablespace
On Fri, Feb 5, 2021 at 12:43 PM Ian Lawrence Barwick wrote: > 2021年2月5日(金) 3:52 Pavan Pusuluri : >> We are trying to drop a table space on RDS Postgres . We have removed the >> objects etc, but it still won't drop. >> >> I have checked and there's no reference anywhere to this tablespace but it >> complains it's not empty. >> >> I checked if it is a default for a database, revoked all privileges on the >> tablespace. >> >> We dropped the database but underpinning tablespace remained but when I >> query to see if any reference i get no hits. >> >> "Select c.relname,t.spcname from pg_class c JOIN pg_tablespace t ON >> c.reltablespace=t.oid where t.spcname='mytablespace' >> >> I dont find any objects referencing. Kindly let me know if anything else >> needs to be checked? > > > There's a handy function "pg_tablespace_databases()" to check which databases > might still have objects in a database. There are a couple of useful queries > demonstrating > usage here: > > https://pgpedia.info/p/pg_tablespace_databases.html It's also possible for there to be stray files in there, in some crash scenarios where PostgreSQL doesn't currently clean up relation files that it ideally should. The one with the widest window AFAIK is where you crash after creating a table but before committing[1]. You'd need a directory listing to investigate that. [1] https://www.postgresql.org/message-id/flat/CAEepm%3D0ULqYgM2aFeOnrx6YrtBg3xUdxALoyCG%2BXpssKqmezug%40mail.gmail.com
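For example, something along these lines should show which databases still contain objects in that tablespace (adjust the tablespace name):

SELECT d.datname
FROM pg_tablespace t
CROSS JOIN LATERAL pg_tablespace_databases(t.oid) AS dbs(dboid)
JOIN pg_database d ON d.oid = dbs.dboid
WHERE t.spcname = 'mytablespace';

If a database shows up there, connect to it and look in pg_class for rows with the matching reltablespace; if nothing shows up, stray files left behind by a crash become the more likely explanation.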
Re: How to post to this mailing list from a web based interface
On Fri, Jan 29, 2021 at 4:27 AM Alvaro Herrera wrote: > On 2021-Jan-28, Ravi Krishna wrote: > > I recollect there use to be a website from where one can reply from web. > > The community does not maintain such a service. > > There used to be a Gmane archive of this list that you could use to > post. Seems it's still online at postgresql-archive.org. They have a > "Reply" button and it says to require your account, but given SPF and > DMARC and other restrictions on email generation, it seems pretty > uncertain that emails posted that way would work correctly. I think we > would even reject such emails if they reached our mailing list servers. While catching up with some interesting new threads just now I was quite confused by the opening sentence of this message (which also arrived in my mailbox): https://www.postgresql.org/message-id/1611355191319-0.post%40n3.nabble.com ... until I got to the last line. I wonder if the "Resend" facility on our own archives could be better advertised, via a "Want to join this thread?" link in the Quick Links section that explains how to use it and what problem it solves, or something...
Re: [LDAPS] Test connection user with ldaps server
On Tue, Feb 16, 2021 at 4:32 AM Laurenz Albe wrote: > What I would do is experiment with the "ldapsearch" executable from OpenLDAP > and see > if you can reproduce the problem from the command line. Also, maybe try doing this as the "postgres" user (or whatever user PostgreSQL runs as), just in case there are some environmental differences affecting the behaviour.
Re: SELECT creates millions of temp files in a single directory
On Sun, Apr 24, 2022 at 8:00 AM Peter wrote: > More than a million files in a single directory, this is > inacceptable. You didn't show EXPLAIN (ANALYZE) but if [Parallel] Hash is making insane numbers of temporary files then something is not working as intended... and I can take a guess at what it is. I tried to recreate this with randomly distributed file.pathid, same size tables as you, and I got 32 batches and a nice manageable number of temporary files. Adding millions of extra file rows with duplicate pathid=42 gets me something like "Batches: 524288 (originally 32)" in EXPLAIN (ANALYZE) output. I guess that's the sort of distribution you have here? Extensive discussion of the cause of that and potential (hard) solutions over here: https://www.postgresql.org/message-id/flat/CA%2BhUKGKWWmf%3DWELLG%3DaUGbcugRaSQbtm0tKYiBut-B2rVKX63g%40mail.gmail.com To summarise, when the hash table doesn't fit in work_mem, then we "partition" (spill part of the data to disk) to make twice as many (hopefully) smaller hash tables that do fit. Sometimes partitioning produces one or more hash tables that are still too big because of uneven key distribution, so we go around again, doubling the number of partitions (and thus temporary files) every time. I would say that once we get past hundreds of partitions, things are really turning to custard (file descriptors, directory entries, memory overheads, ... it just doesn't work well anymore), but currently we don't give up until it becomes very clear that repartitioning is not helping. This algorithmic problem existed before parallelism was added, but it's possible that the parallel version of the meltdown hurts a lot more (it has extra per-process files, and in multi-attempt scenarios the useless earlier attempts hang around until the end of the query instead of being cleaned up sooner, which doubles the number of files). Hopefully that gives some clues about how one might rewrite the query to avoid a massive unsplittable set of duplicate keys in hash tables, assuming I'm somewhere close to the explanation here (maybe some subselect with UNIQUE or GROUP BY in it, or some way to rewrite the query to avoid having the problematic duplicates on the "inner" side, or a completely different plan..., or just crank up work_mem massively for this query so that you don't need a partition step at all). Obviously it would be nice for PostgreSQL to be hardened against this risk, eventually, though. As for merge join planning, unfortunately merge joins aren't fully parallelisable and in the plan you showed, a separate copy of the big sort runs in every process, which isn't nice (we don't have a Parallel Sort that could fix that yet).
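Just to illustrate the work_mem idea with a hedged sketch (the table and column names here are guesses based on the repro above, not your real schema):

BEGIN;
SET LOCAL work_mem = '2GB';  -- big enough that the hash table fits in a single batch
SELECT f.pathid, count(*)
FROM file f
JOIN path p USING (pathid)
GROUP BY f.pathid;
COMMIT;

-- or, if the query only needs distinct pathids from the duplicate-heavy side:
SELECT p.*
FROM path p
JOIN (SELECT DISTINCT pathid FROM file) f USING (pathid);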
Re: Pg14 possible index corruption after reindex concurrently
On Wed, May 25, 2022 at 6:17 AM Aleš Zelený wrote: > SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname) This may be related to bug #17485, discussed at: https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org
Re: Improve configurability for IO related behavoir
On Sun, May 29, 2022 at 4:29 AM 浩辰 何 wrote: > Furthermore, the results above are also related to IO API supported by OS. > MySQL support synchronized IO and Linux libaio. It seems > that PostgreSQL only supports synchronized IO, so shall we support more IO > engines? like io_uring which is very popular in recent years. Hi Haochen, There is an active project to bring these things to PostgreSQL. https://wiki.postgresql.org/wiki/AIO has some information and links. The short version is that there is a development patch set to add these GUCs: io_method=worker,io_uring,posix_aio,iocp,... io_data_direct=on,off io_wal_direct=on,off It also adds a bunch of specialised logic that knows how to initiate IO in key places (scans, writeback, recovery, ...), because it's not enough to just turn off kernel I/O buffering, we also have to do all the work the kernel is doing for us or performance will be terrible.
Re: Order of rows in statement triggers NEW/OLD tables
On Fri, May 6, 2022 at 6:20 PM hubert depesz lubaczewski wrote: > when defining statement triggers on update I can use: > > REFERENCING OLD TABLE AS xxx NEW TABLE as YYY > > these "pseudo" tables contain rows that were before and after. > > Is the order guaranteed? > > Can I assume that "first" row returned by select from xxx, will be older > version of first row returned by select from yyy? Good question, and one I've wondered about before. I *think* that is currently true, due to implementation details, but it could change. The trigger code fills up both tuplestores (= memory/disk containers for transition tables) in sync with each other in AfterTriggerSaveEvent(), and then the NamedTuplestoreScan node emits tuples in insertion order. We already banned the use of transition tables when there is "ON UPDATE OR INSERT" etc so you can't get mixed INSERT/UPDATE/DELETE results which would desynchronise the old and new tuples, and I also wondered if something tricky like FK ON DELETE CASCADE/SET NULL in a self-referencing table could mix some old-with-no-new into UPDATE results, but I can't see how to do that, and I can't think of any other way off the top of my head. Of course, joins etc could destroy the order higher in your query plan. While reading about proposed incremental materialized view ideas, I once idly wondered if it'd be useful, as an extension to the standard, to be able to use WITH ORDINALITY for transition tables (normally only used for unnest(), and in PG also any SRF) and then you could use that + ORDER BY to explicitly state your desired order (and possibly teach the planner that ORDINALITY attributes are path keys by definition so it doesn't need to insert sort nodes in simple cases). That is, instead of relying on scan order. In any case, an in-core IMV feature is allowed to peek deeper into the implementation and doesn't even need SQL here, so I didn't follow that thought very far... I am not sure about this, but I wonder if any portable-across-SQL-implementations user-level scheme for replication/materialization built on top of trigger transition tables would need to require immutable unique keys in the rows in order to be able to match up before/after tuples.
Re: AIX and EAGAIN on open()
On Mon, Jun 20, 2022 at 9:53 PM Christoph Berg wrote: > IBM's reply to the issue back in December 2020 was this: > > The man page / infocenter document is not intended as an exhaustive > list of all possible error codes returned and their circumstances. > "Resource temporarily unavailable" may also be returned for > O_NSHARE, O_RSHARE with O_NONBLOCK. > > Afaict, PG does not use these flags either. > > We also ruled out that the system is using any anti-virus or similar > tooling that would intercept IO traffic. > > Does anything of that ring a bell for someone? Is that an AIX bug, a > PG bug, or something else? No clue here. Anything unusual about the file system (NFS etc)? Can you truss/strace the system calls, to sanity check the flags arriving into open(), and see if there's any unexpected other activity around open() calls that might be coming from something you're linked against?
Re: Strange collation names ("hu_HU.UTF-8")
On Wed, Aug 3, 2022 at 1:43 AM Tom Lane wrote: > I believe most if not all variants of Unix are > permissive about the spelling of the encoding part. I've only seen glibc doing that downcase-and-strip-hyphens thing to the codeset part of a locale name when looking for locale definition files. Other systems like FreeBSD expect to be able to open /usr/share/locale/$LC_COLLATE/LC_COLLATE directly without any kind of munging. On a Mac it's probably a little fuzzy because the filenames are case insensitive...
Re: Segmentation Fault PG 14
On Tue, Nov 8, 2022 at 11:45 AM Willian Colognesi wrote: > root@ip-10-x-x-x:/home/ubuntu# pg_config --configure > ... --with-extra-version= (Ubuntu 14.5-2.pgdg20.04+2)' ... > ... '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-10' ... > There is no llvm installed on ubuntu server, postgresql was installed via apt > package `apt install postgresql-14` We can see from the pg_config output that it's built with LLVM 10. Also that looks like it's the usual pgdg packages which are certainly built against LLVM and will install it automatically.
Re: PANIC: could not flush dirty data: Cannot allocate memory
On Tue, Nov 15, 2022 at 10:54 AM Christoph Moench-Tegeder wrote: > ## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at): > > On several servers we see the error message: PANIC: could not flush > > dirty data: Cannot allocate memory > Of these three places, there's an sync_file_range(), an posix_fadvise() > and an msync(), all in src/backend/storage/file/fd.c. "Cannot allocate > memory" would be ENOMEM, which posix_fadvise() does not return (as per > it's docs). So this would be sync_file_range(), which could run out > of memory (as per the manual) or msync() where ENOMEM actually means > "The indicated memory (or part of it) was not mapped". Both cases are > somewhat WTF for this setup. It must be sync_file_range(). The others are fallbacks that wouldn't apply on a modern Linux. It has been argued before that we might have been over-zealous applying the PANIC promotion logic to sync_file_range(). It's used to start asynchronous writeback to make the later fsync() call fast, so it's "only a hint", but I have no idea if it could report a writeback error from the kernel that would then be consumed and not reported to the later fsync(), so I defaulted to assuming that it could.
Re: PANIC: could not flush dirty data: Cannot allocate memory
On Wed, Nov 16, 2022 at 1:24 AM wrote: > Filesystem is ext4. VM technology is mixed: VMware, KVM and XEN PV. > Kernel is 5.15.0-52-generic. > > We have not seen this with Ubutnu 18.04 and 20.04 (although we might not > have noticed it). > > I guess upgrading to postgresql 13/14/15 does not help as the problem > happens in the kernel. > > Do you have any advice how to go further? Shall I lookout for certain > kernel changes? In the kernel itself or in ext4 changelog? It'd be good to figure out what is up with Linux or tuning. I'll go write a patch to reduce that error level for non-EIO errors, to discuss for the next point release. In the meantime, you could experiment with setting checkpoint_flush_after to 0, so the checkpointer/bgwriter/other backends don't call sync_file_range() all day long. That would have performance consequences for checkpoints which might be unacceptable though. The checkpointer will fsync relations one after another, with less I/O concurrency. Linux is generally quite lazy at writing back dirty data, and doesn't know about our checkpointer's plans to fsync files on a certain schedule, which is why we ask it to get started on multiple files concurrently using sync_file_range(). https://www.postgresql.org/docs/15/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS
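Concretely, that experiment would be something like this; checkpoint_flush_after is the one I mean, and bgwriter_flush_after/backend_flush_after are the analogous settings for the other processes if you want to go further:

ALTER SYSTEM SET checkpoint_flush_after = 0;
-- optionally also:
-- ALTER SYSTEM SET bgwriter_flush_after = 0;
-- ALTER SYSTEM SET backend_flush_after = 0;
SELECT pg_reload_conf();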
Re: Getting PSQL in Windows to support TAB/Autocomplete via modified readline...
On Tue, Nov 22, 2022 at 4:25 AM Kirk Wolak wrote: > In researching this problem, it appears that the decision was made like > 17yrs ago, when windows did not have a realistic "terminal" type interface. > Assuming we target Windows 8.1 or higher, I believe this goes away. FWIW PostgreSQL 16 will require Windows 10+. Not a Windows user myself, but I read somewhere that Windows 8.1 has already started showing full screen warnings at startup that security patches etc end in January, while PostgreSQL 16 (the soonest release that could include your changes) is expected late in the year. > What we are looking for is a simple bullet point list of what would make > such a contribution acceptable. And how far should we get through that list > on our own, before getting some help, especially from the build teams? [Our > goal would be an NEW Config type flag: READLINE_FOR_WINDOWS (you guys name > it, and control the default setting)] Some thoughts: Re configuration flags: don't waste time with the old perl-based build system. The configuration should be done only with the new meson build system (soon to be the only way to build on Windows). I didn't quite understand if you were saying that readline itself needs patches for this (I gather from earlier threads about this that there were some problems with dll symbol export stuff, so maybe that's it?). In passing, I noticed that there is also a Windows port of editline AKA libedit, the BSD-licensed alternative to readline. It has a compatible API and PostgreSQL can use that too. I have no idea which is easier, more supported, etc on Windows. It's OK to post a work-in-progress patch to pgsql-hackers, even if it doesn't work right yet. With any luck, people will show up to help with problems. I am 100% sure that our Windows user community would love this feature. It would be good if the tests in src/bin/psql/t/010_tab_completion.pl pass on Windows, but if that's hard, don't let that stop you sharing a patch.
Re: 13.x, stream replication and locale(?) issues
On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin wrote: > 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux and > compile it on FreeBSD - will it help ? Out of curiosity (I'm not saying it's a good idea!), do you know if FreeBSD's localedef can compile glibc's collation definitions? In theory they are in a format standardised by POSIX... I suspect there may be extensions and quirks... At a wild guess, the data you showed doesn't even look like it contains non-ASCII characters (it looks like machine-readable identifiers or something, and perhaps it's the sort order of '-' that is causing you trouble), so it might also be possible to use the "ucs_basic" locale for that column and then all computers will agree on the sort order, but of course that doesn't address the more general problem; presumably you might also have Russian-language text in your system too. As for ".utf8" vs ".UTF-8", which one is selected by initdb as the database default seems to be something that varies between Linux distributions, so I guess maybe the installers use different techniques for discovering and selecting default locale names. Unlike glibc, FreeBSD doesn't do any name mangling at all when mapping LC_COLLATE to a pathname to find the file, whereas glibc downcases and removes '-' so you can find both formats of name in the various places...
Re: Interval in hours but not in days Leap second not taken into account
On Mon, Feb 27, 2023 at 8:26 PM PALAYRET Jacques wrote: > # PostgreSQL does not take into account the additional second (leap second) > in some calendar days ; eg. 2016, 31 dec. : > SELECT to_timestamp('20170102 10:11:12','yyyymmdd hh24:mi:ss') - > to_timestamp('20161230 00:00:00','yyyymmdd hh24:mi:ss') intervalle ; >intervalle > - > 3 days 10:11:12 Bonjour Jacques, Just for fun: postgres=# SELECT utc_to_tai(to_timestamp('20170102 10:11:12','yyyymmdd hh24:mi:ss')) - utc_to_tai(to_timestamp('20161230 00:00:00','yyyymmdd hh24:mi:ss')) intervalle; intervalle - 3 days 10:11:13 (1 row) PostgreSQL could, in theory, provide built-in UTC/TAI conversion functions using a leap second table that would be updated in each minor release, considering that the leap second table is included in the tzdata package that PostgreSQL vendors (ie includes a copy of), but it doesn't do anything like that or know anything about leap seconds. Here's a quick and dirty low technology version of the above: CREATE TABLE leap_seconds (time timestamptz primary key, off int); -- refresh leap second table from ietf.org using not-very-secure hairy shell code BEGIN; CREATE TEMP TABLE import_leap_seconds (s int8, off int); COPY import_leap_seconds FROM PROGRAM 'curl -s https://www.ietf.org/timezones/data/leap-seconds.list | grep -v ''^#'' | cut -f1,2'; TRUNCATE TABLE leap_seconds; INSERT INTO leap_seconds (time, off) SELECT '1900-01-01 00:00:00Z'::timestamptz + interval '1 second' * s, off FROM import_leap_seconds; DROP TABLE import_leap_seconds; COMMIT; CREATE OR REPLACE FUNCTION leap_seconds_before_utc_time(t timestamptz) RETURNS int STRICT LANGUAGE SQL AS $$ SELECT off FROM leap_seconds WHERE time <= t ORDER BY time DESC FETCH FIRST ROW ONLY $$; CREATE OR REPLACE FUNCTION utc_to_tai(t timestamptz) RETURNS timestamptz STRICT LANGUAGE SQL AS $$ SELECT t + interval '1 second' * coalesce(leap_seconds_before_utc_time(t), 0); $$; CREATE OR REPLACE FUNCTION tai_to_utc(t timestamptz) RETURNS timestamptz STRICT LANGUAGE SQL AS $$ SELECT t - interval '1 second' * coalesce(leap_seconds_before_utc_time(t), 0); $$;
Re: 13.x, stream replication and locale(?) issues
On Wed, Mar 1, 2023 at 10:30 AM Thomas Munro wrote: > On Wed, Mar 1, 2023 at 12:09 AM Eugene M. Zheganin wrote: > > 3) how do I fix it ? Should I take locale sources for ru_RU.utf8 on Linux > > and compile it on FreeBSD - will it help ? > > Out of curiosity (I'm not saying it's a good idea!), do you know if > FreeBSD's localedef can compile glibc's collation definitions? In > theory they are in a format standardised by POSIX... I suspect there > may be extensions and quirks... Another thought: if you upgrade to 15, you could use ICU as the default collation provider, and then make sure you have the same major version of ICU on your Linux and FreeBSD systems (which probably means compiling from source on FreeBSD, as FreeBSD tends to have a newish ICU in packages, while popular stable Linux distributions tend to have a fairly old one). As far as I know, Linux and FreeBSD systems on amd64 arch should match in every other important detail already (they both follow the System V ABI, so there shouldn't be any layout differences in eg structs that are sent over the wire AFAIK). For what it's worth, for the future we are trying to figure out how to support multi-version ICU so that you could explicitly set the provider to ICU v72 to get that sort of thing working across OS versions and even "close enough" cousins like your case, without having to compile anything from source, but unfortunately we didn't manage to get it ready in time for 16.
Re: src/test/examples/testlibpq2.c where the HAVE_SYS_SELECT_H is defined.
On Fri, Mar 17, 2023 at 7:48 PM jian he wrote: > Hi, > playing around with $[0] testlibpq2.c example. I wondered where > HAVE_SYS_SELECT_H is defined? > > I searched on the internet, founded that people also asked the same question > in $[1]. > > In my machine, I do have . > system version: Ubuntu 22.04.1 LTS > gcc version: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 > gcc compile command: gcc pg_testlibpq2.c -I/home/jian/postgres/pg16/include \ > -L/home/jian/postgres/pg16/lib -lpq > > [0]https://www.postgresql.org/docs/current/libpq-example.html > [1]https://stackoverflow.com/questions/37876850/in-compilation-time-how-to-find-the-macro-is-defined-in-which-header-file In 15 and earlier, it is defined in pg_config.h, which is created by configure. But in 16, that particular macro was removed by commit 7e50b4e3c. It looks like you are using PostgreSQL 16 sources, but looking at PostgreSQL 15 examples?
Re: Postgresql issue: FATAL: dsa_allocate could not find 7 free pages
On Fri, Oct 26, 2018 at 2:21 AM Alexandre Assouad wrote: > FATAL: dsa_allocate could not find 7 free pages > Some users have faced this issue which seems to be a bug in postgresql query > planner which should be solved : > https://www.postgresql.org/message-id/CAEepm%3D1k7sYJbxoOSJcS-4ti2MHOnBXBfLf%3D-gtuFLTXPqvTDg%40mail.gmail.com Hello Alexandre, Thanks for the report. Yes, that bug does seem like a plausible explanation for that error. The fix was in commit 8ffc3be1, which will be included in 10.6 (target release date: November 8th). It's also in 11.0, out now. If you are able to reproduce the problem easily on a copy of your database, and you have the time/inclination to investigate, is there any chance you could test the query on a local build of REL_10_STABLE (the branch that will produce 10.6 soon), instructions below, or the released v11.0 (if Timescale is available for that, it doesn't look like it)? If not, no worries. > But I’m still facing this issue. > I’m using postgresql 10.5 on ubuntu 18.04 > With timescaledb extension (which could be involved in this bug but I > couldn’t find any related issue on their side) It's interesting that reports came from users of Citus and Timescale. There doesn't seem to be any reason to think it's caused by anything these extensions are doing, other than just having a lot of data, big queries and the right access pattern to hit that bug. === How to set up a throw-away REL_10_STABLE cluster: On an Ubuntu developer machine, check out, build, install into a temporary directory: sudo apt-get install git make gcc flex bison libz-dev libreadline-dev git clone https://github.com/postgres/postgres.git cd postgres git checkout REL_10_STABLE ./configure --prefix=$HOME/tmp_install make -s -j8 && make -s install Initialise and start a database cluster: ~/tmp_install/bin/initdb -D ~/tmp_pgdata ~/tmp_install/bin/postgres -D ~/tmp_pgdata ... now postgres is running in the foreground, until you hit ^C ... do whatever you need to do to install Timescale extension, schema, data, reproduce problem To check that you can reproduce the problem in 10.5 with a server built that way, stop that and: git checkout REL_10_5 make -s clean && make -s -j8 && make -s install ~/tmp_install/bin/postgres -D ~/tmp_pgdata To install Timescale it's probably the instructions from https://github.com/timescale/timescaledb, using ./bootstrap -DPG_CONFIG=~/tmp_install/bin/pg_config but I haven't tried that myself. (You don't have to run initdb again or reload data when switching between tags/branches in the 10.x series). -- Thomas Munro http://www.enterprisedb.com
Re: Postgresql issue: FATAL: dsa_allocate could not find 7 free pages
On Fri, Oct 26, 2018 at 9:17 PM Alexandre Assouad wrote: > I’ll try next week to build a test environment and I’ll send you the results. > Does it make any difference to set up a VM vs a dedicated machine ? Thanks. Probably not, the important thing is probably the same data, settings (work_mem etc) and query. Maybe start with a build of REL_10_5 and try to reproduce the problem, and if you can get that to happen reliably, then switch to a build of REL_10_STABLE to confirm that the problem goes away? -- Thomas Munro http://www.enterprisedb.com
Re: Different memory allocation strategy in Postgres 11?
On Sat, Oct 27, 2018 at 6:10 AM Thomas Kellerer wrote: > Jeff Janes schrieb am 26.10.2018 um 17:42: > > I typically configure "shared_buffers = 4096MB" on my 16GB system as > > sometimes when testing, it pays off to have a bigger cache. > > > > With Postgres 10 and earlier, the Postgres process(es) would only > > allocate that memory from the operating system when needed. > > So right after startup, it would only consume several hundred MB, not > > the entire 4GB > > > > However with Postgres 11 I noticed that it immediately grabs the > > complete memory configured for shared_buffers during startup. > > > > It's not really a big deal, but I wonder if that is an intentional > > change or a result from something else? > > > > > > Do you have pg_prewarm in shared_preload_libraries? > > No. The only shared libraries are those for pg_stat_statemens Does your user have "Lock Pages in Memory" privilege? One thing that is new in 11 is huge AKA large page support, and the default is huge_pages=try. Not a Windows person myself but I believe that should succeed if you have that privilege and enough contiguous chunks of physical memory are available. If you set huge_pages=off does it revert to the old behaviour? -- Thomas Munro http://www.enterprisedb.com
Re: Is there a way to speed up WAL replay?
On Thu, Nov 1, 2018 at 4:25 AM Jeff Janes wrote: > On Wed, Oct 31, 2018 at 1:38 AM Torsten Förtsch > wrote: >> I am working on restoring a database from a base backup + WAL. With the >> default settings the database replays about 3-4 WAL files per second. The >> startup process takes about 65% of a CPU and writes data with something >> between 50 and 100 MB/sec. >> >> Is there a way to speed that up? The disk can easily sustain 400-500 MB/sec. > > > WAL replay is single-threaded, so the most you would be able to speed it up > is 50%, to where it would be taking 100% CPU. > > Is the time spent not on the CPU being spent waiting for WAL files to arrive > from the restore_command, or waiting for the blocks it needs to replay into > to get read from disk, or waiting for dirty pages to get written to disk so > they can be evicted to make way for new ones? > > One way I found to speed up restore_command is to have another program run a > few WAL files ahead of it, copying the WAL from the real archive into a > scratch space which is on the same filesystem as pg_xlog/pg_wal. Then have > restore_command simply move (not copy) the requested files from the scratch > space onto %p. The intra-filesystem move completes much faster than a copy. > > If it spends time waiting for blocks that need to be recovered into to get > read from disk, and you have enough RAM, you could speed it up by pre-warming > the file system cache. Something like: > > tar -cf - $PGDATA | wc -c For more targeted prewarming of large systems that don't fit in RAM and to get all the way into PostgreSQL's buffer pool, I suppose you could write a small Python/whatever script that extracts the relfilenode + block references from the output of pg_waldump (one file ahead, or whatever), sorts and uniques them, merges them into block ranges, converts the relfilenode reference to relation OID, and then calls pg_prewarm() for each range. -- Thomas Munro http://www.enterprisedb.com
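The last step of a script like that would boil down to calls along these lines (the relation name and block numbers here are invented for illustration):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('my_big_table'::regclass, 'buffer', 'main', 1000, 2000);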
Re: Impact on PostgreSQL due to Redhat acquisition by IBM
On Fri, Nov 16, 2018 at 5:07 AM Adrian Klaver wrote: > On 11/15/18 7:59 AM, Sachin Kotwal wrote: > > I feel community has most of linux based instance in thier buildfarm for > > testing, might be very few Ubuntu based. > > I might be wrong here. > > Easy enough to see: > > https://buildfarm.postgresql.org/cgi-bin/show_members.pl Out of curiosity, here are the current counts for HEAD:

Linux distros: Amazon Linux: 1, Arch Linux: 1, CentOS: 9, Debian: 34, Fedora: 5, Photon: 1, Raspbian: 2, RHEL: 8, SUSE: 7, Ubuntu: 7
BSD diaspora: DragonflyBSD: 1, FreeBSD: 6, NetBSD: 2, OpenBSD: 1
OpenSolaris diaspora: OmniOS: 1, SmartOS: 1
Other Unixen: AIX: 4, HP-UX: 3, macOS: 4
Windows: Windows: 6, Cygwin: 1

I wouldn't be too worried about any of these, especially the open ones. Closed Solaris, though, is apparently dead to us. Nobody cares enough to build HEAD on it anymore, and for example 3a769d82 (reflink support for pg_upgrade) went in without consideration of Solaris 11.4 reflink(). I've personally moved to the 'acceptance' phase of grief; all three Unixes that I cut my teeth on in the 90s are now either formally dead and buried or, in this case, a zombie. From personal observations, I know that we have developers and committers doing their primary development work on at least Debian, Fedora, FreeBSD, macOS, Ubuntu and Windows. -- Thomas Munro http://www.enterprisedb.com
Re: Transition Tables doesn´t have OID
On Sun, Dec 2, 2018 at 7:38 AM Adrian Klaver wrote: > > On 12/1/18 8:51 AM, PegoraroF10 wrote: > > ok > > then, help me to find alternatives to it. > > As I sad, sometimes whe change our PK, so using OID would be a smart way to > > have a join between old and new transition tables and we would like to use > > transition tables because each statement is a lot faster than each row for > > auditing purposes. > > > > So, whats the alternative ? One trigger for each row just for changing PK > > values (will occur few times) and another trigger for each statement to do > > the logging ? > > If you don't want to change the structure of your tables that would be > seem to be the way to go. It will require some thought to make sure the > 'for each row' and 'for each statement' don't interfere with each other. I also contemplated this question when hacking on transition tables. We know that the new and old tuplestores are ordered in the same way (as an implementation detail), but there is no way to take advantage of that knowledge in SQL, where relations are unordered sets. There is a syntax WITH ORDINALITY that SQL uses to deal with the fact that set-returning functions' results might really be ordered, and you might not want to lose that information. Suppose we allowed WITH ORDINALITY for transition tables, so that the 'capture' order of rows could be exposed, and we promised that old and new ORDINALITY numbers will line up, and then we made the ORDINALITY column a pathkey of the scan. Now you could join old and new tables by the ORDINALITY column, and get a merge join without any sorting. That's... pretty weird though, and the syntax would be outside the SQL spec, and the semantics might be questionable. -- Thomas Munro http://www.enterprisedb.com
Re: What is the tuplestore?
On Tue, Dec 11, 2018 at 2:56 AM Ron wrote: > 2018-12-07 06:21:55.504 EST 10.140.181.89(35868) CDS CDSLBXW 13748 SELECT > PostgreSQL JDBC Driver 53100 ERROR: could not write to tuplestore temporary > file: No space left on device > > I see this in the pg_log file, but #1 can't figure out what "tuplestore" is > (Google doesn't help), and #2 there's lots of space on all my file systems. > data/base, where pgsql_tmp lives, has 96GB free.) Maybe the setting log_temp_files could help you identify the query that is doing it? https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-TEMP-FILES -- Thomas Munro http://www.enterprisedb.com
Re: What is the tuplestore?
On Tue, Dec 11, 2018 at 3:58 PM Ron wrote: > On 12/10/2018 06:24 PM, Thomas Munro wrote: > > On Tue, Dec 11, 2018 at 2:56 AM Ron wrote: > >> 2018-12-07 06:21:55.504 EST 10.140.181.89(35868) CDS CDSLBXW 13748 SELECT > >> PostgreSQL JDBC Driver 53100 ERROR: could not write to tuplestore > >> temporary > >> file: No space left on device > >> > >> I see this in the pg_log file, but #1 can't figure out what "tuplestore" is > >> (Google doesn't help), and #2 there's lots of space on all my file systems. > >> data/base, where pgsql_tmp lives, has 96GB free.) > > Maybe the setting log_temp_files could help you identify the query > > that is doing it? > > > > https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-TEMP-FILES > > Great idea; I've made the change. What do I grep for in the pg_log file? It should say something like: LOG: temporary file: path "base/pgsql_tmp/...", size ... STATEMENT: ... -- Thomas Munro http://www.enterprisedb.com
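If it helps to dig further, a small throwaway script along these lines (a sketch only; it assumes the default log format shown above, with the STATEMENT line following the LOG line, and a made-up log path) can total temporary file usage per statement:

# Hedged sketch: sum up "temporary file" log entries per statement.  Assumes
# LOG/STATEMENT pairs like the example above; adjust parsing to your
# log_line_prefix and log location.
import re
from collections import Counter

LOGFILE = "/path/to/pg_log/postgresql.log"   # hypothetical path
TEMPFILE = re.compile(r'temporary file: path "(?P<path>[^"]+)", size (?P<size>\d+)')

totals = Counter()
last_size = None
with open(LOGFILE) as f:
    for line in f:
        m = TEMPFILE.search(line)
        if m:
            last_size = int(m.group("size"))
        elif "STATEMENT:" in line and last_size is not None:
            stmt = line.split("STATEMENT:", 1)[1].strip()
            totals[stmt] += last_size
            last_size = None

for stmt, total in totals.most_common(10):
    print(f"{total:>15,} bytes  {stmt[:80]}")

The statements at the top of that list are the ones spilling to pgsql_tmp and eating your disk space.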
Re: Does PostgreSQL use atomic file creation of FS?
On Wed, Dec 12, 2018 at 11:52 PM Dmitry Lazurkin wrote: > Thank you. But I have read this. I said about network file system only > for example. I would like to known how PostgreSQL handles this specific > case (of course if someone knowns a answer): > > fd = open(file, "w"); > write(fd, data); > // crash and now I have empty file which isn't correct > fsync(fd); > > PS. I think PostgreSQL doesn't have this problem. It depends on the context, but in general PostgreSQL knows about that sort of thing. When the cluster shuts down, it records that it shut down cleanly, meaning that everything that should be on disk is on disk. When you start the cluster up, if it sees that it didn't shut down cleanly, it enters recovery. During recovery it tolerates files being too short while it's replaying the WAL to get back to a consistent state. See the comment in mdread() for example: https://github.com/postgres/postgres/blob/master/src/backend/storage/smgr/md.c#L755 It's called "write-ahead log" because we log our intention before we write to data files (and make sure it's on disk first), so we'll always replay the same effects again if we're interrupted. The WAL is a magic source of reliability (we can do it again if things go wrong) and also performance (IO becomes serial, optimised for the storage hardware). https://www.postgresql.org/docs/current/wal-intro.html -- Thomas Munro http://www.enterprisedb.com
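As a toy illustration of that write-ahead rule (nothing to do with PostgreSQL's actual WAL format, just the general principle): the intent record is made durable before the data file is touched, so a crash in between can always be replayed.

# Toy sketch of write-ahead logging: log and fsync the intended change first,
# only then modify the data file.  Replaying the log redoes lost changes.
import json
import os

def _apply(path, offset, payload):
    # Write payload at offset, creating the file if needed, and flush to disk.
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.pwrite(fd, payload, offset)
        os.fsync(fd)
    finally:
        os.close(fd)

def log_then_write(wal_path, data_path, offset, payload):
    # 1. Append the intended change to the log and make sure it is on disk.
    with open(wal_path, "a") as wal:
        wal.write(json.dumps({"file": data_path, "offset": offset,
                              "payload": payload.decode()}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())
    # 2. Only now change the data file.  If we crash anywhere after step 1,
    #    replaying the log reproduces the same effect.
    _apply(data_path, offset, payload)

def replay(wal_path):
    # Crash recovery: redo every logged change.  Applying a record twice is
    # harmless here because each record gives an absolute offset and contents.
    with open(wal_path) as wal:
        for line in wal:
            rec = json.loads(line)
            _apply(rec["file"], rec["offset"], rec["payload"].encode())

For example, log_then_write("toy.wal", "toy.data", 0, b"hello") followed by a simulated crash and then replay("toy.wal") ends with the same file contents either way.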
Re: shared_buffers on Big RAM systems
On Fri, Dec 14, 2018 at 2:17 AM Ron wrote: > https://www.postgresql.org/docs/9.6/runtime-config-resource.html > > The docs say, "If you have a dedicated database server with 1GB or more of > RAM, a reasonable starting value for shared_buffers is 25%". > > But that's pretty archaic in 2018. What if the dedicated database server > has 128GB RAM? I agree, we might as well drop the words "with 1GB or more of RAM". That's the size of the very smallest cloud instances available these days, for free or up to a few bucks a month, and for physical servers I wonder if you can still get DIMMs that small. -- Thomas Munro http://www.enterprisedb.com
Re: Unused files in the database directory after crashed VACUUM FULL
On Mon, Feb 11, 2019 at 10:21 AM Tom Lane wrote: > Ah, so Andrew was correct: we panicked due to lack of WAL space, and > that explains why the vacuuming process didn't have an opportunity > to delete the files belonging to the uncommitted new relation. > It's a pretty well-understood dynamic, I believe. Perhaps we should > try harder to recover cleanly, but I don't know of anyone putting > effort into the case. FTR I am working on a PG13 patch that records relfilenodes of uncommitted transactions in undo logs, so it can unlink them reliably, even if you crash (at the cost of introducing a WAL flush before creating files). I haven't specifically studied the VACUUM FULL case yet, but in principle this is exactly what my project aims to fix. It's mostly intended as example code to demonstrate the undo log machinery (defining undo record types, registering undo log action functions that are invoked during rollback, rollback of aborted but not yet rolled back transactions at startup, ...) without having to understand the whole zheap sandwich at once, but it's also a solution to an age-old problem. More on that soon. -- Thomas Munro http://www.enterprisedb.com
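As a purely conceptual sketch of that unlink-on-rollback idea (a toy with made-up names, nothing like the real undo log machinery): durably record the intent before creating the file, and at startup remove anything whose transaction never committed.

# Toy sketch: the undo record is made durable before the file exists, so a
# crash can never leave behind a file with no record of who created it.
import json
import os

UNDO_LOG = "undo.log"            # made-up location for this illustration

def create_relation_file(path, xid):
    # 1. Durably record "transaction xid created path" before creating it.
    with open(UNDO_LOG, "a") as undo:
        undo.write(json.dumps({"xid": xid, "path": path}) + "\n")
        undo.flush()
        os.fsync(undo.fileno())
    # 2. Now create the file itself.
    open(path, "wb").close()

def startup_recovery(committed_xids):
    # Unlink files created by transactions that never committed.
    if not os.path.exists(UNDO_LOG):
        return
    with open(UNDO_LOG) as undo:
        for line in undo:
            rec = json.loads(line)
            if rec["xid"] not in committed_xids and os.path.exists(rec["path"]):
                os.unlink(rec["path"])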
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Fri, Feb 15, 2019 at 2:56 PM Bruce Klein wrote: > > In 11.1 did you see the message "WARNING: could not flush dirty data: > > Function not implemented" > Yes I wonder if this is coming from sync_file_range(), which is not implemented on WSL according to random intergoogling, but probably appears as present to our configure script. I find it harder to believe they didn't implement fsync(). -- Thomas Munro http://www.enterprisedb.com
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Fri, Feb 15, 2019 at 3:56 PM Thomas Munro wrote: > On Fri, Feb 15, 2019 at 2:56 PM Bruce Klein wrote: > > > In 11.1 did you see the message "WARNING: could not flush dirty data: > > > Function not implemented" > > Yes > > I wonder if this is coming from sync_file_range(), which is not > implemented on WSL according to random intergoogling, but probably > appears as present to our configure script. I find it harder to > believe they didn't implement fsync(). Here is a place where people go to complain about that: https://github.com/Microsoft/WSL/issues/645 I suppose we could tolerate ENOSYS. -- Thomas Munro http://www.enterprisedb.com
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Fri, Feb 15, 2019 at 5:29 PM Tom Lane wrote: > Thomas Munro writes: > >> On Fri, Feb 15, 2019 at 2:56 PM Bruce Klein wrote: > >>> In 11.1 did you see the message "WARNING: could not flush dirty data: > >>> Function not implemented" > >> Yes > > > Here is a place where people go to complain about that: > > https://github.com/Microsoft/WSL/issues/645 > > I suppose we could tolerate ENOSYS. > > What I'm not grasping here is why you considered that sync_file_range > failure should be treated as a reason to PANIC in the first place? > Surely it is not fsync(), nor some facsimile thereof. In fact, if > any of the branches in pg_flush_data really need the data_sync_elevel > treatment, somebody's mental model of that operation needs adjustment. > Maybe it's mine. My thinking was that sync_file_range() might in its current, future or alternative (WSL, ...) implementation eat an error that would otherwise reach fsync(), due to the single-flag error state treatment we see in several OSes (older Linux, also recent Linux via the 'seen' flag that we rely on to receive errors that happened before we opened the fd). Should we be inspecting the Linux source or asking for assurances from Linux hackers that that can't happen? Perhaps it behaves more like fdatasync() with the SYNC_FILE_RANGE_WAIT_* flags (= can clear the seen flag), and more like fadvise() without them (can't touch it)? I don't know, and I didn't want to base my choice on what it looks like it currently does in the Linux tree. Without guarantees from standards (not relevant here) or man pages (which note only that EIO is possible), I made what I thought was an appropriately pessimistic choice. -- Thomas Munro http://www.enterprisedb.com
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Sat, Feb 16, 2019 at 6:50 AM Andres Freund wrote: > On February 15, 2019 9:44:50 AM PST, Tom Lane wrote: > >Andres Freund writes: > >> On February 15, 2019 9:13:10 AM PST, Tom Lane > >wrote: > >>> I'm of the opinion that we shouldn't be panicking for > >sync_file_range > >>> failure, period. > > > >> With some flags it's strictly required, it does"eat"errors depending > >on the flags. So I'm not sure I understand? > > > >Really? The specification says that it starts I/O, not that it waits > >around for any to finish. > > That depends on the flags you pass in. By memory I don't think it eats an > error with our flags in recent kernels, but I'm not sure. Right, there was some discussion of that, and I didn't (and still don't) think it'd be wise to rely on undocumented knowledge about which flags can eat errors based on a drive-by reading of a particular snapshot of the Linux tree. The man page says it can return EIO; I think we should assume that it might actually do that. BTW I had a report from someone on IRC that PostgreSQL breaks in other ways (not yet understood) if you build it directly on WSL/Ubuntu. I guess the OP is reporting about a .deb that was built on a real Linux system. I'm vaguely familiar with these types of problems from other platforms (Linux syscall emulation on FreeBSD and Sun-ish systems, and also I'm old enough to remember people doing SCO SysV syscall emulation on Linux systems back before certain valuable software was available natively); it's possible that you get ENOSYS on other emulators too, considering that other kernels don't seem to have a sync_file_range()-like facility, but probably no one cares, since there is no reason to run PostgreSQL on a syscall emulator when you can run it natively. This is a bit different though: I guess people want to be able to develop Linux-stack stuff on company-issued Windows computers for later deployment on Linux servers; someone interested in this would ideally make it work and set up a build farm animal to tell us when we break it. It would probably require only minimal changes, but considering that no one bothered to complain about PostgreSQL spewing scary looking warnings on WSL for years, it's not too surprising that we didn't consider this case before. A bit like the nightjar case, the PANIC patch revealed a pre-existing problem that had gone unreported and needs some work, but it doesn't seem like a very good reason to roll back that part of the change completely IMHO. -- Thomas Munro http://www.enterprisedb.com
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Sun, Feb 17, 2019 at 4:56 AM Tom Lane wrote: > Thomas Munro writes: > >>> Really? The specification says that it starts I/O, not that it waits > >>> around for any to finish. > > > Right, there was some discussion of that, and I didn't (and still > > don't) think it'd be wise to rely on undocumented knowledge about > > which flags can eat errors based on a drive-by reading of a particular > > snapshot of the Linux tree. The man page says it can return EIO; I > > think we should assume that it might actually do that. > > I had a thought about this: maybe we should restrict the scope of this > behavior to be "panic on EIO", not "panic on anything within hailing > distance of fsync". > > The direction you and Andres seem to want to go in is to add a pile of > unprincipled exception cases, which seems like a recipe for constant > pain to me. I think we might be better off with a whitelist of errnos > that mean trouble, instead of a blacklist of some that don't. I'm > especially troubled by the idea that blacklisting some errnos might > reduce to ignoring them completely, which would be a step backwards > from our pre-PANIC behavior.

Hmm. Well, at least ENOSPC should be treated the same way as EIO. Here's an experiment that seems to confirm some speculations about NFS on Linux from the earlier threads:

$ uname -a
Linux debian 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 GNU/Linux
$ dpkg -l nfs-kernel-server | tail -1
ii  nfs-kernel-server  1:1.3.4-2.4  amd64  support for NFS kernel server

First, set up a 10MB loop-back filesystem:

$ dd if=/dev/zero of=/tmp/10mb.loopback bs=1024 count=10240
$ sudo losetup /dev/loop0 /tmp/10mb.loopback
$ sudo mkfs -t ext3 -m 1 -v /dev/loop0
...
$ sudo mkdir /mnt/test_loopback
$ sudo mount -t ext3 /dev/loop0 /mnt/test_loopback

Then, export that via NFS:

$ tail -1 /etc/exports
/mnt/test_loopback localhost(rw,sync,no_subtree_check)
$ sudo exportfs -av
exporting localhost:/mnt/test_loopback

Next, mount that over NFS:

$ sudo mkdir /mnt/test_loopback_remote
$ sudo mount localhost:/mnt/test_loopback /mnt/test_loopback_remote

Now, fill up the whole disk with a file full of newlines:

$ sudo mkdir /mnt/test_loopback/dir
$ sudo chown $USER:$USER /mnt/test_loopback/dir
$ tr "\000" "\n" < /dev/zero > /mnt/test_loopback_remote/dir/file
tr: write error: No space left on device
tr: write error
$ df -h /mnt/test_loopback*
Filesystem                    Size  Used Avail Use% Mounted on
/dev/loop0                    8.5M  8.4M     0 100% /mnt/test_loopback
localhost:/mnt/test_loopback  8.5M  8.4M     0 100% /mnt/test_loopback_remote

Now, run a program that appends a greeting and then calls fsync() twice:

$ cat test.c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    int fd, rc;

    fd = open("/mnt/test_loopback_remote/dir/file", O_RDWR | O_APPEND);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    rc = write(fd, "hello world\n", 12);
    if (rc < 0)
        perror("write");
    else if (rc < 12)
        fprintf(stderr, "only managed to write %d bytes\n", rc);
    rc = fsync(fd);
    if (rc < 0)
        perror("fsync 1");
    rc = fsync(fd);
    if (rc < 0)
        perror("fsync 2");
    rc = close(fd);
    if (rc < 0)
        perror("close");
    return 0;
}
$ cc test.c
$ ./a.out
fsync 1: No space left on device
$

The write() and the second fsync() reported success. Great, let's go and look at our precious data, both through NFS and locally:

$ tail -3 /mnt/test_loopback_remote/dir/file
$ tail -3 /mnt/test_loopback/dir/file
$

It's gone. If you try it again with a file containing just a few newlines so there is free space, it works correctly and you see the appended greeting.
Perhaps the same sort of thing might happen with remote EDQUOT, but I haven't tried that. Perhaps there are some things that could be tuned that would avoid that? (Some speculation about NFS: To avoid data-loss from running out of disk space, I think PostgreSQL requires either a filesystem that reserves space when we're extending a file, so that we can exclude the possibility of ENOSPC before we evict data from our own shared buffers, or a page cache that doesn't drop dirty flags or whole buffers on failure so we can meaningfully retry once space becomes available. As far as I know, the former would be theoretically possible with NFS, if the client and server are using NFSv4.2+ with ALLOCATE support and glibc and kernel versions both support true fallocate() and pass it all the way through, but current versions either don
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Mon, Feb 18, 2019 at 2:19 PM Michael Paquier wrote: > On Sun, Feb 17, 2019 at 10:54:54AM -0800, Andres Freund wrote: > > On 2019-02-17 23:29:09 +1300, Thomas Munro wrote: > >> Hmm. Well, at least ENOSPC should be treated the same way as EIO. > >> Here's an experiment that seems to confirm some speculations about NFS > >> on Linux from the earlier threads: > > > > I wish we'd' a good way to have test scripts in the tree for something > > like that, but using postgres. Unfortunately it's not easy to write > > portable setup scripts for it. > > Yes, it seems to me as well that ENOSPC should be treated as much as > EIO. Just looking at the code for data_sync_retry we should really > have some errno filtering. I agree with you up to a point: It would make some amount of sense for data_sync_elevel() not to promote to PANIC if errno == ENOSYS; then for sync_file_range() you'd get WARNING and for fsync() you'd get ERROR (since that's what those call sites pass in) on hypothetical kernels that lack those syscalls. As I argued earlier, ENOSYS seems to be the only errno that we know for sure to be non-destructive to the page cache since it promises it didn't run any kernel code at all (or rather didn't make it past the front door), so it's the *only* errno that belongs on such a whitelist IMHO. That would get us back to where we were for WSL users in 11.1. The question is whether we want to go further than that and provide a better experience for WSL users, now that we know that it was already spewing warnings. One way to do that might be not to bother with errno filtering at all, but instead (as Andres suggested) do a test of whether sync_file_range() is implemented on this kernel/emulator at startup and if not, just disable it somehow. Then we don't need any filtering. Here is a restatement of my rationale for not including errno filtering in the first place: Take a look at the documented errnos for fsync() on Linux. Which of those tell us that it's sane to retry? AFAICS they all either mean that it ran filesystem code that is known to toast your data on error (data loss has already occurred), or your file descriptor is bogus (code somewhere is seriously busted and all bets are off). -- Thomas Munro http://www.enterprisedb.com
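A startup probe along those lines might look something like this sketch (an illustration of the idea only, not the actual patch; it assumes glibc's sync_file_range() wrapper and the Linux header value SYNC_FILE_RANGE_WRITE == 2):

# Hedged sketch: probe whether sync_file_range() is implemented at all, the
# way a startup check might.  Assumes glibc on Linux/WSL; the flag value 2 is
# SYNC_FILE_RANGE_WRITE from the Linux headers.
import ctypes
import errno
import tempfile

SYNC_FILE_RANGE_WRITE = 2

def sync_file_range_works():
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    libc.sync_file_range.restype = ctypes.c_int
    libc.sync_file_range.argtypes = [ctypes.c_int, ctypes.c_int64,
                                     ctypes.c_int64, ctypes.c_uint]
    with tempfile.TemporaryFile() as f:
        f.write(b"x")
        f.flush()
        rc = libc.sync_file_range(f.fileno(), 0, 0, SYNC_FILE_RANGE_WRITE)
        if rc == 0:
            return True
        # ENOSYS means the kernel/emulator never ran it; anything else means
        # it exists but failed for some other reason.
        return ctypes.get_errno() != errno.ENOSYS

print("sync_file_range usable:", sync_file_range_works())

On WSL this reports False (ENOSYS), on a regular Linux kernel True, which is the distinction a probe-and-disable approach would key off.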
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Tue, Feb 19, 2019 at 6:01 AM Andres Freund wrote: > On 2019-02-18 10:33:50 -0500, Ravi Krishna wrote: > > Are there any plans to support PG on WSL ? Just curious. Hi Ravi, I definitely want to fix this particular issue for 11.3. > I think if somebody wanted to start investing efforts to improve testing > of that setup, and then fix the resulting issues, nobody would seriously > object. But also most people working on PG are already busy. Please > feel free to create a buildfarm test machine with postgres running on > WSL. Right, the first step would be for a WSL user to figure out what's wrong with builds on the WSL and show us how to fix it; I heard through the grapevine that if you try it, initdb doesn't work (it must be something pretty subtle in the configure phase or something like that, since the Ubuntu .deb apparently works, except for the issue reported in this thread). Then, confirm that they're happy with whatever patch we come up with. Then if someone wants to make sure we don't accidentally break it in future, yeah, a build farm animal would help a lot. Here's a starter patch that shows one of the approaches discussed. It gets WSL users to a better place than they were before, by suppressing further warnings after the first one. -- Thomas Munro http://www.enterprisedb.com 0001-Tolerate-ENOSYS-failure-from-sync_file_range.patch Description: Binary data
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Tue, Feb 19, 2019 at 5:16 PM James Sewell wrote: >> Here's a starter patch that shows one of the approaches discussed. It >> gets WSL users to a better place than they were before, by suppressing >> further warnings after the first one. > > This wasn't quite right, updated to check erro for ENOSYS (not rc) > > This compiles and stops the panic on WSL (with a single warning). > > I haven't tested if a version compiled on Linux will behave the same way - > but based on the error messages in the top post it looks like the behavior is > the same. Great. Thanks for testing, and for the fix! Well, that all sounds like good news: it corrects the behaviour from 11.2, and also improves on the previous behaviour, which I'd have accepted as a bug if anyone had reported it. So the next problem is that we don't have a consensus on whether this is the right approach, so I don't feel like I can commit it yet. Does anyone want to make another concrete proposal? -- Thomas Munro https://enterprisedb.com
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
On Tue, Feb 19, 2019 at 5:31 PM Thomas Munro wrote: > On Tue, Feb 19, 2019 at 5:16 PM James Sewell > wrote: > >> Here's a starter patch that shows one of the approaches discussed. It > >> gets WSL users to a better place than they were before, by suppressing > >> further warnings after the first one. > > > > This wasn't quite right, updated to check erro for ENOSYS (not rc) > > > > This compiles and stops the panic on WSL (with a single warning). > > > > I haven't tested if a version compiled on Linux will behave the same way - > > but based on the error messages in the top post it looks like the behavior > > is the same. > > Great. Thanks for testing, and for the fix! Well, that all sounds > like good news: it corrects the behaviour from 11.2, and also improves > on the previous behaviour, which I'd have accepted as a bug if anyone > had reported it. So the next problem is that we don't have a > consensus on whether this is the right approach, so I don't feel like > I can commit it yet. Does anyone want to make another concrete proposal? Ok, here's the version I'm planning to push soon if there are no objections. Re-adding Bruce to the thread, as I just noticed the CC list got pruned at some point in this thread. -- Thomas Munro https://enterprisedb.com 0001-Tolerate-ENOSYS-failure-from-sync_file_range-v3.patch Description: Binary data
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
>> > Great. Thanks for testing, and for the fix! Well that all sounds >> > like good news: it corrects the behaviour from 11.2, and also improves >> > on the previous behaviour which I'd have accepted as a bug if anyone >> > had reported it. So the next problem is that we don't have a >> > consensus on whether this is the right approach, so I don't feel like >> > I can commit it yet. Does any want to make another concrete proposal? >> >> Ok, here's the version I'm planning to push soon if there are no objections. >> Re-adding Bruce to the thread, as I just noticed the CC list got >> pruned at some point in this thread. Pushed. I also noticed that the call to sync_file_range() in file_utils.c used by fsync_pgdata() ignores the return code, and in the 9.4 and 9.5 branches, the call in pg_flush_data() ignores the return code too. This inconsistency should be fixed; I'll think about which direction it should be fixed in (either we are convinced that sync_file_range(SYNC_FILE_RANGE_WRITE) is non-destructive of error state or we aren't, and should handle it everywhere), and maybe start a new -hackers thread. -- Thomas Munro https://enterprisedb.com
Re: LDAP authenticated session terminated by signal 11: Segmentation fault, PostgresSQL server terminates other active server processes
On Thu, Feb 21, 2019 at 2:42 PM Mike Yeap wrote: > openldap-clients.x86_64 2.4.44-21.el7_6 > @updates > openldap-devel.i686 2.4.44-21.el7_6updates > openldap-devel.x86_64 2.4.44-21.el7_6updates > openldap.i686 2.4.44-21.el7_6updates > openldap-servers-sql.x86_64 2.4.44-21.el7_6updates > openldap-servers.x86_64 2.4.44-21.el7_6updates > openldap.x86_64 2.4.44-21.el7_6 > @updates > On Wed, Feb 20, 2019 at 10:17 PM Tom Lane wrote: >> With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, PostgreSQL >> backends can crash at exit. Raise a warning during "configure" based on >> the compile-time OpenLDAP version number, and test the crash scenario in >> the dblink test suite. Back-patch to 9.0 (all supported versions). Clearly 2.4.44 is not in the range 2.4.24 through 2.4.31. Perhaps the dangerous range is out of date? Hmm, so Noah's analysis[1] says this is a clash between libldap_r.so (used by libpq) and libldap.so (used by the server), specifically in destructor/exit code. Curiously, in a thread about Curl's struggles with this problem, I found a claim[2] that Debian decided to abandon the non-"_r" variant and just use _r always. Sure enough, on my Debian buster VM I see a symlink libldap-2.4.so.2 -> libldap_r-2.4.so.2. So essentially Debian and friends have already forced Noah's first option on users: > 1. Link the backend with libldap_r, so we never face the mismatch. On some > platforms, this means also linking in threading libraries. FreeBSD and CentOS systems near me have separate libraries still. [1] https://www.postgresql.org/message-id/flat/20140612210219.GA705509%40tornado.leadboat.com [2] https://www.openldap.org/lists/openldap-technical/201608/msg00094.html -- Thomas Munro https://enterprisedb.com
Re: LDAP authenticated session terminated by signal 11: Segmentation fault, PostgresSQL server terminates other active server processes
On Tue, Feb 26, 2019 at 8:17 PM Mike Yeap wrote: > Hi Thomas, does that mean the bug is still there? Hi Mike, I haven't tried to repro this myself, but it certainly sounds like it. It also sounds like it would probably go away if you switched to a Debian-derived distro, instead of a Red Hat-derived distro, but I doubt that's the kind of advice you were looking for. We need to figure out a proper solution here, though I'm not sure what. Question for the list: other stuff in the server needs libpthread (SSL, LLVM, ...), so why are we insisting on using non-MT LDAP? -- Thomas Munro https://enterprisedb.com
Re: LDAP authenticated session terminated by signal 11: Segmentation fault, PostgresSQL server terminates other active server processes
On Tue, Feb 26, 2019 at 9:11 PM Thomas Munro wrote: > On Tue, Feb 26, 2019 at 8:17 PM Mike Yeap wrote: > > Hi Thomas, does that mean the bug is still there? > I haven't tried to repro this myself, but it certainly sounds like it. > It also sounds like it would probably go away if you switched to a > Debian-derived distro, instead of a Red Hat-derived distro, but I > doubt that's the kind of advice you were looking for. We need to > figure out a proper solution here, though I'm not sure what. Question > for the list: other stuff in the server needs libpthread (SSL, LLVM, > ...), so why are we insisting on using non-MT LDAP? Concretely, why don't we just kill the LDAP_LIBS_FE/LDAP_LIBS_BE distinction and use a single LDAP_LIBS? Then it'll always match. It can still be the non-MT variant if you build with --disable-thread-safety (who does that?), but then it'll be the same in the server too so that postgres_fdw + ldap works that way too. Sketch patch attached. -- Thomas Munro https://enterprisedb.com diff --git a/configure b/configure index 481bd3b66d..e2fc438297 100755 --- a/configure +++ b/configure @@ -652,8 +652,7 @@ CFLAGS_SSE42 have_win32_dbghelp LIBOBJS UUID_LIBS -LDAP_LIBS_BE -LDAP_LIBS_FE +LDAP_LIBS PTHREAD_CFLAGS PTHREAD_LIBS PTHREAD_CC @@ -12375,7 +12374,6 @@ else as_fn_error $? "library 'ldap' is required for LDAP" "$LINENO" 5 fi -LDAP_LIBS_BE="-lldap $EXTRA_LDAP_LIBS" if test "$enable_thread_safety" = yes; then # on some platforms ldap_r fails to link without PTHREAD_LIBS { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ldap_simple_bind in -lldap_r" >&5 @@ -12425,9 +12423,9 @@ else as_fn_error $? "library 'ldap_r' is required for LDAP" "$LINENO" 5 fi - LDAP_LIBS_FE="-lldap_r $EXTRA_LDAP_LIBS" + LDAP_LIBS="-lldap_r $EXTRA_LDAP_LIBS" else - LDAP_LIBS_FE="-lldap $EXTRA_LDAP_LIBS" + LDAP_LIBS="-lldap $EXTRA_LDAP_LIBS" fi for ac_func in ldap_initialize do : @@ -12488,14 +12486,12 @@ else as_fn_error $? 
"library 'wldap32' is required for LDAP" "$LINENO" 5 fi -LDAP_LIBS_FE="-lwldap32" -LDAP_LIBS_BE="-lwldap32" +LDAP_LIBS="-lwldap32" fi LIBS="$_LIBS" fi - # for contrib/sepgsql if test "$with_selinux" = yes; then { $as_echo "$as_me:${as_lineno-$LINENO}: checking for security_compute_create_name in -lselinux" >&5 diff --git a/configure.in b/configure.in index 9c7a9738bc..bf21728268 100644 --- a/configure.in +++ b/configure.in @@ -1238,26 +1238,23 @@ if test "$with_ldap" = yes ; then AC_CHECK_LIB(ldap, ldap_bind, [], [AC_MSG_ERROR([library 'ldap' is required for LDAP])], [$EXTRA_LDAP_LIBS]) -LDAP_LIBS_BE="-lldap $EXTRA_LDAP_LIBS" if test "$enable_thread_safety" = yes; then # on some platforms ldap_r fails to link without PTHREAD_LIBS AC_CHECK_LIB(ldap_r, ldap_simple_bind, [], [AC_MSG_ERROR([library 'ldap_r' is required for LDAP])], [$PTHREAD_CFLAGS $PTHREAD_LIBS $EXTRA_LDAP_LIBS]) - LDAP_LIBS_FE="-lldap_r $EXTRA_LDAP_LIBS" + LDAP_LIBS="-lldap_r $EXTRA_LDAP_LIBS" else - LDAP_LIBS_FE="-lldap $EXTRA_LDAP_LIBS" + LDAP_LIBS="-lldap $EXTRA_LDAP_LIBS" fi AC_CHECK_FUNCS([ldap_initialize]) else AC_CHECK_LIB(wldap32, ldap_bind, [], [AC_MSG_ERROR([library 'wldap32' is required for LDAP])]) -LDAP_LIBS_FE="-lwldap32" -LDAP_LIBS_BE="-lwldap32" +LDAP_LIBS="-lwldap32" fi LIBS="$_LIBS" fi -AC_SUBST(LDAP_LIBS_FE) -AC_SUBST(LDAP_LIBS_BE) +AC_SUBST(LDAP_LIBS) # for contrib/sepgsql if test "$with_selinux" = yes; then diff --git a/src/Makefile.global.in b/src/Makefile.global.in index c118f64040..6fe5afa890 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -284,8 +284,7 @@ AR = @AR@ DLLTOOL = @DLLTOOL@ DLLWRAP = @DLLWRAP@ LIBS = @LIBS@ -LDAP_LIBS_FE = @LDAP_LIBS_FE@ -LDAP_LIBS_BE = @LDAP_LIBS_BE@ +LDAP_LIBS = @LDAP_LIBS@ UUID_LIBS = @UUID_LIBS@ UUID_EXTRA_OBJS = @UUID_EXTRA_OBJS@ LLVM_LIBS=@LLVM_LIBS@ @@ -570,7 +569,7 @@ endif # Cygwin seems to need ldap libraries to be mentioned here, too ifeq ($(PORTNAME),cygwin) -libpq_pgport += $(LDAP_LIBS_FE) +libpq_pgport += $(LDAP_LIBS) endif diff --git a/src/Makefile.shlib b/src/Makefile.shlib index 373d73caef..8765818920 100644 --- a/src/Makefile.shlib +++ b/src/Makefile.shlib @@ -359,7 +359,7 @@ ifeq ($(PORTNAME), cygwin) # Cygwin case $(shlib): $(OBJS) | $(SHLIB_PREREQS) - $(CC) $(CFLAGS) -shared -o $@ -Wl,--out-implib=$(stlib) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(