Re: Loading 500m json files to database
On 2020-03-23 17:18:45 -0700, pinker wrote:
> Christopher Browne-3 wrote
> > Well, you're paying for a lot of overhead in that, as you're
> > establishing a psql command, connecting to a database, spawning a
> > backend process, starting a transaction, committing a transaction,
> > closing the backend process, disconnecting from the database, and
> > cleaning up after the launching of the psql command. And you're
> > doing that 500 million times.
> >
> > The one thing I left off that was the loading of a single tuple into
> > json_parts.
> > [...]
> > Reducing that overhead is the single most important thing you can do.
>
> Yes, I was thinking about that but no idea how to do it right now.

Do you know any programming language (Python, Perl, ...)? You'll probably
get a huge gain from writing a script that just opens the connection once
and then inserts each file.

COPY usually is even faster by a fair amount, but since you have to read
the data for each row from a different file (and - if I understood you
correctly - a remote one at that), the additional speedup is probably not
that great in this case.

Splitting the work into batches and executing several batches in parallel
probably helps.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
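Something along these lines would be the "open the connection once" version.
This is a rough, untested sketch: it assumes Python with psycopg2, a target
table json_parts(j jsonb), and one JSON document per file - the table and
column names are guesses, since the real schema isn't shown in this thread:

    #!/usr/bin/env python3
    # Open one connection, INSERT each file, commit in batches.
    import sys
    import psycopg2

    BATCH_SIZE = 1000  # files per transaction; tune as needed

    def load(paths):
        conn = psycopg2.connect("dbname=mydb")  # connect once, not per file
        cur = conn.cursor()
        for i, path in enumerate(paths, start=1):
            with open(path) as f:
                doc = f.read()
            cur.execute("INSERT INTO json_parts (j) VALUES (%s::jsonb)",
                        (doc,))
            if i % BATCH_SIZE == 0:
                conn.commit()                   # one transaction per batch
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        # filenames on stdin, e.g.:
        #   find /data -name '*.json' | python3 load_once.py
        load(line.rstrip("\n") for line in sys.stdin)

For the parallel part, you can split the file list into N chunks and run N
copies of such a script at the same time, each on its own chunk and its own
connection.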
avoid WAL for refresh of materialized view
Hi all

We have PostgreSQL 9.6 running and started to work with materialized views.
To refresh the materialized views, we set up a cron job that refreshes them
on a fixed schedule.

Since our materialized views cache quite some data, we noticed a
considerable increase in WAL files. It seems that every refresh of a
materialized view is logged in the WAL.

We tried to figure out how we can alter the materialized view to set it to
"UNLOGGED", but this does not seem possible.
--> "alter materialized view xyz set UNLOGGED;" leads to "ERROR: "xyz" is
not a table, SQL state: 42809"

Is there another way to avoid logging a refresh of a materialized view in
the WAL?

Kind regards,
Alain Remund
Re: avoid WAL for refresh of materialized view
On Tue, 24 Mar 2020 at 17:00, Remund Alain wrote:
> Hi all
>
> We have PostgreSQL 9.6 running and started to work with materialized
> views. To refresh the materialized views, we set up a cron job that
> refreshes the materialized views on a fixed schedule.
>
> Since our materialized views cache quite some data, we noticed a
> considerable increase in WAL files. It seems that every refresh of a
> materialized view is logged in the WAL.
>
> We tried to figure out how we can alter the materialized view to set it to
> "UNLOGGED", but this does not seem possible.
> --> "alter materialized view xyz set UNLOGGED;" leads to "ERROR: "xyz" is
> not a table, SQL state: 42809"
>
> Is there another way to avoid logging a refresh of a materialized view in
> the WAL?

As you say, there is no unlogged materialized view. So, no, it will always
log to the WAL during refresh.

-- 
Guillaume.
Re: avoid WAL for refresh of materialized view
Remund Alain writes:

> Hi all
>
> We have PostgreSQL 9.6 running and started to work with materialized
> views. To refresh the materialized views, we set up a cron job that
> refreshes the materialized views on a fixed schedule.
>
> Since our materialized views cache quite some data, we noticed a
> considerable increase in WAL files. It seems that every refresh of a
> materialized view is logged in the WAL.
>
> We tried to figure out how we can alter the materialized view to set
> it to "UNLOGGED", but this does not seem possible.
>
> --> "alter materialized view xyz set UNLOGGED;" leads to "ERROR:
> "xyz" is not a table, SQL state: 42809"
>
> Is there another way to avoid logging a refresh of a materialized
> view in the WAL?

The workaround for this is to not use a mat view at all, but instead
materialize the output into an unlogged table that you truncate before
every refresh.

HTH

> Kind regards,
>
> Alain Remund

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
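In case it helps, a rough sketch of that workaround as a cron-driven script
(Python with psycopg2; "reporting_cache", "orders" and the aggregate query
are made-up stand-ins for whatever the real materialized view definition is):

    #!/usr/bin/env python3
    # Refresh an UNLOGGED table instead of a materialized view.
    import psycopg2

    QUERY = """
        SELECT customer_id, sum(amount) AS total
        FROM orders
        GROUP BY customer_id
    """

    conn = psycopg2.connect("dbname=mydb")
    with conn, conn.cursor() as cur:
        # Create the unlogged target once; writes to it skip the WAL.
        cur.execute("CREATE UNLOGGED TABLE IF NOT EXISTS reporting_cache AS "
                    + QUERY + " WITH NO DATA")
        # Truncate and refill in one transaction, so a failed refresh
        # doesn't leave the table empty. Note that TRUNCATE takes an
        # ACCESS EXCLUSIVE lock, so readers wait while the refill runs.
        cur.execute("TRUNCATE reporting_cache")
        cur.execute("INSERT INTO reporting_cache " + QUERY)
    conn.close()

The trade-off is the usual one for unlogged tables: the contents are not
crash-safe and are not replicated to standbys, which is exactly why the
writes don't have to go through the WAL.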
RE: Loading 500m json files to database
From: pinker
> it's a cloud and no plpythonu extension available unfortunately

You're misunderstanding him. See David's post for an example, but the point
was that you can control all of this from an *external* Perl, Python, Bash,
whatever program on the command line at the shell.

In pseudo-code, probably fed by a "find" command piping filenames to it:

    while more files
        do { read in a file name & add to list } while (list.length < 1000);
        process entire list with \copy commands to 1 psql command

I've left all kinds of checks out of that, but that's the basic thing you
need; implement it in whatever scripting language you're comfortable with.

HTH,
Kevin
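To make the pseudo-code above concrete, here is a rough, untested sketch in
Python of the same idea: batches of 1000 filenames, each batch turned into
\copy commands fed to a single psql call. It assumes a table
json_parts(j jsonb), that every file holds its JSON document on a single
line, and that paths contain no quote characters; the csv quote/delimiter
trick is just one way of loading a whole line into one column:

    #!/usr/bin/env python3
    # Read filenames from stdin (e.g. from find), batch them, and run one
    # psql process per batch with one \copy meta-command per file.
    import subprocess
    import sys

    BATCH_SIZE = 1000   # tune: 10, 100, 1000, ...

    def run_batch(paths):
        # csv with "impossible" quote/delimiter characters loads each file's
        # single line verbatim into the one jsonb column.
        cmds = "\n".join(
            "\\copy json_parts (j) FROM '%s' WITH "
            "(FORMAT csv, QUOTE E'\\x01', DELIMITER E'\\x02')" % p
            for p in paths
        )
        # Add "-1" here to wrap the whole batch in a single transaction.
        subprocess.run(["psql", "-d", "mydb", "-v", "ON_ERROR_STOP=1"],
                       input=cmds, text=True, check=True)

    batch = []
    for line in sys.stdin:   # find /data -name '*.json' | python3 load.py
        batch.append(line.strip())
        if len(batch) >= BATCH_SIZE:
            run_batch(batch)
            batch = []
    if batch:
        run_batch(batch)

Whether 1000 per batch (and the single-transaction "-1" flag) is the right
choice is, as noted below, something to test.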
Re: Loading 500m json files to database
On 3/24/20 11:29 AM, Kevin Brannen wrote:
> From: pinker
> > it's a cloud and no plpythonu extension available unfortunately
>
> You're misunderstanding him. See David's post for an example, but the
> point was that you can control all of this from an *external* Perl,
> Python, Bash, whatever program on the command line at the shell.
>
> In pseudo-code, probably fed by a "find" command piping filenames to it:
>
>     while more files
>         do { read in a file name & add to list } while (list.length < 1000);
>         process entire list with \copy commands to 1 psql command
>
> I've left all kinds of checks out of that, but that's the basic thing you
> need; implement it in whatever scripting language you're comfortable with.
>
> HTH,
> Kevin

Sorry if I missed it, but have we seen the size range of these json files?
RE: Loading 500m json files to database
From: Rob Sargent
> Sorry if I missed it, but have we seen the size range of these json files?

Not that I've seen, but that's an implementation detail for whoever is
doing the work. As someone else pointed out, pick the value as needed,
whether that's 10, 100, 1000, or whatever. But issuing 1000 lines of
"\copy file" sort of commands at a time isn't a big deal by itself.

OTOH, you have a good point that 1000 could be too much work for the server
to handle, especially if the "-1" flag is also used. As always: test, test,
test... 😊

Kevin
Re: Loading 500m json files to database
On 3/24/20 11:53 AM, Kevin Brannen wrote:
> From: Rob Sargent
> > Sorry if I missed it, but have we seen the size range of these json files?
>
> Not that I've seen, but that's an implementation detail for whoever is
> doing the work. As someone else pointed out, pick the value as needed,
> whether that's 10, 100, 1000, or whatever. But issuing 1000 lines of
> "\copy file" sort of commands at a time isn't a big deal by itself.
>
> OTOH, you have a good point that 1000 could be too much work for the
> server to handle, especially if the "-1" flag is also used. As always:
> test, test, test... 😊

My fear is this: the community helps the OP load 500M "records" in a
reasonable timeframe, and then the OP's organization complains that
Postgres is slow once they start using it... because the transition from
file system to RDBMS was ill conceived. Are the json files large documents
of arbitrary content, or are they small data structures of recognizable
types? And so on.
Re: PG12 autovac issues
Hi,

On 2020-03-24 15:12:38 +0900, Michael Paquier wrote:
> > Well, there's no logging of autovacuum launchers that don't do anything
> > due to the "skipping redundant" logic, with normal log level. If somehow
> > the horizon logic of autovacuum workers gets out of whack with what
> > vacuumlazy.c does, then you'd not get any vacuums. But a usage level
> > triggered analyze could still happen on such a table, I think.
>
> What surprised me the most is that the same table happened to be
> analyzed again and again after the launcher began its blackout.

Well, if there's an issue with the "redundant" logic, that would be a not
too surprising outcome. It's quite plausible that one or two tables in the
database would get enough changes to occasionally need to be analyzed. If
the workload is steady, that could e.g. work out to every ~17 minutes.

All tables that autovacuum thinks are not wraparound threatened will be
skipped, but ones that are will get both vacuum and analyze queued. The
"redundant" logic could then entirely skip all vacuums - but there's no
equivalent for analyze.

> > While looking at this issue I found a few problems, btw. That seems more
> > like a -hackers discussion, so I started:
> > https://postgr.es/m/20200323235036.6pje6usrjjx22zv3%40alap3.anarazel.de
>
> Yes, let's discuss there.

Cool. Would also be good if you could expand on the thread introducing the
"redundant" logic.

Greetings,

Andres Freund
PG 12: Partitioning across a FDW?
Is it even possible to use PG partitioning across a Foreign Server?
PostgreSQL 13: native JavaScript Procedural Language support ?
Hi and hope this finds you well,

PostgreSQL has state-of-the-art JSON(B) data type and query support, and
basic updates as well, e.g. jsonb_set(). How about enabling users to do
more advanced processing of JSON data inside the database itself, using
JavaScript?

There's the PLV8 extension https://github.com/plv8/plv8 but it's not
available for many major db providers (e.g. Heroku), making it hard to
adopt. Having it supported in the standard PostgreSQL distribution, besides
the existing 4 languages (https://www.postgresql.org/docs/12/xplang.html),
would be awesome!

Would you kindly consider this?

Thanks & stay safe,
Marius