Re: How to avoid trailing zero (after decimal point) for numeric type column
On Wed, Feb 28, 2018 at 1:33 PM, pkashimalla wrote:
...
> I did insertion from java program with below code snippet
>
> Double object = 10.0;
> String inserRecordQuery_NEWWAY11 = "INSERT INTO BLOB_TEST_TABLE(id)VALUES (?)";
> selectPrepareStmt.setObject(1, object,Types.NUMERIC);
> int count = selectPrepareStmt.executeUpdate();

That's incomplete. But why are you using Double to bind a NUMERIC? IIRC BigDecimal seems to me the natural Java counterpart to it.

> it inserted like this.
> select id from blob_test_table;
>
>      id
>  numeric
> ---------
>     10.0
>
> In this case, when a decimal point is equal to 0 then, I don't want to see
> the precision and the value in the column should just 10

And the driver is supposed to know this how?

> And If I execute code,
>
> Double object = 10.5801
> String inserRecordQuery_NEWWAY11 = "INSERT INTO BLOB_TEST_TABLE(id)VALUES (?)";
> selectPrepareStmt.setObject(1, object,Types.NUMERIC);
> int count = selectPrepareStmt.executeUpdate();
>
> Now ,the value in the column should be 10.5801 as the precision is greater
> than ZERO

Doubles do not have precision ( well, they have, double precision ). Also note doubles are binary, and that 10.5801 is not representable exactly in binary. This is perl but you can see the thing:

$ perl -w -e 'printf q(%1$f %1$.20f %1$a %1$.20a)."\n", 10.5801'
10.580100 10.58009999999999983800 0x1.52902de00d1b7p+3 0x1.52902de00d1b70000000p+3

Note how the decimal varies with the requested precision, as 10.5801 cannot be represented exactly in binary, and how the hex representation does not vary ( because hexadecimal has an exact binary representation ).

> Because of this, the migrated data (from Oracle) is without PRECISION ZERO
> and the new data which is being inserted is with PRECISION ZERO.

I think you are hitting an implementation-defined behaviour of the driver. To convert a double to a numeric you must select a precision, and it seems Oracle and Postgres do it differently. I would try BigDecimal, which has less uncertainty.

> Oracle's NUMBER column type is handling it as I expected.

Oracle behaves as you are used to, and so is what you expect.

> I migrate Oracle's NUMBER column as just NUMERIC column in PostgreSQL

Probably using a BigDecimal would do the trick, as you can convert sql numeric => java BigDecimal => sql numeric without losing info, but not with Doubles. Bear in mind numeric is decimal, double is binary, and one thing as simple looking as 0.1 does not have an exact binary representation:

$ perl -w -e 'printf q(%1$f %1$.20f %1$a %1$.20a)."\n", 0.1'
0.100000 0.10000000000000000555 0x1.999999999999ap-4 0x1.999999999999a0000000p-4

See the trailing 555 in the second? Or the continuous 1.(9) fraction rounded to "a" in excess on the least significant place in hex? In numeric you can have 1.0, 1.00, 1.1 and 0.99, but in doubles 1.0 and 1.00 both map to the same value.

Francisco Olarte.
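A quick sketch (not from the original message, scratch table name is made up) showing that numeric simply keeps whatever scale the client-supplied value carries, which is why the trailing zero comes from the driver-side conversion and not from the column itself:

CREATE TABLE numeric_scale_test (id numeric);
INSERT INTO numeric_scale_test VALUES ('10'), ('10.0'), ('10.5801');
-- the stored display scale is exactly the one the client sent
SELECT id, id::text FROM numeric_scale_test;
-- returns 10, 10.0 and 10.5801: binding "10" instead of a Double
-- that prints as "10.0" avoids the trailing zero entirely.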
Re: Enforce primary key on every table during dev?
Melvin: On Thu, Mar 1, 2018 at 1:47 AM, Melvin Davidson wrote: > But this begs the question, > why are "developers" allowed to design database tables? That should be the > job of the DBA! That's the DBA wearing her developer hat. ( I agree with the spirit ) Francisco Olarte.
Re: Enforce primary key on every table during dev?
On Thu, Mar 1, 2018 at 9:20 AM, Alban Hertroys wrote:
> Not to mention that not all types of tables necessarily have suitable
> candidates for a primary key.

They do if they are in 1NF ( no dupes allowed ).

> An example of such tables is a monetary transaction table that contains
> records for deposits and withdrawals to accounts. It will have lots of
> foreign key references to other tables, but rows containing the same values
> are probably not duplicates.

That's a bad example. They would normally have a transaction id, or a timestamp, or a sequence counter. PKs can span all non-nullable columns. You could try to come up with a real example, but all the times I've found these in one of my designs it was because I didn't correctly model the "real world".

> Adding a surrogate key to such a table just adds overhead, although that
> could be useful in case specific rows need updating or deleting without also
> modifying the other rows with that same data - normally, only insertions and
> selections happen on such tables though, and updates or deletes are
> absolutely forbidden - corrections happen by inserting rows with an opposite
> transaction.

And normally you would need to pinpoint an individual transaction for selection, hard to do if you do not have a pk.

Francisco Olarte.
Re: Enforce primary key on every table during dev?
On Thu, Mar 1, 2018 at 1:07 PM, Rakesh Kumar wrote:
...
> I routinely add surrogate keys like serial col to a table already having a
> nice candidate keys
> to make it easy to join tables. SQL starts looking ungainly when you have a
> 3 col primary
> key and need to join it with child tables.

It does, but many times it is useful, let me explain:

table currencies ( code text, description text ), primary key code ( i.e. "USD", "US Dollars" )

table sellers ( currency text, id number, ... ), primary key (currency, id), foreign key currency references currencies

table buyers ( currency text, id number, ... ), primary key (currency, id), foreign key currency references currencies

table transactions ( currency text, seller_id number, buyer_id number, trans_id number ), primary key trans_id, foreign key currency references currencies, foreign key (currency, seller_id) references sellers, foreign key (currency, buyer_id) references buyers

This is a bit unwieldy, but it expresses my example constraint: buyers can only buy from a seller with the same currency, there is no way to insert a cross-currency transaction.

Of course, 3 femtoseconds after deployment the PHB will decide you can do cross-currency sales.

Francisco Olarte.
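A runnable DDL sketch of the schema sketched above (my transcription, column lists trimmed and integer ids assumed):

CREATE TABLE currencies (
    code        text PRIMARY KEY,   -- e.g. 'USD'
    description text               -- e.g. 'US Dollars'
);
CREATE TABLE sellers (
    currency text REFERENCES currencies,
    id       int,
    PRIMARY KEY (currency, id)
);
CREATE TABLE buyers (
    currency text REFERENCES currencies,
    id       int,
    PRIMARY KEY (currency, id)
);
CREATE TABLE transactions (
    trans_id  int PRIMARY KEY,
    currency  text REFERENCES currencies,
    seller_id int,
    buyer_id  int,
    -- both composite FKs must match rows sharing the same currency value,
    -- so a cross-currency transaction cannot be inserted
    FOREIGN KEY (currency, seller_id) REFERENCES sellers,
    FOREIGN KEY (currency, buyer_id)  REFERENCES buyers
);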
Re: Enforce primary key on every table during dev?
On Thu, Mar 1, 2018 at 5:22 PM, Steven Lembark wrote:
> If you can say that "rows containing the same values are not
> duplicates"

Not a native speaker, but "rows having the same values" seems to me the definition of duplicate ( ;-), J.K. )

> then you have a database that cannot be queried, audited,
> or managed effectively. The problem is that you cannot identify the
> rows, and thus cannot select related ones, update them (e.g., to
> expire outdated records), or validate the content against any external
> values (e.g., audit POS tapes using the database).

Good point. All the times I've found myself with complete duplicates allowed I've always found the correct model is no duplicates + a count field ( with possible splits as you pointed out below ). I would not have a "marbles" table with (red, red, blue, white, red, white), I would switch it to red=3, blue=1, white=2.

Francisco Olarte.
Re: Command to execute SQL string
Hello:

On Thu, Mar 8, 2018 at 1:26 PM, Łukasz Jarych wrote:
> i have a table when i have string in each row,
> for example "DELETE FROM t_test WHERE ID = 3".
>
> I want to execute for each row this command.
> It is possible?

I suppose you mean execute the command contained in each row. Anyway, it is easy with a procedure, look for example at https://www.postgresql.org/docs/10/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN — just do a loop on your command table, follow the examples and you should be done.

Francisco Olarte.

PS:
> Jacek
Is this the same as Łukasz or Jarych ?

F.O.
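A minimal sketch of such a loop (the table and column names, t_commands and cmd, are made up, not from the thread):

DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT cmd FROM t_commands LOOP
        EXECUTE r.cmd;   -- run each stored statement dynamically
    END LOOP;
END
$$;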
Re: Trigger to create string to inverse SQL statement
Hello:

On Thu, Mar 8, 2018 at 1:32 PM, Łukasz Jarych wrote:
> i have a trigger which is added log history:

I'll trust it is, but I cannot really see anything in there; it may be better if in the future you used something like psql and pasted the data in a text format, which people with not-so-good eyesight like me can zoom in on.

> It is possible to create additional column here with string with inversed
> SQL statement?
>
> So in this case: "Delete FROM t_trig WHERE ID=1".
>
> And what i want is to have possibility to loop through table and execute
> inversed sql statement for each row.

It should be. For inserts it is not that difficult; for updates YMMV, it is difficult to reverse a command exactly, but you get new and old values so you could write it. For deletes just reinsert. But I doubt there is a function which does it for you, you'll have to code something, and I suspect it may be easier to just capture op, old and new in your triggers and use an external program to build the inverted queries.

Francisco Olarte.
Re: Trigger to create string to inverse SQL statement
Hello:

On Thu, Mar 8, 2018 at 1:51 PM, Łukasz Jarych wrote:
> what do you mean use external program to build inverted queries. Do you have
> any examples?

Please, do not top quote, or the thread will get difficult to follow fast.

That being said. Capturing a DML in a trigger is relatively easy. On recent postgres I've seen people just using json to capture the whole new and old rows, which looks promising. But then, building a 'rollback' DML is difficult to do in sql or pl/pgsql, you would normally want a more normal programming language; i.e., I would normally turn to perl for this, having used it since the mid 90s. Then, you have pl/perl, but this is difficult to debug/manage. So what I would normally do is to just capture the changes in a trigger and then have a program which queries the log table, builds the anti-query and executes it ( no point in keeping it, since once you execute it there is nothing to undo ). The beauty of this is you can take a sample from your log table and easily test the program by just implementing a debug flag which prints the queries instead of executing them ( and does not touch the log table, which I assume a real undoer will need for record-keeping ).

This is assuming the log is used as an "undo log", which is what I would assume from the very scarce information I have. And this kind of program is normally seldom used, so postponing the query building to a later time and logging minimal info fast is normally better ( it is like backups, you make the backup fast and then optimize what you can of restores, as ideally they would never be used; or transactions, you normally optimize for commits first, then rollbacks ).

Francisco Olarte.
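A minimal sketch of the capture side only (log table and function names are mine; t_trig is the table mentioned earlier in the thread), leaving the anti-query building to an external program as described:

CREATE TABLE dml_log (
    id         bigserial PRIMARY KEY,
    logged_at  timestamptz NOT NULL DEFAULT now(),
    table_name text NOT NULL,
    op         text NOT NULL,      -- 'INSERT' / 'UPDATE' / 'DELETE'
    old_row    jsonb,
    new_row    jsonb
);

CREATE OR REPLACE FUNCTION log_dml() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO dml_log (table_name, op, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO dml_log (table_name, op, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSE  -- DELETE
        INSERT INTO dml_log (table_name, op, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- AFTER row trigger, return value is ignored
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_trig_log
    AFTER INSERT OR UPDATE OR DELETE ON t_trig
    FOR EACH ROW EXECUTE PROCEDURE log_dml();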
Re: Trigger to create string to inverse SQL statement
Jacek: On Thu, Mar 8, 2018 at 1:53 PM, Łukasz Jarych wrote: > maybe somebody wrote SP to invert update statement? Maybe, but if you are going to ask for it you better define the problem a little. Francisco Olarte.
Re: Doubts about replication..
On Thu, Apr 19, 2018 at 8:48 PM, Edmundo Robles wrote:
> Yes, you are right, the replication is not a backup ;) , actually i
> backup database daily at 3:00 am but if database crash, the amount of
> data is big! that is the reason i want to replicate to reduce the data
> loss. By the way a few days ago a job partner did a delete with no where.

Replication will happily replicate a delete with no where. Backup will at least have the day's starting point.

For this kind of thing you need to do log archiving + base backups. With this you can use PITR to roll forward from the last full backup to just before the delete.

Francisco Olarte.
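A minimal sketch of the archiving side of that (the archive path is a placeholder; the base backup itself would be taken with pg_basebackup, and a restore would set a recovery target time just before the bad DELETE):

# postgresql.conf -- WAL archiving, needed for PITR
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'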
Re: Domain based on TIMEZONE WITH TIME ZONE
On Thu, May 10, 2018 at 10:03 AM, Ben Hood wrote:
...
> Am I missing the point of how Postgres stores TIMEZONE WITH TIME ZONE
> internally?

After reading in the follow-up that TIMEZONE is a typo for TIMESTAMP: yes, you are.

Approximately, postgres stores ( virtually ) a point in the time line, for both the with and without time zone types, same format. Something like a real number. The difference is mainly in transforming from/to text ( bear in mind when you put a constant in a query you are transforming from text ). In the with time zone case it formats/expects it as a time string in the session's configured time zone; in the without case it treats it ( approximately ) as if it was in UTC ( and then discards the "+00" after formatting ).

Maybe I'm confusing you more, it's not too easy to explain. The point is the TIMEZONE is not stored in either of them.

Francisco Olarte.
Re: Domain based on TIMEZONE WITH TIME ZONE
On Thu, May 10, 2018 at 12:19 PM, Ben Hood wrote:
>> On 10 May 2018, at 09:59, Francisco Olarte wrote:
>> Maybe I'm confusing you more, it's not too easy to explain.
>> The point is the TIMEZONE is not stored in either of them.
>
> Many thanks for clarification, very much appreciated.
>
> Your point is consistent with the manual: “All timezone-aware dates and times
> are stored internally in UTC”

Tom Lane, who is much more knowledgeable than me, points out they are stored in UTC. Maybe, but, IIRC, they are stored in 8 bytes. I do not know exactly how; last time I read it that could be either a float8 number or an int8 ( maybe two int4 numbers ). I think they use the same type of storage as unix timestamps ( a unix timestamp is normally a signed number of integer seconds from an arbitrary start point, 19700101T00 UTC, and it designates an instant in time. 7200 designates an instant; I can format it for the reader in many ways: 19700101T02+00, "three in the morning of the first of January of nineteen seventy, Madrid time", "1970-01-01 03:00:00 +01" ). But it is not UTC, it is the designation of an instant in time. Timestamps do not have time zones. BUT the postgres data type timestamptz is a way to store a timestamp. So is timestamp. And so is float8. The difference is how it is converted and how it interacts with other types.

> Rather it should read “is enforcing the submission of UTC denominated
> timestamps in the server by using a domain a sensible way to enforce a policy
> that will blow up when apps attempt to use non-UTC timestamps (when they
> shouldn’t be)”.
> So the question is not how does the timestamp get stored, rather, is it an
> anti-pattern to use Postgres as a linter for apps that forget to use UTC
> exclusively?

That poses a problem. You must check the input representation. I mean..., you were using extract on the value, but this happens ( YMMV ):

cdrs=# show timezone;
   TimeZone
---------------
 Europe/Madrid
(1 row)

cdrs=# select extract(timezone from current_timestamp);
 date_part
-----------
      7200
(1 row)

cdrs=# set timezone TO 'UTC';
SET
cdrs=# select extract(timezone from current_timestamp);
 date_part
-----------
         0
(1 row)

cdrs=# select extract(timezone from '2018-01-01 07:00:00+0100'::timestamptz), extract(timezone from '2018-01-01 00:00:00-0300'::timestamptz);
 date_part | date_part
-----------+-----------
         0 |         0
(1 row)

cdrs=# set timezone to 'Europe/Madrid';
SET
cdrs=# select extract(timezone from '2018-01-01 07:00:00+0100'::timestamptz), extract(timezone from '2018-01-01 00:00:00-0300'::timestamptz);
 date_part | date_part
-----------+-----------
      3600 |      3600
(1 row)

cdrs=# select extract(timezone from '2018-01-01 07:00:00+00'::timestamptz), extract(timezone from '2018-07-01 00:00:00+00'::timestamptz);
 date_part | date_part
-----------+-----------
      3600 |      7200
(1 row)

As you can see you are getting the seconds offset of the client ( or, inside a domain check, the server ) timezone at the instant in time designated by the value. Not what you originally typed to locate the instant in time.

For what you want to do I think you'll have to parse the text value, maybe by defining a view with a text column and using some rule/trigger magic for inserts / updates. It seems you want to force your users to use an explicit time zone. This may be better handled above the database.

Francisco Olarte.
Re: Enhancement to psql command, feedback.
On Thu, May 10, 2018 at 4:46 PM, Steven Lembark wrote:
> The whole point of "#!" working in shell is that the two-bytes
> (a) mark the file as executable by a specific shell command and
> (b) are a shell comment.

Shebang is a unix-ism. It is not part of the shell. The shell just execs whatever you tell it to.

In a simple way, in unix when a file is marked executable the loader is called to load and execute it. The loader first looks at the start of the file to try to determine what it is ( it does not use the 'extension' for this as MSDOS and friends do ). If it is one of the several binary formats it understands, like elf or a.out, it loads and executes it. If it is the magic sequence "#!" it searches for another executable ( NOT A SHELL COMMAND, this works even if you zap all the shells in your computer ) and recursively invokes it ( this is done by execve(2) on my linux machine, and described in its manual page ). No shell involved:

folarte:~/tmp$ type cat
cat is /bin/cat
folarte:~/tmp$ echo -e '#!/bin/cat\nHello there\nGood bye then' > xx
folarte:~/tmp$ chmod +x xx
folarte:~/tmp$ ./xx
#!/bin/cat
Hello there
Good bye then
folarte:~/tmp$ perl -e 'exec("./xx")'
#!/bin/cat
Hello there
Good bye then

You can try other ways to call execv*, nothing magical in the perl way, just avoiding the shell ( which has an exec builtin command, with different behaviour from typing a command name, which does fork, wait in the parent, execv in the child ).

Francisco Olarte.
Re: Domain based on TIMEZONE WITH TIME ZONE
On Thu, May 10, 2018 at 10:37 PM, Ben Hood wrote:
> On 10 May 2018, at 16:33, Francisco Olarte wrote:
> For what you want to do I think you'll have to parse the text value,
> maybe by defining a view with a text column and using some
> rule/trigger magic for inserts / updates.
>
> Sorry for being unclear - the solution I have in production appears to work
> with
>
> CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK
> (EXTRACT(TIMEZONE FROM VALUE) = 0);
>
> This raises an exception when an app doesn’t use UTC.
> This is what I wanted to achieve and the approach so far seems to work.

Can you post an example ( correct and error inputs, preferably done with psql )? At this point I do not know if it is working or if it just seems to be working for you. I see some cases in which it would seem to work, but I would like to know what you mean by "sending non utc timestamps to the database".

I mean something like the one attached below, which to me seems to prove it does not work, it just checks that the session timezone is UTC. Note the 1st transaction tests an explicit timestamp value, without conversions, failing in non-UTC; the second one checks an explicit value + zone not working in non-UTC; and the third one checks that anything goes, with or without a time zone, when the session time zone is UTC.

cdrs=# show timezone;
   TimeZone
---------------
 Europe/Madrid
(1 row)

cdrs=# begin;
BEGIN
cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
CREATE DOMAIN
cdrs=# create table t(ts utc_timestamp);
CREATE TABLE
cdrs=# insert into t values ( current_timestamp );
ERROR:  value for domain utc_timestamp violates check constraint "utc_timestamp_check"
cdrs=# rollback;
ROLLBACK
cdrs=# select current_timestamp;
              now
-------------------------------
 2018-05-12 12:58:03.616949+02
(1 row)

cdrs=# begin;
BEGIN
cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
CREATE DOMAIN
cdrs=# create table t(ts utc_timestamp);
CREATE TABLE
cdrs=# insert into t values ('2015-05-11 10:20:30+00');
ERROR:  value for domain utc_timestamp violates check constraint "utc_timestamp_check"
cdrs=# rollback;
ROLLBACK
cdrs=# set timezone TO 'UTC';
SET
cdrs=# select current_timestamp;
              now
-------------------------------
 2018-05-12 10:59:47.946338+00
(1 row)

cdrs=# begin;
BEGIN
cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
CREATE DOMAIN
cdrs=# create table t(ts utc_timestamp);
CREATE TABLE
cdrs=# insert into t values ('2015-05-11 10:20:30+00');
INSERT 0 1
cdrs=# insert into t values ('2015-05-11 10:20:30+0200');
INSERT 0 1
cdrs=# insert into t values ('2015-05-11 10:20:30');
INSERT 0 1
cdrs=# insert into t values (current_timestamp);
INSERT 0 1
cdrs=# select * from t;
              ts
-------------------------------
 2015-05-11 10:20:30+00
 2015-05-11 08:20:30+00
 2015-05-11 10:20:30+00
 2018-05-12 10:59:54.289827+00
(4 rows)

cdrs=# rollback;
ROLLBACK

Francisco Olarte.
Re: Domain based on TIMEZONE WITH TIME ZONE
Adrian:

On Sat, May 12, 2018 at 6:42 PM, Adrian Klaver wrote:
> On 05/12/2018 04:04 AM, Francisco Olarte wrote:
...
>> Can you post an example ( correct and error inputs, preferably done with
>> psql )?
>>
>> At this point I do not know if it is working or if it just seems to be
>> working for you. I see some cases in which it would seem to work, but I
>> would like to know what you mean by "sending non utc timestamps to the
>> database".
>
> In the post below Ben explains that he realizes it is not working the way he
> thought:

Yep, saw it after posting this.

> From what I can gather it comes down to where UTC is being enforced. Ben was
> hoping that the domain would force timestamps to be only submitted with a
> UTC offset with the idea that this would force storage in the field as UTC
> only. The realization is that if you have a field of type timestamp with
> time zone the value is going to be stored as UTC regardless of the offset
> that is presented (even in cases where there is no offset presented, when an
> implicit one is assigned). That means there really is no need for the
> domain.

I think this problem, and similar ones, comes from the description in the docs, and in a lot of similar places, as "the timestamp with time zone value is stored as utc", or "timestamps with time zones are normalized to utc". If you look at the docs you realize they are all stored as numbers, and occupy the same space as timestamp W/O TZ. The thing is you do not need UTC for anything to represent timestamps. Using the (simpler) unix timestamp as an example: if UTC ( and UAT and GMT ) did not exist I could define it as, say, "number of seconds elapsed since eight in the morning, January the second, on Madrid/Spain local clocks, plus one hundred and eight thousand" — no UTC needed at all; they are just numbers, they do not have timezones. The only difference is timestamp uses gmtime/timegm for text conversion and timestamptz uses localtime/timelocal.

In fact I've found the following in the sources:

 * Timestamps, as well as the h/m/s fields of intervals, are stored as
 * int64 values with units of microseconds. (Once upon a time they were
 * double values with units of seconds.)

And from some reading it seems to be like the unix timestamp, but in 64-bit microseconds and referencing the extended Julian ( or Gregorian ) calendar; I'm not too sure about it. I've read the definition somewhere, but thinking of them as "just a number" has avoided me a lot of problems.

Francisco Olarte.
Re: Domain based on TIMEZONE WITH TIME ZONE
for a correct manipulation, the thing is it is easier to write correct code for tstz than for ts, but they are equally accurate ). In fact, if I store "2010-01-01 12:23:45"::timestamp I know I'm getting "2010-01-01 12:23:45" in the future, independently of where I read it, so normally I use it when I want to preserve "string" values but have some special "numeric" semantics. When I store instants in time I need to make sure my string input values accurately designate them, and I normally use tstz because its default input conversions process those correct formats I allow ( and screen before sending them for conversion ).

Tell me one thing you can do with a timestamp with time zone that you cannot do with an int8 number or with a timestamp without time zone. Given there is an isomorphism between them ( and, in postgres, the text version is the same for the inverse, IIRC "one = the_other at time zone UTC" ) and they have the same representation, I doubt there is any. They are just convenient ways to manipulate text values and apply functions based on type, but they do not have a time zone. You can not get a time zone from a timestamp with time zone value.

What leads to countless confusion is that it seems a tstz value should be composed of a ts value plus a tz value, but they are not. It is just a problem with naming.

Francisco Olarte.
Re: case statement within insert
On Fri, May 25, 2018 at 11:04 AM, tango ward wrote:
> CASE
> WHEN code like '%%PE%%' or code like '%%NSTP%%'
> THEN True
> ELSE False

I cannot advise you on the %% stuff, which I do not totally understand, but

CASE WHEN condition THEN true ELSE false END

when condition is ( as it should be in a CASE ) a boolean expression is a classic antipattern; just use 'condition' or '(condition)'.

Francisco Olarte.
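A sketch of the rewrite (table and alias names are placeholders; I've written plain SQL wildcards on the assumption that the doubled %% in the original was client-side escaping):

-- instead of:
--   CASE WHEN code LIKE '%PE%' OR code LIKE '%NSTP%' THEN True ELSE False END
-- the boolean expression alone already yields true/false:
SELECT (code LIKE '%PE%' OR code LIKE '%NSTP%') AS matches
FROM some_table;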
Re: Doing a \set through perl DBI ?
David:

On Wed, Jun 6, 2018 at 5:36 PM, David Gauthier wrote:
> Hi:
>
> Is there a way to do the equivalent of a "\set foo 1" through perl dbi ?
> I tried...
> $dbh->do("\\set foo 1");
> and got a syntax error
>
> Of course, I'd also have to be able to access the value of foo once it's set.
> I'm guessing the usual way ??? (select :foo)

You are looking at it wrong. Psql vars are similar to perl vars, and it is psql ( the program on your machine ) which interpolates them before sending the query to the server.

The equivalent perl code to

\set foo 1
select :foo;

would be

my $foo = '1';
$dbh->do("select $foo");

which is not terribly useful, in perl. To use the current ( client ) user in a query ( like your next message states ), you just need

$dbh->do("select * from some_table where user_id_column=$<");

( Of course, you could "use English;" to get $UID, or use placeholders to avoid injections, that's just an initial pointer ) ( or "perldoc perlvar" if you do not know what $< is, and that is enough perl stuff for a postgres list ).

Francisco Olarte
Re: Print pg_lsn as a number?
On Tue, Jun 12, 2018 at 6:31 PM, Scott Stroupe wrote:
> According to the documentation[1], pg_lsn is a 64-bit integer that's printed
> as two hex numbers separated by a slash, e.g. 68/1225BB70. Is there a way to
> get the 64-bit integer in a common numeric representation instead of the
> peculiar hex-slash-hex representation?
...
> [1] https://www.postgresql.org/docs/current/static/datatype-pg-lsn.html

Quoting your own [1] ref: "Two LSNs can be subtracted using the - operator; the result is the number of bytes separating those write-ahead log locations."

You can try subtraction from an arbitrary origin ( pg_lsn('0/0') seems nice, as arbitrary as the Greenwich meridian ), and it worked for me in

select pg_lsn('68/1225BB70')
     , pg_lsn('0/0')
     , pg_lsn('68/1225BB70') - pg_lsn('0/0')
     , to_hex((pg_lsn('68/1225BB70') - pg_lsn('0/0'))::bigint)
;

( http://sqlfiddle.com/#!17/9eecb/16272 )

Reconstructing via simple addition does not work, but you can do division, modulus, double to_hex, join with '/', cast to pg_lsn if you like.

Francisco Olarte.
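A sketch of the reconstruction path the last sentence describes (untested here; the subtraction yields a numeric, so the halves are cast to bigint before to_hex):

-- from the number of bytes since '0/0' back to the hex-slash-hex form:
SELECT (to_hex(div(n, 4294967296)::bigint) || '/' ||
        to_hex(mod(n, 4294967296)::bigint))::pg_lsn AS lsn
FROM (SELECT pg_lsn('68/1225BB70') - pg_lsn('0/0') AS n) AS s;
-- 4294967296 is 2^32, the size of the low half of an LSN.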
Re: Is there a way to be notified on the CREATE TABLE execution?
On Wed, Jun 20, 2018 at 5:20 PM, Igor Korot wrote: >> https://www.postgresql.org/docs/10/static/sql-createeventtrigger.html >> Your scenario suggests you may wish to avail yourself of the Listen and >> Notify commands as well. > > I did look at the Listen/Notify. > Unfortunately the listening is done on the channel versus listening > for the specific event. Channels are cheap. You just listen on "whatever" and in the event trigger you notify "whatever", payload is optional. The event trigger is the one which takes care of filtering the event and notifying selectively. You can use a channel per event. Francisco Olarte.
Re: Is there a way to be notified on the CREATE TABLE execution?
Igor:

On Wed, Jun 20, 2018 at 7:07 PM, Igor Korot wrote:
> Just one more question:
> This page https://www.postgresql.org/docs/current/static/xfunc-c.html#DFUNC
> does not reference
> Windows/MSVC/MinGW as a way to compile the code.

Sorry, I don't do windows.

You do not need C extension functions anyway; unless your usage pattern is truly bizarre, a trigger for DDL can be written in any PL.

Francisco Olarte.
Re: Is there a way to be notified on the CREATE TABLE execution?
Melvin: Maybe old eyes, but ... On Wed, Jun 20, 2018 at 7:03 PM, Melvin Davidson wrote: >>Is there a way to be notified on the CREATE TABLE execution? > Here is sample code that will notify for a CREATE or DROP table: Doesn't this trigger just log the events? I think it's missing something like "Notify create_or_drop" somewhere after the logging ( so a listener knows to look at the log table ). ( Normally I would use something like this, listen to a channel, do a first scan from the saved last tbl_cl_key, and then do another scan from the last each time listen fires, it seems the difficult work is done but it misses the notify to signal listeners and avoid having to rescan on a timer or a similar thing ). ( Maybe I missed the line, it would not be the first time, that's why I ask ). Francisco Olarte.
Re: Convert Existing Table to a Partition Table in PG10
On Sun, Jul 1, 2018 at 12:15 AM, Clifford Snow wrote:
...
> I also leaned that my range partition value I used on a timestamp needed to
> have fractional seconds. I used a range of 2017-01-01 00:00:00 to
> 2017-23:59:59 which failed when I attempted to add a record that had a
> timestamp of 2017-23:59:59. Adding a fractional second to the range solved
> the problem.

Your problem probably comes from using closed intervals. Timestamps are like real numbers; partitions on real numbers are best done using half-open intervals. You can cover the real line using non-overlapping half-open intervals, but you cannot do it with open or closed ones ( non-overlapping ).

Assuming you are using yearly range partitions ( 2017-23:59:59 should be 2017-12-31 23:59:59 ), to use closed intervals you have to rely on "real" second numbers being stored in the computer with a finite precision ( so you can, say, add up to the microseconds, and pray it does not change to picoseconds in a future release ). If you use half-open intervals ( 2017-01-01 00:00:00 <= ts < 2018-01-01 00:00:00 ) the problem is much easier. You can even drop the HMS ( 2017-01-01 <= ts < 2018-01-01 ) and it will work, even if the system picks a different HMS value for each year, as you use the same value for an interval's start as for the previous one's end. And, if timestamp supported defaulting the M and D like it does with HMS ( which it does not ) you could even drop them. And I think postgres does not use leap seconds, but if it did, '2016-12-31 23:59:60.9' does not need to be remembered in the half-open style.

This is a general problem, not a postgres or timestamp related one. Anything which has decimals is generally better partitioned with half-open intervals. With integer-like things ( like dates, which are countable ) it does not matter that much; I use half-open for easier upgrading if I need to, but you can translate between open / closed / half-open.

Francisco Olarte.
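In PG10 declarative range partitioning the bounds are already half-open (FROM is inclusive, TO is exclusive), so yearly partitions can be declared without guessing the last representable instant; a sketch with made-up names:

CREATE TABLE events (
    id bigint,
    ts timestamp NOT NULL
) PARTITION BY RANGE (ts);

-- covers [2017-01-01, 2018-01-01): no fractional-second trickery needed
CREATE TABLE events_2017 PARTITION OF events
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
CREATE TABLE events_2018 PARTITION OF events
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');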
Re: Split daterange into sub periods
On Thu, Jul 5, 2018 at 4:16 PM, hmidi slim wrote:
> In fact I'm trying to split a period in sub periods. Following this example:
> If I have a period = [2018-01-01, 2018-01-31] and two other periods
> [2018-01-04, 2018-01-06] and [2018-01-08, 2018-01-08].
> If I split the base period '[2018-01-01, 2018-01-31]' by the other two
> periods '[2018-01-04, 2018-01-06]' and '[2018-01-08, 2018-01-08]' I will got
> such a result:
> [2018-01-01, 2018-01-03]
> [2018-01-07, 2018-01-07]
> [2018-01-09, 2018-01-31].

Your example may be fine to illustrate a definition, but it is totally inadequate to describe what you want. You'll, at least, need to refine your question. Something like "I have a big period and a set of small periods, and want to calculate the set of subperiods of the big one not covered by the small ones", and also some conditions ( i.e., can the small periods intersect? are they completely covered by the big one? can they include the endpoints of the big one? can they be empty? does the result need to be minimal ( in the number of result periods )? ).

Even then, this is not trivial and, as said above, you will probably need a function for it.

Francisco Olarte.
Re: Code of Conduct
On Wed, Sep 19, 2018 at 5:27 PM, Fred Pratt wrote:
> Keep pg open and free. This smells of PC police. This community can
> police itself

No comment on this, just kept for context.

> Sent from my mobile device. Please pardon my brevity and typos. I am not
> responsible for changes made by this device’s autocorrect feature.

I will happily pardon brevity ( although I would not call a ten line sig plus a huge bottom quote "breve", and AFAIK it means the same in English as in Spanish ) and/or typos, but the "I am not responsible" feels nearly insulting. Did someone force you to use "this device" ( which you seem to perceive as inadequate for a nice answer ) to reply, or did you choose to do it? ( Real, not rhetorical question, but do not answer if you feel it's inadequate. )

As an aside, is this kind of statement and/or my response to it a violation of the current CoC ?

Francisco Olarte.
Re: How to maintain the csv log files in pg_log directory only for past 30 days
On Fri, Sep 28, 2018 at 7:03 AM, Raghavendra Rao J S V wrote:
> Log file will be generated in csv format at pg_log directory in our
> PostgreSQL. Every day we are getting one log file. We would like to maintain
> only max 30 days. Which setting need to modify by us in “postgresql.conf” in
> order to recycle the log files after 30 days.

I have similar problems in a lot of things and normally use "find <dir> -name <pattern> -mtime +<days> -delete" in the daemon's cron ( GNU find, on linux; I assume other OSs have a similar command ).

For postgres-only solutions you've been given some advice previously, and like those I only know ways to do it daily/weekly/monthly/yearly.

Francisco Olarte.
Re: pg_sleep() inside plpgsql block - pro & cons
Hi:

On Tue, Oct 2, 2018 at 12:10 PM, pinker wrote:
> There is second time I see that somebody uses pg_sleep function inside
> plpgsql block. This case is quite similar to the last one - it's some kind
> of wait for data to be loaded. After pg_sleep there is a check if some
> condition is true, if not procedure goes to sleep again. As a result an
> average duration of this function is 1,5h...
> I'm trying to gather pros and cons regarding using pg_sleep this way. What's
> coming to my mind are only 2 cons:
> * clog contention
> * long running open transactions (which is quite good described in here:
> https://www.simononsoftware.com/are-long-running-transactions-bad/)
> So maybe you'll add some more to the list?

With so few details, nothing much can be said. As another con, if the proc is something like "do stuff, wait for data to appear, do more stuff", I think the function will also need read committed or something similar to see the data appear, and it will fail under serializable.

The pattern certainly smells funny. I do some similar things, but I sleep outside of the database; is there a reason this can not be done?

Francisco Olarte.
Re: pg_sleep() inside plpgsql block - pro & cons
On Tue, Oct 2, 2018 at 3:44 PM, pinker wrote:
...
> Yes, I do try to convince them to do it outside the db, that's the reason
> I'm looking for some support here :) I'm not sure those 2 reasons are enough
> to convince them, wanted to be prepared...

Well, not knowing the logic I cannot be sure, but you can try the "it depends on isolation level" point I made.

> You know it's always time to do refactoring and in this case it's a lot of
> time because of the amount and complexity of the logic in db.

Not having seen it, I trust your word. But I doubt the design can be correct; it seems a case of "all I know is pl/xx, so I'll do everything I can in it". I got one product with a similar problem in Oracle.

But unless the logic is really perverse, it seems to me you could do two procs: the normal one ( without the pg_sleep in a loop ), and a wrapping one testing for the "data appears" condition and calling the first, and call the second in a loop. But again, details would be needed and, trusting your words, I do not want them on the list ;-> .

Francisco Olarte.
Re: Why the index is not used ?
ROS:

On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier wrote:
> INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id,
> pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2,
> cipher-algo=aes256') FROM generate_series(1,10) AS x(id);
> CREATE INDEX idx_cartedecredit_cc02 ON
> cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2,
> cipher-algo=aes256'));

If my French is not too rusty you are encrypting a credit card, and then storing an UNENCRYPTED copy in the index. So, getting it from the server is trivial for anyone with filesystem access.

Francisco Olarte.
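One possible alternative (my sketch, not something proposed in the message above): if the index is only needed for equality lookups, index a keyed digest of the decrypted value rather than the plaintext itself, so nothing readable lands on disk in the index; the passphrase and index key below are placeholders:

-- pgcrypto's hmac() keeps the indexed value non-reversible without the key
CREATE INDEX idx_cartedecredit_cc_hmac
    ON cartedecredit (hmac(pgp_sym_decrypt(cc, 'motdepasse'), 'index-key', 'sha256'));

-- equality lookups then compare digests instead of plaintext:
SELECT username
  FROM cartedecredit
 WHERE hmac(pgp_sym_decrypt(cc, 'motdepasse'), 'index-key', 'sha256')
     = hmac('test value 3', 'index-key', 'sha256');

This still ships the passphrase in every query, and a keyed hash (rather than a plain digest) matters because card numbers have little entropy; it only illustrates keeping the plaintext out of the index.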
Re: Filtering before join with date_trunc()
Hi Phil:

On Mon, Oct 15, 2018 at 5:57 PM, Phil Endecott wrote:
...
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:
...
> I then join these tables on the rounded time:
> Now I'd like to find the values for a particular short time period:

For what I propose I assume the SHORT time period is really short ...

> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.

I think you are expecting too much. I mean, you are expecting the server to know it can expand your time period into a (maybe) bigger one which covers the original data and push that condition down. In my experience this is normally too much.

...
> Is there anything I can do to make this more efficient when the tables
> are larger?

If your periods are really short you could try to replace the time condition on the views with a time condition on the tables and do the rounding and grouping afterwards. I mean, use a half-open interval to catch the relevant chunks of the tables and then join the short results. I think with a trio of CTEs selecting with a time interval in the WHERE and doing the date_trunc()/MAX() group by, you should be able to do three index scans producing short results which can then be full-joined and coalesced.

If you want the interval from $A to $B ( rounded to minutes ), do something like

WITH pressures AS (
    SELECT date_trunc('minute'::text, tbl."time") AS "time",
           max(tbl.pressure) AS pressure
    FROM tbl
    -- Chop the relevant time..
    WHERE time >= $A and time < $B + '1 minute'::interval
    -- There may be easier ways to make the above condition if you are
    -- generating the text, but always use half-open
    GROUP BY 1  -- Easier to type, and no order-by here ( and I normally
                -- label order by in views as a code smell )
), -- yada, yada: repeat for temperatures, rain
SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
       rain.rain, pressures.pressure, temperatures.temperature
FROM rain
    FULL JOIN pressures USING ("time")
    FULL JOIN temperatures USING ("time")
ORDER BY 1;  -- ORDER BY GOES HERE; I suspect your example got ordered
             -- by chance, not by design.

( Use other names; I just used the view names for C&P, lazy me. )

Francisco Olarte.
Re: compressing network packets
On Thu, Aug 1, 2019 at 5:43 PM farjad.farid wrote:
> Under windows environment, is there a way of compressing the network packet
> reaching postgresql server?

You'll need to explain a bit more. You have windows, I assume for either client, server or both, and you have a pg server, in an unspecified environment. Also packets "reaching" the server does not sound good; you probably want to compress packets "sent" to the server and DECOMPRESS them upon reaching the server, or something like that.

Try to be a bit clearer, something like "Is there a way to compress network traffic between a pg windows client and a windows server on another machine", and some windows-savvy reader may be able to help you ( I can think of a couple solutions for that on my basic desktop OS with the basic tools, I'm sure something like windows must have dozens ).

Francisco Olarte.
Re: Having the issue in while selecting the data and feltering in order by.
Nikhil ( or is it Raj? ):

On Thu, Aug 8, 2019 at 11:47 AM nikhil raj wrote:
> Here are the details.
> Windows : [zapped image showing Windows locale "English-united-states.1252"]
> Linux : [zapped image showing Linux locale "En-us.utf-8"]
> both are in same version 11.3
> How can i get the sorting order by like windows how to change the lc_collate
> Please can you help me over here OR else is there an other way we get the
> sort order like window through code

1st, it is a locale mismatch problem. I do not own windows, so I cannot test it, but IIRC win1252 is a latin-1 variant. I do not know how their locales sort, but it seems your Linux database is doing something like the C-utf-8 locale.

You could try to recreate the database using the same locale as in windows. If you are able to do this ( dump, recreate, restore ) it may be the easier way. I do not know if there are windows-like locales for linux, or how that locale sorts ( I mean, how it processes things like á or ö or ñ; your examples show only the most basic stuff ). You could probably use the C locale, in ascii, win1252 or utf-8 depending on your data content, to get good results ( the locale comes from the OS, but pg has the charsets ). With the restricted data in your examples, just switching to the C locale will suffice, but I do not know if you have more complex examples / sort orders ( someone knowledgeable in windows may be able to help; it seems like windows locales are not as sophisticated as the Linux ones, and IIRC this was true 20 years ago when I used them ).

Another option is to modify either your columns, to include a collate option in the relevant ones, or the order-by clauses in the relevant queries, which support the collate clause too. You may want to read https://www.postgresql.org/docs/11/charset.html and investigate a bit with some test dbs. I would test something like creating a database with C locale, charset win1252; this may be a good starting point. But test some more queries in windows, i.e., things like this:

$ echo -e 'a\nA\nb\nB1' | LC_ALL=en_US.UTF-8 sort
a
A
b
B1
$ echo -e 'a\nA\nb\nB1' | LC_ALL=C.UTF-8 sort
A
B1
a
b

Because it may matter for you, and I doubt the interactions can be properly tested without a proper dataset AND a windows pg server.

Francisco Olarte.
Re: Having the issue in while selecting the data and feltering in order by.
Luca:

On Thu, Aug 8, 2019 at 12:33 PM Luca Ferrari wrote:
> On Thu, Aug 8, 2019 at 11:20 AM nikhil raj wrote:
> > Same when i run this in linux machine i am getting this out in different
> > sort order on the same query.
> A collation problem?
> What does this query do?
> SELECT *
> FROM (VALUES ('a'), ('___b1'), ('_a1'),('a2'),('a3'),('a5'), ('a2')) t
> (val) order by val COLLATE "C";

It will probably do the expected thing for him, but I think another problem may be what this query:

SELECT *
FROM (VALUES ('0'),('1'),('a'),('A'),('á'),('à'),('b'),('B'),('ñ'),('Ñ')) t(val)
ORDER BY 1;

does on the Windows server, which is what he is trying to replicate. Finding a locale which does not ignore _ as the windows one does is easy, but replicating all the doodahs may be difficult without knowing how the windows locale sorts all the chars ( and I'm not getting into 1a vs a1, or _ vs a vs A ). I suspect the C locale may be ok, though.

Francisco Olarte.
Re: constrain with MATCH full and NULL values in referenced table
Stan:

On Mon, Aug 12, 2019 at 5:11 PM stan wrote:
>
> I am creating a table that has 2 values in it which are keys pointing to 2
> other tables. I need for the UNIQUE combination of these 2 keys to exist in
> a fourth table. It has been recommended to use a foreign key constraint with
> the MATCH FULL parameter.
>
> Here is my question, does this deal with NULLS in the 4th table? I am
> concerned that this constraint might fail to reject an entry if one, or both
> of the 2 key values being inserted in the table are NULLS.

If you have:

Table TA (a: PK)
Table TB (b: PK)
Table TAB (a, b, ...) PK(a,b), FK(a ref TA), FK(b ref TB)
Table FOURTH (a, b, ...) FK((a,b) ref TAB match full)

note TAB cannot have nulls in a, b as (a,b) is its PK. If you insert (null, null) in FOURTH it will be treated as in the single-column case and allowed by the FK ( but you may have not-null constraints on either a or b ). If you try to insert (a1, null) or (null, b1), it will be rejected: MATCH FULL does not allow a null/non-null mix. OTOH, if you use MATCH SIMPLE the partial-null cases will not be checked at all, as if they were not there. As stated in the docs, you can use extra single-column FKs on a and/or b to get them checked against TA/TB, and you can also put not-null constraints on either of them. The exact combo depends on what you are trying to model, which gives you what you want.

I.e., say I want to:

1.- Check (a,b) combos.
2.- Allow (a, null) but have a checked against TA.
3.- Forbid (null, b).
4.- Allow (null, null).

You can use a MATCH SIMPLE FK (a,b) against TAB for (1,4), a single-column FK (a) against TA for (2), and a check constraint ( a is not null OR b is null, if I'm not confused ) for (3,4). ( Note you do not have to check b against TB, because if b is present, a is present, (a,b) is checked against TAB and TAB.b is checked against TB. ) ( The MATCH SIMPLE FK checks 1 and allows 2,3,4, the FK(a) checks 2, and the check constraint forbids 3. )

The DB deals with nulls in many ways, you just have to enumerate your conditions and elaborate on that. Note in this case it FAILS to reject an entry if b is null, because I designed it that way, but DOES REJECT it if a is null and b is not.

Regards.
Francisco Olarte.
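A DDL sketch of that 1-4 combination (names invented, integer keys assumed):

CREATE TABLE ta (a int PRIMARY KEY);
CREATE TABLE tb (b int PRIMARY KEY);
CREATE TABLE tab (
    a int REFERENCES ta,
    b int REFERENCES tb,
    PRIMARY KEY (a, b)
);
CREATE TABLE fourth (
    a int,
    b int,
    -- (1) and (4): full (a,b) pairs are checked; MATCH SIMPLE (the default)
    -- skips rows where any column is null
    FOREIGN KEY (a, b) REFERENCES tab,
    -- (2): a alone, when present, must exist in ta
    FOREIGN KEY (a) REFERENCES ta,
    -- (3): reject (null, b); (4): still allow (null, null)
    CHECK (a IS NOT NULL OR b IS NULL)
);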
Re: SELECT all the rows where id is children of other node.
Pablo:

On Tue, Aug 20, 2019 at 6:49 PM pabloa98 wrote:
> Thank you for your responses Rob. Appreciated. The problem with recursive
> queries is that they are executed several times and it has and impact in
> performance.
> I need a subset of those rows and I want them in one pass.
> I discovered that ltree extension could be useful. I will play with it today.
> I am sure there's a way to find al the nodes in O(n) time with n = size of
> the resulset ...

Unless you have some extra conditions in a table ( like "autoincremented immutable primary key and parents are always created before children" ) I think your problem of "get all the descendant ( I do not like to call them children ) nodes of a given id" cannot be solved in one pass.

I mean, if you are getting descendants of the node N1, you need to read the last node, NL, of the table to know if it is a child of N1. But then you have to read the table again to find children of NL. Of course, if you have something like "hierarchical ids" you can traverse ordering by them and know NL MUST be childless, and build the tree rooted on node N1 as you go, but without some of these conditions I do not think it can be done in an "ideal" db ( which lets you scan in any order you can define from just a row without cost ) in one scan ( storing and pruning the whole tree as you go is cheating ).

Also, if your node ids come from a serial and are immutable, or you take a little care when mutating them, you can do it traversing by id, but you need a full scan; a recursive query with several index scans may easily be faster in wide trees.

Francisco Olarte.
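For reference, the recursive query under discussion would look something like this (table and column names assumed, rooted at id 1):

WITH RECURSIVE descendants AS (
    SELECT id, parent_id
    FROM nodes
    WHERE id = 1                 -- the node whose descendants we want
    UNION ALL
    SELECT n.id, n.parent_id
    FROM nodes n
    JOIN descendants d ON n.parent_id = d.id
)
SELECT id FROM descendants WHERE id <> 1;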
Re: SQL equivalint of #incude directive ?
Stan:

On Fri, Aug 30, 2019 at 3:49 PM stan wrote:
...
> I am defining a bunch of functions, and I would prefer to store them in a
> separate file, which then gets "source" by the main DB init file.
> Is there a standard way to do this?

I assume you mean "standard for postgres"; AFAIK there is not even a (multi-db) standard way to feed an sql script to a server ( the content of queries / results is standard, how to send them to / get them from the server is not ).

\i in psql has been pointed out. I do not know if this is one of your "workarounds", but what I normally do for this is treat the "init" script as a program and "make" it. I've done it using many of the preprocessors freely available around ( i.e. m4, cpp ) and also using a perl program ( but this is because I'm quite fluent in it and it makes some things easier ), and even doing "cat *.sql | psql" and naming the scripts appropriately.

You'll probably have it easier going by that route ( generating the sql script from chunks using a tool of your choice; it's just a big text chunk and you can easily debug it with a text editor ), and it is not that complex.

Francisco Olarte.
Re: SQL equivalint of #incude directive ?
Tony:

On Sun, Sep 1, 2019 at 10:05 AM Tony Shelver wrote:
> I think you will find that most DBAs build their own scripts using tools like
> Perl or Python, or a commercial product.

Tools like Perl are great for building text output, it's got a lot of shortcuts for a lot of things. Not that sure about Python. But I think these are more used by developers wearing dba hats, or by dbas who double duty as developers / sysadmins. Any dba needs some programming skills, specially when dealing with dbs with procedure capabilities, but not that much. For a pure DBA I would recommend more things like a classic preprocessor ( I've got some success with m4, specially when using m4sugar with it, and I'm sure there are better ones around ) or some templating system. For just text CPP can be used, but it knows a bit about C syntax, which can become a nuisance; for just includes it can be fine.

> A similar situation exists for HTML, there is no standard pre-processor
> directive.

IMO the reason is the same. SQL and HTML are not "source code", they are the final product, the thing you send over an http connection ( in the html case ) or in an implementation-defined way ( i.e. inside the socket connections psql makes using libpq ). They are like the windows exes, or the unix elf binaries. In the HTML case there are lots of templating systems which you use to build the HTML content, in many cases on the fly, and many of them can be reused easily for SQL.

I think the trick is not treating the SQL as the source, but as an EXE file. Once you start thinking of it as such, lots of things in your usual toolbox become useful. I routinely do something similar for configuration files, using just shell scripts full of echo + multiline constants + source aux files + source some echoing files and, if quotes are used for something like column names, cat + here docs.

Francisco Olarte.
Re: How to reformat output of "age()" function
David:

On Wed, Sep 11, 2019 at 5:39 PM David Gauthier wrote:
> How can I change the default output of the "age" function to be, for example,
> in minutes?
> E.g.
> dvdb=> select age('2019-09-11 09:00:00','2019-09-09 11:00:00');
>       age
> ----------------
>  1 day 22:00:00
> (1 row)
> I want the equivalent of that time delta in minutes.

Some answers have already been posted, but also consider what you are doing. Intervals have three components, months, days, seconds, for a reason: ( ignoring leap seconds for now ) not all days have 24h ( daylight saving time changes ) and not all months have 28/29/30/31 days. IIRC interval normalization for epoch assumes all months have 30 days and all days have 24 hours.

If you want to know the elapsed minutes between two timestamps, it might be better to do it directly: extract the epoch from both ( seconds ), subtract, divide by 60, truncating/rounding if you need to. This is what happens in one case in my timezone ( Europe/Madrid ):

test=# select '2019.11.20 20:00:00'::timestamptz as end, '2019.06.10 10:00:00'::timestamptz as start;
          end           |         start
------------------------+------------------------
 2019-11-20 20:00:00+01 | 2019-06-10 10:00:00+02
(1 row)

test=# select age('2019.11.20 20:00:00'::timestamptz, '2019.06.10 10:00:00'::timestamptz) as interval;
        interval
-------------------------
 5 mons 10 days 10:00:00
(1 row)

test=# select extract(epoch from age('2019.11.20 20:00:00'::timestamptz, '2019.06.10 10:00:00'::timestamptz)) as interval_seconds;
 interval_seconds
------------------
         13860000
(1 row)

test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz) - extract(epoch from '2019.06.10 10:00:00'::timestamptz) as elapsed_seconds;
 elapsed_seconds
-----------------
        14122800
(1 row)

Francisco Olarte.
Re: How to reformat output of "age()" function
Oops, I got it bad:

On Thu, Sep 12, 2019 at 1:50 PM Francisco Olarte wrote:
> timestamp / interval arithmetic is really a hairy thing. ( As shown
> below, start point carefully taken to avoid crossing dsts )

It was chosen to FORCE, not AVOID, crossing dst.

> cdrs=# select x, '2019.11.20 20:00:00'::timestamptz + x::interval as
> sum from ( values ('5 mons 10 days 10:00:00'),('163 days
> 11:00:00'),('3923:00:00')) as v(x);
>             x            |          sum
> -------------------------+------------------------
>  5 mons 10 days 10:00:00 | 2020-05-01 06:00:00+02
>  163 days 11:00:00       | 2020-05-02 07:00:00+02
>  3923:00:00              | 2020-05-02 08:00:00+02
> (3 rows)

cdrs=# select x, '2019.11.20 20:00:00'::timestamptz + x::interval as sum from ( values ('0'),('5 mons 10 days 10:00:00'),('163 days 11:00:00'),('3923:00:00')) as v(x);
            x            |          sum
-------------------------+------------------------
 0                       | 2019-11-20 20:00:00+01
 5 mons 10 days 10:00:00 | 2020-05-01 06:00:00+02
 163 days 11:00:00       | 2020-05-02 07:00:00+02
 3923:00:00              | 2020-05-02 08:00:00+02
(4 rows)

The subject is just complex enough that I should avoid making these typos! Apologies, my fault, bad proof reading, hungry.

Francisco Olarte.
Re: How to reformat output of "age()" function
Adrian:

On Wed, Sep 11, 2019 at 11:19 PM Adrian Klaver wrote:
> On 9/11/19 9:34 AM, Francisco Olarte wrote:
...
> > If you want to know the elapsed minutes between two timestamps, it
> > might be better to do it directly: extract the epoch from both (
> > seconds ), subtract, divide by 60, truncating/rounding if you need to.
>
> > test=# select extract(epoch from age('2019.11.20 20:00:00'::timestamptz, '2019.06.10 10:00:00'::timestamptz)) as interval_seconds;
> >  interval_seconds
> > ------------------
> >          13860000
> > (1 row)
> >
> > test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz) - extract(epoch from '2019.06.10 10:00:00'::timestamptz) as elapsed_seconds;
> >  elapsed_seconds
> > -----------------
> >         14122800
> > (1 row)
>
> Epoch is not the issue, age() is. Leave age() out of it:

Well, I did not point to age as the problem, assuming it was clear in the examples I gave: two usages of epoch, one with and one without age.

> set timezone = 'Europe/Madrid';
>
> test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz) -
> test-# extract(epoch from '2019.06.10 10:00:00'::timestamptz) as
> test-# elapsed_seconds;
>  elapsed_seconds
> -----------------
>         14122800
> test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz - '2019.06.10 10:00:00'::timestamptz) as elapsed_seconds;
>  elapsed_seconds
> -----------------
>         14122800
> (1 row)

Ah, I see your point now. The problem is that age() and subtraction give different intervals:

cdrs=# select '2019.11.20 20:00:00'::timestamptz - '2019.06.10 10:00:00'::timestamptz as diff_interval;
   diff_interval
-------------------
 163 days 11:00:00
(1 row)

cdrs=# select age('2019.11.20 20:00:00'::timestamptz, '2019.06.10 10:00:00'::timestamptz) as age_interval;
       age_interval
-------------------------
 5 mons 10 days 10:00:00
(1 row)

And I see subtraction still translates it to normalized days. I still prefer subtracting epoch values, as the 163d/11h seems really tricky to work with, and it took me a while to parse & diff your queries against mine, they all looked the same. It seems like PG does date subtraction by subtracting epochs and converting to a normalized interval; I prefer to go to the number of seconds directly when working with "elapsed time", as otherwise I get confused. I mean something like this ( using it just to show the differences ):

cdrs=# select (extract(epoch from '2019.11.20 20:00:00'::timestamptz) - extract(epoch from '2019.06.10 10:00:00'::timestamptz)) * '1 second'::interval as pure_seconds_interval;
 pure_seconds_interval
-----------------------
 3923:00:00
(1 row)

A third representation! Which gives the same result for epoch, but I'm not sure it does for arithmetic ( tested it, it does not ).

I thought subtraction would give me that; clearly it does not ( both give the same when using epoch, as lacking tz info it has to assume something, and it seems to assume no dst changes ).

timestamp / interval arithmetic is really a hairy thing. ( As shown below, start point carefully taken to avoid crossing dsts )

cdrs=# select x, '2019.11.20 20:00:00'::timestamptz + x::interval as sum from ( values ('5 mons 10 days 10:00:00'),('163 days 11:00:00'),('3923:00:00')) as v(x);
            x            |          sum
-------------------------+------------------------
 5 mons 10 days 10:00:00 | 2020-05-01 06:00:00+02
 163 days 11:00:00       | 2020-05-02 07:00:00+02
 3923:00:00              | 2020-05-02 08:00:00+02
(3 rows)

Francisco Olarte.
Re: How to reformat output of "age()" function
Adrian:

On Thu, Sep 12, 2019 at 4:23 PM Adrian Klaver wrote:
> >  pure_seconds_interval
> > -----------------------
> >  3923:00:00
> > (1 row)
> >
> > A third representation! Which gives the same result for epoch, but I'm
> > not sure it does for arithmetic ( tested it, it does not )
> >
> > I thought subtraction would give me that; clearly it does not ( both
> > give the same when using epoch, as lacking tz info it has to assume
> > something, and it seems to assume no dst changes ).
>
> See doc information above. It uses the SET timezone. See below for more
> information:

I know how to set the timezone and operate with it, and I know that for the purpose of calculating elapsed seconds ( minutes / hours ) both approaches are the same. What I'm saying is that you must take care in how you define and calculate your intervals if you are going to add them to timestamps directly, as an '86400 seconds' interval has the same epoch as a '1 day' one, but they represent different things when being added to a tstz, and I know that the difference depends on the current timezone.

I do a lot of these, working with phone calls to extract ring / talk time from setup/connect/disconnect times, and I do not hit problems because I never do interval arithmetic on elapsed times; it is not a problem if you are aware that there are "infinite" ways to represent an epoch in an interval.

Francisco Olarte.
Re: How to represent a bi-directional list in db?
Pankaj: On Sun, Sep 22, 2019 at 4:25 PM Pankaj Jangid wrote:
> CREATE TABLE stages (
>   id SERIAL PRIMARY KEY,
>   name VARCHAR(80) NOT NULL,
>   created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
>   updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
>   prev_stage_id SERIAL REFERENCES stages NULL,
>   next_stage_id SERIAL REFERENCES stages NULL,
>   process_id SERIAL REFERENCES processes NOT NULL
> );
> Failed with: conflicting NULL/NOT NULL declarations for column
> "prev_stage_id" of table "stages"
> Is it not possible to create "nullable" self referencing foreign keys?

SERIAL seems wrong. It means integer, not null, default next value from a sequence. What you probably want is just "prev_stage_id INTEGER" ( NULL by default ), as you do not want the prev/next stage ids to be generated; you would normally assign them values from other tuples.

Also, you may have problems populating this kind of table, as you will not have the ids of either the prev or the next stage when building it.

And lastly, in SQL you do not really need a doubly linked list: just populate prev_stage_id, index it, and you can query the next stage of a tuple using it ( see the sketch below ).

Francisco Olarte.
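A minimal sketch of what the corrected definition might look like ( table and column names taken from the quoted message; only prev_stage_id is kept, per the last paragraph ):

CREATE TABLE stages (
    id            SERIAL PRIMARY KEY,
    name          VARCHAR(80) NOT NULL,
    created_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    prev_stage_id INTEGER NULL REFERENCES stages,   -- plain integer: nullable, no sequence default
    process_id    INTEGER NOT NULL REFERENCES processes
);
CREATE INDEX ON stages (prev_stage_id);             -- makes the "which stage points at me?" lookup fast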
Re: How to represent a bi-directional list in db?
Pankaj: On Mon, Sep 23, 2019 at 4:07 AM Pankaj Jangid wrote:
> Thanks. This resolved my problem of NULL/NOT NULL conflict. I wasn't
> aware that SERIAL is by default NOT NULL.

Not only that. Once you strip the annoying NOT NULL, the only thing remaining on a serial column is a "default nextval", which you normally do not want ( you could think of populating the table in creative ways, but they would be on a different sequence than the one you use for the ID column ).

> > Also, you may have problems populating this kind of table, as you will
> > not have the ids from either prev or next stage when building it.
> If NULL value is allowed I can fill it up with NULL initially. Right? Or
> is there something wrong here.

There is not; you can use (id,prev,next) = (1,null,null) and then update, but you are going to need to travel up and down a lot, or store a lot of data. If you use the trick I comment on later, of just using "prev", you can, on a table having (id=serial, prev=int), build a sequence by doing: prev_id=null; insert (id,prev,other_data) returning id; copy the returned value to prev_id; rinse and repeat. Also note that you can query the sequence AND advance it, and then insert all the rows without default values.

> > And lastly, in SQL you do not really need a doubly linked list, just
> > populate prev_stage_id, and index it and you can query next stage of a
> > tuple using it.
> Could you please elaborate? Suppose I have this table,
> CREATE TABLE stages (
>   id SERIAL PRIMARY KEY,
>   name VARCHAR(80) NOT NULL,
>   next_id INTEGER REFERENCE stages NULL,
> );
> What would be the backward query in that case? Forward is clear. This is
> forward query,
> SELECT name FROM stages WHERE next_id = 123;

No. That is a BACKWARDS query. You are in row 123 and you go BACK to its preceding one.

If you need a traversable list containing (ListId, id, name) = x,1,a; x,2,b; x,3,c ( I've added the ListId column to make it more interesting/realistic, you normally do not have a single list per table ), in SQL you can build a (ListId, id, prev_id, name) table ( PREV is easier, as when you insert a row, in a normal application, you know the previous one but not the next one ) with the data (x,1,null,a),(x,2,1,b),(x,3,2,c) ( the last one is a synthetic sentinel and assumes a nullable id ); you can do it in a lot of ways.

To traverse it forward you just query "select id where listid=x and prev_id is null" to locate the head (1), and then go forward by selecting with prev_id = the last id you got, until you hit zero results.

To traverse backwards there are several ways. In the real cases I've used, I always had a "list row" where I could store the node for the 1st stage. In those cases I linked them circularly, (id=1, prev=3), so bidirectional traversal was easy. Or you can use a special sentinel node ( with a marker, like name=null ). The thing is, you locate the last row and then just query with id = the last prev_id you got. I do not remember the details, but probably your "stages" are stages of something which has a row, which can readily have a "first_stage_id" or something similar.

Lists in tables are not the same as in C, where you directly store pointers which point outwards. In this case any unique data serves as a pointer: slow ( table scan ) by default, faster if you index the column. See the sketch after this message.

Anyway, unless you need the "linked list" functionality for something ( really heavy manipulation of large stage lists, splicing things around ), I've normally found it's easier, in SQL, to model this kind of thing with a master-detail + order column.
( whatever = (id, ..., first_stage_id), stages = (id, order, ...) ) Francisco Olarte.
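A minimal sketch of the prev-only layout and the forward traversal described above ( names invented for the example; a recursive CTE replaces the query-per-step loop ):

CREATE TABLE stages (
    list_id INTEGER NOT NULL,
    id      INTEGER PRIMARY KEY,
    prev_id INTEGER NULL REFERENCES stages,
    name    TEXT NOT NULL
);
CREATE INDEX ON stages (list_id, prev_id);   -- makes both the head lookup and "next of" fast

INSERT INTO stages VALUES (1, 1, NULL, 'a'), (1, 2, 1, 'b'), (1, 3, 2, 'c');

-- forward traversal of list 1: start at the head (no predecessor), then repeatedly
-- fetch the row whose prev_id is the id just visited
WITH RECURSIVE walk AS (
    SELECT id, name, 1 AS pos
    FROM stages WHERE list_id = 1 AND prev_id IS NULL
  UNION ALL
    SELECT s.id, s.name, w.pos + 1
    FROM stages s JOIN walk w ON s.prev_id = w.id
)
SELECT id, name FROM walk ORDER BY pos;      -- a, b, c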
Re: How to represent a bi-directional list in db?
Pankaj: On Mon, Sep 23, 2019 at 4:07 PM Pankaj Jangid wrote:
...
> My stages are stages of processes. So yes processes are also stored in a
> table. I got the idea. I'll add another column in the processes table
> which points to the first stage (first_stage_id). And quries
> Forward pass: ...
> Backward pass: ..
> This is assuming I also create a circular list. I can also store
> last_stage_id in the process table if we don't want to create circular
> list in db.

That's exactly the idea. A pointer in C becomes a pair of fields with corresponding values, and it can be traversed in either direction, slowly by table scan or fast with indexes. In fact a pair of values becomes equivalent to two pointers, as it can be traversed either way ( but the indexing acceleration has to be applied to each direction ).

That being said, linked lists are procedural data structures and SQL is declarative, so they are not a great match; that's one of the reasons why they are rarely seen. Things like master-detail have less impedance mismatch.

Francisco Olarte.
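For comparison, a minimal sketch of the master-detail + order column alternative mentioned above ( hypothetical names; "next" and "previous" become simple comparisons on the order column instead of pointer chasing ):

CREATE TABLE processes (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE stages (
    process_id INTEGER NOT NULL REFERENCES processes,
    ord        INTEGER NOT NULL,
    name       TEXT NOT NULL,
    PRIMARY KEY (process_id, ord)
);

SELECT name FROM stages WHERE process_id = 1 ORDER BY ord;        -- forward traversal
SELECT name FROM stages WHERE process_id = 1 ORDER BY ord DESC;   -- backward traversal
SELECT name FROM stages WHERE process_id = 1 AND ord > 2
ORDER BY ord LIMIT 1;                                             -- "the stage after stage 2"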
Re: SELECT d02name::bytea FROM ... && DBI::Pg
Matthias: On Thu, Oct 10, 2019 at 1:25 PM Matthias Apitz wrote:
> I can SELECT && print a column in hex with:
> pos71=# select d02name::bytea from d02ben where d02bnr = '1048313' ;
> ...
> \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020
> ...
>
> but when I use the same in Perl DBI::Pg with:
> $sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = '1048313'");
...
> It prints the UTF-8 string and not the hex string:

Maybe because Perl does not need the contents in hex, DBI converts it to a native string, which is very similar to a "bytea" inside the db ( not in the wire protocol, but it does a similar thing for numbers ). Notice that once you have a string in Perl it is trivial to convert it to hex:

$ perl -e 'print(unpack("H*", "1234ABCD"),"\n")'
3132333441424344
$ perl -e 'print(unpack("H*", "día año"),"\n")'
64c3ad612061c3b16f

Francisco Olarte.
Re: SELECT d02name::bytea FROM ... && DBI::Pg
Matthias: On Thu, Oct 10, 2019 at 7:26 PM Matthias Apitz wrote:
> Hmm. But *I* do need the content in hex to see if the varchar column
> contains correct encoded UTF-8 data. We're on the way to port a huge
> database application from Sybase to PostgreSQL and are facing any kind of
> problems one can think of. Magically, sometimes strings, expected to be
> coded in UTF-8, arrive in the Perl $variables coded in ISO-8859-1 and than cause
> other problems when German Umlauts should be translated into HTML
> encodings like ü etc. to be presented in the web browser.

This seems to be a Perl problem ( I've had my share of them ). I suppose you can convince Perl to upgrade all your vars to utf-8, but I do not remember how.

Anyway, if you want "the hex representation of the bytea equivalent of a field" ( of which I do not remember the original type ), why don't you ask for it?

If you ask for bytea you are asking for a binary string. The fact that it is transmitted hex-encoded on the wire is an implementation detail. psql is a text mode interface; to represent binary strings to you it needs text, so it uses hex. Perl can do binary, so it gives you the content in binary. After all, your bytea could have been a jpeg image for all perl/DBI knows. But you can easily tell Perl to convert binary to hex, with unpack.

OTOH, if you want, you can tell postgres to send you hex: look for encode/decode in the relevant manual pages, they are under binary string functions, IIRC. Then pg will build a text hex string, send it on the wire by whatever mechanism it chooses, and you'll get the hex data from DBI. Do not ask pg+DBI for binary ( ::bytea ) and expect text.

> Perl (and Java) sucks, it does magic things below the surface of
> string (objects). That's why I like C :-)

They have differing degrees of suckiness. I've read the Java String sources and been horrified by them. Perl, OTOH, sucks for many things, but has its points. And C... I try to use C++ for everything I can ( many times using it as just a better ( for me ) C compiler; I've been known to write huge chunks of C++ but use malloc/stdio and friends all around, but I've found programs have way fewer bugs when using tighter types ). It's a question of taste and, of course, I wouldn't like to redo in C the hundreds of <100-line Perl scripts I use for parsing huge texts and extracting some data. But utf-8/latin-1, you're right: you have umlauts, we have accents and n-tildes, I've been there and will be again. Enough off-topic anyway.

Francisco Olarte.
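A minimal sketch of the server-side option ( table and column names taken from the earlier message; convert_to() makes the bytea explicit, encode() does the hexing in the server, so DBI just receives text ):

SELECT encode(convert_to(d02name, 'UTF8'), 'hex') AS d02name_hex
FROM d02ben
WHERE d02bnr = '1048313';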
Re: Is there any configuration in postgresql.conf or any other configuration of postgres which will make this possible to listen on particular interface
On Wed, Oct 16, 2019 at 12:07 PM M Tarkeshwar Rao wrote:
> Above processes are bind on all the interfaces of a machine. Is there any
> configuration in postgresql.conf or any other configuration of postgres which
> will make this possible to listen on particular interface?

Listening on an INTERFACE is a tricky thing. But if you can manage with listening on a particular ADDRESS and letting the OS pick the interface for you, you can try listen_addresses ( which is trivially found at https://www.postgresql.org/docs/12/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS , although, being lazy, I just searched for "listen" in the manual index ).

Francisco Olarte.
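A minimal sketch, with a made-up address; listen_addresses can be set in postgresql.conf or, as here, via ALTER SYSTEM, and a server restart is needed for it to take effect:

ALTER SYSTEM SET listen_addresses = 'localhost, 192.0.2.10';
-- restart the server, then verify with:
SHOW listen_addresses;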
Re: CPU SPIKE
You'll have to provide more data: On Mon, Oct 21, 2019 at 7:14 AM Pawan Sharma wrote: > Having real high CPU issue (95-98%), with SELECT statements and select > queries contains multiple AND operator, is it will cause any CPU Spike..??? If you do a query the only reason it is not hitting 100% cpu is it is waiting for something, like disk or network or locked data, so with all data caches, fast networks ( or small results ) a standalone query will use 100% CPU during its run ( it must, if it is not doing it something is rotten in the scheduler ). If some set of queries is using 100% cpu during, i.e., 5 minutes, they may be asking for cached data that is normal. A proper question will be "x queries of type y are eating 95% CPU in my machine Z, with this type of data, I think it is not correct because blah-blah". An spike during the query execution is the correct outcome, CPU time is not like petrol, if you do not use it is lost, and careis taken doing things like paralell jobs to insure queries use as much CPU time as possible ( because typically a query needs, say, 5 core-seconds, if you use 10% cpu in an octacore that is 3.75 wall secs, if you use 100% it is 0.375 ). Also note locks/big result transmissions aside a fully cached DB is cpu-limited ( and would be for selects with complex conditions ), I would expect 100% usage if enough ( for 24 cpus ) clients are doing complex queries against a cached database. Your problem may be "it is using 100% of 24 cpus during a minute" where you think "it should be just a sec", but the 100% figure is better, you do not want your cpu to sit iddle. > apps team is using sub-partition, PG11, CPU:24, Mem: 16GB ... > effective_cache_size > -- > 22GB ... > max_worker_processes > -- > 8 I may be mislead, but isn't 16Gb a little spartan for 24 CPU with 8 workers ( per query ? ). Also, I assume the 22Gb is because you are accounting for a huge host cache. Francisco Olarte.
Re: Upgrade procedure
On Wed, Oct 30, 2019 at 6:00 PM rihad wrote: > Hi, all. Why is it normally suggested to stop the server, upgrade it, > then start it? Wouldn't it be easier & quicker to simply upgrade the > package in-place and restart the service? On OSen that allow > modification of currently running binaries, which is most Unix OS, M$ > Windows being a notable exception ) Not sure exactly why, and whether postmaster protects itself from this, but in many years of using linux, which supports it, I've encountered my share of (minor) problems when doing that, typically due to the program not having loaded all the modules it can use ( firefox is a classic for me, as I have it normally open while I do my background updates, nothing a restart does not solve, but I would get nervous doing that to the database ). That being said, I've upgraded my TEST servers without stopping them ( although lately the upgrade scripts do a restart at the end ), but for production I prefer to download everything and prepare as much as I can and do a stop-finish upgrade-start, the last ( file extracting ) phase is normally much faster than the server restart machinery, so no much is gained by doing it in paralell. And other thing, MODIFICATION of currently running binaries is BAD, and IIRC many OS use them for paging and will not be happy ( maybe in these days they transparently switch to normal swap ). What you normally want is unlinking and replacing, modification of the names aka dir contents (not being able to do this is one of my strongest dislikes of windows, it makes so many things so much simpler ). Francisco Olarte.
Re: Getting following error in using cursor to fetch the records from a large table in c language(current transaction is aborted, commands ignored until end of transaction block)
On Fri, Nov 1, 2019 at 1:44 PM M Tarkeshwar Rao wrote: > Getting following error in using cursor to fetch the records from a large > table in c language. Regarding this, "c language", I'll comment later > Can you please suggest why it is coming and what is the remedy for this. > Nov 1 13:21:54 sprintstd2 postgres[18604]: [10-1] < 2019-11-01 13:21:54.212 > CET > ERROR: current transaction is aborted, commands ignored until end of > transaction block > Nov 1 13:21:54 sprintstd2 postgres[18604]: [10-2] < 2019-11-01 13:21:54.212 > CET > STATEMENT: BEGIN > Nov 1 13:21:54 sprintstd2 postgres[18604]: [11-1] < 2019-11-01 13:21:54.324 > CET > ERROR: current transaction is aborted, commands ignored until end of > transaction block > Nov 1 13:21:54 sprintstd2 postgres[18604]: [11-2] < 2019-11-01 13:21:54.324 > CET > STATEMENT: BEGIN > Nov 1 13:21:54 sprintstd2 postgres[18604]: [12-1] < 2019-11-01 13:21:54.356 > CET > ERROR: current transaction is aborted, commands ignored until end of > transaction block > Nov 1 13:21:54 sprintstd2 postgres[18604]: [12-2] < 2019-11-01 13:21:54.356 > CET > STATEMENT: BEGIN > Nov 1 13:21:54 sprintstd2 postgres[18604]: [13-1] < 2019-11-01 13:21:54.360 > CET > ERROR: current transaction is aborted, commands ignored until end of > transaction block > Nov 1 13:21:54 sprintstd2 postgres[18604]: [13-2] < 2019-11-01 13:21:54.360 > CET > STATEMENT This seems incomplete, but I's telling you the cause. You had an error, you need to terminate the transaction before issuing a new one, i.e., do a commit ( which, IIRC, will rollback if the transaction is in error ) or rollback. > Sample Code snippet used As you stated C I cannot comment too much, but notice: > theCursorDec = (RWCString)"DECLARE " + mySqlCursor + " CURSOR FOR " > + theSql; > myFetchSql = "FETCH " + fetchStr + " IN " + mySqlCursor; Neither of these are C, they COULD be C++ > // Begin the cursor Same as this comment. > PQexec(connection, ,"BEGIN")) > PQexec(connection, ,"myFetchSql”) And these are definitely not C ( no ; ) and, if you generated them by editing, myfetchsql is quoted which smells fishy. I won't comment more until you confirm that is the real code, but anyway it seems to me you issue transaction start queries without properly terminating them with a transaction end one. Francisco Olarte
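A minimal illustration of that aborted state, and of the way out of it ( any error works; division by zero is used here ):

BEGIN;
SELECT 1/0;     -- ERROR: division by zero; the transaction is now aborted
SELECT 1;       -- ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;       -- (a COMMIT at this point also ends up as a rollback)
SELECT 1;       -- works again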
Re: Getting following error in using cursor to fetch the records from a large table in c language(current transaction is aborted, commands ignored until end of transaction block)
M Tarkeshwar Rao: On Sat, Nov 2, 2019 at 5:16 PM M Tarkeshwar Rao wrote: > Thanks Yes it is in c++. Actually we just written this code. > Due to vaccum full cursor query failing on a connection and all the > subsequent queries are failing and we > found shared errors in /var/logs of the postgres installed machine. ... Some more things followed by a huge bottom quote-to-the-sig. Your writing style makes it really hard to decipher what you are trying to say. I hope your problem has been solved, because I am not going to try to decipher it or comment more. Francisco Olarte.
Re:
Sébastien: On Thu, Nov 21, 2019 at 3:42 PM Sébastien Bihorel wrote: . > /home/sebastien/data $ ls -l > drwxr-xr-x 2 sebastien postgres 4096 Nov 19 09:14 pgdata96_sebastien > Starting psql as sebastien, I tried the following requests and got the > following errors: ... > The manual states "The location must be an existing, empty directory that is > owned by the PostgreSQL operating system user." So, I am not sure what I am > doing wrong. Web searches ran on these error messages did not enlighten me > more... Your directory is owned by OS USER sebastien GROUP postgres, try "chown -v postgres /home/sebastien/data " to fix it. The server ( postgres ) does not have permissions ( rwx >>r-x<< r-x, only read and chdir ) to create anything in there. The fact you start psql as sebastien does not matter, it must be owned by the SERVER user ( as psql is separated, it communicates by a socket, can be in another machine ). Francisco Olarte.
Re: System column xmin makes anonymity hard
Johannes. On Tue, May 12, 2020 at 8:05 PM Johannes Linke wrote: > since 9.4, VACUUM FREEZE just sets a flag bit instead of overwriting xmin > with FrozenTransactionId [1]. This makes it harder to build applications with > a focus on data reduction. > We have an app that lets people anonymously vote on stuff exactly once. So we > save the vote in one table without any explicit connection to the voting > user, and separate from that a flag that this person gave their vote. That > has to happen in the same transaction for obvious reasons, but now the xmin > of those two data points allows to connect them and to de-anonymize the vote. > We can of course obfuscate this connection, but our goal is to not keep this > data at all to make it impossible to de-anonymize all existing votes even > when gaining access to the server. The best idea we had so far is more of a > workaround: Do dummy updates to large parts of the vote table on every insert > so lots of tuples have the same xmin, and them VACUUMing.[2] And even without the xmin someone could cump ctid and correlate them if you are not careful. You problem is going to be hard to solve without taking extra steps. I think doing a transaction which moves all the votes for period ( using insert into with the result of a delete returning ) and then inserts them back ( with some things like a insert into of a select order by random ) may work ( you may even throw a shuffled flg along the way ). An then throw in vacuum so next batch of inserts overwrites the freed space. But for someone with the appropiate access to the system, partial deanonimization is possible unless you take very good measures. Think of it, here in spain we use ballot boxes. But voter order is recorded ( they do double entry check, you get searched in an alphabetic list, your name is copied on a time ordered list, and your position on the list recorded in the alphabetic one, all in paper, nice system, easy to audit, hard to cheat ). If you can freeze time, you can carefully pick up votes from the box and partially correlate them with the list, even with boxes much larger than the voting envelopes they tend to stack with a nice order. And this is with papers, computers are much better on purposelessly ordering everything because it is easier to do it this way. > Does anyone have a suggestion better than this? Is there any chance this > changes anytime soon? Should I post this to -hackers? Something which may be useful is to use a stagging table for newly inserted votes and move them in batches, shuffling them, to a more permanent one periodically, ad use a view to joing them. You can even do that with some fancy partiotioning and an extra field. And move some users already-voted flags too, on a different transaction. Doing some of these things and adding some old votes to the moving sets should make the things difficult to track, but it all depends on how hard your anonimization requirements are ( I mean, the paper system I've described leaves my vote perfectly identificable when I've just voted, but it is regarded as a non issue in general, and I suspect any system you can think leaves the last vote identifiable for a finite amount of time ). In general, move data around, in single transactions so you do not lose anything, like shaking a ballot box periodically ( but ensure the lid is properly taped first ). Francisco Olarte.
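A minimal sketch of the "move a batch and reinsert it shuffled" idea ( table and column names are invented; the whole thing is one transaction, so nothing is lost, and all re-inserted rows share one new xmin ):

BEGIN;
WITH moved AS (
    DELETE FROM votes
    WHERE cast_at < now() - interval '1 hour'
    RETURNING option_id, cast_at
)
INSERT INTO votes (option_id, cast_at)
SELECT option_id, cast_at FROM moved
ORDER BY random();
COMMIT;

VACUUM votes;   -- so the next inserts reuse the freed space instead of appending in order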
Re: Catching errors with Perl DBI
Stan: On Thu, Jul 2, 2020 at 5:03 PM stan wrote:
> How can I catch the errors generated whne I call an INSERT that violates a
> constraint? I have coded like this:
>
> my $sth = $dbh->prepare($stmt);
> my $rv = $sth->execute() or die $DBI::errstr;
> if ( $rv < 0 ) {
>     print $DBI::errstr;
> }
>
> But, if the INSERT violates a constraint, it never gets the the evaluation
> of the $rv

I assume you mean the if($rv<0) is what is not executed. In Perl this happens because something died, and I assume it is not the die you coded. This means some of your handles have the RaiseError attribute; look it up in the perldoc.

> Is this a setting for the DBI?

I do not remember if it has a global setting, but it certainly has a database handle setting ( which percolates down ). I use it routinely for easier error handling. I'm not sure from your message whether you already know how, but if something is dying you can use the block eval construct to trap it in Perl:

eval {
    # potentially dying code...
    my $sth = $dbh->prepare($stmt);
    my $rv = $sth->execute() or die $DBI::errstr;
    if ( $rv < 0 ) {
        print $DBI::errstr;
    }
    1;  # This forces the eval to return true if execution gets here.
} or do {
    # Whatever you want, $@ has the codes.
};

About RaiseError: in programs where you do not have easy recovery from errors, it is common to set it to true in the handle, put all your code in a sub() and catch it there, and use a local eval where needed to catch this kind of primary-key-violation thing.

Francisco Olarte.
Re: Apply LIMIT when computation is logically irrelevant
Robins: On Mon, Jul 6, 2020 at 1:37 PM Robins Tharakan wrote: > When an SQL needs to UNION constants on either side, it should be possible to > implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect > understanding, > or something already discussed but rejected for some reason? Maybe the optimization does not hold its weight. I mean, the increased complexity in the optimizer, bigger memory footprint, testing and developer usage, is not worth it. > This need came up while reviewing generated SQL, where the need was to return > true when > at least one of two lists had a row. A simplified version is given below: I.e., I do not think this is a "need", specially in generated SQL, seems more like a deficiency in the generator ( specially since generators are able, and some do it, to massively overcondition the generated code to insure the optimizer does not miss anything ), and wrapping things in a limit 1 when just testing for row existence seems easy to do while generating. > (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class); > vs. > (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); -- > Faster As an aside, isn't it easier, more correct ( in my opinion ) and clearer to just use exists for row-existence test? Because you have to at least see it there is a result above, probably using exists, and you can do... select exists(SELECT 1 FROM pg_class) or exists(SELECT 1 FROM pg_class); to get a direct boolean and benefit from shortcircuiting, by putting the most likely one first, and from the internal knowledge the optimizer may have to not fully evaluate queries, which may be greater than deducting from the union/limit case. Francisco Olarte.
Re: Doubt in mvcc
Rama: On Mon, Jul 13, 2020 at 9:52 AM Rama Krishnan wrote:
> I m preparing for interview one of the recruiter asked me mvcc drawbacks as i
> told due to mvcc it use more space and need to perform maintenance activity.
> Another one is the same data causes an update conflict because two different
> transactions can update the same version of the row.
> he told its wrong, kindly tell me will you please tell me its correct or
> wrong?

I'm not sure I understand your question too well; you may want to rephrase/expand. One interpretation is: in a pure MVCC context, two transactions, say 5 and 6, could try to update a tuple valid for [1,) and end up generating two new tuples, [5,) and [6,), and closing the original at either [1,5) or [1,6). That's why MVCC is just one piece; locking is another. In an MVCC database the tuples are locked while a transaction manipulates them. Other transactions may still read them, which is why readers do not block writers, but two updates of the same tuple serialize.

Francisco Olarte.
Re: Doubt in mvcc
Rama: 1st of all, please do not top post, specially if you want to discuss examples; I'm not going to hunt up and down for the relevant sections.

On Mon, Jul 13, 2020 at 10:52 AM Rama Krishnan wrote:
> Still, I have a doubt as per your example both are trying to update the same
> tuple so it may produce two different copies right?

First, a loud warning: MY EXAMPLE IS NOT A REAL EXAMPLE OF HOW A REAL MVCC DATABASE WORKS. It just tried to show that if you only use the min-max transaction ids on storage, without the additional techniques which many people consider part of MVCC, it will not work.

> I read some blocks they mentioned drawback above two things
> It may lead to lost update also
> For example, two transactions are going to increase the amount on the same
> account by $100 . The first transaction reads the current value ($1000) and
> then the second transaction reads the same value. The first transaction
> increases the amount (this gives $1100) and writes this value. The second
> transaction acts the same way: it gets the same $1100 and writes this value.
> As a result, the customer lost $100.

PROPER mvcc, like postgres does, will not lead to this if used properly. If both transactions use UPDATE, the first will read AND lock the row, update the value, write it, commit and unlock. The second will try to read, WAIT for the lock, read 1100 and write 1200. The same happens if both use SELECT FOR UPDATE and/or use adequate isolation levels to force the engine to take the appropriate locks ( see the sketch below ).

BUT if both transactions do a select, wait for a bit, and then do an update set ( NOTE: for a banking application the CORRECT way to do a deposit is "update accounts set balance=balance+100", not "select balance from accounts into $B; update accounts set balance=$B+100" ), you may end up with what looks like a lost update, but it is really not a DB problem. If you do it in two ops, the DB does not know they correlate. For all it knows your bank might be a room with money in tin boxes: you went to check a box, counted it, noticed a discrepancy and sent an update to fix it, and did it twice to be sure. UPDATE correlates more with "I opened the box and put $100 in without looking at what was there". UPDATE RETURNING would be "and I counted the box afterwards", and SELECT FOR UPDATE would be "I took the box to my table, counted it, added 100, counted the new pile, and returned the box to the safe".

> Will u please provide more details

No, I will not. You need to read a lot more than a mail can hold; MVCC is a complex topic which I do not fully understand myself and I'm not going there. But there are tons of info floating around, more or less easy to find.

> Serialize is the solution to this issue.

That's what locks and isolation levels do.

Francisco Olarte.
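A minimal sketch of the two safe patterns mentioned above ( table and amounts borrowed from the quoted example; the id is invented ):

-- one-statement deposit: read and write happen under the same row lock,
-- concurrent deposits simply queue up on that lock
UPDATE accounts SET balance = balance + 100 WHERE id = 42;

-- read-then-write in the application, made safe by locking the row first
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;   -- returns 1000, row now locked
UPDATE accounts SET balance = 1100 WHERE id = 42;        -- value computed from the locked read
COMMIT;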
Re: Improvement for query planner? (no, not about count(*) again ;-))
Tobias, 1st some etiquette stuff. - You have replied just to me, directly. I'm CCing the list. Remember to use reply all. Usual practice in the postgres lists is to reply to the list and everyone involved in the thread ( doing reply all achieves this normally ). - It's not a biggie in this particular mail, but please do not top-post, specially if you want to get answers on any complex question. Trim unnecessary parts from the quoted text and reply below. Having to scroll to a big chunk which includes even my signature is not a thing I like. On Mon, Jul 20, 2020 at 2:23 PM wrote: > I have tried the queries > select name1 from games union select name2 from games > but not > select distinct name1 from games union > select distinct name2 from games > since it's just the same and easy for the optimizer to realize (I thought?) There are several other things you have not done. You have not provided any info ( statistics ) on your table, just the cardinality. You have not provided any explain output, which is normally needed if you really want people to help you. On the subject, I'm not really sure both queries are identical or can be optimized, but when one does bulk queries like that it is better to help it. Your query is a corner case, a one of loading, and many optimizers do not catch that things properly, as putting code for them means increasing bug surface on a feature of dubious utility ( I, personally, would prefer having to put your sample query manually than risking optimizer bugs OR paying the price of the optimizer trying to catch that on every query I send ). Also, the optimizer may be catching many things, a explain output may help to see what it's doing. > I've given Postgres a few Gigs (I think 4?) as work_mem, having 16 GB in > total. Still it's not using them. This does not seem correct. work_mem is per operation, it can be used several times on a single query. See https://www.postgresql.org/docs/12/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY . Again, some explain/show combos may clear up things. > Seems like my mistake was creating that table with a primary key. But the > query itself without inserting into anything should've been fast then, which > it wasn't. I'll remember your trick of creating the primary key afterwards > and will try just select name1 from games to see how it goes. The PK stuff is bulk-loading 101. Try explain AND explain analyze of some variants, remember to analyze your tables ( seems redundant, but the PK & redundant hash key stuff leads me to think you are not too experienced on postgres usage ). Francisco Olarte.
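Regarding work_mem being per operation rather than per connection, a small sketch ( the table name comes from the thread; the value is just an example ) of how to try a bigger setting for one session only and see what the planner does with it:

SET work_mem = '256MB';     -- affects this session only; each sort/hash node may use up to this much
EXPLAIN (ANALYZE, BUFFERS)
SELECT name1 FROM games
UNION
SELECT name2 FROM games;
RESET work_mem;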
Re: Bytea Example
Naveen: 1st, do not top post, it is frowned upon here. AAMOF I was tempted to skip your message due to this, as it makes replying too hard. AK asked you (twice), "What programming language and database driver are you using?" On Sun, Aug 16, 2020 at 7:11 PM Naveen Kumar wrote: > I am connected to PostgreSQL in a Windows environment. Below is my version: > "PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit" AK question is relevant because pg is a client-server database. I assume you have the server correctly installed and running, but we need to know what client software you are using, i.e - "I'm using the bundled psql command line program." In this case you interact by sending sql commands directly (I'm not able to put a sample because I never use bytea from psql, but many on this list do and can help you. - "I'm using pgAdmin4", and some pgAdmin user can help you. - "I'm using a C program using libpq" - "I'm using a Java program using JDBC" > I am just trying to practice LOB objects, like byteA data type, in > PostgreSQL. Unfortunately, I didn't find good links regarding this so I > raised the issue. Nothing else I am doing. There are lot of links, many people know them by heart, but you need to help them by answering the question. The driver/language phrasing in AK question is because normally no one uses a plain sql ( psql/pgadmin) tool for LOBs, but if you do there are links which can be provided. Bear in mind that for all we know you could be connecting using your own gopher based tool which translates coloquial swahili to sql. Francisco Olarte.
Re: Reconstructing transaction content after the fact
Lionel: On Wed, Sep 2, 2020 at 5:46 PM Lionel Bouton wrote: ... > The software uses a modern version of PostgreSQL (9.6 or later) and uses > a transaction to store a vote and the corresponding register entry in > two separate tables. ...> > There's already an obvious angle of attack to correlate a vote and a > register entry : the WAL. That said the WAL can be configured to be > short lived to minimize the attack surface. By my understanding a low > Another angle I could think of are the transaction ids. If I'm not > mistaken they are stored on disk inside the files storing table and > index data (my quick read of the documentation lets me think the txid is ... > Is there a way to access these values by connecting to a PostgreSQL > server instead of analyzing in-memory or on-disk data ? If you assume attackers can read your disks, or even query arbitrary tables, you are going to find plenty of potential attacks. Apart from xid databases tend to lay insertions sequentially on disk, so if your main way of filling the tables is inserting a pair you've got a nice probability of correlating them with just two "select * from voters/entries" ( or just dumping the raw table files if they get file but not sql access ). In fact even paper voting can be attacked this way ( in spain they write every voter in the order in which they vote on a sheet, and if you do not take care to shake the ballot boxes you can get a very decent aproximation to the envelope insertion order ( you will need raw access to the boxes, like you would need raw sql or disk access in your case ) ) . ( Not sure if ballot box is the correct term ) For something like that I would try to insure no disk access, no raw sql access, give the apps a single point of access to the DB mediating every query/operation with a stored procedure/function, using accounts with access to only those, even for selects, so you have tight control and easier auditing. Francisco Olarte.
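A minimal sketch of the "mediate every operation through a function, give the app account nothing else" idea ( all names invented; SECURITY DEFINER makes the function run with its owner's rights, so the app role needs no direct table privileges ):

CREATE FUNCTION cast_vote(p_voter integer, p_choice text)
RETURNS void
LANGUAGE sql
SECURITY DEFINER
AS $$
    INSERT INTO votes (choice) VALUES (p_choice);
    INSERT INTO register (voter_id) VALUES (p_voter);
$$;

REVOKE ALL ON votes, register FROM app_role;
REVOKE ALL ON FUNCTION cast_vote(integer, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION cast_vote(integer, text) TO app_role;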
Re: Network performance optimization
Fabio: On Wed, Sep 9, 2020 at 1:05 PM Fabio Ugo Venchiarutti wrote: > Even if network datagrams moved at the speed of light and with no > serialisation/forwarding delay (which they don't), you're still going to > deal with several thousand KMs of distance; I'm positively surprised > you're getting such short round-trip times as it stands. Light travels at about a foot per nanosecond in air/vacuum, 20 cm per nanosecond in glass IIRC, so you can RTT 10cm per nanosecond in fiber. This amounts to 100km per millisecond. 1200 to 1600 km in 12-16 ms. East-1/2 are N.Virginia / Ohio, which can be from 240 to 950 km apart ( on a quick google maps measure, not several thousands ), depending on the exact place. And Amazon has really fast pipes going between their regions, so this is not surprising. I have 55ms from ireland (eu west 1) to N.Virg., and they are 5500 km over the great circle. Francisco Olarte.
Re: How to write such a query
Igor:
> My problem is that I want to emulate Access behavior.
> As I said - Access does it without changing the query internally (I presume).
> I want to do the same with PostgreSQL.

Use Access connected to Postgres.

> I'm just trying to understand how to make it work for any query
> I can have 3,4,5 tables, query them and then update the Nth record in the
> resulting recordset.
> Access does it, PowerBuilder does it.

Now, jokes aside. Access and PowerBuilder are applications used to edit databases ( among other things ). Postgres is a database server which provides databases.

> I just want to understand how.

If you want to do the same thing in your app: such tools normally pull the result of a query along with some columns uniquely identifying each row, or use a cursor and the "WHERE CURRENT OF" positioning ( see the sketch below ). But YOUR APP does it; postgres does not do it. When Access does it, the database is in postgres, or SQL Server, or Jet, or whatever. Access is not a database in the sense postgres is. Jet is.

Francisco Olarte.
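A minimal sketch of the cursor-positioning approach for the single-table case ( table and column names are made up; multi-table recordsets need the "carry the identifying columns" approach instead ):

BEGIN;
DECLARE c CURSOR FOR SELECT id, status FROM orders FOR UPDATE;
FETCH FORWARD 5 FROM c;                                -- the cursor is now on the 5th row
UPDATE orders SET status = 'done' WHERE CURRENT OF c;
CLOSE c;
COMMIT;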
Re: How to migrate column type from uuid to serial
Hemil: On Wed, Oct 7, 2020 at 2:49 PM Hemil Ruparel wrote: > I was integrating a payment gateway for my app when I noticed its maximum > length of customer id string is 32. SIze of UUID is 36 (32 characters and 4 > dashes). So I want to change the type of customer id to serial. The problem > is by now, the column is being used at many places. How to migrate the column > to serial without dropping the data? An uuid is just a 128 bit number, as noted by many. Your problem is the default encoding is 32 hex digits plus 4 hyphens. If your payment gateway uses a 32 CHARs string ( "maximum length of customer id string is 32" ) you can just use other encodings. Passing to integer and using decimal wont cut it ( you'll need 39 digits ), but using just hex (without hyphens) will drop the string representation to 32 characters ( you can encode with a subst and postgres accepts it without hyphens ). If you want to buy a little more space for your own purposes you can even fit 128 bits in 22 base64 chars with a couple pairs of bits to spare, and IIRC you can do it with a creative decode/encode step after killing the dashes. And if your payment gateway uses unicode codepoints instead of ASCII chars as units you could probably use more creative encodings ;-) , but probably using a "drop the dashes" subst in the interface will be your simpler option. Francisco Olarte.
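A minimal sketch of the "just drop the hyphens" option, assuming the column is called customer_id and is of type uuid ( both assumptions ); the value stays within 32 characters and is fully reversible:

SELECT replace(customer_id::text, '-', '') AS gateway_customer_id
FROM customers;

-- and postgres accepts the dash-less form back as a uuid:
SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a11'::uuid;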
Re: How to migrate column type from uuid to serial
On Thu, Oct 8, 2020 at 6:14 AM Hemil Ruparel wrote: > I was thinking UUID was not a very good choice for id. Serial would be a > better one because I don't have a billion customers. It is more like a > thousand. So when I saw the customer ID of the payment gateway cannot accept > more than 32 characters, I thought UUID is overkill. So I want to migrate to > using a serial int instead as the primary key., I do not like them. They are typically random 128 bit integers. If you think of it, random 16 bits ints are not a good idea for keys, even if the table has only a couple dozen keys. UUIDs are the same at a bigger scale, they look nicer and safer, but have got problems. The version 1-2, the one with mac address with theoretically guaranteed uniqueness were nice for their intended use, avoiding collisions without a central authority. But the namespace hash and random version are practically collision free, but not theoretically, which displeases me. When I need to present an obscured name to an outside service ( i.e., in your case, presenting the user id to the gateway directly leaks information, as they can estimate your consumer growth ) I just encrypt ( not hash, encrypt ) them. For these you can do something like treating a 64 bit number like an 8 byte array, encrypt it with a "secret" key and you send something which is random enough to the gateway, but is easy for you to manage in the database, and preserves the counter niceties, like knowing there are approximately 1000 customers between IDs 3000 and 4000 ( some may have been deleted /skiped ). Also, depending on your patterns, IDs typically correlate nicely with table order, which makes index access a bit faster, you do not jump around. And 4 byte ints make for a much denser index than 16 byte uuid. Francisco Olarte.
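A sketch of the "encrypt the serial id before it leaves the system" idea; it assumes the pgcrypto extension, and blowfish is chosen because its 8-byte block matches a bigint exactly ( key and names are placeholders ):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

SELECT encode(
         encrypt(int8send(id::int8), 'my-secret-key', 'bf-ecb/pad:none'),
         'hex') AS gateway_customer_id
FROM customers;
-- 8 encrypted bytes -> 16 hex characters, stable per id, reversible only with the key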
Re: Strange behavior
Olivier: On Sat, Oct 10, 2020 at 6:13 PM Olivier Leprêtre wrote:
> I’m surprised by this behavior I noticed in pgadmin3 and postgresql 9.6
...
> select v1 from test1 where v1 not in (select v1 from test2)

This is called a correlated subquery ( google it, it is even in Wikipedia ). It has many uses. Basically, the inner v1 refers to the v1 from the outside query. Get in the habit of using table-qualified ( potentially aliased ) column names whenever you have any moderately complex query; i.e., if the inner v1 should have been v2 ( due to a typo ), writing your query as:

select t1.v1 from test1 as t1 where t1.v1 not in ( select t2.v1 from test2 as t2 )

would have caught it.

Francisco Olarte.
Re: Another user error?
On Sun, Nov 1, 2020 at 7:54 PM Rich Shepard wrote:
> When trying to populate the locations table I get this error:
> psql:locations.sql:2105: ERROR: syntax error at or near ";"
> LINE 2105: ...E Airport Way',null,'Portland','OR','97218',null,null,null);
> ^
> Line 2105 is the last line in the file and is terminated with the only
> semi-colon in the file. Every line prior to this one is terminated with a
> colon. All other syntax errors have been corrected.
>
> I've not encountered this issue before. Where should I look for the error?

This is nearly impossible to diagnose without a chunk of the query ( i.e., the first lines and the last ones ). It smells of a missing parenthesis. If you use an editor which matches them, try adding one BEFORE the semicolon; if it matches something above, you've got it. But really, show your code, otherwise you are not going to get much help.

Francisco Olarte.
Re: Another user error?
Rich: On Sun, Nov 1, 2020 at 8:04 PM Rich Shepard wrote:
> On Sun, 1 Nov 2020, Francisco Olarte wrote:
> > This is nearly impossible to diagnose without a chunk of the query (
> > i.e., the first lines and the last ones ).
> Francisco,
> $ psql -d bustrac -f locations.sql

This is NOT the first line of the query. Even "head locations.sql" is not guaranteed to give the first line of the query ( you know it, we do not; it may have a few selects at the top ). What I mean is something like:

>>>>>>>>
insert into locations (balh, blerg, blub) values
(1,2,3),
.. snip the middle lines.
(4,5,6),
(7,8,9);
<<<<<<<<<<<<

> > It smells of a missing parenthesis. If you use an editor which matches
> > them, try adding one BEFORE the semicolon. If it matches something
> > above you've got it.
> Here're the last two lines:
> (2297,1,'Principal place of business','600 SE Bay
> Boulevard',null,'Newport','OR','97365','541-265-7758','541-265-4235',null),
> (2298,1,'Principal place of business','7200 NE Airport
> Way',null,'Portland','OR','97218',null,null,null);
> And psql found a few missing terminating parentheses and commas which were
> fixed. I expected this last run to successfully complete.

Good luck.

BTW, if you know a bit of scripting in perl/awk/python/whatever, it's normally useful to generate big insertion queries programmatically, to avoid mismatched-quote problems. And if your editor is half-decent and you have the disk space, you can try making a big file, creating an empty table like the target one, and inserting into that, so you can bisect the input ( and once you have everything in spool tables you just insert the whole table into the final one and drop it ). I mean, you do "create table location_tmp ( like locations )", then split your input file into easily managed chunks, say a few thousand lines each, insert every chunk into the tmp table, and bisect the ones which fail to find the errors. Once you have everything in tmp, just insert from it into the final table ( sketched below ).

Francisco Olarte.
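A minimal sketch of the spool-table workflow ( the chunk file names are made up; each chunk would come from splitting locations.sql ):

CREATE TABLE location_tmp ( LIKE locations );

-- load each chunk separately, e.g. from psql:
--   \i locations_part_01.sql    -- each chunk INSERTs INTO location_tmp
--   \i locations_part_02.sql    -- a failing chunk is split further until the bad line is found

-- when everything is in the spool table, move it in one go and clean up:
INSERT INTO locations SELECT * FROM location_tmp;
DROP TABLE location_tmp;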
Re: Increased size of database dump even though LESS consumed storage
Thorsten: On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning wrote: ... > I've changed the new table "datagram" to be a partitioned one with > partitions containing rows per year and per half-year. Each partition > contains far less rows than before this way and while I only tested > with "--compress=9" this time, the numbers are quite interesting: > > > unpartitioned: 6,4 GiB > > half-yearly parts: 4,8 GiB > > yearly parts: 4,8 GiB > > The interesting part this time is that the table layout for all > partitions is the same like before, only the number of rows per table > is different. Though, the number of rows overall is the same like > before, the same data, IDs etc. Though, this time the dump really is > smaller than with the OLD schema containing far more data because of > duplicate IDs and stuff. > I wouldn't have expected table layout to be that important. Compresion is dependent on detectable redundancy on the input. pg_dump more or less gzips per-table "copy to stdout" dumps. If your data evolves in a determined way having it sorted by ingestion time may increase detectable redundancy a lot, and partitioning sorts partially by date ( or fully if you have made the partitions by range-querying via index scan ). In this case it may not be the layout, but the order. Given you seem to be able to test, you may try sorting the full table by the column you use for partitioning. IIRC cluster will do the trick if it is indexed. ( This has happened to me compressing document dumps, presorting by some chosen fields improved my compression ratio a bit, IIRC it was due to the compressor searching for duplicates on a limited window only, this is why I use "detectable" redundancy ) Francisco Olarte.
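A small sketch of the "presort to help the compressor" idea, shown on the unpartitioned table ( the table and column names are taken from later messages in the thread; CLUSTER needs an index and locks the table while it rewrites it ):

CREATE INDEX datagram_unpart_captured_at_idx ON datagram_unpart (captured_at);
CLUSTER datagram_unpart USING datagram_unpart_captured_at_idx;
ANALYZE datagram_unpart;
-- a subsequent pg_dump now compresses rows that are physically ordered by captured_at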
Re: Increased size of database dump even though LESS consumed storage
Thorsten: On Wed, Feb 10, 2021 at 12:36 PM Thorsten Schöning wrote: > Guten Tag Francisco Olarte, > am Mittwoch, 10. Februar 2021 um 10:38 schrieben Sie: > > [...]and partitioning sorts partially > > by date ( or fully if you have made the partitions by range-querying > > via index scan ).[...] > That statement is especially interesting not only because of my > dump-size, but I'm running into the problem that queries spanning more > than one partition seem to prefer sequential scan over using indexes. > My indexe seems to only be used when querying the rows of one > partition. You know the drill, show your indexes, post some explain for these, those statements are too fuzzy to infer any useful advice. > So, the following is my definition, should that be "range-queried via > index scan" properly? :-) > > CREATE TABLE datagram > > ( > > id bigserial NOT NULL, > > src_re integer NOT NULL, > > src_cltinteger NOT NULL, > > src_meter integer NOT NULL, > > captured_attimestamp with time zone NOT NULL, > > captured_rssi smallint NOT NULL, > > oms_status smallint NOT NULL, > > oms_encbytea, > > oms_decbytea > > ) PARTITION BY RANGE (captured_at); > > CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM > > ('1970-01-01') TO ('1970-07-01'); > > CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM > > ('1970-07-01') TO ('1971-01-01'); > > [...] Given that definition there is no possible index scan, as there are no indexes. > > INSERT INTO datagram([...]) SELECT * FROM datagram_unpart; But given this what I was suggesting is not true. What I meant is that if you had big datagram_unpart table indexed by captured_at with good index correlation, which given the name of the field would be expected, and populated the partitions one by one by doing range queries over datagram unpart you may have hit index scans on each population query and end up with perfectly sorted partitions, and that could help compressions. Given your query, unless the optimizer is performing some supernatural ( for me ) trickery my bet would be on a full table scan plus insertion which would left you with whatever order you had originally for each partition. But if your data, or your "data shape" has some natural correlation with captured timestamps ( i.e., I know that my CDR data has it ), puttig similar data together could have improved your compression ratios. Bear in mind I do not know it. Anyway, I would not worry too much about the backup ratios with the sizes you have, I would worry more on doing things like augmenting fill factors and similar things if your data processing is similar to mine ( data enters at time captured_at, sometimes is fixed due to errors but normally never moves after being in the tables for a couple of months, and when it does it is a special problems which can be manually tuned ). It will not improve the backup too much, but can improve other things. i.e., in my big CDR ( that means call detail record, telephony ) tables I let them insert into "normal" monthly partitions, but when they are some month olds I coalesce them into fully sorted, unindexed, 100% filled yearly partitions. Those partitions are then put into an "historic" schema which I do NOT backup automatically, I only do it after the ( manually fired ) coalescing and the ( manually done ) very ocasional long term fixups ( I rarely have to touch nothing older than a month ). Regards. Francisco Olarte.
Re: Increased size of database dump even though LESS consumed storage
Thorsten: On Wed, Feb 10, 2021 at 12:46 PM Thorsten Schöning wrote: UUppzz, completely missed the second message. > > CREATE INDEX idx_datagram_for_time_window ON datagram USING btree > > (src_meter, captured_at DESC); Depending on the query and data "shape", this may be chosen or not for queries. You should be the one more probable to know it. Francisco Olarte.
Re: getting tables list of other schema too
Atul: On Wed, Feb 24, 2021 at 11:24 AM Atul Kumar wrote:
> yes I know that, but my doubt is why \dt is showing tables of other
> schemas even I am setting the search_path.

AFAIK \dt lists "tables", not "tables in the schemas in the search path". The docs state "By default, only user-created objects are shown; supply a pattern or the S modifier to include system objects.", and this sys schema does not seem to be a system one.

Francisco Olarte.
Re: getting tables list of other schema too
On Wed, Feb 24, 2021 at 12:12 PM Francisco Olarte wrote:
> AFAIK \dt lists "tables", not "tables in the schemas in the search path".
> The docs state "By default, only user-created objects are shown; supply a
> pattern or the S modifier to include system objects.", and this sys schema
> does not seem to be a system one.

Zap that: after searching the docs for "schema" ( which I should have done before posting ) I found, in a paragraph far, far away, that \d* does use the search path.

Francisco Olarte.
Re: getting tables list of other schema too
Atul. Due to your top-posting style, and my not being a native English speaker, I'm unable to understand your question. As all the quotes at the bottom seemed to belong to me, I'm assuming you referred to some of my postings. As the last one said, I tried to point out a thing after checking some docs, then I noticed an error on my part and tried to tell everybody to ignore my previous post.

On Wed, Feb 24, 2021 at 3:16 PM Atul Kumar wrote:
> I am sorry but I am not clear from your response, as I have created
> another instance with same version 9.6 but there no system schema or
> its tables are visible.

Regarding these, I do not know what a "system schema" is; maybe you mean "public". Note that my post talked about "user created" vs "system" OBJECTS ( I've normally observed the latter are things like information schema views, pg_class and similar tables, which are always there after DB creation and are normally needed for the server to work ). Also, naming a schema "sys" does not make it a system schema.

Francisco Olarte
Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines
Hi Daniel: On Fri, 11 Mar 2022 at 19:38, Daniel Verite wrote:
> > These values are 'normal'. I'm not use to CSV, but I suppose
> > such newlines
> > must be encoded, perhaps as \n, since AFAIK CSV needs to be 1 line per row,
> > no?
> No, but such fields must be enclosed by double quotes, as documented
> in RFC 4180 https://datatracker.ietf.org/doc/html/rfc4180

CSV is really poisonous. In the Multiplan days, when it was nearly RFC 4180, it was tolerable, but these days, when everybody uses Excel to spit out "localized csv", it is hell ( in Spain it uses ; as the delimiter because it localizes numbers with , as the decimal separator; you may have similar problems ).

Anyway, I was going to point out that RFC 4180 is a bit misleading. In section 2, item 1, it states:

>>>
1. Each record is located on a separate line, delimited by a line break (CRLF). For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
<<<

which may lead you to believe you can read it by lines, but several lines after that, in item 6, it says:

>>>
6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
<<<

which somewhat contradicts item 1. In C/C++ it's easily parsed with a simple state machine reading char by char, which is one of the strong points of those languages, but reading lines as strings usually leads to complex logic.

Francisco Olarte.
Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines
Dominique: On Fri, 11 Mar 2022 at 21:13, Dominique Devienne wrote:
> But sure, if TEXT does the kind of pseudo-CSV I need, I'd change it to use it.

Text, the original format for copy, is much easier to manage than CSV. You can split the whole input on newlines to get records, split each record on tabs to get fields, then unescape each field. In C++ you can easily read it a char at a time and build along the way or, if you have a whole line, unescape it in place and build a vector pointing into the buffer. If you are testing, the split-on-newline / split-on-tab approach gives you a list of escaped strings easily compared against escaped patterns.

I've never had problems with it in decades, and in fact I use an extension of it ( with a \E code, similar to the \N trick for nulls, for 0-element lines; those are not useful in db dumps, but I need them, as "\n" decodes to {""} while I need a way to express {}, which I emit as "\\E\n". It is an old problem: "join by tabs, join by newlines" makes things "prettier" but can lead to no final newline and no way to express empty sets, while "terminate with tabs, terminate with newlines" leads to uglier / harder to read lines but can express them ).

Francisco Olarte.
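A minimal illustration of the text format's escaping, runnable in psql ( the values are arbitrary ):

COPY (SELECT E'one\ntwo' AS multi_line, NULL::text AS missing, 'plain' AS v) TO STDOUT;
-- emits a single record line, tab-separated, with the newline escaped and the null marked:
-- one\ntwo<TAB>\N<TAB>plain     ( <TAB> standing for an actual tab character )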
Re: Fresh eyeballs needed: input into error [FIXED]
Rich: On Wed, 27 Apr 2022 at 14:38, Rich Shepard wrote:
> On Tue, 26 Apr 2022, Bruce Momjian wrote:
> > I am curious what OS psql was using that was fixed by a re-login?
> This desktop's running Slackware64-14.2.

Were you using cut&paste / selection or something similar? Lately I've been getting weird errors on paste operations due to bracketed paste, which are puzzling. It seems to have interactions with readline and other stuff, depending on where the cursor is and some other things. Last time I remember, when pasting some things into guile, invisible chars ( the paste brackets ) were intercepted by its REPL, leading to bizarre errors. And sometimes after a relogin things are done a bit differently and the problem disappears.

Francisco Olarte.
Re: External psql editor
Hi Rich: On Fri, 29 Apr 2022 at 17:55, Rich Shepard wrote:
> I do all my postgres work using the psql shell. Editing a command reguires
> moving character-by-character and I'd like to use my small text editor (joe)
> because it allows more control over line movement.

I do a similar thing, but I normally edit queries in an editor window and just use selection or the clipboard to paste them into the xterm where I have psql running. I also used joe a lot ( its key sequences were easy coming from WordStar (CP/M -> MSDOS) ).

> A web search found a stackexchange thread that suggested adding to
> ~/.bash_profile the line:
> export PSQL_EDITOR=/usr/bin/joe
> so I did this yesterday.
>
> Today I've learned that the keyboard chords I use in joe in other
> applications aren't working here. For example, C-w should delete the word to
> the right of the point (cursor location). It doesn't. Instead, it deletes
> from the cursor postion to the head of the line. C-x doesn't move the cursor
> one word to the right, but cancels the command.

What do you mean by "here"? IIRC PSQL_EDITOR sets the editor for \e, not for the psql command line. For that you could try writing a readline binding ( which you could also use in bash if you like the keys ) with the joe keyseqs, but I fear its editing model is a bit different. I've done "PSQL_EDITOR=joe psql" to refresh my memory and it is in fact as I remembered.

> Is there a way for me to specify use of joe at the psql command line? (I'd
> use emacs but that's gross overkill.)

Using joe AS THE EDITOR for a single command can be done with the above method. Using joe keys for editing the normal psql line would probably require readline wizardry. I've read joe has slave shell sessions; another thing you could try ( I've done it with emacs shell mode ) is using that, but I fear it only works well with single-line queries. Or look into whether it has some kind of SQL mode ( interactive SQL, not sql-script syntax highlighting ).

Francisco Olarte.
Re: Backing up a DB excluding certain tables
Jorge: On Wed, 4 May 2022 at 18:12, JORGE MALDONADO wrote: > I just found the solution (or 1 solution). > It seems that the problem resides in tables with names containing characters > other than lowercase letters. > I want to exclude the following tables from the backup: > AspNetRoleClaims > AspNetRoles > AspNetUserClaims > AspNetUserLogins > AspNetUserRoles > AspNetUserTokens > AspNetUsers > __EFMigrationsHistory > > One pg_dump command that worked correctly is as follows: > > pg_dump -f c:/temp/respaldo.backup -n riopoderoso -F p -h localhost -p 5433 > -U postgres -W -s -T *.?sp?et* -T *.*igrations?istory riopoderoso > > As you can see, uppercase letters and underscore characters were avoided in > both -T arguments. In this way, the dumped file was generated successfully > without the files listed above. > It was not even necessary to issue the command cmd.exe /c chcp 1252 in the > command prompt before running pg_dump. > I also tried different combinations of single quotes and double quotes but > none worked. > This behavior of characters other than lowercase letters in table names is > present no matter if the database is originally created with UTF8 encoding. > The problem persists in such a case too. When I initially saw your question talking about chcp and similar stuff I skipped it, not having used windows for more than a decade. The problem you are seeing may be due more to your shell/OS combo than to other things. In unix, where psql is easier to work with, to execute a program the OS passes it the arguments vector properly separated. The different shells are responsible for building these arguments, processing wildcards and other things, and have very well documented quoting rules to let the user generate exactly what he wants passed to the programs. This is why in unix we have to quote * whenever it appears in a table name and similar stuff. IIRC in MSDOS mode, which Windows inherited, the shell does some substitution on the command line and then passes it whole to the process, which, if it is a C program, is then responsible for reparsing it, reprocessing quotes, expanding wildcards and calling main. Quoting is poorly documented and processing may vary for each program using different CRTs ( that is C runtime, not cathode ray tube ). Now for the thing. I'm going to use braces for quoting... If you need to send a table name with uppercase letters to postgres-related programs you normally need to ensure it receives it in an argument with double quotes, i.e. {"AspNetRoles"}. In unix this is easy; in the bash shell I use, the easy way is to surround the double quotes with single quotes, {'"AspNetRoles"'} ( you may need a non-proportional font to read that ), or just escape the quotes {\"AspNetRoles\"}, or several other variations. But in windows the quoting rules are difficult to master, and I'm not sure you can do that easily ( back in the day I had a program which dumped the arguments it received to check what the shell was doing to my command line ). At the end of https://www.postgresql.org/docs/14/app-pgdump.html there is a sample of double quoting, but I do not know if it is for a windows shell. It would work with unix shells, but it is a little convoluted, so maybe it is the thing to try ( try it with a schema-only dump redirected to /dev/null ( I do not remember its windows name, I think it was NUL: )).
And I remember pg_dump had a catalog mode, but it seems to have disappeared in recent versions ( you gave it a switch, it wrote a commented list of IDs which you could edit ( normally avoiding reorders ) and feed back with another switch to make partial backups, it was really useful for selective backups of complex stuff ). Francisco Olarte.
Re: Backing up a DB excluding certain tables
Replying to self... On Wed, 4 May 2022 at 19:19, Francisco Olarte wrote: > And I remember pg_dump had a catalog mode, but it seems to have > disappeared in recent versions ( you gave it a switch, it wrote a > commented list > of IDs which you could edit ( normally avoiding reorders ) and feed > back with another switch to make partial backups, it was really useful > for selective > backups of complex stuff ). Got it wrong: after consulting some old notes it is pg_restore which has the -l/-L switches. What we did was full-dump a db for upgrading, write the TOC, edit the TOC to restore only the schema of some tables, restore it faster, go live, then use the rest of the TOC to restore the missing tables ( they were huge logs, it just affected reports which could wait ). Francisco Olarte.
Re: Psycopg3 fails to resolve 'timezone localtime' on MacOS
Jerry: On Fri, 6 May 2022 at 03:04, Jerry Sievers wrote: > Has anyone run into This? > Psycopg3 fails to resolve timezone localtime on MacOS (Catalina). > > It falls back to UTC regardless of whether running with/without the > tzdata package which I did try. > > There is a /etc/localtime symlink on this box pointed at the correct > file in the tz data directory, which apparently doesn't get used. > > Adding a temporary symlink directly into the tz data directory got > things working but I'm skeptical of such a hack. I'm not using Mac, but Linux, and not python, but I think you have some concepts mixed. In Linux, and it seems Mac is pretty similar, to define a timezone you put something readable in tzfile(5) format ( i.e. a file or a symlink to a file, as you did ) in /usr/share/zoneinfo ( this is the "tzdata directory" ). Then, to set the system default timezone you copy ( or link ) a tzfile to /etc/localtime. Note this DOES NOT define a named timezone that you can use. And there is NOT a timezone named "localtime". Normally you use a symbolic link from /etc/localtime to the timezone directory, which has the nice side effect of letting you read the link to know the name, but it does not have to be. I think this is leading you into thinking "localtime" is a valid name, which it is not, or was not until you defined it by creating the name with the temporary symlink. If you want to have a zone named "localtime" pointing to the default zone, adding a symlink in the tzdata dir named "localtime" ( directory entries are what define the names ) pointing to /etc/localtime will do it, but I would not recommend it. Your errors are pretty strange. Are you exporting TZ, or whichever env var MacOS uses, or something similar set to "localtime" in any of the relevant environments ( i.e., the server or client process ), or setting the server timezone to "localtime"? Because normally when you want "localtime" what you want is the process's system default timezone, so you do not set anything, $TZ included, and let the library fall back to the system default timezone ( unnamed ). This seems like some configuration problem. Francisco Olarte.
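A quick way to check, from the server side, whether a zone name actually exists; just an illustrative query, and 'localtime' will normally not be there unless such a symlink has been created:

SELECT count(*) FROM pg_timezone_names WHERE name = 'localtime';        -- usually 0
SELECT count(*) FROM pg_timezone_names WHERE name = 'America/Chicago';  -- 1
SHOW timezone;   -- the default the session actually resolved to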
Re: generated column cast from timestamptz to timestamp not OK.
On Fri, 13 May 2022 at 12:47, alias wrote: > CREATE TABLE test_g ( > a timestamptz, > b timestamp GENERATED ALWAYS AS (a::timestamp) STORED > ); > then an error occurred. >> ERROR: 42P17: generation expression is not immutable A cast to timestamp uses the current session time zone, and the current session time zone is not immutable. Try forcing the time zone ( a at time zone $whatever ) ( or use a view if you need it ). > However the following 2 commands is ok. ... >> b bigint GENERATED ALWAYS AS (a::bigint) STORED ... >> b text GENERATED ALWAYS AS (a::text) STORED Because conversion from int to bigint is immutable, just extend the sign bit, and from varchar to text too, they are the same. > I didn't get it. timestamptz changes then timestamp also changes. timestamp > is part of timestamptz... I think you are falling into the common misconception that a timestamp is something like "MMDDhhmmss" and a timestamptz is the same plus a time zone. They are not ( look at the docs, they both have the same size ). A timestamptz designates a point in time, it is like a real number, which is printed ( and read by default ) in the timezone of the user session. It DOES NOT STORE A TIME ZONE. Its external ( text ) representation varies with the session parameters. Use set timezone and play a bit to see it. A timestamp is more or less the same, but is always printed/read as if it were in the UTC timezone, and the time zone is not printed ( as it is always the same ). But inside it is also just a real number. It is just a compact and convenient way of storing "MMDDhhmmss": calculate the time point of that value in UTC and store it. To convert between them in an immutable way you need to specify how, and use the at time zone operator. Try it in a command line. FOS
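A minimal sketch of the suggested fix, pinning the zone so the expression becomes immutable ( UTC is just an example choice ):

CREATE TABLE test_g (
    a timestamptz,
    b timestamp GENERATED ALWAYS AS (a AT TIME ZONE 'UTC') STORED
);
-- and to see why the bare cast cannot be immutable:
-- SET timezone TO 'UTC';        SELECT now()::timestamp;
-- SET timezone TO 'Asia/Tokyo'; SELECT now()::timestamp;  -- same instant, different result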
Re: How to get response message
On Fri, 10 Jun 2022 at 18:38, Rama Krishnan wrote: > > Hi All, > > I am want to delete old records using function so my senior has function like > below but I want to get response of this particular inside query wheter it is > successful or failure > How to get response of the function status > Drop table test_old; > Create table test_old as select * from sales where bill_date '1 year'; > Delete table sales where sales_id in (select sales_id from test_old; I do a similar thing routinely and use a "move" ( insert into archive delete from live where yadayada returning whatever ). I suppose you could do a similar trick. drop table test_old; -- Beware of this, it makes your function dangerous, if you execute it twice you lose data. create table test_old as delete from sales where bill_date
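Spelled out, the "move" idiom above is a data-modifying CTE; a sketch only, assuming an already-created sales_archive table and a one-year cutoff:

WITH moved AS (
    DELETE FROM sales
    WHERE bill_date < now() - interval '1 year'
    RETURNING *
)
INSERT INTO sales_archive
SELECT * FROM moved;
-- the INSERT's row count tells you how many rows were archived,
-- and the whole move succeeds or fails as one statement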
Re: ADD COLUMN ts tsvector GENERATED too slow
On Wed, 6 Jul 2022 at 11:55, Florents Tselai wrote: > Also, fwiw looking at top the CPU% and MEM% activity, looks like it does data > crunching work. ... > >> On 06.07.22 10:42, Florents Tselai wrote: > >>> I have a beefy server (40+ worker processes , 40GB+ shared buffers) and a > >>> table holding (key text, text text,) of around 50M rows. > >>> These are text fields extracted from 4-5 page pdfs each. How big is your table? From your query it seems you expect values longer than 1M-1 ( the left(...) ), but if you have very big text columns it may be spending a lot of time fully decompressing / reading them ( I'm not sure whether left(..) on toasted values is optimized to stop after reading enough ). Also, it has to rewrite a lot of data to add the column; if it takes some ms per row, which I would not discard, 50M rows * 1 ms/row = 50k secs ~= 13.9 hours for each ms per row, so at 2 ms ( which may be right for reading a big row, calculating the vector and writing an even bigger row ) it would take more than a day to finish, which I would not discard given you are asking for a heavy thing. If you have stopped it I would try doing a 1000-row sample in a copied table to get a speed idea. Otherwise, with this query, I would normally monitor disk usage of the disk files as an indication of progress; I'm not sure there is another thing you could look at without disturbing it. FWIW, I would consider high mem usage normal in this kind of query; high cpu would depend on what you call high, but it wouldn't surprise me if it has at least one cpu running at full speed detoasting and building vectors, and I do not know if alter table can go parallel. Francisco Olarte.
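The 1000-row sample test could look like this; a sketch only, with the source table, column names and text-search configuration all assumed:

CREATE TABLE ts_sample AS
    SELECT key, text FROM big_texts LIMIT 1000;
\timing on
ALTER TABLE ts_sample
    ADD COLUMN ts tsvector
    GENERATED ALWAYS AS (to_tsvector('english', left(text, 1048575))) STORED;
-- scale the elapsed time by (50M / 1000) for a rough total estimate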
Re: ADD COLUMN ts tsvector GENERATED too slow
Hi Florents: On Wed, 6 Jul 2022 at 12:35, Florents Tselai wrote: > 50M+ rows and iirc pg_relation_size was north of 80GB or so. Somebody already mentioned pg_table_size, but anyway this gives you 1.6Kb per row, which I would suspect is extremely low given your pdf content and the 1M truncation you preventively applied. When working with these kinds of sizes it really pays to do some good ole "back of the envelope" calculations and/or some tests with small batches. Bear in mind that if you have, say, 10k per row ( which I would label as really low for pdf content ) you are going to have 500Gb of data; if you manage to process this at 100Mb per second you will need more than an hour ( and I think I'm estimating really low ). FOS
Re: postgresql generate ddl returns FK with `()` in it
On Fri, 8 Jul 2022 at 12:37, Roman Gavrilov wrote: > I have a schema with two tables, where the first table's composite 2-columns > FK points to the second table's 2-columns composite PK. But the order of > columns is reversed. Note sql does not look at the NAMES of the foreign key columns or the primary key columns, it matches them by position. Also note you will have a higher success probability in this kind of question if you provide the table definitions postgres is using; pg_dump with the -s, --schema-only ( dump only the schema, no data ) and -t, --table=PATTERN ( dump the specified table(s) only ) options is a simple way to get those. You have not even cited which program has this "generate ddl" command. > When I use `Generate DDL` on the first table, FK definition comes back as: > ALTER TABLE eval.rubric_questions ADD CONSTRAINT rubric_questions_fk FOREIGN > KEY (rubric_id,rubric_version_id) REFERENCES (); > How can I fix that `();` issue? Is this documented behavior? > (I realize that I need to fix the schema, but in time-being I need to > generate code using entity framework, and it chokes in that). Ah, you expect people to hunt your link to SO: > Full details on SO: > https://stackoverflow.com/questions/72898296/what-is-in-postgresql-ddl This is frowned upon and lowers your success rate; AAMOF I've only replied to this because I'm waiting for someone to get out to lunch and burning some minutes, normally I would have just bitbucketed your mail. From the SO post one can see you are using DBeaver, if I copied it right. This seems to be a bug in it, not in postgres. You can have a pk on columns (a,b,c,d) and have an fk on columns (w,x,y,z), in fact it is common, like in the classical tree (pk=id, fk=parent_id references same table). But anyway, be careful: postgres does not match columns in keys by name, but by position, and from the little detail you provided in SO it appears that aside from triggering a bug in DBeaver your schema is totally wrong and will malfunction. Post more details and someone may be able to help you. Regards. Francisco Olarte.
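To illustrate the by-position matching ( a made-up pair of tables, nothing here comes from the poster's schema ):

CREATE TABLE parent (a int, b int, PRIMARY KEY (b, a));
CREATE TABLE child (
    x int,
    y int,
    FOREIGN KEY (y, x) REFERENCES parent (b, a)  -- y pairs with b and x with a; names never matter
);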
Re: Error when pasting function blocks into psql
Hi Ludwig: On Fri, 8 Jul 2022 at 12:37, Ludwig Isaac Lim wrote: > I noticed that I'm getting errors when copy and pasting code for stored > procedure from Windows directly into psql running in putty. > To reproduce create a procedure that has multiple statements with each > statements having multiple lines, similar to the ones below: ... > Copy and paste from Windows editor (or IDE) to a psql running in a Putty. > Chances are it will complain of syntax error. The error doesn't appear when I > copy and paste it to an empty file in vim, and the run: > psql -f > Is there a way of prevent this kind of error when pasting directly into psql? > I'm using the latest version of PostgreSQL (server and client. Version 14.4) You omitted the OS version which is running psql/vim; I assume it is some kind of *ix, probably linux, and which kind of paste you are using in putty, IIRC it has a couple. Probably you are sending extra spaces or \015 or something like that which vim filters for you on save. You can probably confirm this by pasting your lines into "cat > file" ( remember to EOF after the paste, enter+^D is the usual way ) and trying to feed that file to psql. Not having used windows or vim in 20 years I can only think of one thing to help you with this kind of problem. You'll probably have od (octal dump) installed. Try pasting the code to 'od -tx1 -tc', which will give you a nice dump from which you can see if your IDE/EDITOR/putty is playing nasty tricks ( if you tried pasting into cat you can run od on that file and on the one saved from vim and compare ). Francisco Olarte.
Re: - operator overloading not giving expected result
Hi Rajesh: On Fri, 8 Jul 2022 at 12:36, Rajesh S wrote: > We are migrating our database from Oracle to Postgresql. In oracle we have > used this syntax "SELECT ('1999-12-30'::DATE) - ('1999-12-11'::DATE)" to get > difference between two dates as a integer output (ex: 19). But in Postgres > the same query returns result as "19 days". Because of this we are getting > errors while assigning this query output to a numeric variable saying "ERROR: > invalid input syntax for type numeric: "1825 days"" and "ERROR: operator does > not exist: interval + integer". To avoid changing the application code in > many places to extract the number of days alone, we tried operator > overloading concept as below. Are you posting the exact code? Because the current doc states: date - date → integer Subtract dates, producing the number of days elapsed date '2001-10-01' - date '2001-09-28' → 3 I remember it always working that way, and the 8.0 docs document it that way too ( and I've used a lot of date arithmetic even in 7.x ). Maybe your EXACT code is not what you have pasted, but rather you are inserting dates into timestamp columns and then subtracting those columns? Because even your operations are defined in terms of timestamptz, not dates. Francisco Olarte.
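The distinction is easy to check from psql ( literals copied from the example above ):

SELECT date '1999-12-30' - date '1999-12-11';            -- 19, an integer
SELECT timestamp '1999-12-30' - timestamp '1999-12-11';  -- '19 days', an interval
-- so a "19 days" result strongly suggests the operands are timestamps, not dates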
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
On Tue, 19 Jul 2022 at 18:50, Sebastien Flaesch wrote: >> If that's the behavior you want, you can build it out of standard SQL >> facilities (e.g. update a one-row table). > Can you elaborate please? > Do you mean the code should use an UPDATE on a one-row table to acquire a > lock? That's the usual SQL way. You can, using the appropriate isolation level, select the max() from the table using the IDs ( but this will probably lock it whole ), you can use a one-row table with a current int column as a sequence, and set current=current+1, but this will lock everyone inserting ( and needs an extra table ). You can use a name-current unique sequences table and update that ( it saves tables but it can worsen locking, as many engines lock whole pages ). The consecutive gapless sequence problem surfaces a lot, but there is no good way to solve it without locking, as once you get the value everybody else needing it has to wait until you decide whether to commit or rollback. Sequences are for when you need autogenerated keys. They use consecutive integer blocks nearly everywhere because they are easy to code, play well with btree indexes and correlate well with insertion times, but they give up the "no hole" approach to improve performance. They can even return values unordered in different sessions ( sequences acquire, and burn, blocks in a locking way; when you need high concurrency you make them return big blocks, but then values from different sessions can be delivered out of order ). If you need correlative numbers, bite the bullet and code it; trying to do it with sequences will probably end with an approach which fails in some corner cases and is slower than just coding it with a single-row table ( just remember to check it, so you do not have the fun which ensued in oracle when someone put two rows in dual in the old days ). All the queries you have posted are probably more complex than a couple of procedures to update and read a table, even using eval to simulate named sequences if you want ( just remember to always pre-increment before reading, there is a reason everybody does it that way ). As a remark, in Spain bill numbers need to be gapless increasing. I have done it with a sequence ( with cache 1, so I doubt it is much more performant than a single-row table, as it has a backing element which needs to be saved ), and just used another process which swept the values and filled the gaps a posteriori ( there is a way to fill a gap, it has legal uses, and you can use it for that as long as you do not have too many: basically insert cancelled bills in the gaps ). I would probably do it with the one-row table now for simplicity, I doubt I gained enough performance to justify the 20-something perl lines plus the crontab entry needed to use sequences. As my beard turns grey I appreciate the simpler ways of doing things more. Francisco Olarte.
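A minimal sketch of the one-row-table approach described above ( table and column names invented ):

CREATE TABLE bill_counter (current bigint NOT NULL);
INSERT INTO bill_counter VALUES (0);

-- in each transaction that needs a number: pre-increment, then use the value
UPDATE bill_counter SET current = current + 1 RETURNING current;
-- the row lock taken here makes every other caller wait until this
-- transaction commits or rolls back, which is exactly what gapless requires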
Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
Hi Gavan. On Wed, 20 Jul 2022 at 00:10, Gavan Schneider wrote: > On 20 Jul 2022, at 4:08, Francisco Olarte wrote: > As a remark, in Spain bill numbers need to be gapless increasing. I > have done it with > > One answer to this problem has been around for a while, and my version is > shown below. If I read it correctly, leaving the 700 stuff aside, this function gives you the first free transaction_ref. This is not valid for my problem, I need them increasing, 1-3-2 is not a valid sequence. The trick I use is that I can have "dummy" records, which do not have real data ( or dates, which is what, among other things, makes me need them increasing ), so I generate 1-3-4 and then insert 2 in a batch with values adequate for legal purposes ( i.e., I can just use 1.date for 2.date ). I think what you do is generate account numbers which should be gapless in the medium term and should be increasing as needed, where the gapless is more important to you than the increasing. > No extra table is needed, just a windowed look at the table where the > reference is used. It is normally never needed, they are used solely for performance and to avoid locking; a current_transaction_seq table would be just that kind of optimization. > My usage is at human speed so performance is not an issue but it should be > pretty good if there are minimal holes. What is not addressed is that a > reference can be reissued upto the time the calling process commits an entry > in the table and takes that reference out of circulation. There are different > approaches to handling/preventing such collisions. Your construct seems pretty expensive, and I'm not sure how much locking it does at serializable. Also, it needs recovery ( the single-row table plus back-fill does not, if you consider the back-filling "business as usual" ). Also the reissuing of a number is a no-go in my automated systems, it would need extensive recovery; in that case I can use a gap-less approach by simply selecting max(id) ( and given it is indexed, using a select for update on the max record by ordering desc and limit 1; if I'm not too confused this would give no gaps, increasing, at the expense of reduced concurrency ). ... > $COMMENT$ > Determines the next available reference number, making sure to fill any holes. No going back allowed in my case. Simplifying it, a bill may reference an older one in its data, and its number MUST be less. The gap problem is there MUST be NO gaps WHEN I "print" them. And I can use dummies, but I SHOULD not use them. In practice, you rarely use more than a couple of dummies a year. I could strictly print dummies when I need them, but they MUST be always the same, and the easiest way is to insert them. Francisco Olarte.
Re: How does postgres sort large strings?
On Fri, 22 Jul 2022 at 16:46, Sergey Burladyan wrote: > I thought that the sorting values are stored entirely in work_mem, but in > fact it works somehow differently. > Can anyone suggest how this works? In the classic go-by-chunks way? To sort values you need to compare them, and to compare strings you do not need the whole string: i.e. if you have two 1000-byte strings, one is 500A,500B, the other is 1000A, to compare them ( using C locale, others can be done in a similar way ) you can read 10 bytes from each and compare; if they are the same, read 10 more, if they are not you are done, if you hit the end of both strings they are equal, and if you hit the end of one ( the shorter ), that one goes first. You can even do it a character at a time. In the example, after looping 50 times on 10A you hit 10B vs 10A, the second string goes first, and you do not even need to look at the rest. A char at a time will end on the 501st char. And probably PG can compare the strings in the shared buffers, so it only needs some housekeeping information in work_mem, relying on its infrastructure to bring the contents into shared buffers. I do not think you are estimating memory usage right. Francisco Olarte. > > For example, I created this 512MB incompressible file and test table: > > tr -dcs '[:print:]' '[:print:]' < /dev/urandom | tr -d '"'\' | dd bs=1K > count=512K of=asciidump > > create unlogged table t1 (v text); > insert into t1 select pg_read_file('asciidump') from generate_series(1, 10); > > select pg_column_size(v), octet_length(v) from t1 limit 1; > pg_column_size | octet_length > +-- > 536870912 |536870912 > > set work_mem to '64MB'; > > Now I think that 64MB is not enough to sort such large values and postgres > will use temp files, > but in fact it does not. > > select temp_files, temp_bytes from pg_stat_database where datname = > current_catalog; > temp_files | temp_bytes > + > 0 | 0 > > explain (analyze,verbose,buffers) select v from t1 order by v; > QUERY PLAN > --- > Sort (cost=94.38..97.78 rows=1360 width=32) (actual time=6433.138..6433.140 > rows=10 loops=1) >Output: v >Sort Key: t1.v >Sort Method: quicksort Memory: 25kB >Buffers: shared hit=543881 read=679794 written=118012 >-> Seq Scan on public.t1 (cost=0.00..23.60 rows=1360 width=32) (actual > time=0.007..0.009 rows=10 loops=1) > Output: v > Buffers: shared hit=1 > Planning Time: 0.035 ms > Execution Time: 6433.155 ms > > > Sort Method: quicksort Memory: 25kB > > select temp_files, temp_bytes from pg_stat_database where datname = > current_catalog; > temp_files | temp_bytes > + > 0 | 0 > > WOW! How does it work?! :-) > > -- > Sergey Burladyan
Re: Password reset link / 'less' does not exit in psql version 13.4
Michael: On Mon, 25 Jul 2022 at 12:01, Michael J. Baars wrote: > 2) I have three psql clients running, a version 12.6, a version 13.4 and a > version 14.3. Until now a 'select * from table;' showed the output in 'less' > or something alike and exited from 'less' when > the output was complete. Both version 12.6 and version 13.4 work that way. > Version 14.3 does not exit from 'less' when the output is complete. Did > anyone notice this already? AFAIK psql just pipes and waits ( and I've piped through some programs of mine which would normally show bizarre behaviour if psql did otherwise ). This seems like you tuned the pager command or the default less options; from man less: >>> -e or --quit-at-eof Causes less to automatically exit the second time it reaches end-of-file. By default, the only way to exit less is via the "q" command. -E or --QUIT-AT-EOF Causes less to automatically exit the first time it reaches end-of-file. <<< This can come from a variety of sources, see man less, search for lesskey, ... Check also PAGER and PSQL_PAGER, and .psqlrc and friends, you may have some things there. Francisco Olarte. > Best regards, > Mischa Baars.
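A quick way to see which pager settings psql is picking up, and to override them for a session ( the plain 'less' value is only an example ):

-- from the shell, before starting psql:
--   echo "$PAGER $PSQL_PAGER $LESS"    # look for -e / -E / --quit-at-eof here
-- or, from inside psql:
\setenv PSQL_PAGER less
\pset pager on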
Re: Purpose of DEFERRABLE _and_ INITIALLY DEFERRED foreign key constraint checking?
Ron: On Thu, 4 Aug 2022 at 02:30, Ron wrote: > DEFERRABLE > NOT DEFERRABLE > This controls whether the constraint can be deferred. A constraint that is > not deferrable will be checked immediately after every command. Checking of > constraints that are deferrable can be postponed until the end of the > transaction [/quote] > But yet a DEFERRABLE FK constraint in a transaction immediately failed on a > FK constraint violation. Because, as the name hints, it is DEFERRABLE, not DEFERRED. > INITIALLY IMMEDIATE > INITIALLY DEFERRED > If a constraint is deferrable, this clause specifies the default time to > check the constraint. If the constraint is INITIALLY IMMEDIATE, it is checked > after each statement. This is the default. If the constraint is INITIALLY > DEFERRED, it is checked only at the end of the transaction. > INITIALLY DEFERRED solved my problem. Why do both clauses exist? Note DEFAULT TIME. Both clauses exist because you can change that with SET CONSTRAINTS. Not knowing your case I cannot comment on it, but one use case I've found is when your system NORMALLY holds constraints valid across statements, so you have DEFERRABLE INITIALLY IMMEDIATE to check it and fail early and loudly on any bug. But you have some SPECIAL_OP (tm), which only holds constraints at transaction end, because it does some carefully controlled manipulations. You do this one with SET CONSTRAINTS DEFERRED ( and triple testing, as you have weakened your safety net ). You can even toggle it according to the docs, i.e. toggle a constraint to deferred, do several statements which end up in a valid state, then toggle to immediate to check it is really valid. It's all in the set constraints docs. > (A naive interpretation just by looking at the clause words led me to think > that INITIALLY DEFERRED would not check record validity when a constraint is > added to a table, but obviously that's wrong too.) Not that obvious, but true. In the effort to make it read nicely, like natural languages, SQL is difficult to interpret precisely, like natural languages. FOS.
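A small sketch of that toggle ( table names and the FK are invented ):

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child (
    parent_id int REFERENCES parent
        DEFERRABLE INITIALLY IMMEDIATE   -- checked per statement by default
);

BEGIN;
SET CONSTRAINTS ALL DEFERRED;     -- only possible because the FK is DEFERRABLE
INSERT INTO child VALUES (1);     -- no error yet, the parent row comes later
INSERT INTO parent VALUES (1);
SET CONSTRAINTS ALL IMMEDIATE;    -- optional early check before committing
COMMIT;                           -- a still-deferred check would run here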
Re: Behavior of identity columns
On Thu, 4 Aug 2022 at 10:52, sivapostg...@yahoo.com wrote: > I see 'identity' column values increment even when some error(s) occurs while > inserting data. Is that a known bug or known behavior? Known behaviour, explained thousands of times over the years. Identity is not a way to generate consecutive values, nor even guaranteed ascending values, it is for generating unique values. The implementation burns the values before commit/rollback for better concurrency, search the archives or the docs for details. In your example, an id is allocated and burnt to generate the complete failing row; when the row fails the id is gone. Search the lists, google, the docs, it's been explained ad nauseam. Francisco Olarte.
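The behaviour is easy to reproduce ( a minimal demo with an invented table ):

CREATE TABLE t (id int GENERATED ALWAYS AS IDENTITY, v text NOT NULL);
INSERT INTO t (v) VALUES ('a');    -- gets id 1
INSERT INTO t (v) VALUES (NULL);   -- fails, but id 2 has already been burnt
INSERT INTO t (v) VALUES ('b');    -- gets id 3
SELECT id, v FROM t;               -- 1 and 3: unique and ascending, not gapless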
Re: Creating constraint dynamically
On Mon, 22 Aug 2022 at 09:29, sivapostg...@yahoo.com wrote: > We populate constraint string dynamically and add it to the table with alter > table command. It gets added, but without the required brackets. Brackets are NOT required in your example. You are doing a classic form, OR of ANDs, sometimes called sum of products. AND has greater precedence than OR ( amongst other things because this form of condition is used a lot ), as has been pointed out, so brackets ( round brackets are usually called parentheses, abbreviated to parens; in programming "brackets" is normally not used for those ) are not necessary. What the system is doing is parsing and normalizing your condition ( note how it changes bpchar to text and adds types in some places ) and reconstructing the condition, without brackets because they are not required. > Since there are AND and OR conditions, without brackets the whole conditions > becomes useless. Before stating these things, test: try to insert a condition-violating row to see if the condition is really useless, it is very rare to find a bug ( changing the semantics of the condition would be one ) like this. > How to create a constraint like the above one, with braces in tact ? Or any > other way that we can implement a check constraint as above? The condition IS correctly implemented, but nothing guarantees the text returned by pgadmin will be the exact same condition you sent. In fact, AFAIK, nothing guarantees you can recover a condition set on a column. It is transformed to an equivalent. I'm not sure how exactly it is done, but it probably also does whitespace normalization and constant folding. Francisco Olarte.
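A quick way to convince yourself the stored constraint keeps its meaning ( columns and values invented for the sketch ):

CREATE TABLE chk_demo (a text, b int, c text, d int);
ALTER TABLE chk_demo ADD CONSTRAINT chk_demo_check
    CHECK ((a = 'X' AND b > 0) OR (c = 'Y' AND d > 0));
-- \d chk_demo may show the expression without the parentheses, because
-- AND already binds tighter than OR; the semantics are unchanged:
INSERT INTO chk_demo VALUES ('X', 0, 'Z', 5);  -- rejected
INSERT INTO chk_demo VALUES ('X', 1, 'Z', 0);  -- accepted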
Re: Resolving host to IP address
On Mon, 12 Sept 2022 at 14:23, Sebastien Flaesch wrote: > I managed to establish the secure connection, by using > DN=root.strasbourg.4js.com for the self-signed root CA, and > DN=toro.strasbourg.4js.com for the server certificate, DN=pgsuser for the > client certificate. > I have created my client certificate by using the root CA. > pg_hba.conf: > hostssl all pgsuser toro.strasbourg.4js.com md5 > clientcert=verify-ca > > Server and client are on the same Debian 11 machine. > It works, if I comment out the /etc/hosts line set by Debian Linux for my > host name: > # 127.0.1.1 toro.strasbourg.4js.com toro > The name "toro" is then resolved to the IP address provided by my DHCP server: > root@toro:~# host toro > toro.strasbourg.4js.com has address 10.0.40.61 > > root@toro:~# host toro.strasbourg.4js.com > toro.strasbourg.4js.com has address 10.0.40.61 > > However, if I put the 127.0.1.1 line back in /etc/hosts, re-create all > certificates (is this actually needed? I guess no), restart the PostgreSQL > server, I get this error: > $ psql > 'postgresql://toro.strasbourg.4js.com:5437/test1?user=pgsuser&sslmode=verify-ca&sslrootcert=./root.crt&sslcert=./client.crt&sslkey=./client.key' > psql: error: connection to server at "toro.strasbourg.4js.com" (127.0.1.1), > port 5437 failed: FATAL: no pg_hba.conf entry for host "127.0.0.1", user > "pgsuser", database "test1", SSL encryption > What looks strange to me in this error is that first it mentions 127.0.1.1 > (ok) but then, 127.0.0.1 ( Not having your full data, guessing a bit on typical configs here. ) Your loopback interface, "lo", which is used to connect to net 127.*, probably has the address localhost=127.0.0.1. Postgres is probably binding to the wildcard address. So when you tell psql to connect to 127.0.1.1 it starts the tcp connection selecting the interface address as source, per the route table, so your connection is source=127.0.0.1, destination=127.0.1.1. The error message up to the "failed:" is psql telling you where it sent the connection, to toro=127.0.1.1. The second part is the server telling you where it sees the connection coming from. > What am I missing here? Probably some tcp tracing to see it in action. If you bind to *:5437 you can receive connections on any 127.* address. Your hosts file uses this trick for its own reasons. When you zap the hosts line everything works well because your interface is probably 10.0.40.61, so the route table says use 10.0.40.61 as source. You would probably experience the same problem if you added a second IP address, say 1.2.3.4, to your interface and told dhcp to resolve toro to it. With localhost you do not have to do anything because any 127 address can be used as source or connected to in loopback, it is magic. Also, if you want it to work you would need a second hostssl line listing localhost as the source address, or, IIRC, you can try to force the source address for connections to be toro using some deep magic jujus, as psql does not seem to support setting it. Francisco Olarte.
Re: Resolving host to IP address
A little off topic but may be useful for someone: On Mon, 12 Sept 2022 at 16:11, Tom Lane wrote: ... > Linux doesn't seem to show its routing of the loopback domain in netstat > output, but it's probably much like what macOS shows explicitly: Among other things it can be shown by "ip route show", although it is a little too verbose: $ ip route show table all default via 192.168.0.1 dev enp39s0 onlink 192.168.0.0/24 dev enp39s0 proto kernel scope link src 192.168.0.2 broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 broadcast 192.168.0.0 dev enp39s0 table local proto kernel scope link src 192.168.0.2 local 192.168.0.2 dev enp39s0 table local proto kernel scope host src 192.168.0.2 broadcast 192.168.0.255 dev enp39s0 table local proto kernel scope link src 192.168.0.2 ::1 dev lo proto kernel metric 256 pref medium fe80::/64 dev enp39s0 proto kernel metric 256 pref medium local ::1 dev lo table local proto kernel metric 0 pref medium local fe80::2d8:61ff:fe9f:267b dev enp39s0 table local proto kernel metric 0 pref medium multicast ff00::/8 dev enp39s0 table local proto kernel metric 256 pref medium ( table local trims it a bit ). For debugging these things my first go is to "ip route get", which gives the selected one: $ ip route get 127.1.2.3 local 127.1.2.3 dev lo src 127.0.0.1 uid 1000 cache Francisco Olarte.
Re: Resolving host to IP address
Hi Sebastien: On Mon, 12 Sept 2022 at 16:40, Sebastien Flaesch wrote: > I think I got it: PostgreSQL should listen to the real, non-loopback network > interface. Not really. Your problem is not where postgres listens, but where your host line resolves, the 127.0.1.1 stuff, plus your rules. In many OS you can bind a listening TCP socket to IN_ADDR_ANY plus a port. The good thing about doing it is that it will work even if you add / delete IP addresses. Postgres does this and, IMO, it is a good thing. You seem to have it configured that way... > Just for info (local dev config, not prod): > sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ grep listen_addresses postgresql.conf > listen_addresses = 'localhost,toro.strasbourg.4js.com' No, you do not, you list your interfaces. Unless you have a complex setup, which does not seem to be the case, I would recommend just using the default "*", especially if you intend to run your servers firewalled ( which 10.xxx hints at ). This would lead to something like: tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN / tcp6 0 0 :::5432 :::* LISTEN ( not done with pg, done with nc -l, but it worked the same last time I checked ). > sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ netstat -tl -W | grep 5437 > tcp0 0 localhost:5437 0.0.0.0:* LISTEN > tcp0 0 toro.strasbourg.4js.com:5437 0.0.0.0:* > LISTEN > tcp6 0 0 localhost:5437 [::]:* LISTEN There is a piece of info missing here: where does your localhost resolve to? > sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ netstat -tl -W --numeric-hosts | > grep 5437 > tcp0 0 127.0.0.1:5437 0.0.0.0:* LISTEN > tcp0 0 127.0.1.1:5437 0.0.0.0:* LISTEN > tcp6 0 0 ::1:5437:::*LISTEN But from this it seems it resolves, if you have restored the 127.0.1.1 hosts line, to 127.0.0.1 + ::1. > sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ ip route get 127.0.1.1 > local 127.0.1.1 dev lo src 127.0.0.1 uid 1000 > cache This is expected. IMO your problem arises from your pg_hba.conf, and your bizarre hosts line ( bizarre IMO; I use debian too and one of the first things I do after installing is dropping the 127.0.1.1 line, but I use a fixed IP on a desktop, so YMMV ). If you just list localhost as origin ( which means you have to duplicate the pg_hba.conf lines; listen_addresses='*' will work for both ) it will work right, whatever your listen_addresses is. You can also RTFM a bit and notice the line "You can also write all to match any IP address, samehost to match any of the server's own IP addresses, or samenet to match any address in any subnet that the server is directly connected to." ( it is easy to miss, I had to search for it, even if I knew it existed ). Note: localhost is a name, like toro, and is resolved; samehost/samenet/all are magic. Summarizing, I would: use listen_addresses='*' and use samehost in your rules, which will lead to a lean config that probably does what you want. Francisco Olarte.
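Putting the two recommendations together; a sketch only, with the user, database and auth options copied from the pg_hba.conf line quoted earlier:

# postgresql.conf
listen_addresses = '*'

# pg_hba.conf -- samehost matches any of the server's own addresses,
# so it covers 127.0.0.1, 127.0.1.1 and 10.0.40.61 alike
hostssl all pgsuser samehost md5 clientcert=verify-ca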
Re: Using CTID system column as a "temporary" primary key
On Wed, 29 Mar 2023 at 21:11, Sebastien Flaesch wrote: > Oh the use of default keyword is new to me, thanks for that. > But to make PostgreSQL more Informix-compatible, zero should have been > considered as well. NONONO please! Someone already pointed out that a sequence can generate zero, but even without that some of us may need to insert 0 ( i.e. because we deleted the row with the zero id and want to recreate it, or just because, even if the default-generating sequence does not spit zeroes, the PHB wants us to insert his employee record with ID=0, or other reasons ). AFAIK serial just creates an integer column with a default; the doc (8.1) says its range is from 1, but I'm not even sure this is enforced, and when I've looked at the description of a serial column I do not remember seeing anything more than the default, so you could probably insert negatives. I, and I suspect others, would prefer being able to insert any int to copying a hacky ( and I suspect non-standard ) trick from informix. Just write to informix and suggest they implement DEFAULT on inserts, it is much better ;-> . Being more informix-compatible may sound like a feature to you, to me it sounds like a misfeature, like needing DUAL to be more oracle-compatible. Francisco Olarte.
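What serial actually gives you, and why 0 has to stay insertable ( a small demo with an invented table ):

CREATE TABLE emp (id serial PRIMARY KEY, name text);
INSERT INTO emp (id, name) VALUES (DEFAULT, 'from the sequence');  -- id 1
INSERT INTO emp (id, name) VALUES (0, 'explicit zero');            -- perfectly legal today
INSERT INTO emp (name) VALUES ('also from the sequence');          -- id 2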
Re: Do BRIN indexes support MIN/MAX?
On Wed, 29 Mar 2023 at 22:07, Vladimir Sitnikov wrote: > > Is it correct that BRIN indexes don't support MIN/MAX operations ? > In theory, it should be possible to implement min/max scan support for BRIN, > however it is not implemented yet. > > Just in case, min/max query would require to read all BRIN pages, and then it > would require to read the corresponding pages in table. > For instance, imagine the table has N pages. Then BRIN would have N/128 pages > with the default pages_per_range=128, so your max(..) query would take N/128 > + 128 pages to read. In theory it would be sequential, however, under > concurrent load it might not be that sequential for the disk. I think BRIN would require N/128 RANGES, not pages, and if I am not mistaken it fits several ranges in an index page. The docs talk of summary tuples, and I suspect a summary tuple for, say, an integer is not gonna be longer than 128 bytes, in which case you could fit 64 of them in an 8k page. Also, if you account for possible concurrent load disturbing your index+partial scan, you also have to account for the ( more likely ) disruption on the full scan. I.e., I have this table apc | apc_cdrs_p2022_12 | table | postgres | permanent | heap | 860 MB | N/128 pages would imply an index 1/128 of the table size, about 6.7Mb in your numbers, but apc | apc_cdrs_p2022_12_cuando_idx | index | postgres | apc_cdrs_p2022_12 | permanent | brin | 64 kB | apc | apc_cdrs_p2022_12_raw_id_idx | index | postgres | apc_cdrs_p2022_12 | permanent | brin | 64 kB | The 1st one is on a timestamp column, the second on an integer. And several empty partitions have 48kB indexes, so it seems the data is just 16k for the 860 ranges. That could be about 20 bytes/range, which more or less fits a couple of values. In my experience, BRIN indexes are ridiculously small. I use them on that particular table because both cuando and raw_id correlate with insertion order and I normally only read several-megabyte ranges indexed on them, so they work very well in limiting the scan range to nearly what I need. > For instance, 80GiB table would be like 10'000'000 pages, so the default BRIN > would take about 78'000 pages (625MiB), so the min/max scan would read 626 MiB > If pages per range is increased to ~3162, then index size would be ~3162 > pages (25MiB), and each index entry would cover 25MiB range. Then the query > would have to read ~50MiB to fetch min/max. It is not clear if that is really > practical though. If you assume your index fits 64 tuples per page your index read drops to about 10Mb, plus the 1Mb range. Also, I suspect you will have to read all unsummarized ranges ( probably before the summarized ones, as unsummarized ranges can discard summarized ones, but not the other way around ). Francisco Olarte.
Re: Using CTID system column as a "temporary" primary key
On Thu, 30 Mar 2023 at 10:01, Dominique Devienne wrote: >> 2) 0 can be a valid sequence value: > Of course. Yet, as above, if that is opt-in as specified in the `create > table` DDL somehow, then why not? > BTW, default and 0 are not the same thing. You cannot bind "default" in place > of an integer-valued prepared-statement placeholder, in a binary mode insert. So > it is definitely not the same thing. IMNSHO if you need to select between default and explicit values in an insert via binding you have a design problem, and down this path lies madness. > So while I can accept that not implementing that particular informix > compatibility wart is a perfectly valid position, for impl and maintenance cost, the arguments > I've read so far can be "easily" side-stepped from a technical perspective I suspect. FWIW, do not forget the runtime costs: once you start piling informix warts over oracle warts over access warts over sybase warts over mysql warts over sql server warts, it adds up. I do not think postgres' target should be compatibility with (select sql_engine order by random limit n). Normally it tries to follow standards, and do something reasonable when that is not possible, but this informix wart sounds particularly worthless to implement. Besides your use case I do not think it would serve for anything other than encouraging people to use an ill-designed informix feature. Francisco Olarte.