Re: How to avoid trailing zero (after decimal point) for numeric type column

2018-03-01 Thread Francisco Olarte
On Wed, Feb 28, 2018 at 1:33 PM, pkashimalla
 wrote:
...
> I did insertion from java program with below code snippet
>
> Double object = 10.0;
> String inserRecordQuery_NEWWAY11 = "INSERT INTO BLOB_TEST_TABLE(id)VALUES
> (?)";
> selectPrepareStmt.setObject(1, object,Types.NUMERIC);
> int count = selectPrepareStmt.executeUpdate();

That's incomplete. But why are you using Double to bind a NUMERIC?
IIRC BigDecimal seems to me the natural Java counterpart to it.

>
> it inserted like this.
> select id from blob_test_table;
>
> id
> numeric
> ---------
> 10.0
>
>
> In this case, when a decimal point is equal to 0 then,  I don't want to see
> the precision and the value in the column should just 10

And the driver is supposed to know this how?



> And If I execute code,
>
> Double object = 10.5801
> String inserRecordQuery_NEWWAY11 = "INSERT INTO BLOB_TEST_TABLE(id)VALUES
> (?)";
> selectPrepareStmt.setObject(1, object,Types.NUMERIC);
> int count = selectPrepareStmt.executeUpdate();
>
> Now ,the value in the column should be 10.5801 as the precision is greater
> than ZERO

Doubles do not have precision ( well, they have: double precision ).
Also note doubles are binary, and 10.5801 does not have a finite
binary representation. This is Perl, but it shows the point:

$ perl -w -e 'printf q(%1$f %1$.20f %1$a %1$.20a)."\n", 10.5801'
10.580100 10.58009999999999983800 0x1.52902de00d1b7p+3
0x1.52902de00d1b70000000p+3

Note how the decimal varies with the requested precision, as 10.5801
cannot be represented exactly in binary, and how the hex
representation does not vary ( because hexadecimal has an exact binary
representation ).

> Because of this, the migrated data (from Oracle) is without PRECISION ZERO
> and the new data which is being inserted is with PRECISION ZERO.

I think you are hitting implementation-defined behaviour of the
driver. To convert a double to a numeric you must select a precision (
a scale ), and it seems the Oracle and Postgres drivers do it
differently. I would try BigDecimal, which has less uncertainty.
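For what it's worth, a quick SQL sketch ( the literals here are just
illustrative ) of how the scale of the value handed to the server is
what produces, or not, the trailing zero in a numeric column:

select '10'::numeric as no_scale, '10.0'::numeric as scale_one;
-- returns 10 and 10.0: numeric keeps the scale it was given, so
-- whatever scale the driver picks when binding is what you will see.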

> Oracle's NUMBER column type is handling it as I expected.

Oracle behaves as you are used to, and so is what you expect.

> I migrate Oracle's NUMBER column as just NUMERIC column in PostgreSQL

Probably using a BigDecimal would do the trick, as you can convert sql
numeric => java BigDecimal => sql numeric without losing info, but not
with Doubles. Bear in mind numeric is decimal, double is binary, and
something as simple-looking as 0.1 does not have an exact binary
representation:

$ perl -w -e 'printf q(%1$f %1$.20f %1$a %1$.20a)."\n", 0.1'
0.100000 0.10000000000000000555 0x1.999999999999ap-4 0x1.999999999999a0000000p-4

See the trailing 555 in the second one? Or the continuous 1.(9) fraction
rounded, in excess, to an 'a' in the least significant place in hex?

In numeric you can have 1.0, 1.00000000000000001 and
0.99999999999999999 as distinct values, but in doubles they all map to
1.0

Francisco Olarte.



Re: Enforce primary key on every table during dev?

2018-03-01 Thread Francisco Olarte
Melvin:

On Thu, Mar 1, 2018 at 1:47 AM, Melvin Davidson  wrote:
> But this begs the question,
> why are "developers" allowed to design database tables? That should be the
> job of the DBA!

That's the DBA wearing her developer hat. ( I agree with the spirit )

Francisco Olarte.



Re: Enforce primary key on every table during dev?

2018-03-01 Thread Francisco Olarte
On Thu, Mar 1, 2018 at 9:20 AM, Alban Hertroys  wrote:

> Not to mention that not all types of tables necessarily have suitable 
> candidates for a primary key.

They do if they are in 1NF. ( no dupes allowed )

> An example of such tables is a monetary transaction table that contains 
> records for deposits and withdrawals to accounts. It will have lots of 
> foreign key references to other tables, but rows containing the same values 
> are probably not duplicates.

That's a bad example. They would normally have a transaction id, or a
timestamp, or a sequence counter. PKs can be expanded to cover all
non-nullable columns. You could try to come up with a real example, but
all the times I've found one of these in my designs it was because I
didn't correctly model the "real world".

> Adding a surrogate key to such a table just adds overhead, although that 
> could be useful in case specific rows need updating or deleting without also 
> modifying the other rows with that same data - normally, only insertions and 
> selections happen on such tables though, and updates or deletes are 
> absolutely forbidden - corrections happen by inserting rows with an opposite 
> transaction.

And normally you would need to pinpoint an individual transaction for
selection, hard to do if you do not have a pk.

Francisco Olarte.



Re: Enforce primary key on every table during dev?

2018-03-01 Thread Francisco Olarte
On Thu, Mar 1, 2018 at 1:07 PM, Rakesh Kumar  wrote:
...
> I routinely add surrogate keys like serial col to a table already having a 
> nice candidate keys
> to make it easy to join tables.  SQL starts looking ungainly when you have a 
> 3 col primary
> key and need to join it with child tables.

It does, but it is many times useful; let me explain:

table currencies ( code text, description text ), primary key code (
i.e. "USD", "US Dollars" )
table sellers ( currency text, id number, ... ), primary key
(currency, id), foreign key currency references currencies
table buyers ( currency text, id number, ... ), primary key
(currency, id), foreign key currency references currencies
table transactions ( currency text, seller_id number, buyer_id number,
trans_id number )
   primary key trans_id,
   foreign key currency references currencies,
   foreign key (currency, seller_id ) references sellers,
   foreign key (currency, buyer_id ) references buyers

This is a bit unwieldy, but it expresses my example constraint: buyers
can only buy from a seller with the same currency, and there is no way
to insert a cross-currency transaction.
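Spelled out as runnable DDL it would be something like this ( a
sketch: integer ids instead of the generic "number", elided columns
left out ):

CREATE TABLE currencies (
  code text PRIMARY KEY,
  description text
);
CREATE TABLE sellers (
  currency text REFERENCES currencies,
  id integer,
  PRIMARY KEY (currency, id)
);
CREATE TABLE buyers (
  currency text REFERENCES currencies,
  id integer,
  PRIMARY KEY (currency, id)
);
CREATE TABLE transactions (
  currency  text REFERENCES currencies,
  seller_id integer,
  buyer_id  integer,
  trans_id  integer PRIMARY KEY,
  FOREIGN KEY (currency, seller_id) REFERENCES sellers,
  FOREIGN KEY (currency, buyer_id) REFERENCES buyers
);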

Of course, 3 femtoseconds after deployment the PHB will decide you can
do cross-currency sales.

Francisco Olarte.



Re: Enforce primary key on every table during dev?

2018-03-01 Thread Francisco Olarte
On Thu, Mar 1, 2018 at 5:22 PM, Steven Lembark  wrote:
> If you can say that "rows containing the same values are not
> duplicates"

Not a native speaker, but "Rows having the same values" seems to me
the definition of duplicate ( ;-), J.K. )

> then you have a database that cannot be queried, audited,
> or managed effectively. The problem is that you cannot identify the
> rows, and thus cannot select related ones, update them (e.g., to
> expire outdated records), or validate the content against any external
> values (e.g., audit POS tapes using the database).

Good point. All the times I've found myself with complete duplicates
allowed, I've always found the correct model is no duplicates + a count
field ( with possible splits as you pointed out below ). I would not have
a "marbles" table with (red, red, blue, white, red, white), I would
switch it to red=3, blue=1, white=2.

Francisco Olarte.



Re: Command to execute SQL string

2018-03-08 Thread Francisco Olarte
Hello:



On Thu, Mar 8, 2018 at 1:26 PM, Łukasz Jarych  wrote:
> i have a table when i have string in each row,
> for example "DELETE FROM t_test WHERE ID = 3".
>
> I want to execute for each row this command.
> It is possible?

I suppose you mean execute the command contained in each row. Anyway,
it is easy with a procedure, look for example at

https://www.postgresql.org/docs/10/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN

just do a loop over your command table, follow the examples and you
should be done.
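A minimal sketch of such a loop, assuming the strings live in a table
called commands(cmd text) ( names invented here, not from your post ):

DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT cmd FROM commands LOOP
        EXECUTE r.cmd;   -- runs each stored statement dynamically
    END LOOP;
END;
$$;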

Francisco Olarte.

PS:
> Jacek
Is this the same as Łukasz or Jarych ?

F.O.



Re: Trigger to create string to inverse SQL statement

2018-03-08 Thread Francisco Olarte
Hello:

On Thu, Mar 8, 2018 at 1:32 PM, Łukasz Jarych  wrote:

>
>
> i have a trigger which is added log history:
>
>
>
>
I'll trust it is, but I cannot really see anything in there; it may be
better if in the future you used something like psql and pasted the data in
a text format, which people with not-so-good eyesight like me can zoom in on.



> It is possible to create additional column here with string with inversed
> SQL statement?
>
> So in this case: "Delete FROM t_trig WHERE ID=1".
>
> And what i want is to have possibility to loop through table and execute
> inversed sql statement for each row.
>

It should be. For inserts it is not that difficult; for updates YMMV, it
is difficult to reverse a command exactly, but you get new and old values
so you could write it. For deletes just reinsert. But I doubt there is a
function which does it for you, you'll have to code something, and I
suspect it may be easier to just capture op, old and new in your triggers
and use an external program to build the inverted queries.

Francisco Olarte.


Re: Trigger to create string to inverse SQL statement

2018-03-08 Thread Francisco Olarte
Hello:

On Thu, Mar 8, 2018 at 1:51 PM, Łukasz Jarych  wrote:
> what do you mean use external program to build inverted queries. Do you have 
> any examples?

Please, do not top quote, or the thread will get difficult to follow fast.

That being said: capturing a DML in a trigger is relatively easy. On
recent postgres I've seen people just using json to capture the whole
new and old rows, which looks promising.
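As a hedged sketch of that json capture ( the log table and function
names are invented; t_trig is the table from your example ):

CREATE TABLE dml_log (
    id      bigserial PRIMARY KEY,
    ts      timestamptz NOT NULL DEFAULT now(),
    tbl     text NOT NULL,
    op      text NOT NULL,
    old_row jsonb,
    new_row jsonb
);

CREATE FUNCTION log_dml() RETURNS trigger AS $$
BEGIN
    INSERT INTO dml_log (tbl, op, old_row, new_row)
    VALUES (TG_TABLE_NAME, TG_OP,
            CASE WHEN TG_OP IN ('UPDATE','DELETE') THEN to_jsonb(OLD) END,
            CASE WHEN TG_OP IN ('UPDATE','INSERT') THEN to_jsonb(NEW) END);
    RETURN NULL;  -- AFTER trigger, the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_trig_log
AFTER INSERT OR UPDATE OR DELETE ON t_trig
FOR EACH ROW EXECUTE PROCEDURE log_dml();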

But then, building a 'rollback' DML is difficult to do in sql or
pl-pgsql, you would normally want a more normal programming language.
i.e., I would normally turn to perl for this, having used it since the
mid 90s.

Then, you have pl-perl, but this is difficult to debug/manage. So what
I would normally do is just capture the changes in a trigger and
then have a program which queries the log table, builds the anti-query
and executes it ( no point in keeping it, since once you execute it
there is nothing to undo ). The beauty of this is you can take a
sample from your log table and easily test the program just by
implementing a debug flag which prints the queries instead of
executing them ( and does not touch the log table, which I assume a real
undoer will need for record-keeping ).

This is assuming the log is used as an "undo log", which is what I
would assume from the very scarce information I have. And this kind of
programs normally are seldom used, so postponing the query building to
a later time and logging minimal info fast is normally better ( is
like backups, you normally make backups faster, then optimize what you
can of restores, as ideally they would never be used, or transactions,
you normally optimize for commits first, then rollbacks ).

Francisco Olarte.



Re: Trigger to create string to inverse SQL statement

2018-03-08 Thread Francisco Olarte
Jacek:

On Thu, Mar 8, 2018 at 1:53 PM, Łukasz Jarych  wrote:
> maybe somebody wrote SP to invert update statement?

Maybe, but if you are going to ask for it you better define the
problem a little.

Francisco Olarte.



Re: Doubts about replication..

2018-04-20 Thread Francisco Olarte
On Thu, Apr 19, 2018 at 8:48 PM, Edmundo Robles  wrote:
> Yes, you are right, the replication is not  a backup ;)  , actually   i 
> backup  database  daily at  3:00 am but if database crash,  the amount  of 
> data is  big!  that is the reason i want to  replicate to reduce  the data 
> loss. By the way  a few days ago a job partner did a delete with no where.

Replication will happily replicate a delete with no where. Backup will
at least have the day starting point.

For this kind of thing you need to do log archiving + base backups.
With this you can use PITR to roll forward from the last full backup
to just before the delete.
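A minimal sketch of the archiving side ( the archive path is purely
illustrative, and changing archive_mode needs a server restart ):

ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'cp %p /somewhere/safe/wal_archive/%f';
-- plus periodic base backups, e.g. with pg_basebackup; PITR then
-- replays the archived WAL up to a recovery target time just before
-- the mistaken delete.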

Francisco Olarte.



Re: Domain based on TIMEZONE WITH TIME ZONE

2018-05-10 Thread Francisco Olarte
On Thu, May 10, 2018 at 10:03 AM, Ben Hood  wrote:
...
> Am I missing the point of how Postgres stores TIMEZONE WITH TIME ZONE
> internally?

After reading in the follow-up that TIMEZONE is a typo for TIMESTAMP: yes, you are.

Approximately, postgres stores ( virtually ) a point in the time line,
for both the with and without tz types, in the same format. Something
like a real number.

The difference is mainly in transforming from/to text ( bear in mind
that when you put a constant in a query you are transforming from text ).
In the with time zone case it formats/expects it as a time string in
the session's configured time zone; in the without case it treats it (
approximately ) as if it was in UTC ( and then discards the "+00" after
formatting ).
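A tiny illustration: the same stored instant, rendered differently
depending only on the session TimeZone ( expected output as comments ):

SET timezone = 'UTC';
SELECT '2018-05-10 12:00:00+02'::timestamptz;  -- 2018-05-10 10:00:00+00
SET timezone = 'Europe/Madrid';
SELECT '2018-05-10 12:00:00+02'::timestamptz;  -- 2018-05-10 12:00:00+02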

Maybe I'm confusing you more, it's not too easy to explain.

The point is TIMEZONE is not stored in either of them.

Francisco Olarte.



Re: Domain based on TIMEZONE WITH TIME ZONE

2018-05-10 Thread Francisco Olarte
On Thu, May 10, 2018 at 12:19 PM, Ben Hood  wrote:
>> On 10 May 2018, at 09:59, Francisco Olarte  wrote:

>> Maybe I'm confussing you more, its not too easy to explain.
>> The point is TIMEZONE is not stored in either of them.
>
> Many thanks for clarification, very much appreciated.
>
> Your point is consistent with the manual: “All timezone-aware dates and times 
> are stored internally in UTC”

Tom Lane, who is much more knowledgeable than me, points out they are
stored in UTC. Maybe, but, IIRC, they are stored in 8 bytes; I do not
know exactly how, last time I read it that could be either a float8
number or an int8 ( or maybe two int4 numbers ). I think they use the
same type of storage as unix timestamps ( a unix timestamp is normally
a signed integer number of seconds from an arbitrary start point,
19700101T00 UTC, and it designates an instant in time. 7200 designates
an instant; I can format it for the reader in many ways: 19700101T02+00,
"three in the morning of January the first, nineteen seventy, Madrid
time", "1970-01-01 03:00:00 +01" ). But it is not UTC, it is the
designation of an instant in time. Timestamps do not have time zones.
BUT the postgres data type timestamptz is a way to store a timestamp.
So is timestamp. And so is float8. The difference is how it is
converted and how it interacts with other types.


> Rather it should read “is enforcing the submission of UTC denominated 
> timestamps in the server by using a domain a sensible way to enforce a policy 
> that will blow up when apps attempt to use non-UTC timestamps (when they 
> shouldn’t be)”.
> So the question is not how does the timestamp get stored, rather, is it an 
> anti-pattern to use Postgres as a linter for apps that forget to use UTC 
> exclusively?

That poses a problem. You must check the input representation. I
mean..., you were using extract on the value, but this happens ( YMMV
):

cdrs=# show timezone;
   TimeZone
---
 Europe/Madrid
(1 row)

cdrs=# select extract(timezone from current_timestamp);
 date_part
---
  7200
(1 row)

cdrs=# set timezone TO 'UTC';
SET
cdrs=# select extract(timezone from current_timestamp);
 date_part
---
 0
(1 row)

cdrs=# select extract(timezone from '2018-01-01
07:00:00+0100'::timestamptz),  extract(timezone from '2018-01-01
00:00:00-0300'::timestamptz);
 date_part | date_part
---+---
 0 | 0
(1 row)

cdrs=# set timezone to 'Europe/Madrid';
SET
cdrs=# select extract(timezone from '2018-01-01
07:00:00+0100'::timestamptz),  extract(timezone from '2018-01-01
00:00:00-0300'::timestamptz);
 date_part | date_part
---+---
  3600 |  3600
(1 row)

cdrs=# select extract(timezone from '2018-01-01
07:00:00+0000'::timestamptz),  extract(timezone from '2018-07-01
00:00:00+0000'::timestamptz);
 date_part | date_part
---+---
  3600 |  7200
(1 row)

As you can see you are getting the seconds offset for the client, may
be in a domain for the server, timezone at the instant in time
designated by the value. Not what you originally typed to locate the
instant in time.

For what you want to do I think you'll have to parse the text value,
maybe by definig a view with a text columns and using some
rule/trigger magic for insert / updates.

It seems you want to force your users to use an explcit time zone.
This may be better handled above the database.

Francisco Olarte.



Re: Enhancement to psql command, feedback.

2018-05-10 Thread Francisco Olarte
On Thu, May 10, 2018 at 4:46 PM, Steven Lembark  wrote:
> The whole point of "#!" working in shell is that the two-bytes
> (a) mark the file as executable by a specific shell command and
> (b) are a shell comment.

Shebang is an unix-ism. It is not part of the shell. The shell just
execs whatever you tell it to.

In a simple way, in unix when a file is marked executable the loader
is called to load and execute it. The loader first looks at the start
of the file to try to determine what it is ( it does not use the
'extension' for this as MSDOS and friends ). If it is one of the
several formats binary formats, like elf or a.out, it understands it
loads and executes. If it is the magic sequence "#!" it tries to
search for another executable ( NOT A SHELL COMMAND, this works even
if you zap all the shells in your computer ) and recursively invokes
it ( this is done by execve(2) in my linux machine, and described in
its manual page ).

No shell involved:

folarte:~/tmp$ type cat
cat is /bin/cat
folarte:~/tmp$ echo -e '#!/bin/cat\nHello there\nGood bye then' > xx
folarte:~/tmp$ chmod +x xx
folarte:~/tmp$ ./xx
#!/bin/cat
Hello there
Good bye then
folarte:~/tmp$ perl -e 'exec("./xx")'
#!/bin/cat
Hello there
Good bye then

You can try other ways to call execv*, nothing magical in the perl
way, just avoiding the shell ( which has an exec builtin command, with
different behaviour from typing a command name, which does fork, wait
in the parent, execv in the child ).

Francisco Olarte.



Re: Domain based on TIMEZONE WITH TIME ZONE

2018-05-12 Thread Francisco Olarte
On Thu, May 10, 2018 at 10:37 PM, Ben Hood  wrote:
> On 10 May 2018, at 16:33, Francisco Olarte  wrote:
>
> For what you want to do I think you'll have to parse the text value,
> maybe by definig a view with a text columns and using some
> rule/trigger magic for insert / updates.
>
>
> Sorry for being unclear - the solution I have in production appears to work
> with
>
> CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK
> (EXTRACT(TIMEZONE FROM VALUE) = 0);
>
> This raises an exception when an app doesn’t use UTC.

> This is what I wanted to achieve and the approach so far seems to work.

Can you post an example ( correct and error inputs, preferably done with psql )?

At this point I do not know if it is working or if it just seems to be
working for you. I see some cases in which it would seem to work, but I
would like to know what you mean by "sending non utc timestamps to the
database".

I mean something like the one attached below, which to me seems to
prove it does not work: it just checks that the session timezone is
UTC. Note the 1st transaction tests an explicit timestamp value,
without conversions, failing in non-UTC; the second one checks an
explicit offset not working in non-UTC; and the third one checks that
anything goes, with or without an explicit offset, when the time zone
is UTC.

cdrs=# show timezone;
   TimeZone
---
 Europe/Madrid
(1 row)

cdrs=# begin;
BEGIN
cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK
(EXTRACT(TIMEZONE FROM VALUE) = 0);
CREATE DOMAIN
cdrs=# create table t(ts utc_timestamp);
CREATE TABLE
cdrs=# insert into t values ( current_timestamp );
ERROR:  value for domain utc_timestamp violates check constraint
"utc_timestamp_check"
cdrs=# rollback;
ROLLBACK
cdrs=# select current_timestamp;
  now
---
 2018-05-12 12:58:03.616949+02
(1 row)

cdrs=# begin;
BEGIN
cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK
(EXTRACT(TIMEZONE FROM VALUE) = 0);
CREATE DOMAIN
cdrs=# create table t(ts utc_timestamp);
CREATE TABLE
cdrs=# insert into t values ('2015-05-11 10:20:30+0000');
ERROR:  value for domain utc_timestamp violates check constraint
"utc_timestamp_check"
cdrs=# rollback;
ROLLBACK
cdrs=# set timezone TO 'UTC';
SET
cdrs=# select current_timestamp;
  now
---
 2018-05-12 10:59:47.946338+00
(1 row)

cdrs=# begin;
BEGIN
cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK
(EXTRACT(TIMEZONE FROM VALUE) = 0);
CREATE DOMAIN
cdrs=# create table t(ts utc_timestamp);
CREATE TABLE
cdrs=# insert into t values ('2015-05-11 10:20:30+0000');
INSERT 0 1
cdrs=# insert into t values ('2015-05-11 10:20:30+0200');
INSERT 0 1
cdrs=# insert into t values ('2015-05-11 10:20:30');
INSERT 0 1
cdrs=# insert into t values (current_timestamp);
INSERT 0 1
cdrs=# select * from t;
  ts
---
 2015-05-11 10:20:30+00
 2015-05-11 08:20:30+00
 2015-05-11 10:20:30+00
 2018-05-12 10:59:54.289827+00
(4 rows)

cdrs=# rollback;
ROLLBACK


Francisco Olarte.



Re: Domain based on TIMEZONE WITH TIME ZONE

2018-05-12 Thread Francisco Olarte
Adrian:

On Sat, May 12, 2018 at 6:42 PM, Adrian Klaver
 wrote:
> On 05/12/2018 04:04 AM, Francisco Olarte wrote:
...
>> Can you post an example  ( correct, error inputs, preferrably done with
>> psql ) ?
>>
>> At this point I do not know if it is working or if it seems to working
>> for you. I see some cases in which it would seem to work, but I would
>> like to know what you mean by "sending non utc timestamps to the
>> database".
>
> In the post below Ben explains that he realizes it is not working the way he
> thought:

Yep, saw it after posting this.


> From what I can gather it comes down where UTC is being enforced. Ben was
> hoping that the domain would force timestamps to be only submitted with a
> UTC offset with the idea that this would force storage in the field as UTC
> only. The realization is that if you have a field of type timestamp with
> time zone the value is going to be stored as UTC regardless of the offset
> that is presented(even in cases where there is no offset presented, when an
> implicit one is assigned). That means there really is no need for the
> domain.

I think this problem, and similar ones, comes from the description in
the docs, and in a lot of similar places, as "the timestamp with time
zone value is stored as utc", or "timestamps with time zones are
normalized to utc". If you look at the docs you realize they are all
stored as numbers, and occupy the same space as timestamp W/O TZ. The
thing is you do not need UTC for anything to represent timestamps.
Using the (simpler) unix timestamp as an example: if UTC ( and UAT and
GMT ) did not exist I could define it as, say, "number of seconds
elapsed since eight in the morning, January the second, on Madrid/Spain
local clocks, plus one hundred and eight thousand"; no UTC needed at
all, they are just numbers, they do not have timezones.
The only difference is timestamp uses gmtime/timegm for text
conversion and timestamptz uses localtime/timelocal.

In fact I've found the following in the sources:

  * Timestamps, as well as the h/m/s fields of intervals, are stored as
  * int64 values with units of microseconds. (Once upon a time they were
  * double values with units of seconds.)

And from some reading it seems to be like the unix timestamp, but in
64-bit microseconds and referencing the extended Julian ( or Gregorian )
calendar; not too sure about it. I've read the definition somewhere,
but thinking of them as "just a number" has saved me a lot of
problems.

Francisco Olarte.



Re: Domain based on TIMEZONE WITH TIME ZONE

2018-05-13 Thread Francisco Olarte
for a correct manipulation, the thing is
it is easier to write correct code for tstz than for ts, but they are
equally accurate ). In fact, if I store "2010-01-01
12:23:45"::timestamp I know I'm getting "2010-01-01 12:23:45" in the
future, independently of where I read it, so normally I use it when I
want to preserve "string" values but have some special "numeric"
semantics. When I store instants in time I need to make sure my string
input values accurately designate them, and I normally use tstz
because its default input conversions process those correct formats I
allow ( and screen before sending them for conversion ).

Tell me one thing you can do with a timestamp with time zone that you
cannot do with an int8 number or with a timestamp without time zone.

Given there is an isomorphism between them ( and, in postgres, the
text version is the same for the inverse, IIRC "one = the_other at
time zone 'UTC'" ) and they have the same representation, I doubt it.
They are just convenient ways to manipulate text values and apply
functions based on type, but they do not have a time zone.

You can not get a time zone from a timestamp with time zone value.
What leads to countless confusion is that it seems a tstz value
should be composed of a ts value plus a tz value, but they are not. It
is just a problem with naming.

Francisco Olarte.



Re: case statement within insert

2018-05-25 Thread Francisco Olarte
On Fri, May 25, 2018 at 11:04 AM, tango ward  wrote:
>   CASE
> WHEN code like '%%PE%%' or code like '%%NSTP%%'
> THEN True
> ELSE False

I cannot advise you on the %% stuff, which I do not totally understand, but

CASE WHEN condition THEN true ELSE false END

when condition is ( as it should be in a CASE ) a boolean expression is a
classic antipattern; just use 'condition' or '(condition)'
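I.e., something like this ( table name invented, and with single % as
it would look in plain SQL rather than in your parameterized string ):

SELECT (code LIKE '%PE%' OR code LIKE '%NSTP%') AS flag
FROM some_table;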

Francisco Olarte.



Re: Doing a \set through perl DBI ?

2018-06-07 Thread Francisco Olarte
David:

On Wed, Jun 6, 2018 at 5:36 PM, David Gauthier  wrote:
> Hi:
>
> Is there a way to do the equivalent of a "\set foo 1" through perl dbi ?
> I tried...
> $dbh->do("\\set foo 1");
> and got a syntax error
>
> Of course, I'd also have to be able to access the value of foo once its set.
> I'm guessing the usual way ??? (select :foo)

You are looking at it wrong. Psql vars are similar to perl vars, and
it is psql ( the program in your machine ) who interpolates them
before sending the query to the server.

The equivalent perl code to

\set foo 1
select :foo;

Would be

my $foo = '1';
$dbh->do("select $foo")

Which is not terribly useful, in perl.

To use the current ( client ) user in a query ( like your next message
states ), you just need

$dbh->do("select * from some_table where user_id_column=$<")

( Of course, you could "use English;" to get $UID, or use placeholders
to avoid injections, that's just an initial pointer )
( or "perldoc perlvar" if you do not know what $< is, and that is
enough perl stuff for a postgres list )

Francisco Olarte



Re: Print pg_lsn as a number?

2018-06-12 Thread Francisco Olarte
On Tue, Jun 12, 2018 at 6:31 PM, Scott Stroupe  wrote:
> According to the documentation[1], pg_lsn is a 64-bit integer that's printed 
> as two hex numbers separated by a slash, e.g. 68/1225BB70. Is there a way to 
> get the 64-bit integer in a common numeric representation instead of the 
> peculiar hex-slash-hex representation?
...
> [1] https://www.postgresql.org/docs/current/static/datatype-pg-lsn.html

Quoting your own [1] ref :"Two LSNs can be subtracted using the -
operator; the result is the number of bytes separating those
write-ahead log locations."

You can try subtraction from an arbitrary origin ( pg_lsn('0/0')
seems nice, as arbitrary as Greenwich meridian ), and it worked for me
in

select
  pg_lsn('68/1225BB70')
, pg_lsn('0/0')
, pg_lsn('68/1225BB70') - pg_lsn('0/0')
, to_hex((pg_lsn('68/1225BB70') - pg_lsn('0/0'))::bigint)
;

( http://sqlfiddle.com/#!17/9eecb/16272 )

Reconstructing via simple addition does not work, but you can do
division, modulus, double to_hex, join with '/', cast to pg_lsn if you
like.

Francisco Olarte.



Re: Is there a way to be notified on the CREATE TABLE execution?

2018-06-20 Thread Francisco Olarte
On Wed, Jun 20, 2018 at 5:20 PM, Igor Korot  wrote:
>>  https://www.postgresql.org/docs/10/static/sql-createeventtrigger.html
>> Your scenario suggests you may wish to avail yourself of the Listen and
>> Notify commands as well.
>
> I did look at the Listen/Notify.
> Unfortunately the listening is done on the channel versus listening
> for the specific event.

Channels are cheap. You just listen on "whatever" and in the event
trigger you notify "whatever", payload is optional.

The event trigger is the one which takes care of filtering the event
and notifying selectively.

You can use a channel per event.

Francisco Olarte.



Re: Is there a way to be notified on the CREATE TABLE execution?

2018-06-20 Thread Francisco Olarte
Igor:

On Wed, Jun 20, 2018 at 7:07 PM, Igor Korot  wrote:
> Just one more question:
> This page https://www.postgresql.org/docs/current/static/xfunc-c.html#DFUNC
> does not reference
> Windows/MSVC/MinGW as a way to compile the code.

Sorry, I don't do windows.

You do not need C extension functions anyway; unless your usage
pattern is truly bizarre, a trigger for DDL could be written in any PL.

Francisco Olarte.



Re: Is there a way to be notified on the CREATE TABLE execution?

2018-06-20 Thread Francisco Olarte
Melvin:

Maybe old eyes, but ...

On Wed, Jun 20, 2018 at 7:03 PM, Melvin Davidson  wrote:
>>Is there a way to be notified on the CREATE TABLE execution?
> Here is sample code that will notify for a CREATE or DROP table:

Doesn't this trigger just log the events? I think it's missing
something like "Notify create_or_drop" somewhere after the logging (
so a listener knows to look at the log table ).

( Normally I would use something like this, listen to a channel, do a
first scan from the saved last tbl_cl_key, and then do another scan
from the last each time listen fires, it seems the difficult work is
done but it misses the notify to signal listeners and avoid having to
rescan on a timer or a similar thing ).

( Maybe I missed the line, it would not be the first time, that's why I ask ).

Francisco Olarte.



Re: Convert Existing Table to a Partition Table in PG10

2018-07-01 Thread Francisco Olarte
On Sun, Jul 1, 2018 at 12:15 AM, Clifford Snow  wrote:
.
> I also leaned that my range partition value I used on a timestamp needed to
> have fractional seconds. I used a range of 2017-01-01 00:00:00 to
> 2017-23:59:59 which failed when I attempted to add a record that had a
> timestamp of 2017-23:59:59. Adding a fractional second to the range solved
> the problem.

Your problem probably comes from using closed intervals. Timestamps
are like real numbers; partitions on real numbers are best done using
half-open intervals. You can cover the real line using non-overlapping
half-open intervals, but you cannot do it with open or closed ones
( non-overlapping ).

Assuming you are using yearly range partitions ( 2017-23:59:59 should be
2017-12-31 23:59:59 ), to use closed intervals you have to rely on
"real" second numbers being stored in the computer with a finite
precision ( so you can, say, add up to the microseconds, and pray it
does not change to picoseconds in a future release ). If you use half
open ( 2017-01-01 00:00:00 <= ts < 2018-01-01 00:00:00 ) the problem
is much easier. You can even drop the HMS ( 2017-01-01 <= ts <
2018-01-01 ) and it will work, even if the system picks a different
HMS value for each year, as you use the same value for an interval's
start as for the previous one's end. And, if timestamp supported
defaulting the M and D like it does with HMS ( which it does not ) you
could even drop them.
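In PostgreSQL 10 declarative range partitioning the bounds are already
half-open ( FROM is inclusive, TO is exclusive ), so a yearly partition
can be declared like this ( a sketch, table and column names invented ):

CREATE TABLE events (
    ts   timestamptz NOT NULL,
    data text
) PARTITION BY RANGE (ts);

CREATE TABLE events_2017 PARTITION OF events
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');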

And I think postgres does not use leap seconds, but if it did, a value
like '2016-12-31 23:59:60.999999' would not need to be remembered in
the half-open style.

This is a general problem, not a postgres or timestamp related one.
Anything which has decimals is generally better partitioned with
half-open intervals. With integer-like things ( like dates, which are
countable ) it does not matter that much; I use half-open for easier
upgrading if I need to, but you can translate between open, closed and
half-open.

Francisco Olarte.



Re: Split daterange into sub periods

2018-07-05 Thread Francisco Olarte
On Thu, Jul 5, 2018 at 4:16 PM, hmidi slim  wrote:
> In fact I'm trying to split a period in sub periods. Following this example
> :
> If I have a period =[2018-01-01, 2018-01-31] and two other periods
> [2018-01-04, 2018-01-06] and [2018-01-08, 2018-01-08].
> If I split the base period '[2018-01-01, 2018-01-31]' by the other two
> periods '[2018-01-04, 2018-01-06]' and '[2018-01-08, 2018-01-08]' I will got
> such a result:
> [2018-01-01, 2018-01-03]
> [2018-01-07, 2018-01-07]
> [2018-01-09, 2018-01-31].

Your example may be fine to illustrate a definition, but it is totally
inadequate to describe what you want.

You'll, at least, need to refine your question. Something like "I have
a big period and a set of small periods, and want to calculate the set
of subperiods of the big one not covered by the small ones", and also
some conditions ( i.e., can the small periods intersect? are they
completely covered by the big one? can they include the endpoints of
the big one? Can they be empty? Does the result need to be minimal (
in the number of result periods ) ? ).

Even then, this is not trivial and, as said above, you will probably
need a function for it.


Francisco Olarte.



Re: Code of Conduct

2018-09-19 Thread Francisco Olarte
On Wed, Sep 19, 2018 at 5:27 PM, Fred Pratt
 wrote:
> Keep pg open and free.   This smells of PC police.   This community can 
> police itself
No comment on this, just kept for context.

> Sent from my mobile device. Please pardon my brevity and typos.   I am not 
> responsible for changes made by this device’s autocorrect feature.

I will happily pardon brevity ( although I would not call a ten-line
sig plus a huge bottom quote "breve", and AFAIK it means the same in
English as in Spanish ) and/or typos, but the "I am not responsible"
part feels nearly insulting. Did someone force you to use "this device"
( which you seem to perceive as inadequate for a nice answer ) to reply,
or did you choose to do it? ( real, not rhetorical question, but do not
answer if you feel it's inadequate )


As an aside, are this kind of affirmations and/or my response to them a
violation of the current CoC ?

Francisco Olarte.



Re: How to maintain the csv log files in pg_log directory only for past 30 days

2018-09-28 Thread Francisco Olarte
On Fri, Sep 28, 2018 at 7:03 AM, Raghavendra Rao J S V
 wrote:
> Log file will be generated in csv format at pg_log directory in our
> PostgreSQL. Every day we are getting one log file. We would like to maintain
> only max 30 days. Which setting need to modify by us in “postgresql.conf” in
> order to recycle the log files after 30 days.

I have similar problems in a lot of things and normally use "find
<logdir> -name <pattern> -mtime +<days> -delete" in the daemon's
cron ( GNU find, on Linux; I assume other OSes have a similar command ).

For postgres-only solutions you've been given some advice previously,
and like those I only know ways to do it daily/weekly/monthly/yearly.

Francisco Olarte.



Re: pg_sleep() inside plpgsql block - pro & cons

2018-10-02 Thread Francisco Olarte
Hi:

On Tue, Oct 2, 2018 at 12:10 PM, pinker  wrote:
> There is second time I see that somebody uses pg_sleep function inside
> plpgsql block. This case is quite similar to the last one - it's some kind
> of wait for data to be loaded. After pg_sleep there is a check if some
> condition is true, if not procedure goes to sleep again. As a result an
> average duration of this function is 1,5h...
> I'm trying to gather pros and cons regarding using pg_sleep this way. What's
> coming to my mind are only 2 cons:
> * clog contention
> * long running open transactions (which is quite good described in here:
> https://www.simononsoftware.com/are-long-running-transactions-bad/)
> So maybe you'll add some more to the list?

With so few details, nothing much can be said. As for cons: if the proc
is something like "do stuff, wait for data to appear, do more stuff", I
think the function will also need READ COMMITTED or something similar
to see the data appear, and will fail under SERIALIZABLE. The pattern
certainly smells funny. I do some similar things, but I sleep outside
of the database; is there a reason this cannot be done?

Francisco Olarte.



Re: pg_sleep() inside plpgsql block - pro & cons

2018-10-02 Thread Francisco Olarte
On Tue, Oct 2, 2018 at 3:44 PM, pinker  wrote:
...
> Yes, I do try to convince them to do it outside the db, that's the reason
> I'm looking for some support here :) I'm not sure those 2 reasons are enough
> to convince them, wanted to be prepared...

Well, not knowing the logic I cannot be sure, but you can try the "it
depends on isolation level" I pointed to.

> You know it's always time to do refactoring and in this case it's a lot of
> time because of the amount and complexity of the logic in db.

Not having seen it, I trust your word. But I doubt the design can be
correct; it seems a case of "all I know is pl/xx, so I'll do
everything I can in it". I got one product with a similar problem in
Oracle.

But unless the logic is really perverse, it seems to me you could do
two procs, the normal one ( without the pg_sleep in a loop ), and a
wrapping one testing for the "data appears" condition and calling the
first, and call the second in a loop. But again, details will be
needed and trusting your words I do not want them on the list ;->  .

Francisco Olarte.



Re: Why the index is not used ?

2018-10-07 Thread Francisco Olarte
ROS:

On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier  wrote:

> -INSERT INTO cartedecredit(username,cc)  SELECT 'individu ' || x.id, 
> pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, 
> cipher-algo=aes256') FROM generate_series(1,10) AS x(id);
> -CREATE INDEX idx_cartedecredit_cc02 ON 
> cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, 
> cipher-algo=aes256'));

If my French is not too rusty you are encrypting a credit card, and
then storing an UNENCRYPTED copy in the index. So, getting it from the
server is trivial for anyone with filesystem access.

Francisco Olarte.



Re: Filtering before join with date_trunc()

2018-10-15 Thread Francisco Olarte
Hi Phil:

On Mon, Oct 15, 2018 at 5:57 PM, Phil Endecott
 wrote:
...
> For each of these tables I have a view which rounds the timestamp
> to the nearest minute, and ensures there is only one row per minute:
...
> I then join these tables on the rounded time:

> Now I'd like to find the values for a particular short time period:

For what I propose I assume the SHORT time is really short

...
> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.

I think you are expecting too much, I mean, you are expecting the
server to know it can expand your time-period into a (maybe) bigger
one which covers the original data and push that condition down. In my
experience this is normally too much.

...

> Is there anything I can do to make this more efficient when the tables
> are larger?

If your periods are really short you could try to replace the time
condition on the views to a time condition in the tables and do the
rounding and grouping afterwards. I mean, use a half-open interval to
catch the relevant chunks of the tables and then join the short
results.

I think with a trio of CTE selecting with a time interval on the WHERE
and doing the date_trunc()/MAX() group by you should be able to do
three index scans producing short results which can then be
full-joined and coalesced. If you want the interval from $A to $B (
rounded to minutes ), do something like

WITH pressures AS (
 SELECT date_trunc('minute'::text, tbl."time") AS "time",
max(tbl.pressure) AS pressure
   FROM tbl
-- Chop the relevant time..
WHERE time >= $A and time < $B + '1 minute'::interval
-- There may be easier ways to make the above condition if you are
-- generating the text, but always use half-open
  GROUP BY 1 ),  -- Easier to type, and no order-by here ( and I
                 -- normally label order by in views as a code smell )
,
yada, yada-- repeat for temperatures, rain

SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
rain.rain,
pressures.pressure,
temperatures.temperature
   FROM rain
 FULL JOIN pressures USING ("time")
 FULL JOIN temperatures USING ("time")
ORDER BY 1; -- ORDER BY GOES HERE, I suspect your example got ordered
            -- by chance, not by design.

( use other names, I just used the view names for C&P, lazy me ).


Francisco Olarte.



Re: compressing network packets

2019-08-01 Thread Francisco Olarte
On Thu, Aug 1, 2019 at 5:43 PM farjad.farid
 wrote:
> Under windows environment, is there a way of compressing the network packet 
> reaching postgresql server?

You'll need to explain a bit more. You have Windows, I assume for
either the client, the server or both, and you have a pg server, in an
unspecified environment.

Also, packets "reaching" the server does not sound good; you probably
want to compress packets "sent" to the server and DECOMPRESS them upon
reaching the server, or something like that.

Try to be a bit clearer, something like "Is there a way to compress
network traffic between a pg windows client and a windows server on
another machine" and some windows savvy reader may be able to help you
( I can think of a couple solutions for that on my basic desktop OS
with the basic tools, I'm sure something like windows must have dozens
).

Francisco Olarte.




Re: Having the issue in while selecting the data and feltering in order by.

2019-08-08 Thread Francisco Olarte
Nikhil ( or is it Raj? ):

On Thu, Aug 8, 2019 at 11:47 AM nikhil raj  wrote:
> Here are the detailes.
> Windows :
Zapped image showing windows locale "English-united-states.1252"
> linux :
Zapped image showing Linux locale "En-us.utf-8"
> both are in same version 11.3
> How can i get the sorting order by like windows how to change the lc_collate  
> Please can you help me over here  OR else is there an other way we get the 
> sort order like window through code

1st, it is a locale mismatch problem. I do not own windows, so I can
not test it, but IIRC win1252 is a latin-1 variant. I do not know how
their locales sort, but it seems they are doing something like the
C-utf-8 locale.

You could try to recreate the database using the same locale as in
Windows. If you are able to do this ( dump, recreate, restore ) it may
be the easier way. I do not know if there are Windows-like locales for
Linux, or how that locale sorts ( I mean, how it processes things like
á or ö or ñ; your examples show only the most basic stuff ). You could
probably use the C locale, in ascii, win1252 or utf-8, depending on
your data content, to get good results ( locale comes from the OS, but
pg has the charsets ). With the restricted data in your examples, just
switching to the C locale will suffice, but I do not know if you have
some more complex examples / sort orders ( someone knowledgeable in
Windows may be able to help. It seems like Windows locales are not as
sophisticated as Linux ones, and IIRC this was true 20 years ago when I
used them ).

Other option is to modify either your columns, to include a collate
option in the relevant ones, or the order-by clauses in the relevant
queries, which support the collate clause to. You may want to read
https://www.postgresql.org/docs/11/charset.html and investigate a bit
with some test dbs.
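I.e., either of these ( a sketch; table / column names invented ):

ALTER TABLE t ALTER COLUMN val TYPE text COLLATE "C";   -- per column
SELECT val FROM t ORDER BY val COLLATE "C";             -- per query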

I would test something like creating a database with C locale, charset
win1252, this may be a good starting point. But test some more queries
in windows, i.e., things like this:

$ echo -e 'a\nA\nb\nB1' | LC_ALL=en_US.UTF-8 sort
a
A
b
B1
$ echo -e 'a\nA\nb\nB1' | LC_ALL=C.UTF-8 sort
A
B1
a
b

Because it may matter for you, and I doubt the interactions can be
properly tested without a proper dataset AND a windos pg server.

Francisco Olarte.




Re: Having the issue in while selecting the data and feltering in order by.

2019-08-08 Thread Francisco Olarte
Luca:

On Thu, Aug 8, 2019 at 12:33 PM Luca Ferrari  wrote:
> On Thu, Aug 8, 2019 at 11:20 AM nikhil raj  wrote:
> > Same when i run this in linux machine i am getting this out in  different 
> > sort order on the same query.
> A collation problem?
> What does this query do?
> SELECT *
> FROM (VALUES ('a'), ('___b1'), ('_a1'),('a2'),('a3'),('a5'), ('a2')) t
> (val) order by val COLLATE "C";

It will probably do the expected thing for him, but I think another
problem may be what this query:

SELECT * FROM (VALUES
('0'),('1'),('a'),('A'),('á'),('à'),('b'),('B'),('ñ'),('Ñ')) AS v(x)
ORDER BY 1;

does on the Windows server, which is what he is trying to replicate.
Finding a locale which does not ignore _ as the Windows one does is
easy, but replicating all the doodahs may be difficult without knowing
how the Windows locale sorts all the chars ( and I'm not even getting
into 1a vs a1, or _ vs a vs A ).

I suspect C locale may be ok though.

Francisco Olarte.




Re: constrain with MATCH full and NULL values in referenced table

2019-08-12 Thread Francisco Olarte
Stan:

On Mon, Aug 12, 2019 at 5:11 PM stan  wrote:
>
> I am creating a table that has 2 values in it which are keys pointing to 2
> other tables. I need for the UNIQUE combination of these 2 keys to exist in
> a fourth table. It has been recommended to use a foreign key constraint with
> the MATCH FULL parameter.
>
> Here is my question, does this deal with NULLS in the 4th table? I am
> concerned that this constraint might fail to reject an entry if one, or both
> of the 2 key values being inserted in the table are NULLS,.

If you have:

Table TA (a: PK)
Table TB (b: PK)
Table TAB( a, b, ...)  PK(A,B), FK(a ref TA), FK(b ref TB)
Table FOURTH(a,b,...) FK((A,B) ref TAB mach full)

Note TAB cannot have nulls in A,B as it is the PK.

And if you insert (null, null) in FOURTH it will be treated as in the
single-column case, allowed by the FK ( but you may have not-null
constraints on either a or b ).
If you try to insert (a1, null) or (null, b1), it will be rejected;
MATCH FULL does not allow a null/non-null mix.

OTOH, if you use MATCH SIMPLE the partial-null cases will not be
checked at all. As stated in the docs, you can use extra single-column
FKs on a and/or b to get them checked against TA/TB, and you can also
put not-null constraints on either of them.

The exact combo depends on what you are trying to model, which gives
you what you want. I.e., say I want to:
1.- check a,b combos.
2.- Allow (a,null) but have it checked against ta.
3.- Forbid (null,b)
4.- Allow (null, null)
You can use MATCH simple FK(a,b) against TAB for (1,4), single column
FK(a) against TA for(2)  and a check constraint (A is not null OR B is
null , If I'm not confused ) for (3,4).
( Note you do not have to check b against tb, because if b is present,
a is present, a,b is checked against TAB and TAB.b is checked against
TB ).

(match simple checks 1 and allows 2,3,4, FK(a) checks 2, and the check
constraint forbids 3)

The DB deals with nulls in many ways, you just have to enumerate your
conditions and elaborate on that.
Note in this case it FAILS to reject an entry if b is null, because I
designed it that way, but DOES REJECT if a is null and b is not.

Regards.
Francisco Olarte.




Re: SELECT all the rows where id is children of other node.

2019-08-21 Thread Francisco Olarte
Pablo:

On Tue, Aug 20, 2019 at 6:49 PM pabloa98  wrote:
> Thank you for your responses Rob. Appreciated. The problem with recursive 
> queries is that they are executed several times and it has and impact in 
> performance.
> I need a subset of those rows and I want them in one pass.
> I discovered that ltree extension could be useful. I will play with it today. 
> I am sure there's a way to find al the nodes in O(n) time with n = size of 
> the resulset ...

Unless you have some extra conditions in a table ( like
"autoincremented immutable primary key and parents are always created
before childs" ) I think your problem of "get all the descendant ( i
do not like to call them children ) nodes of a given id" can not be
solved in one pass.

I mean, if you are getting descendants of the node N1, you need to
read the last node, NL, of the table to know if it is a child of N1.
But then you have to read the table again to find children of NL.

Of course, if you have something like "hierarchical ids" you can
traverse ordering by them and know NL MUST be childless, and build the
tree rooted on node N1 as you go, but without one of these conditions
I do not think it can be done in an "ideal" db ( which lets you scan
in any order you can define from just a row, without cost ) in one scan
( storing and pruning the whole tree as you go is cheating ).

Also, if your node ids come from a serial and are immutable, or you
take a little care when mutating them, you can do it traversing by id,
but you need a full scan; a recursive query with several index scans
may easily be faster in wide trees.
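For reference, the recursive query being discussed would look something
like this ( a sketch, assuming a nodes(id, parent_id) table and root
id 1; names invented here ):

WITH RECURSIVE descendants AS (
    SELECT id, parent_id FROM nodes WHERE id = 1
  UNION ALL
    SELECT n.id, n.parent_id
    FROM nodes n
    JOIN descendants d ON n.parent_id = d.id
)
SELECT * FROM descendants;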


Francisco Olarte.




Re: SQL equivalint of #incude directive ?

2019-08-31 Thread Francisco Olarte
Stan:

On Fri, Aug 30, 2019 at 3:49 PM stan  wrote:
...
> I am defining a bunch  of functions, and I would prefer to store them in a
> separate file, which then gets "source" by the main DB init file.
> Is there a standard way to do this?

I assume you mean "standard for postgres"; AFAIK there is not even a
(multi-db) standard way to feed an SQL script to a server ( the content
of queries / results is standard, how to send them to / get them from
the server is not ).

\i in psql has been pointed out. I do not know if this is one of your
"workarounds", but what I normally do for this is treat the "init"
script as a program and "make" it. I've done it using many of the
preprocessors freely available around ( i.e. m4, cpp ) and also using a
perl program ( but this is because I'm quite fluent in it and it makes
some things easier ), and even doing "cat *.sql | psql" and naming the
scripts appropriately. You'll probably have it easier going by that
route ( generating the sql script from chunks using a tool of your
choice, it's just a big text chunk and you can easily debug it with a
text editor ), and it is not that complex.
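For the psql route, the "main" file can be as simple as this ( file
names invented ):

-- init.sql, run with:  psql -f init.sql
\i tables.sql
\i functions.sql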

Francisco Olarte.




Re: SQL equivalint of #incude directive ?

2019-09-01 Thread Francisco Olarte
Tony:

On Sun, Sep 1, 2019 at 10:05 AM Tony Shelver  wrote:
> I think you will find that most DBAs build their own scripts using tools like 
> Perl or Python, or a commercial product.

Tools like Perl are great for building text output; they've got a lot of
shortcuts for a lot of things. Not that sure about Python. But I think
these are more used by developers wearing dba hats, or by dbas who
double duty as developers / sysadmins. Any dba needs some programming
skills, especially when dealing with dbs with procedure capabilities,
but not that much.

For a pure DBA I would recommend more things like a classic
preprocessor ( I've got some success with m4, especially when using
m4sugar with it, and I'm sure there are better ones around there ) or
some templating system. For just text CPP can be used, but it knows a
bit about C syntax, which can become a nuisance; for just includes it
can be fine.

> A similar situation exists for HTML, there is no standard pre-processor 
> directive.

IMO the reason is the same. SQL and HTML are not "source code", they
are the final product, the thing you send over an HTTP connection ( in
the HTML case ) or in an implementation-defined way ( i.e. inside the
socket connections psql makes using libpq ). They are like the Windows
EXEs, or the Unix ELF binaries. In the HTML case there are lots of
templating systems which you use to build the HTML content, in many
cases on the fly, and many of them can be reused easily for SQL.

I think the trick is not treating the SQL as the source, but as an EXE
file. Once you start thinking of it as such lots of things in your
usual toolbox become useful. I routinely do something similar for
configuration files, using just shell scripts full of echo+multiline
constant + source aux files + source some echoing files and, if quotes
are used for something like column names, cat + here docs.

Francisco Olarte.




Re: How to reformat output of "age()" function

2019-09-11 Thread Francisco Olarte
David:

On Wed, Sep 11, 2019 at 5:39 PM David Gauthier  wrote:
> How can I change the default output of the "age" function to be, for example, 
> in minutes?
> E.g.
> dvdb=> select age('2019-09-11 09:00:00','2019-09-09 11:00:00');
>   age
> 
>  1 day 22:00:00
> (1 row)
> I want the equivalent of that time delta in minutes.

Some answers have already been posted, but also consider what you are
doing. Intervals have three components ( months, days, seconds ) for a
reason: ignoring leap seconds for now, not all days have 24h (
daylight saving time changes ) and not all months have the same number
of days ( 28/29/30/31 ). IIRC interval normalization for epoch assumes
all months have 30 days and all days 24 hours.

If you want to know the elapsed minutes between two timestamps, it
might be better to do it directly: extract the epoch from both (
seconds ), subtract, and divide by 60, truncating/rounding if you need to.
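For your original example that would be something like ( a sketch ):

SELECT floor( ( extract(epoch from '2019-09-11 09:00:00'::timestamptz)
              - extract(epoch from '2019-09-09 11:00:00'::timestamptz) ) / 60 )
       AS elapsed_minutes;   -- 2760, i.e. 1 day 22:00:00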

This is what happens in one case on my timezone ( Europe/Madrid ):


test=# select '2019.11.20 20:00:00'::timestamptz as end, '2019.06.10
10:00:00'::timestamptz as start;
  end   | start
+
 2019-11-20 20:00:00+01 | 2019-06-10 10:00:00+02
(1 row)

test=# select age('2019.11.20 20:00:00'::timestamptz, '2019.06.10
10:00:00'::timestamptz) as interval;
interval
-
 5 mons 10 days 10:00:00
(1 row)

test=# select extract(epoch from age('2019.11.20
20:00:00'::timestamptz, '2019.06.10 10:00:00'::timestamptz)) as
interval_seconds;
 interval_seconds
------------------
         13860000
(1 row)

test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz) -
extract(epoch from '2019.06.10 10:00:00'::timestamptz) as
elapsed_seconds;
 elapsed_seconds
-
14122800
(1 row)

Francisco Olarte.




Re: How to reformat output of "age()" function

2019-09-12 Thread Francisco Olarte
Oops, I got it wrong:

On Thu, Sep 12, 2019 at 1:50 PM Francisco Olarte  wrote:

> timestamp / interval arithmetic is really a hairy thing. ( As shown
> below, start point carefully taken to avoid crossing dsts )

It was chosen to FORCE, not AVOID, crossing dst.

> cdrs=# select x, '2019.11.20 20:00:00'::timestamptz + x::interval as
> sum from ( values ('5 mons 10 days 10:00:00'),('163 days
> 11:00:00'),('3923:00:00')) as v(x);
> x|  sum
> -+
>  5 mons 10 days 10:00:00 | 2020-05-01 06:00:00+02
>  163 days 11:00:00   | 2020-05-02 07:00:00+02
>  3923:00:00  | 2020-05-02 08:00:00+02
> (3 rows)

cdrs=# select x, '2019.11.20 20:00:00'::timestamptz + x::interval as
sum from ( values ('0'),('5 mons 10 days 10:00:00'),('163 days
11:00:00'),('3923:00:00')) as v(x);
x|  sum
-+
 0   | 2019-11-20 20:00:00+01
 5 mons 10 days 10:00:00 | 2020-05-01 06:00:00+02
 163 days 11:00:00   | 2020-05-02 07:00:00+02
 3923:00:00  | 2020-05-02 08:00:00+02
(4 rows)

The subject is just complex enough that I should avoid making these typos!

Apologies, my fault, bad proofreading, hungry.

Francisco Olarte.




Re: How to reformat output of "age()" function

2019-09-12 Thread Francisco Olarte
Adrian:

On Wed, Sep 11, 2019 at 11:19 PM Adrian Klaver
 wrote:
> On 9/11/19 9:34 AM, Francisco Olarte wrote:
...
> > If you want to know the elapsed minutes between two timestamps, it
> > might be better to do it directly, extract the epoch from both (
> > seconds ), substract, divide by 60 truncating/rounding if you need to.

> > test=# select extract(epoch from age('2019.11.20
> > 20:00:00'::timestamptz, '2019.06.10 10:00:00'::timestamptz)) as
> > interval_seconds;
> >   interval_seconds
> > ------------------
> >           13860000
> > (1 row)
> >
> > test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz) -
> > extract(epoch from '2019.06.10 10:00:00'::timestamptz) as
> > elapsed_seconds;
> >   elapsed_seconds
> > -
> >  14122800
> > (1 row)
>

> Epoch is not the issue, age() is. Leave age() out of it:

Well, I did not point to age as the problem, assuming it was clear in
the examples I gave. Two usages of epoch, one with and the other without age.

> set timezone = 'Europe/Madrid';
>
> test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz) -
> test-# extract(epoch from '2019.06.10 10:00:00'::timestamptz) as
> test-# elapsed_seconds;
>   elapsed_seconds
> -
>  14122800
> test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz
> -'2019.06.10 10:00:00'::timestamptz) as
> elapsed_seconds;
>   elapsed_seconds
> -
>  14122800
> (1 row)

Ah, I see your point now. The problem is age() and subtraction give
different intervals.

cdrs=# select '2019.11.20 20:00:00'::timestamptz - '2019.06.10
10:00:00'::timestamptz as diff_interval;
   diff_interval
---
 163 days 11:00:00
(1 row)

cdrs=# select age('2019.11.20 20:00:00'::timestamptz, '2019.06.10
10:00:00'::timestamptz) as age_interval;
  age_interval
-
 5 mons 10 days 10:00:00
(1 row)

And I see it still translates it to normalized days.

I still prefer subtracting epoch values, as the 163d/11h seems really
tricky to work with, and it took me a while to parse & diff your
queries against mine, they all looked the same. It seems like PG does
date subtraction by subtracting epochs and converting to a normalized
interval; I prefer to go to a number of seconds directly when working
with "elapsed time", as otherwise I get confused. I mean something
like this ( using it just to show the differences ):

cdrs=# select (extract(epoch from '2019.11.20 20:00:00'::timestamptz)
- extract(epoch from '2019.06.10 10:00:00'::timestamptz)) * '1
second'::interval as pure_seconds_interval;
pure_seconds_interval
---
 3923:00:00
(1 row)

A third representation! It gives the same result for epoch, but I'm
not sure it does for arithmetic ( tested it, it does not ).

I thought subtraction would give me that; clearly it does not ( both
give the same when using epoch, as lacking tz info it has to assume
something, and it seems to assume no dst changes ).

timestamp / interval arithmetic is really a hairy thing. ( As shown
below, start point carefully taken to avoid crossing dsts )


cdrs=# select x, '2019.11.20 20:00:00'::timestamptz + x::interval as
sum from ( values ('5 mons 10 days 10:00:00'),('163 days
11:00:00'),('3923:00:00')) as v(x);
            x            |          sum
-------------------------+------------------------
 5 mons 10 days 10:00:00 | 2020-05-01 06:00:00+02
 163 days 11:00:00       | 2020-05-02 07:00:00+02
 3923:00:00              | 2020-05-02 08:00:00+02
(3 rows)


Francisco Olarte.




Re: How to reformat output of "age()" function

2019-09-12 Thread Francisco Olarte
Adrian:

On Thu, Sep 12, 2019 at 4:23 PM Adrian Klaver  wrote:

> > pure_seconds_interval
> > ---
> >   3923:00:00
> > (1 row)
> >
> > A third representation! It gives the same result for epoch, but I'm
> > not sure it does for arithmetic ( tested it, it does not ).
> >
> > I thought subtraction would give me that; clearly it does not ( both
> > give the same when using epoch, as lacking tz info it has to assume
> > something, and it seems to assume no dst changes ).
>
> See doc information above. It uses the SET timezone. See below for more
> information:

I know how to set the timezone and operate with it, and I know that for
the purpose of calculating elapsed seconds ( minutes / hours ) both
approaches are the same. What I'm saying is you must take care with how
you define and calculate your intervals if you are going to add them to
a TS directly, as an 86400 seconds one has the same epoch as a 1 day
one, but they represent different things when being added to a tstz,
and I know that the difference depends on the current timezone.
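
A quick illustration ( minimal sketch, using Europe/Madrid and the 2020
spring DST change, dates picked just for this ): both intervals below
have an epoch of 86400, but they land on different instants when added
to a tstz:

set timezone = 'Europe/Madrid';
select '2020-03-28 12:00:00'::timestamptz + '1 day'::interval;
-- 2020-03-29 12:00:00+02   ( only 23 real hours later, the clocks jumped )
select '2020-03-28 12:00:00'::timestamptz + '24:00:00'::interval;
-- 2020-03-29 13:00:00+02   ( 24 real hours later )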

I do a lot of these, working with phone calls to extract ring / talk
time from setup/connect/disconnect times, and I do not hit problems
because I never do interval arithmetic on elapsed times. It's not a
problem if you are aware that there are "infinite" ways to represent
an epoch in an interval.

Francisco Olarte.




Re: How to represent a bi-directional list in db?

2019-09-22 Thread Francisco Olarte
Pankaj:

On Sun, Sep 22, 2019 at 4:25 PM Pankaj Jangid  wrote:
> CREATE TABLE stages (
>id SERIAL PRIMARY KEY,
>name  VARCHAR(80) NOT NULL,
>created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
>updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
>prev_stage_id SERIAL REFERENCES stages NULL,
>next_stage_id SERIAL REFERENCES stages NULL,
>process_id SERIAL REFERENCES processes NOT NULL
> );
> Failed with: conflicting NULL/NOT NULL declarations for column
> "prev_stage_id" of table "stages"
> Is it not possible to create "nullable" self referencing foreign keys?

Serial seems wrong. It means integer, not null, default next value from
a sequence.

What you probably want is just "prev_stage_id INTEGER" ( NULL by
default ), as you do not want the prev/next stage ids to be generated,
you normally would want to assign values from other tuples.

Also, you may have problems populating this kind of table, as you will
not have the ids from either prev or next stage when building it.

And lastly, in SQL you do not really need a doubly linked list, just
populate prev_stage_id, and index it and you can query next stage of a
tuple using it.
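
Something like this ( just a sketch trimmed from your definition,
keeping the processes table you already have ) is probably what you are
after:

CREATE TABLE stages (
  id            SERIAL PRIMARY KEY,
  name          VARCHAR(80) NOT NULL,
  prev_stage_id INTEGER REFERENCES stages,        -- NULL for the first stage
  process_id    INTEGER NOT NULL REFERENCES processes
);
CREATE INDEX ON stages (prev_stage_id);

-- "next" stage of stage 123, no next_stage_id column needed:
SELECT * FROM stages WHERE prev_stage_id = 123;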

Francisco Olarte.




Re: How to represent a bi-directional list in db?

2019-09-23 Thread Francisco Olarte
Pankaj:


On Mon, Sep 23, 2019 at 4:07 AM Pankaj Jangid  wrote:
> Thanks. This resolved my problem of NULL/NOT NULL conflict. I wasn't
> aware that SERIAL is by default NOT NULL.

Not only that. Once you strip the annoying NOT NULL the only thing
remaining on a serial column is a "default nextval", which you
normally do not want ( you could think of populating the table in
creative ways, but they are on a different sequence than the one you
use for the ID column ).

> > Also, you may have problems populating this kind of table, as you will
> > not have the ids from either prev or next stage when building it.
> If NULL value is allowed I can fill it up with NULL initially. Right? Or
> is there something wrong here.

There is not. You can use (id,prev,next) = (1,null,null) and then
update, but you are going to need to travel up and down a lot, or
store a lot of data. If you use the trick I mention later of just
using "prev", you can, on a table having (id=serial, prev=int),
build a sequence by starting with prev_id=null, doing insert
(id,prev,other_data) ... returning id, and copying the returned id
into the prev_id of the next row, rinse and repeat.
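
In SQL that chain-building looks more or less like this ( sketch,
names and values invented ):

INSERT INTO stages (name, prev_stage_id, process_id)
  VALUES ('first stage', NULL, 1) RETURNING id;   -- say it returns 10
INSERT INTO stages (name, prev_stage_id, process_id)
  VALUES ('second stage', 10, 1) RETURNING id;    -- returns 11, use it for the next one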

Also note that you can query the sequence AND advance it and then
insert all rows without default values.

> > And lastly, in SQL you do not really need a doubly linked list, just
> > populate prev_stage_id, and index it and you can query next stage of a
> > tuple using it.
> Could you please elaborate? Suppose I have this table,
> CREATE TABLE stages (
> id SERIAL PRIMARY KEY,
> name VARCHAR(80) NOT NULL,
> next_id INTEGER REFERENCE stages NULL,
> );
> What would be the backward query in that case? Forward is clear. This is
> forward query,
> SELECT name FROM stages WHERE next_id = 123;

No. That is a BACKWARDS QUERY. You are in row 123, you go BACK to its
preceding one.
If you need a traversable list containing (ListID, id, name) = x,1,a;
x,2,b; x,3,c ( I've added the ListId column to make it more
interesting/realistic, you normally do not have a single table ),
in sql you can build a (ListId, id, prev_id, name ) table ( PREV is
easier, as when you insert a row, in a normal application, you know
the previous one, but not the next one ) with the data
(x,1,null,a),(x,2,1,b),(x,3,2,c) ( the null prev_id in the first row is
a synthetic sentinel and assumes a nullable column ). You can do it in
a lot of ways.

To traverse it forward you just query "select id where listid=x and
prev_id is null" to locate the head (1), and then go forward by
selecting with prev_id = the last id you got, until you hit zero results.
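
I.e., something like this ( sketch, using a hypothetical the_list table
with the (listid, id, prev_id, name) layout above, :last_id standing
for the id you just fetched ):

-- head of list x:
SELECT id, name FROM the_list WHERE listid = 'x' AND prev_id IS NULL;
-- element after the one you just got:
SELECT id, name FROM the_list WHERE listid = 'x' AND prev_id = :last_id;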

To traverse backwards there are several ways. In the real cases I've
used I always had a "list row" where I could store the node for the
1st stage. In those cases I linked them circularly, (id=1, prev=3), so
bidirectional traversing was easy. Or you can use a special sentinel
node ( with a marker, like name=null ). The thing is you locate the
last row, and then just query with id = the prev_id you just got. I do
not remember the details, but probably your "stages" are stages of
something which has a row, which can readily have a "first_stage_id"
or something similar.

Lists in tables are not the same as in C, where you directly store
pointers which point outwards. In this case any unique data serves as
a pointer, slow ( table scan ) by default, faster if you index the
column.

Anyway, unless you need the "linked list" functionality for something
( really heavy manipulation of large stage lists, splicing things
around ), I've normally found it's easier, in sql, to model this kind
of thing with a master-detail + order column.
( whatever = (id, , first_stage_id), stages=(id, order,  ) )
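
Roughly ( sketch, names invented ):

CREATE TABLE processes ( id SERIAL PRIMARY KEY, name TEXT NOT NULL );
CREATE TABLE stages (
  process_id INTEGER NOT NULL REFERENCES processes,
  ord        INTEGER NOT NULL,
  name       TEXT NOT NULL,
  PRIMARY KEY (process_id, ord)
);
-- stages of process 1 in order; "previous"/"next" are just ord-1 / ord+1:
SELECT * FROM stages WHERE process_id = 1 ORDER BY ord;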


Francisco Olarte.




Re: How to represent a bi-directional list in db?

2019-09-24 Thread Francisco Olarte
Pankaj:

On Mon, Sep 23, 2019 at 4:07 PM Pankaj Jangid  wrote:
...
> My stages are stages of processes. So yes processes are also stored in a
> table. I got the idea. I'll add another column in the processes table
> which points to the first stage (first_stage_id). And quries
> Forward pass:
...
> Backward pass:
..
> This is assuming I also create a circular list. I can also store
> last_stage_id in the process table if we don't want to create circular
> list in db.

That's exactly the idea. A pointer in C becomes a pair of fields with
corresponding value, and it can be traversed in any direction, slowly
by table scan or fast with indexes. In fact a pair of values becomes
equivalent to two pointers, as it can be traversed either way ( but
the indexing acceleration has to be applied to each direction ).

That being said, linked lists are procedural data structures, SQL is
declarative, so they are not a great match, that's one of the reasons
why they are rarely seen. Things like master-detail have less
impedance mismatch.

Francisco Olarte.




Re: SELECT d02name::bytea FROM ... && DBI::Pg

2019-10-10 Thread Francisco Olarte
Matthias:

On Thu, Oct 10, 2019 at 1:25 PM Matthias Apitz  wrote:
> I can SELECT && print a column in hex with:
> pos71=# select d02name::bytea from d02ben where d02bnr = '1048313' ;
> ...
>  \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020 
> ...
>
> but when I use the same in Perl DBI::Pg with:
> $sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = 
> '1048313'");
...
> It prints the UTF-8 string and not the hex string:

Maybe because perl does not need the contents in hex, DBI converts it
to a native string, which is very similar to a "bytea" inside the db (
not in the wire protocol, but it does a similar thing for numbers ).

Notice once you have a string in perl it is trivial to convert it to hex:
$ perl -e 'print(unpack("H*", "1234ABCD"),"\n")'
3132333441424344
$ perl -e 'print(unpack("H*", "día año"),"\n")'
64c3ad612061c3b16f

Francisco Olarte.




Re: SELECT d02name::bytea FROM ... && DBI::Pg

2019-10-12 Thread Francisco Olarte
Matthias:


On Thu, Oct 10, 2019 at 7:26 PM Matthias Apitz  wrote:
> Hmm. But *I* do need the content in hex to see if the varchar column
> contains correct encoded UTF-8 data. We're on the way to port a huge
> database application from Sybase to PostgreSQL and are facing any kind of
> problems one can think of. Magically, sometimes strings, expected to be
> coded in UTF-8, arrive in the Perl $variables coded in ISO-8859-1 and than 
> cause
> other problems when German Umlauts should be translated into HTML
> encodings like ü etc. to be presented in the web browser.

This seems to be a perl problem ( I've had my share of them ). I
suppose you can convince perl to upgrade all your vars to utf-8, but I
do not remember how.

Anyway, if you want "the hex representation of the bytea equivalent of
a field" ( of which I do not remember the original type ), why don't
you ask for it?

If you ask for bytea you are asking for a binary string. The fact it
is transmitted hex-encoded in the wire is an implementation detail.

psql is a text mode interface. To represent binary strings to you it
needs text, so it uses hex.

perl can do binary, so it puts the content in binary. After all your
bytea could have been a jpeg image for all perl/DBI knows.

But you can easily tell perl to pass a binary to hex, with unpack.

OTOH, if you want, you can tell postgres to send you hex, look for
encode/decode in the relevant manual pages, they are under binary
string functions, IIRC. The pg will build a text hex string, send it
on the wire by whatever mechanism it chooses and you'll get the hex
data from DBI. Do not ask pg+DBI for binary ( ::bytea ) and expect
text.
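
For instance ( untested sketch reusing the names and the cast from your
own query ), this returns the hex as plain text, so DBI hands you
exactly what psql was showing:

SELECT encode(d02name::bytea, 'hex') AS d02name_hex
  FROM d02ben WHERE d02bnr = '1048313';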

> Perl (and Java) sucks, it does magic things below the surface of
> string (objects). That's why I like C :-)

They have differing degrees of suckiness. I've read the Java String
sources, and been horrified by them. Perl, OTOH, sucks for many
things, but has its points. And C, I try to use C++ for everything I
can ( many times using it as just a better, for me, C compiler;
I've been known for writing huge chunks of C++ but using malloc/stdio
and friends all around, but I've found programs have way fewer bugs
when using tighter types ). It's a taste question, and of course, I
wouldn't like to write in C the hundreds of <100-line perl scripts I
do for parsing huge texts and extracting some data. But utf-8/latin-1,
you're right, you have umlauts, we have accents and n-tildes, I've been
there and will be again. Enough off-topic anyway.

Francisco Olarte.




Re: Is there any configuration in postgresql.conf or any other configuration of postgres which will make this possible to listen on particular interface

2019-10-16 Thread Francisco Olarte
On Wed, Oct 16, 2019 at 12:07 PM M Tarkeshwar Rao
 wrote:
> Above processes are bind on all the interfaces of a machine. Is there any 
> configuration in postgresql.conf or any other configuration of postgres which 
> will make this possible to listen on particular interface?

Listening on an INTERFACE is a tricky thing. But if you can manage
with listening on a particular ADDRESS and letting the os pick the
interfaces for you, you can try listen_addresses ( which is trivially
found on 
https://www.postgresql.org/docs/12/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS
, although being lazy I just searched for "listen" on the manual index
)
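
Something like this in postgresql.conf ( sketch, the address is made
up ), followed by a restart:

listen_addresses = 'localhost, 192.168.100.4'   # instead of '*', which binds every interface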

Francisco Olarte.




Re: CPU SPIKE

2019-10-21 Thread Francisco Olarte
You'll have to provide more data:

On Mon, Oct 21, 2019 at 7:14 AM Pawan Sharma  wrote:
> Having real high CPU issue (95-98%), with SELECT statements and select 
> queries contains multiple AND operator, is it will cause any CPU Spike..???

If you run a query, the only reason it is not hitting 100% cpu is that
it is waiting for something, like disk or network or locked data, so
with all data cached and fast networks ( or small results ) a
standalone query will use 100% CPU during its run ( it must; if it
does not, something is rotten in the scheduler ).

If some set of queries is using 100% cpu for, say, 5 minutes, they may
just be working on cached data; that is normal.

A proper question would be "x queries of type y are eating 95% CPU on
my machine Z, with this type of data; I think it is not correct
because blah-blah". A spike during query execution is the correct
outcome. CPU time is not like petrol: if you do not use it, it is lost,
and care is taken, doing things like parallel jobs, to ensure queries
use as much CPU time as possible ( because if a query needs, say, 5
core-seconds, using 10% of an octacore takes 6.25 wall seconds, while
using 100% takes 0.625 ).

Also note that, locks/big result transmissions aside, a fully cached DB
is cpu-limited ( and would be so for selects with complex conditions );
I would expect 100% usage if enough clients ( for 24 cpus ) are doing
complex queries against a cached database.

Your problem may be "it is using 100% of 24 cpus for a minute"
where you think "it should be just a sec", but the 100% figure is
better, you do not want your cpu to sit idle.

> apps team is using sub-partition, PG11, CPU:24, Mem: 16GB
...
>  effective_cache_size
> --
>  22GB
...
>  max_worker_processes
> --
>  8

I may be misled, but isn't 16GB a little spartan for 24 CPUs with 8
workers ( per query? ). Also, I assume the 22GB effective_cache_size is
because you are accounting for a huge host cache.

Francisco Olarte.




Re: Upgrade procedure

2019-10-30 Thread Francisco Olarte
On Wed, Oct 30, 2019 at 6:00 PM rihad  wrote:
> Hi, all. Why is it normally suggested to stop the server, upgrade it,
> then start it? Wouldn't it be easier & quicker to simply upgrade the
> package in-place and restart the service? On OSen that allow
> modification of currently running binaries, which is most Unix OS, M$
> Windows being a notable exception )

Not sure exactly why, and whether postmaster protects itself from
this, but in many years of using linux, which supports it, I've
encountered my share of (minor) problems when doing that, typically
due to the program not having loaded all the modules it can use (
firefox is a classic for me, as I have it normally open while I do my
background updates, nothing a restart does not solve, but I would get
nervous doing that to the database ).

That being said, I've upgraded my TEST servers without stopping them (
although lately the upgrade scripts do a restart at the end ), but for
production I prefer to download everything and prepare as much as I
can and do a stop / finish upgrade / start; the last ( file extracting )
phase is normally much faster than the server restart machinery, so not
much is gained by doing it in parallel.

And another thing: MODIFICATION of currently running binaries is BAD,
and IIRC many OSes use them for paging and will not be happy ( maybe
these days they transparently switch to normal swap ). What you
normally want is unlinking and replacing, i.e. modification of the
names aka dir contents ( not being able to do this is one of my
strongest dislikes of windows; being able to makes so many things so
much simpler ).

Francisco Olarte.




Re: Getting following error in using cursor to fetch the records from a large table in c language(current transaction is aborted, commands ignored until end of transaction block)

2019-11-01 Thread Francisco Olarte
On Fri, Nov 1, 2019 at 1:44 PM M Tarkeshwar Rao
 wrote:
> Getting following error in using cursor to fetch the records from a large 
> table in c language.

Regarding this, "c language", I'll comment later


> Can you please suggest why it is coming and what is the remedy for this.

> Nov  1 13:21:54 sprintstd2 postgres[18604]: [10-1] < 2019-11-01 13:21:54.212 
> CET > ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block
> Nov  1 13:21:54 sprintstd2 postgres[18604]: [10-2] < 2019-11-01 13:21:54.212 
> CET > STATEMENT:  BEGIN
> Nov  1 13:21:54 sprintstd2 postgres[18604]: [11-1] < 2019-11-01 13:21:54.324 
> CET > ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block
> Nov  1 13:21:54 sprintstd2 postgres[18604]: [11-2] < 2019-11-01 13:21:54.324 
> CET > STATEMENT:  BEGIN
> Nov  1 13:21:54 sprintstd2 postgres[18604]: [12-1] < 2019-11-01 13:21:54.356 
> CET > ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block
> Nov  1 13:21:54 sprintstd2 postgres[18604]: [12-2] < 2019-11-01 13:21:54.356 
> CET > STATEMENT:  BEGIN
> Nov  1 13:21:54 sprintstd2 postgres[18604]: [13-1] < 2019-11-01 13:21:54.360 
> CET > ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block
> Nov  1 13:21:54 sprintstd2 postgres[18604]: [13-2] < 2019-11-01 13:21:54.360 
> CET > STATEMENT

This seems incomplete, but it's telling you the cause. You had an
error, and you need to terminate the transaction before issuing a new
one, i.e., do a commit ( which, IIRC, will rollback if the transaction
is in error ) or a rollback.

> Sample Code snippet used

As you stated C I cannot comment too much, but notice:


> theCursorDec = (RWCString)"DECLARE " +  mySqlCursor + " CURSOR FOR " 
> + theSql;
> myFetchSql = "FETCH " + fetchStr + " IN " + mySqlCursor;

Neither of these are C, they COULD be C++

> // Begin the cursor
Same as this comment.

> PQexec(connection, ,"BEGIN"))
> PQexec(connection, ,"myFetchSql”)

And these are definitely not C ( no ; ) and, if you generated them by
editing, myFetchSql is quoted, which smells fishy.

I won't comment more until you confirm that is the real code, but
anyway  it seems to me you issue transaction start queries without
properly terminating them with a transaction end one.

Francisco Olarte




Re: Getting following error in using cursor to fetch the records from a large table in c language(current transaction is aborted, commands ignored until end of transaction block)

2019-11-03 Thread Francisco Olarte
M Tarkeshwar Rao:

On Sat, Nov 2, 2019 at 5:16 PM M Tarkeshwar Rao
 wrote:
> Thanks Yes it is in c++. Actually we just written this code.
> Due to vaccum full cursor query failing on a connection and all the 
> subsequent queries are failing and we
> found shared errors in /var/logs of the postgres installed machine.
... Some more things followed by a huge bottom quote-to-the-sig.

Your writing style makes it really hard to decipher what you are
trying to say. I hope your problem has been solved, because I am not
going to try to decipher it or comment more.

Francisco Olarte.




Re:

2019-11-21 Thread Francisco Olarte
Sébastien:

On Thu, Nov 21, 2019 at 3:42 PM Sébastien Bihorel  wrote:
.
> /home/sebastien/data $ ls -l
> drwxr-xr-x 2 sebastien postgres 4096 Nov 19 09:14 pgdata96_sebastien
> Starting psql as sebastien, I tried the following requests and got the 
> following errors:
...
> The manual states "The location must be an existing, empty directory that is 
> owned by the PostgreSQL operating system user." So, I am not sure what I am 
> doing wrong. Web searches ran on these error messages did not enlighten me 
> more...

Your directory is owned by OS USER sebastien GROUP postgres, try
"chown -v postgres /home/sebastien/data " to fix it.

The server ( postgres ) does not have permissions ( rwx >>r-x<< r-x,
only read and chdir ) to create anything in there.

The fact you start psql as sebastien does not matter, it must be owned
by the SERVER user ( psql is a separate program, it communicates over a
socket and can even be on another machine ).

Francisco Olarte.




Re: System column xmin makes anonymity hard

2020-05-12 Thread Francisco Olarte
Johannes.

On Tue, May 12, 2020 at 8:05 PM Johannes Linke  wrote:
> since 9.4, VACUUM FREEZE just sets a flag bit instead of overwriting xmin 
> with FrozenTransactionId [1]. This makes it harder to build applications with 
> a focus on data reduction.
> We have an app that lets people anonymously vote on stuff exactly once. So we 
> save the vote in one table without any explicit connection to the voting 
> user, and separate from that a flag that this person gave their vote. That 
> has to happen in the same transaction for obvious reasons, but now the xmin 
> of those two data points allows to connect them and to de-anonymize the vote.

> We can of course obfuscate this connection, but our goal is to not keep this 
> data at all to make it impossible to de-anonymize all existing votes even 
> when gaining access to the server. The best idea we had so far is more of a 
> workaround: Do dummy updates to large parts of the vote table on every insert 
> so lots of tuples have the same xmin, and them VACUUMing.[2]

And even without the xmin someone could dump ctids and correlate them
if you are not careful.

Your problem is going to be hard to solve without taking extra steps. I
think doing a transaction which moves all the votes for a period ( using
an insert into with the result of a delete returning ) and then inserts
them back ( with something like an insert into of a select order by
random() ) may work ( you may even throw in a shuffled flag along the
way ). And then throw in a vacuum so the next batch of inserts
overwrites the freed space.
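
A sketch of that shuffle ( untested, table/column names invented ),
all in one statement so nothing is lost along the way:

BEGIN;
WITH moved AS (
  DELETE FROM votes RETURNING choice
)
INSERT INTO votes (choice)
SELECT choice FROM moved ORDER BY random();
COMMIT;
VACUUM votes;   -- outside the transaction, so the next inserts reuse the freed space

The reinserted rows all share one fresh xmin and a shuffled physical
order, which breaks the correlation with the flag table.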

But for someone with the appropriate access to the system, partial
de-anonymization is possible unless you take very good measures. Think
of it, here in Spain we use ballot boxes, but voter order is recorded
( they do a double entry check, you get searched in an alphabetic list,
your name is copied onto a time ordered list, and your position on the
list recorded in the alphabetic one, all on paper, nice system, easy
to audit, hard to cheat ). If you can freeze time, you can carefully
pick up votes from the box and partially correlate them with the list,
even with boxes much larger than the voting envelopes, as they tend to
stack in a nice order. And this is with paper; computers are much
better at purposelessly ordering everything, because it is easier to
do it that way.

> Does anyone have a suggestion better than this? Is there any chance this 
> changes anytime soon? Should I post this to -hackers?

Something which may be useful is to use a staging table for newly
inserted votes and move them in batches, shuffling them, to a more
permanent one periodically, and use a view to join them. You can even
do that with some fancy partitioning and an extra field. And move
some users' already-voted flags too, in a different transaction. Doing
some of these things and adding some old votes to the moving sets
should make things difficult to track, but it all depends on how
hard your anonymization requirements are ( I mean, the paper system
I've described leaves my vote perfectly identifiable when I've just
voted, but it is regarded as a non issue in general, and I suspect any
system you can think of leaves the last vote identifiable for a finite
amount of time ). In general, move data around, in single transactions
so you do not lose anything, like shaking a ballot box periodically (
but ensure the lid is properly taped first ).

Francisco Olarte.




Re: Catching errors with Perl DBI

2020-07-02 Thread Francisco Olarte
Stan:

On Thu, Jul 2, 2020 at 5:03 PM stan  wrote:

> How can I catch the errors generated whne I call an INSERT that violates a
> constraint? I have coded like this:
>
>  my $sth = $dbh->prepare($stmt);
>  my $rv = $sth->execute() or die $DBI::errstr;
>  if ( $rv < 0 ) {
>  print $DBI::errstr;
>  }
>
> But, if the INSERT violates a constraint, it never gets the the evaluation
> of the $rv

I assume you mean the if ($rv < 0) is what is not executed.

In perl this happens because something died. I assume it is not the
one you coded. This means some of your handles have the RaiseError
attribute; look it up in the perldoc.

> Is this a setting for the DBI?

I do not remember if it has a global setting, but it sure has a
database handle setting ( which percolates down ). I use it routinely
for easier error handling.

I'm not sure from your message if you know how to, but if something is
dying you can use the block eval construct:

eval {
    # potentially dying code...
    my $sth = $dbh->prepare($stmt);
    my $rv = $sth->execute() or die $DBI::errstr;
    if ( $rv < 0 ) {
        print $DBI::errstr;
    }
    1;  # This forces the eval to return true if execution gets here.
} or do {
    # Whatever you want, $@ has the codes.
};

to trap it in perl.

About RaiseError, it is common to set it to true in the handle, put
all your code in a sub() and catch there, in programs where you do not
have easy recovery of errors, and use a local eval if needed to catch
this kind of primary key violation.

Francisco Olarte.




Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread Francisco Olarte
Robins:

On Mon, Jul 6, 2020 at 1:37 PM Robins Tharakan  wrote:

> When an SQL needs to UNION constants on either side, it should be possible to
> implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect 
> understanding,
> or something already discussed but rejected for some reason?

Maybe the optimization does not hold its weight. I mean, the increased
complexity in the optimizer, bigger memory footprint, testing and
developer usage, is not worth it.

> This need came up while reviewing generated SQL, where the need was to return 
> true when
> at least one of two lists had a row. A simplified version is given below:

I.e., I do not think this is a "need", especially in generated SQL; it
seems more like a deficiency in the generator ( especially since
generators are able, and some do it, to massively overcondition the
generated code to ensure the optimizer does not miss anything ), and
wrapping things in a limit 1 when just testing for row existence seems
easy to do while generating.

> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
> vs.
> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); -- 
> Faster

As an aside, isn't it easier, more correct ( in my opinion ) and
clearer to just use exists for a row-existence test? You have to
check whether there is a result above anyway, probably using exists,
and you can do...

select exists(SELECT 1 FROM pg_class) or exists(SELECT 1 FROM pg_class);

to get a direct boolean and benefit from short-circuiting, by putting
the most likely one first, and from whatever internal knowledge the
optimizer has to avoid fully evaluating the queries, which may be more
than what it can deduce in the union/limit case.

Francisco Olarte.




Re: Doubt in mvcc

2020-07-13 Thread Francisco Olarte
Rama:

On Mon, Jul 13, 2020 at 9:52 AM Rama Krishnan  wrote:
> I m preparing for interview one of the recruiter asked me mvcc drawbacks as i 
> told due to mvcc it use more space and need to perform maintenance activity.
> Another one is the same data causes an update conflict because two different 
> transactions can update the same version of the row.
>  he told its wrong, kindly tell me will you please tell me its correct or 
> wrong?

I'm not sure I understand your question too well, you may want to
rephrase/expand it.

One interpretation is: in a pure MVCC context, two transactions, say 5
and 6, could try to update a tuple valid for [1,) and end up
generating two new tuples, [5,), [6,), and closing the original at
either [1,5) or [1,6).

That's why MVCC is just one piece; locking is another. In MVCC the
tuples are locked while a transaction manipulates them. Other
transactions may read them, which is why readers do not block writers,
but two updates on the same tuple serialize.


Francisco Olarte.




Re: Doubt in mvcc

2020-07-13 Thread Francisco Olarte
Rama:

1st of all, please do not top post, especially if you want to discuss
examples; I'm not going to hunt up and down for the relevant sections.

On Mon, Jul 13, 2020 at 10:52 AM Rama Krishnan  wrote:
> Still, I have a doubt as per your example both are trying to update the same 
> tuple so it may produce two different copies right?

First, a loud warning. MY EXAMPLE IS NOT A REAL EXAMPLE OF HOW A REAL
MVCC DATABASE WORKS. It just tried to show that if you just use the
min-max transactions on storage, without using more techniques which
many people will consider part of MVCC, it will not work.


> I read some blocks they mentioned drawback above two things
> It may lead to lost update also
> For example, two transactions are going to increase the amount on the same 
> account by $100 . The first transaction reads the current value ($1000) and 
> then the second transaction reads the same value. The first transaction 
> increases the amount (this gives $1100) and writes this value. The second 
> transaction acts the same way: it gets the same $1100 and writes this value. 
> As a result, the customer lost $100.

PROPER mvcc, like postgres does it, will not lead to this if used
properly. If both transactions use UPDATE, the first will read AND lock
the row, update the value, write it, commit and unlock. The second will
try to read, WAIT for the lock, then read 1100 and write 1200.

Similar things happen if both use select for update and/or use the
adequate isolation levels to force the engine to take appropriate locks.

BUT if both transactions do a select, wait for a bit, then do an
update set ( NOTE: for a banking application the CORRECT way to do a
deposit is "update accounts set balance=balance+100", not "select
balance from accounts into $B; update accounts set balance=$B+100" ),
you may end up with what looks like a lost update, but that is really
not a DB problem. If you do it in two ops, the DB does not know they
correlate. For all it knows your bank might be a room with money in
tin boxes: you went to check a box, counted it, noticed the
discrepancy and sent an update to fix it, and did it twice to be sure.
Update correlates more with "I opened the box and put 100$ in without
looking at what was there". Update returning would be "and I counted
the box afterwards", and select for update would be "I took the box to
my table, counted, added 100, counted the new pile, and returned the
box to the safe".
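
In SQL ( sketch, invented accounts table ):

-- atomic deposit, no lost update:
UPDATE accounts SET balance = balance + 100 WHERE id = 42;
-- same, if you also want to see the resulting balance:
UPDATE accounts SET balance = balance + 100 WHERE id = 42 RETURNING balance;
-- read-then-write only works if the read takes the row lock:
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
UPDATE accounts SET balance = 1100 WHERE id = 42;  -- value computed by the app
COMMIT;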

> Will u please provide more details

No, I will not. You need to read a lot more than a mail can hold, MVCC
is a complex topic which I just do not fully understand and I'm not
going there. But there are tons of info floating around, more or less
easy to find.

> Serialize is the solution to this issue.

That's what locks and isolation levels do.

Francisco Olarte.




Re: Improvement for query planner? (no, not about count(*) again ;-))

2020-07-20 Thread Francisco Olarte
Tobias, 1st some etiquette stuff.

- You have replied just to me, directly. I'm CCing the list. Remember
to use reply all. Usual practice in the postgres lists is to reply to
the list and everyone involved in the thread ( doing reply all
achieves this normally ).

- It's not a biggie in this particular mail, but please do not
top-post, specially if you want to get answers on any complex
question. Trim unnecessary parts from the quoted text and reply below.
Having to scroll to a big chunk which includes even my signature is
not a thing I like.

On Mon, Jul 20, 2020 at 2:23 PM  wrote:
> I have tried the queries
> select name1 from games union select name2 from games
> but not
> select distinct name1 from games union
> select distinct name2 from games
> since it's just the same and easy for the optimizer to realize (I thought?)

There are several other things you have not done. You have not
provided any info ( statistics ) on your table, just the cardinality.
You have not provided any explain output, which is normally needed if
you really want people to help you.

On the subject, I'm not really sure both queries are identical or can
be optimized, but when one does bulk queries like that it is better to
help it. Your query is a corner case, a one-off loading, and many
optimizers do not catch those things properly, as putting code for them
means increasing the bug surface for a feature of dubious utility ( I,
personally, would prefer having to write your sample query manually
rather than risking optimizer bugs OR paying the price of the optimizer
trying to catch that on every query I send ).

Also, the optimizer may be catching many things; an explain output may
help to see what it's doing.

> I've given Postgres a few Gigs (I think 4?) as work_mem, having 16 GB in 
> total. Still it's not using them.

This does not seem correct. work_mem is per operation, it can be used
several times on a single query. See
https://www.postgresql.org/docs/12/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
. Again, some explain/show combos may clear up things.

> Seems like my mistake was creating that table with a primary key. But the 
> query itself without inserting into anything should've been fast then, which 
> it wasn't. I'll remember your trick of creating the primary key afterwards 
> and will try just select name1 from games to see how it goes.

The PK stuff is bulk-loading 101. Try explain AND explain analyze of
some variants, remember to analyze your tables ( seems redundant, but
the PK & redundant hash key stuff leads me to think you are not too
experienced on postgres usage ).

Francisco Olarte.




Re: Bytea Example

2020-08-16 Thread Francisco Olarte
Naveen:

1st, do not top post, it is frowned upon here. AAMOF I was tempted to
skip your message due to this, as it makes replying too hard.

AK asked you (twice), "What programming language and database driver
are you using?"

On Sun, Aug 16, 2020 at 7:11 PM Naveen Kumar  wrote:
> I am connected to PostgreSQL in a Windows environment. Below is my version:
> "PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit"

AK's question is relevant because pg is a client-server database. I
assume you have the server correctly installed and running, but we
need to know what client software you are using, e.g.:

- "I'm using the bundled psql command line program."
   In this case you interact by sending sql commands directly (I'm not
able to put a sample because I never use bytea from psql, but many on
this list do and can help you).

- "I'm using pgAdmin4", and some pgAdmin user can help you.

- "I'm using a C program using libpq"

- "I'm using a Java program using JDBC"

> I am just trying to practice LOB objects, like byteA data type, in 
> PostgreSQL. Unfortunately, I didn't find good links regarding this so I 
> raised the issue. Nothing else I am doing.

There are lots of links, and many people know them by heart, but you
need to help them by answering the question.

The driver/language phrasing in AK's question is because normally no one
uses a plain sql ( psql/pgadmin ) tool for LOBs, but if you do, there
are links which can be provided.

Bear in mind that for all we know you could be connecting using your
own gopher based tool which translates coloquial swahili to sql.

Francisco Olarte.




Re: Reconstructing transaction content after the fact

2020-09-02 Thread Francisco Olarte
Lionel:

On Wed, Sep 2, 2020 at 5:46 PM Lionel Bouton  wrote:
...
> The software uses a modern version of PostgreSQL (9.6 or later) and uses
> a transaction to store a vote and the corresponding register entry in
> two separate tables.
...>
> There's already an obvious angle of attack to correlate a vote and a
> register entry : the WAL. That said the WAL can be configured to be
> short lived to minimize the attack surface. By my understanding a low

> Another angle I could think of are the transaction ids. If I'm not
> mistaken they are stored on disk inside the files storing table and
> index data (my quick read of the documentation lets me think the txid is
...
> Is there a way to access these values by connecting to a PostgreSQL
> server instead of analyzing in-memory or on-disk data ?

If you assume attackers can read your disks, or even query arbitrary
tables, you are going to find plenty of potential attacks. Apart from
the xid, databases tend to lay insertions sequentially on disk, so if
your main way of filling the tables is inserting a pair you've got a
nice probability of correlating them with just two "select * from
voters/entries" ( or just dumping the raw table files if they get file
but not sql access ). In fact even paper voting can be attacked this
way ( in Spain they write every voter, in the order in which they vote,
on a sheet, and if you do not take care to shake the ballot boxes you
can get a very decent approximation to the envelope insertion order;
you will need raw access to the boxes, like you would need raw sql or
disk access in your case ). ( Not sure if "ballot box" is the correct
term. )

For something like that I would try to ensure no disk access and no raw
sql access: give the apps a single point of access to the DB,
mediating every query/operation with a stored procedure/function, and
use accounts with access to only those, even for selects, so you
have tight control and easier auditing.
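
A minimal sketch of that single point of access ( table/column/role
names invented, not a full solution ):

REVOKE ALL ON votes, register FROM app_role;

CREATE FUNCTION cast_vote(p_voter integer, p_choice integer) RETURNS void
LANGUAGE sql SECURITY DEFINER AS $$
  -- both statements run in the caller's transaction
  INSERT INTO votes (choice) VALUES (p_choice);
  UPDATE register SET has_voted = true WHERE voter_id = p_voter;
$$;

GRANT EXECUTE ON FUNCTION cast_vote(integer, integer) TO app_role;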

Francisco Olarte.




Re: Network performance optimization

2020-09-09 Thread Francisco Olarte
Fabio:

On Wed, Sep 9, 2020 at 1:05 PM Fabio Ugo Venchiarutti
 wrote:
> Even if network datagrams moved at the speed of light and with no
> serialisation/forwarding delay (which they don't), you're still going to
> deal with several thousand KMs of distance; I'm positively surprised
> you're getting such short round-trip times as it stands.

Light travels at about a foot per nanosecond in air/vacuum, and about
20 cm per nanosecond in glass IIRC, so for round trips you cover 10 cm
per nanosecond in fiber. This amounts to 100 km of distance per
millisecond of RTT, i.e. 1200 to 1600 km in 12-16 ms.
East-1/2 are N.Virginia / Ohio, which can be from 240 to 950 km apart
( on a quick google maps measure, not several thousands ), depending
on the exact place. And Amazon has really fast pipes going between
their regions, so this is not surprising. I have 55 ms from Ireland (eu
west 1) to N.Virginia, and they are 5500 km apart over the great circle.

Francisco Olarte.




Re: How to write such a query

2020-09-19 Thread Francisco Olarte
Igor:
> My problem is that I want to emulate Access behavior.
> As I said - Access does it without changing the query internally (I presume).
> I want to do the same with PostgreSQL.

Use Access connected to Postgres.

> I'm just trying to understand how to make it work for any query
> I can have 3,4,5 tables, query them and then update the Nth record in the 
> resulting recordset.
> Access does it, PowerBuilder does it.

Now, jokes aside: Access and PowerBuilder are applications used to edit
databases ( among other things ).
Postgres is a database server which provides databases.

> I just want to understand how.

If you want to do the same thing in your app, they normally pull the
result of a query with some columns uniquely identifying the row, or
use a cursor and the "WHERE CURRENT OF" positioning. But YOUR APP
does it, postgres does not do it.
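
The cursor variant looks more or less like this ( sketch with an
invented orders table; the FOR UPDATE keeps the fetched row locked for
the positioned update ):

BEGIN;
DECLARE cur CURSOR FOR
  SELECT id, total FROM orders ORDER BY id FOR UPDATE;
FETCH FORWARD 5 FROM cur;   -- leaves the cursor on the 5th row
UPDATE orders SET total = total * 1.1 WHERE CURRENT OF cur;
COMMIT;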

When Access does it, the database is in postgres, or sql server, or
jet, or whatever. Access is not a database in the sense postgres is.
Jet is.

Francisco Olarte.




Re: How to migrate column type from uuid to serial

2020-10-07 Thread Francisco Olarte
Hemil:

On Wed, Oct 7, 2020 at 2:49 PM Hemil Ruparel  wrote:
> I was integrating a payment gateway for my app when I noticed its maximum 
> length of customer id string is 32. SIze of UUID is 36 (32 characters and 4 
> dashes). So I want to change the type of customer id to serial. The problem 
> is by now, the column is being used at many places. How to migrate the column 
> to serial without dropping the data?

A uuid is just a 128 bit number, as noted by many. Your problem is
that the default encoding is 32 hex digits plus 4 hyphens. If your
payment gateway uses a 32 CHARs string ( "maximum length of customer id
string is 32" ) you can just use other encodings. Passing to integer
and using decimal won't cut it ( you'd need 39 digits ), but using
just hex (without hyphens) will drop the string representation to 32
characters ( you can encode with a subst and postgres accepts it
without hyphens ).
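
I.e. ( sketch, made-up column names ):

-- 36-char default text form -> 32 hex chars for the gateway:
SELECT replace(customer_id::text, '-', '') AS gateway_customer_id FROM customers;
-- and postgres parses the dash-less form back into a uuid just fine:
SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a11'::uuid;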

If you want to buy a little more space for your own purposes you can
even fit 128 bits in 22 base64 chars with a couple pairs of bits to
spare, and IIRC you can do it with a creative decode/encode step after
killing the dashes.

And if your payment gateway uses unicode codepoints instead of ASCII
chars as units you could probably use more creative encodings ;-) ,
but probably using a "drop the dashes" subst in the interface will be
your simplest option.

Francisco Olarte.




Re: How to migrate column type from uuid to serial

2020-10-08 Thread Francisco Olarte
On Thu, Oct 8, 2020 at 6:14 AM Hemil Ruparel  wrote:
> I was thinking UUID was not a very good choice for id. Serial would be a 
> better one because I don't have a billion customers. It is more like a 
> thousand. So when I saw the customer ID of the payment gateway cannot accept 
> more than 32 characters, I thought UUID is overkill. So I want to migrate to 
> using a serial int instead as the primary key.,

I do not like them. They are typically random 128 bit integers. If you
think of it, random 16-bit ints are not a good idea for keys, even if
the table has only a couple dozen keys. UUIDs are the same at a bigger
scale: they look nicer and safer, but have got problems. Versions 1-2,
the ones with a mac address and theoretically guaranteed uniqueness,
were nice for their intended use, avoiding collisions without a
central authority. The namespace-hash and random versions are
practically collision free, but not theoretically so, which displeases
me. When I need to present an obscured name to an outside service (
i.e., in your case, presenting the user id to the gateway directly
leaks information, as they can estimate your customer growth ) I just
encrypt ( not hash, encrypt ) them. For these you can do something
like treating a 64 bit number as an 8 byte array and encrypting it
with a "secret" key; you send something which is random enough to the
gateway, but is easy for you to manage in the database, and preserves
the counter niceties, like knowing there are approximately 1000
customers between IDs 3000 and 4000 ( some may have been deleted /
skipped ). Also, depending on your patterns, IDs typically correlate
nicely with table order, which makes index access a bit faster, you do
not jump around. And 4 byte ints make for a much denser index than 16
byte uuids.

Francisco Olarte.




Re: Strange behavior

2020-10-10 Thread Francisco Olarte
Olivier:

On Sat, Oct 10, 2020 at 6:13 PM Olivier Leprêtre  wrote:

> I’m surprised by this behavior I noticed in pgadmin3 and postgresql 9.6
...

> select v1 from test1 where v1 not in (select v1 from test2)


This is called a correlated subquery ( google for it, it is even in
wikipedia ). It has many uses.

Basically, the v1 in the subquery can refer to the v1 from the outer query.

Get in the habit of using table-qualified ( potentially aliased ) column
names whenever you have any moderately complex query, i.e. if the inner
v1 had been v2 ( due to a typo ), writing your query as:

select t1.v1 from test1 as t1 where t1.v1 not in ( select t2.v1 from
test2 as t2 )

Would have caught it.

Francisco Olarte.




Re: Another user error?

2020-11-01 Thread Francisco Olarte
On Sun, Nov 1, 2020 at 7:54 PM Rich Shepard  wrote:
>
> When trying to populate the locations table I get this error:
> psql:locations.sql:2105: ERROR:  syntax error at or near ";"
> LINE 2105: ...E Airport Way',null,'Portland','OR','97218',null,null,null);
>   ^
> Line 2105 is the last line in the file and is terminated with the only
> semi-colon in the file. Every line prior to this one is terminated with a
> colon. All other syntax errors have been corrected.
>
> I've not encountered this issue before. Where should I look for the error?

This is nearly impossible to diagnose without a chunk of the query (
i.e., the first lines and the last ones ).

It smells like missing parentheses. If you use some editor which matches
them, try adding one BEFORE the semicolon. If it matches something
above, you've got it.

But really, show your code, otherwise you are not going to get much help.

Francisco Olarte.




Re: Another user error?

2020-11-01 Thread Francisco Olarte
Rich:

On Sun, Nov 1, 2020 at 8:04 PM Rich Shepard  wrote:
>
> On Sun, 1 Nov 2020, Francisco Olarte wrote:
>
> > This is nearly impossible to diagnose without a chunk of the query (
> > i.e., the first lines and the last ones ).

> Francisco,

> $ psql -d bustrac -f locations.sql

This is NOT the first line of the query. Even "head locations.sql" is
not guaranteed to give the first line of the query ( you know it, we
do not; it may have a few selects at the top ).

What I mean is something like 

>>>>>>>>
insert into locations (blah, blerg, blub) values
(1,2,3),
.. snip the middle lines.
(4,5,6),
(7,8,9);
<<<<<<<<<<<<

> > It smells like missing parentheses. If you use some editor which matches
> > them, try adding one BEFORE the semicolon. If it matches something
> > above, you've got it.
> Here're the last two lines:
> (2297,1,'Principal place of business','600 SE Bay 
> Boulevard',null,'Newport','OR','97365','541-265-7758','541-265-4235',null),
> (2298,1,'Principal place of business','7200 NE Airport 
> Way',null,'Portland','OR','97218',null,null,null);

> And psql found a few missing terminating parentheses and commas which were
> fixed. I expected this last run to successfully complete.

Good luck. BTW, if you know a bit of scripting in perl/awk/python or
whatever, it's normally useful to generate big insertion queries
programmatically, to avoid mismatched quote problems. And if your
editor is half-decent and you have the disk space you can normally try
to make a big file, create an empty table like the target one, and
insert into that so you can bisect the input ( and once you have
everything in spool tables you just insert the whole table into the
final one and drop it ).

I mean, you do "create table location_tmp ( like locations )". Then
split your input file into easily managed ones, say a few thousand
lines each. Insert every file into the tmp table, and bisect the ones
which fail to find the errors. Once you have everything in tmp just
insert from it into the final table.
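
I.e., something along these lines ( sketch, file names made up ):

create table location_tmp ( like locations );  -- plain LIKE copies only the columns, no PK/indexes
\i locations_part_01.sql   -- each chunk INSERTs INTO location_tmp ( or edit the table name while splitting )
\i locations_part_02.sql   -- bisect any chunk that fails
insert into locations select * from location_tmp;
drop table location_tmp;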

Francisco Olarte.




Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Francisco Olarte
Thorsten:

On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning  wrote:
...
> I've changed the new table "datagram" to be a partitioned one with
> partitions containing rows per year and per half-year. Each partition
> contains far less rows than before this way and while I only tested
> with "--compress=9" this time, the numbers are quite interesting:
>
> > unpartitioned: 6,4 GiB
> > half-yearly parts: 4,8 GiB
> > yearly parts:  4,8 GiB
>
> The interesting part this time is that the table layout for all
> partitions is the same like before, only the number of rows per table
> is different. Though, the number of rows overall is the same like
> before, the same data, IDs etc. Though, this time the dump really is
> smaller than with the OLD schema containing far more data because of
> duplicate IDs and stuff.
> I wouldn't have expected table layout to be that important.

Compression depends on detectable redundancy in the input. pg_dump
more or less gzips per-table "copy to stdout" dumps. If your data
evolves in a determined way, having it sorted by ingestion time may
increase detectable redundancy a lot, and partitioning sorts partially
by date ( or fully if you have made the partitions by range-querying
via index scan ). In this case it may not be the layout, but the
order.

Given you seem to be able to test, you may try sorting the full table
by the column you use for partitioning. IIRC cluster will do the trick
if it is indexed.

( This has happened to me compressing document dumps, presorting by
some chosen fields improved my compression ratio a bit, IIRC it was
due to the compressor searching for duplicates on a limited window
only, this is why I use "detectable" redundancy )
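
The cluster trick above would be something like this ( sketch, I am
assuming the name of the unpartitioned table and inventing the index
name; cluster rewrites the table and takes a strong lock, so test it
first ):

CREATE INDEX datagram_captured_at_idx ON datagram_unpart (captured_at);
CLUSTER datagram_unpart USING datagram_captured_at_idx;
-- then pg_dump again and compare sizes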

Francisco Olarte.




Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Francisco Olarte
Thorsten:

On Wed, Feb 10, 2021 at 12:36 PM Thorsten Schöning
 wrote:
> Guten Tag Francisco Olarte,
> am Mittwoch, 10. Februar 2021 um 10:38 schrieben Sie:
> > [...]and partitioning sorts partially
> > by date ( or fully if you have made the partitions by range-querying
> > via index scan ).[...]
> That statement is especially interesting not only because of my
> dump-size, but I'm running into the problem that queries spanning more
> than one partition seem to prefer sequential scan over using indexes.
> My indexe seems to only be used when querying the rows of one
> partition.

You know the drill: show your indexes and post some explain output for
these; those statements are too fuzzy to infer any useful advice from.

> So, the following is my definition, should that be "range-queried via
> index scan" properly? :-)
> > CREATE TABLE datagram
> > (
> >   id bigserial NOT NULL,
> >   src_re integer   NOT NULL,
> >   src_cltinteger   NOT NULL,
> >   src_meter  integer   NOT NULL,
> >   captured_attimestamp with time zone  NOT NULL,
> >   captured_rssi  smallint  NOT NULL,
> >   oms_status smallint  NOT NULL,
> >   oms_encbytea,
> >   oms_decbytea
> > ) PARTITION BY RANGE (captured_at);
> > CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM 
> > ('1970-01-01') TO ('1970-07-01');
> > CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM 
> > ('1970-07-01') TO ('1971-01-01');
> > [...]

Given that definition there is no possible index scan, as there are no indexes.

> > INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

But given this, what I was suggesting is not true. What I meant is that
if you had a big datagram_unpart table indexed by captured_at with good
index correlation ( which, given the name of the field, would be
expected ) and populated the partitions one by one by doing range
queries over datagram_unpart, you might have hit index scans on each
population query and ended up with perfectly sorted partitions, and
that could help compression.
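
If you want to test that, populating partition by partition with an
explicit sort should get you there ( sketch, using the definitions you
posted ):

INSERT INTO datagram_y1970_h1
SELECT * FROM datagram_unpart
WHERE captured_at >= '1970-01-01' AND captured_at < '1970-07-01'
ORDER BY captured_at;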

Given your query, unless the optimizer is performing some supernatural
( for me ) trickery, my bet would be on a full table scan plus
insertion, which would leave you with whatever order you had originally
for each partition.

But if your data, or your "data shape", has some natural correlation
with captured timestamps ( e.g., I know that my CDR data has it ),
putting similar data together could have improved your compression
ratios. Bear in mind I do not know it.

Anyway, I would not worry too much about the backup ratios with the
sizes you have. I would worry more about doing things like increasing
fill factors and similar things if your data processing is similar to
mine ( data enters at time captured_at, sometimes it is fixed due to
errors but normally never moves after being in the tables for a couple
of months, and when it does it is a special problem which can be
manually handled ). It will not improve the backup too much, but can
improve other things. E.g., in my big CDR ( that means call detail
record, telephony ) tables I let them insert into "normal" monthly
partitions, but when they are some months old I coalesce them into
fully sorted, unindexed, 100% filled yearly partitions. Those
partitions are then put into an "historic" schema which I do NOT
back up automatically; I only do it after the ( manually fired )
coalescing and the ( manually done ) very occasional long term fixups (
I rarely have to touch anything older than a month ).

Regards.
Francisco Olarte.




Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Francisco Olarte
Thorsten:

On Wed, Feb 10, 2021 at 12:46 PM Thorsten Schöning
 wrote:

Oops, I completely missed the second message.

> > CREATE INDEX idx_datagram_for_time_window ON datagram USING btree 
> > (src_meter, captured_at DESC);

Depending on the query and data "shape", this index may or may not be
chosen for your queries. You are in the best position to know that.

Francisco Olarte.




Re: getting tables list of other schema too

2021-02-24 Thread Francisco Olarte
Atul:

On Wed, Feb 24, 2021 at 11:24 AM Atul Kumar  wrote:
> yes I know that, but my doubt is why \dt is showing tables of other
> schemas even I  am setting the search_path.

AFAIK \dt lists "tables", not "tables in the schemas in the search path".

It states "By default, only user-created objects are shown; supply a
pattern or the S modifier to include system objects.", but this sys
schema does not seem to be a system one.

Francisco Olarte.




Re: getting tables list of other schema too

2021-02-24 Thread Francisco Olarte
On Wed, Feb 24, 2021 at 12:12 PM Francisco Olarte
 wrote:
> AFAIK \dt lists "tables", not "tables in the schemas in the search path".
> It states "By default, only user-created objects are shown; supply a
> pattern or the S modifier to include system objects.", but this sys
> schema does not seem to be a system one.

Zap that: after searching for "schema" in the docs ( incorrectly, after
posting ) I found, in a paragraph far, far away, that \d* uses the
search path.

Francisco Olarte.




Re: getting tables list of other schema too

2021-02-25 Thread Francisco Olarte
Atul.

Due to your top-posting style, and me not being a native English
speaker, I'm unable to understand your question.

As all the quotes at the bottom seemed to belong to me, I'm assuming
you referred to some of my postings.

As the last one said, I tried to point out a thing after checking some
docs, then I noticed an error on my part and tried to tell everybody
to ignore my previous post.

On Wed, Feb 24, 2021 at 3:16 PM Atul Kumar  wrote:
>
> I am sorry but I am not clear from your response, as I have created
> another instance with same version 9.6 but there no system schema or
> its tables are visible.

Regarding these, I do not know what a "system schema" is. Maybe
"public"? Note my post talked about "user created" vs "system" OBJECTS
( I've normally observed those are things like information_schema
views, pg_class and similar tables, which are always there after DB
creation and are normally needed for the server to work ). Also,
naming a schema "sys" does not make it a system schema.

Francisco Olarte




Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines

2022-03-13 Thread Francisco Olarte
Hi Daniel:

On Fri, 11 Mar 2022 at 19:38, Daniel Verite  wrote:
> > These values are 'normal'. I'm not use to CSV, but I suppose
> > such newlines
> > must be encoded, perhaps as \n, since AFAIK CSV needs to be 1 line per row,
> > no?
> No, but such fields must be enclosed by double quotes, as documented
> in RFC 4180 https://datatracker.ietf.org/doc/html/rfc4180

CSV is really poisonous. In the multiplan days, when it was nearly
RFC4180, it was tolerable, but these days, when everybody uses excel to
spit out "localized csv", it is hell ( in Spain it uses ; as delimiter
because it localizes numbers with , as the decimal separator; you may
have similar problems ).

Anyway, I was going to point out that RFC4180 is a bit misleading. In
2.1 it states:
>>>
   1.  Each record is located on a separate line, delimited by a line
   break (CRLF).  For example:

   aaa,bbb,ccc CRLF
   zzz,yyy,xxx CRLF
<<<

Which may lead you to believe you can read line by line, but several
paragraphs later, in 2.6, it says

>>>
   6.  Fields containing line breaks (CRLF), double quotes, and commas
   should be enclosed in double-quotes.  For example:

   "aaa","b CRLF
   bb","ccc" CRLF
   zzz,yyy,xxx
<<<

Which somehow contradicts 2.1.

In C/C++ it is easily parsed with a simple state machine reading char
by char, which is one of the strong points of those languages, but
reading lines as strings usually leads to complex logic.
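
You can see the quoting from psql itself; a quick sketch ( made-up
throwaway table, any recent version should behave the same ):

  create temp table csvdemo (a text, b text);
  insert into csvdemo values (E'two\nlines', 'plain');
  copy csvdemo to stdout with (format csv);

prints

  "two
  lines",plain

i.e. one logical record spread over two physical lines, held together
only by the double quotes.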

Francisco Olarte.




Re: COPY TO STDOUT WITH (FORMAT CSV, HEADER), and embedded newlines

2022-03-13 Thread Francisco Olarte
Dominique:

On Fri, 11 Mar 2022 at 21:13, Dominique Devienne  wrote:
> But sure, if TEXT does the kind of pseudo-CSV I need, I'd change it to use it.

Text, the original format for copy, is much easier to handle than CSV:
you can split the whole input on newlines to get records, split each
record on tabs to get fields, then unescape each field.

In C++ you can easily read it a char at a time and build along the way
or, if you have a whole line, unescape it in place and build a vector
of pointers into the buffer. If you are testing, the split
on newline / split on tab approach gives you a list of escaped strings
easily compared to escaped patterns.
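
For contrast, the same kind of data in text format ( sketch; the field
separator in the output is a literal tab ):

  create temp table textdemo (a text, b text);
  insert into textdemo values (E'two\nlines', 'plain');
  copy textdemo to stdout;

prints a single physical line, with the embedded newline escaped as \n:

  two\nlines	plain

so splitting on newlines, then on tabs, then unescaping is always safe.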

I've never had problems with it in decades, and in fact I use an
extension of it ( with a \E code, similar to the \N trick for NULLs,
for zero-element lines, which are not useful in db dumps but which I
need to use: "\n" decodes to {""} but I need a way to express {}, which
I emit as "\E\n" ). It is an old problem: "join by tabs, join by
newlines" makes things "prettier" but can lead to no final newline
and no way to express empty sets, while "terminate with tabs, terminate
with newlines" leads to uglier / harder to read lines but can express them.

Francisco Olarte.




Re: Fresh eyeballs needed: input into error [FIXED]

2022-04-27 Thread Francisco Olarte
Rich:

On Wed, 27 Apr 2022 at 14:38, Rich Shepard  wrote:
> On Tue, 26 Apr 2022, Bruce Momjian wrote:
> > I am curious what OS psql was using that was fixed by a re-login?
> This desktop's running Slackware64-14.2.

Were you using cut & paste / selection or something similar? Lately
I've been getting weird errors on paste operations due to bracketed
paste, which are puzzling. It seems to have interactions with readline
and other stuff depending on where the cursor is and some other
things. Last time I remember, when pasting some things into guile,
invisible chars ( paste brackets ) were intercepted by its REPL,
leading to bizarre errors. And sometimes after a relogin things are
done a bit differently and the problem disappears.




Francisco Olarte.




Re: External psql editor

2022-04-29 Thread Francisco Olarte
Hi Rich:

On Fri, 29 Apr 2022 at 17:55, Rich Shepard  wrote:
> I do all my postgres work using the psql shell. Editing a command reguires
> moving character-by-character and I'd like to use my small text editor (joe)
> because it allows more control over line movement.

I do a similar thing, but normally edit queries in an editor window
and just use selection or clipboard to paste them into the xterm where
I have psql running. I also used joe a lot ( its key sequences were
easy coming from WordStar ( cp/m -> msdos ) ).

> A web search found a stackexchange thread that suggested adding to
> ~/.bash_profile the line:
> export PSQL_EDITOR=/usr/bin/joe
> so I did this yesterday.
>
> Today I've learned that the keyboard chords I use in joe in other
> applications aren't working here. For example, C-w should delete the word to
> the right of the point (cursor location). It doesn't. Instead, it deletes
> from the cursor postion to the head of the line. C-x doesn't move the cursor
> one word to the right, but cancels the command.

What do you mean by "here"? IIRC PSQL_EDITOR sets the editor for \e,
not for the psql command line. For that you could try writing a
binding set for readline ( which you could also use in bash if you like
them ) with the joe key sequences, but I fear its editing model is a
bit different.

I've done "PSQL_EDITOR=joe psql" to refresh my memory and it is in
fact as I remembered.

> Is there a way for me to specify use of joe at the psql command line? (I'd
> use emacs but that's gross overkill.)

The use of joe AS EDITOR for a single command can be done with the
above method. The use of joe keys for editing the normal psql line
would probably require readline wizardry.

I've read joe has slave shell sessions. Another thing you could try
( I've done it with emacs shell mode ) is use that, but I fear it only
works well with single-line queries. Or look whether it has some kind
of sql mode ( interactive sql, not sql-script syntax highlighting ).


Francisco Olarte.




Re: Backing up a DB excluding certain tables

2022-05-04 Thread Francisco Olarte
Jorge:

On Wed, 4 May 2022 at 18:12, JORGE MALDONADO  wrote:
> I just found the solution (or 1 solution).
> It seems that the problem resides in tables with names containing characters 
> other than lowercase letters.
> I want to exclude the following tables from the backup:

> AspNetRoleClaims
> AspNetRoles
> AspNetUserClaims
> AspNetUserLogins
> AspNetUserRoles
> AspNetUserTokens
> AspNetUsers
> __EFMigrationsHistory
>
> One pg_dump command that worked correctly is as follows:
>
> pg_dump -f c:/temp/respaldo.backup -n riopoderoso -F p -h localhost -p 5433 
> -U postgres -W -s -T *.?sp?et* -T *.*igrations?istory  riopoderoso
>
> As you can see, uppercase letters and underscore characters were avoided in 
> both -T arguments. In this way, the dumped file was generated successfully 
> without the files listed above.
> It was not even necessary to issue the command cmd.exe /c chcp 1252 in the 
> command prompt before running pg_dump.
> I also tried different combinations of single quotes and double quotes but 
> none worked.

> This behavior of characters other than lowercase letters in table names is 
> present no matter if the database is originally created with UTF8 encoding. 
> The problem persists in such a case too.

When I initially saw your question talking about chcp and similar
stuff I skipped it, not having used Windows for more than a decade.

The problem you are seeing may be due more to your shell/OS combo than
to other things. In unix, where psql is easier to work with, the OS
passes a program its arguments vector already properly separated. The
different shells are responsible for building those arguments,
processing wildcards and other things, and they have very well
documented quoting rules to let the user generate exactly what he
wants passed to the programs. This is why in unix we have to quote *
whenever it appears in a table name and similar stuff.

IIRC in MSDOS mode, which Windows inherited, the shell does some
substitution on the command line and then passes it whole to the
process, which, if it is a C program, is then responsible for reparsing
it, reprocessing quotes, expanding wildcards and calling main. Quoting
is poorly documented and processing may vary for each program using
different CRTs ( that is C runtime, not cathode ray tube ).

Now for the thing. I'm going to use braces for quoting... If you need
to send a table name with uppercase letters to postgres-related
programs you normally need to ensure it receives the argument with
double quotes, i.e. {"AspNetRoles"}. In unix this is easy; in the bash
shell I use, the easy way is to surround the double quotes with single
quotes, {'"AspNetRoles"'} ( you may need a non-proportional font to
read that ), or just escape the quotes {\"AspNetRoles\"}, or several
other variations. But in windows the quoting rules are difficult to
master, and I'm not sure if you can do that easily ( back in the day I
had a program which dumped the arguments it received to check what the
shell was doing to my command line ).

At the end of https://www.postgresql.org/docs/14/app-pgdump.html there
is an example of double quoting, but I do not know if it is for a
windows shell. It would work with unix shells, but it is a little
convoluted, so maybe it is the thing to try ( try it with a schema-only
dump redirected to /dev/null; I do not remember the windows name, I
think it was NUL: ).

And I remember pg_dump had a catalog mode, but it seems to have
disappeared in recent versions ( you gave it a switch, it wrote a
commented list of IDs which you could edit ( normally avoiding
reorders ) and feed back with another switch to make partial backups;
it was really useful for selective backups of complex stuff ).

Francisco Olarte.




Re: Backing up a DB excluding certain tables

2022-05-04 Thread Francisco Olarte
Replying to self...

On Wed, 4 May 2022 at 19:19, Francisco Olarte  wrote:

> An I remember pg_dump had a catalog mode, but it seems to have
> dissapeared in recent versions ( you gave it a switch, it wrote a
> commented list
> of IDs which you could edit ( normally avoiding reorders ) and feed
> back with another switch to make partial backups, it was really useful
> for selective
> backups of complex stuff ).

Got it wrong: after consulting some old notes, it is pg_restore which
has the -l/-L switches. What we did was full-dump a db for upgrading,
write the TOC, edit the TOC to restore only the schema of some tables,
restore it faster, go live, then use the rest of the TOC to restore the
missing tables ( they were huge logs, it just affected reports which
could wait ).

Francisco Olarte.




Re: Psycopg3 fails to resolve 'timezone localtime' on MacOS

2022-05-06 Thread Francisco Olarte
Jerry:

On Fri, 6 May 2022 at 03:04, Jerry Sievers  wrote:
> Has anyone run into This?
> Psycopg3 fails to resolve timezone localtime on MacOS (Catalina).
>
> It falls back to UTC regardless of whether running with/without the
> tzdata package which I did try.
>
> There is a /etc/localtime symlink on this box pointed at the correct
> file in the tz data directory, which apparently doesn't get used.
>
> Adding a temporary symlink directly into the tz data directory got
> things working but I'm skeptical of such a hack.

I'm not using Mac, but Linux, and not python, but I think you have
some concepts mixed.

In Linux, and it seems Mac is pretty similar, to define a timezone you
put something readable in tzfile(5) format  ( i.e. a file or a symlink
to a file as you did ) in /usr/share/zoneinfo ( this is the "tzdata
directory" ).

Then, to set the system default timezone you copy ( or link ) a tzfile
in /etc/localtime. Note this DOES NOT define a named timezone that you
can use. And there is NOT a timezone named "localtime".

Normally you use a symbolic link from /etc/localtime to the timezone
directory, which has the nice side effect of letting you read the link
to know the name, but it does not have to be. I think this is leading
you into thinking "localtime" is a valid name, which it is not, or was
not until you defined it by creating the name with the temporary
symlink.

If you want to have a zone named "localtime" pointing to the default
zone, adding a symlink in the tzdata dir named "localtime" ( directory
entries are what define the names ) pointing to /etc/localtime will do
it, but I would not recommend it.

Your errors are pretty strange. Are you exporting TZ, or whichever env
var MacOS uses, or a similar thing set to "localtime" in any of the
relevant environments ( i.e., the server or client process ), or
setting the server timezone to "localtime"?

Because normally when you want "localtime" what you want is the
process' system default timezone, so you do not set anything ( $TZ
included ) in the process, and let the library fall back to the system
default timezone ( unnamed ). This seems like some configuration
problem.

Francisco Olarte.




Re: generated column cast from timestamptz to timestamp not OK.

2022-05-13 Thread Francisco Olarte
On Fri, 13 May 2022 at 12:47, alias  wrote:
> CREATE TABLE test_g (
> a timestamptz,
> b timestamp GENERATED ALWAYS AS (a::timestamp) STORED
> );
> then an error occurred.
>> ERROR:  42P17: generation expression is not immutable

A cast to timestamp uses the current session time zone, and the current
session time zone is not immutable.

Try forcing the time zone ( a AT TIME ZONE $whatever ), or use a view
if you need it.
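
A minimal sketch of the first option ( UTC is just an example, use
whichever zone makes sense for you ):

  CREATE TABLE test_g (
    a timestamptz,
    b timestamp GENERATED ALWAYS AS (a AT TIME ZONE 'UTC') STORED
  );

With a constant zone the expression is immutable, so it is accepted.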


> However  the following 2 commands is ok.
...
>> b bigint GENERATED ALWAYS AS (a::bigint) STORED
...
>> b text GENERATED ALWAYS AS (a::text) STORED

Because conversion from int to bigint is immutable ( it just extends
the sign bit ), and from varchar to text too ( they are the same ).

> I didn't get it. timestamptz changes then timestamp also changes. timestamp 
> is part of timestamptz...

I think you are falling into the common misconception that a timestamp
is something like "YYYYMMDDhhmmss" and a timestamptz is the same plus
a time zone. They are not ( look at the docs, they both have the same
size ).

A timestamptz designates a point in time; it is like a real number,
which is printed ( and read, by default ) in the timezone of the user
session. It DOES NOT STORE A TIME ZONE. Its external ( text )
representation varies with the session parameters. Use set timezone
and play a bit to see it.

A timestamp is more or less the same, but it is always printed/read as
if it were in the UTC timezone, and the time zone is not printed ( as
it is always the same ). Inside it is also just a real number. It is
just a compact and convenient way of storing "YYYYMMDDhhmmss": you
compute the time point that string would be in UTC and store that.

To convert between them in an immutable way you need to specify how,
using the AT TIME ZONE operator.

Try it on a command line.
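
Something like this ( output shown only to illustrate, your formatting
may differ ):

  set timezone = 'UTC';
  select timestamptz '2022-05-13 12:00+00';   -- 2022-05-13 12:00:00+00
  set timezone = 'Asia/Tokyo';
  select timestamptz '2022-05-13 12:00+00';   -- 2022-05-13 21:00:00+09

Same stored value, different text representation; a plain timestamp
would print the same in both cases.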

FOS




Re: How to get response message

2022-06-10 Thread Francisco Olarte
On Fri, 10 Jun 2022 at 18:38, Rama Krishnan  wrote:
>
> Hi All,
>
> I am want to delete old records using function so my senior has function like 
> below but I want to get response of this particular inside query wheter it is 
> successful or failure

> How to get response of the function status

> Drop table test_old;
> Create table test_old as select * from sales where bill_date '1 year';
> Delete table sales where sales_id in (select sales_id from test_old;

I do a similar thing routinely and use a "move" ( insert into the
archive, delete from the live table where yadayada, returning
whatever ). I suppose you could do a similar trick.
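
Something along these lines ( a sketch; it assumes a sales_archive
table with the same columns as sales ):

  WITH moved AS (
    DELETE FROM sales
     WHERE bill_date < now() - interval '1 year'
     RETURNING *
  )
  INSERT INTO sales_archive SELECT * FROM moved;

It is a single statement, so it either moves the rows or fails as a
whole, and there is nothing to drop afterwards.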

As for your current version:

drop table test_old; -- Beware of this, it makes your function
dangerous: if you execute it twice you lose data.
create table test_old as delete from sales where bill_date

Re: ADD COLUMN ts tsvector GENERATED too slow

2022-07-06 Thread Francisco Olarte
On Wed, 6 Jul 2022 at 11:55, Florents Tselai  wrote:
> Also, fwiw looking at top the CPU% and MEM% activity, looks like it does data 
> crunching work.
...
> >> On 06.07.22 10:42, Florents Tselai wrote:
> >>> I have a beefy server (40+ worker processes , 40GB+ shared buffers) and a 
> >>> table holding (key text, text text,) of around 50M rows.
> >>> These are text fields extracted from 4-5 page pdfs each.

How big is your table? From your query it seems you expect more than
1M-1 characters ( the left... ), but if you have very big text columns
it may be spending a lot of time fully decompressing / reading them
( I'm not sure whether left(...) on toasted values is optimized to stop
after reading enough ). Also, it has to rewrite a lot of data to add
the column. If it takes some milliseconds per row, which I would not
discard, note that 50M rows * 1 ms/row = 50,000 s, i.e. about 13.9
hours per millisecond-per-row, so at 2 ms ( which may be right for
reading a big row, calculating the vector and writing an even bigger
row ) it would take more than a day to finish, which I would not
discard given you are asking for a heavy thing.

If you have stopped it, I would try doing a 1000-row sample in a copied
table to get an idea of the speed. Otherwise, with this query, I would
normally monitor the disk usage of the data files as an indication of
progress; I'm not sure there is anything else you could look at
without disturbing it.
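
Something along these lines ( sketch; table, column and expression are
placeholders, use whatever your real ALTER does ):

  create temp table sample_t as select * from your_table limit 1000;
  \timing on
  alter table sample_t add column ts tsvector
    generated always as (to_tsvector('english', left(body, 1000000))) stored;

then multiply the reported time by 50,000 to get a rough total for the
50M rows.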

FWIW, I would consider high mem usage normal in this kind of query;
high CPU would depend on what you call it, but it wouldn't surprise me
if it has at least one cpu running at full speed detoasting and
building vectors. I do not know if alter table can go parallel.



Francisco Olarte.




Re: ADD COLUMN ts tsvector GENERATED too slow

2022-07-06 Thread Francisco Olarte
Hi Florents:

On Wed, 6 Jul 2022 at 12:35, Florents Tselai  wrote:
> 50M+ rows and iirc pg_relation_size was north of 80GB or so.

Somebody already mentioned pg_table_size, but anyway this gives you
about 1.6 kB per row, which I would suspect is extremely low given your
pdf content and the 1M truncation you preventively applied.

When working with these kinds of sizes it really pays to do some good
ole "back of the envelope" calculations and/or some tests with small
batches. Bear in mind that if you have, say, 10 kB per row ( which I
would label as really low for pdf content ) you are going to have
500 GB of data, and if you manage to process that at 100 MB per second
it will take more than an hour ( and I think I'm estimating really low ).
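
A quick way to get the real numbers ( sketch, the table name is a
placeholder ):

  select count(*) as n_rows,
         pg_size_pretty(pg_table_size('your_table')) as table_size,
         pg_table_size('your_table') / count(*) as bytes_per_row
  from your_table;

That feeds directly into the back-of-the-envelope math.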

FOS




Re: postgresql generate ddl returns FK with `()` in it

2022-07-08 Thread Francisco Olarte
On Fri, 8 Jul 2022 at 12:37, Roman Gavrilov
 wrote:
> I have a schema with two tables, where the first table's composite 2-columns 
> FK points to the second table's 2-columns composite PK. But the order of 
> columns is reversed.

Note SQL does not look at the NAMES of the foreign key columns or the
primary key columns; it matches them by position.
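
A quick illustration ( made-up tables ):

  create table pk_t (a int, b int, primary key (b, a));
  create table fk_t (x int, y int,
    foreign key (x, y) references pk_t);

Here x is paired with b and y with a, purely by position, whatever the
names might suggest.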

Also note you will have a higher probability of success with this kind
of question if you provide the table definitions postgres is using;
pg_dump with the
  -s, --schema-only      dump only the schema, no data
  -t, --table=PATTERN    dump the specified table(s) only
options is a simple way to get those.

You have not even cited which program has this "generate ddl" command.

> When I use `Generate DDL` on the first table, FK definition comes back as:
> ALTER TABLE eval.rubric_questions ADD CONSTRAINT rubric_questions_fk FOREIGN 
> KEY (rubric_id,rubric_version_id) REFERENCES ();
> How can I fix that `();` issue? Is this documented behavior?
> (I realize that I need to fix the schema, but in time-being I need to 
> generate code using entity framework, and it chokes in that).

Ah, you expect people to hunt your link to SO:
> Full details on SO:
> https://stackoverflow.com/questions/72898296/what-is-in-postgresql-ddl

This is frowned upon and lowers your success rate. AAMOF I've only
replied to this because I'm waiting for someone to get out to lunch
and am burning some minutes; normally I would have just bitbucketed
your mail.

From the SO post one can see you are using DBeaver, if I copied it
right. This seems to be a bug in it, not in postgres. You can have a pk
on columns (a,b,c,d) and have an fk on columns (w,x,y,z); in fact it is
common, like in the classical tree (pk=id, fk=parent_id references the
same table).

But anyway, be careful: postgres does not match columns in keys by
name, but by position, and from the little detail you provided in SO
it appears that, aside from triggering a bug in DBeaver, your schema is
totally wrong and will malfunction. Post more details and someone may
be able to help you.

Regards.
   Francisco Olarte.




Re: Error when pasting function blocks into psql

2022-07-08 Thread Francisco Olarte
Hi Ludwig:

On Fri, 8 Jul 2022 at 12:37, Ludwig Isaac Lim  wrote:
> I noticed that I'm getting errors when copy and pasting code for stored 
> procedure from Windows directly into psql running in putty.
> To reproduce create a procedure that has multiple statements with each 
> statements having multiple lines, similar to the ones below:
...
> Copy and paste from Windows editor (or IDE) to a psql running in a Putty. 
> Chances are it will complain of syntax error. The error doesn't appear when I 
> copy and paste it to an empty file in vim, and the run:
> psql -f 
> Is there a way of prevent this kind of error when pasting directly into psql? 
> I'm using the latest version of PostgreSQL (server and client. Version 14.4)

You omitted the OS version which is running psql/vim ( I assume it is
some kind of *ix, probably linux ) and which kind of paste you are
using in putty; IIRC it has a couple.

Probably you are sending extra spaces or \015 or something like that
which vim filters out for you on save. You can probably confirm this by
pasting your lines into "cat > file" ( remember to EOF after the paste,
enter+^D is the usual way ) and trying to feed that file to psql.

Not having used windows or vim in 20 years I can only think of one
thing to help you with this kind of problem. You'll probably have od
(octal dump) installed. Try pasting the code into 'od -tx1 -tc', which
will give you a nice dump from which you can see whether your
IDE/editor/putty is playing nasty tricks ( if you tried pasting into
cat you can also run od on that file and on the one from vim ).

Francisco Olarte.




Re: - operator overloading not giving expected result

2022-07-08 Thread Francisco Olarte
Hi Rajesh:

On Fri, 8 Jul 2022 at 12:36, Rajesh S  wrote:
> We are migrating our database from Oracle to Postgresql.  In oracle we have 
> used this syntax "SELECT ('1999-12-30'::DATE) - ('1999-12-11'::DATE)" to get 
> difference between two dates as a integer output (ex: 19).  But in Postgres 
> the same query returns result as "19 days".  Because of this we are getting 
> errors while assigning this query output to a numeric variable saying "ERROR: 
> invalid input syntax for type numeric: "1825 days"" and "ERROR: operator does 
> not exist: interval + integer".  To avoid changing the application code in 
> many places to extract the number of days alone, we tried operator 
> overloading concept as below.

Are you posting the exact code? Because the current docs state:

date - date → integer
Subtract dates, producing the number of days elapsed
date '2001-10-01' - date '2001-09-28' → 3

I remember it always working that way, and the 8.0 docs document it
that way too ( and I've used a lot of date arithmetic even in 7.x ).

Maybe your EXACT code is not what you have pasted, but rather you are
inserting dates into timestamp columns and then subtracting said
columns? Because even your own operators are defined in terms of
timestamptz, not dates.
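
You can check the difference from psql ( illustrative ):

  select date '1999-12-30' - date '1999-12-11';
  -- 19, an integer
  select timestamptz '1999-12-30' - timestamptz '1999-12-11';
  -- '19 days', an interval

If your columns are timestamp/timestamptz you get the second behaviour
even when you only ever store dates in them.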

Francisco Olarte.




Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-19 Thread Francisco Olarte
On Tue, 19 Jul 2022 at 18:50, Sebastien Flaesch
 wrote:
>> If that's the behavior you want, you can build it out of standard SQL 
>> facilities (e.g. update a one-row table).
> Can you elaborate please?
> Do you mean the code should use an UPDATE on a one-row table to acquire a 
> lock?

That's the usual SQL way. You can, using the appropriate isolation
level, select the max() of the IDs from the table ( but this will
probably lock it whole ); you can use a one-row table with a current
int column as a sequence and set current = current + 1, but this will
lock everyone inserting ( and needs an extra table ); or you can use a
name/current unique sequences table and update that ( it saves tables
but can worsen locking, as many engines lock whole pages ). The
consecutive gapless sequence problem surfaces a lot, but there is no
good way to solve it without locking, as once you get the value
everybody else needing it has to wait until you decide whether to
commit or rollback.
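
A minimal sketch of the one-row table approach ( names made up ):

  create table bill_counter (last_used bigint not null);
  insert into bill_counter values (0);

  -- inside the transaction that creates the bill:
  update bill_counter set last_used = last_used + 1 returning last_used;

The row stays locked until you commit or roll back, which is exactly
the serialization you pay for gaplessness.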

Sequences are for when you need autogenerated keys. They use
consecutive integer blocks nearly everywhere because they are easy to
code, play well with btree indexes and correlate well with insertion
times, but they give up the "no hole" property to improve performance.
They can even return values unordered across sessions ( sequences
acquire, and burn, blocks in a locking way; when you need high
concurrency you make them return big blocks, but then values from
different sessions can be delivered out of order ).

If you need correlative numbers, bite the bullet and code it; trying
to do it with sequences will probably end with an approach which will
fail in some corner cases and be slower than just coding it with a
single-row table ( just remember to check it, so you do not have the
fun which ensued in oracle when someone put two rows in dual in the
old days ). All the queries you have posted are probably more complex
than a couple of procedures to update and read a table, even using eval
to simulate named sequences if you want ( just remember to always
pre-increment before reading, there is a reason everybody does it that
way ).

As a remark, in Spain bill numbers need to be gapless and increasing. I
have done it with a sequence ( with cache 1, so I doubt it is much
more performant than a single-row table, as it has a backing element
which needs to be saved ), and just used another process which swept
the values and filled the gaps a posteriori ( there is a way to fill a
gap, it has legal uses, and you can use it for that as long as you do
not have too many: basically, insert cancelled bills in the gaps ). I
would probably do it with the one-row table now for simplicity; I doubt
I gained enough performance to justify the 20-something perl lines plus
the crontab entry needed to use sequences. As the beard turns grey I
appreciate more the simpler ways to do things.

Francisco Olarte.




Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-20 Thread Francisco Olarte
Hi Gavan.

On Wed, 20 Jul 2022 at 00:10, Gavan Schneider  wrote:
> On 20 Jul 2022, at 4:08, Francisco Olarte wrote:
> As a remark, in Spain bill numbers need to be gapless increasing. I
> have done it with 
>
> One answer to this problem has been around for a while, and my version is 
> shown below.

If I read it correctly, leaving the 700 stuff aside, this function
gives you the first free transaction_ref. That is not valid for my
problem: I need them increasing, 1-3-2 is not a valid sequence. The
trick I use is that I can have "dummy" records, which do not have real
data ( or dates, which is what, among other things, makes me need them
increasing ), so I generate 1-3-4 and then insert 2 in a batch with
values adequate for the legal requirements ( i.e., I can just use 1's
date for 2's date ).

I think what you do is generate account numbers which should be
gapless in the medium term and increasing as needed, where gaplessness
is more important to you than the increasing order.

> No extra table is needed, just a windowed look at the table where the 
> reference is used.

 It is normally never needed; they are used solely for performance and
to avoid locking. A current_transaction_seq

> My usage is at human speed so performance is not an issue but it should be 
> pretty good if there are minimal holes. What is not addressed is that a 
> reference can be reissued upto the time the calling process commits an entry 
> in the table and takes that reference out of circulation. There are different 
> approaches to handling/preventing such collisions.

Your construct seems pretty expensive, and I'm not sure how much
locking it does at serializable. Also, it needs recovery ( the
single-row table plus back-fill does not, if you consider the
back-filling "business as usual" ). Also, the reissuing of a number is
a no-go in my automated systems, it would need extensive recovery; in
that case I can use a gapless approach by simply selecting max(id)
( and, given it is indexed, using a select for update on the max record
by ordering desc and limit 1; if I'm not too confused this would give
no gaps, increasing, at the expense of reduced concurrency ).

...
> $COMMENT$
> Determines the next available reference number, making sure to fill any holes.

No going back allowed in my case. Simplifying it, a bill may reference
an older one in its data, and the older one's number MUST be less. The
gap problem is that there MUST be NO gaps WHEN I "print" them. And I
can use dummies, but I SHOULD not use them. In practice, you rarely use
more than a couple of dummies a year. I could strictly print dummies
only when I need them, but they MUST always be the same, and the easier
way is to insert them.

Francisco Olarte.




Re: How does postgres sort large strings?

2022-07-22 Thread Francisco Olarte
On Fri, 22 Jul 2022 at 16:46, Sergey Burladyan  wrote:
> I thought that the sorting values are stored entirely in work_mem, but in 
> fact it works somehow differently.
> Can anyone suggest how this works?

In the classic go by chunks way?

To sort values you need to compare them, and to compare strings you do
not need the whole string. I.e., if you have two 1000-byte strings, one
being 500 A's + 500 B's, the other 1000 A's, to compare them ( using
the C locale; others can be done in a similar way ) you can read 10
bytes from each and compare: if they are the same, read 10 more; if
they are not, you are done; if you hit the end of both strings, they
are equal; if you hit the end of one ( the shorter ), that one goes
first. You can even do it a character at a time. In the example, after
looping 50 times on 10 A's you hit 10 B's vs 10 A's, so the second
string goes first and you do not even need to look at the rest. A char
at a time would end on the 501st char.

And PG can probably compare the strings in the shared buffers, so it
only needs some housekeeping information in work_mem, relying on its
infrastructure to bring the contents into shared buffers. I do not
think you are estimating memory usage right.

Francisco Olarte.

>
> For example, I created this 512MB incompressible file and test table:
>
> tr -dcs '[:print:]' '[:print:]' < /dev/urandom | tr -d '"'\' | dd bs=1K 
> count=512K of=asciidump
>
> create unlogged table t1 (v text);
> insert into t1 select pg_read_file('asciidump') from generate_series(1, 10);
>
> select pg_column_size(v), octet_length(v) from t1 limit 1;
>  pg_column_size | octet_length
> +--
>   536870912 |536870912
>
> set work_mem to '64MB';
>
> Now I think that 64MB is not enough to sort such large values and postgres 
> will use temp files,
> but in fact it does not.
>
> select temp_files, temp_bytes from pg_stat_database where datname = 
> current_catalog;
>  temp_files | temp_bytes
> +
>   0 |  0
>
> explain (analyze,verbose,buffers) select v from t1 order by v;
>   QUERY PLAN
> ---
>  Sort  (cost=94.38..97.78 rows=1360 width=32) (actual time=6433.138..6433.140 
> rows=10 loops=1)
>Output: v
>Sort Key: t1.v
>Sort Method: quicksort  Memory: 25kB
>Buffers: shared hit=543881 read=679794 written=118012
>->  Seq Scan on public.t1  (cost=0.00..23.60 rows=1360 width=32) (actual 
> time=0.007..0.009 rows=10 loops=1)
>  Output: v
>  Buffers: shared hit=1
>  Planning Time: 0.035 ms
>  Execution Time: 6433.155 ms
>
> > Sort Method: quicksort  Memory: 25kB
>
> select temp_files, temp_bytes from pg_stat_database where datname = 
> current_catalog;
>  temp_files | temp_bytes
> +
>   0 |  0
>
> WOW! How does it work?! :-)
>
> --
> Sergey Burladyan
>
>




Re: Password reset link / 'less' does not exit in psql version 13.4

2022-07-25 Thread Francisco Olarte
Michael:

On Mon, 25 Jul 2022 at 12:01, Michael J. Baars
 wrote:
> 2) I have three psql clients running, a version 12.6, a version 13.4 and a 
> version 14.3. Until now a 'select * from table;' showed the output in 'less' 
> or something alike and exited from 'less' when
> the output was complete. Both version 12.6 and version 13.4 work that way. 
> Version 14.3 does not exit from 'less' when the output is complete. Did 
> anyone notice this already?

AFAIK psql just pipes and waits ( and I've piped through some programs
of mine which would show bizarre behaviour if psql did otherwise ).
This seems like you have tuned the pager command or the default less
options; from man less:
>>>
   -e or --quit-at-eof
  Causes less to automatically exit the second time it
reaches end-of-file.   By  default,  the
  only way to exit less is via the "q" command.

   -E or --QUIT-AT-EOF
  Causes less to automatically exit the first time it
reaches end-of-file.
<<<

This can come from a variety of sources, see man less, search for lesskey, ...

Check also PAGER and PSQL_PAGER, and .psqlrc and friends, you may have
some things there.

Francisco Olarte.

>
> Best regards,
> Mischa Baars.
>
>
>
>




Re: Purpose of DEFERRABLE _and_ INITIALLY DEFERRED foreign key constraint checking?

2022-08-04 Thread Francisco Olarte
Ron:

On Thu, 4 Aug 2022 at 02:30, Ron  wrote:
> DEFERRABLE
> NOT DEFERRABLE
> This controls whether the constraint can be deferred. A constraint that is 
> not deferrable will be checked immediately after every command. Checking of 
> constraints that are deferrable can be postponed until the end of the 
> transaction [/quote]

> But yet a DEFERRABLE FK constraint in a transaction immediately failed on a 
> FK constraint violation.

Because, as the name hints, it is DEFERRABLE not DEFERRED.

> INITIALLY IMMEDIATE
> INITIALLY DEFERRED
> If a constraint is deferrable, this clause specifies the default time to 
> check the constraint. If the constraint is INITIALLY IMMEDIATE, it is checked 
> after each statement. This is the default. If the constraint is INITIALLY 
> DEFERRED, it is checked only at the end of the transaction.
> INITIALLY DEFERRED solved my problem.  Why do both clauses exist?

Note "default time". Both clauses exist because you can change that
with SET CONSTRAINTS.

Not knowing your case I cannot comment on it, but one use case I've
found is when your system NORMALLY holds constraints valid across
statements, so you have DEFERRABLE INITIALLY IMMEDIATE to check them
and fail early and loudly on any bug. But you have some SPECIAL_OP (tm)
which only holds the constraints at transaction end, because it does
some carefully controlled manipulations. You do that one with SET
CONSTRAINTS ... DEFERRED ( and triple testing, as you have weakened
your safety net ). You can even toggle it according to the docs, i.e.
toggle a constraint to deferred, do several statements which end up in
a valid state, then toggle it to immediate to check it really is valid.
It's all in the SET CONSTRAINTS docs.
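
For illustration, assuming an FK created DEFERRABLE INITIALLY IMMEDIATE
and named my_fk:

  begin;
  set constraints my_fk deferred;
  -- ... statements that are only consistent as a whole ...
  set constraints my_fk immediate;  -- optional: force the check right here
  commit;  -- deferred checks would run here anyway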

> (A naive interpretation just by looking at the clause words led me to think 
> that INITIALLY DEFERRED would not check record validity when a constraint is 
> added to a table, but obviously that's wrong too.)

Not that obvious, but true. In the effort to make it read nice, like
natural languages,  SQL is difficult to interpret precisely, like
natural languages.

FOS.




Re: Behavior of identity columns

2022-08-04 Thread Francisco Olarte
On Thu, 4 Aug 2022 at 10:52, sivapostg...@yahoo.com
 wrote:
> I see 'identity' column values increment even when some error(s) occurs while 
> inserting data.  Is that a known bug or known behavior?

Known behaviour, explained thousands of times over the years. Identity
is not a way to generate consecutive values, nor even guaranteed
ascending values; it is a way to generate unique values. The
implementation burns the values before commit/rollback for better
concurrency; search the archives or the docs for details. In your
example, an id is allocated and burnt to generate the complete failing
row, and when the insert fails it is gone. Search the lists, google,
the docs, it has been explained ad nauseam.
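
A quick sketch of what happens:

  create table t (id int generated always as identity, v text unique);
  insert into t (v) values ('a');  -- gets id 1
  insert into t (v) values ('a');  -- fails on the unique constraint, burns id 2
  insert into t (v) values ('b');  -- gets id 3

The gap at 2 is expected and harmless.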

Francisco Olarte.




Re: Creating constraint dynamically

2022-08-22 Thread Francisco Olarte
On Mon, 22 Aug 2022 at 09:29, sivapostg...@yahoo.com
 wrote:
> We populate constraint string dynamically and add it to the table with alter 
> table command.  It gets added, but without the required brackets.

Brackets are NOT required in your example. You are writing the classic
form, an OR of ANDs, sometimes called a sum of products. AND has
greater precedence than OR ( among other things because this form of
condition is used a lot ), as has been pointed out, so brackets
( round brackets are usually called parentheses, abbreviated to parens;
in programming, "brackets" is normally not used for those ) are not
necessary. What the system is doing is parsing and normalizing your
condition ( note how it changes bpchar to text and adds types in some
places ) and reconstructing the condition, without brackets, because
they are not required.

> Since there are AND and OR conditions, without brackets the whole conditions 
> becomes useless.

Before stating such things, test: try to insert a condition-violating
row to see whether the condition is really useless. It is very rare to
find a bug like that ( changing the semantics of the condition would be
one ).
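
Something like this ( sketch, made-up columns ) shows the bracket-less
text still enforces the same thing:

  create temp table t (a int, b int,
    constraint c check ( (a = 1 and b = 2) or (a = 3 and b = 4) ));
  insert into t values (1, 4);  -- rejected: violates the constraint
  insert into t values (3, 4);  -- accepted

Even though \d shows the check without the parentheses, AND still binds
tighter than OR, so the meaning is unchanged.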

> How to create a constraint like the above one, with braces in tact ?   Or any 
> other way that we can implement a check constraint as above?

The condition IS correctly implemented, but nothing guarantees that the
text returned by pgAdmin will be the exact same condition you sent. In
fact, AFAIK, nothing guarantees you can recover the exact text of a
condition set on a column; it is transformed into an equivalent one.
I'm not sure how exactly it is done, but it probably also does
whitespace normalization and constant folding.

Francisco Olarte.




Re: Resolving host to IP address

2022-09-12 Thread Francisco Olarte
On Mon, 12 Sept 2022 at 14:23, Sebastien Flaesch
 wrote:
> I managed to establish the secure connection, by using 
> DN=root.strasbourg.4js.com for the self-signed root CA, and 
> DN=toro.strasbourg.4js.com for the server certificate, DN=pgsuser for the 
> client certificate.
> I have created my client certificate by using the root CA.
> pg_hba.conf:
> hostssl all pgsuser toro.strasbourg.4js.com  md5 
> clientcert=verify-ca
>
> Server and client are on the same Debian 11 machine.
> It works, if I comment out the /etc/hosts line set by Debian Linux for my 
> host name:
> # 127.0.1.1   toro.strasbourg.4js.com toro

> The name "toro" is then resolved to the IP address provided by my DHCP server:
> root@toro:~# host toro
> toro.strasbourg.4js.com has address 10.0.40.61
>
> root@toro:~# host toro.strasbourg.4js.com
> toro.strasbourg.4js.com has address 10.0.40.61
>

> However, if I put the 127.0.1.1 line back in /etc/hosts, re-create all 
> certificates (is this actually needed? I guess no), restart the PostgreSQL 
> server, I get this error:
> $ psql 
> 'postgresql://toro.strasbourg.4js.com:5437/test1?user=pgsuser&sslmode=verify-ca&sslrootcert=./root.crt&sslcert=./client.crt&sslkey=./client.key'
> psql: error: connection to server at "toro.strasbourg.4js.com" (127.0.1.1), 
> port 5437 failed: FATAL:  no pg_hba.conf entry for host "127.0.0.1", user 
> "pgsuser", database "test1", SSL encryption

> What looks strange to me in this error is that first it mentions 127.0.1.1 
> (ok) but then, 127.0.0.1

( Not having your full data, I'm guessing a bit on typical configs here. )

Your loopback interface, "lo", which is used to connect to net 127.*,
probably has the address localhost=127.0.0.1.

Postgres is probably binding to the wildcard address. So when you tell
psql to connect to 127.0.1.1 it starts the tcp connection selecting the
interface address as source, per the route table, so your connection
is source=127.0.0.1, destination=127.0.1.1.

The error message up to the "failed:" is probably psql telling you
where it sent the connection, to toro=127.0.1.1.

The second part is the server telling you where it sees the connection
coming from.

> What am I missing here?

Probably some tcp tracing to see it in action. If you bind to *:5437
you can receive connections on any 127.* address. Your hosts file uses
this trick for unknown reasons.

When you zap the host line everything works well because your
interface is probably 10.0.40.61, so the route table says use
10.0.40.61 as source.

You would probably experience the same problem if you added a second
IP address, say 1.2.3.4, to your interface and told dhcp to resolve
toro to it. With localhost you do not have to do anything because any
127.* address can be used as source or connected to on loopback; it is
magic.

Also, if you want it to work you would need a second hostssl line
listing localhost as the source address, or, IIRC, you can try to
force the source address for connections to be toro using some deep
magic jujus, as psql does not seem to support setting it.

Francisco Olarte.




Re: Resolving host to IP address

2022-09-12 Thread Francisco Olarte
A little off topic but may be useful for someone:

On Mon, 12 Sept 2022 at 16:11, Tom Lane  wrote:
...
> Linux doesn't seem to show its routing of the loopback domain in netstat
> output, but it's probably much like what macOS shows explicitly:

Among other things it can be shown by "ip route show", although it is
a little too verbose:

$ ip route show table all
default via 192.168.0.1 dev enp39s0 onlink
192.168.0.0/24 dev enp39s0 proto kernel scope link src 192.168.0.2
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link
src 127.0.0.1
broadcast 192.168.0.0 dev enp39s0 table local proto kernel scope link
src 192.168.0.2
local 192.168.0.2 dev enp39s0 table local proto kernel scope host src
192.168.0.2
broadcast 192.168.0.255 dev enp39s0 table local proto kernel scope
link src 192.168.0.2
::1 dev lo proto kernel metric 256 pref medium
fe80::/64 dev enp39s0 proto kernel metric 256 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::2d8:61ff:fe9f:267b dev enp39s0 table local proto kernel
metric 0 pref medium
multicast ff00::/8 dev enp39s0 table local proto kernel metric 256 pref medium

( table local trims it a bit ).

For debugging this things my first go is to "ip route get", which
gives the selected one:

$ ip route get 127.1.2.3
local 127.1.2.3 dev lo src 127.0.0.1 uid 1000
cache 

Francisco Olarte.




Re: Resolving host to IP address

2022-09-12 Thread Francisco Olarte
Hi  Sebastien:

On Mon, 12 Sept 2022 at 16:40, Sebastien Flaesch
 wrote:

> I think I got it: PostgreSQL should listen to the real, non-loopback network 
> interface.

Not really. Your problem is not where postgres listens, but where your
host line resolves, the 127.0.1.1 stuff, plus your rules.

In many OSes you can bind a listening TCP socket to IN_ADDR_ANY plus a
port. The good thing about doing it is that it keeps working even if
you add / delete IP addresses. Postgres does this and, IMO, it is a
good thing. You seem to have it configured that way...

> Just for info (local dev config, not prod):
> sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ grep listen_addresses postgresql.conf
> listen_addresses = 'localhost,toro.strasbourg.4js.com'

No, you do not: you list your interfaces. Unless you have a complex
setup, which does not seem to be the case, I would recommend just
using '*', especially if you intend to run your servers firewalled
( which the 10.x.x.x address hints at ). This would lead to something
like...

tcp0  0 0.0.0.0:54320.0.0.0:*   LISTEN
tcp6   0  0 :::5432 :::*LISTEN

( not done with pg, done with nc -l, but it worked the same last time
I checked it )

> sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ netstat -tl -W | grep 5437
> tcp0  0 localhost:5437  0.0.0.0:*   LISTEN
> tcp0  0 toro.strasbourg.4js.com:5437 0.0.0.0:*   
> LISTEN
> tcp6   0  0 localhost:5437  [::]:*  LISTEN

There is a piece of info missing here: where does your localhost resolve to?

> sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ netstat -tl -W --numeric-hosts | 
> grep 5437
> tcp0  0 127.0.0.1:5437  0.0.0.0:*   LISTEN
> tcp0  0 127.0.1.1:5437  0.0.0.0:*   LISTEN
> tcp6   0  0 ::1:5437:::*LISTEN

But from this it seems that, if you have restored the 127.0.1.1 hosts
line, it resolves to 127.0.0.1 + ::1.

> sf@toro:/opt3/dbs/pgs/14.1-openssl/data$ ip route get 127.0.1.1
> local 127.0.1.1 dev lo src 127.0.0.1 uid 1000
> cache 
This is expected.

IMO your problem arises from your pg_hba.conf and from that bizarre
127.0.1.1 hosts line ( IMO; I use debian too and one of the first
things I do after installing is dropping the 127.0.1.1 line, but I use
fixed IPs on a desktop, so YMMV ).
If you just list localhost as the origin ( which means you have to
duplicate the pg_hba.conf lines; listen_addresses = '*' will work for
both ) it will work right, whatever your listen_addresses is. You can
also RTFM a bit and notice the line "You can also write all to match
any IP address, samehost to match any of the server's own IP addresses,
or samenet to match any address in any subnet that the server is
directly connected to." ( it is easy to miss, I had to search for it,
even though I knew it existed ).

Note: localhost is a name, like toro, and is resolved;
samehost/samenet/all are magic.


Summarizing, I would:

- Use listen_addresses = '*'
- Use samehost in your rules.

Which will lead to a lean config which probably does what you want.
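
Roughly ( sketch; adapt database / user names and auth options to
yours ):

  # postgresql.conf
  listen_addresses = '*'

  # pg_hba.conf
  hostssl  all  pgsuser  samehost  md5  clientcert=verify-ca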

Francisco Olarte.




Re: Using CTID system column as a "temporary" primary key

2023-03-29 Thread Francisco Olarte
On Wed, 29 Mar 2023 at 21:11, Sebastien Flaesch
 wrote:
> Oh the use of default keyword is new to me, thanks for that.
> But to make PostgreSQL more Informix-compatible, zero should have been 
> considered as well.

NONONO, please! Someone already pointed out that a sequence can
generate zero, but even without that some of us may need to insert 0
( i.e. because we deleted the row with the zero id and want to recreate
it, or just because, even if the default-generating sequence does not
spit out zeroes, the PHB wants us to insert his employee record with
ID=0, or other reasons ). AFAIK serial just creates an integer column
with a default; the docs ( section 8.1 ) say its range starts from 1,
but I'm not even sure this is enforced: when I've looked at the
description of a serial column I do not remember seeing anything more
than the default, so you could probably insert negatives. I, and I
suspect others, would prefer to be able to insert any int rather than
copying a hacky ( and I suspect non-standard ) trick from informix.
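
A quick check ( sketch ) suggests nothing stops explicit zeroes or
negatives:

  create temp table t (id serial primary key, v text);
  insert into t (id, v) values (0, 'zero'), (-1, 'negative');  -- both accepted
  insert into t (v) values ('from the sequence');  -- gets id 1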

Just write to informix and suggest they implement DEFAULT on inserts,
it is much better ;-> . Being more informix-compatible may sound like a
feature to you; to me it sounds like a misfeature, like needing DUAL to
be more oracle-compatible.

Francisco Olarte.




Re: Do BRIN indexes support MIN/MAX?

2023-03-29 Thread Francisco Olarte
On Wed, 29 Mar 2023 at 22:07, Vladimir Sitnikov
 wrote:
> > Is it correct that BRIN indexes don't support MIN/MAX operations ?
> In theory, it should be possible to implement min/max scan support for BRIN, 
> however it is not implemented yet.
>
> Just in case, min/max query would require to read all BRIN pages, and then it 
> would require to read the corresponding pages in table.

> For instance, imagine the table has N pages. Then BRIN would have N/128 pages 
> with the default pages_per_range=128, so your max(..) query would take N/128 
> + 128 pages to read. In theory it would be sequential, however, under 
> concurrent load it might not be that sequential for the disk.

I think BRIN would require N/128 RANGES, not pages, and if I am not
mistaken it fits several ranges in an index page. The docs talk of
summary tuples, and I suspect a summary tuple for, say, an integer is
not going to be longer than 128 bytes, in which case you could fit 64
of them in a 4k page.

Also, if you account for possible concurrent load disturbing your
index+partial scan, you also have to account for the ( more likely )
disruption on the full scan.

I.e., I have this table:

 apc | apc_cdrs_p2022_12 | table | postgres | permanent | heap | 860 MB |

N/128 index PAGES would mean about 860 pages here, i.e. roughly 6.7 MB
by your numbers, but:

 apc | apc_cdrs_p2022_12_cuando_idx | index | postgres | apc_cdrs_p2022_12 | permanent | brin | 64 kB |
 apc | apc_cdrs_p2022_12_raw_id_idx | index | postgres | apc_cdrs_p2022_12 | permanent | brin | 64 kB |

The 1st one is on a timestamp column, the second on an integer. And
several empty partitions have 48 kB indexes, so it seems the data is
just 16 kB for the 860 ranges. That could be about 20 bytes/range,
which more or less fits a couple of values.

In my experience, BRIN indexes are ridiculously small. I use them on
that particular table because both cuando and raw_id correlate with
insertion order and I normally only read multi-megabyte ranges indexed
on them, so they work very well in limiting the scan range to nearly
what I need.

> For instance, 80GiB table would be like 10’000’000 pages, so the default BRIN 
> would take about 78’000 pages (625MiB), so the min/max scan would read 626 MiB
> If pages per range is increased to ~3162, then index size would be ~3162 
> pages (25MiB), and each index entry would cover 25MiB range. Then the query 
> would have to read ~50MiB to fetch min/max. It is not clear if that is really 
> practical though.

If you assume your index fits 64 tuples per page, your index read drops
to about 10 MB, plus the 1 MB range.

Also, I suspect you will have to read all unsummarized ranges
( probably before the summarized ones, as an unsummarized range can
discard the summarized ones, but not the other way around ).

Francisco Olarte.




Re: Using CTID system column as a "temporary" primary key

2023-03-30 Thread Francisco Olarte
On Thu, 30 Mar 2023 at 10:01, Dominique Devienne  wrote:
>> 2) 0 can be a valid sequence value:
> Of course. Yet, as above, if that is opt-in as specified in the `create 
> table` DDL somehow, then why not?
> BTW, default and 0 are not the same thing. You cannot bind "default" in place 
> of
> an integer-valued prepared-statement placeholder, in a binary mode insert. So 
> it is
> definitely not the same thing.

IMNSHO if you need to select between default and explicit in an insert
via binding you have a design problem, and down this path lies
madness.

> So while I can accept that not implementing that particular informix 
> compatibility wart
> is a perfectly valid position, for impl and maintenance cost, the arguments 
> I've read so
> far can be "easily" side-stepped from a technical perspective I suspect. FWIW.

Do not forget the runtime costs: once you start piling informix warts
over oracle warts over access warts over sybase warts over mysql warts
over sql server warts, it adds up.

I do not think postgres' target should be compatibility with (select
sql_engine order by random() limit n). Normally it tries to follow
standards, and to do something reasonable when that is not possible,
but this informix wart sounds particularly worthless to implement.
Besides your use case, I do not think it would serve for anything other
than encouraging people to use an ill-designed informix feature.

Francisco Olarte.



