Max sane value for join_collapse_limit?

2022-06-03 Thread Andreas Joseph Krogh


Hi, I have set join_collapse_limit = 12 in production, but I'm thinking about 
raising it to 16.

On modern HW is there a “sane maximum” for this value?

I can easily spare 10ms for extra planning per query on our workload, is 16 
too high?



Thanks.






--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 

GIN theory

2022-06-03 Thread huangning...@yahoo.com
Hi:
I want to know how long it takes to create a GIN index on an array column, 
and where I can find some theory about how GIN indexes work.

Thanks

Re: unoptimized nested loops

2022-06-03 Thread Tim Kelly
Thank you to David, Jeff and Tom for your responses.  Tom's response has 
made me rethink my question.  I may have provided too much information in 
an effort to anticipate suggestions.  Let me rephrase:


I have two tables, a parent (named "metadata") and a child (named 
"data").  Each table has two fields.  One field is shared (the "id", an 
alphanumeric field).  The other field in each table is, respectively, the 
constraint (named "author" in metadata) and the target (named "content" 
in data).  Each table has about 1.25m records/rows.  There are about 1500 
orphaned child records, but all parent records have a child record.


When I do a search, as the "human in charge" (HIC) I know that the 
constraint on the parent table will yield a very small number of child 
table records in which the target needs to be found.  A typical search 
yields a few hundred to a few thousand parent records, whose child 
records should take milliseconds to search for the target.  Searching all 
of the child records for the target and then comparing the result against 
the constrained parent records to produce a pure intersection is very 
inefficient.


I found discussions from ten years or more ago about how the order of 
query arguments would affect optimization.  Depending on the database, 
the HIC would place a known driving constraint either first or last in 
the arguments.  That practice has reasonably been abandoned as the 
default, but I have been unable to find how to override the current 
practice of letting the planner do the work, and the planner is getting 
it wrong.  Something that should take only one or two seconds is now 
taking three to four minutes.
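
(For illustration, the kind of rewrite I have in mind -- I understand a 
CTE acts as an optimization fence on this version, so the metadata filter 
would run first; whether the subsequent pass over "data" is then cheap 
presumably depends on indexing:)

with m as (
  select id from metadata where author like '%Tim%'
)
select count(d.id)
from data d
join m on m.id = d.id
where d.content like '%some text%';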


So, using an even more egregious scenario I found:

c_db=>select count(id) from metadata where author like '%Tim%';

 count
---
   261
(1 row)

Time: 650.753 ms


c_db=>select count(id) from data where data.content like '%some text%' 
and data.id in (select id from metadata where metadata.author like '%Tim%');


 count
---
31
(1 row)

Time: 207354.109 ms


Just as a reminder, this is roughly 27 seconds longer than it takes to 
search all 1.25m records for the target:


c_db=> select count(id) from data where content like '%some text%';
  count
---------
 1167810
(1 row)

Time: 180144.251 ms


To address David's suggestion of turning off enable_nestloop: this 
resulted in about a 10% improvement in speed.  I found no appreciable 
difference in time from setting statistics, although there was some 
ambiguity in the suggestion.  I assume the statistics are supposed to be 
set on the content column:


ALTER TABLE data ALTER COLUMN content SET STATISTICS 1000;
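
followed by a fresh ANALYZE so that the new target is actually used (my 
assumption about the intended sequence):

ANALYZE data;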


To address Jeff's inquiry about planning on the smaller set:

c_db=> explain analyze select count(id) from metadata where author like 
'%Tim%';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=114040.65..114040.66 rows=1 width=11) (actual time=681.639..681.639 rows=1 loops=1)
   ->  Seq Scan on metadata  (cost=0.00..114040.64 rows=5 width=11) (actual time=3.053..681.591 rows=261 loops=1)
         Filter: ((author)::text ~~ '%Tim%'::text)
 Total runtime: 681.685 ms
(4 rows)

Time: 682.142 ms

For completeness

c_db=> explain analyze select count(id) from metadata inner join data on 
metadata.id = data.id where author like '%Tim%' and content like '%some 
text%';
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48203.00..48203.01 rows=1 width=11) (actual time=208239.776..208239.777 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..48202.99 rows=5 width=11) (actual time=34102.795..208239.754 rows=31 loops=1)
         ->  Seq Scan on data  (cost=0.00..47132.93 rows=125 width=11) (actual time=4.714..179369.126 rows=1167810 loops=1)
               Filter: (content ~~ '%some text%'::text)
         ->  Index Scan using metadata_pkey on metadata  (cost=0.00..8.55 rows=1 width=11) (actual time=0.024..0.024 rows=0 loops=1167810)
               Index Cond: ((metadata.id)::text = (data.id)::text)
               Filter: ((metadata.author)::text ~~ '%Tim%'::text)
 Total runtime: 208239.847 ms
(8 rows)

Time: 208247.698 ms


As for the version

$ psql --version
psql (PostgreSQL) 8.4.1
contains support for command-line editing


Let's describe this system as "legacy"; updating is not an option. 
If the planner in this version was not as well optimized as in more 
recent versions, the need for an override is even greater.  However, I 
am very reluctant to believe the version is at the heart of the problem. 
I believe I am missing something and perhaps failing to properly 
explain my need.


How do I override the planner and instruct the computer to do what I 
say, regardless of the 

Re: GIN theory

2022-06-03 Thread jian he
theory/explanation about GIN index:
https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README
https://postgrespro.com/blog/pgsql/4261647
https://pgpedia.info/g/gin.html
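
For the timing part of the question, a minimal sketch (table and column 
names are hypothetical; \timing makes psql report elapsed time):

\timing on
CREATE INDEX items_tags_gin ON items USING gin (tags);  -- tags is an int[] column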


On Fri, Jun 3, 2022 at 2:34 PM huangning...@yahoo.com <
huangning...@yahoo.com> wrote:

> Hi:
>
> I want to know how long it takes to create a GIN index on an array
> column, and where I can find some theory about how GIN indexes work.
>
>
> Thanks
>


-- 
 I recommend David Deutsch's <>

  Jian


Re: Max sane value for join_collapse_limit?

2022-06-03 Thread Philip Semanchuk



> On Jun 3, 2022, at 4:19 AM, Andreas Joseph Krogh  wrote:
> 
> Hi, I have set join_collapse_limit = 12 in production, but I'm thinking about 
> raising it to 16.
> On modern HW is there a “sane maximum” for this value?
> I can easily spare 10ms for extra planning per query on our workload, is 16 
> too high?

I set ours to 24 (along with from_collapse_limit=24 and geqo_threshold=25). Most 
of our queries that involve 10+ relations have a slow execution time 
(20-30 minutes or more), so reducing planning time isn’t a major concern for us. 
If the planner takes an extra 20-30 seconds to find a plan that reduces 
execution time by 5%, we still come out ahead. 
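
For reference, a sketch of the settings involved (the values are the ones 
mentioned above; they can also be set per session or per role rather than 
globally):

SET join_collapse_limit = 24;
SET from_collapse_limit = 24;
SET geqo_threshold = 25;  -- kept above the collapse limits so GEQO doesn't take over first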

That said, in our environment the planner can make pretty bad choices once the 
number of relations gets into the mid-teens, because we have some 
difficult-to-estimate join conditions, so we write our canned queries with this 
in mind, breaking them into two parts if necessary to avoid throwing too much 
at the planner at once. IOW, we generally don’t come anywhere near 24 relations 
in a query. Our very high join_collapse_limit might still come into play if a 
user writes a very complicated ad hoc query.

So (IMHO) as is often the case, the answer is “it depends”. :-)

Cheers
Philip



Re: Logically replicated table has no visible rows

2022-06-03 Thread Jeff Ross

On 5/31/22 11:46 AM, Jeff Ross wrote:

Hello,

We have a logically replicated table on RDS that is 39 G in size on 
both the publisher (10.21) and the subscriber (12.8).


The replication slots on the publisher are all marked as active and 
the lsns are current so no lag.


Other tables on the subscriber side are also identical in size and 
have no problem with queries against them.


We did a vacuum full on the subscriber table and the size dropped to a 
couple hundred MBs and is growing but still has no visible rows.


Now on the publisher I see a temporary replication slot so I'm 
guessing that in reality replication had not finished and that's why 
the load_events table was full sized but had no visible rows?


At this point I guess my question is: does my hunch make sense that 
replication not having completely finished is the root cause of what we saw?


I've always used both the lsns and the lack of temporary replication 
slots to tell when logical replication had finished--is there a more 
authoritative way to do so?


Thanks,

Jeff




As a follow up for the archives...

It became apparent the longer we looked that logical replication in fact 
had not finished and indeed never did finish.  On both an EC2 server and 
an RDS server we were missing at least one table that by table size 
appeared to be identical to the publisher's table and yet had no visible 
rows.  On the publisher I would occasionally see temporary replication 
slots but they would go away in a few minutes.
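
For reference, the sort of publisher-side check involved (a sketch; column 
names are per the PG 10 pg_replication_slots and pg_stat_replication views):

select slot_name, slot_type, temporary, active, confirmed_flush_lsn
from pg_replication_slots;

select application_name, state,
       pg_current_wal_lsn() - replay_lsn as replay_lag_bytes
from pg_stat_replication;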


I then dropped the subscription on the RDS server and immediately 
restarted it with (copy_data = False).  As expected we immediately saw 
rows being inserted into that massive table but it still showed only 
those rows.  I have a "backfill" process to sync up logically replicated 
tables and that ran but very, very slowly.  Finally while syncing that 
big table it stalled completely.  Network monitoring showed about 
100Kb/s traffic to the RDS server.  That is not a typo.


Our client spun up another RDS server and imported all the settings from 
the previous one, and I set up the database and imported the schemas.  As 
soon as we started replication we were seeing network transfers in the 
45Mb/s range.  Replication finished, with all tables intact, in under 5 
hours for 420G.


We changed nothing on our side, so whatever was causing the glacial data 
transfer was on the AWS side.


Jeff




Why password authentication failed for user "postgres"?

2022-06-03 Thread BeginnerC

Hello everyone,
I am a newbie to Postgres; when I use psql to connect to it, an error 
message is printed.

The commands look like this:

psql -U postgres
Password for user postgres:postgres
postgreSQL: password authentication failed for user "postgres"

How to solve this problem?
Thanks in advance!



Re: Why password authentication failed for user "postgres"?

2022-06-03 Thread Reid Thompson
On Sat, 2022-06-04 at 06:32 +0800, BeginnerC wrote:
> Hello everyone,
> I am a newbie to Postgres; when I use psql to connect to it, an error
> message is printed.
>
> The commands look like this:
> 
> psql -U postgres
> Password for user postgres:postgres
> postgreSQL: password authentication failed for user "postgres"
> 
> How to solve this problem?
> Thanks in advance!

this may work...

switch user to postgres:
$ sudo su - postgres

log in to postgres:
$ psql 

reset the password for postgres:
postgres=# \password
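
or, equivalently, the SQL form (the password shown is a placeholder):
postgres=# ALTER USER postgres WITH PASSWORD 'new_password';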
 





'{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-03 Thread Bryn Llewellyn
Here’s the minimal testcase:

do $body$
declare
  j1  constant jsonb   not null := '{"x": 42, "y": null}';
  j2  constant jsonb   not null := '{"x": 42   }';

  predicate_1 constant boolean not null := (j1->>'y' is null) AND (j2->>'y' is 
null);
  predicate_2 constant boolean not null := j1 = j2;
begin
  assert predicate_1;
  assert not predicate_2;
end;
$body$;

The block finishes silently.

I certainly expect "predicate_1" to be true. This reflects the defined JSON 
semantics that, within an object, the omission of a key-value pair is the same 
as its presence with a value equal to the (bizarre) JSON null.

As I reason it, the truth of "predicate_1" implies the truth of "predicate_2" 
because "jsonb" claims to represent the underlying semantics of a JSON document 
using its own secret post-parse representation.

Am I missing a use case where an object with a key-value pair with a JSON null 
value is meaningfully different from one where the key is simply absent?

If not, would you consider what I showed to be evidence of a bug?

My discovery let me design other tests.

This block confirms the basic idea that the meaning of a "jsonb" value is 
independent of the formatting of the incoming document that defined it.

do $body$
declare
  t1 constant text not null := '
{
  "x": 42,
  "y": 17
}
';
  j1 constant jsonb not null := t1::jsonb;
  t2 constant text  not null := j1::text;
  j2 constant jsonb not null := t2::jsonb;

  predicate_3 constant boolean not null := t2 = t1;
  predicate_4 constant boolean not null := j2 = j1;
begin
  assert not predicate_3;
  assert predicate_4;
end;
$body$;

Another test (I won't bore you with its code) confirms that the order in which 
the incoming document lists key-value pairs has no consequence for its meaning.

Here's another twist on the same basic issue:

create type t_ as (a int, b int);
create function f()
  returns table(z text)
  language plpgsql
as $body$
declare
  j1_in  constant jsonb not null := '{"a": 17, "b": null}';
  j2_in  constant jsonb not null := '{"a": 17   }';

  r1 constant t_ not null := jsonb_populate_record(null::t_, j1_in);
  r2 constant t_ not null := jsonb_populate_record(null::t_, j2_in);

  j1_out constant jsonb not null := to_jsonb(r1);
  j2_out constant jsonb not null := to_jsonb(r2);
begin
  z := j1_out::text;  return next;
  z := j2_out::text;  return next;
end;
$body$;
select f();

This is the result:

{"a": 17, "b": null}
{"a": 17, "b": null}

I have a little demo where I shred a set of "books" incoming JSON documents 
(where a book has a genre and many authors) into the classic Codd-and-Date four 
tables: books, authors, genres, and book_author_intersections. Then I scrunch 
each book back to a single JSON doc. I want to prove that I get back what I 
started with. So I have to clutter what should be a simple test:

  differ constant boolean not null :=
(
with
  a as (select * from j_in  except select * from j_out),
  b as (select * from j_out except select * from j_in )

select (exists(select 1 from a) or exists(select 1 from b))
);

with no end of ad-hoc-ery.

Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-03 Thread David G. Johnston
On Fri, Jun 3, 2022 at 6:41 PM Bryn Llewellyn  wrote:

> declare
>   j1  constant jsonb   not null := '{"x": 42, "y": null}';
>   j2  constant jsonb   not null := '{"x": 42   }';
>


> (j1->>'y' is null)
>

This produces a JSON null, which, when asked for as a text data type, 
results in a SQL NULL due to implicit type coercion.


> (j2->>'y' is null)
>

This produces "key y not present in JSON" but someone decided that was too
unfriendly and so we instead produce SQL NULL.  This SQL NULL exists for a
different reason than the SQL NULL in the previous case.  Unfortunately,
there is only a single concept of NULL in the SQL Language.


>   predicate_2 constant boolean not null := j1 = j2;
>

This is trivially false, j1 has a key of "y" while j2 does not.  If there
is a complaint to be had, this returning false isn't it.


> Another test (I won't bore you with its code) confirms that the order in
> which the incoming document lists key-value pairs has no consequence for
> its meaning.
>

Yes, the equality test is semantic, do all the same keys and corresponding
values exist?  If so, return true.

>   j1_in  constant jsonb not null := '{"a": 17, "b": null}';
>   j2_in  constant jsonb not null := '{"a": 17   }';
>
> This is the result:
>
> {"a": 17, "b": null}
> {"a": 17, "b": null}
>

Yep, for basically the same reason as the first point.  The nulls are
consequences of different situations (lack of key, value of key being null)
being largely undistinguishable at the SQL level.  We provide a "does key
exists" operator if you do need to make that determination.

I have a little demo where I shred a set of "books" incoming JSON documents
> (where a book has a genre and many authors) into the classic Codd-and-Date
> four tables: books, authors, genres, and book_author_intersections. Then I
> scrunch each book back to a single JSON doc. I want to prove that I get
> back what I started with.
>

Yeah, the lack of any goal of round-tripping conversions from JSON through
SQL and back into JSON makes proving that the system does such a thing
problematic.  You'll get back something meaningfully equivalent, by your
own argument, but not identical on a key-by-key basis.

David J.


Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-03 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> b...@yugabyte.com wrote:
> 
>> declare
>>   j1  constant jsonb   not null := '{"x": 42, "y": null}';
>>   j2  constant jsonb   not null := '{"x": 42   }';
>> ... 
>> (j1->>'y' is null)
>  
> This produces a JSON Null which when asked for as a text data type results 
> into SQL NULL due to implicit type coercion
> 
>> (j2->>'y' is null)
> 
> This produces "key y not present in JSON" but someone decided that was too 
> unfriendly and so we instead produce SQL NULL. This SQL NULL exists for a 
> different reason than the SQL NULL in the previous case.  Unfortunately, 
> there is only a single concept of NULL in the SQL Language.
> 
>>   predicate_2 constant boolean not null := j1 = j2;
> 
> This is trivially false, j1 has a key of "y" while j2 does not. If there is a 
> complaint to be had, this returning false isn't it.
> 
>> Another test (I won't bore you with its code) confirms that the order in 
>> which the incoming document lists key-value pairs has no consequence for its 
>> meaning.
> 
> Yes, the equality test is semantic, do all the same keys and corresponding 
> values exist?  If so, return true.
> 
>>   j1_in  constant jsonb not null := '{"a": 17, "b": null}';
>>   j2_in  constant jsonb not null := '{"a": 17   }';
>> 
>> This is the result:
>> 
>> {"a": 17, "b": null}
>> {"a": 17, "b": null}
> 
> Yep, for basically the same reason as the first point.  The nulls are 
> consequences of different situations (lack of key, value of key being null) 
> being largely undistinguishable at the SQL level. We provide a "does key 
> exists" operator if you do need to make that determination.
> 
>> I have a little demo where I shred a set of "books" incoming JSON documents 
>> (where a book has a genre and many authors) into the classic Codd-and-Date 
>> four tables: books, authors, genres, and book_author_intersections. Then I 
>> scrunch each book back to a single JSON doc. I want to prove that I get back 
>> what I started with.
> 
> Yeah, the lack of any goal of round-tripping conversions from JSON through 
> SQL and back into JSON makes proving that the system does such a thing 
> problematic. You'll get back something meaningfully equivalent, by your own 
> argument, but not identical on a key-by-key basis.

Thanks for attempting to satisfy me, David—but I'm sad to say that I remain 
somewhat unsatisfied. This is doubtless my fault. I suppose that I knew, before 
I asked, that the ultimate answer would be "It is what it is and it will never 
change."

Of course, I can predict the outcomes of countless tests on the basis that I 
understand "what it is". Here's just one such (just like you suggested):

do $body$
declare
  j1 constant jsonb not null := '{"a": 1, "b": null}';
  j2 constant jsonb not null := '{"a": 1   }';

  n1 constant int := (select count(*) from jsonb_object_keys(j1));
  n2 constant int := (select count(*) from jsonb_object_keys(j2));
begin
  assert (j1 ? 'b');
  assert not (j2 ? 'b');
  assert (n1 = 2) and (n2 = 1);
end;
$body$;

The idea of "null" is a strange beast just within PostgreSQL—let alone all SQL 
systems. The classic understanding of its semantics is "There is simply no 
information available about the value". And this explains the treacherous 
three-by-three outcome table for operators like OR and AND. However, this 
understanding can't cope with the outcome here:

create type t_ as (a int, b int);
do $body$
declare
  b1 constant boolean := null;
  b2 constant boolean := null;
  r1 constant t_ not null := (b1, b2);
  r2 constant t_ not null := (b1, b2);
begin
  assert (b1 = b2) is null;
  assert (b1 is not distinct from b2);

  assert (r1 = r2);
end;
$body$;

I'd've thought that I'd need to weigh in with "is not distinct from" to get 
"r1" and "r2" to compare as the same just as I do with "b1" and "b2". So maybe 
it's not surprising that "null" in JSON is quirky too.

I'm going to try to think like this:

The number of possible spellings of the names of keys in a JSON object is some 
flavor of infinite. So including this in an object:

"k": null

really is saying something. It says that I do know about "k" and that yet I 
have simply no information available about its value.

The quirkiness that my first example showed can be seen differently from how I 
saw it first. When I now consider this expression:

('{"x": 42}'::jsonb)->>'y'

it seems that its evaluation should simply raise an exception. But you said:

> This produces "key y not present in JSON" but someone decided that was too 
> unfriendly and so we instead produce SQL NULL.


Oh well, I know how to program the cases that I care about to get the outcomes 
that I want. It just means lots of typing. But that's anyway what one signs up 
for who decides to work with JSON…



Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-03 Thread Tom Lane
Bryn Llewellyn  writes:
> I'm going to try to think like this:
> The number of possible spellings of the names of keys in a JSON object is 
> some flavor of infinite. So including this in an object:
> "k": null
> really is saying something. It says that I do know about "k" and that yet I 
> have simply no information available about its value.

I'd read it as asserting that key "k" is meaningful for this object,
but the correct value for that key is not known.

I have a hard time with your assertion that {"x": 42, "y": null}
should be considered equivalent to {"x": 42}, because it would
render key-exists predicates useless.  Either you have to say that
key "y" is claimed to exist in both of these objects and indeed every
object, or you have to make it fail if the key's value is null (so that
it'd say "false" in both of these cases).  Either of those options
seems both weird and useless.

> The quirkiness that my first example showed can be seen differently from how 
> I saw it first. When I now consider this expression:
> ('{"x": 42}'::jsonb)->>'y'
> it seems that its evaluation should simply raise an exception. But you said:
>> This produces "key y not present in JSON" but someone decided that was too 
>> unfriendly and so we instead produce SQL NULL.

Right.  This is hard to justify from a purist semantic point of view,
but having the operator throw an error in such cases would make it
close to unusable on not-uniformly-structured data.  And really the
point of using JSON inside a SQL database is to cope with irregularly-
structured data, so fuzziness seems like what we want.

regards, tom lane