Does "ON UPDATE" for foreign keys require index?

2019-04-25 Thread rihad
Hi. Say I have column A.b_id which references B.id (which is a primary 
key) and as such it is declared as a foreign key constraint. A.b_id has 
no index because it doesn't need one. What happens when table B's rows 
are modified (but never deleted)? Will PG still have to scan A to 
find matching A.b_id rows, only to do nothing with them? )) B.id itself is 
never modified; it's just a normal serial value typically used for IDs.



The docs are a bit ambiguous:

Since a DELETE of a row from the referenced table or an UPDATE of 
a referenced column will require a scan of the referencing table for 
rows matching the old value, it is often a good idea to index the 
referencing columns too. Because this is not always needed, and there 
are many choices available on how to index, declaration of a foreign 
key constraint does not automatically create an index on the 
referencing columns.





https://www.postgresql.org/docs/9.6/ddl-constraints.html
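
For concreteness, a minimal sketch of the schema being described (table and
column names follow the question; the last statement is the optional index
the docs are talking about, not something the FK creates for you):

CREATE TABLE b (
    id serial PRIMARY KEY
);

CREATE TABLE a (
    id   serial PRIMARY KEY,
    b_id integer NOT NULL REFERENCES b (id)  -- FK constraint; no index is created automatically
);

-- Optional: index the referencing column so a scan of "a" triggered by a
-- DELETE on "b" (or an UPDATE of b.id) does not have to be a seqscan.
CREATE INDEX a_b_id_idx ON a (b_id);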



Re: Does "ON UPDATE" for foreign keys require index?

2019-04-25 Thread Alvaro Herrera
On 2019-Apr-25, rihad wrote:

> Hi. Say I have column A.b_id which references B.id (which is a primary key)
> and as such it is declared as a foreign key constraint. A.b_id has no index
> because it doesn't need one. What happens when table B's rows are modified
> (but never deleted)? Will PG still have have to scan A fo find A.b_id to do
> nothing with it? )) B.id itself is never modified, it's just a normal serial
> value typically used for id's.

We have an optimization that if you update a row in a transaction just
once, and the column is not modified, then it won't need to scan the
referencing table.  However, if you make two updates in a transaction,
the optimization isn't smart enough to detect that the FK isn't
invalidated, so a scan will occur.  Therefore, if A is large [enough
that you care about a seqscan on it] and you expect to be doing more
than one UPDATE of B in the same transaction, then this could be
noticeable.

I suggest you run some tests, just to be sure.
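
A hedged sketch of such a test, assuming the A/B schema from the question
plus a hypothetical payload column on B (the statistics view lags slightly,
so allow a moment before re-reading the counter):

-- Baseline: how many sequential scans has "a" seen so far?
SELECT seq_scan FROM pg_stat_user_tables WHERE relname = 'a';

-- Case 1: a single UPDATE of b per transaction; b.id is untouched, so the
-- FK re-check on "a" should be skipped.
BEGIN;
UPDATE b SET payload = payload || 'x' WHERE id = 1;
COMMIT;

-- Case 2: two UPDATEs of the same row in one transaction; per the above,
-- this can trigger a scan of "a" for the old key value.
BEGIN;
UPDATE b SET payload = payload || 'x' WHERE id = 1;
UPDATE b SET payload = payload || 'y' WHERE id = 1;
COMMIT;

-- Compare the counter with the baseline.
SELECT seq_scan FROM pg_stat_user_tables WHERE relname = 'a';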

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Martin Kováčik
Hi group,

I have a Jenkins build server which runs integration tests of a Java
application against PostgreSQL 11.2. The tests run in serial fashion and
are reproducible because each test runs in a separate DB transaction which is
rolled back at the end of each test. Each full build (with 1400+ tests)
takes approx 1h:45m.

My problem is that sometimes the build gets stuck (for several hours, often
10+) in a random test, waiting for a random select query, with the postgres
process consuming 100% CPU (1 core).
Well, at least I thought it was random; then I figured out it was triggered
by the autovacuum daemon. The problem is not the vacuum process itself, but
the analyze part that runs together with autovacuum.

To illustrate my situation let's consider my tests look like this:

BEGIN;

-- A: insert data for the test

-- B: i'll refer to this point later

-- C: select(s)

ROLLBACK;

Everything is fine until autovacuum (analyze) runs while the test is at
point B. After that, the query planner at point C chooses a wrong plan and the
query takes a long time to complete, blocking one CPU core for a long time.
It seems like the planner statistics inside a running transaction are
affected by an analyze task running outside of the transaction. In this case,
running analyze (outside the transaction) while the transaction is at point B
causes the query planner to think there are no rows (because the inserts at
point A have not yet been committed).

I did prepare a simple test case to reproduce this behavior:

First you need to create a table:

create table a (id bigint primary key);

Then run this transaction:

begin;
insert into a
select * from generate_series(1, 1000);

-- during sleep execute analyze on this db in separate connection
select pg_sleep(10);

explain analyze select count(*) from (
select distinct a1, a2, a3, a4 from a a1
left join a a2 on a1.id > 900
left join a a3 on a2.id = a3.id
left join a a4 on a3.id = a4.id
left join a a5 on a5.id = a4.id
left join a a6 on a6.id = a5.id
left join a a7 on a7.id = a6.id
left join a a8 on a8.id = a7.id) temp;

rollback;

The query plan for the select is as follows and the query takes 865ms to
finish.


                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1987304286213.68..1987304286213.69 rows=1 width=8) (actual time=862.406..862.407 rows=1 loops=1)
   ->  Unique  (cost=1942976891422.08..1987284286213.68 rows=16 width=128) (actual time=783.281..858.261 rows=100900 loops=1)
         ->  Sort  (cost=1942976891422.08..1951838370380.40 rows=3544591583328 width=128) (actual time=783.280..825.055 rows=100900 loops=1)
               Sort Key: a1.*, a2.*, a3.*, a4.*
               Sort Method: external merge  Disk: 12368kB
               ->  Merge Right Join  (cost=5497387.28..53175003562.02 rows=3544591583328 width=128) (actual time=171.612..196.342 rows=100900 loops=1)
                     Merge Cond: (a4.id = a3.id)
                     ->  Merge Left Join  (cost=449798.05..1010691.54 rows=36848656 width=40) (actual time=8.257..8.815 rows=1000 loops=1)
                           Merge Cond: (a4.id = a5.id)
                           ->  Sort  (cost=158.51..164.16 rows=2260 width=40) (actual time=1.174..1.241 rows=1000 loops=1)
                                 Sort Key: a4.id
                                 Sort Method: quicksort  Memory: 103kB
                                 ->  Seq Scan on a a4  (cost=0.00..32.60 rows=2260 width=40) (actual time=0.040..0.639 rows=1000 loops=1)
                           ->  Materialize  (cost=449639.54..465944.26 rows=3260943 width=8) (actual time=7.077..7.254 rows=1000 loops=1)
                                 ->  Sort  (cost=449639.54..457791.90 rows=3260943 width=8) (actual time=7.066..7.148 rows=1000 loops=1)
                                       Sort Key: a5.id
                                       Sort Method: quicksort  Memory: 71kB
                                       ->  Merge Right Join  (cost=2897.82..52270.24 rows=3260943 width=8) (actual time=4.352..6.630 rows=1000 loops=1)
                                             Merge Cond: (a7.id = a6.id)
                                             ->  Merge Left Join  (cost=317.01..711.38 rows=25538 width=8) (actual time=1.236..2.482 rows=1000 loops=1)
                                                   Merge Cond: (a7.id = a8.id)
                                                   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.618..0.792 rows=1000 loops=1)
                                                         Sort Key: a7.id
                                                         Sort Method: quicksort  Memory: 71kB
                                                         ->  Seq Scan on a a7  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.017..0.265 rows=1000 loops=1)
                                                   ->  Sort
                                                   [remainder of the plan truncated in the archive]

Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Michael Lewis
I assume it is in the documentation, but I am not aware of how stats are
handled for uncommitted work. Obviously in the example you provided the
table would be empty, but in your real tests do they start out empty? Would
it suffice to use temp tables created like the regular ones and analyze
after insert to ensure stats are up to date? Or would it make sense to turn
off auto-vacuum/analyze during these tests? The workload is not expected to
be similar to the real world, I would think, so perhaps you don't need the
vacuum and the stats update.
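
For what it's worth, a sketch of the standard knobs for that (per-table or
instance-wide; which one fits depends on the test setup):

-- Per-table: disable autovacuum/auto-analyze only for the tables the tests touch.
ALTER TABLE a SET (autovacuum_enabled = false);

-- Instance-wide, e.g. for a dedicated test server (needs superuser and a reload):
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();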

>


Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Martin Kováčik
Turning off autovacuum for the tests is a valid option and I will
definitely do this as a workaround. Each test pretty much starts with an empty
schema; data for it is generated during the run and rolled back at the
end. I have a lot of tests and at the moment it is not feasible to modify
them.

The real workload for the application is different, but there are some
cases when we import data from a remote web service in a transaction, do some
work with it and then commit. If there is an autovacuum during this
process I assume there will be a similar problem with planner statistics.

The real problem here is that the statistics seen from one
(uncommitted) transaction are affected by an analyze statement from another
(committed) transaction. I've seen similar behavior in production for other
applications using PostgreSQL too (10.x and 11.x) - 100% CPU consumed by an
otherwise fine select query. I was puzzled by this behavior back then, but
now it makes sense.
*Martin Kováčik*
*CEO*
*redByte*, s.r.o.
+421 904 236 791
kova...@redbyte.eu, www.redbyte.eu 


On Thu, Apr 25, 2019 at 5:26 PM Michael Lewis  wrote:

> I assume it is in the documentation, but I am not aware of how stats are
> handled for uncommitted work. Obviously in the example you provided the
> table would be empty, but in your real tests do they start out empty? Would
> it suffice to use temp tables created like the regular ones and analyze
> after insert to ensure stats are up to date? Or would it make sense to turn
> off auto-vacuum/analyze during these tests? The workload is not expected to
> be similar to real world I would think, and so you don't need the vacuum
> and the stats update perhaps.
>
>>


Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Michael Lewis
On Thu, Apr 25, 2019, 11:34 AM Martin Kováčik  wrote:

> Turning off autovacuum for the tests is a valid option and I will
> definitely do this as a workaround. Each test pretty much starts with empty
> schema and data for it is generated during the run and rolled back at the
> end. I have a lot of tests and at the moment it is not feasible to modify
> them.
>
> The real workload for the application is different, but there are some
> cases, when we import data from remote web service in a transaction do some
> work with it and then we do a commit. If there is an autovacuum during this
> process I assume there will be similar problem regarding planner statistics.
>

Unless you are importing a huge amount of data relative to what is already
there, it seems likely to be significantly less impactful than adding data
to a completely empty table. The stats on a table with 0 rows and then 5000
rows is going to be night and day, while the difference between stats on
100,000 rows and 105,000 is not as impactful. Musing here. I expect others
will chime in.

Stats are not versioned with MVCC, so it would be expected that a commit in
another transaction that updates stats would influence the query plan
for another transaction that is active.

>


Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Adrian Klaver

On 4/25/19 7:37 AM, Martin Kováčik wrote:

Hi group,



See comments inline below


To illustrate my situation let's consider my tests look like this:

BEGIN;

-- A: insert data for the test

-- B: i'll refer to this point later

-- C: select(s)

ROLLBACK;

Everything is fine, until autovacuum (analyze) runs when the test is at 
point B. After that the query planner at point C chooses wrong plan and 
the query takes a long time to complete, blocking one CPU core for a 
long time. It seems like the planner statistics inside running 
transaction are affected by analyze task running outside of the 
transaction. In this case after running analyze (outside the 
transaction) when the transaction is at B, causes query planner to think 
there are no rows (because the inserts at point A were not yet committed).


I did prepare a simple test case to reproduce this behavior:

First you need to create a table:

create table a (id bigint primary key);

Then run this transaction:

begin;
insert into a
select * from generate_series(1, 1000);

-- during sleep execute analyze on this db in separate connection
select pg_sleep(10);


analyze a;

On my machine that changes the time from:

29715.763 ms

to

291.765 ms

when running ANALYZE in the concurrent connection during the sleep.




explain analyze select count(*) from (
select distinct a1, a2, a3, a4 from a a1
left join a a2 on a1.id  > 900
left join a a3 on a2.id  = a3.id 
left join a a4 on a3.id  = a4.id 
left join a a5 on a5.id  = a4.id 
left join a a6 on a6.id  = a5.id 
left join a a7 on a7.id  = a6.id 
left join a a8 on a8.id  = a7.id ) temp;

rollback;





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Tom Lane
Martin Kováčik  writes:
> To illustrate my situation let's consider my tests look like this:

> BEGIN;

> -- A: insert data for the test

> -- B: i'll refer to this point later

> -- C: select(s)

> ROLLBACK;

> Everything is fine, until autovacuum (analyze) runs when the test is at
> point B. After that the query planner at point C chooses wrong plan and the
> query takes a long time to complete, blocking one CPU core for a long time.
> It seems like the planner statistics inside running transaction are
> affected by analyze task running outside of the transaction.

Yup, they are.  However, you're already at risk of a pretty bad plan for
a case like this, since (by assumption) the stats before you did the
insert at step A are radically different from what they should be after
the insert.

The standard recommendation, when you need the results of a data change
to be understood by the planner immediately, is to do your own ANALYZE:

BEGIN;

-- A: insert data for the test

ANALYZE test_table;

-- C: select(s)

ROLLBACK;

This should protect step C against seeing any irrelevant stats, because

(a) once your transaction has done an ANALYZE, autovacuum shouldn't
think the stats are out of date, and

(b) even if it does, your transaction is now holding
ShareUpdateExclusiveLock on the table so auto-ANALYZE can't get
that lock to do a fresh ANALYZE.


Another thing you could do is not even allow the test table to exist
outside your transaction:

BEGIN;

CREATE TABLE test_table (...);

-- A: insert data for the test

ANALYZE test_table;  -- this is now somewhat optional

-- C: select(s)

ROLLBACK;

Then there's nothing for auto-ANALYZE to get its hands on.  If you're
satisfied with the planner's default behavior in the absence of any
stats, you could omit the post-insertion ANALYZE in this case.  But
I'm not sure that that would represent a test that has much to do with
production situations.

regards, tom lane




Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Martin Kováčik
As my example shows, you don't have to import a lot of rows - 1000 are enough
to make a difference - it all depends on the query. When a cartesian
product is involved, only a few records are enough.
I think that stats should be MVCC-versioned, otherwise the planner is using
wrong statistics and chooses wrong plans.
*Martin Kováčik*
*CEO*
*redByte*, s.r.o.
+421 904 236 791
kova...@redbyte.eu, www.redbyte.eu 


On Thu, Apr 25, 2019 at 9:28 PM Michael Lewis  wrote:

>
>
> On Thu, Apr 25, 2019, 11:34 AM Martin Kováčik  wrote:
>
>> Turning off autovacuum for the tests is a valid option and I will
>> definitely do this as a workaround. Each test pretty much starts with empty
>> schema and data for it is generated during the run and rolled back at the
>> end. I have a lot of tests and at the moment it is not feasible to modify
>> them.
>>
>> The real workload for the application is different, but there are some
>> cases, when we import data from remote web service in a transaction do some
>> work with it and then we do a commit. If there is an autovacuum during this
>> process I assume there will be similar problem regarding planner statistics.
>>
>
> Unless you are importing a huge amount of data relative to what is already
> there, it seems likely to be significantly less impactful than adding data
> to a completely empty table. The stats on a table with 0 rows and then 5000
> rows is going to be night and day, while the difference between stats on
> 100,000 rows and 105,000 is not as impactful. Musing here. I expect others
> will chime in.
>
> Stats are not versioned with MVCC so it would expected that a commit in
> another transaction that is updating stats would influence the query plan
> for another transaction that is active.
>
>>


Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Adrian Klaver

On 4/25/19 12:47 PM, Martin Kováčik wrote:
As my example shows you don't have to import a lot of rows - 1000 is 
enough to make a difference - it all depends on the query. When a 
cartesian product is involved only a few records is enough.
I think that stats should be MVCC versioned otherwise the planner is 
using wrong statistics and chooses wrong plans.


Then you are looking at moving the choke point to looking up the correct 
stats across possibly hundreds/thousands of transactions in flight.







--
Adrian Klaver
adrian.kla...@aklaver.com




Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Martin Kováčik
Thanks Tom,

that is a great explanation. The default plan, which overestimates the number
of rows, is much better than the plan (based on the wrong stats) that
underestimates it. This is true for my particular case but may be bad for
other cases.

The schema is created before the tests begin, so the suggestion to create the
required tables inside a transaction isn't feasible for my case. For the
build server I'm going with Michael's suggestion to disable autovacuum.
Instead I'll do manual vacuuming/analyzing before the build starts.

For the production use case I'll do my own analyze after bulk insert as you
suggested.

Initially I thought that inside a transaction stats are automatically
updated after the inserts because the query ran much faster than the query
which was based on statistics with no rows. I did not realize the planner
chooses a "default" plan when there are no stats.

*Martin Kováčik*
*CEO*
*redByte*, s.r.o.
+421 904 236 791
kova...@redbyte.eu, www.redbyte.eu 


On Thu, Apr 25, 2019 at 9:46 PM Tom Lane  wrote:

> Martin Kováčik  writes:
> > To illustrate my situation let's consider my tests look like this:
>
> > BEGIN;
>
> > -- A: insert data for the test
>
> > -- B: i'll refer to this point later
>
> > -- C: select(s)
>
> > ROLLBACK;
>
> > Everything is fine, until autovacuum (analyze) runs when the test is at
> > point B. After that the query planner at point C chooses wrong plan and
> the
> > query takes a long time to complete, blocking one CPU core for a long
> time.
> > It seems like the planner statistics inside running transaction are
> > affected by analyze task running outside of the transaction.
>
> Yup, they are.  However, you're already at risk of a pretty bad plan for
> a case like this, since (by assumption) the stats before you did the
> insert at step A are radically different from what they should be after
> the insert.
>
> The standard recommendation, when you need the results of a data change
> to be understood by the planner immediately, is to do your own ANALYZE:
>
> BEGIN;
>
> -- A: insert data for the test
>
> ANALYZE test_table;
>
> -- C: select(s)
>
> ROLLBACK;
>
> This should protect step C against seeing any irrelevant stats, because
>
> (a) once your transaction has done an ANALYZE, autovacuum shouldn't
> think the stats are out of date, and
>
> (b) even if it does, your transaction is now holding
> ShareUpdateExclusiveLock on the table so auto-ANALYZE can't get
> that lock to do a fresh ANALYZE.
>
>
> Another thing you could do is not even allow the test table to exist
> outside your transaction:
>
> BEGIN;
>
> CREATE TABLE test_table (...);
>
> -- A: insert data for the test
>
> ANALYZE test_table;  -- this is now somewhat optional
>
> -- C: select(s)
>
> ROLLBACK;
>
> Then there's nothing for auto-ANALYZE to get its hands on.  If you're
> satisfied with the planner's default behavior in the absence of any
> stats, you could omit the post-insertion ANALYZE in this case.  But
> I'm not sure that that would represent a test that has much to do with
> production situations.
>
> regards, tom lane
>


Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Martin Kováčik
I'm not sure if I understand what you mean. My initial thought was that
stats are fixed per transaction, i.e. an analyze from a committed transaction
doesn't interfere with another running transaction. Maybe I was confused by
this because analyze can be run inside a transaction, so my assumption was
that it is isolated like other statements.
*Martin Kováčik*
*CEO*
*redByte*, s.r.o.
+421 904 236 791
kova...@redbyte.eu, www.redbyte.eu 


On Thu, Apr 25, 2019 at 9:58 PM Adrian Klaver 
wrote:

> On 4/25/19 12:47 PM, Martin Kováčik wrote:
> > As my example shows you don't have to import a lot of rows - 1000 is
> > enough to make a difference - it all depends on the query. When a
> > cartesian product is involved only a few records is enough.
> > I think that stats should be MVCC versioned otherwise the planner is
> > using wrong statistics and chooses wrong plans.
>
> Then you are looking at moving the choke point to looking up the correct
> stats across possibly hundreds/thousands of transactions in flight.
>
> > *Martin Kováčik*
> > /CEO/
> > *redByte*, s.r.o.
> > +421 904 236 791
> > kova...@redbyte.eu , www.redbyte.eu
> > 
> >
> >
> > On Thu, Apr 25, 2019 at 9:28 PM Michael Lewis  > > wrote:
> >
> >
> >
> > On Thu, Apr 25, 2019, 11:34 AM Martin Kováčik  > > wrote:
> >
> > Turning off autovacuum for the tests is a valid option and I
> > will definitely do this as a workaround. Each test pretty much
> > starts with empty schema and data for it is generated during the
> > run and rolled back at the end. I have a lot of tests and at the
> > moment it is not feasible to modify them.
> >
> > The real workload for the application is different, but there
> > are some cases, when we import data from remote web service in a
> > transaction do some work with it and then we do a commit. If
> > there is an autovacuum during this process I assume there will
> > be similar problem regarding planner statistics.
> >
> >
> > Unless you are importing a huge amount of data relative to what is
> > already there, it seems likely to be significantly less impactful
> > than adding data to a completely empty table. The stats on a table
> > with 0 rows and then 5000 rows is going to be night and day, while
> > the difference between stats on 100,000 rows and 105,000 is not as
> > impactful. Musing here. I expect others will chime in.
> >
> > Stats are not versioned with MVCC so it would expected that a commit
> > in another transaction that is updating stats would influence the
> > query plan for another transaction that is active.
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Is _ a supported way to create a column of array type?

2019-04-25 Thread Piotr Findeisen
Hi,

As documented at
https://www.postgresql.org/docs/11/arrays.html#ARRAYS-DECLARATION one can
create a column of an array type using the `[]` form.

Internally, array types get a name of the form `_` followed by the element
type name (e.g. `_int4`). This is documented at
https://www.postgresql.org/docs/11/sql-createtype.html#id-1.9.3.94.5.9

So -- the question:
Can a user use the `_` name form to define a column of array type?
Is it supported?

The reason I am asking is that e.g. int4[] and _int4 behave differently.
Although they look the same, they have different pg_attribute.attndims.

I am testing on Postgres 11.2.


create table t(a int4[], b _int4);

\d t
  Table "public.t"
 Column |   Type| Collation | Nullable | Default
+---+---+--+-
 a  | integer[] |   |  |
 b  | integer[] |   |  |


SELECT attname,  attndims FROM pg_attribute att JOIN pg_class tbl ON
tbl.oid = att.attrelid WHERE tbl.relname = 't';
 attname | attndims
---------+----------
 ...
 a       |        1
 b       |        0


This has also been discussed previously in 2006 in the
https://www.postgresql.org/message-id/8C5B026B51B6854CBE88121DBF097A8651DB95%40ehost010-33.exch010.intermedia.net
user
group thread. However, it was a while ago (so something might have changed
since then) and the conclusion from that discussion wasn't fully clear to
me.


Best regards,
Piotr


Re: Is _ a supported way to create a column of array type?

2019-04-25 Thread Tom Lane
Piotr Findeisen  writes:
> Internally, array types get a name in the form of `_`.

Typically, yes.

> *Can a user use `_` to define a column of array type?*

Sure ... didn't you try it?

> *Is it supported?*

Not really, because it's not guaranteed that the name looks like that.
There are various corner cases where something else would be generated,
either to avoid a collision, or because truncation is needed.

However, if you've taken the trouble to check what name actually got
assigned to the array type, it's perfectly valid to use that name.

> The reason I am asking is that e.g. int4[] and _int4 behave differently.
> Although they look the same, the have different pg_attribute.attndims.

Yeah.  Nothing really cares about attndims though ... it's vestigial.
Perhaps we should remove it someday.

regards, tom lane




Re: Is _ a supported way to create a column of array type?

2019-04-25 Thread Tom Lane
Piotr Findeisen  writes:
> I think I need to provide some context for my question.
> ...
> When accessing a table in Postgres, we need to map columns' types to
> appropriate types in Presto.
> For mapping arrays, we need to know number of array dimensions.
> Currently we read this from pg_attribute.attndims and this does not work
> for _ columns.

Well, you've got a conceptual problem there, which is exactly the
assumption that attndims is meaningful :-(.

In the first place, the Postgres type system doesn't distinguish
arrays of different numbers of dimensions, ie, int4[][] is not
really different from int4[].  So you can't attach any very strong
meaning to attndims = 2 vs attndims = 1.

In the second place, we don't bother to fill attndims when the
user doesn't write any brackets, which is what would happen with
a type spec of "_int4" rather than "int4[]".  So I guess you could
say that what attndims records is the number of brackets that were
written in the table creation command, but that unfortunately has
got no real semantic significance.
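
A quick illustration of that point (hypothetical table, not from the original
post): all three columns below end up with the same array type, yet attndims
records only the brackets that happened to be written:

create table t2 (x int4[], y int4[][], z _int4);

SELECT attname, atttypid::regtype, attndims
FROM pg_attribute
WHERE attrelid = 't2'::regclass AND attnum > 0;
-- all three show type integer[]; attndims is 1, 2 and 0 respectively.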

The right way (TM) to decide if a column is of array type is to
look at the pg_type entry its atttypid points at and see if that
is an array type.  Depending on what you want to do, any of these
tests on the pg_type entry might be reasonable:
1. has nonzero typelem.  (This basically means that the column can
   be subscripted, so it includes fixed-length "array" types such
   as "point", which you might not want to accept.)
2. has nonzero typelem and typlen = -1.  (This restricts it to
   varlena arrays, which are the generic kind of array.)
3. has typcategory "A".  (This should be effectively the same
   as method 2, I think, though the backend code doesn't rely
   on typcategory for such decisions.  Conceivably you'd do this
   if you wanted to let users mark weird types as being arrays.)
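
A hedged sketch of that check against the "t" table from earlier in the
thread (the query itself is illustrative, using methods 2 and 3 above):

SELECT att.attname,
       typ.typname,
       (typ.typelem <> 0 AND typ.typlen = -1) AS is_varlena_array,  -- method 2
       (typ.typcategory = 'A')                AS is_array_category  -- method 3
FROM pg_attribute att
JOIN pg_class tbl ON tbl.oid = att.attrelid
JOIN pg_type  typ ON typ.oid = att.atttypid
WHERE tbl.relname = 't'
  AND att.attnum > 0
  AND NOT att.attisdropped;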

> 2. is it possible to make pg_attribute.attndims have correct value when
> column is defined using _ form?

Even if we wanted to put work into a column that's so vestigial that
taking it out is a reasonable proposal, we would certainly never
back-patch such a change; nor would existing catalog entries change
even if we did.  So you pretty much have to deal with the facts on
the ground, which are that attndims is largely useless.

regards, tom lane




Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Adrian Klaver

On 4/25/19 1:28 PM, Martin Kováčik wrote:
I'm not sure if I understand what you mean. My initial thought was that 
stats are fixed per transaction, i.e. analyze from comitted transaction 
doesn't interfere with another running transaction. Maybe I was confused 
by this because analyze can be run inside a transaction, so my 
assumption was it is isolated like other statements.


To make it clearer I would take a look at:

https://www.postgresql.org/docs/11/sql-analyze.html
https://www.postgresql.org/docs/11/catalog-pg-statistic.html


The gist is that currently there is one entry (with the exception noted in 
the second link) per column in each table. This is the most recent 
information available, subject to the conditions Tom pointed out in his 
post. To make it MVCC-aware like you suggest would mean tracking the state 
of all transactions currently open and recording that information, so each 
transaction could find its stats. This means the planning problem could 
then move to the stats table, as it would need to be analyzed itself to 
work efficiently. This would probably also need to be MVCC-aware to be 
relevant, which would add to the overhead. I could see this turn into a 
'hall of mirrors' problem quickly.
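
As a concrete illustration of "one entry per column": pg_stats is the
documented view over pg_statistic, and a look at it for the test table shows
the latest ANALYZE results, regardless of which transactions are in flight
(table name taken from the earlier test case):

SELECT tablename, attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'a';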






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

2019-04-25 Thread Martin Kováčik
I’ve got it now, thank you for the clarification. You are right.

Martin Kováčik
+421904236791

> On 26 Apr 2019, at 00:25, Adrian Klaver  wrote:
> 
>> On 4/25/19 1:28 PM, Martin Kováčik wrote:
>> I'm not sure if I understand what you mean. My initial thought was that 
>> stats are fixed per transaction, i.e. analyze from comitted transaction 
>> doesn't interfere with another running transaction. Maybe I was confused by 
>> this because analyze can be run inside a transaction, so my assumption was 
>> it is isolated like other statements.
> 
> To make it clearer I would take a look at:
> 
> https://www.postgresql.org/docs/11/sql-analyze.html
> https://www.postgresql.org/docs/11/catalog-pg-statistic.html
> 
> 
> The gist is currently there is one entry(with exception noted in second link) 
> per column in each table. This is the most recent information available 
> subject to the conditions Tom pointed out in his post. To make it MVCC aware 
> like you suggest would mean tracking the state of all transactions currently 
> open and recording that information, so each transaction could find its 
> stats. This means the planning problem could then move to the stats table as 
> it would need to be analyzed itself to work efficiently. This would probably 
> also need to be MVCC aware to be relevant, which would add to the overhead. I 
> could see this turn into a 'hall of mirrors' problem quickly.
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com