Re: Why is tuple_percent so low?

2018-02-27 Thread David Rowley
On 27 February 2018 at 18:03, Sam Saffron  wrote:
> 1. Where is all my missing space, is this in page alignment stuff and
> per-page overhead?

Yes, tuples are MAXALIGNed when copied onto the page. That appears to
be 8 bytes on your system. You're just 2 bytes over the 8-byte
boundary. You'll notice the table becoming more compact if you somehow
could do away with the dsecs column.
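
If it helps to put numbers on that, pgstattuple (presumably where the
tuple_percent figure came from) shows the per-page picture. A minimal
sketch, assuming the extension is available; the table name is a
placeholder:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT table_len, tuple_count, tuple_len, tuple_percent, free_percent
FROM pgstattuple('your_table');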

> 2. Is there any other schemes I can look at for storing this data to
> have a more efficient yet easily queryable / updateable table.

Hard to say without learning about the use case.  Some sort of
de-normalisation may help to allow fewer tuples, e.g. storing one
column in an array, but that may be more painful from a design point of
view. It may also cause pain from a dead-tuple point of view if you
have to UPDATE large tuples frequently.

> Keep in mind these tables get huge and in many of our cases will span
> 10-20GB just to store this information.

I guess it's all relative, but that does not seem that large. Is this
causing some sort of performance problem you're trying to solve, or
is this completely down to disk space consumption?  If it's down to
performance then you might be able to solve that problem with an
index.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: index-only-scan when there is an index on all columns

2018-02-27 Thread David Rowley
On 28 February 2018 at 11:11, Andres Freund  wrote:
> I'm fairly certain that only vacuum and table rewrites like cluster sets
> all-visible,

I don't think the pages are set all visible again after a rewrite.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Is there a way to create a functional index that tables tableoid column as an arg?

2018-03-11 Thread David Rowley
On 11 March 2018 at 07:53, Ryan Murphy  wrote:
> I am using table inheritance and have e.g. the following tables:
>
> create table animal ( ... );
> create table dog ( ... ) inherits (animal);
> create table cat ( ... ) inherits (animal);
> create table person ( ... ) inherits (animal);
> create table musician ( ... ) inherits (person);
> create table politician ( ... ) inherits (person);
>
> Now I have a query that gets all the "animal"'s except for those that are
> "person"'s.
>
> select * from only animal
>
> won't cut it, because it leaves out the dogs and cats.
>
> select *, tableoid::regclass relname from animal
> where relname != 'person'::regclass
>
> also won't cut it because it leaves out the musicians and politicians.
>
> So I have created an immutable function is_a_kind_of(tbl regclass,
> parent_tbl regclass) that returns true iff tbl is identical with, or
> directly or indirectly inherits from, parent_tbl.  For example:
>
> is_a_kind_of('person','person') => true
> is_a_kind_of('person','animal') => true
> is_a_kind_of('musician','person') => true
> is_a_kind_of('animal','person') => false
> is_a_kind_of('dog','person') => false
>
> No problems so far.  Now my query works:
>
> select *,tableoid from "animal"
> where not is_a_kind_of(tableoid::regclass::text, 'person')
>
> This query is somewhat slow though - I'd like to index the is_a_kind_of()
> call.  And Postgres supports functional indexes!  So I try:
> create index animal_is_person on animal (
> is_a_kind_of(tableoid::regclass, 'person') );
>
> ERROR:  index creation on system columns is not supported
>
> I see that this is because "tableoid" is a system column. Does anyone know
> any workaround for this?  So close yet so far away!

I don't think having an index on the tableoid would help you here
anyway.  Previous versions did allow indexes on system columns, but
that was disallowed recently due to the fact that the values of most
system columns are changed internally by postgres and the indexes
wouldn't correctly be updated.  tableoid might be an exception to
this, but it does not really seem like a useful column to index,
given it would be indexing the same value for each record in the
table.

The reason it's slow is that PostgreSQL is executing the function once
per input row. In this case, that must be done because you're passing
in tableoid as a function parameter, and that could change from one
row to the next (once the Append moves to the next subplan).

What you'd want to do instead is create a set-returning function that
finds all the inherited children and returns them all. This will allow
PostgreSQL to execute the function just once at the start of the
query, then join against its results.

Something like:

create or replace function get_inherited_tables (prelid oid) returns
setof oid as $$
with recursive c as (
   select inhrelid,inhparent from pg_inherits where inhparent = prelid
   union all
   select i.inhrelid,i.inhparent from pg_inherits i inner join c
on i.inhparent = c.inhrelid
)
select prelid union all select inhrelid from c;
$$ language sql stable;

then have your queries do:

-- get all animals that are persons
select ... from animal where tableoid in (select
get_inherited_tables('person'::regclass));

-- get all animals that are not persons
select ... from animal where tableoid not in (select
get_inherited_tables('person'::regclass));

Just be careful around search_paths and your use of regclass. In this
case, if "animal" was not in the first schema in search_path, but
someone created another table called "person" that was in the first
schema listed in search_path, then the query would not do what you
want. You might want to consider prefixing the input parameter into
get_inherited_tables with the schema name too.
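
For example (assuming these tables live in the "public" schema):

-- get all animals that are not persons, with the input schema-qualified
select ... from animal where tableoid not in (select
get_inherited_tables('public.person'::regclass));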

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: primary key and unique index

2018-03-23 Thread David Rowley
On 23 March 2018 at 20:55, Thomas Poty  wrote:
> In MySQL a  "show create table" gives me :
>  ...
>   PRIMARY KEY (`ID`,`CountryCode`,`LanguageCode`),
>   UNIQUE KEY `unique_my table_4` (`ID`,`CountryCode`,`LanguageCode`),
> ...
>
> So, In PostgreSQL, does it make sense to create a primary key AND a unique
> index based on the same columns?
> Is PostgreSQL smart enough to use the unique index created for the primary
> key.

Doing this makes no sense in PostgreSQL.  I'm struggling to imagine
why it would in MySQL.
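
For what it's worth, a minimal sketch of why the extra index is
redundant (table and column names invented here): the PRIMARY KEY
constraint is already backed by a unique btree index on the same
columns, and the planner will use that index just fine.

create table translations (
    id int,
    countrycode text,
    languagecode text,
    primary key (id, countrycode, languagecode)
);
-- \d translations shows:
--   "translations_pkey" PRIMARY KEY, btree (id, countrycode, languagecode)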


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_stat_statements: password in command is not obfuscated

2018-03-23 Thread David Rowley
On 24 March 2018 at 10:30, legrand legrand  wrote:
> It seems that passwords used in commands are not removed when caught by
> pg_stat_statements
> (they are not "normalized" being utility statements)
>
> exemple:
> alter role tt with password '123';
>
> select query from public.pg_stat_statements
> where query like '%password%';
>
> query
> 
> alter role tt with password '123';
>
> Do you think its a bug ?

If it is, then it's not a bug in pg_stat_statements. log_statement =
'ddl' would have kept a record of the same thing.

Perhaps the best fix would be a documentation improvement to mention
this, and to note that it's best not to use plain-text passwords in
CREATE/ALTER ROLE; passwords can be supplied already MD5-hashed.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bad Query Plans on 10.3 vs 9.6

2018-03-28 Thread David Rowley
On 29 March 2018 at 18:26, Cory Tucker  wrote:
> Hello all.  I'm migrating a database from PG 9.6 to 10.3 and have noticed a
> particular query that is performing very badly compared to its 9.6
> counterpart.
>
> The plan on 9.6 v 10.3 are effectively identical except in 9.6 the planner
> decides to use an index only scan on the primary key and in 10.3 it does a
> sequential scan.  The problem is the sequential scan is for a table of 75M
> rows and 25 columns so its quiet a lot of pages it has to traverse.

How certain are you that all the indexes match on each instance?

It would be useful to see psql's \d output for each table in question.

Another option for you to consider would be to get rid of the OR
clause in the query entirely and have two separate CTEs and INSERT
INTO your orphaned_matches table having SELECTed from both CTEs with a
UNION ALL. A record already deleted won't appear in the 2nd branch of
the UNION ALL result. However, that still might not fix your issue
with the index not being used, but you may find the anti-joined
version of the query is faster anyway.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bad Query Plans on 10.3 vs 9.6

2018-03-29 Thread David Rowley
On 30 March 2018 at 03:21, Cory Tucker  wrote:
>> Another possibility is that 10.3 sees the index-only scan as too expensive
>> because it thinks most of the table isn't all-visible.  Comparing
>> pg_class.relallvisible values might be informative.

> I'm happy to try to dig into this one more, however, I'm not familiar with
> this value.  What should I be looking for here?

Each table in your database has an entry in the pg_class table. Something like:

SELECT relallvisible from pg_class where oid = 'build.household'::regclass;

would show you the value, however, I think a problem here is unlikely
since that would just control the likelihood of an index-only-scan vs
an index-scan. You're getting a Seq-scan, which I imagine is going to
be quite a bit more expensive than even an index scan.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Inconsistent compilation error

2018-04-18 Thread David Rowley
On 19 April 2018 at 13:28, Melvin Davidson  wrote:
> It is not fine. You have specifed TWO percent signs (%) which requires TWO
> argumenrts,
> but you have only provided ONE -> r.

I'm confused about this statement.

Did you perhaps overlook that there are in fact two arguments? I'm
unsure why you're considering 'blah_history.original_id' not to be an
argument here.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: What is the C function to modify attribute

2018-05-16 Thread David Rowley
On 17 May 2018 at 15:36, a <372660...@qq.com> wrote:
> I am writing a trigger that will record all adjustment of the row into the
> last column of the row. I have done a iterator to produce text record and
> the last step would be add the text record into the last column (which
> designed as an array of user defined type, with the output function display
> the abstraction of update and a user defined function is used to retrieve
> detail update info).
>
> Now I would like to find a way that will allow me to update the last column
> in C efficiently.
>
> Can anyone help? Thank you so much!!

You might find https://www.postgresql.org/docs/10/static/triggers.html useful.

There's a complete example of a trigger function written in C there.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: limit and query planner

2018-06-05 Thread David Rowley
On 6 June 2018 at 07:17, armand pirvu  wrote:
> So since
> select count(*) from sp_i2birst_reg_staging_test;
> count
> ---
>  6860
> and
> select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=0 ;
> count
> ---
>  4239
>
> That means to me I fetch almost the whole table and then I fall in the case 
> you described seqscan-and-sort wins over indexscan .

The planner simply assumes that 1 in every (6860.0 / 4239.0) rows
matches your WHERE clause. Since you want 10 rows from the LIMIT, it
thinks it'll just need to read ceil(10 * 6860.0 / 4239.0) = 17 rows
from the heap to answer your query.

> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 
> 'ACSF17'
> and status=1;
>  QUERY PLAN
>  Index Scan using fooidx on sp_i2birst_reg_staging_test  (cost=0.28..202.91 
> rows=500 width=519) (actual time=0.097..0.527 rows=500 loops=1)
>Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
>  Planning time: 1.024 ms
>  Execution time: 0.766 ms
> this gets 500 rows out of 6860
>
> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 
> 'ACSF17'
> and status=1 limit 10;
>
>QUERY PLAN
>  Limit  (cost=0.28..4.33 rows=10 width=519) (actual time=0.073..0.105 rows=10 
> loops=1)
>->  Index Scan using fooidx on sp_i2birst_reg_staging_test  
> (cost=0.28..202.91 rows=500 width=519) (actual time=0.072..0.101 rows=10 
> loops=1)
>  Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
>  Planning time: 0.280 ms
>  Execution time: 0.173 ms
>
> Back to the original
>
> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 
> 'ACSF17'
> and status=0 limit 1 ;
>
> QUERY PLAN
>  Limit  (cost=0.00..0.13 rows=1 width=519) (actual time=0.021..0.021 rows=1 
> loops=1)
>->  Seq Scan on sp_i2birst_reg_staging_test  (cost=0.00..548.40 rows=4239 
> width=519) (actual time=0.019..0.019 rows=1 loops=1)
>  Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
>  Rows Removed by Filter: 1
>  Planning time: 0.286 ms
>  Execution time: 0.110 ms

For the more restrictive status, the planner thinks more rows will
need to be looked at before the LIMIT is satisfied, which increases
the cost of the seqscan, and the planner favours the index scan.  You
can see the planner estimates 500 rows will match the status=1 query,
so it thinks ceil(1 * (6860 / 500.0) * 10) = 138 rows will need to be
looked at in the seqscan plan. That's obviously more costly than 17
rows, so the index scan begins to look more favourable.

The planner always assumes the rows are evenly distributed, which is
not always the case. If your ten rows were at the end of the heap,
then the seqscan would be a bad plan. In this case "Rows Removed by
Filter" would be high.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: help understanding create statistic

2018-06-28 Thread David Rowley
On 28 June 2018 at 21:38, Luca Ferrari  wrote:
> CREATE INDEX idx_year
> ON expenses ( EXTRACT( year FROM day ) );
>
> why is the planner not choosing to use such index on a 'year' raw query?
>
> EXPLAIN SELECT * FROM expenses
> WHERE year = 2016;

The expression in the where clause must match the indexed expression.
You'd need to add an index on just (year) for that to work.

> Am I misunderstaing this functional dependency?

Yeah, the statistics are just there to drive the planner's costing.
They won't serve as proof for anything else.

All you've done by creating those stats is to allow better estimates
for queries such as:

SELECT * FROM expenses WHERE day = '2018-06-28' and year = 2018;

> stxdependencies | {"3 => 5": 1.00}

It would appear that "3" is the attnum for day and "5" is year. All
that tells the planner is that, for the records sampled during
analyze, each "day" value mapped to exactly one "year" value.

There's nothing then to stop you going and adding a record with the
day '2017-01-01' and the year 2018. The stats will remain the same
until you analyze the table again.

If those stats didn't exist, the planner would have multiplied the
selectivity estimates of each item in the WHERE clause individually.
So if about 10% of records had year=2018, and 0.1% had '2018-06-28',
then the combined selectivity would have been 0.1 * 0.001.  With a
functional dependency of 1.00, the selectivity just becomes 0.001.
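
For reference, the kind of extended statistics being discussed here
would have been created with something along these lines (a sketch;
the statistics object name is made up):

CREATE STATISTICS expenses_day_year (dependencies) ON day, year FROM expenses;
ANALYZE expenses;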

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Convert Existing Table to a Partition Table in PG10

2018-07-01 Thread David Rowley
On 1 July 2018 at 10:15, Clifford Snow  wrote:
> I also leaned that my range partition value I used on a timestamp needed to
> have fractional seconds. I used a range of 2017-01-01 00:00:00 to
> 2017-23:59:59 which failed when I attempted to add a record that had a
> timestamp of 2017-23:59:59. Adding a fractional second to the range solved
> the problem.

Please be aware that with RANGE partitions the upper bound is
non-inclusive. The lower bound is inclusive.

If you want a 2017 partition, then FOR VALUES FROM ('2017-01-01') TO
('2018-01-01') will allow all 2017 timestamps and only 2017
timestamps.

You've no need to consider the precision of the type or how many 9s
you add to anything here.
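
A minimal sketch of that, with the parent table and column names
assumed:

CREATE TABLE mytable_2017 PARTITION OF mytable
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');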

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: FK v.s unique indexes

2018-07-03 Thread David Rowley
On 3 July 2018 at 19:30, Rafal Pietrak  wrote:
> tst=# create table test1(load bigint, a int, b int, c bool) partition by
> list (c);
> CREATE TABLE
> tst=# create table test1_true  partition of test1 for values in (true);
> CREATE TABLE
> tst=# create table test1_false  partition of test1 for values in (false);
> CREATE TABLE
> tst=# create unique index load ON test1 (load,a,b,c);
> CREATE INDEX
> tst=# create table info_text1 (load text, a int, b int, c bool, info
> text,  foreign key (load,a,b,c) references test1(load,a,b,c)) ;
> ERROR:  cannot reference partitioned table "test1"
> --
>
> Why is this forbidden?

I don't think there were any actual roadblocks; it was more a case of
not having enough time in the release cycle to make it work, due to a
few technical details that required extra effort.

Alvaro managed to simplify the problem and allow foreign keys to be
defined on partitioned tables, and got that into PG11.

So it was a case of 50% being better than 0%, which I very much agree with.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump out of memory

2018-07-03 Thread David Rowley
On 4 July 2018 at 14:43, Andy Colson  wrote:
> I moved a physical box to a VM, and set its memory to 1Gig.  Everything
> runs fine except one backup:
>
>
> /pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire
>
> pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  out of memory
> DETAIL:  Failed on request of size 1073741823.
> pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO stdout;

There will be less memory pressure on the server if the pg_dump was
performed from another host. When running pg_dump locally the 290MB
bytea value will be allocated in both the backend process pg_dump is
using and pg_dump itself. Running the backup remotely won't require
the latter to be allocated on the server.

> I've been reducing my memory settings:
>
> maintenance_work_mem = 80MB
> work_mem = 5MB
> shared_buffers = 200MB

You may also get it to work by reducing shared_buffers further.
work_mem won't have any effect, and neither will maintenance_work_mem.

Failing that, the suggestions of more RAM and/or swap look good.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Optimizing execution of expensive subqueries

2018-07-14 Thread David Rowley
On 12 July 2018 at 02:24, Mathieu Fenniak  wrote:
> I'm currently looking at a query that is generally selecting a bunch of
> simple columns from a table, and also performing some subqueries to
> aggregate related data, and then sorting by one of the simple columns and
> paginating the result.
>
> eg.
>
> SELECT
>   tbl.field1, tbl.field2, tbl.field3, ...,
>   (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = tbl.UserId
> AND anothertbl.ThingyId = 1) as Thingy1Sum,
>   ... repeat for multiply thingies ...
> FROM
>   tbl
> ORDER BY tbl.field1 LIMIT 20
>
> I'm finding that if "tbl" contains hundreds of thousands of rows, the
> subqueries are being executed hundreds of thousands of times.  Because of
> the sorting and pagination, this is appears to be unnecessary, and the
> result is slow performance.  (PostgreSQL 9.5.9 server)

You've got two choices.

1) You can add a btree index on field1 so that the executor does not
need to examine all records before taking the top-20, or;
2) move the subquery out of the target list and instead make it a LEFT
JOIN adding an appropriate GROUP BY clause.

#2 might not be a great option since it may require building groups
that don't get used, but it would likely be the best option if you
didn't have a LIMIT clause, or if the LIMIT was a larger percentage of
the total records.
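
A rough sketch of #2, reusing the names from your query; the GROUP BY
assumes "tbl" has a primary key column "id" (any column set that
uniquely identifies a row will do), and FILTER is available on your
9.5 server:

SELECT tbl.field1, tbl.field2, tbl.field3,
       SUM(a.Duration) FILTER (WHERE a.ThingyId = 1) AS Thingy1Sum
FROM tbl
LEFT JOIN anothertbl a ON a.UserId = tbl.UserId
GROUP BY tbl.id
ORDER BY tbl.field1
LIMIT 20;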

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Finding out why parallel queries not avoided

2018-07-21 Thread David Rowley
On 21 July 2018 at 20:15, Didier Carlier  wrote:
> explain select count(*) from calendar c1, calendar c2, measure m where
>  c1.stddate='2015-01-01' and c2.stddate='2015-12-31' and m.fromdateid 
> >=c1.calendarid and m.fromdateid < c2.calendarid;
>   QUERY PLAN
> --
>  Aggregate  (cost=5073362.73..5073362.74 rows=1 width=8)
>->  Nested Loop  (cost=8718.47..4988195.81 rows=34066770 width=0)
>  ->  Index Scan using calendar_stddate_unique on calendar c2  
> (cost=0.28..2.30 rows=1 width=4)
>Index Cond: (stddate = '2015-12-31 00:00:00+01'::timestamp 
> with time zone)
>  ->  Nested Loop  (cost=8718.19..4647525.81 rows=34066770 width=4)
>->  Index Scan using calendar_stddate_unique on calendar c1  
> (cost=0.28..2.30 rows=1 width=4)
>  Index Cond: (stddate = '2015-01-01 
> 00:00:00+01'::timestamp with time zone)
>->  Bitmap Heap Scan on measure m  (cost=8717.91..4306855.81 
> rows=34066770 width=4)
>  Recheck Cond: ((fromdateid >= c1.calendarid) AND 
> (fromdateid < c2.calendarid))
>  ->  Bitmap Index Scan on idx_measure_fromdate  
> (cost=0.00..201.22 rows=34072527 width=0)
>Index Cond: ((fromdateid >= c1.calendarid) AND 
> (fromdateid < c2.calendarid))
>
> Both queries return the same answers but I don't see why the second one 
> doesn't use parallel query.

You'd likely be better off writing the query as:

select count(*) from measure where fromdateid >= (select calendarid
from calendar where stddate = '2015-01-01') and fromdateid < (select
calendarid from calendar where stddate = '2015-12-31');

The reason you get the poor nested loop plan is that nested loop is
the only join method that supports non-equijoin.

I'm unsure why you didn't get a parallel plan. Parallel query in pg10
supports a few more plan shapes than 9.6 did, but I'm unsure which
version you're using.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Restore relhaspkey in PostgreSQL Version 11 Beta

2018-07-30 Thread David Rowley
On 31 July 2018 at 11:11, Melvin Davidson  wrote:
>> If you want stability use information_schema which we'll try very hard to
>> not ever break.
> Of course. Would you be so kind as to point out where in the
> information_schema  it
> indicates if a table has a primary key or not. Oh wait, now I remember...no
> place.

With all due respect Sir, you're making a fool of yourself here.  I'd
suggest that before you debate or argue with people that you ensure
that you're correct. This can often be hard to do on the spot, but
excuses dwindle a bit more when the communication is asynchronous via
email.

It's not that difficult to find information_schema.table_constraints
and see that constraint_type has "PRIMARY KEY".

>>Keeping random atavistic things around, would slow us down, which will be a
>>price everybody is paying.
> Random atavistic things? I hardly think relhaspkey is random. It's been
> there since version 7.2.
> Exactly how does keeping it around slow you/us down?

Well, it's been known that some people misuse relhaspkey. For example,
in [1], someone is recommending to use relhaspkey to check for tables
which don't have a PRIMARY KEY constraint.  This was the wrong advice
as the flag could remain set after the primary key was dropped from
the table and before any vacuum took place on the table. The correct
advice should have been the same as what Tom mentioned above, by
checking for a pg_index record for the table with indisprimary as
true.  Alternatively, another useful response would have been to check
information_schema, which would have provided an SQL standard way to
check.
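
For example, a sketch of the correct way to find tables with no
primary key via pg_index (the schema filter here is just an example):

SELECT c.oid::regclass AS table_name
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relnamespace = 'public'::regnamespace
  AND NOT EXISTS (SELECT 1
                  FROM pg_index i
                  WHERE i.indrelid = c.oid
                    AND i.indisprimary);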

Now, in regards to [1]: I rather hope that you can sympathize a little
with the decision to remove the column, as the person who made the
incorrect recommendation in [1] was none other than you yourself.  So
it seems that you've only assisted in contributing to the column's
removal by not only misusing it yourself but also instructing others,
publicly, to do the same.

Now, in regards to your general tone here: it appears you're under the
assumption that the column was removed for some malicious reason in
order to break people's scripts, but I can assure you, there was no
malicious intent involved. However, the column *was* removed exactly
in order to break queries.  The reason is that such queries were most
likely already broken, and we deemed the problem big enough to remove
the misleading column in order to let people know. Your argument to
put the column back carries very little weight, as it appears your
script is incorrectly trying to determine which tables have no primary
key. So I'd recommend that, instead of expending keystrokes replying
to this email, you spend them fixing your broken code. Tom has kindly
given you a very good starting point too.

Personally, if I had been using a query like yours, I'd be thanking
Peter for highlighting it was broken for me.

If you'd like something else to read, please also look at [2]. I
imagine this is the sort of thing that Andres is talking about.

[1] 
https://www.postgresql.org/message-id/CANu8FiyQsQg7bF3FPT+FU=kK=wjhfewpp+6qe9fxf6yxr+w...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/20140317185255.20724.49675%40wrigleys.postgresql.org

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Template0 datfrozenxid age is 160million and progressing

2018-08-01 Thread David Rowley
On 2 August 2018 at 02:31, Rijo Roy  wrote:
> I agree.. But unfortunately it is the business call and we cannot alter it..
> I am planning to convince them by keeping autovacuum_analyze_threshold to a
> high value so that auto analyse will not kick in very often leaving the
> autovacuum to do its job..
> Please advise.

If autovacuum is disabled, it'll still kick in for any anti-wraparound
work that needs to be performed.

This is also mentioned in the docs:

"Note that even when this parameter is disabled, the system will
launch autovacuum processes if necessary to prevent transaction ID
wraparound. See Section 24.1.5 for more information."

https://www.postgresql.org/docs/10/static/runtime-config-autovacuum.html
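
If it's useful, here's a quick way to keep an eye on how far each
database is from the wraparound threshold (a sketch, nothing special
about the ordering):

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;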

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Weird behaviour of the planner

2018-08-01 Thread David Rowley
On 2 August 2018 at 02:48, Guillaume Lelarge  wrote:
> EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
>
>QUERY PLAN
> -
>  Unique  (cost=12005.97..12049.20 rows=1 width=1430) (actual
> time=20055.294..20323.348 rows=1 loops=1)
>->  Sort  (cost=12005.97..12006.30 rows=132 width=1430) (actual
> time=20055.290..20105.738 rows=6 loops=1)
>  Sort Key: (... 130 columns ...)
>  Sort Method: external sort  Disk: 84464kB
>  ->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430)
> (actual time=0.109..114.142 rows=6 loops=1)
>  Planning time: 10.012 ms
>  Execution time: 20337.854 ms
> (7 rows)
>
> That looks quite good. The total cost is 12049, so I expect this plan to
> have the smaller cost as it's the choosen plan. Now, I'm disabling Sort, and
> here is what I get:
>
> SET enable_sort TO off;
> EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
>
>QUERY PLAN
> -
>  HashAggregate  (cost=12044.22..12044.23 rows=1 width=1430) (actual
> time=508.342..508.343 rows=1 loops=1)
>Group Key: (... 130 columns ...)
>->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430) (actual
> time=0.036..57.088 rows=6 loops=1)
>  Planning time: 1.335 ms
>  Execution time: 508.529 ms
> (5 rows)

When the planner adds a new path it compares the cost not exactly, but
with a 'fuzz factor'. By default, this fuzz factor is 1%.  It's very
likely that the hashagg path did not make it as it was not fuzzily any
cheaper than the unique path: 12044.23 vs 12049.20 is a difference of
only about 0.04%, well within that 1%.

It seems in your case the costs don't quite match reality, which is
quite likely due to the poor row estimates on "gleu2".  Has that table
been analyzed recently? Or is there some reason that auto-vacuum is
not getting to it?

There's a bit more reading of what I'm talking about in
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/util/pathnode.c#L141

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: ProjectSet

2018-08-02 Thread David Rowley
On 2 August 2018 at 21:17, Oliver Kohll  wrote:
> Is there an explanation of ProjectSet anywhere?

Plan node types and what they each do are not very well documented
outside of the source code.

ProjectSet appears when the SELECT or ORDER BY clause of the query
contains one or more set-returning functions. It basically just
executes the set-returning function(s) for each input tuple until none
of the functions return any more records.

Simple Example:

EXPLAIN SELECT generate_series(1,2);
   QUERY PLAN
-
 ProjectSet  (cost=0.00..5.02 rows=1000 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
(2 rows)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: ProjectSet

2018-08-02 Thread David Rowley
On 3 August 2018 at 01:16, Oliver Kohll  wrote:
> Ah thanks David, at least I know what it is now. I don't think I have any
> set returning functions though will double check in case there's a joined
> view that has one. Perhaps it could be the something to do with cross
> product which similarly creates multiple rows on the right for each row on
> the left side.

postgres=# select proretset from pg_proc where proname = 'regexp_matches';
 proretset
-------
 t
 t
(2 rows)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Will there ever be support for Row Level Security on Materialized Views?

2018-08-27 Thread David Rowley
On 28 August 2018 at 09:58, Ken Tanzer  wrote:
> On Tue, Aug 14, 2018 at 5:24 PM Ken Tanzer  wrote:
>> Basically, I'm wondering whether materialized views are likely to ever 
>> support row-level security.
>
> Hi.  Just wanted to follow up on my own email and see if anyone could answer 
> this.

You might think there's some master project-wide list of things to
implement that we all work from, but there's not. There is a todo list
[1] that might serve as some sort of guide for a new person who wishes
to contribute something and is looking for inspiration as to what to
contribute, but I don't think that list ever goes much beyond that.

New features normally appear for one of two reasons:

1) A developer gets inspired to make something happen; or
2) Someone pays a developer to make something happen.

So I guess in this case neither 1 nor 2 has happened, or they have
but the right people have either not seen your email or they just
simply don't want to answer.

If you're desperate for the feature, I'd recommend looking into the
possibility of going with 1 or 2.

[1] https://wiki.postgresql.org/wiki/Todo



Re: Erroneous behavior of primary key

2018-08-27 Thread David Rowley
On 28 August 2018 at 07:50, Daniel J Peacock  wrote:
> I've got an odd situation with a table that has a varchar(255) as the
> primary key that is getting key values from an Elasticsearch engine.  What
> I'm finding is that even though there is a primary key on the table, I am
> getting duplicated keys in the data and the constraint is not blocking
> these.

How do you know they're duplicated?

>  When I do a "select ,count(*) from  group by
>  having count(*) > 1" I get no results.  Yet, when I search the
> table for a value that is like a key I know to be duplicated, I get multiple
> results.  When I select from the table where field is equal to the
> duplicated field I get one result.  I verified that they are distinct row
> with ctid.  I also created a clone of the table with CTAS and then tried to
> create a unique index on the id varchar field but that failed with
> "duplicate keys found".  I'm stumped as to what could be the problem.
> The only thing that I can think of is that the primary key is somehow
> corrupt.  I've noticed this behavior on other tables on this database.
> What could be causing this sort of problem?

If the index is corrupt then you might find that:

set enable_indexscan = 0;
set enable_indexonlyscan = 0;
select <key column>, count(*) from <table> group by <key column> having count(*) > 1;

would return some rows.  You should also verify the above query does
use a Seq Scan by performing an EXPLAIN on the query.

There are issues that have been fixed in previous releases which could
have caused an index to get corrupted in this way, so it's quite
useful to know which version of PostgreSQL you're running here and if
you've paid attention to the release notes when you've previously
upgraded. For example, see [1].

[1] https://www.postgresql.org/docs/9.6/static/release-9-6-2.html

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Will there ever be support for Row Level Security on Materialized Views?

2018-08-27 Thread David Rowley
On 28 August 2018 at 11:06, Ken Tanzer  wrote:
> Or perhaps they "simply don't want to answer." Well I wondered about that 
> too, since the last person who asked about this didn't get a response either. 
>  Is it a particularly stupid question?  Is there some history that makes it a 
> touchy subject?  Or is it just the Bermuda Triangle of this mailing list? :)

I can't imagine why it would be a touchy subject.  Generally, if
you're working on a feature it's best not to keep it a secret, as if
someone else does the same then you end up with redundant work being
done.

If I had to guess what's going on here then I'd say that nobody has
been sufficiently motivated to work on this yet. If that's the case,
nobody reading your email is the person working on this feature, so
nobody can answer your question.  I just answered to suggest the
reasons why you might not be getting an answer.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Postgres 11, partitioning with a custom hash function

2018-10-03 Thread David Rowley
On 4 October 2018 at 16:22, Harry B  wrote:
> I am still having trouble reconciling what happens under the HASH
> partitioning!. If I have text column forming the basis of PARTITIONED BY
> HASH, the HASH value used in the partitioning setup does not seem to match
> to `hashtext()` of that value

It won't match. The hash partition hash is seeded with a special
constant (HASH_PARTITION_SEED); see [1].

You could likely roll your own hash ops. See [2] for an example. This
can then be used to create a hash partitioned table like [3].

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/partitioning/partbounds.c#l2056
[2] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/regress/sql/insert.sql#l241
[3] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/regress/sql/hash_part.sql#l10
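
To give an idea of the shape of [2], here's a rough sketch of a hash
opclass for a text partition key. The function and opclass names are
made up, and using hashtextextended() here is just an assumption about
what you might want, not a tested recipe:

create function custom_hash_text(value text, seed int8) returns int8
as $$ select hashtextextended(value, seed); $$ language sql immutable;

create operator class custom_text_hash_ops for type text using hash as
    operator 1 =,
    function 2 custom_hash_text(text, int8);

create table hashed (k text) partition by hash (k custom_text_hash_ops);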

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Postgres 11, partitioning with a custom hash function

2018-10-04 Thread David Rowley
On 5 October 2018 at 06:18, Harry B  wrote:
>
> Thank you David! These helped me create an operator class.
> However, there still seems to be a 'off-by-a-fixed-N' difference between the
> hash value returned and how PG selects the partition.

Hmm, actually, this is probably due to the hash_combine64() call in
compute_partition_hash_value(). This seems to combine the hash value
with 0 regardless of whether there's another partition key column to
hash. If you could somehow do the reverse of what hash_combine64() will
do to your hash before returning it to the function then you might get
somewhere, but that does not look possible since it appears to be
throwing away some bits.

It may not have been a great choice to decide to have
compute_partition_hash_value() do this unnecessary combine, but it's
likely a few months too late to change that now.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Postgres 11, partitioning with a custom hash function

2018-10-04 Thread David Rowley
On 5 October 2018 at 09:43, Harry B  wrote:
> Now the big question: How scared should I be relying on this? I don't mind
> it breaking on major version upgrades (which would mean I need to dump &
> restore my entire set), but how likely is it to change unannounced in a
> minor/security release? Unless of course, you break it in a way that makes
> custom-hash function impossible.

I don't see how we could possibly change it once v11 is out the door.
Such a change would break pg_upgrade and I imagine we want that to
work for a long time to come yet, at least until there is some other
reason that is worthy of breaking it. The bar is likely set pretty
high for that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Help with list partitioning on expression

2018-10-21 Thread David Rowley
On 19 October 2018 at 02:49, Dinko Papak  wrote:
> I have created table partitioned by list on expression using timestamp
> column, so for each timestamp value function returns int and each partition
> table is for single value (range of timestamp for which function result is
> the same). This helps me to simplify querying as I do not need to always
> write date ranges. Querying (partition pruning) works nice, however when I
> attach new partition it seems to always scan whole table, although I do have
> necessary check constraint on partitioned table. I have tried to make
> timestamp column both null and not null. Also, it takes longer to attach
> partition then to add constraint itself although per my understanding those
> 2 operations should do the same scan.

It's not all that obvious, but if you have PARTITION BY LIST
(extract(minute FROM ts)) and try to attach a partition like:

CREATE TABLE mypartition PARTITION OF parted FOR VALUES IN (1);

then the partition constraint is actually (extract(minute FROM ts) IS
NOT NULL AND extract(minute FROM ts) = 1). If your CHECK constraint is
just checking `extract(minute FROM ts) = 1` then the current code in
PartConstraintImpliedByRelConstraint() is not smart enough to know
that `extract(minute FROM ts) = 1` is strict and cannot match nulls.
Perhaps that could be improved, but that's how it is today.

Likely you'll have better luck with a check constraint that explicitly
checks the function IS NOT NULL.
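
Using the hypothetical names from above, that would look something
like the following; with the IS NOT NULL part present, the ATTACH
PARTITION should be able to skip the full-table scan:

ALTER TABLE mypartition ADD CONSTRAINT mypartition_check
    CHECK (extract(minute FROM ts) IS NOT NULL AND extract(minute FROM ts) = 1);
ALTER TABLE parted ATTACH PARTITION mypartition FOR VALUES IN (1);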

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Aggregated join vs. aggregate in column?

2019-07-12 Thread David Rowley
On Fri, 12 Jul 2019 at 19:32, Durumdara  wrote:
> 2.) I may relocate this section as join...
>
> select
> request.*, s.max_s_date
> from request
> left join
> (
> select schedule.product_id, max(s_date) as max_s_date from schedule
> where schedule.ok = True
> group by  schedule.product_id
>  ) s on (s.product_id = request.product_id)
> ...
>
> But I really don't know what the hell will happen in query optimizer with 
> this method.
>
> a.)
> Optimizer is clever, and it calculates the aggregates only in the needed rows.
> So it find the request.product_id-s, and execute the "s" query only in these 
> rows.
>
> b.)
> Or it isn't enough wise, it executes the "s" subquery on whole schedule, and 
> later joins to main table.
> The schedule table is big in customers' database, so this is worst case. :-(

I'm afraid for the particular query above, the answer is closer to b).
However, that's only going to be a problem if there are many more
distinct product_id records in "schedule". If you were to add a WHERE
clause to the outer query that did WHERE request.product_id = X, then
that qual would be pushed down into the subquery.  This qual pushing
only works for equality, so if you changed WHERE request.product_id = X
to WHERE request.product_id IN (X,Y), then the qual would not be pushed
down to the subquery.

> I asked this because sometimes I need to get more result columns in the 
> select, but I couldn't retreive more...
>
> (
> select max(s_date) as s_date from schedule
> where schedule.product_id = request.product_id and schedule.ok = True
>  ) as max_s_date,   <=== only one column
>
> So sometimes the join is better.
>
> But if the optimizer isn't enough wise, I can get these values only "WITH" 
> queries (select the main rows to temp, run subselects with only these 
> records, return the mix of main and subselects in one query).

If you don't want to repeat the same subquery in the SELECT list then
you could perform a CROSS JOIN LATERAL. For example:

select
request.*, s.max_s_date, s.max_s1_date
from request
cross join lateral (
select max(s_date) as max_s_date, max(s1_date) as max_s1_date
from schedule
where schedule.ok = True
   and schedule.product_id = request.product_id) s;

In this case, the subquery will be executed once per output row, so if
you have some restrictive WHERE clause on the outer query then the
subquery will be executed fewer times.

With a bit of training, you should be able to see what the query
planner has done for yourself by using the EXPLAIN command:
https://www.postgresql.org/docs/current/sql-explain.html

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: lead() with arrays - strange behaviour

2019-08-08 Thread David Rowley
On Thu, 8 Aug 2019 at 21:06, Thomas Kellerer  wrote:
> The following statement tries to find the overlapping values in id_list 
> between the current row and the next row:
>
> select id,
>id_list,
>lead(id_list) over (order by id) as next_list,
>array(select unnest(id_list) intersect select unnest(lead(id_list) 
> over (order by id))) as common_ids
> from sample_data;
>
> The above returns:
>
> id | id_list | next_list | common_ids
> ---+-+---+---
>  1 | {1,2,3} | {2,3,4}   | {}
>  2 | {2,3,4} | {4,5,6}   | {}
>  3 | {4,5,6} |   | {}
>
> The empty array for "common_ids" is obviously incorrect.

I think you're confused about what the SELECT with the empty FROM
clause does here.  In your subquery "id_list" is just a parameter from
the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
return anything since those are both just effectively scalar values,
to which there is no "next" value.
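
One way to get the intended result is to compute the lead() in an
inner query first, then take the intersection in the outer query
against that already-computed column. A sketch, using the names from
your example:

select id,
       id_list,
       next_list,
       array(select unnest(id_list) intersect select unnest(next_list)) as common_ids
from (select id,
             id_list,
             lead(id_list) over (order by id) as next_list
      from sample_data) t;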

--
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: FW: Undelivered Mail Returned to Sender

2019-08-11 Thread David Rowley
On Sun, 11 Aug 2019 at 06:53, stan  wrote:
>
> I apologize for asking, what I suspect will turn out to be a newbie
> question, but I have managed to get myself quite confused on this.
>
> I am defining a view as follows
>
>
> CREATE OR REPLACE view purchase_view as
> select
> project.proj_no ,
> qty ,
> mfg_part.mfg_part_no ,
> mfg.name as m_name ,
> mfg_part.descrip as description ,
> (
> SELECT
> name
> FROM
> vendor
> WHERE
> bom_item.vendor_key =
> (
> SELECT
> vendor_key
> FROM
> mfg_vendor_relationship
> WHERE
> bom_item.mfg_key = mfg_key
> AND
> prefered = TRUE
> AND
> bom_item.project_key = project_key
>
> )
> )
> as v_name ,

> ERROR:  more than one row returned by a subquery used as an expression
>
> Can someone please enlighten me as to the error of my ways?

Looks to me like your WHERE clause is wrong in the subquery. "WHERE
bom_item.vendor_key ="  surely that should be just "WHERE vendor_key =
" (assuming that's the primary key column of the vendor table).

Also, you've mentioned you've only a single record in the
mfg_vendor_relationship, so the error can't be due to multiple records
matching in the mfg_vendor_relationship table. However, given the
unique constraint on that table includes 3 columns and you're just
filtering on 2 of them, then it would only take some rows in there
with the same mfg_key and project_key values but a different
vendor_key to get the same error from that part of the query.  If that
shouldn't happen, then perhaps your UNIQUE constraint should not
contain the vendor_key column.  You'd have to explain what you need in
more detail for someone to be able to help you fix that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: How to make runtime partition pruning work?

2019-10-14 Thread David Rowley
On Fri, 11 Oct 2019 at 21:49, Markus Heiden  wrote:
>
> I partitioned a table "data_table" by the key "import_id" to reduce the
> number of partitions to be loaded in my queries.
> I used list partitions, each containing usually just one "import_id". I
> used a primary key (id, import_id)
> But PostgreSQL does not consider partition keys to avoid loading not
> needed partitions.
>
> My query:
> SELECT SUM(something) FROM data_table WHERE import_id IN (SELECT id FROM
> import_table WHERE ...)
> My problem:
> The query takes too long, because PostgreSQL uses a hash join over all
> partitions of "data_table" with the "import_table", instead of pruning
> the "data_table" partitions by the import_ids at runtime.
> Static pruning (when using ... IN (1, 2, 3, 4)) works fine though.
>
> What am I doing wrong that runtime partition pruning with PostgreSQL
> 11.5 does not work in my case?

The documentation for this reads:

"Partition pruning can be performed not only during the planning of a
given query, but also during its execution. This is useful as it can
allow more partitions to be pruned when clauses contain expressions
whose values are not known at query planning time; for example,
parameters defined in a PREPARE statement, using a value obtained from
a subquery or using a parameterized value on the inner side of a
nested loop join. Partition pruning during execution can be performed
at any of the following times:

During initialization of the query plan. Partition pruning can be
performed here for parameter values which are known during the
initialization phase of execution. Partitions which are pruned during
this stage will not show up in the query's EXPLAIN or EXPLAIN ANALYZE.
It is possible to determine the number of partitions which were
removed during this phase by observing the “Subplans Removed” property
in the EXPLAIN output.

During actual execution of the query plan. Partition pruning may also
be performed here to remove partitions using values which are only
known during actual query execution. This includes values from
subqueries and values from execution-time parameters such as those
from parameterized nested loop joins. Since the value of these
parameters may change many times during the execution of the query,
partition pruning is performed whenever one of the execution
parameters being used by partition pruning changes. Determining if
partitions were pruned during this phase requires careful inspection
of the loops property in the EXPLAIN ANALYZE output. Subplans
corresponding to different partitions may have different values for it
depending on how many times each of them was pruned during execution.
Some may be shown as (never executed) if they were pruned every time."

https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITION-PRUNING

Notice that only subqueries and parameterized nested loop joins are
mentioned.  The above text does not really go into the detail of which
types of subqueries can be used, but I can confirm that they must be
subqueries that can only return a scalar value, e.g. WHERE x = (SELECT
y FROM ...).  The executor would raise an error if that subquery
returned more than 1 row. The IN() clause you have is not eligible.
This will be converted into a semi-join during planning, and even if
it wasn't, the executor wouldn't be raising an error if it returned
multiple rows.
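
So, for example, the following form is eligible for executor-time
pruning, but only if you can guarantee the inner query returns a
single import_id (otherwise the executor will raise an error):

SELECT SUM(something)
FROM data_table
WHERE import_id = (SELECT id FROM import_table WHERE ...);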

Unfortunately, to do what you mention with a hash join, we'd need to
scan through the entire hash table and incrementally build the set of
partitions which could match each value in the table.  I'm sure there
are times where that would be well worth the trouble, but I imagine in
the average case we might find that the cost of scanning the entire
table this way would be more than just executing the query without any
partition pruning. I don't see any good way to know in advance if it
would be worthwhile or not.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: select view definition from pg_views feature request

2019-11-04 Thread David Rowley
On Tue, 5 Nov 2019 at 11:05, George Neuner  wrote:
>
> On Sun, 03 Nov 2019 16:15:23 -0500, Tom Lane 
> >Why don't you just change the search path to empty before selecting?
>
>
> I'm guessing this is fine inside a transaction, but what if you're
> using a client that commits by statement?  In that case, wouldn't the
> change to the path affect the other backends?

See https://www.postgresql.org/docs/current/sql-set.html

"SET only affects the value used by the current session."

Also:

"The effects of SET LOCAL last only till the end of the current transaction"

Neither affects other sessions.
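
So Tom's suggestion can be done safely like this (a sketch; the view
name is made up, and pg_catalog is still searched implicitly even with
an empty search_path):

BEGIN;
SET LOCAL search_path TO '';
SELECT definition FROM pg_views WHERE schemaname = 'public' AND viewname = 'my_view';
COMMIT;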

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Partition, inheritance for storing syslog records.

2019-11-18 Thread David Rowley
On Tue, 12 Nov 2019 at 00:45, Mimiko  wrote:
> There could be not sub-partition by fromhost. Only by syslogtag, then by year.
>
> So how could be this accomplished both in version 8.4 and in version 12. 
> Other ideas are welcome.
>
> The main goal is to be able to quickly investigate logs from some 
> applications and some host searching regex in `message` column.

The documents in
https://www.postgresql.org/docs/12/ddl-partitioning.html go to lengths
to explain how to do this in v12.  You should have a read of that and
pay particular attention in areas that mention "sub-partitioning".  If
you're still uncertain after that, then ask here.
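
To give a flavour of what that chapter describes, here's a rough v12
sketch along the lines you mentioned (partition by syslogtag, then by
year); all names and column types here are assumptions:

CREATE TABLE syslog (
    ts        timestamptz NOT NULL,
    syslogtag text NOT NULL,
    fromhost  text,
    message   text
) PARTITION BY LIST (syslogtag);

CREATE TABLE syslog_app1 PARTITION OF syslog
    FOR VALUES IN ('app1')
    PARTITION BY RANGE (ts);

CREATE TABLE syslog_app1_2019 PARTITION OF syslog_app1
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');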

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Query to retrieve the index columns when a function is used.

2020-03-11 Thread David Rowley
On Tue, 10 Mar 2020 at 02:16, Sterpu Victor  wrote:
> I'm testing on Postgresql 12.1 and I have a index like this:
> "check_dates_gist" EXCLUDE USING gist (id_test1 WITH =, id_test2 WITH =, 
> tsrange(valid_from::timestamp without time zone, valid_to::timestamp without 
> time zone) WITH &&)
>
> When I run this query:
> "select pc.relname, pi.indisunique, pi.indisprimary, array_agg(a.attname) as 
> attname
> FROM pg_class pc
> JOIN pg_index pi ON pc.oid = pi.indexrelid AND pc.oid IN
> (SELECT indexrelid FROM pg_index, pg_class WHERE pg_class.relname='test' AND 
> pg_class.oid=pg_index.indrelid)
> JOIN pg_attribute a ON a.attrelid = pc.oid
> GROUP BY pc.relname,pi.indisunique, pi.indisprimary;"
> I retrieve the index but there is no detail about the columns valid_from and 
> valid_to.
>
> How can I retrieve this detail?

Those details are in the indexprs column.  There's an item there for
each zero-valued entry in indkey. It's not going to be particularly easy for you
to parse that from SQL.  Internally in PostgreSQL, we have functions
that could give you that information easily, but to access those from
SQL you'd need to write something in C.  The C function is named
pull_varattnos(). That will give you a Bitmapset for each of the
columns that are found. You'd need to write a set-returning function
to return those values then join that to pg_attribute.

Alternatively, it likely wouldn't be impossible to do at the SQL level
with regexp_matches(), but building an expression to reliably extract
what you want might not be an easy task. On a quick test, I see that:

select indexrelid, unnest(r.e) from pg_index i, lateral
regexp_matches(i.indexprs, 'VAR :varno 1 :varattno (\d{1,})', 'g')
r(e) where i.indexprs is not null;

does kick out the correct results for the expression indexes I have in
my database, but there may be some more complex expressions that break
it.




Re: Is there a script and sample data to test the performance of postgresql?

2020-04-17 Thread David Rowley
On Fri, 17 Apr 2020 at 20:13, Kiwoong Choi  wrote:
> Is there a script and sample data to test the performance of postgresql?

There is pgbench.  https://www.postgresql.org/docs/current/pgbench.html

David




Re: COPY blocking \dt+?

2020-05-04 Thread David Rowley
On Tue, 5 May 2020 at 13:05, pinker  wrote:
> I'm running standard pgbench and what's kinda strange copy pgbench_accounts
> from stdin is blocking my other query which is \dt+.
> Does copy hold any exclusive lock or there is something wrong with my
> system?

COPY does not. However, pgbench does TRUNCATE the tables before doing
the COPY, so that'll be most likely what's causing your query to get
blocked.

David




Re: COPY blocking \dt+?

2020-05-04 Thread David Rowley
On Tue, 5 May 2020 at 14:50, pinker  wrote:
> So it would need to run inside single transaction to cause lock, right? do
> you know if pgbench is opening transaction?

The TRUNCATE and COPY are done in the same transaction.  You can see
the code in 
https://github.com/postgres/postgres/blob/REL_11_6/src/bin/pgbench/pgbench.c#L3599

David




Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

2020-05-07 Thread David Rowley
On Fri, 8 May 2020 at 10:00, David G. Johnston
 wrote:
>
> On Thu, May 7, 2020 at 11:07 AM Amarendra Konda  wrote:
>>
>> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT pa.process_activity_id AS 
>> pa_process_activity_id  FROM process_activity pa WHERE pa.app_id = 
>> '126502930200650' AND pa.created > '1970-01-01 00:00:00'  AND EXISTS ( 
>> SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND 
>> pi.process_instance_id = pa.process_instance_id  AND pi.user_id = 
>> '137074931866340') ORDER BY pa.process_instance_id,  pa.created limit 50;
>>
>>
>>->  Index Scan using 
>> process_activity_process_instance_id_app_id_created_idx on 
>> public.process_activity pa  (cost=0.70..1061.62 rows=1436 width=32) (actual 
>> time=0.011..20.320 rows=23506 loops=2)
>
> > Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id 
> > = '126502930200650'::bigint) AND (m.created > '1970-01-01 
> > 00:00:00'::timestamp without time zone))
>
> I suppose during the nested loop the inner index scan could limit itself to 
> the first 50 entries it finds (since the first two index columns are being 
> held constant on each scan, m.created should define the traversal order...) 
> so that the output of the nested loop ends up being (max 2 x 50) 100 entries 
> which are then sorted and only the top 50 returned.
>
> Whether the executor could but isn't doing that here or isn't programmed to 
> do that (or my logic is totally off) I do not know.

I think the planner is likely not putting the process_activity table
on the outer side of the nested loop join due to the poor row
estimates.  If it knew that so many rows would match the join then it
likely would have done that to save from having to perform the sort at
all.  However, because the planner has put the process_instance on the
outer side of the nested loop join, it's the pathkeys from that path
that the nested loop node has, which is not the same as what the ORDER
BY needs, so the planner must add a sort step, which means that all
rows from the nested loop plan must be read so that they can be
sorted.

It might be worth trying: create index on process_instance
(user_id,app_id); as that might lower the cost of performing the join
in the opposite order and have the planner prefer that order instead.
If doing that, the OP could then ditch the
fki_conv_konotor_user_user_id index to save space.

If that's not enough to convince the planner that the opposite order
is better then certainly SET enable_sort TO off; would.

David




Re: AutoVacuum and growing transaction XID's

2020-05-07 Thread David Rowley
On Fri, 8 May 2020 at 09:18, github kran  wrote:
> 1)  We haven't changed anything related to autovacuum except a work_mem 
> parameter which was increased to 4 GB which I believe is not related to 
> autovacuum

It might be worth looking into increasing vacuum_cost_limit to
something well above 200, or dropping autovacuum_vacuum_cost_delay
down from 20 to something much lower. However, you say you've not
changed the autovacuum settings, but you've also said:

>1)  I see there are 8 Vacuum workers ( Not sure what changed) running in 
> the background and the concern I have is all of these vacuum processes are 
> running with wrap around and while they are running

The default is 3, so if you have 8 then the settings are non-standard.

It might be good to supply the output of:

SELECT name,setting from pg_Settings where name like '%vacuum%';

You should know that the default speed that autovacuum runs at is
quite slow in 9.6. If you end up with all your autovacuum workers tied
up with anti-wraparound vacuums then other tables are likely to get
neglected and that could lead to stale stats or bloated tables. Best
to aim to get auto-vacuum running faster or aim to perform some manual
vacuums of tables that are over their max freeze age during an
off-peak period to make use of the lower load during those times.
Start with tables in pg_class with the largest age(relfrozenxid).
You'll still likely want to look at the speed autovacuum runs at
either way.
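
As a rough sketch, a query like the following (run in each database)
lists the tables most in need of a freeze vacuum:

SELECT oid::regclass AS relation, age(relfrozenxid)
FROM pg_class
WHERE relkind IN ('r', 'm', 't')
ORDER BY age(relfrozenxid) DESC
LIMIT 20;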

Please be aware that the first time a new cluster crosses the
autovacuum_freeze_max_age threshold can be a bit of a pain point as it
can mean that many tables require auto-vacuum activity all at once.
The impact of this is compounded if you have many tables that never
receive an UPDATE/DELETE as auto-vacuum, in 9.6, does not visit those
tables for any other reason. After the first time, the relfrozenxids
of tables tend to be more staggered so their vacuum freeze
requirements are also more staggered and that tends to cause fewer
problems.

David




Re: AutoVacuum and growing transaction XID's

2020-05-07 Thread David Rowley
On Fri, 8 May 2020 at 13:51, github kran  wrote:
>   I can't either DROP or ALTER any other tables ( REMOVE Inheritance for 
> any of old tables where the WRITES are not getting written to). Any of the 
> ALTER TABLE OR DROP TABLE  DDL's arer not getting exeucted even I WAITED FOR 
> SEVERAL MINUTES , so I have terminated those queries as I didn't have luck.

The auto-vacuum freeze holds a ShareUpdateExclusiveLock on the table
being vacuumed. If you try any DDL that requires an
AccessExclusiveLock, it'll have to wait until the vacuum has
completed. If you leave the DDL running then all accesses to the table
will be queued behind the ungranted AccessExclusiveLock.  It's likely
a good idea to always run DDL with a fairly short lock_timeout, just
in case this happens.
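
As a sketch (the object names here are made up), something like:

SET lock_timeout = '5s';
ALTER TABLE old_child_table NO INHERIT parent_table;

will give up after 5 seconds rather than sit holding up every other
query against the table.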

>3)  Can I increase the  autovacuum_freeze_max_age on the tables on 
> production system ?

Yes, but you cannot increase the per-table setting above the global
setting. Changing the global setting requires a restart.

David




Re: Slow SELECT

2020-05-26 Thread David Rowley
On Tue, 26 May 2020 at 19:23, Frank Millman  wrote:
> The table sizes are -
>  my_table : 167 rows
>  table_1 : 21 rows
>  table_2 : 11 rows
>  table_3 : 3 rows
>  table_4 : 16 rows
>
> Therefore for each tran_date in my_table there are potentially
> 21x11x3x16 = 11088 rows. Most will be null.
>
> I want to select the row_id for the last tran_date for each of those
> potential groups. This is my select -
>
>  SELECT (
>  SELECT a.row_id FROM my_table a
>  WHERE a.fld_1 = b.row_id
>  AND a.fld_2 = c.row_id
>  AND a.fld_3 = d.row_id
>  AND a.fld_4 = e.row_id
>  AND a.deleted_id = 0
>  ORDER BY a.tran_date DESC LIMIT 1
>  )
>  FROM table_1 b, table_2 c, table_3 d, table_4 e
>
> Out of 11088 rows selected, 103 are not null.

Perhaps SQL Server is doing something to rewrite the subquery in the
target list to a LEFT JOIN.  PostgreSQL currently does not do that.

Since "my_table" is small, you'd likely be much better doing a manual
rewrite of the query to join a subquery containing the required
details from "my_table".  It looks like you want the row_id from the
latest tran_date for each fld_N column. So something like:

SELECT a.row_id
FROM table_1 b
CROSS JOIN table_2 c
CROSS JOIN table_3 d
CROSS JOIN table_4 e
LEFT OUTER JOIN (
SELECT fld_1,fld_2,fld_3,fld_4,row_id,tran_date,
ROW_NUMBER() OVER (PARTITION BY fld_1,fld_2,fld_3,fld_4 ORDER BY
tran_date DESC) row_num
FROM my_table
WHERE deleted_id = 0
) a ON a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 =
d.row_id AND a.fld_4 = e.row_id AND a.row_num = 1;

Should do it. You could also perhaps do something with DISTINCT ON
instead of using ROW_NUMBER(). That might be a bit more efficient, but
it's unlikely to matter too much since there are only 167 rows in that
table.
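
For what it's worth, a DISTINCT ON version of that subquery might look
like the sketch below; it returns one row per fld_1..fld_4 group with
the latest tran_date, so the "a.row_num = 1" condition is no longer
needed:

SELECT DISTINCT ON (fld_1, fld_2, fld_3, fld_4)
       fld_1, fld_2, fld_3, fld_4, row_id
FROM my_table
WHERE deleted_id = 0
ORDER BY fld_1, fld_2, fld_3, fld_4, tran_date DESC;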

David




Re: Slow SELECT

2020-05-26 Thread David Rowley
On Tue, 26 May 2020 at 22:31, Frank Millman  wrote:

> Thank you David. I tried that and it produced the correct result in
> 53ms, which is what I am looking for.
>
> It will take me some time to understand it fully, so I have some
> homework to do!

The main problem with your previous query was that the subquery was
being executed 11088 times but could only ever find anything 167 of
those times; the rest of the time nothing would be found.

I just changed the subquery which would be executed once per output
row and altered it so it became a subquery that's joined and only
executed once.  The ROW_NUMBER() is a windowing function, which is
explained in [1].  I used this to get the row_id of the record with
the latest tran_date, just like you were doing with the ORDER BY
tran_date DESC LIMIT 1, but the subquery with the windowing function
gets them all at once, rather than doing it in a way that requires it
to be executed once for each row in the top-level query. In this case,
the functionality that the LIMIT 1 does in your query is achieved with
"AND a.row_num = 1;" in my version. This is pretty fast to execute
once due to there only being 167 rows.

It's also important to know that there may be cases where the method I
proposed is slower.  For example, if my_table was very large and
contained rows that were not in table_1 to table_4. Since the subquery
in my version calculates everything then it could be wasteful to do
that for values that would never be used. For you, you have foreign
keys that ensure my_table does not contain records that are not in the
other tables, but you could still see this issue if you were to add
some restrictive WHERE clause to the outer query.  Perhaps this won't
be a problem for you, but it's likely good to know.

[1] https://www.postgresql.org/docs/current/tutorial-window.html

David




Re: Slow SELECT

2020-05-26 Thread David Rowley
On Tue, 26 May 2020 at 23:41, Vik Fearing  wrote:
>
> On 5/26/20 12:04 PM, David Rowley wrote:
> > Since "my_table" is small, you'd likely be much better doing a manual
> > rewrite of the query to join a subquery containing the required
> > details from "my_table".  It looks like you want the row_id from the
> > latest tran_date for each fld_N column. So something like:
> >
> > SELECT a.row_id
> > FROM table_1 b
> > CROSS JOIN table_2 c
> > CROSS JOIN table_3 d
> > CROSS JOIN table_4 e
> > LEFT OUTER JOIN (
> > SELECT fld_1,fld_2,fld_3,fld_4,row_id,tran_date,
> > ROW_NUMBER() OVER (PARTITION BY fld_1,fld_2,fld_3,fld_4 ORDER BY
> > tran_date DESC) row_num
> > FROM my_table
> > WHERE deleted_id = 0
> > ) a ON a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 =
> > d.row_id AND a.fld_4 = e.row_id AND a.row_num = 1;
> >
> > Should do it. You could also perhaps do something with DISTINCT ON
> > instead of using ROW_NUMBER(). That might be a bit more efficient, but
> > it's unlikely to matter too much since there are only 167 rows in that
> > table.
>
> I would expect a lateral query to be better here.

But that would put it back to executing the subquery 11088 times. I
wrote it in a way to avoid that.

David




Re: Possible improvement

2020-06-05 Thread David Rowley
On Fri, 5 Jun 2020 at 14:41, Paul van der Linden
 wrote:
> If I have a query like:
>
> SELECT * FROM (
> SELECT
>   CASE
>   WHEN field='value1' THEN 1
>   WHEN field='value2' THEN 2
>   END AS category
> FROM table1
> ) AS foo
> WHERE category=1
>
> doesn't use the index on field, while technically it could do that.
> Is it hard to implement drilling down the constant in the WHERE to within the 
> CASE?

It doesn't look impossible to improve that particular case.  See
eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However,
this would need to take constant folding further than we take it
today. Today we just have the ability to simplify expressions which
are, by themselves, an expression which will always evaluate to a
constant value. This case is more complex as it requires something
outside the CASE expr to allow the simplification to take place. In
this case, we'd need to look at the other side of the OpExpr to see
the const there before any transformation could simplify it.  It's
also not entirely clear that the simplification would always be a good
idea.  What if, for example, there were an index on the CASE expression
but none on "field"? The query might perform worse!  The unfortunate
part about this is that, generally, when we perform constant folding,
we don't yet have an idea about which indexes exist.  I imagine the
only sane way to do it would be to allow expressions to have some sort
of "alternative" expression that could be matched up to the index
column instead. It wouldn't be a trivial piece of work to do that.

For the more simple cases, you can see from looking at:

postgres=# explain select * from pg_class where oid = (case when
'test' = 'test' then 1 else 0 end);
 QUERY PLAN
-
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=260)
   Index Cond: (oid = '1'::oid)
(2 rows)

that we do simplify case statements which are by themselves constant.

> Is this something that could be put on some wishlist? If so where are the 
> most looked at ones?

There is a todo list of sorts in [1]. However, I'm really unsure if
anyone ever looks at it for something to do. Mostly, people have their
own ideas and problems to solve and spend their free cycles hacking
away at those. You might have equal luck waiting until December and
writing it on a piece of paper and setting it on fire. Likely there
would be more chance if it was something simple as a novice who's
looking into getting into working on Postgres might skim that list for
something to work on.  More experienced people, I imagine, would never
look there.  FWIW, many people who are now working on PostgreSQL once
came along with a question or idea like yours. Many have been unable
to escape ever since :)

David

[1] https://wiki.postgresql.org/wiki/Todo




Re: Potential optimisation for the creation of a partial index condition over a newly created nullable, non-default-valued column?

2020-06-05 Thread David Rowley
On Thu, 4 Jun 2020 at 17:59, Tim Dawborn  wrote:
> tmp=> \timing on
> Timing is on.
> tmp=> BEGIN;
> BEGIN
> Time: 1.333 ms
> tmp=> ALTER TABLE foo ADD COLUMN d integer NULL;
> ALTER TABLE
> Time: 1.581 ms
> tmp=> CREATE UNIQUE INDEX myindex ON foo (a, b, c) where d = 2;
> CREATE INDEX
> Time: 37758.880 ms (00:37.759)
> tmp=> COMMIT;
> COMMIT
> Time: 3.922 ms
>
> Given that d = 2 could not ever be true as the nullable, non-default-valued 
> column was just added inside the same transaction, I was hoping that the 
> index creation would be instantaneous, as it realised there's no rows that 
> this condition could be true for.

While it could technically be possible to do something like check that
the xmin of the pg_attribute record for every column mentioned in the
index's predicate is set to the current transaction ID, and that the
index predicate refutes an expression containing those columns with IS
NULL clauses (or whatever the DEFAULT expression value is), we've just
no way to know if any rows were inserted or updated between the ALTER
TABLE and the CREATE INDEX. Certainly, no other transaction could have
done anything since we added the column due to us holding the
AccessExclusiveLock.  We just don't really have a way to know if our
own transaction did anything in between. So to do this, we'd need to
invent something to track that, and that something wouldn't be free to
maintain and we'd likely need to maintain it all the time since we'd
be unable to predict what future commands might need to use that
information.

It would likely be easier to go about it by having some sort of ALTER
TABLE ADD INDEX command, then just chain the alter tables together. We
have various other optimisations when multiple subcommands are used in
a single ALTER TABLE.  However, I'm not suggesting we go and allow
indexes to be created in ALTER TABLE. I'm just suggesting that it
would likely be a better alternative than inventing something to track
when a table last had rows inserted or updated.

David




Re: When to use PARTITION BY HASH?

2020-06-08 Thread David Rowley
On Tue, 9 Jun 2020 at 01:07, Ron  wrote:
>
> On 6/8/20 3:40 AM, Oleksandr Shulgin wrote:
> [snip]
>
> I've found the original commit adding this feature in version 11: 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e
> It says:
>
> "Hash partitioning is useful when you want to partition a growing data
> set evenly.  This can be useful to keep table sizes reasonable, which
> makes maintenance operations such as VACUUM faster, or to enable
> partition-wise join."
>
>
> How does hashed (meaning "randomly?) distribution of records make 
> partition-wise joins more efficient?

Hash partitioning certainly does not mean putting the tuple in some
random partition. It means putting the tuple in the partition with the
correct remainder value after dividing the hash value by the largest
partition modulus.
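
For example, a minimal sketch with made-up names; each row of t lands
in whichever partition's remainder matches hash(id) modulo 4:

CREATE TABLE t (id bigint) PARTITION BY HASH (id);
CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE t_p1 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE t_p2 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE t_p3 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 3);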

> Or -- since I interpret that as having to do with "locality of data" -- am I 
> misunderstanding the meaning of "partition-wise joins"?

If it was not a partitioned table before then partition-wise joins
wouldn't be possible.  Having partition-wise joins could make joining
two identically partitioned tables faster. We need only look in the
corresponding partition on the other side of the join for join
partners for each tuple. For hash joins, hash tables can be smaller,
which can mean not having to batch, and possibly having the hash table
fit better into a CPU cache. For merge joins, sorts, having the data
partially pre-sorted in chunks means fewer operations for qsort which
can result in speedups.

David




Re: "INSERT ON CONFLICT UPDATE" - Use of indexes ?

2020-06-08 Thread David Rowley
On Tue, 9 Jun 2020 at 00:42, Laura Smith
 wrote:
> What'st the current state of play with indexes and ON CONFLICT ?  The docs 
> seem to vaguely suggest it is possible, but this SO question 
> (https://stackoverflow.com/questions/38945027/) seems to suggest it is not.
>
> I've got a unique named index on a table (i.e. "create unique index xyz...") 
> but I cannot seem to be able to refer to it in a function ?
> ON CONFLICT (index_name)  : does not work
> ON CONFLICT ON CONSTRAINT index_name: does not work

Creating a unique index does not create a unique constraint.  If you
create a unique constraint, it'll create a unique index to enforce the
constraint.  ON CONSTRAINT requires a constraint name, not an index
name.
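
As a sketch (assuming a plain unique index named xyz on column a of
table t), you can either let ON CONFLICT infer the index from the
column list, or attach the existing index to a constraint and name
that:

-- infer the arbiter index from the column list
INSERT INTO t (a, b) VALUES (1, 'x')
ON CONFLICT (a) DO UPDATE SET b = EXCLUDED.b;

-- or turn the existing unique index into a constraint and refer to it
ALTER TABLE t ADD CONSTRAINT t_a_key UNIQUE USING INDEX xyz;
INSERT INTO t (a, b) VALUES (1, 'x')
ON CONFLICT ON CONSTRAINT t_a_key DO NOTHING;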

David




Re: Planner misestimation for JOIN with VARCHAR

2020-06-09 Thread David Rowley
On Wed, 10 Jun 2020 at 07:31, Michael Lewis  wrote:
>
> On Tue, Jun 9, 2020 at 12:34 PM Sebastian Dressler  
> wrote:
>>
>> - Add an index on top of the whole PK
>> - Add indexes onto other columns trying to help the JOIN
>> - Add additional statistics on two related columns
>>
>> Another idea I had was to make use of generated columns and hash the PKs 
>> together to an BIGINT and solely use this for the JOIN. However, this would 
>> not work when not all columns of the PK are used for the JOIN.
>
>
> Can you expand on the additional statistics you created? Why was it on only 
> two columns? Did you include MCVs type of extended stats?

Unfortunately, the join selectivity functions have yet to learn about
extended statistics.

David




Re: Planner misestimation for JOIN with VARCHAR

2020-06-09 Thread David Rowley
On Wed, 10 Jun 2020 at 09:05, Michael Lewis  wrote:
>>
>> the join selectivity functions have yet to learn about extended statistics.
>
>
> That is very interesting to me. So, extended statistics would help to 
> properly estimate the result set coming out of a single table when comparing 
> each of those columns to one or many values, but not when joining up another 
> table? Is that on a roadmap or in progress as far as you know?

Yeah, they're currently only used for selectivity estimates on base
table.  Using extended statistics for join selectivity estimation is
being worked on [1], so there's a chance that it may happen for PG14.

David

[1] https://commitfest.postgresql.org/28/2386/




Re: HASH partitioning not working properly

2020-06-18 Thread David Rowley
On Fri, 19 Jun 2020 at 17:42, Srinivasa T N  wrote:
>After seeing the below, I feel partitioning is not working properly or it 
> maybe case that my understanding is wrong.  Can somebody explain me what is 
> happening?

> postgres=# select * from busbar_version6;
>  objectid | ver_id
> --+
> (0 rows)
>
> postgres=# select * from busbar_version7;
>  objectid | ver_id
> --+
> 6 |  6
> (1 row)
>
>Why second insert has gone to table busbar_version7 instead of 
> busbar_version6?

It's your understanding that's not correct.  The partition key value is
passed through a hash function, and the partition chosen is the one
whose remainder matches the remainder of the hash value divided by the
largest modulus of any partition.

That might surprise you, but how would you select which partition a
varchar value should go into if you didn't use a hash function?
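
If you want to see which partition each row ended up in, a query along
these lines (assuming the parent partitioned table is called
busbar_version) shows it:

SELECT tableoid::regclass AS partition, objectid, ver_id
FROM busbar_version;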

David




Re: DISTINCT on jsonb fields and Indexes

2020-06-22 Thread David Rowley
On Mon, 22 Jun 2020 at 16:44, Sankar P  wrote:
> select distinct record ->> 'destinationServiceName' from fluent;

> This query takes a lot of time, and does full table scan. The query planning 
> is:

> I see that none of the indexes are used. I want to do a few
> aggregations, like "what are the distinct pairs of
> `destinationServiceName` and `sourceServiceName` etc. " in these
> records. Now, is such a querying possible at all without doing full
> table scans ? I get such kind of aggregation support in elasticsearch
> + kibana, without doing full-table scan (I believe so, but I do not
> have data to back this claim) and I am trying to see if this is
> possible with any other extra index creation in postgres.

There is some work in progress to improve this type of query, but
it'll be at least PG14 before we see that.

For your version, you might want to look at
https://wiki.postgresql.org/wiki/Loose_indexscan and try some of the
proposed solutions from there.
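
As a sketch of the recursive-CTE technique from that wiki page,
assuming you first add an expression index such as
CREATE INDEX ON fluent ((record ->> 'destinationServiceName')),
the distinct values can be walked one at a time:

WITH RECURSIVE t AS (
  (SELECT record ->> 'destinationServiceName' AS svc
   FROM fluent
   WHERE record ->> 'destinationServiceName' IS NOT NULL
   ORDER BY 1 LIMIT 1)
  UNION ALL
  SELECT (SELECT record ->> 'destinationServiceName'
          FROM fluent
          WHERE record ->> 'destinationServiceName' > t.svc
          ORDER BY 1 LIMIT 1)
  FROM t
  WHERE t.svc IS NOT NULL
)
SELECT svc FROM t WHERE svc IS NOT NULL;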

David




Re: pgbench and timestamps

2020-06-24 Thread David Rowley
On Wed, 24 Jun 2020 at 20:41, Jaime Soler  wrote:
>
> Hi, does anybody know what is wrong with pgbench in this case ?. Here is a 
> simple query to generate a random date in a interval time.sql:
>
>  (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 
> 00:00:00' -  timestamp '2005-09-01 00:00:00' ));
> pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U 
> postgres -p 5432
> ERROR:  la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 
> 00$1$2»
>
> I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of 
> timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

https://www.postgresql.org/docs/12/pgbench.html says:

"There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
digits, and underscores. Variables can be set by the command-line -D
option, explained above, or by the meta commands explained below. In
addition to any variables preset by -D command-line options, there are
a few variables that are preset automatically, listed in Table 257. A
value specified for these variables using -D takes precedence over the
automatic presets. Once set, a variable's value can be inserted into a
SQL command by writing :variablename. When running more than one
client session, each session has its own set of variables. pgbench
supports up to 255 variable uses in one statement."

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

I won't pretend to be familiar enough with pgbench internals to know
if there's any reasonable reasons why we do each of the above, but...

I guess you could work around this problem by just not putting the
midnight time in your timestamp. However, that might not work so well
if you want to specify a time other than midnight.
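
For example, rewriting the script so that no ":" appears at all (a
sketch of the same interval) sidesteps the substitution:

SELECT timestamp '2005-09-01'
       + random() * (timestamp '2006-03-01' - timestamp '2005-09-01');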

David




Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread David Rowley
On Tue, 7 Jul 2020 at 00:43, Simon Riggs  wrote:
>
> On Mon, 6 Jul 2020 at 12:37, Robins Tharakan  wrote:
>
>>
>> When an SQL needs to UNION constants on either side, it should be possible to
>> implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect 
>> understanding,
>> or something already discussed but rejected for some reason?
>>
>> This need came up while reviewing generated SQL, where the need was to 
>> return true when
>> at least one of two lists had a row. A simplified version is given below:
>>
>> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
>> vs.
>> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); -- 
>> Faster
>
>
> Those two queries aren't logically equivalent, so you can't apply the LIMIT 1 
> as an optimization.
>
> First query returns lots of random rows, the second query returns just one 
> random row.

I think the idea here is that because the target list contains only
constants that pulling additional rows from the query after the first
one will just be a duplicate row and never add any rows after the
UNION is processed.

David




Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread David Rowley
On Tue, 7 Jul 2020 at 09:03, Simon Riggs  wrote:
>
> On Mon, 6 Jul 2020 at 21:49, David Rowley  wrote:
>>
>> On Tue, 7 Jul 2020 at 00:43, Simon Riggs  wrote:
>> >
>> > On Mon, 6 Jul 2020 at 12:37, Robins Tharakan  wrote:
>> >> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
>> >> vs.
>> >> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); 
>> >> -- Faster
>> >
>> >
>> > Those two queries aren't logically equivalent, so you can't apply the 
>> > LIMIT 1 as an optimization.
>> >
>> > First query returns lots of random rows, the second query returns just one 
>> > random row.
>>
>> I think the idea here is that because the target list contains only
>> constants that pulling additional rows from the query after the first
>> one will just be a duplicate row and never add any rows after the
>> UNION is processed.
>
>
> OK, I see. Are you saying you think it's a worthwhile optimization to 
> autodetect?

I guess it's all about how much effort would be involved to detect
such cases vs how likely it is that we're going to speed up someone's
query.  I imagine it's not much effort to detect this, but also, this
is the first time I recall seeing this mentioned, so perhaps that
means not many people would be rewarded by making such a change. (It
does seem like quite a strange way to express the query.)

There is currently a patch floating around that implements UniqueKeys
which allows RelOptInfos to be tagged with the properties that they're
unique on.  With the current design of that patch, there is no way to
say "this relation *only* has duplicate rows".  Perhaps some design
tweaks there can make detecting this case cheaper in terms of CPU
effort during planning, and perhaps also in terms of how much code it
would take to make it work.  I'll try to keep this in mind when I
review that work soon.   If we were to start adding optimisations for
cases such as this, then I'd rather they were done in some
general-purpose way that just makes them "just work" rather than
adding special cases around the codebase that trigger some special
behaviour.  Likely in this case, it would still take at least some
code when planning setops.

Additionally, the setops code is badly in need of a rewrite, so the
bar is likely pretty high on adding too many smarts in there that we
need to migrate forwards after a rewrite. The setops planning code is
still not making use of the upper planner pathification work that Tom
did years ago. In many cases UNION would be far more optimal if it
were implemented as an Index Scan -> MergeAppend -> Unique.  I think
right now we only consider Append -> Sort -> Unique and Append -> Hash
Aggregate.


David




Re: Postgresql-12 taking more time to execute the query

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 02:48, Vishwa Kalyankar
 wrote:
>   Below is the output of the query explain and analyze result.

You may find better help here if you follow the advice given in
https://wiki.postgresql.org/wiki/Slow_Query_Questions

David




Re: Join optimization

2020-07-12 Thread David Rowley
On Sun, 12 Jul 2020 at 06:59,  wrote:
>
> I'm sorry for the bad example.
>
> Here is another, with some data on PG:  
> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=ccfd1c4fa291e74a6db9db1772e2b5ac
>   and Oracle:  
> https://dbfiddle.uk/?rdbms=oracle_18&fiddle=21a98f499065ad4e2c35ff4bd1487e14.

I believe what you're talking about is join removals.  It appears as
though Oracle is able to remove the inner join to the users table as
the join only serves to check the user record exists. No columns are
being selected.  The record must exist due to the foreign key
referencing users.

PostgreSQL currently can only remove left joins. Likely what you could
do here is just change your inner join into a left join. If you're
happy enough that the user record will always exist then that should
allow it to work.

The reason PostgreSQL does not currently support inner join removal is
that, by default, foreign key constraints are only triggered at the end
of the query (or, if deferred, at the end of the transaction). With
PostgreSQL, it's possible for a SELECT query to see a violated foreign
key constraint.  This can happen if your SELECT query calls a function
which updates a referenced record.  The cascade of the foreign key
won't occur until the end of the statement, so the select may stumble
upon a violated foreign key.

Here's a quick example of this case:
drop table t1,t2;
create table t1 (a int primary key);
create table t2 (a int references t1 on update cascade);

insert into t1 values(1),(2);
insert into t2 values(2),(2);
create or replace function update_t1 (p_a int) returns int as $$ begin
update t1 set a = a + 1 where a = p_a; return p_a; end; $$ language
plpgsql volatile;

-- In theory, this should never return any rows as we're asking for all
-- rows that don't exist in the referenced table. However, we do get a
-- result since the function call updates t1, setting the row with a=2 to
-- a=3. The cascade to t2 does not occur until the end of the statement.
select update_t1(a+1),t1.a from t1 where not exists(select 1 from t2
where t1.a=t2.a);
 update_t1 | a
---+---
 2 | 1
(1 row)

If you're happy that you don't have any functions like that which
could momentarily cause the foreign key to appear violated, then there
shouldn't be any harm in changing the INNER JOIN on users to a LEFT
JOIN.  PostgreSQL will be able to remove the join in that case.

David




Re: some random() clarification needed

2020-07-14 Thread David Rowley
On Wed, 15 Jul 2020 at 04:01, Marc Millas  wrote:
> your answer helps me understand my first problem.
> so, I rewrote a simple loop so as to avoid the "volatile" behaviour.

Not sure what you're trying to do with the plpgsql, but you can just
escape the multiple evaluations by putting the volatile function in a
sub-select with no FROM clause.

SELECT ... FROM ... WHERE id = (SELECT ceiling(random()*2582));

Or the more traditional way to get a random row is:

SELECT ... FROM ... WHERE id BETWEEN 0 AND 2585 ORDER BY random() LIMIT 1;

David




Re: Same query taking less time in low configuration machine

2020-07-15 Thread David Rowley
On Thu, 16 Jul 2020 at 09:50, Thomas Munro  wrote:
> I'd look at EXPLAIN (BUFFERS) or auto_explain.log_buffers to get more
> information on cache hits and misses.  I'd look for settings
> differences with EXPLAIN (SETTINGS) to see if there's anything
> accidentally set differently (maybe JIT or paralelism or something
> like that).  I'd look at pg_stat_activity repeatedly while it runs to
> see what the processes are doing, especially the wait_event column.
> I'd also look at the CPU and I/O on the systems with operating system
> tools like iostat, htop, perf to try to find the difference.

It might also be good to look at size of the tables and indexes that
are looked at within the function. If the smaller end machine was
loaded with data via pg_restore form a pg_dump taken from the larger
machine then the indexes might be in much better shape and the heap
may have less bloat.  Of course, that's just speculation. We've not
seen what the function does yet.

Vishwa, it would be good if you could follow the guide here:
https://wiki.postgresql.org/wiki/Slow_Query_Questions , in particular:

"Post the definitions of all tables and indexes referenced in the
query. If the query touches views or custom functions, we'll need
those definitions as well. Run psql command "\d table" with the
tables/views/indices referenced in the problem query."

David




Re: Row estimates for empty tables

2020-07-23 Thread David Rowley
On Fri, 24 Jul 2020 at 16:01, Christophe Pettus  wrote:
> I realize I've never quite known this; where does the planner get the row 
> estimates for an empty table?  Example:

We just assume there are 10 pages if the relation has not yet been
vacuumed or analyzed. The row estimates you see are the number of
tuples likely to fit onto a single page multiplied by the assumed 10
pages.  If you had made your table wider then the planner would have
assumed fewer rows.
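
A quick way to see this (the numbers will vary with the column widths):

CREATE TABLE t (a int);
EXPLAIN SELECT * FROM t;
-- the Seq Scan row estimate is based on the assumed 10 pages,
-- not on the table actually being empty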

There's a comment that justifies the 10 pages, which, as of master is
in table_block_relation_estimate_size(). It'll be somewhere else in
pg12.

* HACK: if the relation has never yet been vacuumed, use a minimum size
* estimate of 10 pages.  The idea here is to avoid assuming a
* newly-created table is really small, even if it currently is, because
* that may not be true once some data gets loaded into it.  Once a vacuum
* or analyze cycle has been done on it, it's more reasonable to believe
* the size is somewhat stable.
*
* (Note that this is only an issue if the plan gets cached and used again
* after the table has been filled.  What we're trying to avoid is using a
* nestloop-type plan on a table that has grown substantially since the
* plan was made.  Normally, autovacuum/autoanalyze will occur once enough
* inserts have happened and cause cached-plan invalidation; but that
* doesn't happen instantaneously, and it won't happen at all for cases
* such as temporary tables.)
*
* We approximate "never vacuumed" by "has relpages = 0", which means this
* will also fire on genuinely empty relations.  Not great, but
* fortunately that's a seldom-seen case in the real world, and it
* shouldn't degrade the quality of the plan too much anyway to err in
* this direction.
*
* If the table has inheritance children, we don't apply this heuristic.
* Totally empty parent tables are quite common, so we should be willing
* to believe that they are empty.

The code which decides if the table has been vacuumed here assumes it
has not if pg_class.relpages == 0. So even if you were to manually
vacuum the table the code here would think it's not yet been vacuumed.

David




Re: bad JIT decision

2020-07-24 Thread David Rowley
On Sat, 25 Jul 2020 at 08:46, Scott Ribe  wrote:
> Given the magnitude of the miss in using JIT here, I am wondering: is it 
> possible that the planner does not properly take into account the cost of 
> JIT'ing a function for multiple partitions? Or is it that the planner doesn't 
> have enough info about the restrictiveness of conditions, and is therefore 
> anticipating running the functions against a great many rows?

It does not really take into account the cost of jitting. If the total
plan cost is above the jit threshold then jit is enabled. If not, then
it's disabled.

There are various levels of jit and various thresholds that can be tweaked, see:

select name,setting from pg_settings where name like '%jit%';

But as far as each threshold goes, you either reach it or you don't.
Maybe that can be made better by considering jit in a more cost-based
way rather than by threshold, that way it might be possible to
consider jit per plan node rather than on the query as a whole. e.g,
if you have 1000 partitions and 999 of them have 1 row and the final
one has 1 billion rows, then it's likely a waste of time to jit
expressions for the 999 partitions.

However, for now, you might just want to try raising various jit
thresholds so that it only is enabled for more expensive plans.
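
As a sketch, the relevant settings can be raised, e.g. (the values here
are just examples; the defaults are 100000 / 500000 / 500000):

SET jit_above_cost = 500000;
SET jit_inline_above_cost = 2000000;
SET jit_optimize_above_cost = 2000000;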

David




Re: bad JIT decision

2020-07-24 Thread David Rowley
On Sat, 25 Jul 2020 at 10:37, Tom Lane  wrote:
>
> David Rowley  writes:
> > However, for now, you might just want to try raising various jit
> > thresholds so that it only is enabled for more expensive plans.
>
> Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> because we are constantly seeing complaints of this sort.

I think plan cost overestimation is a common cause of unwanted jit too.

It would be good to see the EXPLAIN ANALYZE so we knew if that was the
case here.

David




Re: bad JIT decision

2020-07-24 Thread David Rowley
On Sat, 25 Jul 2020 at 10:42, David Rowley  wrote:
>
> On Sat, 25 Jul 2020 at 10:37, Tom Lane  wrote:
> >
> > David Rowley  writes:
> > > However, for now, you might just want to try raising various jit
> > > thresholds so that it only is enabled for more expensive plans.
> >
> > Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> > because we are constantly seeing complaints of this sort.
>
> I think plan cost overestimation is a common cause of unwanted jit too.
>
> It would be good to see the EXPLAIN ANALYZE so we knew if that was the
> case here.

So Scott did send me the full EXPLAIN ANALYZE for this privately. He
wishes to keep the full output private.

After looking at it, it seems the portion that he pasted above, aka:

->  Index Scan using equities_rds_id on equities e0  (cost=0.42..33.74
rows=1 width=37) (actual time=6751.892..6751.892 rows=0 loops=1)
   Index Cond: (rds_id = ANY ('{..., ..., ..., ...}'::uuid[]))
   Filter: (security_type = 'ETP'::text)
   Rows Removed by Filter: 4

is nested at the bottom-level join, about 6 joins deep.  The lack of
any row being found results in upper level joins not having to do
anything, and the majority of the plan is (never executed).

David




Re: bad JIT decision

2020-07-25 Thread David Rowley
On Sun, 26 Jul 2020 at 02:17, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Sat, 25 Jul 2020 at 10:42, David Rowley  wrote:
> >> I think plan cost overestimation is a common cause of unwanted jit too.
> >> It would be good to see the EXPLAIN ANALYZE so we knew if that was the
> >> case here.
>
> > So Scott did send me the full EXPLAIN ANALYZE for this privately. He
> > wishes to keep the full output private.
>
> So ... what was the *top* line, ie total cost estimate?

Hash Right Join  (cost=1200566.17..1461446.31 rows=1651 width=141)
(actual time=5881.944..5881.944 rows=0 loops=1)

So well above the standard jit inline and optimize cost

David




Re: bad JIT decision

2020-07-25 Thread David Rowley
On Sun, 26 Jul 2020 at 02:23, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2020-07-24 18:37:02 -0400, Tom Lane wrote:
> >> Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> >> because we are constantly seeing complaints of this sort.
>
> > I think the issue is more that we need to take into accoutn that the
> > overhead of JITing scales ~linearly with the number of JITed
> > expressions. And that's not done right now.  I've had a patch somewhere
> > that had a prototype implementation of changing the costing to be
> > #expressions * some_cost, and I think that's a lot more accurate.
>
> Another thing we could try with much less effort is scaling it by the
> number of relations in the query.  There's already some code in the
> plancache that tries to estimate planning effort that way, IIRC.
> Such a scaling would be very legitimate for the cost of compiling
> tuple-deconstruction code, and for other expressions it'd kind of
> amount to an assumption that the expressions-per-table ratio is
> roughly constant.  If you don't like that, maybe some simple
> nonlinear growth rule would work.

I had imagined something a bit less all or nothing.  I had thought
that the planner could pretty cheaply choose if jit should occur or
not on a per-Expr level.  For WHERE clause items we know "norm_selec"
and we know what baserestrictinfos come before this RestrictInfo, so
we could estimate the number of executions per item in the WHERE
clause. For Exprs in the targetlist we have the estimated rows from
the RelOptInfo. HAVING clause Exprs will be evaluated a similar number
of times.   The planner could do something along the lines of
assuming, say 1000 * cpu_operator_cost to compile an Expr then assume
that a compiled Expr will be some percentage faster than an evaluated
one and only jit when the Expr is likely to be evaluated enough times
for it to be an overall win.  Optimize and inline would just have
higher thresholds.

David




Re: bad JIT decision

2020-07-25 Thread David Rowley
On Sun, 26 Jul 2020 at 02:54, Tom Lane  wrote:
>
> David Rowley  writes:
> > ... nested at the bottom level join, about 6 joins deep.  The lack of
> > any row being found results in upper level joins not having to do
> > anything, and the majority of the plan is (never executed).
>
> On re-reading this, that last point struck me forcibly.  If most of
> the plan never gets executed, could we avoid compiling it?  That is,
> maybe JIT isn't JIT enough, and we should make compilation happen
> at first use of an expression not during executor startup.

That's interesting.  But it would introduce an additional
per-evaluation cost of checking if we're doing the first execution.

David




Re: bad JIT decision

2020-07-27 Thread David Rowley
On Tue, 28 Jul 2020 at 11:00, Andres Freund  wrote:
>
> On 2020-07-25 10:54:18 -0400, Tom Lane wrote:
> > David Rowley  writes:
> > > ... nested at the bottom level join, about 6 joins deep.  The lack of
> > > any row being found results in upper level joins not having to do
> > > anything, and the majority of the plan is (never executed).
> >
> > On re-reading this, that last point struck me forcibly.  If most of
> > the plan never gets executed, could we avoid compiling it?  That is,
> > maybe JIT isn't JIT enough, and we should make compilation happen
> > at first use of an expression not during executor startup.
>
> That unfortunately has its own downsides, in that there's significant
> overhead of emitting code multiple times. I suspect that taking the
> cost of all the JIT emissions together into account is the more
> promising approach.

Is there some reason that we can't consider jitting on a more granular
basis?  To me, it seems wrong to have a jit cost per expression and
demand that the plan cost > #nexprs * jit_expr_cost before we do jit
on anything.  It'll make it pretty hard to predict when jit will occur
and doing things like adding new partitions could suddenly cause jit
to not enable for some query any more.

ISTM a more granular approach would be better. For example, for the
expression we expect to evaluate once, there's likely little point in
jitting it, but for the one on some other relation that has more rows,
where we expect to evaluate it 1 billion times, there's likely good
reason to jit that.  Wouldn't it be better to consider it at the
RangeTblEntry level?

David




Re: is JIT available

2020-07-27 Thread David Rowley
On Tue, 28 Jul 2020 at 04:18, Scott Ribe  wrote:
>
> > On Jul 25, 2020, at 8:02 AM, Christoph Moench-Tegeder  
> > wrote:
> > pg_jit_available()  boolean  is JIT compilation available in this session
> >
> > https://www.postgresql.org/docs/12/functions-info.html
>
> Thanks, that seems to be exactly what I was looking for.
>
> Even though the documentation is not clear, it does return false when jit = 
> on but PG was not compiled with JIT.

If it's not clear we can certainly change it.

I looked at the manual page. It says:

"is JIT compilation available in this session (see Chapter 31)?
Returns false if jit is set to false."

Maybe this would be better?

"returns true if jit is enabled and JIT compilation is available in
this session (see Chapter 31)."

Open to other suggestions.

David




Re: is JIT available

2020-07-27 Thread David Rowley
On Tue, 28 Jul 2020 at 15:33, Tom Lane  wrote:
>
> David Rowley  writes:
> > Maybe this would be better?
>
> > "returns true if jit is enabled and JIT compilation is available in
> > this session (see Chapter 31)."
>
> The general, non-hacker meaning of "jit is enabled" would seem to
> be pretty much what this function is already doing; and for that
> matter, the same can be said for "JIT compilation is available".
> We need something that's less tautological-looking.  Maybe along
> the lines of
>
> "returns true if a JIT compiler extension is available and the
> jit parameter is set to on;

That's probably better.  FWIW, the "jit" is already a link to the GUC
docs, so I had in mind that users would have known we meant "jit" the
GUC rather than "jit" the feature.  Your wording will help for anyone
who thinks we're talking about the feature.

> when this is true, JIT compilation will be performed."

I'd probably drop this part since it's not really true. The query has
to exceed the cost thresholds before that'll happen.

David




Re: is JIT available

2020-07-28 Thread David Rowley
On Tue, 28 Jul 2020 at 15:55, David Rowley  wrote:
>
> On Tue, 28 Jul 2020 at 15:33, Tom Lane  wrote:
> >
> > David Rowley  writes:
> > > Maybe this would be better?
> >
> > > "returns true if jit is enabled and JIT compilation is available in
> > > this session (see Chapter 31)."
> >
> > The general, non-hacker meaning of "jit is enabled" would seem to
> > be pretty much what this function is already doing; and for that
> > matter, the same can be said for "JIT compilation is available".
> > We need something that's less tautological-looking.  Maybe along
> > the lines of
> >
> > "returns true if a JIT compiler extension is available and the
> > jit parameter is set to on;
>
> That's probably better.  FWIW, the "jit" is already a link to the GUC
> docs, so I had in mind that users would have known we meant "jit" the
> GUC rather than "jit" the feature.  Your wording will help for anyone
> who thinks we're talking about the feature.
>
> > when this is true, JIT compilation will be performed."
>
> I'd probably drop this part since it's not really true. The query has
> to exceed the cost thresholds before that'll happen.

I pushed a doc change for this with slightly revised wording from what
you mentioned.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d7c8576ebe3949a644c700a9f54d88e7e373a647

David




Re: determine what column(s) form the primary key, in C extention

2020-07-28 Thread David Rowley
On Wed, 29 Jul 2020 at 03:45, alex maslakov  wrote:
>  int i = -1;
>  while ((i = bms_next_member(pkattnos , i)) >= 0) {
>  /* do stuff with i */
>  /* you'll need to use i - FirstLowInvalidHeapAttributeNumber to
> get the pg_attribute.attnum */
>
>
>  elog(INFO, "bms_next_member i: %d", i);
>  }
>
> prints 10 and then 9
>
> Then:
>
>10 - FirstLowInvalidHeapAttributeNumber (-8) ==> 2
>
>9 - FirstLowInvalidHeapAttributeNumber (-8) ==> 1
>
> These are indexes of the columns, right?
>
> Do they start from 1, not from 0?

User attributes start at 1. Have a look at the pg_attribute system
catalogue table. The number you get will be the attnum column from
that table.
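
For example, to see the attnum values for an ordinary table (the table
name is just a placeholder):

SELECT attnum, attname
FROM pg_attribute
WHERE attrelid = 'your_table'::regclass
  AND attnum > 0
  AND NOT attisdropped
ORDER BY attnum;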

> (2)
>
> I'll use this C code as an example to build an extention in Rust. The
> Postgresql bindings for Rust I have don't contain a definition of
> `FirstLowInvalidHeapAttributeNumber` for some reason. I can define it
> since it's simply single digit constant.
>
> However what does in some source files it's defined as (-7) and in some
> as (-8)? Which should I use?

It did recently change from -8 to -7 when we removed Oid as a system
column in pg12. The number will never change on a major version, so
you'll always know what it is for versions that have already been
released.  There's always a chance it'll change from -7 in some future
PostgreSQL version though.

David




Re: is JIT available

2020-07-28 Thread David Rowley
On Wed, 29 Jul 2020 at 00:26, Scott Ribe  wrote:
> But does compilation with JIT enable and LLVM dev tools mean that all the 
> LLVM compilation/optimization is built into the PG binaries, or does it 
> require LLVM presence on the machine where deployed? And if so, does the 
> function take that into account as well?

It's not enough for just the build to have been built with jit
enabled. The jit extension must also be present on the machine. I
think the new wording in
https://www.postgresql.org/docs/devel/functions-info.html conveys
that:

"Returns true if a JIT compiler extension is available (see Chapter
31) and the jit configuration parameter is set to on."

David




Re: Costs of Heap Fetches in Postgres 13

2020-07-30 Thread David Rowley
On Fri, 31 Jul 2020 at 05:21,  wrote:
> does the planner do estimates about heap fetches on index only scans and 
> takes them into account?
> At least in Pg 13? If so, is it  possible to lower those costs? 
> random_page_costs seems not to have any influence.
> Looks like they cause bad planner decisions.
> Tuning Vacuum is not an option ;)

You'll probably need to give us an example of this not working by
means of an EXPLAIN output.

The planner uses the ratio of blocks marked as all-visible
(pg_class.relallvisible) to the current number of blocks in the
relation, and applies random_page_cost to the heap blocks an Index Only
Scan would be expected to fetch, scaled by 1.0 minus that ratio.

e.g. If the table has 1000 blocks and you have 900 marked as all
visible, and an index scan expects to read 200 blocks, then it will
apply random_page_cost * 200 * (1.0 - (900.0 / 1000.0)).  Which in
this case is 20 blocks. Your all visible ratio here is 90%, 10% are
not all visible, so 10% of 200 blocks is 20 blocks.
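
You can check the ratio for a given table with something like (the
table name is a placeholder):

SELECT relpages, relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'your_table';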

If I mock up a case like that and then tweak random_page_cost, then I
see the total cost changing just fine.  I did only test in master, but
we'll not have changed that since branching for PG13.

Perhaps you've got some tablespace level random_page_cost set and
you're not actually changing it?

David




Re: bad JIT decision

2020-08-02 Thread David Rowley
On Wed, 29 Jul 2020 at 09:07, Andres Freund  wrote:
> On 2020-07-28 11:54:53 +1200, David Rowley wrote:
> > Is there some reason that we can't consider jitting on a more granular
> > basis?
>
> There's a substantial "constant" overhead of doing JIT. And that it's
> nontrival to determine which parts of the query should be JITed in one
> part, and which not.
>
>
> > To me, it seems wrong to have a jit cost per expression and
> > demand that the plan cost > #nexprs * jit_expr_cost before we do jit
> > on anything.  It'll make it pretty hard to predict when jit will occur
> > and doing things like adding new partitions could suddenly cause jit
> > to not enable for some query any more.
>
> I think that's the right answer though:

I'm not quite sure why it would be so hard to do more granularly.

Take this case, for example:

create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
insert into listp select 1,x from generate_series(1,1000000) x;

The EXPLAIN looks like:

postgres=# explain select * from listp where b < 100;
QUERY PLAN
--
 Append  (cost=0.00..16967.51 rows=853 width=8)
   ->  Seq Scan on listp1 listp_1  (cost=0.00..16925.00 rows=100 width=8)
 Filter: (b < 100)
   ->  Seq Scan on listp2 listp_2  (cost=0.00..38.25 rows=753 width=8)
 Filter: (b < 100)
(5 rows)

For now, if the total cost of the plan exceeded the jit threshold,
then we'd JIT all the expressions. If it didn't, we'd compile none of
them.

What we could do instead would just add the jitFlags field into struct
Plan to indicate the JIT flags on a per plan node level and enable it
as we do now based on the total_cost of that plan node rather than at
the top-level of the plan as we do now in standard_planner(). The
jitFlags setting code would be moved to the end of
create_plan_recurse() instead.

In this case, if we had the threshold set to 1, then we'd JIT for
listp1 but not for listp2. I don't think this would even require a
signature change in the jit_compile_expr() function as we can get
access to the plan node from state->parent->plan to see which jitFlags
are set, if any.

David




Re: How to get the name of the current database in C function/extention inside a trigger?

2020-08-03 Thread David Rowley
On Mon, 3 Aug 2020 at 21:26, alex m  wrote:
> I'm writting a function/extention in C for a trigger. Inside a trigger, in C, 
> I want to get the name of the current database. However, not via SPI_exec(), 
> SPI_prepare() and the like, but more directly, in a more faster way.

You can use MyDatabaseId in miscadmin.h

If you git grep MyDatabaseId in src/contrib you can see some examples.

A good tip for the future would be to just look at the source code of
the built-in current_database() function and see how it does it.

David




Re: bad JIT decision

2020-08-03 Thread David Rowley
On Wed, 29 Jul 2020 at 09:28, Andres Freund  wrote:
> FWIW, I created a demo workload for this, and repro'ed the issue with
> that. Those improvements does make a very significant difference:

> Before:
> Timing: Generation 335.345 ms, Inlining 51.025 ms, Optimization 11967.776 
> ms, Emission 9201.499 ms, Total 21555.645 ms
> IR size: unoptimized: 9022868 bytes, optimized: 6206368 bytes
>
> After:
> Timing: Generation 261.283 ms, Inlining 30.875 ms, Optimization 1671.969 
> ms, Emission 18.557 ms, Total 1982.683 ms
> IR size: unoptimized 8776100 bytes, optimized 115868 bytes

That's a really impressive speedup.  However, no matter how fast we
make the compilation, it's still most likely to be a waste of time
doing it for plan nodes that are just not that costly.

I just wrote a patch to consider JIT on a per-plan-node basis instead
of globally over the entire plan. I'll post it to -hackers.

With a 1000 partition table where all of the cost is on just 1
partition, running a query that hits all partitions, I see:

Master jit=on:
 JIT:
   Functions: 3002
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 141.587 ms, Inlining 11.760 ms, Optimization
6518.664 ms, Emission 3152.266 ms, Total 9824.277 ms
 Execution Time: 12588.292 ms

Master jit=off:
 Execution Time: 3672.391 ms

Patched jit=on:
 JIT:
   Functions: 5
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.675 ms, Inlining 3.322 ms, Optimization 10.766
ms, Emission 5.892 ms, Total 20.655 ms
 Execution Time: 2754.160 ms

Most likely the EXPLAIN output will need to do something more than
show true/false for the options here, but I didn't want to go to too
much trouble unless this is seen as a good direction to go in.

> That obviously needs to be improved further, but it's already a lot
> better. In particular after these changes the generated code could be
> cached.

That would be a game-changer.

David




Re: Postgres 12 - default value for text column

2020-08-04 Thread David Rowley
On Wed, 5 Aug 2020 at 08:36, Raj Gandhi  wrote:
> The following alter table with default set to very large text used to work in 
> Postgres 10 but fails in Postgres 12 with ERROR:  row is too big: size 12960, 
> maximum size 8160

I didn't go to the trouble of debugging this, but I imagine this is
due to "Allow ALTER TABLE to add a column with a non-null default
without doing a table rewrite" mentioned in
https://www.postgresql.org/docs/11/release-11.html

In PG10 the table would have been rewritten when you add a NOT NULL
column with a DEFAULT. From PG11 onwards no rewrite takes place and
the default value is stored in pg_attribute.  Since pg_attribute does
not have a TOAST table, it's not possible to add NOT NULL columns
which have default values that won't fit in a heap page.

> The following two variants works in Postgres 12 without any error:
>
> create table test (id int);
> alter table test1 add column license text
> alter table test1 alter column license SET DEFAULT '<text with size more than 8160>'

This behaves differently since existing rows won't receive the DEFAULT
value. Only new rows will. PostgreSQL 12 does not need to store the
missing value in pg_attribute when you do this. Existing rows will
just have a NULL value for the new column.

> create table test (id int, license text DEFAULT '<text with size more than 8160>');

Likewise. No missing value needs to be stored here as no rows exist
that need that value. Rows that are added with no value specified for
the license column will just have the DEFAULT value, which is the one
stored in pg_attrdef.

David




Re: Understanding EXPLAIN ANALYZE estimates when loops != 1

2020-08-19 Thread David Rowley
On Thu, 20 Aug 2020 at 09:55, Philip Semanchuk
 wrote:
> I could use some help interpreting EXPLAIN ANALYZE output.
>
> ->  Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) 
> (actual time=0.006..0.918 rows=3760 loops=94)
>
> The actual rows returned by this plan node ~= 3760 * 94 = 353,440.

Yes.  It's the total rows / loops, rounded to the nearest integer.

> Did postgres expect (estimate) 3283 rows from this join, or 3283 * 94 = 
> 308,602?

Yes, that's the case at least when the node is not a Parallel node.
If this index scan was part of a parameterized nested loop, then
you'll see the estimate of the number of expected loops from the outer
side of the join.

> Same question for this node.
>
> ->  Parallel Index Scan using pk_xyz on xyz  (cost=0.29..2354.67 rows=54285 
> width=25) (actual time=0.049..6.326 rows=14864 loops=5)
>
> Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5?

So parallel plans are a bit more complex.  The row estimates are the
total estimated rows / the number of workers we expect to do useful
work.  You might expect the divisor there to be an integer number
since you can't really have 0.5 workers.  However, it's more complex
than that since the leader has other tasks to take care of such as
pulling tuples from workers, it's not dedicated to helping out.

If you're into reading C code, then there's more information in
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c#L5699
 , if you hunt around for usages of that function then you'll see the
estimated row counts are divided by the return value of that function.

David




Re: Understanding EXPLAIN ANALYZE estimates when loops != 1

2020-08-20 Thread David Rowley
On Fri, 21 Aug 2020 at 03:21, Philip Semanchuk
 wrote:
>
>
>
> > On Aug 19, 2020, at 6:24 PM, David Rowley  wrote:
> >
> > On Thu, 20 Aug 2020 at 09:55, Philip Semanchuk
> >  wrote:
> >> ->  Parallel Index Scan using pk_xyz on xyz  (cost=0.29..2354.67 
> >> rows=54285 width=25) (actual time=0.049..6.326 rows=14864 loops=5)
> >>
> >> Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5?
> >
> > So parallel plans are a bit more complex.   The row estimates are the
> > total estimated rows  / the amount of workers we expect to do useful
> > work.  You might expect the divisor there to be an integer number
> > since you can't really have 0.5 workers.  However, it's more complex
> > than that since the leader has other tasks to take care of such as
> > pulling tuples from workers, it's not dedicated to helping out.
>
> Sounds like it help to set max_parallel_workers = 1 before running EXPLAIN 
> ANALYZE in order to simplify the numbers, yes? Or is there a possibility that 
> doing so would send the planner down an entirely different path?

Since the costs are also divided (see costsize.c), and the path costs
are the basis on which the planner chooses between paths, you're
likely to see the plan change.  max_parallel_workers set to 1
wouldn't have been very helpful anyway, since that's the leader process
+ 1 parallel worker, resulting in a divisor of 1.7.

If you need to, you can just reverse engineer the costs by assuming
what get_parallel_divisor() will have returned. You can see it expects
each worker to take up 30% of the leader's time: leader + 1 worker = 1.7,
leader + 2 workers = 2.4, leader + 3 workers = 3.1, leader + 4 workers
= 4. You'll know the number of workers from "Workers Planned" in the
EXPLAIN output. You'd need to do something else if you happen to run
with parallel_leader_participation = off.
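For example, a rough sketch of that arithmetic using the Parallel
Index Scan from earlier (rows=54285, loops=5, which I'm assuming was 4
planned workers plus a participating leader):

select 54285 * (4 + greatest(0, 1 - 0.3 * 4)) as estimated_total_rows;  -- 217140

i.e. the divisor works out to 4.0, so the planner's total-row estimate
for that node would be about 217140.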

David




Re: Query plan prefers hash join when nested loop is much faster

2020-08-24 Thread David Rowley
On Sat, 22 Aug 2020 at 00:35, iulian dragos
 wrote:
> I am trying to understand why the query planner insists on using a hash join, 
> and how to make it choose the better option, which in this case would be a 
> nested loop.

> |   ->  Index Scan using 
> test_result_module_result_id_idx on test_result  (cost=0.57..6911.17 
> rows=4331 width=12) (actual time=0.002..0.002 rows=1 loops=14824) |
> | Index Cond: (module_result_id = 
> module_result.id) 
> |

You might want to check if the pg_stats view reports a realistic
n_distinct value for test_result.module_result_id.  If
pg_class.reltuples is correct for that relation then that would
indicate the n_distinct estimate is about 115000. Going by the number
of rows you've mentioned, it would appear a more realistic value for
that would be -0.4, which is 0 - 1 / (5 / 2.0).
However, that's assuming each module_result has a test_result.  You
could run a SELECT COUNT(DISTINCT module_result_id) FROM test_result;
to get a better idea.

If ANALYZE is not getting you a good value for n_distinct, then you
can overwrite it. See [1], search for n_distinct.

David

[1] https://www.postgresql.org/docs/current/sql-altertable.html




Re: Query plan prefers hash join when nested loop is much faster

2020-08-25 Thread David Rowley
On Tue, 25 Aug 2020 at 22:10, iulian dragos
 wrote:
> Thanks for the tip! Indeed, `n_distinct` isn't right. I found it in pg_stats 
> set at 131736.0, but the actual number is much higher: 210104361. I tried to 
> set it manually, but the plan is still the same (both the actual number and a 
> percentage, -0.4, as you suggested):

You'll need to run ANALYZE on the table after doing the ALTER TABLE to
change the n_distinct.  The ANALYZE writes the value to pg_statistic.
ALTER TABLE only takes it as far as pg_attribute's attoptions.
ANALYZE reads that column to see if the n_distinct estimate should be
overwritten before writing out pg_statistic.
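For example, a sketch using the column from your query (the -0.4 is
the fraction-of-rows form mentioned earlier):

alter table test_result alter column module_result_id set (n_distinct = -0.4);
analyze test_result;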

Just remember if you're hardcoding a positive value that it'll stay
fixed until you change it. If the table is likely to grow, then you
might want to reconsider using a positive value and consider using a
negative value as mentioned in the doc link.

David




Re: how to get top plan of GatherMerge in OSS10

2020-09-04 Thread David Rowley
On Fri, 4 Sep 2020 at 20:21, Yang, Rong  wrote:
> in OSS 10, how  to make the top plan of the plan tree to GatherMerge with 
> ‘except all’ in sql?

(I guess you're talking about PostgreSQL 10. I'm not sure what OSS 10 is.)

The ability for Gather Merge to work with Init Plan values was only
added in [1], which was new to PostgreSQL 11. So you're not going to
make that work in PostgreSQL 10.

You could try crafting the query in such a way that an Init Plan is
not used, e.g. a CROSS JOIN, but I'm not sure if that'll improve your
performance any. You could also try reducing parallel_tuple_cost a
bit, which might give you a Gather, so at least the Seq Scan will be
done in parallel. The sort will still be serial though.

A btree index on table1 (c) looks like it might be worth considering.
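i.e. something like (a minimal sketch, assuming the column really is table1.c):

create index on table1 (c);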

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e89a71fb44




Re: bug in PG13?

2020-11-01 Thread David Rowley
On Thu, 15 Oct 2020 at 03:53, Andreas Kretschmer
 wrote:
> it seems to me a bug. i have a partitioned table:

I've just pushed a fix [1] for this to master only (PG14+)

The problem was that we only added the information required for the
executor to perform run-time pruning to the top-level Append/MergeAppend.
The example you've given actually did have a
nested-Append at one point during planning.  However, since the
top-level Append only had a single sub-plan, it was removed and that
single sub-plan was used instead.  Since that single sub-plan happened
to be an Append, there was no run-time pruning information to allow
the executor to prune away the unneeded partitions.

The fix for this was a bit too invasive to backpatch.
Run-time pruning was coded purposefully to only prune on the top-level
Append/Merge Append.  In hindsight, that was likely a bad choice, but
it was the choice that was made originally, so I'm leaning towards not
classing this as a bug.  After thinking about this all over again, it
seems there are more legitimate reasons to have nested Append/Merge
Appends than I had thought when I was originally working on run-time
pruning, so it makes sense to allow run-time pruning on those to work
going forward.

Thanks for the report.

David

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a929e17e5a8c9b751b66002c8a89fdebdacfe194




Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

2021-02-14 Thread David Rowley
On Sun, 14 Feb 2021 at 13:15, Seamus Abshere
 wrote:
> The comment from Robert says: (src/backend/optimizer/path/allpaths.c)
>
> /*
>  * If the use of parallel append is permitted, always request 
> at least
>  * log2(# of children) workers.
>
> In my case, every partition takes 1 second to scan, I have 64 cores, I have 
> 64 partitions, and the wall time is 8 seconds with 8 workers.
>
> I assume that if it it planned significantly more workers (16? 32? even 64?), 
> it would get significantly faster (even accounting for transaction cost). So 
> why doesn't it ask for more? Note that I've set max_parallel_workers=512, 
> etc. (postgresql.conf in my first message).

There's perhaps an argument for allowing ALTER TABLE ... SET
(parallel_workers=N); to be set on partitioned tables, but
we don't currently allow it.

What you might want to try is setting that for any of those 64
partitions.  Shortly above the code comment that you quoted, there's
some code that finds the path for the partition with the maximum
number of parallel workers. If one of those partitions is using, say,
64 workers because you set that partition's "parallel_workers" setting
to 64, and providing you have max_parallel_workers_per_gather set high
enough, then your Append should get 64 workers.
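For example, something like this (the partition name is hypothetical;
pick one of your 64 partitions):

alter table mytable_p00 set (parallel_workers = 64);
set max_parallel_workers_per_gather = 64;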

You'll need to be careful though, since changing the partition's
parallel_workers may affect things for other queries too. Also, if you
were to only change 1 partition and that partition were to be pruned,
then you'd not get the 64 workers.

David




Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

2021-02-15 Thread David Rowley
On Mon, 15 Feb 2021 at 10:16, Gavin Flower
 wrote:
> Just wondering why there is a hard coded limit.

I don't see where the hardcoded limit is.  The total number is limited
to max_parallel_workers_per_gather, but there's nothing hardcoded
about the value of that.

> While I agree it might be good to be able specify the number of workers,
> sure it would be possible to derive a suitable default based on the
> number of effective processors available?

It's a pretty tricky thing to get right.  The problem is that
something has to rationalise the use of parallel worker processes.
Does it seem reasonable to you to use the sum of the Append child
parallel workers?  If so, I imagine someone else would think that
would be pretty insane. We do have to consider the fact that we're
trying to share those parallel worker processes with other backends
which also might want to get some use out of them.

As for if we need some rel option for partitioned tables.  I think
that's also tricky. Sure, we could go and add a "parallel_workers"
relopt to partitioned tables, but then that's going to be applied
regardless of how many partitions survive partition pruning.  There
could be as few as 2 subpaths in an Append, or the number could be
in the thousands. I can't imagine anyone really wants the same number
of parallel workers in each of those two cases.  So I can understand
why ab7271677 wanted to take into account the number of append
children.

Maybe there's some room for some other relopt that just changes the
behaviour of that code.  It does not seem too unreasonable that
someone might like to take the sum of the Append child parallel
workers.  That value would still be capped at
max_parallel_workers_per_gather, so it shouldn't ever go too insane
unless someone set that GUC to something insane, which would be their
choice.  I'm not too sure what such a relopt would be called.

Additionally, for the case being reported here: since all Append
children are foreign tables, there is actually some work going on to
make it so workers don't have to sit by and wait until the foreign
server returns the results.  I don't think anyone would disagree that
it's pretty poor use of a parallel worker to have it sit there doing
nothing for minutes at a time waiting for a single tuple from a
foreign data wrapper.  I'm not sure of the status of that work, but if
you want to learn more about it, please see [1]

David

[1] https://commitfest.postgresql.org/32/2491/




Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

2021-02-15 Thread David Rowley
On Tue, 16 Feb 2021 at 02:12, Fabio Pardi  wrote:
>
> On 14/02/2021 22:16, Gavin Flower wrote:
> > While I agree it might be good to be able specify the number of workers, 
> > sure it would be possible to derive a suitable default based on the number 
> > of effective processors available?
>
> I had the same problem and my conclusion was that it is not possible to go 
> above 8 cores because of Amdahl's law on parallel computing. More here: 
> https://en.wikipedia.org/wiki/Amdahl%27s_law

That would really depend on what the non-parallel part of the equation
was.  There are some plan shapes such as GROUP BY or aggregate queries
with very few or just 1 group where the serial portion of the
execution is very small indeed.
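As a rough illustration with made-up serial fractions (Amdahl's law:
speedup = 1 / (serial_fraction + parallel_fraction / workers)):

select 1 / (0.01 + 0.99 / 64.0);  -- ~39x with a 1% serial portion and 64 workers
select 1 / (0.20 + 0.80 / 64.0);  -- ~4.7x with a 20% serial portion

So a hard ceiling around 8x only appears when the serial portion is a
fairly large fraction of the query.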

David




Re: Deleting takes days, should I add some index?

2021-02-25 Thread David Rowley
On Fri, 26 Feb 2021 at 02:06, Alexander Farber
 wrote:
> However the deletion still takes forever and I have to ctrl-c it:
>
> # delete from words_games where created < now() - interval '12 month';
>
> Do you please have any further suggestions?
>
> When I try to prepend "explain analyze" to the above query, then in the 
> production database it also lasts forever.

EXPLAIN with ANALYZE executes the query, so it'll probably take just as long.

Since your foreign keys perform a cascade delete, any records in the
tables referencing the table you're deleting from will be deleted too.
You must also look at those referencing tables, see what references
them in turn, and index the referencing column(s).

Here's a simplified example that's easier to understand than your case.

Setup:
create table t1 (id int primary key);
create table t2 (id int primary key, t1_id int not null references t1
on update cascade on delete cascade);
create index on t2 (t1_id);
create table t3 (id int primary key, t2_id int not null references t2
on update cascade on delete cascade);

So I have 2 levels of reference.  t2 -> t1 and t3 -> t2.
If I remove a row from t1 then PostgreSQL must perform: DELETE FROM t2
WHERE t1_id = <id of the removed t1 row>;

Luckily I indexed t2(t1_id), so that should be fast.

Since t3 references t2, the database must also perform: DELETE FROM t3
WHERE t2_id = <id of the removed t2 row>; for each row that gets
removed from t2.

Unfortunately, I forgot to index t3(t2_id).

Let me insert some data and see how the lack of index effects performance:

insert into t1 select x from generate_Series(1,100) x;
insert into t2 select x,x from generate_Series(1,100) x;
insert into t3 select x,x from generate_Series(1,100) x;

Delete 100 records.

delete from t1 where id <= 100;
DELETE 100
Time: 8048.975 ms (00:08.049)

Pretty slow.

create index on t3 (t2_id);
CREATE INDEX

(truncate t1 cascade and reinsert the data)

delete from t1 where id <= 100;
DELETE 100
Time: 5.151 ms

Better.

So, you need to follow each of the "Referenced by" entries from the
table you're deleting from. In the \d output, you can ignore the tables
mentioned under "Foreign-key constraints:"; those are only checked on
INSERT/UPDATE, and the referenced columns must already have a proper
unique constraint and therefore an index.
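If it helps, here's a rough sketch (using your words_games table) that
lists every column referencing it via a foreign key, so you can check
each one has an index to support the cascaded deletes; repeat it for
each referencing table in turn:

select con.conrelid::regclass as referencing_table,
       att.attname            as referencing_column
from pg_constraint con
cross join lateral unnest(con.conkey) as cols(attnum)
join pg_attribute att
  on att.attrelid = con.conrelid
 and att.attnum   = cols.attnum
where con.contype = 'f'
  and con.confrelid = 'words_games'::regclass;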

David




Re: Pgbackrest version 2.28 Bug/Issue

2021-03-02 Thread David Rowley
On Tue, 2 Mar 2021 at 21:53, Brajendra Pratap Singh
 wrote:
> We are getting the below issue while executing the backup through pgbackrest.

> 2021-03-02 02:10:01.622 P00  ERROR: [056]: unable to find primary cluster - 
> cannot proceed

That's not an error that's coming from PostgreSQL.

You might have better luck asking the authors of pgbackrest. Check
their website for details on how they want issues reported.

David




Re: PostgreSQL : bug (ou not) in CTE Common Table Expressions or Subqueries in the FROM Clause

2022-03-22 Thread David Rowley
On Wed, 23 Mar 2022 at 04:03, PALAYRET Jacques
 wrote:
> For example, in Oracle it is not possible to have a SELECT statement without 
> a FROM clause (using DUAL table), so maybe " ambiguous " columns are not 
> handled in the same way.

But in your original complaint, the column was not ambiguous. The CTE
had no column named "id".

I highly recommend that you write queries giving each relation a
short alias, and make a habit of always prefixing your column names
with the alias.  If you don't do this, you can run into all sorts of
problems when you one day want to start dropping old unused columns
from your tables.  If you alias the columns, dropping them will
properly cause queries that reference them to ERROR.  Without
aliases, your queries might just start doing something you don't want
them to do, and you might not realise that for a very long time.
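For example, a minimal sketch of that habit (the table and column
names here are made up, not from your query):

with w as (
    select code, label from lookup
)
select t.id, w.label
from measurements as t
join w on w.code = t.code;  -- every column reference is qualified, so dropping
                            -- or renaming a column makes the query ERROR
                            -- instead of silently binding to another relation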

David




Re: Transaction and SQL errors

2022-04-04 Thread David Rowley
On Mon, 4 Apr 2022 at 23:33, Sebastien Flaesch
 wrote:
> Any arguments I could give to programmers?   Maybe:
> "You better replay all SQL of the whole transaction... (and make them as 
> short as possible!)"
> ?

There are SAVEPOINTs [1].
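A rough sketch of how they let you recover from an error without
replaying the whole transaction (hypothetical table t):

begin;
insert into t values (1);
savepoint s1;
insert into t values (1);        -- suppose this one fails
rollback to savepoint s1;        -- only the failed statement is undone
insert into t values (2);
commit;                          -- rows 1 and 2 are committed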

David

[1] https://www.postgresql.org/docs/current/sql-savepoint.html




Re: Select .... where id not in (....) returns 0 incorrectly

2022-04-04 Thread David Rowley
On Tue, 5 Apr 2022 at 01:21, J. Roeleveld  wrote:
> Personally, I think NULL should be treated as a seperate value and not lead to
> strange behaviour.

I think the rationale behind IN and NOT IN is that c IN(1,2,3) is
the equivalent of writing: c = 1 OR c = 2 OR c = 3, whereas c NOT
IN(1,2,3) is the same as c <> 1 AND c <> 2 AND c <> 3.  You can
imagine what happens in the latter case if you replace the 3 with
NULL. "c <> NULL" is NULL, and since the quals are ANDed together,
the WHERE clause won't match anything.
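A minimal example of that:

select 1 where 1 not in (2, 3);     -- returns one row
select 1 where 1 not in (2, null);  -- "1 <> 2 AND 1 <> NULL" is NULL, no rows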

In any case, it's what the SQL standard says, so that's the way we do it.

David




Re: unoptimized nested loops

2022-06-01 Thread David Rowley
On Wed, 1 Jun 2022 at 08:04, Tim Kelly  wrote:
>   ->  Seq Scan on data  (cost=0.00..47132.93 rows=125 width=135)
> (actual time=0.542..182952.708 rows=1167810 loops=1)
> Filter: (data ~~ '%some text%'::text)

The problem is coming from the 125 row estimate in the above plan
fragment.  Because the number of estimated rows is low, the query
planner likely thinks a Nested Loop join is best.

What I'd do first is verify that some other join method is better by
running the query after having done:

SET enable_nestloop TO off;
-- run the query here, then restore the default:
RESET enable_nestloop;

If the query then runs faster, it's going to be worth doing
something about trying to improve those statistics.

I see the LIKE pattern-matching selectivity estimation code does look
at histogram buckets, so you might have luck if you increase the
statistics target on this column:

ALTER TABLE data ALTER COLUMN data SET STATISTICS 1000;
ANALYZE data;

The standard number of buckets is 100. The above will set it to 1000.
You can go as high as 10000, but going too high is going to slow down
the planner, so you should only go as high as you need to go.

David




Re: Unique index prohibits partial aggregates

2022-06-27 Thread David Rowley
On Mon, 27 Jun 2022 at 23:49, Bos, Fred  wrote:
> "Settings: effective_cache_size = '128GB', force_parallel_mode = 'on', 
> max_parallel_workers = '40',

You shouldn't be using force_parallel_mode. It does not do what you
think. See the documentation for that GUC, or read [1]

> I expected the query to become faster
> with a unique index or column, so why does the query planner decide on group
> aggregation instead of partial aggregation?

It simply does not know how many groups are likely to exist for
your expression.  Statistics are only gathered on bare columns. The
planner has no idea how many groups are likely to exist for
"t/(1000*3600*24)".

In PostgreSQL 14 and above you could create extended statistics for
the expression using:

create statistics t1_t_stats (ndistinct) on (t/(1000*3600*24)) from
bhload_nohyp_noin; -- pg14
analyze bhload_nohyp_noin;

For your version (pg13), you'd need to create an expression index instead:

create index on bhload_nohyp_noin using brin ((t/(1000*3600*24)));
analyze bhload_nohyp_noin;

I added "using brin" as a brin index is going to be a bit more
lightweight than a btree index.  You only need the index to instruct
ANALYZE to gather statistics. You might also want to ramp up the
pages_per_range setting.

With that, the planner might then realise that parallel aggregate
might be worthwhile.

David

[1] 
https://www.enterprisedb.com/postgres-tutorials/using-forceparallelmode-correctly-postgresql




Re: Unique index prohibits partial aggregates

2022-06-28 Thread David Rowley
On Wed, 29 Jun 2022 at 00:45, Bos, Fred  wrote:
> Finally, is there a way to force postgres to do the partial hash aggregate,
> either by changing a setting or by influencing the expected amount of output
> groups for each query?

You could do something like:

ALTER TABLE bhload_nohyp_noin ALTER COLUMN t SET (n_distinct = 200);
ANALYZE bhload_nohyp_noin;

Please be aware that this may have detrimental effects if you do any
joins or group bys directly on this column.  Otherwise, providing you
don't have a unique index on that column, then it should trick the
planner into thinking there will be fewer groups than it currently
thinks there will be, which will likely result in the parallel plan
that you desire.

David




Re: Multiple Indexes

2022-07-06 Thread David Rowley
On Thu, 7 Jul 2022 at 04:07, DAVID ROTH  wrote:
> I understand the planner can use multiple indexes to get the best plan.
> Can someone point me a paper that explains how this works.

I don't know of a paper, but if you're talking about using multiple
indexes to scan a single relation in order to satisfy a condition such
as WHERE a = 5 OR b = 12, then the query planner is able to make use
of "bitmap index scans".  A bitmap index scan simply scans an index
type which supports such scans and collects a set of ctids.  For this
example, providing there's a suitable index on the "a" column and
another on the "b" column, the planner may choose to perform a bitmap
index scan on the "a" index and another on the "b" index, then perform
a bitmap OR operation to combine the two sets of matching ctids.  The
heap of the table can then be scanned to fetch the matching tuples.

A ctid is the physical (or rather, physiological) address of a tuple
in the heap of a table.
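A small sketch of what such a plan looks like (hypothetical table;
with only a handful of rows the planner would just use a Seq Scan
instead):

create table t (a int, b int);
create index on t (a);
create index on t (b);
explain select * from t where a = 5 or b = 12;
-- with enough rows you'd typically see:
--   Bitmap Heap Scan on t
--     -> BitmapOr
--          -> Bitmap Index Scan on t_a_idx
--          -> Bitmap Index Scan on t_b_idx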

David




Re: limits, indexes, views and query planner

2022-08-15 Thread David Rowley
On Mon, 15 Aug 2022 at 20:30, Marc Mamin  wrote:
> in the example below, we can see that the view test_ab prevents the usage of 
> the index to retrieve the top last rows.
> This is a special case, as the where clause excludes data from the second 
> table, and the explain output do not references it at all.
> I wonder if the planner could be able to exclude the table_b earlier in its 
> plan and to possibly fallback to a plan equivalent to the first one.
> with a view on a single table (test_av), the index is used.

I think you might be confusing UNION and UNION ALL.  PostgreSQL is
under no obligation to uniquify the final result in your version of
the view with a single table, but it is when you UNION both tables
together, regardless of whether one of the scans causes one of the
union branches to be removed or not.

Having said that, the planner still could do a better job of this as
it could have done Limit -> Unique -> Merge Append -> Index Scan.  I
do have some work-in-progress code to rewrite the union planner so
that it is able to request pre-sorted input to allow Merge Append to be
used rather than Unique -> Sort. I'm not sure if it'll help this case
or not.

David




Re: ***SPAM*** Re: Can I get the number of results plus the results with a single query?

2022-08-16 Thread David Rowley
On Tue, 16 Aug 2022 at 21:15, Walter Dörwald  wrote:
> select count(*) over (), e.* from email.email e;

Depending on the complexity of the query, putting the count(*) as a
subquery in the SELECT clause might execute more quickly, i.e. select
(select count(*) from email.email) c, * from email.email; A
non-correlated subquery will become an "initplan", which will execute
only once.

The reason the window aggregate might be slower is that internally
PostgreSQL will store the tuples in the window frame in a
tuplestore.  In this example, all rows will be put in that tuple store
at once. These tuple stores will spill to disk when they exceed
work_mem.  On the other hand, the window aggregate version could be
faster if fetching each tuple from the outer query was sufficiently
more expensive than storing it in a tuple store. That could easily
happen if the query contained expensive joins or filtered out many
rows, since the subquery version would have to do that work twice
whereas the window agg version only has to do it once.
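Spelled out, the initplan form would be something like:

select (select count(*) from email.email) as total_count, e.*
from email.email as e;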

David




Re: Different execution plan between PostgreSQL 8.2 and 12.5

2022-08-17 Thread David Rowley
On Thu, 18 Aug 2022 at 15:32, Tom Lane  wrote:
> The 12.5 plan looks like it thinks that the join condition is not
> hashable --- and probably not mergeable as well, else it would have
> done a mergejoin.  This is odd if we assume that the lower()
> outputs are just text.  But you haven't said anything about the
> data types involved, nor what locale setting you're using, nor
> what nondefault settings or extensions you might be using, so
> speculation about the cause would just be speculation.

In addition to that, I couldn't help noticing that the quoted SQL does
not seem to belong to the EXPLAIN.  The EXPLAIN has a Limit node, but
the query does not. I'm assuming this isn't due to the relations being
views, since we don't pull up subqueries with a LIMIT.

The costs for the 12.5 plan are cheaper than 8.4's, so I imagine the
more likely cause is the planner favouring an early-startup plan.

It's probably more likely that lower() is providing the planner with
bad estimates, and there are likely far fewer rows than expected,
resulting in the LIMIT 1 being a much larger proportion of the
total rows than the planner expects.

David




Re: Restriction on table partition expressions

2022-08-25 Thread David Rowley
On Fri, 26 Aug 2022 at 03:08, James Vanns  wrote:
> CREATE TABLE foobar(
> id BIGINT NOT NULL PRIMARY KEY,
> baz VARCHAR NULL DEFAULT NULL
> ) PARTITION BY HASH(my_func(id));
>
> Error: primary key constraints cannot be used when partition keys
> include expressions.

> I couldn't find much, if anything, about using expressions in table
> partitions let alone describing the restriction. Can anyone enlighten
> me? Or point me to what I've missed! Also, is there a chance that this
> limitation will be relaxed in the future?

This is mentioned in [1]:

"Unique constraints (and hence primary keys) on partitioned tables
must include all the partition key columns. This limitation exists
because the individual indexes making up the constraint can only
directly enforce uniqueness within their own partitions; therefore,
the partition structure itself must guarantee that there are not
duplicates in different partitions."

Maybe that could be made clearer and mention that all the partition
key columns must be present in the primary key and must not be part of
a function call or expression.
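For what it's worth, partitioning on the bare column keeps the primary
key legal (a sketch based on your example):

create table foobar (
    id  bigint not null primary key,
    baz varchar null default null
) partition by hash (id);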

David

[1] https://www.postgresql.org/docs/13/ddl-partitioning.html




Re: Restriction on table partition expressions

2022-08-25 Thread David Rowley
On Fri, 26 Aug 2022 at 03:08, James Vanns  wrote:
> Also, is there a chance that this
> limitation will be relaxed in the future?

(forgot to answer this part)

Certainly not in the near future, I'm afraid.  It would require
allowing a single index to exist over multiple tables. There have been
discussions about this in the past, and the general thought is that
if you have a single index over all partitions, then it massively
detracts from the advantages of partitioning.  With partitioning, you
can DETACH or DROP a partition and get rid of all the data quickly in
a single metadata operation.  If you have an index over all partitions
then that operation is no longer a metadata-only operation. It
suddenly needs to go and remove or invalidate all records pointing to
the partition you want to detach/drop.

David



