Re: Why is tuple_percent so low?
On 27 February 2018 at 18:03, Sam Saffron wrote: > 1. Where is all my missing space, is this in page alignment stuff and > per-page overhead? Yes, tuples are MAXALIGNed when copied onto the page. That appears to be 8 bytes on your system. You're just 2 bytes over the 8-byte boundary. You'll notice the table becoming more compact if you somehow could do away with the dsecs column. > 2. Is there any other schemes I can look at for storing this data to > have a more efficient yet easily queryable / updateable table. Hard to say without learning about the use case. Some sort of de-normalisation may help to allow fewer tuples, e.g. storing one column in an array, but may be more painful from a design point of view. It may also cause pain from a dead-tuple point of view if you have to UPDATE large tuples frequently. > Keep in mind these tables get huge and in many of our cases will span > 10-20GB just to store this information. I guess it's all relative, but that does not seem that large. Is this causing some sort of performance problem you're trying to solve? Or is this completely down to disk space consumption? If it's down to performance then you might be able to solve that problem with an index. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
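For illustration, the alignment padding can be seen by comparing two composite values that differ only in column order (the values are arbitrary and the exact sizes depend on the platform's MAXALIGN):

    SELECT pg_column_size(ROW(1::int2, 1::int8)) AS int2_first,
           pg_column_size(ROW(1::int8, 1::int2)) AS int8_first;

The int2-first value should come out several bytes larger, since padding has to be added before the 8-byte column.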
Re: index-only-scan when there is an index on all columns
On 28 February 2018 at 11:11, Andres Freund wrote: > I'm fairly certain that only vacuum and table rewrites like cluster sets > all-visible, I don't think the pages are set all visible again after a rewrite. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Is there a way to create a functional index that tables tableoid column as an arg?
On 11 March 2018 at 07:53, Ryan Murphy wrote: > I am using table inheritance and have e.g. the following tables: > > create table animal ( ... ); > create table dog ( ... ) inherits (animal); > create table cat ( ... ) inherits (animal); > create table person ( ... ) inherits (animal); > create table musician ( ... ) inherits (person); > create table politician ( ... ) inherits (person); > > Now I have a query that gets all the "animal"'s except for those that are > "person"'s. > > select * from only animal > > won't cut it, because it leaves out the dogs and cats. > > select *, tableoid::regclass relname from animal > where relname != 'person'::regclass > > also won't cut it because it leaves out the musicians and politicians. > > So I have created an immutable function is_a_kind_of(tbl regclass, > parent_tbl regclass) that returns true iff tbl is identical with, or > directly or indirectly inherits from, parent_tbl. For example: > > is_a_kind_of('person','person') => true > is_a_kind_of('person','animal') => true > is_a_kind_of('musician','person') => true > is_a_kind_of('animal','person') => false > is_a_kind_of('dog','person') => false > > No problems so far. Now my query works: > > select *,tableoid from "animal" > where not is_a_kind_of(tableoid::regclass::text, 'person') > > This query is somewhat slow though - I'd like to index the is_a_kind_of() > call. And Postgres supports functional indexes! So I try: > create index animal_is_person on animal ( > is_a_kind_of(tableoid::regclass, 'person') ); > > ERROR: index creation on system columns is not supported > > I see that this is because "tableoid" is a system column. Does anyone know > any workaround for this? So close yet so far away! I don't think having an index on the tableoid would help you here anyway. Previous versions did allow indexes on system columns, but that was disallowed recently due to the fact that the values of most system columns are changed internally by postgres and the indexes wouldn't correctly be updated. tableoid might be an exception to this, but it does not really seem like a useful column to index, given it would be indexing the same value for each record in the table. The reason it's slow is that PostgreSQL is executing the function once per input row. In this case, that must be done because you're passing in tableoid as a function parameter, and that could change from one row to the next (once the Append moves to the next subplan). What you'd want to do instead is create a set-returning function that finds all the inherited children and returns them all. This will allow PostgreSQL to execute the function just once at the start of the query, then join the results to this function. Something like: create or replace function get_inherited_tables (prelid oid) returns setof oid as $$ with recursive c as ( select inhrelid,inhparent from pg_inherits where inhparent = prelid union all select i.inhrelid,i.inhparent from pg_inherits i inner join c on i.inhparent = c.inhrelid ) select prelid union all select inhrelid from c; $$ language sql stable; then have your queries do: -- get all animals that are persons select ... from animal where tableoid in (select get_inherited_tables('person'::regclass)); -- get all animals that are not persons select ... from animal where tableoid not in (select get_inherited_tables('person'::regclass)); Just be careful around search_paths and your use of regclass.
In this case, if "animal" was not in the first schema in search_path, but someone created another table called "person" that was in the first schema listed in search_path, then the query would not do what you want. You might want to consider prefixing the input parameter into get_inherited_tables with the schema name too. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: primary key and unique index
On 23 March 2018 at 20:55, Thomas Poty wrote: > In MySQL a "show create table" gives me : > ... > PRIMARY KEY (`ID`,`CountryCode`,`LanguageCode`), > UNIQUE KEY `unique_my table_4` (`ID`,`CountryCode`,`LanguageCode`), > ... > > So, In PostgreSQL, does it make sense to create a primary key AND a unique > index based on the same columns? > Is PostgreSQL smart enough to use the unique index created for the primary > key. Doing this makes no sense in PostgreSQL. I'm struggling to imagine why it would in MySQL. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
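For example, the primary key is already backed by a unique index, which pg_indexes shows (names here are only for illustration):

    CREATE TABLE t (id int, countrycode text, languagecode text,
                    PRIMARY KEY (id, countrycode, languagecode));

    SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 't';
    -- t_pkey | CREATE UNIQUE INDEX t_pkey ON t USING btree (id, countrycode, languagecode)

Adding another unique index on the same columns would just duplicate that work on every write.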
Re: pg_stat_statements: password in command is not obfuscated
On 24 March 2018 at 10:30, legrand legrand wrote: > It seems that passwords used in commands are not removed when caught by > pg_stat_statements > (they are not "normalized" being utility statements) > > exemple: > alter role tt with password '123'; > > select query from public.pg_stat_statements > where query like '%password%'; > > query > > alter role tt with password '123'; > > Do you think its a bug ? If it is, then it's not a bug in pg_stat_statements. log_statement = 'ddl' would have kept a record of the same thing. Perhaps the best fix would be a documentation improvement to mention the fact and that it's best not to use plain text passwords in CREATE/ALTER ROLE. Passwords can be md5 encrypted. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
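For example, instead of the plain-text form, the already-hashed form can be supplied; the hash is 'md5' followed by md5(password || role name), computed on the client side. The value below is a placeholder, not a real hash:

    ALTER ROLE tt WITH PASSWORD 'md5<32-character-hex-hash>';

That way only the hash, never the plain password, appears in the logs or in pg_stat_statements.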
Re: Bad Query Plans on 10.3 vs 9.6
On 29 March 2018 at 18:26, Cory Tucker wrote: > Hello all. I'm migrating a database from PG 9.6 to 10.3 and have noticed a > particular query that is performing very badly compared to its 9.6 > counterpart. > > The plan on 9.6 v 10.3 are effectively identical except in 9.6 the planner > decides to use an index only scan on the primary key and in 10.3 it does a > sequential scan. The problem is the sequential scan is for a table of 75M > rows and 25 columns so its quiet a lot of pages it has to traverse. How certain are you that all the indexes match on each instance? It would be useful to see psql's \d output for each table in question. Another option for you to consider would be to get rid of the OR clause in the query entirely and have two separate CTEs and INSERT INTO your orphaned_matches table having SELECTed from both CTEs with a UNION ALL. A record already deleted won't appear in the 2nd branch of the UNION ALL result. However, that still might not fix your issue with the index not being used, but you may find the anti-joined version of the query is faster anyway. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Bad Query Plans on 10.3 vs 9.6
On 30 March 2018 at 03:21, Cory Tucker wrote: >> Another possibility is that 10.3 sees the index-only scan as too expensive >> because it thinks most of the table isn't all-visible. Comparing >> pg_class.relallvisible values might be informative. > I'm happy to try to dig into this one more, however, I'm not familiar with > this value. What should I be looking for here? Each table in your database has an entry in the pg_class table. Something like: SELECT relallvisible from pg_class where oid = 'build.household'::regclass; would show you the value, however, I think a problem here is unlikely since that would just control the likelihood of an index-only-scan vs an index-scan. You're getting a Seq-scan, which I imagine is going to be quite a bit more expensive than even an index scan. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Inconsistent compilation error
On 19 April 2018 at 13:28, Melvin Davidson wrote: > It is not fine. You have specifed TWO percent signs (%) which requires TWO > argumenrts, > but you have only provided ONE -> r. I'm confused about this statement. Did you perhaps overlook the fact that there are in fact two arguments? I'm unsure why you're considering 'blah_history.original_id' not to be an argument here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: What is the C function to modify attribute
On 17 May 2018 at 15:36, a <372660...@qq.com> wrote: > I am writing a trigger that will record all adjustment of the row into the > last column of the row. I have done a iterator to produce text record and > the last step would be add the text record into the last column (which > designed as an array of user defined type, with the output function display > the abstraction of update and a user defined function is used to retrieve > detail update info). > > Now I would like to find a way that will allow me to update the last column > in C efficiently. > > Can anyone help? Thank you so much!! You might find https://www.postgresql.org/docs/10/static/triggers.html useful. There's a complete example of a trigger function written in C there. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: limit and query planner
On 6 June 2018 at 07:17, armand pirvu wrote: > So since > select count(*) from sp_i2birst_reg_staging_test; > count > --- > 6860 > and > select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17' > and status=0 ; > count > --- > 4239 > > That means to me I fetch almost the whole table and then I fall in the case > you described seqscan-and-sort wins over indexscan . The planner simply assumes that 1 in (6860.0 / 4239.0) rows matches your WHERE clause. Since you want 10 rows from the LIMIT, it thinks it'll just need to read 17 rows from the heap to answer your query. > explain analyze select * from sp_i2birst_reg_staging_test where evt_id = > 'ACSF17' > and status=1; > QUERY PLAN > Index Scan using fooidx on sp_i2birst_reg_staging_test (cost=0.28..202.91 > rows=500 width=519) (actual time=0.097..0.527 rows=500 loops=1) >Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1)) > Planning time: 1.024 ms > Execution time: 0.766 ms > this gets 500 rows out of 6860 > > explain analyze select * from sp_i2birst_reg_staging_test where evt_id = > 'ACSF17' > and status=1 limit 10; > >QUERY PLAN > Limit (cost=0.28..4.33 rows=10 width=519) (actual time=0.073..0.105 rows=10 > loops=1) >-> Index Scan using fooidx on sp_i2birst_reg_staging_test > (cost=0.28..202.91 rows=500 width=519) (actual time=0.072..0.101 rows=10 > loops=1) > Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1)) > Planning time: 0.280 ms > Execution time: 0.173 ms > > Back to the original > > explain analyze select * from sp_i2birst_reg_staging_test where evt_id = > 'ACSF17' > and status=0 limit 1 ; > > QUERY PLAN > Limit (cost=0.00..0.13 rows=1 width=519) (actual time=0.021..0.021 rows=1 > loops=1) >-> Seq Scan on sp_i2birst_reg_staging_test (cost=0.00..548.40 rows=4239 > width=519) (actual time=0.019..0.019 rows=1 loops=1) > Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0)) > Rows Removed by Filter: 1 > Planning time: 0.286 ms > Execution time: 0.110 ms For the more restrictive status, the planner thinks more rows will need to be looked at, which increases the cost of the seqscan, and the planner favours the index scan. You can see the planner estimates 500 rows will match the status=1 query. So thinks ceil(1 * (6860 / 500.0) * 10) = 138 rows will need looked at in the seqscan plan. That's obviously more costly than 17 rows. So the index scan begins to look more favourable. The planner always assumes the rows are evenly distributed, which is not always the case. If your ten rows were at the end of the heap, then the seqscan would be a bad plan. In this case "Rows Removed by Filter" would be high. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: help understanding create statistic
On 28 June 2018 at 21:38, Luca Ferrari wrote: > CREATE INDEX idx_year > ON expenses ( EXTRACT( year FROM day ) ); > > why is the planner not choosing to use such index on a 'year' raw query? > > EXPLAIN SELECT * FROM expenses > WHERE year = 2016; The expression in the where clause must match the indexed expression. You'd need to add an index on just (year) for that to work. > Am I misunderstaing this functional dependency? Yeah, the statistics are just there to drive the planner's costing. They won't serve as proof for anything else. All you've done by creating those stats is to allow better estimates for queries such as: SELECT * FROM expenses WHERE day = '2018-06-28' and year = 2018; > stxdependencies | {"3 => 5": 1.00} It would appear that "3" is the attnum for day and "5" is year. All that tells the planner is that, in the records sampled during analyze, each "day" had exactly 1 year. There's nothing then to stop you going and adding a record with the day '2017-01-01' and the year 2018. The stats will remain the same until you analyze the table again. If those stats didn't exist, the planner would have multiplied the selectivity estimates of each item in the WHERE clause individually. So if about 10% of records had year=2018, and 0.1% had '2018-06-28', then the selectivity would have been 0.1 * 0.001. With a functional dependency of 1, the selectivity just becomes 0.001. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
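For example, writing the query with the same expression the index was built on should allow idx_year to be considered:

    EXPLAIN SELECT * FROM expenses
    WHERE EXTRACT(year FROM day) = 2016;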
Re: Convert Existing Table to a Partition Table in PG10
On 1 July 2018 at 10:15, Clifford Snow wrote: > I also leaned that my range partition value I used on a timestamp needed to > have fractional seconds. I used a range of 2017-01-01 00:00:00 to > 2017-23:59:59 which failed when I attempted to add a record that had a > timestamp of 2017-23:59:59. Adding a fractional second to the range solved > the problem. Please be aware that with RANGE partitions the upper bound is non-inclusive. The lower bound is inclusive. If you want a 2017 partition, then FOR VALUES FROM ('2017-01-01') TO ('2018-01-01') will allow all 2017 timestamps and only 2017 timestamps. You've no need to consider precision of the type and how many 9's you add to anything here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
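For example, assuming a parent table partitioned by RANGE on a timestamp column (table names here are hypothetical), a whole-of-2017 partition looks like:

    CREATE TABLE mytable_2017 PARTITION OF mytable
        FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

A row timestamped exactly 2018-01-01 00:00:00 goes to the next partition, so there is no gap and no overlap between adjacent partitions.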
Re: FK v.s unique indexes
On 3 July 2018 at 19:30, Rafal Pietrak wrote: > tst=# create table test1(load bigint, a int, b int, c bool) partition by > list (c); > CREATE TABLE > tst=# create table test1_true partition of test1 for values in (true); > CREATE TABLE > tst=# create table test1_false partition of test1 for values in (false); > CREATE TABLE > tst=# create unique index load ON test1 (load,a,b,c); > CREATE INDEX > tst=# create table info_text1 (load text, a int, b int, c bool, info > text, foreign key (load,a,b,c) references test1(load,a,b,c)) ; > ERROR: cannot reference partitioned table "test1" > -- > > Why is this forbidden? I don't think there were any actual roadblocks, it was more a case of not enough time in the cycle to make it work due to a few technical details that required extra effort. Alvaro managed to simplify the problem and allow foreign keys to be defined on partitioned tables and get that into PG11. So it was a case of 50% is better than 0%, which I very much agree with. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_dump out of memory
On 4 July 2018 at 14:43, Andy Colson wrote: > I moved a physical box to a VM, and set its memory to 1Gig. Everything > runs fine except one backup: > > > /pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire > > pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed. > pg_dump: Error message from server: ERROR: out of memory > DETAIL: Failed on request of size 1073741823. > pg_dump: The command was: > COPY public.ofrrds (id, updateddate, bytes) TO > stdout; There will be less memory pressure on the server if the pg_dump was performed from another host. When running pg_dump locally the 290MB bytea value will be allocated in both the backend process pg_dump is using and pg_dump itself. Running the backup remotely won't require the latter to be allocated on the server. > I've been reducing my memory settings: > > maintenance_work_mem = 80MB > work_mem = 5MB > shared_buffers = 200MB You may also get it to work by reducing shared_buffers further. work_mem won't have any effect, neither will maintenance_work_mem. Failing that, the suggestions of more RAM and/or swap look good. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Optimizing execution of expensive subqueries
On 12 July 2018 at 02:24, Mathieu Fenniak wrote: > I'm currently looking at a query that is generally selecting a bunch of > simple columns from a table, and also performing some subqueries to > aggregate related data, and then sorting by one of the simple columns and > paginating the result. > > eg. > > SELECT > tbl.field1, tbl.field2, tbl.field3, ..., > (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = tbl.UserId > AND anothertbl.ThingyId = 1) as Thingy1Sum, > ... repeat for multiply thingies ... > FROM > tbl > ORDER BY tbl.field1 LIMIT 20 > > I'm finding that if "tbl" contains hundreds of thousands of rows, the > subqueries are being executed hundreds of thousands of times. Because of > the sorting and pagination, this is appears to be unnecessary, and the > result is slow performance. (PostgreSQL 9.5.9 server) You've got two choices. 1) You can add a btree index on field1 so that the executor does not need to examine all records before taking the top-20, or; 2) move the subquery out of the target list and instead make it a LEFT JOIN adding an appropriate GROUP BY clause. #2 might not be a great option since it may require building groups that don't get used, but it would likely be the best option if you didn't have a LIMIT clause, or the LIMIT was a larger percentage of the total records. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
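For example, option #2 might look something like the following (column names are taken from the query above, so treat it as an untested sketch):

    SELECT tbl.field1, tbl.field2, tbl.field3,
           t1.thingy1sum AS Thingy1Sum
    FROM tbl
    LEFT JOIN (SELECT UserId, SUM(Duration) AS thingy1sum
               FROM anothertbl
               WHERE ThingyId = 1
               GROUP BY UserId) t1 ON t1.UserId = tbl.UserId
    ORDER BY tbl.field1
    LIMIT 20;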
Re: Finding out why parallel queries not avoided
On 21 July 2018 at 20:15, Didier Carlier wrote: > explain select count(*) from calendar c1, calendar c2, measure m where > c1.stddate='2015-01-01' and c2.stddate='2015-12-31' and m.fromdateid > >=c1.calendarid and m.fromdateid < c2.calendarid; > QUERY PLAN > -- > Aggregate (cost=5073362.73..5073362.74 rows=1 width=8) >-> Nested Loop (cost=8718.47..4988195.81 rows=34066770 width=0) > -> Index Scan using calendar_stddate_unique on calendar c2 > (cost=0.28..2.30 rows=1 width=4) >Index Cond: (stddate = '2015-12-31 00:00:00+01'::timestamp > with time zone) > -> Nested Loop (cost=8718.19..4647525.81 rows=34066770 width=4) >-> Index Scan using calendar_stddate_unique on calendar c1 > (cost=0.28..2.30 rows=1 width=4) > Index Cond: (stddate = '2015-01-01 > 00:00:00+01'::timestamp with time zone) >-> Bitmap Heap Scan on measure m (cost=8717.91..4306855.81 > rows=34066770 width=4) > Recheck Cond: ((fromdateid >= c1.calendarid) AND > (fromdateid < c2.calendarid)) > -> Bitmap Index Scan on idx_measure_fromdate > (cost=0.00..201.22 rows=34072527 width=0) >Index Cond: ((fromdateid >= c1.calendarid) AND > (fromdateid < c2.calendarid)) > > Both queries return the same answers but I don't see why the second one > doesn't use parallel query. You'd likely be better of writing the query as: select count(*) from measure where fromdateid >= (select calendarid from calendar where stddate = '2015-01-01') and fromdateid < (select calendarid from calendar where stddate = '2015-12-31'); The reason you get the poor nested loop plan is that nested loop is the only join method that supports non-equijoin. Unsure why you didn't get a parallel plan. Parallel in pg10 supports a few more plan shapes than 9.6 did. Unsure what version you're using. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Restore relhaspkey in PostgreSQL Version 11 Beta
On 31 July 2018 at 11:11, Melvin Davidson wrote: >> If you want stability use information_schema which we'll try very hard to >> not ever break. > Of course. Would you be so kind as to point out where in the > information_schema it > indicates if a table has a primary key or not. Oh wait, now I remember...no > place. With all due respect Sir, you're making a fool of yourself here. I'd suggest that before you debate or argue with people that you ensure that you're correct. This can often be hard to do on the spot, but excuses dwindle a bit more when the communication is asynchronous via email. It's not that difficult to find information_schema.table_constraints and see that constraint_type has "PRIMARY KEY" >>Keeping random atavistic things around, would slow us down, which will be a >>price everybody is paying. > Random atavistic things? I hardly think relhaspkey is random. It's been > there since version 7.2. > Exactly how does keeping it around slow you/us down? Well, it's been known that some people misuse relhaspkey. For example, in [1], someone is recommending to use relhaspkey to check for tables which don't have a PRIMARY KEY constraint. This was the wrong advice as the flag could remain set after the primary key was dropped from the table and before any vacuum took place on the table. The correct advice should have been the same as what Tom mentioned above, by checking for a pg_index record for the table with indisprimary as true. Alternatively, another useful response would have been to check information_schema, which would have provided an SQL standard way to check. Now, in regards to [1]. I rather hope that you can sympathize with the decision to remove the column a little as the person who made the incorrect recommendation in [1] was none other than you yourself. So it seems that you've only assisted in contributing to the columns removal by not only misusing it yourself but also instructing others, publically to do the same. Now, in regards to your general tone here. It appears you're under the assumption that the column was removed for some malicious reason in order to break people's scripts, but I can assure you, there was no malicious intent involved. However, the column *was* removed exactly in order to break queries. The reason being it was most likely the queries were already broken and we deemed the problem big enough to remove the misleading column in order to let people know their queries were broken. Your argument to put the column back carries very little weight, as it appears your script is trying to determine which tables have no primary key incorrectly. So I'd recommend that, instead of expending some keystrokes in replying to this email, that instead, you spend them fixing your broken code. Tom has kindly given you a very good starting point too. Personally, if I had been using a query like yours, I'd be thanking Peter for highlighting it was broken for me. If you'd like something else to read, please also look at [2]. I imagine this is the sort of thing that Andres is talking about. [1] https://www.postgresql.org/message-id/CANu8FiyQsQg7bF3FPT+FU=kK=wjhfewpp+6qe9fxf6yxr+w...@mail.gmail.com [2] https://www.postgresql.org/message-id/flat/20140317185255.20724.49675%40wrigleys.postgresql.org -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
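For what it's worth, a correct way to find tables with no PRIMARY KEY is along these lines (a sketch; system catalogs are not filtered out here):

    SELECT c.oid::regclass AS table_name
    FROM pg_class c
    WHERE c.relkind = 'r'
      AND NOT EXISTS (SELECT 1 FROM pg_index i
                      WHERE i.indrelid = c.oid AND i.indisprimary);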
Re: Template0 datfrozenxid age is 160million and progressing
On 2 August 2018 at 02:31, Rijo Roy wrote: > I agree.. But unfortunately it is the business call and we cannot alter it.. > I am planning to convince them by keeping autovacuum_analyze_threshold to a > high value so that auto analyse will not kick in very often leaving the > autovacuum to do its job.. > Please advise. If autovacuum is disabled, it'll still kick in for any anti-wraparound work that needs to be performed. This is also mentioned in the docs: "Note that even when this parameter is disabled, the system will launch autovacuum processes if necessary to prevent transaction ID wraparound. See Section 24.1.5 for more information." https://www.postgresql.org/docs/10/static/runtime-config-autovacuum.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
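For example, the per-database ages can be checked with:

    SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;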
Re: Weird behaviour of the planner
On 2 August 2018 at 02:48, Guillaume Lelarge wrote: > EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2; > >QUERY PLAN > - > Unique (cost=12005.97..12049.20 rows=1 width=1430) (actual > time=20055.294..20323.348 rows=1 loops=1) >-> Sort (cost=12005.97..12006.30 rows=132 width=1430) (actual > time=20055.290..20105.738 rows=6 loops=1) > Sort Key: (... 130 columns ...) > Sort Method: external sort Disk: 84464kB > -> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430) > (actual time=0.109..114.142 rows=6 loops=1) > Planning time: 10.012 ms > Execution time: 20337.854 ms > (7 rows) > > That looks quite good. The total cost is 12049, so I expect this plan to > have the smaller cost as it's the choosen plan. Now, I'm disabling Sort, and > here is what I get: > > SET enable_sort TO off; > EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2; > >QUERY PLAN > - > HashAggregate (cost=12044.22..12044.23 rows=1 width=1430) (actual > time=508.342..508.343 rows=1 loops=1) >Group Key: (... 130 columns ...) >-> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430) (actual > time=0.036..57.088 rows=6 loops=1) > Planning time: 1.335 ms > Execution time: 508.529 ms > (5 rows) When the planner adds a new path it compares the cost not exactly, but with a 'fuzz factor'. It's very likely that the hashagg path did not make it as it was not fuzzily any cheaper than the unique path. By default, this fuzz factor is 1%. It seems in your case the costs don't quite match reality which is quite likely due to the poor row estimates on "gleu2". Has that table been analyzed recently? or is there some reason that auto-vacuum is not getting to it? There's a bit more reading of what I'm talking about in https://github.com/postgres/postgres/blob/master/src/backend/optimizer/util/pathnode.c#L141 -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: ProjectSet
On 2 August 2018 at 21:17, Oliver Kohll wrote: > Is there an explanation of ProjectSet anywhere? Plan node types and what they each do are not very well documented outside of the source code. ProjectSet appears when the SELECT or ORDER BY clause of the query contains set-returning functions. They basically just execute the set-returning function(s) for each tuple until none of the functions return any more records. Simple Example: EXPLAIN SELECT generate_series(1,2); QUERY PLAN - ProjectSet (cost=0.00..5.02 rows=1000 width=4) -> Result (cost=0.00..0.01 rows=1 width=0) (2 rows) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: ProjectSet
On 3 August 2018 at 01:16, Oliver Kohll wrote: > Ah thanks David, at least I know what it is now. I don't think I have any > set returning functions though will double check in case there's a joined > view that has one. Perhaps it could be the something to do with cross > product which similarly creates multiple rows on the right for each row on > the left side. postgres=# select proretset from pg_proc where proname = 'regexp_matches'; proretset ------- t t (2 rows) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Will there ever be support for Row Level Security on Materialized Views?
On 28 August 2018 at 09:58, Ken Tanzer wrote: > On Tue, Aug 14, 2018 at 5:24 PM Ken Tanzer wrote: >> Basically, I'm wondering whether materialized views are likely to ever >> support row-level security. > > Hi. Just wanted to follow up on my own email and see if anyone could answer > this. You might think there's some master project-wide list of things to implement that we all work from, but there's not. There is a todo list [1] that might serve as some sort of guide for a new person who wishes to contribute something, who's looking for inspiration as to what to contribute, but I don't think that list ever goes much beyond that. New features normally appear for one of two reasons: 1) A developer gets inspired to make something happen; or 2) Someone pays a developer to make something happen. So I guess in this case either 1 or 2 has not happened, or they have but the correct people have either not seen your email or they just simply don't want to answer. If you're desperate for the feature, I'd recommend looking into the possibility of going with 1 or 2. [1] https://wiki.postgresql.org/wiki/Todo
Re: Erroneous behavior of primary key
On 28 August 2018 at 07:50, Daniel J Peacock wrote: > I've got an odd situation with a table that has a varchar(255) as the > primary key that is getting key values from an Elasticsearch engine. What > I'm finding is that even though there is a primary key on the table, I am > getting duplicated keys in the data and the constraint is not blocking > these. How do you know they're duplicated? > When I do a "select <keycol>,count(*) from <table> group by <keycol> > having count(*) > 1" I get no results. Yet, when I search the > table for a value that is like a key I know to be duplicated, I get multiple > results. When I select from the table where field is equal to the > duplicated field I get one result. I verified that they are distinct row > with ctid. I also created a clone of the table with CTAS and then tried to > create a unique index on the id varchar field but that failed with > "duplicate keys found". I'm stumped as to what could be the problem. > The only thing that I can think of is that the primary key is somehow > corrupt. I've noticed this behavior on other tables on this database. > What could be causing this sort of problem? If the index is corrupt then you might find that: set enable_indexscan = 0; set enable_indexonlyscan = 0; select <keycol>,count(*) from <table> group by <keycol> having count(*) > 1; would return some rows. You should also verify the above query does use a Seq Scan by performing an EXPLAIN on the query. There are issues that have been fixed in previous releases which could have caused an index to get corrupted in this way, so it's quite useful to know which version of PostgreSQL you're running here and if you've paid attention to the release notes when you've previously upgraded. For example, see [1]. [1] https://www.postgresql.org/docs/9.6/static/release-9-6-2.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Will there ever be support for Row Level Security on Materialized Views?
On 28 August 2018 at 11:06, Ken Tanzer wrote: > Or perhaps they "simply don't want to answer." Well I wondered about that > too, since the last person who asked about this didn't get a response either. > Is it a particularly stupid question? Is there some history that makes it a > touchy subject? Or is it just the Bermuda Triangle of this mailing list? :) I can't imagine why it would be a touchy subject. Generally, if you're working on a feature it's best not to keep it a secret, as if someone else does the same, then you end up with redundant work being done. If I had to guess what's going on here then I'd say that nobody has been sufficiently motivated to work on this yet. If that's the case, everyone who reads your email is not the person working on this feature, so can't answer your question. I just answered to suggest the reasons why you might not be getting an answer. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Postgres 11, partitioning with a custom hash function
On 4 October 2018 at 16:22, Harry B wrote: > I am still having trouble reconciling what happens under the HASH > partitioning!. If I have text column forming the basis of PARTITIONED BY > HASH, the HASH value used in the partitioning setup does not seem to match > to `hashtext()` of that value It won't match. The hash partition hash is seeded with a special const (HASH_PARTITION_SEED) see [1]. You could likely roll your own hash ops. See [2] for an example. This can then be used to create a hash partitioned table like [3]. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/partitioning/partbounds.c#l2056 [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/regress/sql/insert.sql#l241 [3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/regress/sql/hash_part.sql#l10 -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
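A rough sketch of such an opclass, modelled on the regression tests in [2] and [3] (the function here is a dummy for illustration only, not a hash you would want to use in practice):

    CREATE FUNCTION custom_hash_int4(a int4, seed int8) RETURNS int8 AS
    $$ SELECT (a + seed)::int8 $$ LANGUAGE sql IMMUTABLE;

    CREATE OPERATOR CLASS custom_int4_hash_ops FOR TYPE int4 USING hash AS
        OPERATOR 1 = , FUNCTION 2 custom_hash_int4(int4, int8);

    CREATE TABLE hp (a int, b text) PARTITION BY HASH (a custom_int4_hash_ops);

The two-argument "FUNCTION 2" support function is the 64-bit seeded hash that hash partitioning calls.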
Re: Postgres 11, partitioning with a custom hash function
On 5 October 2018 at 06:18, Harry B wrote: > > Thank you David! These helped me create an operator class. > However, there still seems to be a 'off-by-a-fixed-N' difference between the > hash value returned and how PG selects the partition. Hmm, actually, this is probably due to the hash_combine64() call in compute_partition_hash_value(). This seems to combine the hash value with 0 regardless of whether there's another partition key column to hash. If you could somehow do the reverse of what hash_combine64() will do to your hash before returning it to the function then you might get somewhere, but that does not look possible since it appears to be throwing away some bits. It may not have been a great choice to decide to have compute_partition_hash_value() do this unnecessary combine, but it's likely a few months too late to change that now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Postgres 11, partitioning with a custom hash function
On 5 October 2018 at 09:43, Harry B wrote: > Now the big question: How scared should I be relying on this? I don't mind > it breaking on major version upgrades (which would mean I need to dump & > restore my entire set), but how likely is it to change unannounced in a > minor/security release? Unless of course, you break it in a way that makes > custom-hash function impossible. I don't see how we could possibly change it once v11 is out the door. Such a change would break pg_upgrade and I imagine we want that to work for a long time to come yet, at least until there is some other reason that is worthy of breaking it. The bar is likely set pretty high for that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Help with list partitioning on expression
On 19 October 2018 at 02:49, Dinko Papak wrote: > I have created table partitioned by list on expression using timestamp > column, so for each timestamp value function returns int and each partition > table is for single value (range of timestamp for which function result is > the same). This helps me to simplify querying as I do not need to always > write date ranges. Querying (partition pruning) works nice, however when I > attach new partition it seems to always scan whole table, although I do have > necessary check constraint on partitioned table. I have tried to make > timestamp column both null and not null. Also, it takes longer to attach > partition then to add constraint itself although per my understanding those > 2 operations should do the same scan. It's not all that obvious, but if you have PARTITION BY LIST (extract(minute FROM ts)) and try to attach a partition like: CREATE TABLE mypartition PARTITION OF parted FOR VALUES IN (1); then the partition constraint is actually (extract(minute FROM ts) IS NOT NULL AND extract(minute FROM ts) = 1). If your CHECK constraint is just checking `extract(minute FROM ts) = 1` then the current code in PartConstraintImpliedByRelConstraint() is not smart enough to know that `extract(minute FROM ts) = 1` is strict and cannot match nulls. Perhaps that could be improved, but that's how it is today. Likely you'll have better luck with a check constraint that explicitly checks the function IS NOT NULL. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
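For example, adding a constraint that spells out the IS NOT NULL part should let ATTACH PARTITION skip the full-table scan (names follow the example above):

    ALTER TABLE mypartition
        ADD CONSTRAINT mypartition_minute_check
        CHECK (extract(minute FROM ts) IS NOT NULL AND extract(minute FROM ts) = 1);

    ALTER TABLE parted ATTACH PARTITION mypartition FOR VALUES IN (1);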
Re: Aggregated join vs. aggregate in column?
On Fri, 12 Jul 2019 at 19:32, Durumdara wrote: > 2.) I may relocate this section as join... > > select > request.*, s.max_s_date > from request > left join > ( > select schedule.product_id, max(s_date) as max_s_date from schedule > where schedule.ok = True > group by schedule.product_id > ) s on (s.product_id = request.product_id) > ... > > But I really don't know what the hell will happen in query optimizer with > this method. > > a.) > Optimizer is clever, and it calculates the aggregates only in the needed rows. > So it find the request.product_id-s, and execute the "s" query only in these > rows. > > b.) > Or it isn't enough wise, it executes the "s" subquery on whole schedule, and > later joins to main table. > The schedule table is big in customers' database, so this is worst case. :-( I'm afraid for the particular query above, the answer is closer to b) However, that's only going to be a problem if there are many more distinct product_id records in "schedule". If you were to add a WHERE clause to the outer query that did WHERE request.product_id = X, then that qual would be pushed down into the subquery. This qual pushing only works for equality. So if you changed out WHERE request.product_id = X to WHERE request.product_id IN(X,Y); then that wouldn't push the qual to the subquery. > I asked this because sometimes I need to get more result columns in the > select, but I couldn't retreive more... > > ( > select max(s_date) as s_date from schedule > where schedule.product_id = request.product_id and schedule.ok = True > ) as max_s_date, <=== only one column > > So sometimes the join is better. > > But if the optimizer isn't enough wise, I can get these values only "WITH" > queries (select the main rows to temp, run subselects with only these > records, return the mix of main and subselects in one query). If you don't want to repeat the same subquery in the SELECT list then you could perform a CROSS JOIN LATERAL. For example: select request.*, s.max_s_date, s.max_s1_date from request cross join lateral ( select max(s_date) as max_s_date, max(s1_date) as max_s1_date from schedule where schedule.ok = True and schedule.product_id = request.product_id) s; In this case, the subquery will be executed once per output row, so if you have some restrictive WHERE clause on the outer query then the subquery will be executed fewer times. With a bit of training, you should be able to see what the query planner has done for yourself by using the EXPLAIN command: https://www.postgresql.org/docs/current/sql-explain.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: lead() with arrays - strange behaviour
On Thu, 8 Aug 2019 at 21:06, Thomas Kellerer wrote: > The following statement tries to find the overlapping values in id_list > between the current row and the next row: > > select id, >id_list, >lead(id_list) over (order by id) as next_list, >array(select unnest(id_list) intersect select unnest(lead(id_list) > over (order by id))) as common_ids > from sample_data; > > The above returns: > > id | id_list | next_list | common_ids > ---+-+---+--- > 1 | {1,2,3} | {2,3,4} | {} > 2 | {2,3,4} | {4,5,6} | {} > 3 | {4,5,6} | | {} > > The empty array for "common_ids" is obviously incorrect. I think you're confused with what the SELECT with the empty FROM clause does here. In your subquery "id_list" is just a parameter from the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to return anything since those are both just effectively scalar values, to which there is no "next" value. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
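To get the intended per-row overlap, the lead() can be computed in a subquery first and the intersection done in the outer level, e.g.:

    SELECT id, id_list, next_list,
           ARRAY(SELECT unnest(id_list) INTERSECT SELECT unnest(next_list)) AS common_ids
    FROM (SELECT id, id_list,
                 lead(id_list) OVER (ORDER BY id) AS next_list
          FROM sample_data) t;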
Re: FW: Undelivered Mail Returned to Sender
On Sun, 11 Aug 2019 at 06:53, stan wrote: > > I apologize for asking, what I suspect will turn out to be a newbie > question, but I have managed to get myself quite confused on this. > > I am defining a view as follows > > > CREATE OR REPLACE view purchase_view as > select > project.proj_no , > qty , > mfg_part.mfg_part_no , > mfg.name as m_name , > mfg_part.descrip as description , > ( > SELECT > name > FROM > vendor > WHERE > bom_item.vendor_key = > ( > SELECT > vendor_key > FROM > mfg_vendor_relationship > WHERE > bom_item.mfg_key = mfg_key > AND > prefered = TRUE > AND > bom_item.project_key = project_key > > ) > ) > as v_name , > ERROR: more than one row returned by a subquery used as an expression > > Can someone please enlighten me as to the error of my ways? Looks to me like your WHERE clause is wrong in the subquery. "WHERE bom_item.vendor_key =" surely that should be just "WHERE vendor_key = " (assuming that's the primary key column of the vendor table). Also, you've mentioned you've only a single record in the mfg_vendor_relationship, so the error can't be due to multiple records matching in the mfg_vendor_relationship table. However, given the unique constraint on that table includes 3 columns and you're just filtering on 2 of them, then it would only take some rows in there with the same mfg_key and project_key values but a different vendor_key to get the same error from that part of the query. If that shouldn't happen, then perhaps your UNIQUE constraint should not contain the vendor_key column. You'd have to explain what you need in more detail for someone to be able to help you fix that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: How to make runtime partition pruning work?
On Fri, 11 Oct 2019 at 21:49, Markus Heiden wrote: > > I partitioned a table "data_table" by the key "import_id" to reduce the > number of partitions to be loaded in my queries. > I used list partitions, each containing usually just one "import_id". I > used a primary key (id, import_id) > But PostgreSQL does not consider partition keys to avoid loading not > needed partitions. > > My query: > SELECT SUM(something) FROM data_table WHERE import_id IN (SELECT id FROM > import_table WHERE ...) > My problem: > The query takes too long, because PostgreSQL uses a hash join over all > partitions of "data_table" with the "import_table", instead of pruning > the "data_table" partitions by the import_ids at runtime. > Static pruning (when using ... IN (1, 2, 3, 4)) works fine though. > > What am I doing wrong that runtime partition pruning with PostgreSQL > 11.5 does not work in my case? The documentation for this reads: "Partition pruning can be performed not only during the planning of a given query, but also during its execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions whose values are not known at query planning time; for example, parameters defined in a PREPARE statement, using a value obtained from a subquery or using a parameterized value on the inner side of a nested loop join. Partition pruning during execution can be performed at any of the following times: During initialization of the query plan. Partition pruning can be performed here for parameter values which are known during the initialization phase of execution. Partitions which are pruned during this stage will not show up in the query's EXPLAIN or EXPLAIN ANALYZE. It is possible to determine the number of partitions which were removed during this phase by observing the “Subplans Removed” property in the EXPLAIN output. During actual execution of the query plan. Partition pruning may also be performed here to remove partitions using values which are only known during actual query execution. This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins. Since the value of these parameters may change many times during the execution of the query, partition pruning is performed whenever one of the execution parameters being used by partition pruning changes. Determining if partitions were pruned during this phase requires careful inspection of the loops property in the EXPLAIN ANALYZE output. Subplans corresponding to different partitions may have different values for it depending on how many times each of them was pruned during execution. Some may be shown as (never executed) if they were pruned every time." https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITION-PRUNING Notice that only subqueries and parameterized nested loop joins are mentioned. The above text does not really go into the detail of which types of subqueries can be used, but I can confirm that they must be subqueries that can only return a scalar value. e.g WHERE x = (SELECT y FROM ...). The executor would raise an error if that subquery returned more than 1 row. The IN() clause you have is not eligible. This will be converted into a semi-join during planning, and even if it wasn't, the executor wouldn't be raising an error if it returned multiple rows. 
Unfortunately, to do what you mention with a hash join, we'd need to scan through the entire hash table and incrementally build the set of partitions which could match each value in the table. I'm sure there are times where that would be well worth the trouble, but I imagine in the average case we might find that the cost of scanning the entire table this way would be more than just executing the query without any partition pruning. I don't see any good way to know in advance if it would be worthwhile or not. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: select view definition from pg_views feature request
On Tue, 5 Nov 2019 at 11:05, George Neuner wrote: > > On Sun, 03 Nov 2019 16:15:23 -0500, Tom Lane > >Why don't you just change the search path to empty before selecting? > > > I'm guessing this is fine inside a transaction, but what if you're > using a client that commits by statement? In that case, wouldn't the > change to the path affect the other backends? See https://www.postgresql.org/docs/current/sql-set.html "SET only affects the value used by the current session." Also: "The effects of SET LOCAL last only till the end of the current transaction" Neither affects other sessions. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
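For example, keeping the change local to a transaction (the view name here is hypothetical):

    BEGIN;
    SET LOCAL search_path = '';  -- reverts automatically at COMMIT/ROLLBACK
    SELECT definition FROM pg_catalog.pg_views WHERE viewname = 'my_view';
    COMMIT;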
Re: Partition, inheritance for storing syslog records.
On Tue, 12 Nov 2019 at 00:45, Mimiko wrote: > There could be not sub-partition by fromhost. Only by syslogtag, then by year. > > So how could be this accomplished both in version 8.4 and in version 12. > Other ideas are welcome. > > The main goal is to be able to quickly investigate logs from some > applications and some host searching regex in `message` column. The documents in https://www.postgresql.org/docs/12/ddl-partitioning.html go to lengths to explain how to do this in v12. You should have a read of that and pay particular attention in areas that mention "sub-partitioning". If you're still uncertain after that, then ask here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
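As a rough sketch of what that looks like on v12 (table and column names are guesses based on the description above; 8.4 has no declarative partitioning, so it would need the old inheritance plus trigger approach instead):

    CREATE TABLE syslog (
        fromhost  text,
        syslogtag text,
        logtime   timestamptz,
        message   text
    ) PARTITION BY LIST (syslogtag);

    CREATE TABLE syslog_apache PARTITION OF syslog
        FOR VALUES IN ('apache')
        PARTITION BY RANGE (logtime);

    CREATE TABLE syslog_apache_2019 PARTITION OF syslog_apache
        FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');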
Re: Query to retrieve the index columns when a function is used.
On Tue, 10 Mar 2020 at 02:16, Sterpu Victor wrote: > I'm testing on Postgresql 12.1 and I have a index like this: > "check_dates_gist" EXCLUDE USING gist (id_test1 WITH =, id_test2 WITH =, > tsrange(valid_from::timestamp without time zone, valid_to::timestamp without > time zone) WITH &&) > > When I run this query: > "select pc.relname, pi.indisunique, pi.indisprimary, array_agg(a.attname) as > attname > FROM pg_class pc > JOIN pg_index pi ON pc.oid = pi.indexrelid AND pc.oid IN > (SELECT indexrelid FROM pg_index, pg_class WHERE pg_class.relname='test' AND > pg_class.oid=pg_index.indrelid) > JOIN pg_attribute a ON a.attrelid = pc.oid > GROUP BY pc.relname,pi.indisunique, pi.indisprimary;" > I retrieve the index but there is no detail about the columns valid_from and > valid_to. > > How can I retrieve this detail? Those details are in the indexprs column. There's an item there for each 0 valued indkey. It's not going to be particularly easy for you to parse that from SQL. Internally in PostgreSQL, we have functions that could give you that information easily, but to access those from SQL you'd need to write something in C. The C function is named pull_varattnos(). That will give you a Bitmapset for each of the columns that are found. You'd need to write a set-returning function to return those values then join that to pg_attribute. Alternatively, it likely wouldn't be impossible to do at the SQL level with regexp_matches(), but building an expression to reliably extract what you want might not be an easy task. On a quick test, I see that: select indexrelid, unnest(r.e) from pg_index i, lateral regexp_matches(i.indexprs, 'VAR :varno 1 :varattno (\d{1,})', 'g') r(e) where i.indexprs is not null; does kick out the correct results for the expression indexes I have in my database, but there may be some more complex expressions that break it.
Re: Is there a script and sample data to test the performance of postgresql?
On Fri, 17 Apr 2020 at 20:13, Kiwoong Choi wrote: > Is there a script and sample data to test the performance of postgresql? There is pgbench. https://www.postgresql.org/docs/current/pgbench.html David
Re: COPY blocking \dt+?
On Tue, 5 May 2020 at 13:05, pinker wrote: > I'm running standard pgbench and what's kinda strange copy pgbench_accounts > from stdin is blocking my other query which is \dt+. > Does copy hold any exclusive lock or there is something wrong with my > system? COPY does not. However, pgbench does TRUNCATE the tables before doing the COPY, so that'll be most likely what's causing your query to get blocked. David
Re: COPY blocking \dt+?
On Tue, 5 May 2020 at 14:50, pinker wrote: > So it would need to run inside single transaction to cause lock, right? do > you know if pgbench is opening transaction? The TRUNCATE and COPY are done in the same transaction. You can see the code in https://github.com/postgres/postgres/blob/REL_11_6/src/bin/pgbench/pgbench.c#L3599 David
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )
On Fri, 8 May 2020 at 10:00, David G. Johnston wrote: > > On Thu, May 7, 2020 at 11:07 AM Amarendra Konda wrote: >> >> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS >> pa_process_activity_id FROM process_activity pa WHERE pa.app_id = >> '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( >> SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND >> pi.process_instance_id = pa.process_instance_id AND pi.user_id = >> '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50; >> >> >>-> Index Scan using >> process_activity_process_instance_id_app_id_created_idx on >> public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual >> time=0.011..20.320 rows=23506 loops=2) > > > Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id > > = '126502930200650'::bigint) AND (m.created > '1970-01-01 > > 00:00:00'::timestamp without time zone)) > > I suppose during the nested loop the inner index scan could limit itself to > the first 50 entries it finds (since the first two index columns are being > held constant on each scan, m.created should define the traversal order...) > so that the output of the nested loop ends up being (max 2 x 50) 100 entries > which are then sorted and only the top 50 returned. > > Whether the executor could but isn't doing that here or isn't programmed to > do that (or my logic is totally off) I do not know. I think the planner is likely not putting the process_activity table on the outer side of the nested loop join due to the poor row estimates. If it knew that so many rows would match the join then it likely would have done that to save from having to perform the sort at all. However, because the planner has put the process_instance on the outer side of the nested loop join, it's the pathkeys from that path that the nested loop node has, which is not the same as what the ORDER BY needs, so the planner must add a sort step, which means that all rows from the nested loop plan must be read so that they can be sorted. It might be worth trying: create index on process_instance (user_id,app_id); as that might lower the cost of performing the join in the opposite order and have the planner prefer that order instead. If doing that, the OP could then ditch the fki_conv_konotor_user_user_id index to save space. If that's not enough to convince the planner that the opposite order is better then certainly SET enable_sort TO off; would. David
Re: AutoVacuum and growing transaction XID's
On Fri, 8 May 2020 at 09:18, github kran wrote: > 1) We haven't changed anything related to autovacuum except a work_mem > parameter which was increased to 4 GB which I believe is not related to > autovacuum You might want to look into increasing vacuum_cost_limit to something well above 200 or dropping autovacuum_vacuum_cost_delay down from 20 to something much lower. However, you say you've not changed the autovacuum settings, but you've also said: >1) I see there are 8 Vacuum workers ( Not sure what changed) running in > the background and the concern I have is all of these vacuum processes are > running with wrap around and while they are running The default is 3, so if you have 8 then the settings are non-standard. It might be good to supply the output of: SELECT name,setting from pg_Settings where name like '%vacuum%'; You should know that the default speed that autovacuum runs at is quite slow in 9.6. If you end up with all your autovacuum workers tied up with anti-wraparound vacuums then other tables are likely to get neglected and that could lead to stale stats or bloated tables. Best to aim to get auto-vacuum running faster or aim to perform some manual vacuums of tables that are over their max freeze age during an off-peak period to make use of the lower load during those times. Start with tables in pg_class with the largest age(relfrozenxid). You'll still likely want to look at the speed autovacuum runs at either way. Please be aware that the first time a new cluster crosses the autovacuum_freeze_max_age threshold can be a bit of a pain point as it can mean that many tables require auto-vacuum activity all at once. The impact of this is compounded if you have many tables that never receive an UPDATE/DELETE as auto-vacuum, in 9.6, does not visit those tables for any other reason. After the first time, the relfrozenxids of tables tend to be more staggered so their vacuum freeze requirements are also more staggered and that tends to cause fewer problems. David
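The tables closest to the freeze limit can be found with something like:

    SELECT oid::regclass AS table_name, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind IN ('r', 'm', 't')
    ORDER BY age(relfrozenxid) DESC
    LIMIT 20;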
Re: AutoVacuum and growing transaction XID's
On Fri, 8 May 2020 at 13:51, github kran wrote: > I can't either DROP or ALTER any other tables ( REMOVE Inheritance for > any of old tables where the WRITES are not getting written to). Any of the > ALTER TABLE OR DROP TABLE DDL's arer not getting exeucted even I WAITED FOR > SEVERAL MINUTES , so I have terminated those queries as I didn't have luck. The auto-vacuum freeze holds a ShareUpdateExclusiveLock on the table being vacuumed. If you try any DDL that requires an AccessExclusiveLock, it'll have to wait until the vacuum has completed. If you leave the DDL running then all accesses to the table will be queued behind the ungranted AccessExclusiveLock. It's likely a good idea to always run DDL with a fairly short lock_timeout, just in case this happens. >3) Can I increase the autovacuum_freeze_max_age on the tables on > production system ? Yes, but you cannot increase the per-table setting above the global setting. Changing the global setting requires a restart. David
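For example (the DDL here is only a stand-in):

    SET lock_timeout = '5s';
    ALTER TABLE some_old_table NO INHERIT some_parent;  -- gives up after 5 seconds instead of queueing everyone behind it
    RESET lock_timeout;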
Re: Slow SELECT
On Tue, 26 May 2020 at 19:23, Frank Millman wrote: > The table sizes are - > my_table : 167 rows > table_1 : 21 rows > table_2 : 11 rows > table_3 : 3 rows > table_4 : 16 rows > > Therefore for each tran_date in my_table there are potentially > 21x11x3x16 = 11088 rows. Most will be null. > > I want to select the row_id for the last tran_date for each of those > potential groups. This is my select - > > SELECT ( > SELECT a.row_id FROM my_table a > WHERE a.fld_1 = b.row_id > AND a.fld_2 = c.row_id > AND a.fld_3 = d.row_id > AND a.fld_4 = e.row_id > AND a.deleted_id = 0 > ORDER BY a.tran_date DESC LIMIT 1 > ) > FROM table_1 b, table_2 c, table_3 d, table_4 e > > Out of 11088 rows selected, 103 are not null. Perhaps SQL Server is doing something to rewrite the subquery in the target list to a LEFT JOIN. PostgreSQL currently does not do that. Since "my_table" is small, you'd likely be much better doing a manual rewrite of the query to join a subquery containing the required details from "my_table". It looks like you want the row_id from the latest tran_date for each fld_N column. So something like: SELECT a.row_id FROM table_1 b CROSS JOIN table_2 c CROSS JOIN table_3 d CROSS JOIN table_4 e LEFT OUTER JOIN ( SELECT fld_1,fld_2,fld_3,fld_4,row_id,tran_date, ROW_NUMBER() OVER (PARTITION BY fld_1,fld_2,fld_3,fld_4 ORDER BY tran_date DESC) row_num FROM my_table WHERE deleted_id = 0 ) a ON a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.row_num = 1; Should do it. You could also perhaps do something with DISTINCT ON instead of using ROW_NUMBER(). That might be a bit more efficient, but it's unlikely to matter too much since there are only 167 rows in that table. David
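For reference, the DISTINCT ON alternative mentioned above would look something like this (same assumptions about the column names); it could replace the ROW_NUMBER() derived table and the a.row_num = 1 condition:

SELECT DISTINCT ON (fld_1, fld_2, fld_3, fld_4)
       fld_1, fld_2, fld_3, fld_4, row_id, tran_date
FROM my_table
WHERE deleted_id = 0
ORDER BY fld_1, fld_2, fld_3, fld_4, tran_date DESC;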
Re: Slow SELECT
On Tue, 26 May 2020 at 22:31, Frank Millman wrote: > Thank you David. I tried that and it produced the correct result in > 53ms, which is what I am looking for. > > It will take me some time to understand it fully, so I have some > homework to do! The main problem with your previous query was that the subquery was being executed 11088 times and could only ever find anything 167 times. The remaining times, nothing would be found. I just changed the subquery which would be executed once per output row and altered it so it became a subquery that's joined and only executed once. The ROW_NUMBER() is a windowing function, which is explained in [1]. I used this to get the row_id of the record with the latest tran_date, just like you were doing with the ORDER BY tran_date DESC LIMIT 1, but the subquery with the windowing function gets them all at once, rather than doing it in a way that requires it to be executed once for each row in the top-level query. In this case, the functionality that the LIMIT 1 does in your query is achieved with "AND a.row_num = 1;" in my version. This is pretty fast to execute once due to there only being 167 rows. It's also important to know that there may be cases where the method I proposed is slower. For example, if my_table was very large and contained rows that were not in table_1 to table_4. Since the subquery in my version calculates everything then it could be wasteful to do that for values that would never be used. For you, you have foreign keys that ensure my_table does not contain records that are not in the other tables, but you could still see this issue if you were to add some restrictive WHERE clause to the outer query. Perhaps this won't be a problem for you, but it's likely good to know. [1] https://www.postgresql.org/docs/current/tutorial-window.html David
Re: Slow SELECT
On Tue, 26 May 2020 at 23:41, Vik Fearing wrote: > > On 5/26/20 12:04 PM, David Rowley wrote: > > Since "my_table" is small, you'd likely be much better doing a manual > > rewrite of the query to join a subquery containing the required > > details from "my_table". It looks like you want the row_id from the > > latest tran_date for each fld_N column. So something like: > > > > SELECT a.row_id > > FROM table_1 b > > CROSS JOIN table_2 c > > CROSS JOIN table_3 d > > CROSS JOIN table_4 e > > LEFT OUTER JOIN ( > > SELECT fld_1,fld_2,fld_3,fld_4,row_id,tran_date, > > ROW_NUMBER() OVER (PARTITION BY fld_1,fld_2,fld_3,fld_4 ORDER BY > > tran_date DESC) row_num > > FROM my_table > > WHERE deleted_id = 0 > > ) a ON a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = > > d.row_id AND a.fld_4 = e.row_id AND a.row_num = 1; > > > > Should do it. You could also perhaps do something with DISTINCT ON > > instead of using ROW_NUMBER(). That might be a bit more efficient, but > > it's unlikely to matter too much since there are only 167 rows in that > > table. > > I would expect a lateral query to be better here. But that would put it back to executing the subquery 11088 times. I wrote it in a way to avoid that. David
Re: Possible improvement
On Fri, 5 Jun 2020 at 14:41, Paul van der Linden wrote: > If I have a query like: > > SELECT * FROM ( > SELECT > CASE > WHEN field='value1' THEN 1 > WHEN field='value2' THEN 2 > END AS category > FROM table1 > ) AS foo > WHERE category=1 > > doesn't use the index on field, while technically it could do that. > Is it hard to implement drilling down the constant in the WHERE to within the > CASE? It doesn't look impossible to improve that particular case. See eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However, this would need to take constant folding further than we take it today. Today we just have the ability to simplify expressions which are, by themselves, an expression which will always evaluate to a constant value. This case is more complex as it requires something outside the CASE expr to allow the simplification to take place. In this case, we'd need to look at the other side of the OpExpr to see the const there before any transformation could simplify it. It's also not entirely clear that the simplification would always be a good idea. What, for example, if there was an index on the case statement but none on "field"? The query may perform worse! The unfortunate part about this is that, generally, when we perform constant folding, we don't yet have an idea about which indexes exist. I imagine the only sane way to do it would be to allow expressions to have some sort of "alternative" expression that could be matched up to the index column instead. It wouldn't be a trivial piece of work to do that. For simpler cases, you can see from looking at:

postgres=# explain select * from pg_class where oid = (case when 'test' = 'test' then 1 else 0 end);
                                  QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29 rows=1 width=260)
   Index Cond: (oid = '1'::oid)
(2 rows)

that we do simplify case statements which are by themselves constant. > Is this something that could be put on some wishlist? If so where are the > most looked at ones? There is a todo list of sorts in [1]. However, I'm really unsure if anyone ever looks at it for something to do. Mostly, people have their own ideas and problems to solve and spend their free cycles hacking away at those. You might have equal luck waiting until December and writing it on a piece of paper and setting it on fire. Likely there would be more chance if it was something simple, since a novice who's looking into getting into working on Postgres might skim that list for something to work on. More experienced people, I imagine, would never look there. FWIW, many people who are now working on PostgreSQL once came along with a question or idea like yours. Many have been unable to escape ever since :) David [1] https://wiki.postgresql.org/wiki/Todo
Re: Potential optimisation for the creation of a partial index condition over a newly created nullable, non-default-valued column?
On Thu, 4 Jun 2020 at 17:59, Tim Dawborn wrote: > tmp=> \timing on > Timing is on. > tmp=> BEGIN; > BEGIN > Time: 1.333 ms > tmp=> ALTER TABLE foo ADD COLUMN d integer NULL; > ALTER TABLE > Time: 1.581 ms > tmp=> CREATE UNIQUE INDEX myindex ON foo (a, b, c) where d = 2; > CREATE INDEX > Time: 37758.880 ms (00:37.759) > tmp=> COMMIT; > COMMIT > Time: 3.922 ms > > Given that d = 2 could not ever be true as the nullable, non-default-valued > column was just added inside the same transaction, I was hoping that the > index creation would be instantaneous, as it realised there's no rows that > this condition could be true for. While it could technically be possible to do something like check that the xmin of the pg_attribute record for each column mentioned in the index's predicate is set to the current transaction ID, and that the index predicate refutes an expression containing those columns with IS NULL clauses or whatever the DEFAULT expression value is, we've just no way to know if any rows were inserted or updated between the ALTER TABLE and the CREATE INDEX. Certainly, no other transaction could have done anything since we added the column due to us holding the AccessExclusiveLock. We just don't really have a way to know if our own transaction did anything in between. So to do this, we'd need to invent something to track that, and that something wouldn't be free to maintain and we'd likely need to maintain it all the time since we'd be unable to predict what future commands might need to use that information. It would likely be easier to go about it by having some sort of ALTER TABLE ADD INDEX command, then just chain the alter tables together. We have various other optimisations when multiple subcommands are used in a single ALTER TABLE. However, I'm not suggesting we go and allow indexes to be created in ALTER TABLE. I'm just suggesting that it would likely be a better alternative than inventing something to track when a table last had rows inserted or updated. David
Re: When to use PARTITION BY HASH?
On Tue, 9 Jun 2020 at 01:07, Ron wrote: > > On 6/8/20 3:40 AM, Oleksandr Shulgin wrote: > [snip] > > I've found the original commit adding this feature in version 11: > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e > It says: > > "Hash partitioning is useful when you want to partition a growing data > set evenly. This can be useful to keep table sizes reasonable, which > makes maintenance operations such as VACUUM faster, or to enable > partition-wise join." > > > How does hashed (meaning "randomly?) distribution of records make > partition-wise joins more efficient? Hash partitioning certainly does not mean putting the tuple in some random partition. It means putting the tuple in the partition with the correct remainder value after dividing the hash value by the largest partition modulus. > Or -- since I interpret that as having to do with "locality of data" -- am I > misunderstanding the meaning of "partition-wise joins"? If it was not a partitioned table before then partition-wise joins wouldn't be possible. Having partition-wise joins could make joining two identically partitioned tables faster. We need only look in the corresponding partition on the other side of the join for join partners for each tuple. For hash joins, hash tables can be smaller, which can mean not having to batch, and possibly having the hash table fit better into a CPU cache. For merge joins, sorts, having the data partially pre-sorted in chunks means fewer operations for qsort which can result in speedups. David
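As a minimal sketch of that (hypothetical tables): two tables hash-partitioned the same way can be joined one pair of partitions at a time once enable_partitionwise_join is on.

create table orders (customer_id bigint, amount numeric) partition by hash (customer_id);
create table orders_p0 partition of orders for values with (modulus 2, remainder 0);
create table orders_p1 partition of orders for values with (modulus 2, remainder 1);
create table customers (customer_id bigint) partition by hash (customer_id);
create table customers_p0 partition of customers for values with (modulus 2, remainder 0);
create table customers_p1 partition of customers for values with (modulus 2, remainder 1);

set enable_partitionwise_join = on;
explain select * from orders o join customers c using (customer_id);
-- the plan joins orders_p0 with customers_p0 and orders_p1 with customers_p1 separately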
Re: "INSERT ON CONFLICT UPDATE" - Use of indexes ?
On Tue, 9 Jun 2020 at 00:42, Laura Smith wrote: > What'st the current state of play with indexes and ON CONFLICT ? The docs > seem to vaguely suggest it is possible, but this SO question > (https://stackoverflow.com/questions/38945027/) seems to suggest it is not. > > I've got a unique named index on a table (i.e. "create unique index xyz...") > but I cannot seem to be able to refer to it in a function ? > ON CONFLICT (index_name) : does not work > ON CONFLICT ON CONSTRAINT index_name: does not work Creating a unique index does not create a unique constraint. If you create a unique constraint, it'll create a unique index to enforce the constraint. ON CONSTRAINT requires a constraint name, not an index name. David
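A sketch with hypothetical names; either create an actual constraint and refer to it by name, or keep the unique index and give ON CONFLICT the column list so the index can be inferred:

-- option 1: a named constraint (this creates its own unique index)
ALTER TABLE my_table ADD CONSTRAINT my_table_xyz_key UNIQUE (xyz);
INSERT INTO my_table (xyz, val) VALUES (1, 'a')
ON CONFLICT ON CONSTRAINT my_table_xyz_key DO UPDATE SET val = EXCLUDED.val;

-- option 2: keep the existing unique index and let the column list infer it
INSERT INTO my_table (xyz, val) VALUES (1, 'a')
ON CONFLICT (xyz) DO UPDATE SET val = EXCLUDED.val;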
Re: Planner misestimation for JOIN with VARCHAR
On Wed, 10 Jun 2020 at 07:31, Michael Lewis wrote: > > On Tue, Jun 9, 2020 at 12:34 PM Sebastian Dressler > wrote: >> >> - Add an index on top of the whole PK >> - Add indexes onto other columns trying to help the JOIN >> - Add additional statistics on two related columns >> >> Another idea I had was to make use of generated columns and hash the PKs >> together to an BIGINT and solely use this for the JOIN. However, this would >> not work when not all columns of the PK are used for the JOIN. > > > Can you expand on the additional statistics you created? Why was it on only > two columns? Did you include MCVs type of extended stats? Unfortunately, the join selectivity functions have yet to learn about extended statistics. David
Re: Planner misestimation for JOIN with VARCHAR
On Wed, 10 Jun 2020 at 09:05, Michael Lewis wrote: >> >> the join selectivity functions have yet to learn about extended statistics. > > > That is very interesting to me. So, extended statistics would help to > properly estimate the result set coming out of a single table when comparing > each of those columns to one or many values, but not when joining up another > table? Is that on a roadmap or in progress as far as you know? Yeah, they're currently only used for selectivity estimates on base tables. Using extended statistics for join selectivity estimation is being worked on [1], so there's a chance that it may happen for PG14. David [1] https://commitfest.postgresql.org/28/2386/
Re: HASH partitioning not working properly
On Fri, 19 Jun 2020 at 17:42, Srinivasa T N wrote: >After seeing the below, I feel partitioning is not working properly or it > maybe case that my understanding is wrong. Can somebody explain me what is > happening? > postgres=# select * from busbar_version6; > objectid | ver_id > --+ > (0 rows) > > postgres=# select * from busbar_version7; > objectid | ver_id > --+ > 6 | 6 > (1 row) > >Why second insert has gone to table busbar_version7 instead of > busbar_version6? It's your understanding that's not correct. The value of the partition key is passed through a hash function and the partition is selected based on the partition matching the remainder value after dividing the return value of the hash function by the largest modulus of any partition. That might surprise you, but how would you select which partition a varchar value should go into if you didn't use a hash function? David
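A minimal sketch of that with hypothetical names; which partition a row lands in depends only on hash(key) modulo the modulus, not on anything in the partition's name:

create table t (a int) partition by hash (a);
create table t_p0 partition of t for values with (modulus 4, remainder 0);
create table t_p1 partition of t for values with (modulus 4, remainder 1);
create table t_p2 partition of t for values with (modulus 4, remainder 2);
create table t_p3 partition of t for values with (modulus 4, remainder 3);
insert into t values (6);
select tableoid::regclass as partition, a from t;  -- shows which partition got the row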
Re: DISTINCT on jsonb fields and Indexes
On Mon, 22 Jun 2020 at 16:44, Sankar P wrote: > select distinct record ->> 'destinationServiceName' from fluent; > This query takes a lot of time, and does full table scan. The query planning > is: > I see that none of the indexes are used. I want to do a few > aggregations, like "what are the distinct pairs of > `destinationServiceName` and `sourceServiceName` etc. " in these > records. Now, is such a querying possible at all without doing full > table scans ? I get such kind of aggregation support in elasticsearch > + kibana, without doing full-table scan (I believe so, but I do not > have data to back this claim) and I am trying to see if this is > possible with any other extra index creation in postgres. There is some work in progress to improve this type of query, but it'll be at least PG14 before we see that. For your version, you might want to look at https://wiki.postgresql.org/wiki/Loose_indexscan and try some of the proposed solutions from there. David
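As a sketch of the recursive-CTE workaround from that wiki page, assuming an expression index on the value you want the distinct list of; each step does a single index probe for the next distinct value rather than scanning the whole table:

CREATE INDEX fluent_dest_svc_idx ON fluent ((record ->> 'destinationServiceName'));

WITH RECURSIVE vals AS (
    (SELECT record ->> 'destinationServiceName' AS val
     FROM fluent
     WHERE record ->> 'destinationServiceName' IS NOT NULL
     ORDER BY record ->> 'destinationServiceName'
     LIMIT 1)
    UNION ALL
    SELECT (SELECT record ->> 'destinationServiceName'
            FROM fluent
            WHERE record ->> 'destinationServiceName' > vals.val
            ORDER BY record ->> 'destinationServiceName'
            LIMIT 1)
    FROM vals
    WHERE vals.val IS NOT NULL
)
SELECT val FROM vals WHERE val IS NOT NULL;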
Re: pgbench and timestamps
On Wed, 24 Jun 2020 at 20:41, Jaime Soler wrote: > > Hi, does anybody know what is wrong with pgbench in this case ?. Here is a > simple query to generate a random date in a interval time.sql: > > (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 > 00:00:00' - timestamp '2005-09-01 00:00:00' )); > pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U > postgres -p 5432 > ERROR: la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 > 00$1$2» > > I don't know why pgbench use timestamp: «2006-03-01 00$1$2» instead of > timestamp '2006-03-01 00:00:00' I've not debugged it, but it looks like pgbench thinks that :00 is a pgbench variable and is replacing each instance with a query parameter. https://www.postgresql.org/docs/12/pgbench.html says: "There is a simple variable-substitution facility for script files. Variable names must consist of letters (including non-Latin letters), digits, and underscores. Variables can be set by the command-line -D option, explained above, or by the meta commands explained below. In addition to any variables preset by -D command-line options, there are a few variables that are preset automatically, listed in Table 257. A value specified for these variables using -D takes precedence over the automatic presets. Once set, a variable's value can be inserted into a SQL command by writing :variablename. When running more than one client session, each session has its own set of variables. pgbench supports up to 255 variable uses in one statement." I don't often do much with pgbench and variables, but there are a few things that surprise me here. 1) That pgbench replaces variables within single quotes, and; 2) that we still think it's a variable name when it starts with a digit, and; 3) We replace variables that are undefined. I won't pretend to be familiar enough with pgbench internals to know if there's any reasonable reasons why we do each of the above, but... I guess you could work around this problem by just not putting the midnight time in your timestamp. However, that might not work so well if you want to specify a time other than midnight. David
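i.e. a sketch of the workaround, leaving the midnight time off so there is no ":name" sequence left for pgbench to treat as a variable:

-- time.sql
select timestamp '2005-09-01' + random() * (timestamp '2006-03-01' - timestamp '2005-09-01');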
Re: Apply LIMIT when computation is logically irrelevant
On Tue, 7 Jul 2020 at 00:43, Simon Riggs wrote: > > On Mon, 6 Jul 2020 at 12:37, Robins Tharakan wrote: > >> >> When an SQL needs to UNION constants on either side, it should be possible to >> implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect >> understanding, >> or something already discussed but rejected for some reason? >> >> This need came up while reviewing generated SQL, where the need was to >> return true when >> at least one of two lists had a row. A simplified version is given below: >> >> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class); >> vs. >> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); -- >> Faster > > > Those two queries aren't logically equivalent, so you can't apply the LIMIT 1 > as an optimization. > > First query returns lots of random rows, the second query returns just one > random row. I think the idea here is that because the target list contains only constants, pulling additional rows from the query after the first one will just produce duplicate rows and never add any rows after the UNION is processed. David
Re: Apply LIMIT when computation is logically irrelevant
On Tue, 7 Jul 2020 at 09:03, Simon Riggs wrote: > > On Mon, 6 Jul 2020 at 21:49, David Rowley wrote: >> >> On Tue, 7 Jul 2020 at 00:43, Simon Riggs wrote: >> > >> > On Mon, 6 Jul 2020 at 12:37, Robins Tharakan wrote: >> >> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class); >> >> vs. >> >> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); >> >> -- Faster >> > >> > >> > Those two queries aren't logically equivalent, so you can't apply the >> > LIMIT 1 as an optimization. >> > >> > First query returns lots of random rows, the second query returns just one >> > random row. >> >> I think the idea here is that because the target list contains only >> constants that pulling additional rows from the query after the first >> one will just be a duplicate row and never add any rows after the >> UNION is processed. > > > OK, I see. Are you saying you think it's a worthwhile optimization to > autodetect? I guess it's all about how much effort would be involved to detect such cases vs how likely it is that we're going to speed up someone's query. I imagine it's not much effort to detect this, but also, this is the first time I recall seeing this mentioned, so perhaps that means not many people would be rewarded by making such a change. (It does seem like quite a strange way to express the query.) There is currently a patch floating around that implements UniqueKeys which allows RelOptInfos to be tagged with the properties that they're unique on. With the current design of that patch, there is no way to say "this relation *only* has duplicate rows". Perhaps some design tweaks there can make detecting this case cheaper in terms of CPU effort during planning, and perhaps also in terms of how much code it would take to make it work. I'll try to keep this in mind when I review that work soon. If we were to start adding optimisations for cases such as this, then I'd rather they were done in some general-purpose way that just makes them "just work" rather than adding special cases around the codebase that trigger some special behaviour. Likely in this case, it would still take at least some code when planning setops. Additionally, the setops code is badly in need of a rewrite, so the bar is likely pretty high on adding too many smarts in there that we need to migrate forwards after a rewrite. The setops planning code is still not making use of the upper planner pathification work that Tom did years ago. In many cases UNION would be far more optimal if it were implemented as an Index Scan -> MergeAppend -> Unique. I think right now we only consider Append -> Sort -> Unique and Append -> Hash Aggregate. David
Re: Postgresql-12 taking more time to execute the query
On Sat, 11 Jul 2020 at 02:48, Vishwa Kalyankar wrote: > Below is the output of the query explain and analyze result. You may find better help here if you follow the advice given in https://wiki.postgresql.org/wiki/Slow_Query_Questions David
Re: Join optimization
On Sun, 12 Jul 2020 at 06:59, wrote: > > I'm sorry for the bad example. > > Here is another, with some data on PG: > https://dbfiddle.uk/?rdbms=postgres_13&fiddle=ccfd1c4fa291e74a6db9db1772e2b5ac > and Oracle: > https://dbfiddle.uk/?rdbms=oracle_18&fiddle=21a98f499065ad4e2c35ff4bd1487e14. I believe what you're talking about is join removals. It appears as though Oracle is able to remove the inner join to the users table as the join only serves to check the user record exists. No columns are being selected. The record must exist due to the foreign key referencing users. PostgreSQL currently can only remove left joins. Likely what you could do here is just change your inner join into a left join. If you're happy enough that the user record will always exist then that should allow it to work. The reason PostgreSQL does not currently support inner join removal is that, by default, foreign key constraints are only triggered at the end of the query (or, if deferred, at the end of the transaction). With PostgreSQL, it's possible for a SELECT query to see a violated foreign key constraint. This can happen if your SELECT query calls a function which updates a referenced record. The cascade of the foreign key won't occur until the end of the statement, so the select may stumble upon a violated foreign key. Here's a quick example of this case:

drop table t1,t2;
create table t1 (a int primary key);
create table t2 (a int references t1 on update cascade);
insert into t1 values(1),(2);
insert into t2 values(2),(2);

create or replace function update_t1 (p_a int) returns int as $$
begin
  update t1 set a = a + 1 where a = p_a;
  return p_a;
end;
$$ language plpgsql volatile;

-- in theory, this should never return any rows as we're asking for all rows that
-- don't exist in the referenced table. However, we do get a result since the function
-- call updates t1 setting the row with a=2 to a=3. The cascade to t2 does not occur
-- until the end of the statement.
select update_t1(a+1),t1.a from t1 where not exists(select 1 from t2 where t1.a=t2.a);

 update_t1 | a
-----------+---
         2 | 1
(1 row)

If you're happy that you don't have any functions like that which could momentarily cause the foreign key to appear violated, then there shouldn't be any harm in changing the INNER JOIN on users to a LEFT JOIN. PostgreSQL will be able to remove the join in that case. David
Re: some random() clarification needed
On Wed, 15 Jul 2020 at 04:01, Marc Millas wrote: > your answer helps me understand my first problem. > so, I rewrote a simple loop so as to avoid the "volatile" behaviour. Not sure what you're trying to do with the plpgsql, but you can just escape the multiple evaluations by putting the volatile function in a sub-select with no FROM clause. SELECT ... FROM ... WHERE id = (SELECT ceiling(random()*2582)); Or the more traditional way to get a random row is: SELECT ... FROM ... WHERE id BETWEEN 0 AND 2585 ORDER BY random() LIMIT 1; David
Re: Same query taking less time in low configuration machine
On Thu, 16 Jul 2020 at 09:50, Thomas Munro wrote: > I'd look at EXPLAIN (BUFFERS) or auto_explain.log_buffers to get more > information on cache hits and misses. I'd look for settings > differences with EXPLAIN (SETTINGS) to see if there's anything > accidentally set differently (maybe JIT or paralelism or something > like that). I'd look at pg_stat_activity repeatedly while it runs to > see what the processes are doing, especially the wait_event column. > I'd also look at the CPU and I/O on the systems with operating system > tools like iostat, htop, perf to try to find the difference. It might also be good to look at the size of the tables and indexes that are looked at within the function. If the smaller end machine was loaded with data via pg_restore from a pg_dump taken from the larger machine then the indexes might be in much better shape and the heap may have less bloat. Of course, that's just speculation. We've not seen what the function does yet. Vishwa, it would be good if you could follow the guide here: https://wiki.postgresql.org/wiki/Slow_Query_Questions , in particular: "Post the definitions of all tables and indexes referenced in the query. If the query touches views or custom functions, we'll need those definitions as well. Run psql command "\d table" with the tables/views/indices referenced in the problem query." David
Re: Row estimates for empty tables
On Fri, 24 Jul 2020 at 16:01, Christophe Pettus wrote: > I realize I've never quite known this; where does the planner get the row > estimates for an empty table? Example: We just assume there are 10 pages if the relation has not yet been vacuumed or analyzed. The row estimates you see are the number of tuples likely to fit onto a single page multiplied by the assumed 10 pages. If you had made your table wider then the planner would have assumed fewer rows. There's a comment that justifies the 10 pages, which, as of master, is in table_block_relation_estimate_size(). It'll be somewhere else in pg12.

 * HACK: if the relation has never yet been vacuumed, use a minimum size
 * estimate of 10 pages. The idea here is to avoid assuming a
 * newly-created table is really small, even if it currently is, because
 * that may not be true once some data gets loaded into it. Once a vacuum
 * or analyze cycle has been done on it, it's more reasonable to believe
 * the size is somewhat stable.
 *
 * (Note that this is only an issue if the plan gets cached and used again
 * after the table has been filled. What we're trying to avoid is using a
 * nestloop-type plan on a table that has grown substantially since the
 * plan was made. Normally, autovacuum/autoanalyze will occur once enough
 * inserts have happened and cause cached-plan invalidation; but that
 * doesn't happen instantaneously, and it won't happen at all for cases
 * such as temporary tables.)
 *
 * We approximate "never vacuumed" by "has relpages = 0", which means this
 * will also fire on genuinely empty relations. Not great, but
 * fortunately that's a seldom-seen case in the real world, and it
 * shouldn't degrade the quality of the plan too much anyway to err in
 * this direction.
 *
 * If the table has inheritance children, we don't apply this heuristic.
 * Totally empty parent tables are quite common, so we should be willing
 * to believe that they are empty.

The code which decides if the table has been vacuumed here assumes it has not if pg_class.relpages == 0. So even if you were to manually vacuum the table the code here would think it's not yet been vacuumed. David
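A quick way to see that in action (any small table will do):

create table empty_demo (a int, b text);
explain select * from empty_demo;  -- row estimate is derived from the assumed 10 pages
vacuum analyze empty_demo;
explain select * from empty_demo;  -- unchanged: relpages is still 0, so the 10-page assumption still applies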
Re: bad JIT decision
On Sat, 25 Jul 2020 at 08:46, Scott Ribe wrote: > Given the magnitude of the miss in using JIT here, I am wondering: is it > possible that the planner does not properly take into account the cost of > JIT'ing a function for multiple partitions? Or is it that the planner doesn't > have enough info about the restrictiveness of conditions, and is therefore > anticipating running the functions against a great many rows? It does not really take into account the cost of jitting. If the total plan cost is above the jit threshold then jit is enabled. If not, then it's disabled. There are various levels of jit and various thresholds that can be tweaked, see: select name,setting from pg_settings where name like '%jit%'; But as far as each threshold goes, you either reach it or you don't. Maybe that can be made better by considering jit in a more cost-based way rather than by threshold, that way it might be possible to consider jit per plan node rather than on the query as a whole. e.g, if you have 1000 partitions and 999 of them have 1 row and the final one has 1 billion rows, then it's likely a waste of time to jit expressions for the 999 partitions. However, for now, you might just want to try raising various jit thresholds so that it only is enabled for more expensive plans. David
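For example, something along these lines (the numbers here are arbitrary; pick values above your typical plan costs):

SET jit_above_cost = 1000000;
SET jit_inline_above_cost = 5000000;
SET jit_optimize_above_cost = 5000000;
-- or, if it's causing more harm than good, simply:
SET jit = off;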
Re: bad JIT decision
On Sat, 25 Jul 2020 at 10:37, Tom Lane wrote: > > David Rowley writes: > > However, for now, you might just want to try raising various jit > > thresholds so that it only is enabled for more expensive plans. > > Yeah. I'm fairly convinced that the v12 defaults are far too low, > because we are constantly seeing complaints of this sort. I think plan cost overestimation is a common cause of unwanted jit too. It would be good to see the EXPLAIN ANALYZE so we knew if that was the case here. David
Re: bad JIT decision
On Sat, 25 Jul 2020 at 10:42, David Rowley wrote: > > On Sat, 25 Jul 2020 at 10:37, Tom Lane wrote: > > > > David Rowley writes: > > > However, for now, you might just want to try raising various jit > > > thresholds so that it only is enabled for more expensive plans. > > > > Yeah. I'm fairly convinced that the v12 defaults are far too low, > > because we are constantly seeing complaints of this sort. > > I think plan cost overestimation is a common cause of unwanted jit too. > > It would be good to see the EXPLAIN ANALYZE so we knew if that was the > case here. So Scott did send me the full EXPLAIN ANALYZE for this privately. He wishes to keep the full output private. After looking at it, it seems the portion that he pasted above, aka: -> Index Scan using equities_rds_id on equities e0 (cost=0.42..33.74 rows=1 width=37) (actual time=6751.892..6751.892 rows=0 loops=1) Index Cond: (rds_id = ANY ('{..., ..., ..., ...}'::uuid[])) Filter: (security_type = 'ETP'::text) Rows Removed by Filter: 4 Is nested at the bottom level join, about 6 joins deep. The lack of any row being found results in upper level joins not having to do anything, and the majority of the plan is (never executed). David
Re: bad JIT decision
On Sun, 26 Jul 2020 at 02:17, Tom Lane wrote: > > David Rowley writes: > > On Sat, 25 Jul 2020 at 10:42, David Rowley wrote: > >> I think plan cost overestimation is a common cause of unwanted jit too. > >> It would be good to see the EXPLAIN ANALYZE so we knew if that was the > >> case here. > > > So Scott did send me the full EXPLAIN ANALYZE for this privately. He > > wishes to keep the full output private. > > So ... what was the *top* line, ie total cost estimate? Hash Right Join (cost=1200566.17..1461446.31 rows=1651 width=141) (actual time=5881.944..5881.944 rows=0 loops=1) So well above the standard jit inline and optimize cost David
Re: bad JIT decision
On Sun, 26 Jul 2020 at 02:23, Tom Lane wrote: > > Andres Freund writes: > > On 2020-07-24 18:37:02 -0400, Tom Lane wrote: > >> Yeah. I'm fairly convinced that the v12 defaults are far too low, > >> because we are constantly seeing complaints of this sort. > > > I think the issue is more that we need to take into accoutn that the > > overhead of JITing scales ~linearly with the number of JITed > > expressions. And that's not done right now. I've had a patch somewhere > > that had a prototype implementation of changing the costing to be > > #expressions * some_cost, and I think that's a lot more accurate. > > Another thing we could try with much less effort is scaling it by the > number of relations in the query. There's already some code in the > plancache that tries to estimate planning effort that way, IIRC. > Such a scaling would be very legitimate for the cost of compiling > tuple-deconstruction code, and for other expressions it'd kind of > amount to an assumption that the expressions-per-table ratio is > roughly constant. If you don't like that, maybe some simple > nonlinear growth rule would work. I had imagined something a bit less all or nothing. I had thought that the planner could pretty cheaply choose if jit should occur or not on a per-Expr level. For WHERE clause items we know "norm_selec" and we know what baserestrictinfos come before this RestrictInfo, so we could estimate the number of executions per item in the WHERE clause. For Exprs in the targetlist we have the estimated rows from the RelOptInfo. HAVING clause Exprs will be evaluated a similar number of times. The planner could do something along the lines of assuming, say 1000 * cpu_operator_cost to compile an Expr then assume that a compiled Expr will be some percentage faster than an evaluated one and only jit when the Expr is likely to be evaluated enough times for it to be an overall win. Optimize and inline would just have higher thresholds. David
Re: bad JIT decision
On Sun, 26 Jul 2020 at 02:54, Tom Lane wrote: > > David Rowley writes: > > ... nested at the bottom level join, about 6 joins deep. The lack of > > any row being found results in upper level joins not having to do > > anything, and the majority of the plan is (never executed). > > On re-reading this, that last point struck me forcibly. If most of > the plan never gets executed, could we avoid compiling it? That is, > maybe JIT isn't JIT enough, and we should make compilation happen > at first use of an expression not during executor startup. That's interesting. But it would introduce an additional per evaluation cost of checking if we're doing the first execution. David
Re: bad JIT decision
On Tue, 28 Jul 2020 at 11:00, Andres Freund wrote: > > On 2020-07-25 10:54:18 -0400, Tom Lane wrote: > > David Rowley writes: > > > ... nested at the bottom level join, about 6 joins deep. The lack of > > > any row being found results in upper level joins not having to do > > > anything, and the majority of the plan is (never executed). > > > > On re-reading this, that last point struck me forcibly. If most of > > the plan never gets executed, could we avoid compiling it? That is, > > maybe JIT isn't JIT enough, and we should make compilation happen > > at first use of an expression not during executor startup. > > That unfortunately has its own downsides, in that there's significant > overhead of emitting code multiple times. I suspect that taking the > cost of all the JIT emissions together into account is the more > promising approach. Is there some reason that we can't consider jitting on a more granular basis? To me, it seems wrong to have a jit cost per expression and demand that the plan cost > #nexprs * jit_expr_cost before we do jit on anything. It'll make it pretty hard to predict when jit will occur and doing things like adding new partitions could suddenly cause jit to not enable for some query any more. ISTM a more granular approach would be better. For example, for the expression we expect to evaluate once, there's likely little point in jitting it, but for the one on some other relation that has more rows, where we expect to evaluate it 1 billion times, there's likely good reason to jit that. Wouldn't it be better to consider it at the RangeTblEntry level? David
Re: is JIT available
On Tue, 28 Jul 2020 at 04:18, Scott Ribe wrote: > > > On Jul 25, 2020, at 8:02 AM, Christoph Moench-Tegeder > > wrote: > > pg_jit_available() boolean is JIT compilation available in this session > > > > https://www.postgresql.org/docs/12/functions-info.html > > Thanks, that seems to be exactly what I was looking for. > > Even though the documentation is not clear, it does return false when jit = > on but PG was not compiled with JIT. If it's not clear we can certainly change it. I looked at the manual page. It says: "is JIT compilation available in this session (see Chapter 31)? Returns false if jit is set to false." Maybe this would be better? "returns true if jit is enabled and JIT compilation is available in this session (see Chapter 31)." Open to other suggestions. David
Re: is JIT available
On Tue, 28 Jul 2020 at 15:33, Tom Lane wrote: > > David Rowley writes: > > Maybe this would be better? > > > "returns true if jit is enabled and JIT compilation is available in > > this session (see Chapter 31)." > > The general, non-hacker meaning of "jit is enabled" would seem to > be pretty much what this function is already doing; and for that > matter, the same can be said for "JIT compilation is available". > We need something that's less tautological-looking. Maybe along > the lines of > > "returns true if a JIT compiler extension is available and the > jit parameter is set to on; That's probably better. FWIW, the "jit" is already a link to the GUC docs, so I had in mind that users would have known we meant "jit" the GUC rather than "jit" the feature. Your wording will help for anyone who thinks we're talking about the feature. > when this is true, JIT compilation will be performed." I'd probably drop this part since it's not really true. The query has to exceed the cost thresholds before that'll happen. David
Re: is JIT available
On Tue, 28 Jul 2020 at 15:55, David Rowley wrote: > > On Tue, 28 Jul 2020 at 15:33, Tom Lane wrote: > > > > David Rowley writes: > > > Maybe this would be better? > > > > > "returns true if jit is enabled and JIT compilation is available in > > > this session (see Chapter 31)." > > > > The general, non-hacker meaning of "jit is enabled" would seem to > > be pretty much what this function is already doing; and for that > > matter, the same can be said for "JIT compilation is available". > > We need something that's less tautological-looking. Maybe along > > the lines of > > > > "returns true if a JIT compiler extension is available and the > > jit parameter is set to on; > > That's probably better. FWIW, the "jit" is already a link to the GUC > docs, so I had in mind that users would have known we meant "jit" the > GUC rather than "jit" the feature. Your wording will help for anyone > who thinks we're talking about the feature. > > > when this is true, JIT compilation will be performed." > > I'd probably drop this part since it's not really true. The query has > to exceed the cost thresholds before that'll happen. I pushed a doc change for this with slightly revised wording from what you mentioned. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d7c8576ebe3949a644c700a9f54d88e7e373a647 David
Re: determine what column(s) form the primary key, in C extention
On Wed, 29 Jul 2020 at 03:45, alex maslakov wrote: > int i = -1; > while ((i = bms_next_member(pkattnos , i)) >= 0) { > /* do stuff with i */ > /* you'll need to use i - FirstLowInvalidHeapAttributeNumber to > get the pg_attribute.attnum */ > > > elog(INFO, "bms_next_member i: %d", i); > } > > prints 10 and then 9 > > Then: > >10 - FirstLowInvalidHeapAttributeNumber (-8) ==> 2 > >9 - FirstLowInvalidHeapAttributeNumber (-8) ==> 1 > > These are indexes of the columns, right? > > Do they start from 1, not from 0? User attributes start at 1. Have a look at the pg_attribute system catalogue table. The number you get will be the attnum column from that table. > (2) > > I'll use this C code as an example to build an extention in Rust. The > Postgresql bindings for Rust I have don't contain a definition of > `FirstLowInvalidHeapAttributeNumber` for some reason. I can define it > since it's simply single digit constant. > > However what does in some source files it's defined as (-7) and in some > as (-8)? Which should I use? It did recently change from -8 to -7 when we removed Oid as a system column in pg12. The number will never change on a major version, so you'll always know what it is for versions that have already been released. There's always a chance it'll change from -7 in some future PostgreSQL version though. David
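For what it's worth, here's a rough C sketch of going from a relation's OID to its primary key column names; it's just an illustration, not code from the original mail. get_primary_key_attnos() (catalog/pg_constraint.h) and get_attname() (utils/lsyscache.h) are the backend functions it leans on, and the bitmapset members are offset by FirstLowInvalidHeapAttributeNumber, so adding that (negative) constant back recovers the attnum.

#include "postgres.h"
#include "access/sysattr.h"
#include "catalog/pg_constraint.h"
#include "nodes/bitmapset.h"
#include "utils/lsyscache.h"

static void
print_pk_columns(Oid relid)
{
    Oid         constraintOid;
    Bitmapset  *pkattnos;
    int         i = -1;

    pkattnos = get_primary_key_attnos(relid, false, &constraintOid);

    while ((i = bms_next_member(pkattnos, i)) >= 0)
    {
        /* convert the bitmapset member back to a pg_attribute.attnum */
        AttrNumber  attnum = i + FirstLowInvalidHeapAttributeNumber;

        elog(INFO, "pk column %d: %s", attnum, get_attname(relid, attnum, false));
    }
}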
Re: is JIT available
On Wed, 29 Jul 2020 at 00:26, Scott Ribe wrote: > But does compilation with JIT enable and LLVM dev tools mean that all the > LLVM compilation/optimization is built into the PG binaries, or does it > require LLVM presence on the machine where deployed? And if so, does the > function take that into account as well? It's not enough for just the build to have been built with jit enabled. The jit extension must also be present on the machine. I think the new wording in https://www.postgresql.org/docs/devel/functions-info.html conveys that: "Returns true if a JIT compiler extension is available (see Chapter 31) and the jit configuration parameter is set to on." David
Re: Costs of Heap Fetches in Postgres 13
On Fri, 31 Jul 2020 at 05:21, wrote: > does the planner do estimates about heap fetches on index only scans and > takes them into account? > At least in Pg 13? If so, is it possible to lower those costs? > random_page_costs seems not to have any influence. > Looks like they cause bad planner decisions. > Tuning Vacuum is not an option ;) You'll probably need to give us an example of this not working by means of an EXPLAIN output. The planner takes the ratio of blocks marked all visible (pg_class.relallvisible) to the current number of blocks in the relation, and for an index-only scan it charges random_page_cost for the heap blocks it still expects to fetch, i.e. the expected heap blocks multiplied by (1.0 - that ratio). e.g. if the table has 1000 blocks and you have 900 marked as all visible, and an index scan expects to read 200 blocks, then it will apply random_page_cost * 200 * (1.0 - (900.0 / 1000.0)). Which in this case is 20 blocks. Your all visible ratio here is 90%, 10% are not all visible, so 10% of 200 blocks is 20 blocks. If I mock up a case like that and then tweak random_page_cost, then I see the total cost changing just fine. I did only test in master, but we'll not have changed that since branching for PG13. Perhaps you've got some tablespace level random_page_cost set and you're not actually changing it? David
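You can check the numbers the planner is working from with something like (hypothetical table name):

SELECT relname, relpages, relallvisible,
       round(relallvisible::numeric / NULLIF(relpages, 0), 3) AS all_visible_ratio
FROM pg_class
WHERE relname = 'your_table';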
Re: bad JIT decision
On Wed, 29 Jul 2020 at 09:07, Andres Freund wrote: > On 2020-07-28 11:54:53 +1200, David Rowley wrote: > > Is there some reason that we can't consider jitting on a more granular > > basis? > > There's a substantial "constant" overhead of doing JIT. And that it's > nontrival to determine which parts of the query should be JITed in one > part, and which not. > > > > To me, it seems wrong to have a jit cost per expression and > > demand that the plan cost > #nexprs * jit_expr_cost before we do jit > > on anything. It'll make it pretty hard to predict when jit will occur > > and doing things like adding new partitions could suddenly cause jit > > to not enable for some query any more. > > I think that's the right answer though: I'm not quite sure why it would be so hard to do more granularly. Take this case, for example:

create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
insert into listp select 1,x from generate_series(1,1000000) x;

The EXPLAIN looks like:

postgres=# explain select * from listp where b < 100;
                               QUERY PLAN
--------------------------------------------------------------------------
 Append  (cost=0.00..16967.51 rows=853 width=8)
   ->  Seq Scan on listp1 listp_1  (cost=0.00..16925.00 rows=100 width=8)
         Filter: (b < 100)
   ->  Seq Scan on listp2 listp_2  (cost=0.00..38.25 rows=753 width=8)
         Filter: (b < 100)
(5 rows)

For now, if the total cost of the plan exceeded the jit threshold, then we'd JIT all the expressions. If it didn't, we'd compile none of them. What we could do instead would be to just add the jitFlags field into struct Plan to indicate the JIT flags on a per plan node level and enable it as we do now based on the total_cost of that plan node rather than at the top-level of the plan as we do now in standard_planner(). The jitFlags setting code would be moved to the end of create_plan_recurse() instead. In this case, if we had the threshold set to 1, then we'd JIT for listp1 but not for listp2. I don't think this would even require a signature change in the jit_compile_expr() function as we can get access to the plan node from state->parent->plan to see which jitFlags are set, if any. David
Re: How to get the name of the current database in C function/extention inside a trigger?
On Mon, 3 Aug 2020 at 21:26, alex m wrote: > I'm writting a function/extention in C for a trigger. Inside a trigger, in C, > I want to get the name of the current database. However, not via SPI_exec(), > SPI_prepare() and the like, but more directly, in a more faster way. You can use MyDatabaseId in miscadmin.h If you git grep MyDatabaseId in src/contrib you can see some examples. A good tip for the future would be to just look at the source code of the built-in function and see how it does it. David
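For example, a rough sketch (the wrapper function name is made up); get_database_name() is declared in commands/dbcommands.h and MyDatabaseId in miscadmin.h:

#include "postgres.h"
#include "miscadmin.h"
#include "commands/dbcommands.h"

static const char *
current_db_name(void)
{
    /* MyDatabaseId is the OID of the database this backend is connected to */
    return get_database_name(MyDatabaseId);
}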
Re: bad JIT decision
On Wed, 29 Jul 2020 at 09:28, Andres Freund wrote: > FWIW, I created a demo workload for this, and repro'ed the issue with > that. Those improvements does make a very significant difference: > Before: > Timing: Generation 335.345 ms, Inlining 51.025 ms, Optimization 11967.776 > ms, Emission 9201.499 ms, Total 21555.645 ms > IR size: unoptimized: 9022868 bytes, optimized: 6206368 bytes > > After: > Timing: Generation 261.283 ms, Inlining 30.875 ms, Optimization 1671.969 > ms, Emission 18.557 ms, Total 1982.683 ms > IR size: unoptimized 8776100 bytes, optimized 115868 bytes That's a really impressive speedup. However, no matter how fast we make the compilation, it's still most likely to be a waste of time doing it for plan nodes that are just not that costly. I just wrote a patch to consider JIT on a per-plan-node basis instead of globally over the entire plan. I'll post it to -hackers. With a 1000 partition table where all of the cost is on just 1 partition, running a query that hits all partitions, I see: Master jit=on: JIT: Functions: 3002 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 141.587 ms, Inlining 11.760 ms, Optimization 6518.664 ms, Emission 3152.266 ms, Total 9824.277 ms Execution Time: 12588.292 ms Master jit=off: Execution Time: 3672.391 ms Patched jit=on: JIT: Functions: 5 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.675 ms, Inlining 3.322 ms, Optimization 10.766 ms, Emission 5.892 ms, Total 20.655 ms Execution Time: 2754.160 ms Most likely the EXPLAIN output will need to do something more than show true/false for the options here, but I didn't want to go to too much trouble unless this is seen as a good direction to go in. > That obviously needs to be improved further, but it's already a lot > better. In particular after these changes the generated code could be > cached. That would be a game-changer. David
Re: Postgres 12 - default value for text column
On Wed, 5 Aug 2020 at 08:36, Raj Gandhi wrote: > The following alter table with default set to very large text used to work in > Postgres 10 but fails in Postgres 12 with ERROR: row is too big: size 12960, > maximum size 8160 I didn't go to the trouble of debugging this, but I imagine this is due to "Allow ALTER TABLE to add a column with a non-null default without doing a table rewrite" mentioned in https://www.postgresql.org/docs/11/release-11.html In PG10 the table would have been rewritten when you add a NOT NULL column with a DEFAULT. From PG11 onwards no rewrite takes place and the default value is stored in pg_attribute. Since pg_attribute does not have a TOAST table, it's not possible to add NOT NULL columns which have default values that won't fit in a heap page. > The following two variants works in Postgres 12 without any error: > > create table test (id int); > alter table test1 add column license text > alter table test1 alter column license SET DEFAULT ' with size more than 8160 >' This behaves differently since existing rows won't receive the DEFAULT value. Only new rows will. PostgreSQL12 does not need to store the missing value in pg_attribute when you do this. Existing rows will just have a NULL value for the new column. > create table test (id int, license text DEFAULT ' size more than 8160 >' ); Likewise. No missing value needs to be stored here as no rows exist that need that value. Rows that are added with no value specified for the license column will just have the DEFAULT value, which is the one stored in pg_attrdef. David
Re: Understanding EXPLAIN ANALYZE estimates when loops != 1
On Thu, 20 Aug 2020 at 09:55, Philip Semanchuk wrote: > I could use some help interpreting EXPLAIN ANALYZE output. > > -> Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) > (actual time=0.006..0.918 rows=3760 loops=94) > > The actual rows returned by this plan node ~= 3760 * 94 = 353,440. Yes. It's total rows / loops rounded to the nearest integer number. > Did postgres expect (estimate) 3283 rows from this join, or 3283 * 94 = > 308,602? Yes, that's the case at least when the node is not a Parallel node. If this index scan was part of a parameterized nested loop, then you'll see the estimate of the number of expected loops from the outer side of the join. > Same question for this node. > > -> Parallel Index Scan using pk_xyz on xyz (cost=0.29..2354.67 rows=54285 > width=25) (actual time=0.049..6.326 rows=14864 loops=5) > > Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5? So parallel plans are a bit more complex. The row estimates are the total estimated rows / the amount of workers we expect to do useful work. You might expect the divisor there to be an integer number since you can't really have 0.5 workers. However, it's more complex than that since the leader has other tasks to take care of such as pulling tuples from workers, it's not dedicated to helping out. If you're into reading C code, then there's more information in https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c#L5699 , if you hunt around for usages of that function then you'll see the estimated row counts are divided by the return value of that function. David
Re: Understanding EXPLAIN ANALYZE estimates when loops != 1
On Fri, 21 Aug 2020 at 03:21, Philip Semanchuk wrote: > > > > > On Aug 19, 2020, at 6:24 PM, David Rowley wrote: > > > > On Thu, 20 Aug 2020 at 09:55, Philip Semanchuk > > wrote: > >> -> Parallel Index Scan using pk_xyz on xyz (cost=0.29..2354.67 > >> rows=54285 width=25) (actual time=0.049..6.326 rows=14864 loops=5) > >> > >> Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5? > > > > So parallel plans are a bit more complex. The row estimates are the > > total estimated rows / the amount of workers we expect to do useful > > work. You might expect the divisor there to be an integer number > > since you can't really have 0.5 workers. However, it's more complex > > than that since the leader has other tasks to take care of such as > > pulling tuples from workers, it's not dedicated to helping out. > > Sounds like it help to set max_parallel_workers = 1 before running EXPLAIN > ANALYZE in order to simplify the numbers, yes? Or is there a possibility that > doing so would send the planner down an entirely different path? Since the costs are also divided (see costsize.c) and the path costs are the basis on which paths the planner will choose to use, you're likely to see the plan changing. max_parallel_workers set to 1 wouldn't have been very helpful anyway since that's the leader process + 1 parallel worker resulting in the divisor of 1.7. If you need to, you can just reverse engineer the costs from assuming what get_parallel_divisor() will have returned. You can see it expects each worker to take up 30% of the leader's time: leader + 1 worker = 1.7, leader + 2 workers = 2.4, leader + 3 workers = 3.1, leader + 4 workers = 4. You'll know the number of workers from "Workers Planned" in the EXPLAIN output. You'd need to do something else if you happen to run with parallel_leader_participation = off. David
Re: Query plan prefers hash join when nested loop is much faster
On Sat, 22 Aug 2020 at 00:35, iulian dragos wrote: > I am trying to understand why the query planner insists on using a hash join, > and how to make it choose the better option, which in this case would be a > nested loop. > | -> Index Scan using > test_result_module_result_id_idx on test_result (cost=0.57..6911.17 > rows=4331 width=12) (actual time=0.002..0.002 rows=1 loops=14824) | > | Index Cond: (module_result_id = > module_result.id) > | You might want to check if the pg_stats view reports a realistic n_distinct value for test_result.module_result_id. If the pg_class.reltuples is correct for that relation then that would indicate the n_distinct estimate is about 115000. Going by the number of rows you've mentioned it would appear a more realistic value for that would be -0.4, which is 0 - 1 / (5 / 2.0). However, that's assuming each module_result has a test_result. You could run a SELECT COUNT(DISTINCT module_result_id) FROM test_result; to get a better idea. If ANALYZE is not getting you a good value for n_distinct, then you can overwrite it. See [1], search for n_distinct. David [1] https://www.postgresql.org/docs/current/sql-altertable.html
Re: Query plan prefers hash join when nested loop is much faster
On Tue, 25 Aug 2020 at 22:10, iulian dragos wrote: > Thanks for the tip! Indeed, `n_distinct` isn't right. I found it in pg_stats > set at 131736.0, but the actual number is much higher: 210104361. I tried to > set it manually, but the plan is still the same (both the actual number and a > percentage, -0.4, as you suggested): You'll need to run ANALYZE on the table after doing the ALTER TABLE to change the n_distinct. The ANALYZE writes the value to pg_statistic. ALTER TABLE only takes it as far as pg_attribute's attoptions. ANALYZE reads that column to see if the n_distinct estimate should be overwritten before writing out pg_statistic. Just remember if you're hardcoding a positive value that it'll stay fixed until you change it. If the table is likely to grow, then you might want to reconsider using a positive value and consider using a negative value as mentioned in the doc link. David
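i.e. something like this (a sketch with your table and column names), then check pg_stats to confirm the override took effect:

ALTER TABLE test_result ALTER COLUMN module_result_id SET (n_distinct = -0.4);
ANALYZE test_result;
SELECT attname, n_distinct FROM pg_stats
WHERE tablename = 'test_result' AND attname = 'module_result_id';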
Re: how to get top plan of GatherMerge in OSS10
On Fri, 4 Sep 2020 at 20:21, Yang, Rong wrote: > in OSS 10, how to make the top plan of the plan tree to GatherMerge with > ‘except all’ in sql? (I guess you're talking about PostgreSQL 10. I'm not sure what OSS 10 is.) The ability for Gather Merge to work with Init Plan values was only added in [1], which was new to PostgreSQL 11. So you're not going to make that work in PostgreSQL 10. You could try crafting the query in such a way that an Init plan is not used, e.g. a CROSS JOIN, but I'm not sure if that'll improve your performance any. Otherwise, you could try reducing the parallel_tuple_cost a bit, which might give you a Gather, so at least the Seq Scan will be done in parallel. The sort will still be serial though. A btree index on table1 (c) looks like it might be worth considering. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e89a71fb44
Re: bug in PG13?
On Thu, 15 Oct 2020 at 03:53, Andreas Kretschmer wrote: > it seems to me a bug. i have a partitioned table: I've just pushed a fix [1] for this to master only (PG14+). The problem was that the information required to allow the executor to perform run-time pruning was only added to the top-level Append/MergeAppend. The example you've given actually did have a nested-Append at one point during planning. However, since the top-level Append only had a single sub-plan, it was removed and that single sub-plan was used instead. Since that single sub-plan happened to be an Append, there was no run-time pruning information to allow the executor to prune away the unneeded partitions. The fix for this was a bit too invasive to go backpatching it. Run-time pruning was coded purposefully to only prune on the top-level Append/Merge Append. In hindsight, that was likely a bad choice, but it was the choice that was made originally, so I'm leaning towards not classing this as a bug. After thinking about this all over again, it seems there are more legitimate reasons to have nested Append/Merge Appends than I had thought when I was originally working on run-time pruning, so it makes sense to allow run-time pruning on those to work going forward. Thanks for the report. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a929e17e5a8c9b751b66002c8a89fdebdacfe194
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
On Sun, 14 Feb 2021 at 13:15, Seamus Abshere wrote: > The comment from Robert says: (src/backend/optimizer/path/allpaths.c) > > /* > * If the use of parallel append is permitted, always request > at least > * log2(# of children) workers. > > In my case, every partition takes 1 second to scan, I have 64 cores, I have > 64 partitions, and the wall time is 8 seconds with 8 workers. > > I assume that if it planned significantly more workers (16? 32? even 64?), > it would get significantly faster (even accounting for transaction cost). So > why doesn't it ask for more? Note that I've set max_parallel_workers=512, > etc. (postgresql.conf in my first message). There's perhaps an argument for allowing ALTER TABLE SET (parallel_workers=N); to be set on partitioned tables, but we don't currently allow it. What you might want to try is setting that for any of those 64 partitions. Shortly above the code comment that you quoted, there's some code that finds the path for the partition with the maximum number of parallel workers. If one of those partitions is using, say, 64 workers because you set that partition's "parallel_workers" setting to 64, and provided you have max_parallel_workers_per_gather set high enough, then your Append should get 64 workers. You'll need to be careful though, since changing a partition's parallel_workers may affect things for other queries too. Also, if you were to only change 1 partition and that partition were to be pruned, then you'd not get the 64 workers. David
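As a sketch of that workaround (the partition name here is hypothetical; apply it to one of your 64 partitions that won't be pruned):

ALTER TABLE mytable_p01 SET (parallel_workers = 64);
SET max_parallel_workers_per_gather = 64; -- the per-gather limit must also allow that many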
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
On Mon, 15 Feb 2021 at 10:16, Gavin Flower wrote: > Just wondering why there is a hard coded limit. I don't see where the hardcoded limit is. The total number is limited to max_parallel_workers_per_gather, but there's nothing hardcoded about the value of that. > While I agree it might be good to be able to specify the number of workers, > surely it would be possible to derive a suitable default based on the > number of effective processors available? It's a pretty tricky thing to get right. The problem is that something has to rationalise the use of parallel worker processes. Does it seem reasonable to you to use the sum of the Append child parallel workers? If so, I imagine someone else would think that would be pretty insane. We do have to consider the fact that we're trying to share those parallel worker processes with other backends which also might want to get some use out of them. As for whether we need some relopt for partitioned tables, I think that's also tricky. Sure, we could go and add a "parallel_workers" relopt to partitioned tables, but then that's going to be applied regardless of how many partitions survive partition pruning. There could be as few as 2 subpaths in an Append, or the number could be in the thousands. I can't imagine anyone really wants the same number of parallel workers in each of those two cases. So I can understand why ab7271677 wanted to take into account the number of append children. Maybe there's some room for some other relopt that just changes the behaviour of that code. It does not seem too unreasonable that someone might like to take the sum of the Append child parallel workers. That value would still be capped at max_parallel_workers_per_gather, so it shouldn't ever go too insane unless someone set that GUC to something insane, which would be their choice. I'm not too sure what such a relopt would be called. Additionally, for the case being reported here: since all Append children are foreign tables, there is actually some work going on to make it so workers don't have to sit idle and wait until the foreign server returns the results. I don't think anyone would disagree that it's pretty poor use of a parallel worker to have it sit there doing nothing for minutes at a time waiting for a single tuple from a foreign data wrapper. I'm not sure of the status of that work, but if you want to learn more about it, please see [1]. David [1] https://commitfest.postgresql.org/32/2491/
Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]
On Tue, 16 Feb 2021 at 02:12, Fabio Pardi wrote: > > On 14/02/2021 22:16, Gavin Flower wrote: > > While I agree it might be good to be able specify the number of workers, > > sure it would be possible to derive a suitable default based on the number > > of effective processors available? > > I had the same problem and my conclusion was that it is not possible to go > above 8 cores because of Amdahl's law on parallel computing. More here: > https://en.wikipedia.org/wiki/Amdahl%27s_law That would really depend on what the non-parallel part of the equation was. There are some plan shapes such as GROUP BY or aggregate queries with very few or just 1 group where the serial portion of the execution is very small indeed. David
Re: Deleting takes days, should I add some index?
On Fri, 26 Feb 2021 at 02:06, Alexander Farber wrote: > However the deletion still takes forever and I have to ctrl-c it: > > # delete from words_games where created < now() - interval '12 month'; > > Do you please have any further suggestions? > > When I try to prepend "explain analyze" to the above query, then in the > production database it also lasts forever. EXPLAIN with ANALYZE executes the query, so it'll probably take just as long. Since your foreign keys perform a cascade delete on the tables referencing the tables you're deleting from, any records in those referencing tables will be deleted too. You must also look at those referencing tables, see what references them, and index the column(s) doing the referencing. Here's a simplified example that's easier to understand than your case. Setup:

create table t1 (id int primary key);
create table t2 (id int primary key, t1_id int not null references t1 on update cascade on delete cascade);
create index on t2 (t1_id);
create table t3 (id int primary key, t2_id int not null references t2 on update cascade on delete cascade);

So I have 2 levels of reference: t2 -> t1 and t3 -> t2. If I remove a row from t1 then PostgreSQL must perform: DELETE FROM t2 WHERE t1_id = ; Luckily I indexed t2(t1_id), so that should be fast. Since t3 references t2, the database must also perform: DELETE FROM t3 WHERE t2_id = ; for the row that gets removed from t2. Unfortunately, I forgot to index t3(t2_id). Let me insert some data and see how the lack of an index affects performance:

insert into t1 select x from generate_Series(1,100) x;
insert into t2 select x,x from generate_Series(1,100) x;
insert into t3 select x,x from generate_Series(1,100) x;

Delete 100 records:

delete from t1 where id <= 100;
DELETE 100
Time: 8048.975 ms (00:08.049)

Pretty slow.

create index on t3 (t2_id);
CREATE INDEX

(truncate t1 cascade and reinsert the data)

delete from t1 where id <= 100;
DELETE 100
Time: 5.151 ms

Better. So, you need to follow each of the "Referenced by" entries from the table you're deleting from. In the \d output, just ignore the tables mentioned in "Foreign-key constraints:". Those are only checked on INSERT/UPDATE and must already contain a proper unique constraint and therefore an index. David
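If it helps, here is a rough sketch of a catalog query for spotting foreign keys whose referencing columns have no matching index. It only handles the simple case where an index's leading columns exactly match the constraint's column list in order, so treat it as a starting point rather than a definitive check:

select con.conrelid::regclass as referencing_table,
       con.conname            as fk_constraint
from pg_constraint con
where con.contype = 'f'
and not exists (
    select 1
    from pg_index idx
    where idx.indrelid = con.conrelid
    and (string_to_array(idx.indkey::text, ' ')::smallint[])[1:array_length(con.conkey, 1)] = con.conkey
);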
Re: Pgbackrest version 2.28 Bug/Issue
On Tue, 2 Mar 2021 at 21:53, Brajendra Pratap Singh wrote: > We are getting the below issue while executing the backup through pgbackrest. > 2021-03-02 02:10:01.622 P00 ERROR: [056]: unable to find primary cluster - > cannot proceed That's not an error that's coming from PostgreSQL. You might have better luck asking the authors of pgbackrest. Check their website for details on how they want issues reported. David
Re: PostgreSQL : bug (or not) in CTE Common Table Expressions or Subqueries in the FROM Clause
On Wed, 23 Mar 2022 at 04:03, PALAYRET Jacques wrote: > For example, in Oracle it is not possible to have a SELECT statement without > a FROM clause (using DUAL table), so maybe " ambiguous " columns are not > handled in the same way. But in your original complaint, the column was not ambiguous. The CTE had no column named "id". I highly recommend writing queries giving each relation a short alias, then making a habit of always prefixing your column names with that alias. If you don't do this, you can have all sorts of problems when you one day want to start dropping old unused columns out of your tables. If you alias these, then dropping the columns will properly cause queries that reference them to ERROR. Without aliases, your queries might just start doing something you don't want them to do, and you might not realise that for a very long time. David
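For example, with hypothetical tables just to illustrate the habit:

select o.id, c.name
from orders as o
join customers as c on c.id = o.customer_id
where o.created_at >= '2022-01-01';

If a referenced column is later dropped, a query written like this fails with a clear error rather than silently binding to a similarly named column from another relation.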
Re: Transaction and SQL errors
On Mon, 4 Apr 2022 at 23:33, Sebastien Flaesch wrote: > Any arguments I could give to programmers? Maybe: > "You better replay all SQL of the whole transaction... (and make them as > short as possible!)" > ? There are SAVEPOINTs [1]. David [1] https://www.postgresql.org/docs/current/sql-savepoint.html
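A minimal sketch of how a client can recover from a failed statement without replaying the whole transaction (the statements here are placeholders):

BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT before_risky_step;
INSERT INTO t VALUES (1);                  -- suppose this fails, e.g. with a unique violation
ROLLBACK TO SAVEPOINT before_risky_step;   -- undoes only the failed part
COMMIT;                                    -- the first INSERT still commits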
Re: Select .... where id not in (....) returns 0 incorrectly
On Tue, 5 Apr 2022 at 01:21, J. Roeleveld wrote: > Personally, I think NULL should be treated as a separate value and not lead to > strange behaviour. I think the rationale behind IN and NOT IN is that c IN (1,2,3) is the equivalent of writing c = 1 OR c = 2 OR c = 3, whereas c NOT IN (1,2,3) is the same as c <> 1 AND c <> 2 AND c <> 3. You can imagine what happens in the latter case if you replace 3 with NULL: "c <> NULL" is NULL, and because the quals are ANDed together, the WHERE clause cannot match anything. In any case, it's what the SQL standard says, so that's the way we do it. David
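A minimal demonstration of that behaviour:

SELECT 'match' WHERE 2 IN (1, NULL);     -- no row: (2 = 1) OR (2 = NULL) is NULL
SELECT 'match' WHERE 2 NOT IN (1, NULL); -- no row: (2 <> 1) AND (2 <> NULL) is NULL
SELECT 'match' WHERE 2 NOT IN (1, 3);    -- one row: both comparisons are true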
Re: unoptimized nested loops
On Wed, 1 Jun 2022 at 08:04, Tim Kelly wrote: > -> Seq Scan on data (cost=0.00..47132.93 rows=125 width=135) > (actual time=0.542..182952.708 rows=1167810 loops=1) > Filter: (data ~~ '%some text%'::text) The problem is coming from the 125 row estimate in the above plan fragment. Because the number of estimated rows is low, the query planner likely thinks a Nested Loop join is best. What I'd do first is verify that some other join method is better by running the query after having done: SET enable_nestloop TO off; (and once you're done testing, RESET enable_nestloop; to restore the default). If the query then runs faster, it's going to be worth doing something about trying to improve those statistics. I see the LIKE pattern-matching selectivity estimation code does look at histogram buckets, so you might have luck if you increase the statistics target on this column: ALTER TABLE data ALTER COLUMN data SET STATISTICS 1000; ANALYZE data; The standard number of buckets is 100. The above will set it to 1000. You can go as high as 10000, but going too high is going to slow down the planner, so you should only go as high as you need to go. David
Re: Unique index prohibits partial aggregates
On Mon, 27 Jun 2022 at 23:49, Bos, Fred wrote: > "Settings: effective_cache_size = '128GB', force_parallel_mode = 'on', > max_parallel_workers = '40', You shouldn't be using force_parallel_mode. It does not do what you think. See the documentation for that GUC, or read [1] > I expected the query to become faster > with a unique index or column, so why does the query planner decide on group > aggregation instead of partial aggregation? It simply does not know how many groups are likely to exist for your expression. Statistics are only gathered on bare columns. The planner has no idea how many groups are likely to exist for "t/(1000*3600*24)". In PostgreSQL 14 and above you could create extended statistics for the expression using: create statistics t1_t_stats (ndistinct) on (t/(1000*3600*24)) from bhload_nohyp_noin; -- pg14 analyze bhload_nohyp_noin; For your version (pg13), you'd need to create an expression index: create index on bhload_nohyp_noin using brin ((t/(1000*3600*24))); analyze bhload_nohyp_noin; I added "using brin" as a BRIN index is going to be a bit more lightweight than a btree index. You only need the index to instruct ANALYZE to gather statistics. You might also want to ramp up the pages_per_range storage parameter. With that, the planner might then realise that a parallel aggregate is worthwhile. David [1] https://www.enterprisedb.com/postgres-tutorials/using-forceparallelmode-correctly-postgresql
Re: Unique index prohibits partial aggregates
On Wed, 29 Jun 2022 at 00:45, Bos, Fred wrote: > Finally, is there a way to force postgres to do the partial hash aggregate, > either by changing a setting or by influencing the expected amount of output > groups for each query? You could do something like: ALTER TABLE bhload_nohyp_noin ALTER COLUMN t SET (n_distinct = 200); ANALYZE bhload_nohyp_noin; Please be aware that this may have detrimental effects if you do any joins or GROUP BYs directly on this column. Otherwise, provided you don't have a unique index on that column, it should trick the planner into thinking there will be fewer groups than it currently expects, which will likely result in the parallel plan that you desire. David
Re: Multiple Indexes
On Thu, 7 Jul 2022 at 04:07, DAVID ROTH wrote: > I understand the planner can use multiple indexes to get the best plan. > Can someone point me a paper that explains how this works. I don't know of a paper, but if you're talking about using multiple indexes to scan a single relation in order to satisfy a condition such as WHERE a = 5 OR b = 12; then the query planner is able to make use of "bitmap index scans". A bitmap index scan simply scans an index type which supports such scans and collects a set of ctids. For this example, providing there's a suitable index on the "a" column and another on the "b" column, the planner may choose to perform a bitmap index scan on the "a" index and another on the "b" index, then perform a bitmap OR operation to combine them into the set of ctids matching either condition. The heap of the table can then be scanned to fetch the tuples for those ctids. A ctid is the physical (or, more accurately, physiological) address of a tuple in the heap of a table. David
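To make that concrete, here's a hypothetical example. With enough rows and suitable statistics the planner may produce a plan along the lines shown below the query, though the exact choice always depends on costs:

create table t (a int, b int);
create index on t (a);
create index on t (b);
insert into t select x % 1000, x % 997 from generate_series(1, 1000000) x;
analyze t;
explain select * from t where a = 5 or b = 12;

 Bitmap Heap Scan on t
   Recheck Cond: ((a = 5) OR (b = 12))
   ->  BitmapOr
         ->  Bitmap Index Scan on t_a_idx
               Index Cond: (a = 5)
         ->  Bitmap Index Scan on t_b_idx
               Index Cond: (b = 12)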
Re: limits, indexes, views and query planner
On Mon, 15 Aug 2022 at 20:30, Marc Mamin wrote: > in the example below, we can see that the view test_ab prevents the usage of > the index to retrieve the top last rows. > This is a special case, as the where clause excludes data from the second > table, and the explain output do not references it at all. > I wonder if the planner could be able to exclude the table_b earlier in its > plan and to possibly fallback to a plan equivalent to the first one. > with a view on a single table (test_av), the index is used. I think you might be confusing UNION and UNION ALL. PostgreSQL is under no obligation to uniquify the final result in your version of the view with a single table, but it is when you UNION both tables together, regardless of whether one of the scans causes one of the union branches to be removed or not. Having said that, the planner still could do a better job of this, as it could have done Limit -> Unique -> Merge Append -> Index Scan. I do have some work-in-progress code to rewrite the union planner so that it is able to request pre-sorted input to allow Merge Append to be used rather than Unique -> Sort. I'm not sure if it'll help this case or not. David
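To illustrate the distinction, here are hypothetical view definitions modelled on your naming:

create view test_ab as
select a, b from table_a
union       -- must de-duplicate across both branches before any LIMIT applies
select a, b from table_b;

create view test_ab_all as
select a, b from table_a
union all   -- no de-duplication, so ordered index scans under a LIMIT can often stop early
select a, b from table_b;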
Re: Can I get the number of results plus the results with a single query?
On Tue, 16 Aug 2022 at 21:15, Walter Dörwald wrote: > select count(*) over (), e.* from email.email e; Depending on the complexity of the query, putting the count(*) as a subquery in the SELECT clause might execute more quickly, e.g. select (select count(*) from email.email) c, * from email.email; A non-correlated subquery will become an "initplan", which will execute only once. The reason the window aggregate might be slower is that internally PostgreSQL will store the tuples in the window frame in a tuplestore. In this example, all rows will be put in that tuplestore at once. These tuplestores will spill to disk when they exceed work_mem. On the other hand, the window aggregate version could be faster if fetching each tuple from the outer query was sufficiently more expensive than storing it in a tuplestore. That could easily happen if the query contained expensive joins or many rows were filtered out. In that case, the subquery version would have to do that work twice, whereas the window aggregate version only has to do it once. David
Re: Different execution plan between PostgreSQL 8.2 and 12.5
On Thu, 18 Aug 2022 at 15:32, Tom Lane wrote: > The 12.5 plan looks like it thinks that the join condition is not > hashable --- and probably not mergeable as well, else it would have > done a mergejoin. This is odd if we assume that the lower() > outputs are just text. But you haven't said anything about the > data types involved, nor what locale setting you're using, nor > what nondefault settings or extensions you might be using, so > speculation about the cause would just be speculation. In addition to that, I couldn't help noticing that the quoted SQL does not seem to match the EXPLAIN output. The EXPLAIN has a Limit node, but the query does not. I'm assuming this isn't due to the relations being views, since we don't pull up subqueries with a LIMIT. The costs for the 12.5 plan are cheaper than the 8.2 plan's, so I imagine the more likely cause is the planner favouring an early-startup plan. It's probably more likely that lower() is providing the planner with bad estimates and there are far fewer rows than expected, resulting in the LIMIT 1 being a much larger proportion of the total rows than the planner expects. David
Re: Restriction on table partition expressions
On Fri, 26 Aug 2022 at 03:08, James Vanns wrote: > CREATE TABLE foobar( > id BIGINT NOT NULL PRIMARY KEY, > baz VARCHAR NULL DEFAULT NULL > ) PARTITION BY HASH(my_func(id)); > > Error: primary key constraints cannot be used when partition keys > include expressions. > I couldn't find much, if anything, about using expressions in table > partitions let alone describing the restriction. Can anyone enlighten > me? Or point me to what I've missed! Also, is there a chance that this > limitation will be relaxed in the future? This is mentioned in [1]: "Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions." Maybe that could be clearer and also mention that the partition key columns must appear as plain columns, not wrapped in a function call or expression, for a primary key to be possible. David [1] https://www.postgresql.org/docs/13/ddl-partitioning.html
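If hashing the bare id happens to be acceptable for your case (an assumption on my part; my_func may exist precisely because it isn't), a sketch of a definition that keeps the primary key would be:

CREATE TABLE foobar(
  id BIGINT NOT NULL PRIMARY KEY,
  baz VARCHAR NULL DEFAULT NULL
) PARTITION BY HASH(id); -- a bare column in the partition key, so the primary key is allowed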
Re: Restriction on table partition expressions
On Fri, 26 Aug 2022 at 03:08, James Vanns wrote: > Also, is there a chance that this > limitation will be relaxed in the future? (forgot to answer this part) Certainly not in the near future, I'm afraid. It would require allowing a single index to exist over multiple tables. There have been discussions about this in the past, and the general thinking is that if you have a single index over all partitions, then it massively detracts from the advantages of partitioning. With partitioning, you can DETACH or DROP a partition and get rid of all the data quickly in a single metadata operation. If you have an index over all partitions, then that operation is no longer a metadata-only operation. It suddenly needs to go and remove or invalidate all records pointing to the partition you want to detach/drop. David