Re: Index on (fixed size) bytea value
> Then you would ALTER the column and SET STORAGE MAIN, so that it does
> not ever use TOAST.
>
> The size limit for a row would then be 8kB minus page header minus row
> header, which should be somewhere in the vicinity of 8140 bytes.
>
> If you want your block size to be a power of two, the limit would be
> 4kB, which would waste almost half your storage space.

Oh I see. So if I want to save some space for future columns, then
storing about 7500 bytes in the "block bytea" column would be close to
optimal, utilizing more than 90% of the block space.

I guess that the fillfactor setting will have no effect on this table,
and it does not matter if I set it or not.
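PS: for reference, a minimal sketch of the kind of setup discussed above,
as I understand it (the table and column names are made up for
illustration):

    -- Hypothetical table holding fixed-size "blocks"; names are illustrative.
    CREATE TABLE blocks (
        block_id bigint PRIMARY KEY,
        block    bytea  NOT NULL
    );

    -- Keep the bytea inline in the heap page rather than pushing it to TOAST.
    ALTER TABLE blocks ALTER COLUMN block SET STORAGE MAIN;

With values of roughly 7500 bytes, each row (value plus page and tuple
headers) should still fit in a single 8kB page, leaving some headroom for
future columns.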
Re: Merge David and Goliath tables efficiently
> This is absolutely expected. If you partition by hash (id, part_key),
> you can't join on (id) and expect partitionwise join to work. To quote
> the enable_partitionwise_join documentation [1]:
>
>     Enables or disables the query planner's use of partitionwise join,
>     which allows a join between partitioned tables to be performed by
>     joining the matching partitions. Partitionwise join currently
>     applies only when the join conditions include all the partition
>     keys, which must be of the same data type and have one-to-one
>     matching sets of child partitions.
>
> So the fact that
>
>     merge into goliath using david on david.id = goliath.id
>       when matched then update set val = david.val
>       when not matched then insert (id, val) values (david.id, david.val);
>
> does not work is absolutely expected. You need to join on part_col too.

Definitely it makes sense to add part_col to the join columns. It also
helps the planner choose a better plan: with my current workload so far,
it now goes with a per-partition nested loop without having to trick the
costs (via enable_hashjoin or random_page_cost).

Thank you goliath

--
david
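PS: for the archives, the adjusted statement looks roughly like this (a
sketch; it assumes both tables carry id, val and part_col, and that
goliath is partitioned by hash on (id, part_col)):

    -- Partitionwise join is off by default, so enable it for the session.
    SET enable_partitionwise_join = on;

    -- Join on the full partition key so the planner can pair up partitions
    -- instead of probing every goliath partition for each david row.
    MERGE INTO goliath
    USING david
       ON david.id = goliath.id
      AND david.part_col = goliath.part_col
    WHEN MATCHED THEN
      UPDATE SET val = david.val
    WHEN NOT MATCHED THEN
      INSERT (id, val, part_col) VALUES (david.id, david.val, david.part_col);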
Re: Merge David and Goliath tables efficiently
On 6/20/23 12:02, nicolas paris wrote:
> ...
>
> Definitely it makes sense to add part_col to the join columns. It also
> helps the planner choose a better plan: with my current workload so far,
> it now goes with a per-partition nested loop without having to trick the
> costs (via enable_hashjoin or random_page_cost).
>

Right. With a non-partitionwise join, the nestloop inner lookup has to do
an index scan on every partition (it can't decide which of the partitions
will have a match, and for costing we assume there's at least 1 row in
each lookup). That essentially amplifies the amount of random I/O by a
factor of 100x (or whatever the number of partitions is).

That is, instead of doing 100x nested loops like this:

  ->  Nested Loop Left Join  (cost=0.29..33.42 rows=8 width=47)
        ->  Seq Scan on david_98 david_99  (cost=0.00..1.08 ...
        ->  Index Scan using goliath_98_id_part_col_idx on ...
              Index Cond: ((id = david_99.id) AND ...)

we end up doing one nested loop with an inner lookup like this:

  ->  Append  (cost=0.29..557.63 rows=100 width=14)
        ->  Index Scan using ... goliath_1  (cost=0.29..5.57 ...
              Index Cond: (id = david.id)
        ...

And this is per-loop, of which there'll be 500 (because the small david
table has 500 rows).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
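PS: for anyone reproducing this, a quick way to compare the two plan
shapes (a sketch; table and column names as used earlier in the thread):

    -- Without part_col in the join clause, expect the inner side to be an
    -- Append over all goliath partitions, probed once per outer row.
    EXPLAIN (COSTS OFF)
    SELECT * FROM david d LEFT JOIN goliath g ON g.id = d.id;

    -- With the full partition key (and partitionwise join enabled), expect
    -- one nested loop per matching pair of partitions.
    SET enable_partitionwise_join = on;
    EXPLAIN (COSTS OFF)
    SELECT * FROM david d
      LEFT JOIN goliath g ON g.id = d.id AND g.part_col = d.part_col;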
