Re: Index on (fixed size) bytea value

2023-06-20 Thread Les
>
>
> Then you would ALTER the column and SET STORAGE MAIN, so that it does not
> ever use TOAST.
>
> The size limit for a row would then be 8kB minus page header minus row
> header, which
> should be somewhere in the vicinity of 8140 bytes.
>
> If you want your block size to be a power of two, the limit would be 4kB,
> which would waste
> almost half your storage space.
>

Oh I see. So if I want to save some space for future columns, then storing
about 7500 bytes in the "block bytea" column would be close to optimal,
utilizing more than 90% of the block space. I guess the fillfactor
setting will have no effect on this table, so it does not matter whether I
set it or not.
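For reference, a minimal sketch of the setup discussed above (the table and
column names here are hypothetical, not from the thread):

```sql
-- Hypothetical table holding fixed-size binary blocks of ~7500 bytes,
-- so each row fits comfortably in one 8kB heap page.
CREATE TABLE file_block (
    file_id  bigint NOT NULL,
    block_no bigint NOT NULL,
    data     bytea,
    PRIMARY KEY (file_id, block_no)
);

-- STORAGE MAIN keeps the bytea inline in the heap page:
-- it still allows compression, but avoids out-of-line TOAST
-- storage as long as the row fits in the page.
ALTER TABLE file_block ALTER COLUMN data SET STORAGE MAIN;
```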


Re: Merge David and Goliath tables efficiently

2023-06-20 Thread nicolas paris
> This is absolutely expected. If you partition by hash (id, part_key),
> you can't join on (id) and expect partitionwise join to work. To
> quote
> the enable_partitionwise_join documentation [1]:
> 
>     Enables or disables the query planner's use of partitionwise
> join,
>     which allows a join between partitioned tables to be performed by
>     joining the matching partitions. Partitionwise join currently
>     applies only when the join conditions include all the partition
>     keys, which must be of the same data type and have one-to-one
>     matching sets of child partitions.
> 
> So the fact that
> 
>     merge into goliath using david on david.id = goliath.id
>     when matched then update set val = david.val
>     when not matched then insert (id, val) values (david.id,
> david.val);
> 
> does not work is absolutely expected. You need to join on part_col
> too.

Definitely, it makes sense to add part_col to the join columns.
It also helps the planner choose a better plan: with my current workload
so far, it now goes with a per-partition nested loop without my having to
trick the costs (via enable_hashjoin or random_page_cost).
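Concretely, the corrected statement looks like this (assuming, as elsewhere
in this thread, that the partition key column is named part_col):

```sql
-- MERGE joining on both id and the partition key, so that the join
-- condition covers the whole partition key and a partitionwise
-- join becomes possible.
MERGE INTO goliath
USING david
   ON david.id = goliath.id
  AND david.part_col = goliath.part_col
 WHEN MATCHED THEN
   UPDATE SET val = david.val
 WHEN NOT MATCHED THEN
   INSERT (id, part_col, val)
   VALUES (david.id, david.part_col, david.val);
```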



Thank you goliath


-- david




Re: Merge David and Goliath tables efficiently

2023-06-20 Thread Tomas Vondra
On 6/20/23 12:02, nicolas paris wrote:
>...
>
> Definitely this makes sense to add the part_col in the join columns.
> Also it helps the planner to choose a better plan, since now it goes
> with per partition nested loop without having to trick the costs
> (either enable_hashjoin/random_page_cost), with my current workload so
> far.
>

Right. With a non-partitionwise join, the nestloop inner lookup has to do
an index scan on every partition (it can't decide which of the partitions
will have a match, and for costing we assume there's at least 1 row in
each lookup). This essentially amplifies the amount of random I/O by a
factor of 100x (or whatever the number of partitions is).

That is, instead of doing 100x nested loops like this:

->  Nested Loop Left Join  (cost=0.29..33.42 rows=8 width=47)
  ->  Seq Scan on david_98 david_99  (cost=0.00..1.08
  ->  Index Scan using goliath_98_id_part_col_idx on
Index Cond: ((id = david_99.id) AND ...)

we end up doing one nested loop with an inner lookup like this:

->  Append  (cost=0.29..557.63 rows=100 width=14)
 ->  Index Scan using ... goliath_1  (cost=0.29..5.57 ...
 Index Cond: (id = david.id)
 ...

And this inner Append is executed once per loop, of which there'll be 500
(because the small david table has 500 rows).
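One way to check which plan shape you are getting is an EXPLAIN with
partitionwise join enabled (it is off by default, and only applies when the
join condition includes the entire partition key) — a sketch, again assuming
the part_col naming from this thread:

```sql
-- Partitionwise join must be enabled explicitly.
SET enable_partitionwise_join = on;

-- With part_col in the join condition, the plan should show one
-- nested loop per matching partition pair instead of an Append
-- over all partitions inside a single nestloop.
EXPLAIN (COSTS OFF)
MERGE INTO goliath
USING david
   ON david.id = goliath.id
  AND david.part_col = goliath.part_col
 WHEN MATCHED THEN
   UPDATE SET val = david.val
 WHEN NOT MATCHED THEN
   INSERT (id, part_col, val)
   VALUES (david.id, david.part_col, david.val);
```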


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company