Dear Sawada-san,
(Sending again because the previous attempt was blocked by some rules.)
I ran the performance test independently for the 0001 patch. Overall performance
looked very good: the new function took roughly constant (O(1)) time with respect
to the total number of tables. It seems good enough.
Source code:
----------------
HEAD (4287c50f) + v4-0001 patch.
Setup:
---------
A database cluster was set up with shared_buffers = 100GB. Several tables were
defined in the public schema, and the same number of tables in schema sch1.
The total number of tables was varied over {50, 500, 5000, 50000, 500000}.
A publication included the schema sch1 plus all public tables individually.
The attached script sets up the same environment; its suffix was changed to
.txt to pass the rule.
Workload Run:
--------------------
I ran two types of SQL and measured the execution time via the \timing
meta-command. They emulate the queries a tablesync worker would run.
Case 1: old SQL
```
SELECT DISTINCT
(CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)
THEN NULL ELSE gpt.attrs END)
FROM pg_publication p,
LATERAL pg_get_publication_tables(p.pubname) gpt,
pg_class c
WHERE gpt.relid = 17885 AND c.oid = gpt.relid
AND p.pubname IN ( 'pub' );
```
Case 2: new SQL
```
SELECT DISTINCT
(CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)
THEN NULL ELSE gpt.attrs END)
FROM pg_publication p,
LATERAL pg_get_publication_tables(p.pubname, 16535) gpt,
pg_class c
WHERE c.oid = gpt.relid
AND p.pubname IN ( 'pub' );
```
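For reference, the measurement loop can be sketched as below. This is an assumption, not the exact harness used: the function only generates the psql input (enable \timing, then run the new SQL ten times); the port and the relid 16535 are taken from the mail above.

```shell
#!/bin/sh
# Hypothetical sketch of the timing loop (not the exact harness used).
# gen_input emits psql input: "\timing on" followed by the new SQL,
# repeated ten times, so psql reports an execution time per run.
gen_input () {
    for i in $(seq 1 10); do
        printf '%s\n' '\timing on'
        cat <<'EOF'
SELECT DISTINCT
    (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)
          THEN NULL ELSE gpt.attrs END)
FROM pg_publication p,
     LATERAL pg_get_publication_tables(p.pubname, 16535) gpt,
     pg_class c
WHERE c.oid = gpt.relid
  AND p.pubname IN ('pub');
EOF
    done
}

# To actually measure against the running publisher:
#   gen_input | psql -U postgres -p 6633
```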
Result Observations:
---------------
The attached bar graph shows the result. A logarithmic scale is used for the
execution time (y-axis) so that both the small-scale and large-scale cases are
visible. With the old SQL, the execution time grew roughly 10x for each 10x
increase in tables (500->5000 and 5000->50000). In contrast, the execution time
of the new SQL stayed roughly constant regardless of the number of tables.
Detailed Result:
--------------
Each cell is the median of 10 runs.

Total tables | Old SQL [ms] | New SQL [ms]
          50 |         5.77 |         4.19
         500 |        15.75 |         4.28
        5000 |       120.39 |         4.22
       50000 |      1741.89 |         4.60
      500000 |     73287.16 |         4.95
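For completeness, the medians were extracted from the \timing output. A minimal sketch of that post-processing step follows (an assumption, since the actual script was not attached; it expects lines of the form "Time: 4.19 ms"):

```shell
#!/bin/sh
# Sketch: pull "Time: X ms" values out of psql \timing output on stdin
# and print their median. For an even number of samples, the mean of
# the two middle values is used.
median_ms () {
    grep '^Time:' | awk '{print $2}' | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) printf "%.2f\n", v[(NR + 1) / 2];
            else        printf "%.2f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2;
        }'
}

# Usage: median_ms < run.log
```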
Also, here is a small code comment: I think we can add an Assert at the
beginning of pg_get_publication_tables(), something like below.
```
@@ -1392,6 +1392,9 @@ pg_get_publication_tables(FunctionCallInfo fcinfo, ArrayType *pubnames,
 	FuncCallContext *funcctx;
 	List	   *table_infos = NIL;
+
+	Assert((pubnames && (!pubname && !OidIsValid(target_relid))) ||
+		   (!pubnames && (pubname && OidIsValid(target_relid))));
```
Best regards,
Hayato Kuroda
FUJITSU LIMITED
#!/bin/bash
####################
### Declarations ###
####################
## Publisher-related params
PORT_PUB=6633
DATA_PUB=data_pub
LOG_PUB=pub.log
## Number of runs
NUMRUN=1
## Measurement params
NUMTABLES=25000
# Setup an instance with above parameters.
function setup () {
################
### clean up ###
################
pg_ctl stop -D $DATA_PUB -w
rm -rf $DATA_PUB $LOG_PUB
#######################
### setup publisher ###
#######################
initdb -D $DATA_PUB -U postgres
cat << EOF >> $DATA_PUB/postgresql.conf
port=$PORT_PUB
autovacuum = false
shared_buffers = '100GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical
EOF
pg_ctl -D $DATA_PUB start -w -l $LOG_PUB
(
echo "CREATE SCHEMA sch1;"
echo "SELECT 'CREATE TABLE tab_' || generate_series(1, $NUMTABLES) || '(id int primary key)' \gexec"
echo "SELECT 'CREATE TABLE sch1.tab_' || generate_series(1, $NUMTABLES) || '(id int primary key)' \gexec"
echo "CREATE PUBLICATION pub FOR TABLES IN SCHEMA sch1;"
echo "SELECT 'ALTER PUBLICATION pub ADD TABLE tab_' || generate_series(1, $NUMTABLES) \gexec"
) | psql -U postgres -p $PORT_PUB
}
setup