> No. The subplan estimates are for the number of rows produced by one
> execution of the subplan, ie the numbers of "accounts" or "contracts"
> rows that match those inner WHERE conditions. This has very little
> a-priori relationship to the number of "transactions" rows that will
> satisfy the outer WHERE condition. If we knew that transactions.account
> and transactions.contract were unique columns, then we might be able
> to say that there shouldn't be more than one outer match per subplan
> result row ... but you didn't say that, and it seems unlikely.
Thanks Tom, I understand your point.
I don't want to waste your time but maybe there is room for improvement as both
"account" and "contract" are highly distinct and the individual subplan
estimates are quite accurate:
Seq Scan on transactions (cost=67.81..171458.63 rows=1301316 width=1206)
(actual time=69.418..917.594 rows=112 loops=1)
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
Rows Removed by Filter: 1792937
SubPlan 1
-> Bitmap Heap Scan on accounts (cost=33.96..61.76 rows=46 width=4)
(actual time=3.053..3.292 rows=111 loops=1)
Recheck Cond: ((name)::text ~~* '%test%'::text)
Rows Removed by Index Recheck: 4
Heap Blocks: exact=104
-> Bitmap Index Scan on s_accounts (cost=0.00..33.95 rows=46
width=0) (actual time=0.505..0.505 rows=118 loops=1)
Index Cond: ((name)::text ~~* '%test%'::text)
SubPlan 2
-> Seq Scan on contracts (cost=0.00..5.93 rows=5 width=4) (actual
time=2.531..2.836 rows=4 loops=1)
Filter: ((name)::text ~~* '%test%'::text)
Rows Removed by Filter: 272
For comparison here are the plans for the queries with the individual where
clauses:
SELECT * FROM "transactions" WHERE "account" IN (SELECT "ID" FROM "accounts"
WHERE "name" ~~* '%test%')
Nested Loop (cost=34.38..488.93 rows=155 width=1206) (actual time=0.599..1.393
rows=112 loops=1)
-> Bitmap Heap Scan on accounts (cost=33.96..61.76 rows=46 width=4) (actual
time=0.541..0.796 rows=111 loops=1)
Recheck Cond: ((name)::text ~~* '%test%'::text)
Rows Removed by Index Recheck: 4
Heap Blocks: exact=104
-> Bitmap Index Scan on s_accounts (cost=0.00..33.95 rows=46 width=0)
(actual time=0.521..0.521 rows=118 loops=1)
Index Cond: ((name)::text ~~* '%test%'::text)
-> Index Scan using fk_transactions_account on transactions
(cost=0.43..9.08 rows=21 width=1206) (actual time=0.004..0.005 rows=1 loops=111)
Index Cond: (account = accounts."ID")
SELECT * FROM "transactions" WHERE "contract" IN (SELECT "ID" FROM "contracts"
WHERE "name" ~~* '%test%')
Nested Loop (cost=3.76..10.10 rows=31662 width=1206) (actual time=0.082..0.082
rows=0 loops=1)
-> Bitmap Heap Scan on contracts (cost=3.64..5.74 rows=5 width=4) (actual
time=0.069..0.075 rows=4 loops=1)
Recheck Cond: ((name)::text ~~* '%test%'::text)
Heap Blocks: exact=2
-> Bitmap Index Scan on s_contracts (cost=0.00..3.64 rows=5 width=0)
(actual time=0.060..0.060 rows=4 loops=1)
Index Cond: ((name)::text ~~* '%test%'::text)
-> Index Scan using fk_transactions_contract on transactions
(cost=0.12..0.86 rows=1 width=1206) (actual time=0.001..0.001 rows=0 loops=4)
Index Cond: (contract = contracts."ID")
The statistics for the columns are:
SELECT attname, null_frac, n_distinct from pg_stats WHERE tablename =
'transactions' AND attname IN ('account', 'contract')
transactions.account: null_frac=0.025 n_distinct=80277
transactions.contract: null_frac=1 n_distinct=0 (there are basically
no non-null values for field "contract" in transactions)
According to pg_class.reltuples the table "transactions" has 1735088 rows.
I'd naively expect the selectivity for an OR of those two hashed subplans given
uniform distribution to be:
rows_total =
((rows_transactions * (1 - null_frac_account)) / n_distinct_account) *
expected_rows_from_subplan1 +
((rows_transactions * (1 - null_frac_contract)) / n_distinct_contract)
* expected_rows_from_subplan2
=> rows_total =
((1735088 * (1 - 0.025)) / 80277) * 46 +
((1735088 * (1 - 1)) / 0) * 5
=> rows_total = 969 + 0 /* no non-null values for contract field */
Please forgive the sloppy math but something along this line could be promising.
Btw, I don't quite understand why the nested loop on contract only is expected
to yield 31662 rows, when the null_frac of field transactions.contract is 1.
Shouldn't that indicate zero rows or some kind of default minimum estimate for
that query?
Thanks again!
Benjamin Coutu