More records after sort

2021-07-20 Thread Nicolas Seinlet
Hello,

I'm facing a strange behaviour and I don't understand why. From a wider query 
plan, https://explain.dalibo.com/plan/7yh ,a table is scanned and filter based 
on its primary key, returning 98 records. Then, those records are sorted on 
another field, and the output is 758,247,643 records.

What can make PostgreSQL to generate records while sorting? What can I do to 
understand why this is occuring?

   ->  Sort  (cost=228.52..228.76 rows=98 width=16) 
(actual time=1.502..31540.651 rows=758247643 loops=1)
 Sort Key: i.currency_id
 Sort Method: quicksort  Memory: 29kB
 ->  Index Scan using account_invoice_pkey on 
account_invoice i  (cost=0.29..225.28 rows=98 width=16) (actual 
time=0.035..1.481 rows=98 loops=1)
   Index Cond: (id = ANY 
('{3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,11736,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627}'::integer[]))

I can of course describe data model or full query.

Thanks again for reading me,

Nicolas.

publickey - nicolas@seinlet.com - 0xCAEB7FAF.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


Re: More records after sort

2021-07-20 Thread Tom Lane
Nicolas Seinlet  writes:
> I'm facing a strange behaviour and I don't understand why. From a wider query 
> plan, https://explain.dalibo.com/plan/7yh ,a table is scanned and filter 
> based on its primary key, returning 98 records. Then, those records are 
> sorted on another field, and the output is 758,247,643 records.

You haven't showed us the whole plan; tsk tsk.

However, I'm going to bet that this sort is the inner input of a merge
join, and that the apparent "extra" tuples are caused by the merge backing
up and rescanning the sort result to deal with duplicate join keys in its
outer input.  There must be a heck of a lot of duplicates.  The planner
would typically avoid using merge join if it knew that, so I'm wondering
if you have up-to-date statistics for the outer side of the join.

regards, tom lane




Re: More records after sort

2021-07-20 Thread Nicolas Seinlet
On Tuesday, July 20th, 2021 at 16:34, Tom Lane  wrote:

> Nicolas Seinlet nico...@seinlet.com writes:
> 

> > I'm facing a strange behaviour and I don't understand why. From a wider 
> > query plan, https://explain.dalibo.com/plan/7yh ,a table is scanned and 
> > filter based on its primary key, returning 98 records. Then, those records 
> > are sorted on another field, and the output is 758,247,643 records.
> 

> You haven't showed us the whole plan; tsk tsk.
> 

> However, I'm going to bet that this sort is the inner input of a merge
> 

> join, and that the apparent "extra" tuples are caused by the merge backing
> 

> up and rescanning the sort result to deal with duplicate join keys in its
> 

> outer input. There must be a heck of a lot of duplicates. The planner
> 

> would typically avoid using merge join if it knew that, so I'm wondering
> 

> if you have up-to-date statistics for the outer side of the join.
> 

> regards, tom lane

Many thanks for the fast response. The full query plan is below, and the Dalibo 
link points to it as well. You're right, it's a merge join. One table doesn't 
have up-to-date statistics, because the table is filled during the transaction 
involving this query.

A workaround we found, but we don't understand why and it's frustrating, is to 
remove a join from the query (the one with currency table) and replace it with 
a new column in the table for which the sort generates records. the currency 
table has up to date statistics, is vacuumed, ...

The table with no up-to-date statistics is involved in a where clause:
AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id)
AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id)

Query:
INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond)
SELECT il.id, ml.id, 48
FROM account_invoice_line il
JOIN account_invoice i ON i.id = il.invoice_id
JOIN account_move m ON m.id = i.move_id
JOIN account_move_line ml ON ml.move_id = m.id
JOIN res_company comp ON comp.id = i.company_id
WHERE il.display_type IS NULL
  AND ml.tax_line_id IS NULL
  AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE 
invl_id=il.id)
  AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE 
aml_id=ml.id)
  AND i.id = 
ANY(ARRAY[3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,11736,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627])
  AND il.account_id = ml.account_id AND (
 ARRAY(SELECT r.tax_id
 FROM account_invoice_line_tax r
WHERE r.invoice_line_id = il.id
 ORDER BY r.tax_id)
 =
 ARRAY(SELECT r.account_tax_id
 FROM account_move_line_account_tax_rel r
WHERE r.account_move_line_id = ml.id
 ORDER BY r.account_tax_id)
) AND
ROUND(il.price_subtotal - ml._mig_124_precomputed_amount,
  i.decimal_places) = 0.0
 AND il.product_id = ml.product_id;

Plan:
-
--
 Insert on invl_aml_mapping_temp  (cost=241.69..505.08 rows=1 width=12) (actual 
time=185635.849..185635.883 rows=0 loops=1)
   ->  Nested Loop Anti Join  (cost=241.69..505.08 rows=1 width=12) (actual 
time=185635.847..185635.880 rows=0 loops=1)
 ->  Nested Loop  (cost=241.27..504.61 rows=1 width=8) (actual 
time=185635.846..185635.873 rows=0 loops=1)
   ->  Nested Loop  (cost=241.15..504.43 rows=1 width=12) (actual 
time=185635.839..185635.866 rows=0 loops=1)
 Join Filter: (i.move_id = m.id)
 ->  Merge Join  (cost=240.73..503.98 rows=1 width=20) 
(actual time=185635.815..185635.836 rows=0 loops=1)
   Merge Cond: (res_currency.id = i.currency_id)
   Join Filter: ((il.invoice_id = i.id) AN

Re: More records after sort

2021-07-20 Thread Tom Lane
Nicolas Seinlet  writes:
> Many thanks for the fast response. The full query plan is below, and the 
> Dalibo link points to it as well. You're right, it's a merge join. One table 
> doesn't have up-to-date statistics, because the table is filled during the 
> transaction involving this query.

Sure looks like incorporating an ANALYZE into the transaction would help
matters.  This misestimation:

>->  Seq Scan on 
> account_invoice_line il  (cost=0.00..9885.48 rows=752 width=22) (actual 
> time=0.049..121.238 rows=150448 loops=1)
>  Filter: 
> (display_type IS NULL)

looks like it's really hurting you, and "ANALYZE account_invoice_line"
after loading that table ought to be enough to fix that.

regards, tom lane




Re: More records after sort

2021-07-20 Thread Rob Sargent
> Query:
> INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond)
>SELECT il.id, ml.id, 48
>FROM account_invoice_line il
>JOIN account_invoice i ON i.id = il.invoice_id
>JOIN account_move m ON m.id = i.move_id
>JOIN account_move_line ml ON ml.move_id = m.id
>JOIN res_company comp ON comp.id = i.company_id
>WHERE il.display_type IS NULL
>  AND ml.tax_line_id IS NULL
>  AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE 
> invl_id=il.id)
>  AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE 
> aml_id=ml.id)
>  AND i.id = 

Out of curiosity, would combining those two NOT EXISTS in one help? Ever? 
i.e. WHERE invl_id=il.id or aml_id = ml.id