delete statement returning too many results

2022-11-28 Thread Arlo Louis O'Keeffe
Hello everyone,

I am seeing weird behaviour of a delete statement that is returning more
results than I am expecting.

This is the query:

DELETE FROM queue
WHERE
    id IN (
        SELECT id
        FROM queue
        ORDER BY id
        LIMIT 1
        FOR UPDATE
        SKIP LOCKED
    )
RETURNING *;

My understanding is that the limit in the sub-select should prevent this query
from ever returning more than one result. Sadly I am seeing cases where there
is more than one result.

This repository has a Java setup that pretty reliably reproduces my issue:
https://github.com/ArloL/postgres-query-error-demo

I checked the docs for select and delete and couldn’t find any hint for cases
where the behaviour of limit might be surprising.

Am I missing something?

Thanks,

Arlo





Re: delete statement returning too many results

2022-12-04 Thread Arlo Louis O'Keeffe


> On 29. Nov 2022, at 18:35, Tom Lane wrote:
> 
> Harmen writes:
>> On Mon, Nov 28, 2022 at 12:11:53PM -0500, Tom Lane wrote:
>>> So basically it's unsafe to run the sub-select more than once,
>>> but the query as written leaves it up to the planner whether
>>> to do that.  I'd suggest rephrasing as [...]
> 
>> I'm not the original poster, but I do use similar constructions for simple
>> postgres queues. I've been trying for a while, but I don't understand where the
>> extra rows come from, or what's "silent" about SKIP LOCKED.
> 
> Sorry, I should not have blamed SKIP LOCKED in particular; this
> construction will misbehave with or without that.  The issue is with
> using SELECT FOR UPDATE inside a DELETE or UPDATE that then modifies
> the row that the subquery returned.  The next execution of the subquery
> will, or should, return a different row: either some not-deleted row,
> or the modified row.  So in this context, the result of the subquery
> is volatile.  The point of putting it in a MATERIALIZED CTE is to
> lock the result down regardless of that.
> 
>> Because we get different results depending on the plan postgres picks, I can
>> see two options: either the query is broken, or postgres is broken.
> 
> You can argue that the planner should treat volatile subqueries
> differently than it does today.  But the only reasonable way of
> tightening the semantics would be to force re-execution of such a
> subquery every time, even when it's not visibly dependent on the
> outer query.  That would be pretty bad for performance, and I doubt
> it would make the OP happy in this example, because what it would
> mean is that his query "fails" every time not just sometimes.
> (Because of that, I don't have too much trouble concluding that
> the query is broken, whether or not you feel that postgres is
> also broken.)
> 
> The bigger picture here is that we long ago decided that the planner
> should not inquire too closely into the volatility of subqueries,
> primarily because there are use-cases where people intentionally rely
> on them not to be re-executed.  As an example, these queries give
> different results:
> 
> regression=# select random() from generate_series(1,3);
>        random
> --------------------
>  0.7637195395988317
> 0.09569374432524946
>   0.490132093120365
> (3 rows)
> 
> regression=# select (select random()) from generate_series(1,3);
>        random
> --------------------
> 0.9730230633436501
> 0.9730230633436501
> 0.9730230633436501
> (3 rows)
> 
> In the second case, the sub-select is deemed to be independent
> of the outer query and executed only once.  You can argue that
> if that's what you want you should be forced to put the sub-select
> in a materialized CTE to make that plain.  But we felt that that
> would make many more people unhappy than happy, so we haven't
> done it.  Maybe the question could be revisited once all PG
> versions lacking the MATERIALIZED syntax are long dead. 
> 
> regards, tom lane

Thanks for the thorough explanation. That seems very reasonable.

The CTE query works well for my use case.
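
For anyone who finds this thread later, here is a minimal sketch of that kind of
rewrite (the exact query Tom suggested is trimmed in the quote above, so the CTE
name and phrasing below are my own; the MATERIALIZED keyword needs PostgreSQL 12
or later):

WITH next_row AS MATERIALIZED (
    -- sketch: materializing the CTE forces the sub-select to run exactly once
    SELECT id
    FROM queue
    ORDER BY id
    LIMIT 1
    FOR UPDATE
    SKIP LOCKED
)
DELETE FROM queue
WHERE id IN (SELECT id FROM next_row)
RETURNING *;

Because the CTE is materialized, the DELETE always works from the single locked
id, so it can no longer return more than one row.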

Thanks!