Re: jsonb_set() strictness considered harmful to data

2019-10-18 Thread Mark Felder



On Fri, Oct 18, 2019, at 12:37, Ariadne Conill wrote:
> Hello,
> 
> I am one of the primary maintainers of Pleroma, a federated social
> networking application written in Elixir, which uses PostgreSQL in
> ways that may be considered outside the typical usage scenarios for
> PostgreSQL.
> 
> Namely, we leverage JSONB heavily as a backing store for JSON-LD
> documents[1].  We also use JSONB in combination with Ecto's "embedded
> structs" to store things like user preferences.
> 
> The fact that we can use JSONB to achieve our design goals is a
> testament to the flexibility PostgreSQL has.
> 
> However, in the process of doing so, we have discovered a serious flaw
> in the way jsonb_set() functions, but upon reading through this
> mailing list, we have discovered that this flaw appears to be an
> intentional design.[2]
> 
> A few times now, we have written migrations that do things like copy
> keys in a JSONB object to a new key, to rename them.  These migrations
> look like so:
> 
>update users set info=jsonb_set(info, '{bar}', info->'foo');
> 
> Typically, this works nicely, except for cases where evaluating
> info->'foo' results in an SQL null being returned.  When that happens,
> jsonb_set() returns an SQL null, which then results in data loss.[3]
> 
> This is not acceptable.  PostgreSQL is a database that is renowned for
> data integrity, but here it is wiping out data when it encounters a
> failure case.  The way jsonb_set() should fail in this case is to
> simply return the original input: it should NEVER return SQL null.
> 
> But hey, we've been burned by this so many times now that we'd like to
> donate a useful function to the commons, consider it a mollyguard for
> the real jsonb_set() function.
> 
> create or replace function safe_jsonb_set(target jsonb, path
> text[], new_value jsonb, create_missing boolean default true) returns
> jsonb as $$
> declare
>   result jsonb;
> begin
>   result := jsonb_set(target, path, coalesce(new_value,
> 'null'::jsonb), create_missing);
>   if result is NULL then
> return target;
>   else
> return result;
>   end if;
> end;
> $$ language plpgsql;
> 
> This safe_jsonb_set() wrapper should not be necessary.  PostgreSQL's
> own jsonb_set() should have this safety feature built in.  Without it,
> using jsonb_set() is like playing russian roulette with your data,
> which is not a reasonable expectation for a database renowned for its
> commitment to data integrity.
> 
> Please fix this bug so that we do not have to hack around this bug.
> It has probably ruined countless people's days so far.  I don't want
> to hear about how the function is strict, I'm aware it is strict, and
> that strictness is harmful.  Please fix the function so that it is
> actually safe to use.
> 
> [1]: JSON-LD stands for JSON Linked Data.  Pleroma has an "internal
> representation" that shares similar qualities to JSON-LD, so I use
> JSON-LD here as a simplification.
> 
> [2]: 
> https://www.postgresql.org/message-id/flat/qfkua9$2q0e$1...@blaine.gmane.org
> 
> [3]: https://git.pleroma.social/pleroma/pleroma/issues/1324 is an
> example of data loss induced by this issue.
> 
> Ariadne
>

This should be directed towards the hackers list, too.

What will it take to change the semantics of jsonb_set()? MySQL implements safe 
behavior here. It's a real shame Postgres does not. I'll offer a $200 bounty to 
whoever fixes it. I'm sure it's destroyed more than $200 worth of data and 
people's time by now, but it's something.


Kind regards,



-- 
  Mark Felder
  ports-secteam & portmgr alumni
  f...@freebsd.org




Re: jsonb_set() strictness considered harmful to data

2019-10-28 Thread Mark Felder



On Mon, Oct 28, 2019, at 08:52, Andrew Dunstan wrote:
> 
> For release 13+, I have given some more thought to what should be done.
> I think the bar for altering the behaviour of a function should be
> rather higher than we have in the present case, and the longer the
> function has been sanctioned by time the higher the bar should be.
> However, I think there is a case to be made for providing a non-strict
> jsonb_set type function. To advance th4e discussion, attached is a POC
> patch that does that. This can also be done as an extension, meaning
> that users of back branches could deploy it immediately. I've tested
> this against release 12, but I think it could go probably all the way
> back to 9.5. The new function is named jsonb_ set_lax, but I'm open to
> bikeshedding.
> 
> 

Thank you Andrew, and I understand the difficulty in making changes to 
functions that already exist in production deployments. An additional function 
like this would be helpful to many.


-- 
  Mark Felder
  ports-secteam & portmgr alumni
  f...@freebsd.org




ts_parse reports different between MacOS, FreeBSD/Linux

2020-12-22 Thread Mark Felder
Hello,

We have an application whose test suite fails on MacOS when running the search 
tests on unicode characters.

I've narrowed it down to the following:

macos=# select * from ts_parse('default','天');
 tokid | token
---+---
12 | 天
(1 row)

freebsd=# select * from ts_parse('default','天');
 tokid | token
---+---
 2 | 天
(1 row)


This has been bugging me for a while, but it's a test our devs using MacOS just 
ignores for now as we know it passes our CI/CD pipeline on FreeBSD/Linux. It 
seems if anyone is shipping an app on MacOS and bundling Postgres they're going 
to have a bad time with searching.


Please let me know if there's anything I can do to help. Will gladly test 
patches.



Thanks,



-- 
  Mark Felder
  ports-secteam & portmgr alumni
  f...@freebsd.org