Re:

2022-06-18 Thread Peter J. Holzer
On 2022-06-17 10:38:57 +0800, Abdul Qoyyuum wrote:
> Best if you just maintain the logs into a log file instead of a database
> table. 

That depends on what you want to use those logs for.

If you just want to look at them when something goes wrong, I agree:
Logs files are simple, fast to write, take little space (and usually
compress well), can be trimmed with a text editor and attached to a
ticket or sent by mail ...

But if you want to process them further (generate statistics or other
reports, anonymize them, combine them with data from other sources,
...), it is useful to store them in some structured format (aka a
"database"). That doesn't necessarily mean a relational database, it
could be a nosql database (for example, ElasticSearch is very popular
for storing logs), but if you are already using PostgreSQL in your
project, using it for logs suggests itself.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-18 Thread Alban Hertroys


> On 18 Jun 2022, at 2:14, Bryn Llewellyn  wrote:
> 
> I implemented two complementary functions:
> 
> —"no_null_keys()" checks that a "jsonb" value has no occurrences of « "some 
> key": null »
> 
> —"strip_null_keys()" removes « "some key": null » occurrences from a "jsonb" 
> value
> 
> The code checks with "no_null_keys()" that, as expected, no ingested JSON 
> document has an occurrence of « "some key": null ».
> 
> And it uses "strip_null_keys()" on the output of "to_jsonb()" — and, as 
> appropriate, any other built-in JSON function that produces a "jsonb" value.
> 
> It was straightforward to implement these two functions by using REGEXP 
> built-in functionality on the canonically formatted "text" value produced by 
> the "jsonb::text" typecast.

In my experience, using regular expressions applied to document formats tends 
to get you false positives. I’d be worried about books with titles similar to 
'How we wrote a regular expression to detect occurrences of "some key": null in 
our JSON documents', for example.

For stripping those null occurrences, you are aware of the 
json_strip_nulls(json) and jsonb_strip_nulls(jsonb) functions, right?

For detecting them on a recent PG, the @? operator or json_path_exists(json, 
jsonpath) functions would probably do the trick.
I am not too familiar with JSONPATH expressions, but I expect (it passed some 
preliminary testing) this would detect your nulls just fine, while taking JSON 
semantics into account:

jsonb_path_exists(document, '$.** ? (@ == null)'::jsonpath)

For PG-specifics on JSONPATH, see section 9.16.2 on: 
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-OP-TABLE

A recursive query is another possible solution. It would probably perform far 
worse, but I find them more rewarding to write. Some people prefer Sodoku.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: ISBN (was: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?)

2022-06-18 Thread Bryn Llewellyn
> hjp-pg...@hjp.at wrote:
> 
>> Bryn wrote:
>> 
>> "isbn" — string
>> values must be unique across the entire set of documents (in other words, it
>> defines the unique business key); values must have this pattern:
>> 
>>  « ^[0-9]{3}-[0-9]{1}-[0-9]{2}-[0-9]{6}-[0-9]{1}$ »
> 
> Completely off-topic, but this regexp doesn't describe ISBNs. In ISBNs the 
> three middle subfields are all variable length. The first is a language code 
> (there are more than 10 languages in the world), the second identifies the 
> publisher (there are more than 100 publishers) and the third the book. For 
> example, "Mastering PostgreSQL 9.6" has the ISBN 978-1-78355-535-2.

Yes, I know. Sorry. I should have said that I simply wanted to illustrate a 
proof of concept for the notion, viable (only?) when you have incoming JSON 
documents with a well-defined schema, that this is idempotent:

  JSON → relational → JSON

And it's of note that PostgreSQL has had what you need to do the xform, in each 
direction, for a long time.

I simply typed up my "corpus" by hand. It didn’t matter that the ISBN format 
was a fairly tale. A book does have an ISBN and (as far as my P.o.C. is 
concerned) it's the unique business key. I should have cut out the REGEXP in my 
email and said something like "must conform to the specified format". But I was 
careless in my proof reading.

Thanks for not mentioning that books have publishers—so that "authors" isn't 
the only key that implies a Codd-and-Date many-to-many!



Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-18 Thread Bryn Llewellyn
> haram...@gmail.com wrote
> 
>> b...@yugabyte.com wrote:
>> 
>> I implemented two complementary functions:
>> 
>> —"no_null_keys()" checks that a "jsonb" value has no occurrences of « "some 
>> key": null »
>> 
>> —"strip_null_keys()" removes « "some key": null » occurrences from a "jsonb" 
>> value
>> 
>> The code checks with "no_null_keys()" that, as expected, no ingested JSON 
>> document has an occurrence of « "some key": null ».
>> 
>> And it uses "strip_null_keys()" on the output of "to_jsonb()" — and, as 
>> appropriate, any other built-in JSON function that produces a "jsonb" value.
>> 
>> It was straightforward to implement these two functions by using REGEXP 
>> built-in functionality on the canonically formatted "text" value produced by 
>> the "jsonb::text" typecast.
> 
> In my experience, using regular expressions applied to document formats tends 
> to get you false positives. I’d be worried about books with titles similar to:
> 
> «
> How we wrote a regular expression to detect occurrences of "some key": null 
> in our JSON documents
> »
> 
> For stripping those null occurrences, you are aware of the 
> json_strip_nulls(json) and jsonb_strip_nulls(jsonb) functions, right?
> 
> For detecting them on a recent PG, the @? operator or json_path_exists(json, 
> jsonpath) functions would probably do the trick.
> I am not too familiar with JSONPATH expressions, but I expect (it passed some 
> preliminary testing) this would detect your nulls just fine, while taking 
> JSON semantics into account:
> 
> jsonb_path_exists(document, '$.** ? (@ == null)'::jsonpath)
> 
> For PG-specifics on JSONPATH, see section 9.16.2: 
> https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-OP-TABLE
> 
> A recursive query is another possible solution. It would probably perform far 
> worse...

Thank you so much, Alban, for taking an interest in my « "some key": null » 
saga.

The "G", "E", "L", and "O" keys on my keyboard were broken. They're fixed now, 
and so "postgres strip keys with null value from jsonb object" got me to this 
(but, strangely, not to the PG docs):

docs.yugabyte.com/preview/api/ysql/datatypes/type_json/functions-operators/jsonb-strip-nulls/

This is embarrassing. I wrote that doc, along with the rest of the content in 
the "JSON data types and functionality" section just before COVID hit us. 
Apparently, life under lockdown has damaged my corpus callosum. I typed up my 
proof-of-concept code and emails with one half of my brain—and it was the other 
half that wrote that documentation.

So thanks for the memory jog. My excuse is that (as was the case with my 
malformed ISBN that Peter Holzer pointed out in a separate thread) I wanted 
just to show myself, as a proof-of-concept, that stripping nulls was 
feasible—so I gave it no more thought once I'd done that. But, I suppose, 
that's not excuse...

Anyway, my "strip_null_keys()" is already on the scrapheap. And the body of my 
"no_null_keys()" reduced to a single line:

create function no_null_keys(j in jsonb)
  returns boolean
  immutable
  language sql
as $body$
  select j = jsonb_strip_nulls(j);
$body$;

You might argue that I don't need to bother with the encapsulation. But it 
makes testing easier—and I'm trusting that inlining works as advertised.

Your point about false positives is well taken. So, just for sport:

create type t1 as (k int, v text);
create type t2 as (a int, b int, c t1, d t1, e text[]);

create function j()
  returns jsonb
  language plpgsql
as $body$
declare
  tconstant textnot null := 'How we wrote a regular expression to 
detect occurrences of « "some key": null » in our JSON documents!';
  c1   constant t1  not null := (17, t);
  c2   constant t1  not null := (29, null);
  arr  constant text[]  not null := array['x', null::text, t];
  rconstant t2  not null := (42, null, c1, c2, arr);
begin
  return to_jsonb(r);
end;
$body$;
select jsonb_pretty(j());

The output includes two occurrences of this:

"How we wrote a regular expression to detect occurrences of « \"some key\": 
null » in our JSON documents!"

I believe that the "jsonb" to "text" conversion never produces an isolated 
double-quote within the representation of an object key's value. I checked that 
my "strip_null_keys()" handled your example before consigning it to the 
scrapheap—and it didn't let me down. But it would be foolish to argue that 
there isn't some way to provoke a false positive.