Re: TEXT column > 1Gb

2023-04-12 Thread Mark Dilger



> On Apr 12, 2023, at 7:59 AM, Joe Carlson  wrote:
> 
> The use case is genomics. Extracting substrings is common. So going to 
> chunked storage makes sense.

Are you storing nucleotide sequences as text strings?  If using the simple 
4-character (A,C,G,T) alphabet, you can store four bases per byte.  If using a 
nucleotide code 16-character alphabet you can still get two bases per byte.  An 
amino acid 20-character alphabet can be stored at 8 residues per 5 bytes, and so 
forth.  Such a representation might allow you to store sequences two or four 
times longer than the limit you currently hit, but then you are still at an 
impasse.  Would a factor of 2x or 4x be enough for your needs?
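As a sketch of the 2-bit idea (illustrative only, not anything from Joe's actual code; the pack/unpack names here are hypothetical):

```python
# Sketch: pack the simple 4-letter alphabet (A, C, G, T) at four bases per byte.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq: str) -> bytes:
    """Encode an ACGT string at 2 bits per base, left-aligned in each byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        b = 0
        for ch in chunk:
            b = (b << 2) | CODE[ch]
        b <<= 2 * (4 - len(chunk))  # left-align a partial final byte
        out.append(b)
    return bytes(out)

def unpack(data: bytes, n: int) -> str:
    """Decode; n is the original length, needed to drop padding bases."""
    bases = []
    for b in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(b >> shift) & 0x3])
    return "".join(bases[:n])
```

Note the original length must be stored alongside the packed bytes, since a partial final byte is padded.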

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Struggling with EXCLUDE USING gist

2021-06-04 Thread Mark Dilger



> On Jun 4, 2021, at 9:47 AM, Laura Smith  
> wrote:
> 
> CREATE TABLE test (
>   t_val text not null,
>   t_version text unique not null default gen_random_uuid(),
>   t_range tstzrange not null default tstzrange('-infinity','infinity'),
>   EXCLUDE USING gist (t_val WITH =, t_range WITH &&) DEFERRABLE INITIALLY 
>   DEFERRED
> );



> INSERT INTO test(t_val) values(p_val);

This will insert a t_range of ('-infinity','infinity'), won't it?  Wouldn't you 
want to instead insert with t_range starting around now() rather than starting 
at -infinity?
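For instance, something like this (a sketch; it reuses p_val and the column names from the quoted DDL, and assumes the intent is "valid from now until further notice"):

```sql
-- Sketch: open the new version's range at now() instead of -infinity.
-- Not tested against the actual schema.
INSERT INTO test (t_val, t_range)
VALUES (p_val, tstzrange(now(), 'infinity'));
```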

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Struggling with EXCLUDE USING gist

2021-06-04 Thread Mark Dilger



> On Jun 4, 2021, at 11:55 AM, Laura Smith  
> wrote:
> 
> That seems to have done the trick. Thanks again Mark

Glad to hear it.  Good luck.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Do table-level CHECK constraints affect the query optimizer?

2021-06-29 Thread Mark Dilger



> On Jun 29, 2021, at 10:33 AM, Ron  wrote:
> 
> Prod is brand new.  Loaded on Saturday; we saw this problem on Sunday during 
> pre-acceptance.  Thus, while running ANALYZE was top of the list of Things To 
> Do, running VACUUM was low.
> 
> Is that a mistaken belief?

You might want to run VACUUM FREEZE and then retry your test query using 
EXPLAIN.  See if it switches to an index only scan after the VACUUM FREEZE.
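Something along these lines (hypothetical table and column names; adjust to your schema):

```sql
VACUUM FREEZE mytable;            -- also sets the visibility map bits
EXPLAIN (ANALYZE)
SELECT indexed_col FROM mytable WHERE indexed_col BETWEEN 1 AND 100;
-- Look for "Index Only Scan" and a low "Heap Fetches" count in the plan.
```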

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Do table-level CHECK constraints affect the query optimizer?

2021-06-29 Thread Mark Dilger



> On Jun 29, 2021, at 11:02 AM, Ron  wrote:
> 
> What's an IOS?

An Index Only Scan.  See 
https://www.postgresql.org/docs/14/indexes-index-only-scans.html

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [E] Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Mark Dilger



> On Aug 20, 2021, at 9:52 AM, Tom Lane  wrote:
> 
> "a*" is easy.  "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.

The following queries take radically different time to run:

\timing
select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1)+(?=$|,)',
  '\1', -- replacement
  'g'  -- apply globally (all matches)
);

Time: 16476.529 ms (00:16.477)

select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1){5}(?=$|,)',
  '\1', -- replacement
  'g'  -- apply globally (all matches)
);

Time: 1.452 ms

The only difference in the patterns is the + vs. the {5}.  It looks to me like 
the first pattern should greedily match five ",one" matches and be forced to 
stop since ",someone" doesn't match, and the second pattern should grab the 
five ",one" matches it was told to grab and not try to grab the ",someone", but 
other than that, they should be performing the same work.  I don't see why the 
performance should be so different.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [E] Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Mark Dilger



> On Aug 20, 2021, at 12:51 PM, Miles Elam  wrote:
> 
> Unbounded ranges seem like a problem.

Seems so.  The problem appears to be in regcomp.c's repeat() function, which 
handles {1,SOME} differently than {1,INF}.

> Seems worth trying a range from 1 to N where you play around with N to find 
> your optimum performance/functionality tradeoff. {1,20} is like '+' but 
> clamps at 20.

For any such value (5, 20, whatever) there can always be a string with more 
repeated words than the number you've chosen, and the call to regexp_replace 
won't do what you want.  There is also an upper bound at work, because values 
above 255 will draw a regex compilation error.  So it seems worth a bit of work 
to determine why the regex engine has bad performance in these cases.

It sounds like the OP will be working around this problem by refactoring to 
call regexp_replace multiple times until all repeats are eradicated, but I 
don't think such workarounds should be necessary.
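A sketch of that workaround, looping a bounded pattern to a fixed point (illustrative only; the pattern is the bounded variant of the one shown earlier in the thread, and the function name is made up):

```sql
-- Sketch: apply the bounded replacement repeatedly until nothing changes,
-- sidestepping the slow unbounded {1,INF} case.
CREATE OR REPLACE FUNCTION dedupe_repeats(s text) RETURNS text AS $$
DECLARE
  prev text;
BEGIN
  LOOP
    prev := s;
    s := regexp_replace(s, '(?<=^|,)([^,]+)(?:,\1){1,5}(?=$|,)', '\1', 'g');
    EXIT WHEN s = prev;  -- fixed point reached
  END LOOP;
  RETURN s;
END
$$ LANGUAGE plpgsql;
```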

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: gen_random_uuid key collision

2021-09-02 Thread Mark Dilger



> On Sep 2, 2021, at 4:25 PM, jesusthefrog  wrote:
> 
> Anyone have any thoughts on this?

I agree with Peter's suggestion upthread to run amcheck on the index, but if 
that comes back with no corruption, can you verify that there are no rules or 
triggers that might cause multiple copies of the rows to be inserted?  
Likewise, can you verify that you have no replication subscriptions that could 
be putting duplicates into the table?

Another idea, though it seems unlikely given your lack of trouble with other 
tables: check whether you have any functions that reset the seed for your 
random generator.  I haven't looked specifically at your uuid generator, and I 
don't know whether it gets nondeterministic randomness from /dev/random or 
similar, but deterministic random generators can be made to produce the same 
sequence again if the seed is reset.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mark Dilger



> On Oct 26, 2021, at 12:05 AM, Mitar  wrote:
> 
> Hi!
> 
> I have a trigger like:
> 
> CREATE TRIGGER update_trigger AFTER UPDATE ON my_table REFERENCING NEW
> TABLE AS new_table OLD TABLE AS old_table FOR EACH STATEMENT EXECUTE
> FUNCTION trigger_function;
> 
> I would like to test inside trigger_function if the table really
> changed. I have tried to do:
> 
> PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table) UNION ALL
> (TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
> IF FOUND THEN
>  ... changed ...
> END IF;
> 
> But this fails if the table contains a JSON field with the error:
> 
> could not identify an equality operator for type json
> 
> The table has an unique index column, if that helps.

I can't tell from your post if you want the trivial update to be performed, but 
if not, would it work to filter trivial updates as:

CREATE RULE filter_trivial_updates AS ON UPDATE TO my_table WHERE new.i = old.i 
AND new.j = old.j AND ... DO INSTEAD NOTHING;

You could replace the i, j, ... above with whichever columns you have, and 
specify the casts and equality operators you want for the json column (such as 
a cast to jsonb and jsonb equality).

The advantage here, if you do it right, is that the trigger doesn't have to 
check whether the row has changed, because the trigger will only fire when a 
change has occurred.  You might try it and compare the performance against 
other solutions.  The general idea is shown here:

rules=# create table my_table (i integer, j json);
CREATE TABLE
rules=# insert into my_table
rules-#   select gs::integer, '{"key":1}'::json
rules-# from generate_series(1,3) gs;
INSERT 0 3
rules=# create function my_table_func () returns trigger as $$
rules$# begin
rules$#   raise warning '[old.i=%, old.j=%] => [new.i=%, new.j=%]',
rules$# old.i, old.j, new.i, new.j;
rules$#   return new;
rules$# end
rules$# $$ language plpgsql;
CREATE FUNCTION
rules=# create trigger my_table_trig before update on my_table
rules-#   for each row execute function my_table_func();
CREATE TRIGGER
rules=# update my_table set j = '{"key":2}'::jsonb;
WARNING:  [old.i=1, old.j={"key":1}] => [new.i=1, new.j={"key": 2}]
WARNING:  [old.i=2, old.j={"key":1}] => [new.i=2, new.j={"key": 2}]
WARNING:  [old.i=3, old.j={"key":1}] => [new.i=3, new.j={"key": 2}]
UPDATE 3
rules=# create rule filter_trivial_updates as on update to my_table
rules-#   where new.i = old.i
rules-# and new.j::jsonb = old.j::jsonb
rules-#   do instead nothing;
CREATE RULE
rules=# update my_table set j = '{"key":2}'::jsonb;
UPDATE 0
rules=# update my_table set j = '{"key":3}'::jsonb;
WARNING:  [old.i=1, old.j={"key": 2}] => [new.i=1, new.j={"key": 3}]
WARNING:  [old.i=2, old.j={"key": 2}] => [new.i=2, new.j={"key": 3}]
WARNING:  [old.i=3, old.j={"key": 2}] => [new.i=3, new.j={"key": 3}]
UPDATE 3

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mark Dilger



> On Oct 26, 2021, at 1:34 PM, Mitar  wrote:
> 
> Hi!
> 
> On Tue, Oct 26, 2021 at 10:17 PM Mark Dilger
>  wrote:
>> I can't tell from your post if you want the trivial update to be performed, 
>> but if not, would it work to filter trivial updates as:
> 
> No, I want to skip trivial updates (those which have not changed
> anything). But my trigger is per statement, not per row. So I do not
> think your approach works there? So this is why I am then making a
> more complicated check inside the trigger itself.

The trigger "my_table_trig" in the example is a per row trigger, but it exists 
only to demonstrate that the rule has filtered out the appropriate rows.  You 
can use the rule "my_table_rule" as written and a per statement trigger, as 
here:

rules=# create table my_table (i integer, j json);
CREATE TABLE
rules=# insert into my_table
rules-#   select gs::integer, '{"key":1}'::json
rules-# from generate_series(1,3) gs;
INSERT 0 3
rules=# create function my_table_func () returns trigger as $$
rules$# declare
rules$#   have_rows boolean;
rules$# begin
rules$#   select true into have_rows from old_values limit 1;
rules$#   if have_rows then
rules$# raise warning 'rows have changed';
rules$#   else
rules$# raise warning 'no rows changed';
rules$#   end if;
rules$#   return null;
rules$# end
rules$# $$ language plpgsql;
CREATE FUNCTION
rules=# create trigger my_table_trig after update on my_table
rules-#   referencing old table as old_values
rules-#   for each statement
rules-#   execute function my_table_func();
CREATE TRIGGER
rules=# update my_table set j = '{"key":2}'::jsonb;
2021-10-26 13:51:58.139 PDT [34352] WARNING:  rows have changed
2021-10-26 13:51:58.139 PDT [34352] CONTEXT:  PL/pgSQL function my_table_func() 
line 7 at RAISE
WARNING:  rows have changed
UPDATE 3
rules=# create rule filter_trivial_updates as on update to my_table
rules-#   where new.i = old.i
rules-# and new.j::jsonb = old.j::jsonb
rules-#   do instead nothing;
CREATE RULE
rules=# update my_table set j = '{"key":2}'::jsonb;
2021-10-26 13:51:58.143 PDT [34352] WARNING:  no rows changed
2021-10-26 13:51:58.143 PDT [34352] CONTEXT:  PL/pgSQL function my_table_func() 
line 9 at RAISE
WARNING:  no rows changed
UPDATE 0
rules=# update my_table set j = '{"key":3}'::jsonb;
2021-10-26 13:51:58.143 PDT [34352] WARNING:  rows have changed
2021-10-26 13:51:58.143 PDT [34352] CONTEXT:  PL/pgSQL function my_table_func() 
line 7 at RAISE
WARNING:  rows have changed
UPDATE 3

Note that there is a performance cost to storing the old rows using the 
REFERENCING clause of the trigger, so you'll want to think about all your 
various options and decide between them.  I am not in a position to make 
performance recommendations for your schema.  However, if updates tend to be 
targeted at small sets of rows, and if the rule is used to further filter out 
trivial updates, this might be cheap.

Note that I used equality and inequality rather than IS DISTINCT FROM and IS 
NOT DISTINCT FROM in the design, but you should think about how NULL values 
(old, new, or both) will behave in the solution you choose.
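The difference matters because = yields NULL, not false, when either operand is NULL, so a rule conditioned on new.j = old.j will not suppress an update that sets a value to or from NULL.  A quick illustration:

```sql
SELECT NULL = NULL;                     -- null, not true
SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
SELECT 1 IS DISTINCT FROM NULL;         -- true
```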

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mark Dilger



> On Oct 26, 2021, at 3:39 PM, Mitar  wrote:
> 
> On Tue, Oct 26, 2021 at 10:55 PM Mark Dilger
>  wrote:
>> Note that there is a performance cost to storing the old rows using the 
>> REFERENCING clause of the trigger
> 
> Yea, by moving the trivial update check to a rule, I need REFERENCING
> only to see if there were any changes at all. This seems a bit
> excessive. Is there a way to check if any rows have been affected by
> an UPDATE inside a per statement trigger without using REFERENCING?

I felt the same way about it, but after glancing quickly through the code and 
docs nothing jumped out.  The information is clearly available, as it gets 
returned at the end of the UPDATE statement in the "UPDATE 0" or "UPDATE 3", 
but I don't see how to access that from the trigger.  I might have to submit a 
patch for that if nobody else knows a way to get it.  (Hopefully somebody will 
respond with the answer...?)

> I also notice that you check if a table has any rows with:
> 
> SELECT true INTO have_rows FROM old_values LIMIT 1;
> IF have_rows THEN ...
> 
> Is this just a question of style or is this a better approach than my:
> 
> PERFORM * FROM old_values LIMIT 1;
> IF FOUND THEN ...

There is no reason to prefer my spelling of that over yours.  I didn't put much 
thought into it, but rather just wrote it the first way that occurred to me.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Determining if a table really changed in a trigger

2021-10-26 Thread Mark Dilger



> On Oct 26, 2021, at 4:01 PM, Michael Lewis  wrote:
> 
> Does this perform differently from suppress_redundant_updates_trigger?
> 
> https://www.postgresql.org/docs/current/functions-trigger.html

If Mitar finds that suppress_redundant_updates_trigger is sufficient, that may 
be a simpler solution.  Thanks for mentioning it.

The suppress_redundant_updates_trigger uses memcmp on the old and new rows.  I 
don't know if memcmp will be sufficient in this case, since json can be binary 
unequal and yet turn out to be equal once cast to jsonb.  I was using the rule 
and casting the json column to jsonb before comparing for equality.
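The hazard is easy to demonstrate: these two values are semantically equal but byte-for-byte different, so a memcmp-style comparison sees a change:

```sql
SELECT '{"a":1, "b":2}'::json::text = '{"b":2,"a":1}'::json::text;
-- false: json preserves the input bytes
SELECT '{"a":1, "b":2}'::jsonb = '{"b":2,"a":1}'::jsonb;
-- true: jsonb normalizes key order and whitespace first
```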
 
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Mark Dilger



> On Nov 9, 2021, at 8:02 AM, Avi Weinberg  wrote:
> 
>   • A third column is added to table with default value on publisher 
> side, but without default value on subscriber side
>   • The default value column has value for existing rows on publisher, 
> but null on the subscriber side. 

See https://www.postgresql.org/docs/14/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN

The important part is this TIP:

From PostgreSQL 11, adding a column with a constant default value no longer 
means that each row of the table needs to be updated when the ALTER TABLE 
statement is executed. Instead, the default value will be returned the next 
time the row is accessed, and applied when the table is rewritten, making the 
ALTER TABLE very fast even on large tables.
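So, for a constant default, running the same command on the subscriber is cheap even on a large table (sketch; table and column names here are hypothetical):

```sql
-- PostgreSQL 11+: no table rewrite; existing rows get the default lazily.
ALTER TABLE big_table ADD COLUMN status text DEFAULT 'active';
```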

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Mark Dilger



> On Nov 9, 2021, at 1:24 PM, Michael Lewis  wrote:
> 
> Is there any advantage to not defining the default on the replica? If it is 
> not a static value and the publishing database will trigger row updates, I 
> could see waiting to set the default until after the table re-write is done, 
> but otherwise there doesn't seem to be any benefit to skipping column 
> defaults on subscribers.

That's a schema design decision.  I can't really make recommendations in the 
abstract.  As a general rule, I would expect that the DDL executed on the 
publisher (such as the ALTER TABLE..ADD COLUMN..DEFAULT..) would be the same as 
that executed on the subscriber, unless there is a particular reason to want 
different behavior on the subscriber.

Wanting different behavior is not unreasonable.  For example, if the subscriber 
exists merely to archive data from the publisher, the subscriber might not 
bother creating indexes over that data, even if such indexes exist on the 
publisher. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company