Re: TEXT column > 1Gb
> On Apr 12, 2023, at 7:59 AM, Joe Carlson wrote:
>
> The use case is genomics. Extracting substrings is common. So going to
> chunked storage makes sense.

Are you storing nucleotide sequences as text strings? If using the simple
4-character (A,C,G,T) alphabet, you can store four bases per byte. If using
a 16-character nucleotide code alphabet, you can still get two bases per
byte. An amino acid 20-character alphabet can be stored 8 residues per 5
bytes, and so forth. Such a representation might allow you to store
sequences two or four times longer than the limit you currently hit, but
then you are still at an impasse. Would a factor of 2x or 4x be enough for
your needs?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
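As an illustration of the 2-bit packing idea, here is a minimal plpgsql
sketch. The function name, error handling, and zero-padding of the final
byte are assumptions for the sketch, not anything from the thread; a
production version would more likely be a C extension, and the original
sequence length would need to be stored separately to undo the padding.

CREATE FUNCTION pack_nucleotides(seq text) RETURNS bytea AS $$
DECLARE
    result bytea := '';
    acc int := 0;   -- bit accumulator for the byte being built
    n int := 0;     -- bases packed into acc so far
    code int;
BEGIN
    FOR i IN 1 .. length(seq) LOOP
        code := CASE substr(seq, i, 1)
                    WHEN 'A' THEN 0 WHEN 'C' THEN 1
                    WHEN 'G' THEN 2 WHEN 'T' THEN 3
                END;
        IF code IS NULL THEN
            RAISE EXCEPTION 'unexpected character "%" at position %',
                substr(seq, i, 1), i;
        END IF;
        acc := (acc << 2) | code;
        n := n + 1;
        IF n = 4 THEN
            result := result || set_byte('\x00'::bytea, 0, acc);
            acc := 0;
            n := 0;
        END IF;
    END LOOP;
    IF n > 0 THEN
        -- zero-pad the final partial byte
        result := result || set_byte('\x00'::bytea, 0, acc << (2 * (4 - n)));
    END IF;
    RETURN result;
END
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

For example, pack_nucleotides('ACGT') yields \x1b (the bits 00 01 10 11).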
Re: Struggling with EXCLUDE USING gist
> On Jun 4, 2021, at 9:47 AM, Laura Smith wrote:
>
> CREATE TABLE test (
>   t_val text not null,
>   t_version text unique not null default gen_random_uuid(),
>   t_range tstzrange not null default tstzrange('-infinity','infinity'),
>   EXCLUDE USING gist (t_val WITH =, t_range WITH &&) DEFERRABLE INITIALLY DEFERRED
> );
>
> INSERT INTO test(t_val) values(p_val);

This will insert a t_range of ('-infinity','infinity'), won't it? Wouldn't
you want to instead insert with t_range starting around now() rather than
starting at -infinity?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
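If the goal is row versioning, one hedged sketch of the insert path (reusing
the p_val variable from the quoted function, and assuming the previous
version should be closed out at now()) would be:

-- Close the open-ended range on the current version, if any.
UPDATE test
   SET t_range = tstzrange(lower(t_range), now())
 WHERE t_val = p_val
   AND upper(t_range) = 'infinity';

-- Insert the new version, valid from now() onward.
INSERT INTO test (t_val, t_range)
VALUES (p_val, tstzrange(now(), 'infinity'));

Because the EXCLUDE constraint is declared DEFERRABLE INITIALLY DEFERRED,
the overlap check is not applied until commit, so both statements can run in
one transaction.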
Re: Struggling with EXCLUDE USING gist
> On Jun 4, 2021, at 11:55 AM, Laura Smith wrote:
>
> That seems to have done the trick. Thanks again Mark

Glad to hear it. Good luck.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Do table-level CHECK constraints affect the query optimizer?
> On Jun 29, 2021, at 10:33 AM, Ron wrote:
>
> Prod is brand new. Loaded on Saturday; we saw this problem on Sunday during
> pre-acceptance. Thus, while running ANALYZE was top of the list of Things To
> Do, running VACUUM was low.
>
> Is that a mistaken belief?

You might want to run VACUUM FREEZE and then retry your test query using
EXPLAIN. See if it switches to an index only scan after the VACUUM FREEZE.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
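Concretely, the test might look like this (table, column, and index names
are placeholders, not from the thread):

VACUUM (FREEZE, VERBOSE) my_table;
EXPLAIN SELECT id FROM my_table WHERE id BETWEEN 1 AND 100;

If the visibility map is now mostly set, the plan should show an Index Only
Scan node rather than a bitmap or sequential scan.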
Re: Do table-level CHECK constraints affect the query optimizer?
> On Jun 29, 2021, at 11:02 AM, Ron wrote:
>
> What's an IOS?

An Index Only Scan. See
https://www.postgresql.org/docs/14/indexes-index-only-scans.html

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [E] Regexp_replace bug / does not terminate on long strings
> On Aug 20, 2021, at 9:52 AM, Tom Lane wrote:
>
> "a*" is easy. "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.

The following queries take radically different times to run:

\timing

select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1)+(?=$|,)',
  '\1', -- replacement
  'g'   -- apply globally (all matches)
);
Time: 16476.529 ms (00:16.477)

select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1){5}(?=$|,)',
  '\1', -- replacement
  'g'   -- apply globally (all matches)
);
Time: 1.452 ms

The only difference between the patterns is the + vs. the {5}. It looks to
me like the first pattern should greedily match five ",one" repeats and be
forced to stop since ",someone" doesn't match, and the second pattern should
grab the five ",one" repeats it was told to grab and not try to grab the
",someone", but other than that, they should be performing the same work. I
don't see why the performance should be so different.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [E] Regexp_replace bug / does not terminate on long strings
> On Aug 20, 2021, at 12:51 PM, Miles Elam wrote:
>
> Unbounded ranges seem like a problem.

Seems so. The problem appears to be in regcomp.c's repeat() function, which
handles {1,SOME} differently than {1,INF}.

> Seems worth trying a range from 1 to N where you play around with N to find
> your optimum performance/functionality tradeoff. {1,20} is like '+' but
> clamps at 20.

For any such value (5, 20, whatever) there can always be a string with more
repeated words than the number you've chosen, and the call to regexp_replace
won't do what you want. There is also an upper bound at work, because values
above 255 will draw a regex compilation error. So it seems worth a bit of
work to determine why the regex engine has bad performance in these cases.

It sounds like the OP will be working around this problem by refactoring to
call regexp_replace multiple times until all repeats are eradicated, but I
don't think such workarounds should be necessary.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
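For reference, the multi-pass workaround might look something like the
sketch below. The function name and the {1,255} bound are assumptions; the
bounded repeat avoids the unbounded {1,INF} case, which the {5} timing above
suggests is the slow path, and runs longer than 255 repeats are picked up by
the next pass.

CREATE FUNCTION dedup_adjacent_words(s text) RETURNS text AS $$
DECLARE
    prev text;
BEGIN
    LOOP
        prev := s;
        s := regexp_replace(s,
                            '(?<=^|,)([^,]+)(?:,\1){1,255}(?=$|,)',
                            '\1', 'g');
        EXIT WHEN s = prev;  -- stop once a pass removes nothing
    END LOOP;
    RETURN s;
END
$$ LANGUAGE plpgsql IMMUTABLE STRICT;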
Re: gen_random_uuid key collision
> On Sep 2, 2021, at 4:25 PM, jesusthefrog wrote:
>
> Anyone have any thoughts on this?

I agree with Peter's suggestion upthread to run amcheck on the index, but if
that comes back with no corruption, can you verify that there are no rules
or triggers that might cause multiple copies of the rows to be inserted?
Likewise, can you verify that you have no replication subscriptions that
could be putting duplicates into the table?

Another idea, which seems unlikely given your lack of trouble with other
tables, is that you might check whether you have any functions that reset
the seed for your random generator. I haven't looked specifically at your
uuid generator, and I don't know if it gets nondeterministic randomness from
/dev/random or similar, but deterministic random generators can be made to
produce the same sequence again if the seed is reset.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
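For the amcheck step, the usual invocation is something like this (the index
name is a placeholder, and the second argument, heapallindexed, requires
PostgreSQL 11 or later):

CREATE EXTENSION IF NOT EXISTS amcheck;
-- Verify B-tree invariants, and that every heap tuple has an index entry;
-- raises an error if corruption is found.
SELECT bt_index_check('my_table_pkey'::regclass, true);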
Re: Determining if a table really changed in a trigger
> On Oct 26, 2021, at 12:05 AM, Mitar wrote:
>
> Hi!
>
> I have a trigger like:
>
> CREATE TRIGGER update_trigger AFTER UPDATE ON my_table REFERENCING NEW
> TABLE AS new_table OLD TABLE AS old_table FOR EACH STATEMENT EXECUTE
> FUNCTION trigger_function;
>
> I would like to test inside trigger_function if the table really
> changed. I have tried to do:
>
> PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table) UNION ALL
> (TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
> IF FOUND THEN
>   ... changed ...
> END IF;
>
> But this fails if the table contains a JSON field with the error:
>
> could not identify an equality operator for type json
>
> The table has an unique index column, if that helps.

I can't tell from your post whether you want the trivial update to be
performed, but if not, would it work to filter trivial updates as:

CREATE RULE filter_trivial_updates AS
    ON UPDATE TO my_table
    WHERE new.i = old.i AND new.j = old.j AND ...
    DO INSTEAD NOTHING;

You could replace the i, j, ... above with whichever columns you have, and
specify the casts and equality operators you want for the json column (such
as a cast to jsonb, and jsonb equality).

The advantage here, if you do it right, is that the trigger doesn't have to
check whether the row has changed, because the trigger will only fire when a
change has occurred. You might try it and compare the performance against
other solutions.

The general idea is shown here:

rules=# create table my_table (i integer, j json);
CREATE TABLE
rules=# insert into my_table
rules-# select gs::integer, '{"key":1}'::json
rules-# from generate_series(1,3) gs;
INSERT 0 3
rules=# create function my_table_func () returns trigger as $$
rules$# begin
rules$#   raise warning '[old.i=%, old.j=%] => [new.i=%, new.j=%]',
rules$#     old.i, old.j, new.i, new.j;
rules$#   return new;
rules$# end
rules$# $$ language plpgsql;
CREATE FUNCTION
rules=# create trigger my_table_trig before update on my_table
rules-# for each row execute function my_table_func();
CREATE TRIGGER
rules=# update my_table set j = '{"key":2}'::jsonb;
WARNING:  [old.i=1, old.j={"key":1}] => [new.i=1, new.j={"key": 2}]
WARNING:  [old.i=2, old.j={"key":1}] => [new.i=2, new.j={"key": 2}]
WARNING:  [old.i=3, old.j={"key":1}] => [new.i=3, new.j={"key": 2}]
UPDATE 3
rules=# create rule filter_trivial_updates as on update to my_table
rules-# where new.i = old.i
rules-# and new.j::jsonb = old.j::jsonb
rules-# do instead nothing;
CREATE RULE
rules=# update my_table set j = '{"key":2}'::jsonb;
UPDATE 0
rules=# update my_table set j = '{"key":3}'::jsonb;
WARNING:  [old.i=1, old.j={"key": 2}] => [new.i=1, new.j={"key": 3}]
WARNING:  [old.i=2, old.j={"key": 2}] => [new.i=2, new.j={"key": 3}]
WARNING:  [old.i=3, old.j={"key": 2}] => [new.i=3, new.j={"key": 3}]
UPDATE 3

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Determining if a table really changed in a trigger
> On Oct 26, 2021, at 1:34 PM, Mitar wrote:
>
> Hi!
>
> On Tue, Oct 26, 2021 at 10:17 PM Mark Dilger wrote:
>> I can't tell from your post if you want the trivial update to be performed,
>> but if not, would it work to filter trivial updates as:
>
> No, I want to skip trivial updates (those which have not changed
> anything). But my trigger is per statement, not per row. So I do not
> think your approach works there? So this is why I am then making a
> more complicated check inside the trigger itself.

The trigger "my_table_trig" in the example is a per-row trigger, but it
exists only to demonstrate that the rule has filtered out the appropriate
rows. You can use the rule "filter_trivial_updates" as written together with
a per-statement trigger, as here:

rules=# create table my_table (i integer, j json);
CREATE TABLE
rules=# insert into my_table
rules-# select gs::integer, '{"key":1}'::json
rules-# from generate_series(1,3) gs;
INSERT 0 3
rules=# create function my_table_func () returns trigger as $$
rules$# declare
rules$#   have_rows boolean;
rules$# begin
rules$#   select true into have_rows from old_values limit 1;
rules$#   if have_rows then
rules$#     raise warning 'rows have changed';
rules$#   else
rules$#     raise warning 'no rows changed';
rules$#   end if;
rules$#   return null;
rules$# end
rules$# $$ language plpgsql;
CREATE FUNCTION
rules=# create trigger my_table_trig after update on my_table
rules-# referencing old table as old_values
rules-# for each statement
rules-# execute function my_table_func();
CREATE TRIGGER
rules=# update my_table set j = '{"key":2}'::jsonb;
2021-10-26 13:51:58.139 PDT [34352] WARNING:  rows have changed
2021-10-26 13:51:58.139 PDT [34352] CONTEXT:  PL/pgSQL function my_table_func() line 7 at RAISE
WARNING:  rows have changed
UPDATE 3
rules=# create rule filter_trivial_updates as on update to my_table
rules-# where new.i = old.i
rules-# and new.j::jsonb = old.j::jsonb
rules-# do instead nothing;
CREATE RULE
rules=# update my_table set j = '{"key":2}'::jsonb;
2021-10-26 13:51:58.143 PDT [34352] WARNING:  no rows changed
2021-10-26 13:51:58.143 PDT [34352] CONTEXT:  PL/pgSQL function my_table_func() line 9 at RAISE
WARNING:  no rows changed
UPDATE 0
rules=# update my_table set j = '{"key":3}'::jsonb;
2021-10-26 13:51:58.143 PDT [34352] WARNING:  rows have changed
2021-10-26 13:51:58.143 PDT [34352] CONTEXT:  PL/pgSQL function my_table_func() line 7 at RAISE
WARNING:  rows have changed
UPDATE 3

Note that there is a performance cost to storing the old rows using the
REFERENCING clause of the trigger, so you'll want to think about all your
various options and decide between them. I am not in a position to make
performance recommendations for your schema. However, if updates tend to be
targeted at small sets of rows, and if the rule is used to further filter
out trivial updates, this might be cheap.

Note that I used equality and inequality rather than IS DISTINCT FROM and
IS NOT DISTINCT FROM in the design, but you should think about how NULL
values (old, new, or both) will behave in the solution you choose.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Determining if a table really changed in a trigger
> On Oct 26, 2021, at 3:39 PM, Mitar wrote:
>
> On Tue, Oct 26, 2021 at 10:55 PM Mark Dilger wrote:
>> Note that there is a performance cost to storing the old rows using the
>> REFERENCING clause of the trigger
>
> Yea, by moving the trivial update check to a rule, I need REFERENCING
> only to see if there were any changes at all. This seems a bit
> excessive. Is there a way to check if any rows have been affected by
> an UPDATE inside a per statement trigger without using REFERENCING?

I felt the same way about it, but after glancing quickly through the code
and docs, nothing jumped out. The information is clearly available, as it
gets returned at the end of the UPDATE statement in the "UPDATE 0" or
"UPDATE 3" command tag, but I don't see how to access that from the trigger.
I might have to submit a patch for that if nobody else knows a way to get
it. (Hopefully somebody will respond with the answer...?)

> I also notice that you check if a table has any rows with:
>
> SELECT true INTO have_rows FROM old_values LIMIT 1;
> IF have_rows THEN ...
>
> Is this just a question of style or is this a better approach than my:
>
> PERFORM * FROM old_values LIMIT 1;
> IF FOUND THEN ...

There is no reason to prefer my spelling of that over yours. I didn't put
much thought into it; I just wrote it the first way that occurred to me.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Determining if a table really changed in a trigger
> On Oct 26, 2021, at 4:01 PM, Michael Lewis wrote:
>
> Does this perform differently from suppress_redundant_updates_trigger?
>
> https://www.postgresql.org/docs/current/functions-trigger.html

If Mitar finds that suppress_redundant_updates_trigger is sufficient, that
may be a simpler solution. Thanks for mentioning it.

The suppress_redundant_updates_trigger uses memcmp on the old and new rows.
I don't know if memcmp will be sufficient in this case, since json values
can be binary unequal and yet turn out to be equal once cast to jsonb. That
is why I was using the rule and casting the json column to jsonb before
comparing for equality.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
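For reference, the linked page attaches it roughly like this (table name
adjusted to this thread's example); the "z_" name is chosen to sort last so
the trigger fires after any other per-row BEFORE triggers that might modify
the row:

CREATE TRIGGER z_min_update
BEFORE UPDATE ON my_table
FOR EACH ROW EXECUTE FUNCTION suppress_redundant_updates_trigger();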
Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value
> On Nov 9, 2021, at 8:02 AM, Avi Weinberg wrote:
>
> • A third column is added to table with default value on publisher
>   side, but without default value on subscriber side
> • The default value column has value for existing rows on publisher,
>   but null on the subscriber side

See
https://www.postgresql.org/docs/14/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN

The important part is this TIP:

    From PostgreSQL 11, adding a column with a constant default value no
    longer means that each row of the table needs to be updated when the
    ALTER TABLE statement is executed. Instead, the default value will be
    returned the next time the row is accessed, and applied when the table
    is rewritten, making the ALTER TABLE very fast even on large tables.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
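In other words, the existing rows are null on the subscriber because the
column was added there without a default. One hedged sketch of a fix (table
and column names are placeholders): add the column with the default on the
subscriber as well, which is cheap per the TIP above, rather than adding it
bare and setting the default afterward, since a later SET DEFAULT does not
backfill existing rows.

-- Run the same DDL on both publisher and subscriber:
ALTER TABLE my_table ADD COLUMN c3 integer DEFAULT 42;
-- Existing rows on both sides now return 42 for c3, without a rewrite.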
Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value
> On Nov 9, 2021, at 1:24 PM, Michael Lewis wrote:
>
> Is there any advantage to not defining the default on the replica? If it is
> not a static value and the publishing database will trigger row updates, I
> could see waiting to set the default until after the table re-write is done,
> but otherwise there doesn't seem to be any benefit to skipping column
> defaults on subscribers.

That's a schema design decision. I can't really make recommendations in the
abstract. As a general rule, I would expect that the DDL executed on the
publisher (such as the ALTER TABLE..ADD COLUMN..DEFAULT..) would be the same
as that executed on the subscriber, unless there is a particular reason to
want different behavior on the subscriber.

Wanting different behavior is not unreasonable. For example, if the
subscriber exists merely to archive data from the publisher, the subscriber
might not bother creating indexes over that data, even if such indexes exist
on the publisher.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company