Primary keys and composite unique keys (basic question)
Hello,

We have UUIDs in our tables as primary keys, but in some cases we also identify a composite unique key apart from the primary key. My assumption is that we should create a unique index on the composite key ourselves, and that fetching by this composite key instead of the primary key gives a performance boost because of the index. Is this right? Are there more details I should read to understand this better? Please point me to them. If I should use the query planner to look at the statistics, I will.

Thanks,
Mohan
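As a minimal sketch of the setup being described (table and column names are hypothetical): a composite UNIQUE constraint is automatically backed by its own btree index, which lookups on those columns can use instead of the primary key.

CREATE TABLE orders (
    id          uuid   PRIMARY KEY,
    region_code text   NOT NULL,
    order_no    bigint NOT NULL,
    -- the UNIQUE constraint creates a btree index on (region_code, order_no)
    CONSTRAINT orders_region_order_uq UNIQUE (region_code, order_no)
);

-- this lookup can use the unique index rather than the primary key
EXPLAIN SELECT * FROM orders WHERE region_code = 'EU' AND order_no = 42;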
Re: Primary keys and composite unique keys (basic question)
I will cover the UUIDs first. They are indispensable to us.

1. The data is distributed over regions, so we need the row to be unique.
2. This distributed data is sent to services as events. That is the application architecture.

But we don't always search using UUIDs. Only when data from another distributed service is received do we need them, and in such cases we have to join using them. For local data we can identify another composite unique key.

Does PostgreSQL create a unique index for us? What about a FK that references this composite unique key? Does it create a FK index?

Thank you.

On Wed, Mar 31, 2021 at 7:42 PM Tom Lane wrote:
> Mohan Radhakrishnan writes:
> > We have UUIDs in our tables which are primary keys. But in some cases
> > we also identify a composite unique key apart from the primary key.
>
> > My assumption is that there should be a unique key index created by us
> > using the composite key. And when we fetch using this composite key instead
> > of the primary key we have a performance boost due to the index.
>
> You haven't provided a lot of detail, but use-a-UUID-as-a-primary-key
> is often an antipattern. The UUIDs are quasi-random, meaning there's
> no locality of reference in the primary key index, resulting in
> inefficiency in searches and insertions. If the composite key you
> mention has some actual relationship to your application's usage
> patterns, it could be winning as a result of better locality of
> access to that index.
>
> regards, tom lane
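For reference on the questions above: PostgreSQL automatically creates an index for a UNIQUE constraint (and for a PRIMARY KEY), but it does not create an index on the referencing columns of a foreign key; that one has to be created explicitly. A sketch with hypothetical names, continuing the example above:

CREATE TABLE order_events (
    event_id    uuid   PRIMARY KEY,
    region_code text   NOT NULL,
    order_no    bigint NOT NULL,
    -- a FK may reference the composite unique key on orders,
    -- but no index is created here on (region_code, order_no)
    FOREIGN KEY (region_code, order_no)
        REFERENCES orders (region_code, order_no)
);

-- add the index on the referencing side yourself if you join on these columns
CREATE INDEX order_events_region_order_idx
    ON order_events (region_code, order_no);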
Storing state machine
Hello,

We have a workflow when we receive events into the service, but we don't have a way to choreograph or orchestrate the workflow. The services are all independent and receive and respond to events. Since there is no order imposed by the event queues, I was thinking of storing a simple state machine in the table:

1. Order PENDING
2. Order line 1 PENDING
3. Order line 2 PENDING
4. Order line 1 PROCESSED
5. Order line 2 PROCESSED
6. Order PROCESSED

Order and order lines can be PROCESSED in any order, but at the end all lines should be PROCESSED and then the order is also PROCESSED. I won't be able to use any PostgreSQL functions because we don't depend on those features.

Are there any PostgreSQL features that could support this pattern? Is it just like any other eventually consistent pattern?

Thanks
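A minimal sketch of one possible table layout for this state machine (the names, and the use of text columns with CHECK constraints rather than ENUM types, are assumptions for illustration):

CREATE TABLE orders (
    order_id   uuid PRIMARY KEY,
    status     text NOT NULL DEFAULT 'PENDING'
               CHECK (status IN ('PENDING', 'PROCESSED'))
);

CREATE TABLE order_lines (
    line_id    uuid PRIMARY KEY,
    order_id   uuid NOT NULL REFERENCES orders (order_id),
    created_at timestamptz NOT NULL DEFAULT now(),
    status     text NOT NULL DEFAULT 'PENDING'
               CHECK (status IN ('PENDING', 'PROCESSED'))
);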
Re: Storing state machine
> Your requirement statement is extremely simple and I suspect you have
> glossed over some of the constraints/requirements, but based on what you
> have written, your requirement seems to be trivial and easily satisfied
> with basic database facilities.

Yes, I wasn't specific. There are multiple instances of the service for availability, and the events could reach any one of those. I think in this case isolation levels could be key? An instance at any point in time can't read assuming all transactions are committed by all instances.

The parent order's status is used to check easily instead of checking its children.

Thanks.

On Mon, Apr 19, 2021 at 12:21 PM Tim Cross wrote:
>
> Mohan Radhakrishnan writes:
>
> > Hello,
> > We have a workflow when we receive events into the service.
> > But we don't have a way to choreograph or orchestrate the workflow. The
> > services are all independent and receive and respond to events.
> >
> > Since there is no order imposed by the event queues I was thinking of
> > storing a simple state machine in the table.
> >
> > 1. Order PENDING
> > 2. Order line 1 PENDING
> > 3. Order line 2 PENDING
> > 4. Order line 1 PROCESSED
> > 5. Order line 2 PROCESSED
> > 6. Order PROCESSED
> >
> > Order and Order lines can be PROCESSED in any order. But at the end all
> > lines should be PROCESSED and then the order is also PROCESSED.
> > I won't be able to use any PostgreSql functions because we don't depend
> > on those features.
> >
> > Are there any PostgreSql features that could support this pattern ? Is
> > it just like any other eventually consistent pattern ?
>
> What you appear to have here is two entities - orders and order items.
> An order entity has a 'state' (pending/processed) and is linked to
> one or more order items which in turn have a state.
>
> The information about order state could be derived rather than actually
> stored, i.e. an order is pending if any of its order items are pending
> and is processed if all of its order items are processed. At a minimum,
> storing the order item state would be sufficient and a basic SQL
> statement would be able to tell you what the state of an order is.
>
> In general, you don't want to store duplicate or redundant information
> as this can be a source of anomalies (e.g. order state is not updated
> to 'processed' when all items are processed, or is updated to processed
> but then another item is added and for some reason the state is not
> switched back to pending, etc).
>
> In general, it is usually a mistake or poor design to use one table to
> represent different 'entities'. That is a 'bit bucket' approach which
> really degrades the ability of the database to do what it is good at -
> managing entities and their relationships.
>
> From the description you have provided, everything you need can be
> easily managed with basic SQL statements - no need for functions or
> stored procedures. All you would need is an SQL statement to insert a
> new order item, an SQL statement to update the state of an item and an
> SQL statement to report on the state of an order.
>
> Your requirement statement is extremely simple and I suspect you have
> glossed over some of the constraints/requirements, but based on what you
> have written, your requirement seems to be trivial and easily satisfied
> with basic database facilities.
>
> --
> Tim Cross
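As a sketch of "a basic SQL statement to tell you what the state of an order is", using the hypothetical tables above and deriving the order state from its lines instead of storing it:

SELECT o.order_id,
       CASE WHEN bool_and(l.status = 'PROCESSED')
            THEN 'PROCESSED'
            ELSE 'PENDING'
       END AS order_status
FROM orders o
JOIN order_lines l USING (order_id)
GROUP BY o.order_id;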
Re: Storing state machine
After further research, this may be accomplished using a 'SAGA'.

The other question I was researching is the ordering of these records by timestamptz. If I have to use an orchestrator then I will pick sets of records ordered by timestamptz, for example. Should I be looking at a certain type of index, like a BRIN index, for ordering by timestamptz? I have to exclude already processed records.

Thanks.

On Mon, Apr 19, 2021 at 6:40 PM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:
> Yes, I wasn't specific. There are multiple instances of the service for
> availability, and the events could reach any one of those. I think in this
> case isolation levels could be key? An instance at any point in time can't
> read assuming all transactions are committed by all instances.
>
> The parent order's status is used to check easily instead of checking its
> children.
>
> Thanks.
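A sketch of how an orchestrator instance could pick a batch of unprocessed rows in timestamptz order without two instances grabbing the same rows (column names follow the hypothetical tables above; the batch size is arbitrary):

BEGIN;

SELECT line_id
FROM   order_lines
WHERE  status = 'PENDING'          -- exclude already processed records
ORDER  BY created_at
LIMIT  10
FOR UPDATE SKIP LOCKED;            -- other instances skip rows locked here

-- ... process the selected lines, then:
-- UPDATE order_lines SET status = 'PROCESSED' WHERE line_id IN (...);

COMMIT;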
BRIN index on timestamptz
Hi,

I am planning to use a BRIN index as I search based on timestamptz fields. There are millions of records. I refer to https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits

I execute this on the AWS RDS instance. Is there something in the plan I should pay attention to? I notice the Execution Time.

Thanks,
Mohan

INSERT INTO testtab (id, date, level, msg)
SELECT g, CURRENT_TIMESTAMP + ( g || 'minute' ) :: interval, random() * 6, md5(g::text)
FROM generate_series(1,800) as g;

create index testtab_date_idx on testtab(date);

Gather  (cost=1000.00..133475.57 rows=1 width=49) (actual time=848.040..862.638 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on testtab  (cost=0.00..132475.47 rows=1 width=49) (actual time=832.108..832.109 rows=0 loops=3)
        Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
        Rows Removed by Filter: 267
Planning Time: 0.238 ms
Execution Time: 862.662 ms

explain analyze select * from public.testtab
where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

Gather  (cost=1000.00..133475.57 rows=1 width=49) (actual time=666.283..681.586 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on testtab  (cost=0.00..132475.47 rows=1 width=49) (actual time=650.661..650.661 rows=0 loops=3)
        Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
        Rows Removed by Filter: 267
Planning Time: 0.069 ms
Execution Time: 681.617 ms

create index testtab_date_brin_idx on rm_owner.testtab using brin (date);

explain analyze select * from public.testtab
where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

Bitmap Heap Scan on testtab  (cost=20.03..33406.84 rows=1 width=49) (actual time=0.143..0.143 rows=0 loops=1)
  Recheck Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
  ->  Bitmap Index Scan on testtab_date_brin_idx  (cost=0.00..20.03 rows=12403 width=0) (actual time=0.141..0.141 rows=0 loops=1)
        Index Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
Planning Time: 0.126 ms
Execution Time: 0.161 ms
Re: BRIN index on timestamptz
> a) You need to do ANALYZE, otherwise there are no statistics the optimizer
> could use

I execute and ANALYZE. The actual timestamps I have are not random; I will order them chronologically.

Thanks

On Saturday, April 24, 2021, Tomas Vondra wrote:
>
> On 4/23/21 10:31 AM, Mohan Radhakrishnan wrote:
> > Hi,
> > I am planning to use a BRIN index as I search based on timestamptz fields.
> > There are millions of records. I refer to
> > https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits
> >
> > I execute this on the AWS RDS instance. Is there something in the plan I
> > should pay attention to? I notice the Execution Time.
> >
> > Thanks,
> > Mohan
> >
> > INSERT INTO testtab (id, date, level, msg) SELECT g, CURRENT_TIMESTAMP +
> > ( g || 'minute' ) :: interval, random() * 6, md5(g::text) FROM
> > generate_series(1,800) as g;
>
> Two things:
>
> a) You need to do ANALYZE, otherwise there are no statistics the optimizer
> could use (which is why the row estimates in the plans are entirely bogus).
>
> b) BRIN indexes don't work on random data, because the whole idea is about
> eliminating large blocks of data (e.g. 1MB). But with random data that's
> not going to happen, because each such range will match anything. Which is
> why seqscan is a bit faster than when using BRIN index.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
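A small sketch of the suggestion above: collect statistics with ANALYZE, and check the physical-order correlation of the timestamp column before deciding whether BRIN is a good fit (table and column names follow the test case in this thread):

ANALYZE testtab;

SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'testtab' AND attname = 'date';
-- a correlation close to 1.0 (or -1.0) is what lets a BRIN index eliminate
-- large block ranges; values near 0 favour a btree index or a seqscan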
Re: BRIN index on timestamptz
> Why not use a btree index for the timestamptz column?

There are some capabilities our team lacks. Due to that, autovacuum tuning mechanisms aren't considered at all. It may be in the future. I know about basic MVCC though. BRIN was an option as the characteristics you describe match the requirements.

1. Only recent rows are updated. One or two ENUMs, 4-5 states per ENUM for each timestamptz.
2. ENUMs are not indexed. Will that help too? That is probably an unrelated question.

Btree may be the default option.

Thanks.

On Saturday, April 24, 2021, Michael Lewis wrote:
>
> On Sat, Apr 24, 2021, 1:27 AM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:
>
> > What's your question exactly? If you have confidence that correlation
> > will remain high (insert only table, or occasional cluster/repack with
> > cluster is done), then BRIN can be a good fit. If you do updates and
> > deletes and new tuples (inserts and updates) come in and fill in those gaps
> > left behind in early pages even though timestamp is high, then correlation
> > will go down and brin will no longer be a good fit.
> >
> > Note- timestamp *with* timezone is recommended.
> >
> > The timestamptz isn't deleted or updated. It is only inserted. Another
> > ENUM column will be updated.
> > It looks like I should use brin. We also have other history tables like
> > this.
> >
> > Thanks.
>
> That's not a correct conclusion. Reply all btw.
>
> Updating any value in the row means a new version of the row is inserted
> and the old one is marked expired and will be cleaned up by vacuum after no
> transactions might need that row version (tuple). Research a bit about how
> MVCC is implemented in Postgres.
>
> If those updates would only be on recent rows and autovacuum is tuned
> aggressively to keep the maintenance task under control, then the
> correlation may remain high as only recent rows are being updated. If the
> updates go back a long ways, then BRIN is unlikely to be a good fit. Maybe
> it still could be if table fillfactor is lowered a bit and the enum is not
> indexed so you get mostly HOT (heap only tuple) updates, but I wouldn't
> count on it.
>
> Why not use a btree index for the timestamptz column?
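A sketch of the two options discussed above, with hypothetical names: lowering fillfactor and leaving the ENUM column unindexed to encourage HOT updates (so physical order, and therefore BRIN, stays usable), versus the plain btree on the timestamptz column:

-- leave free space in each page so updates to the ENUM column can be HOT updates
ALTER TABLE order_lines SET (fillfactor = 90);

-- option 1: BRIN on the timestamptz column, worthwhile only if correlation stays high
CREATE INDEX order_lines_created_brin
    ON order_lines USING brin (created_at);

-- option 2: the default btree suggested in the reply
CREATE INDEX order_lines_created_btree
    ON order_lines (created_at);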
Re: BRIN index on timestamptz
Isn't a btree subject to these effects? So when I update ENUMs for each timestamptz, btree indexes are less susceptible to the effects than BRIN indexes?

Thanks.

On Sat, Apr 24, 2021 at 9:05 PM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:
> > Why not use a btree index for the timestamptz column?
>
> There are some capabilities our team lacks. Due to that, autovacuum tuning
> mechanisms aren't considered at all. It may be in the future.
>
> I know about basic MVCC though. BRIN was an option as the characteristics
> you describe match the requirements.
>
> 1. Only recent rows are updated. One or two ENUMs, 4-5 states per ENUM
> for each timestamptz.
> 2. ENUMs are not indexed. Will that help too? That is probably an
> unrelated question.
>
> Btree may be the default option.
>
> Thanks.