Primary keys and composite unique keys (basic question)
Hello,

We have UUIDs in our tables as primary keys, but in some cases we also identify a composite unique key apart from the primary key. My assumption is that we should create a unique index on the composite key ourselves, and that fetching by this composite key instead of the primary key gives a performance boost because of the index. Is this right? Are there more details I should read to understand this better? Please point me to them. If I should use the query planner to look at the statistics, I will.

Thanks,
Mohan
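As a minimal sketch of the setup being described (table and column names are hypothetical): a composite UNIQUE constraint is automatically backed by its own btree index, which lookups on those columns can use instead of the primary key.

CREATE TABLE orders (
    id          uuid   PRIMARY KEY,
    region_code text   NOT NULL,
    order_no    bigint NOT NULL,
    -- the UNIQUE constraint creates a btree index on (region_code, order_no)
    CONSTRAINT orders_region_order_uq UNIQUE (region_code, order_no)
);

-- this lookup can use the unique index rather than the primary key
EXPLAIN SELECT * FROM orders WHERE region_code = 'EU' AND order_no = 42;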
Re: Primary keys and composite unique keys (basic question)
I will cover the UUIDs first. They are indispensable to us.

1. The data is distributed over regions, so we need the row to be unique.
2. This distributed data is sent to services as events. That is the application architecture.

But we don't always search using UUIDs. Only when data from another distributed service is received do we need them, and in such cases we have to join using them. For local data we can identify another composite unique key.

Does PostgreSQL create a unique index for us? What about a FK that references this composite unique key? Does it create a FK index?

Thank you.

On Wed, Mar 31, 2021 at 7:42 PM Tom Lane wrote:
> Mohan Radhakrishnan writes:
> > We have UUIDs in our tables which are primary keys. But in some cases
> > we also identify a composite unique key apart from the primary key.
>
> > My assumption is that there should be a unique key index created by us
> > using the composite key. And when we fetch using this composite key instead
> > of the primary key we have a performance boost due to the index.
>
> You haven't provided a lot of detail, but use-a-UUID-as-a-primary-key
> is often an antipattern. The UUIDs are quasi-random, meaning there's
> no locality of reference in the primary key index, resulting in
> inefficiency in searches and insertions. If the composite key you
> mention has some actual relationship to your application's usage
> patterns, it could be winning as a result of better locality of
> access to that index.
>
> regards, tom lane
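For reference on the questions above: PostgreSQL automatically creates an index for a UNIQUE constraint (and for a PRIMARY KEY), but it does not create an index on the referencing columns of a foreign key; that one has to be created explicitly. A sketch with hypothetical names, continuing the example above:

CREATE TABLE order_events (
    event_id    uuid   PRIMARY KEY,
    region_code text   NOT NULL,
    order_no    bigint NOT NULL,
    -- a FK may reference the composite unique key on orders,
    -- but no index is created here on (region_code, order_no)
    FOREIGN KEY (region_code, order_no)
        REFERENCES orders (region_code, order_no)
);

-- add the index on the referencing side yourself if you join on these columns
CREATE INDEX order_events_region_order_idx
    ON order_events (region_code, order_no);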
Storing state machine
Hello,

We have a workflow when we receive events into the service, but we don't have a way to choreograph or orchestrate the workflow. The services are all independent and receive and respond to events. Since there is no order imposed by the event queues, I was thinking of storing a simple state machine in the table:

1. Order PENDING
2. Order line 1 PENDING
3. Order line 2 PENDING
4. Order line 1 PROCESSED
5. Order line 2 PROCESSED
6. Order PROCESSED

Order and order lines can be PROCESSED in any order, but at the end all lines should be PROCESSED and then the order is also PROCESSED. I won't be able to use any PostgreSQL functions because we don't depend on those features.

Are there any PostgreSQL features that could support this pattern? Is it just like any other eventually consistent pattern?

Thanks
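A minimal sketch of one possible table layout for this state machine (the names, and the use of text columns with CHECK constraints rather than ENUM types, are assumptions for illustration):

CREATE TABLE orders (
    order_id   uuid PRIMARY KEY,
    status     text NOT NULL DEFAULT 'PENDING'
               CHECK (status IN ('PENDING', 'PROCESSED'))
);

CREATE TABLE order_lines (
    line_id    uuid PRIMARY KEY,
    order_id   uuid NOT NULL REFERENCES orders (order_id),
    created_at timestamptz NOT NULL DEFAULT now(),
    status     text NOT NULL DEFAULT 'PENDING'
               CHECK (status IN ('PENDING', 'PROCESSED'))
);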
Re: Storing state machine
> Your requirement statement is extremely simple and I suspect you have
> glossed over some of the constraints/requirements, but based on what you
> have written, your requirement seems to be trivial and easily satisfied
> with basic database facilities.

Yes, I wasn't specific. There are multiple instances of the service for availability, and the events could reach any one of those. I think in this case isolation levels could be key? An instance at any point in time can't read assuming all transactions are committed by all instances.

The parent order's status is used to check easily instead of checking its children.

Thanks.

On Mon, Apr 19, 2021 at 12:21 PM Tim Cross wrote:
>
> Mohan Radhakrishnan writes:
>
> > Hello,
> > We have a workflow when we receive events into the service.
> > But we don't have a way to choreograph or orchestrate the workflow. The
> > services are all independent and receive and respond to events.
> >
> > Since there is no order imposed by the event queues I was thinking of
> > storing a simple state machine in the table.
> >
> > 1. Order PENDING
> > 2. Order line 1 PENDING
> > 3. Order line 2 PENDING
> > 4. Order line 1 PROCESSED
> > 5. Order line 2 PROCESSED
> > 6. Order PROCESSED
> >
> > Order and Order lines can be PROCESSED in any order. But at the end all
> > lines should be PROCESSED and then the order is also PROCESSED.
> > I won't be able to use any PostgreSql functions because we don't depend
> > on those features.
> >
> > Are there any PostgreSql features that could support this pattern ? Is
> > it just like any other eventually consistent pattern ?
>
> What you appear to have here is two entities - orders and order items.
> An order entity has a 'state' (pending/processed) and is linked to
> one or more order items which in turn have a state.
>
> The information about order state could be derived rather than actually
> stored, i.e. an order is pending if any of its order items are pending
> and is processed if all of its order items are processed. At a minimum,
> storing the order item state would be sufficient and a basic SQL
> statement would be able to tell you what the state of an order is.
>
> In general, you don't want to store duplicate or redundant information
> as this can be a source of anomalies (e.g. order state is not updated
> to 'processed' when all items are processed, or is updated to processed
> but then another item is added and for some reason the state is not
> switched back to pending, etc).
>
> In general, it is usually a mistake or poor design to use one table to
> represent different 'entities'. That is a 'bit bucket' approach which
> really degrades the ability of the database to do what it is good at -
> managing entities and their relationships.
>
> From the description you have provided, everything you need can be
> easily managed with basic SQL statements - no need for functions or
> stored procedures. All you would need is an SQL statement to insert a
> new order item, an SQL statement to update the state of an item and an
> SQL statement to report on the state of an order.
>
> Your requirement statement is extremely simple and I suspect you have
> glossed over some of the constraints/requirements, but based on what you
> have written, your requirement seems to be trivial and easily satisfied
> with basic database facilities.
>
> --
> Tim Cross
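As a sketch of "a basic SQL statement to tell you what the state of an order is", using the hypothetical tables above and deriving the order state from its lines instead of storing it:

SELECT o.order_id,
       CASE WHEN bool_and(l.status = 'PROCESSED')
            THEN 'PROCESSED'
            ELSE 'PENDING'
       END AS order_status
FROM orders o
JOIN order_lines l USING (order_id)
GROUP BY o.order_id;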
Re: Storing state machine
After further research, this may be accomplished using a 'SAGA'.

The other question I was researching is the ordering of these records by timestamptz. If I have to use an orchestrator then I will pick sets of records ordered by timestamptz, for example. Should I be looking at a certain type of index, like a BRIN index, for ordering by timestamptz? I have to exclude already processed records.

Thanks.

On Mon, Apr 19, 2021 at 6:40 PM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:
> Yes, I wasn't specific. There are multiple instances of the service for
> availability, and the events could reach any one of those. I think in this
> case isolation levels could be key? An instance at any point in time can't
> read assuming all transactions are committed by all instances.
>
> The parent order's status is used to check easily instead of checking its
> children.
>
> Thanks.
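A sketch of how an orchestrator instance could pick a batch of unprocessed rows in timestamptz order without two instances grabbing the same rows (column names follow the hypothetical tables above; the batch size is arbitrary):

BEGIN;

SELECT line_id
FROM   order_lines
WHERE  status = 'PENDING'          -- exclude already processed records
ORDER  BY created_at
LIMIT  10
FOR UPDATE SKIP LOCKED;            -- other instances skip rows locked here

-- ... process the selected lines, then:
-- UPDATE order_lines SET status = 'PROCESSED' WHERE line_id IN (...);

COMMIT;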
BRIN index on timestamptz
Hi,

I am planning to use a BRIN index as I search based on timestamptz fields. There are millions of records. I refer to https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits

I execute this on the AWS RDS instance. Is there something in the plan I should pay attention to? I notice the Execution Time.

Thanks,
Mohan

INSERT INTO testtab (id, date, level, msg)
SELECT g, CURRENT_TIMESTAMP + ( g || 'minute' ) :: interval, random() * 6, md5(g::text)
FROM generate_series(1,800) as g;

create index testtab_date_idx on testtab(date);

Gather  (cost=1000.00..133475.57 rows=1 width=49) (actual time=848.040..862.638 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on testtab  (cost=0.00..132475.47 rows=1 width=49) (actual time=832.108..832.109 rows=0 loops=3)
        Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
        Rows Removed by Filter: 267
Planning Time: 0.238 ms
Execution Time: 862.662 ms

explain analyze select * from public.testtab
where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

Gather  (cost=1000.00..133475.57 rows=1 width=49) (actual time=666.283..681.586 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on testtab  (cost=0.00..132475.47 rows=1 width=49) (actual time=650.661..650.661 rows=0 loops=3)
        Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
        Rows Removed by Filter: 267
Planning Time: 0.069 ms
Execution Time: 681.617 ms

create index testtab_date_brin_idx on rm_owner.testtab using brin (date);

explain analyze select * from public.testtab
where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

Bitmap Heap Scan on testtab  (cost=20.03..33406.84 rows=1 width=49) (actual time=0.143..0.143 rows=0 loops=1)
  Recheck Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
  ->  Bitmap Index Scan on testtab_date_brin_idx  (cost=0.00..20.03 rows=12403 width=0) (actual time=0.141..0.141 rows=0 loops=1)
        Index Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
Planning Time: 0.126 ms
Execution Time: 0.161 ms
Re: BRIN index on timestamptz
> a) You need to do ANALYZE, otherwise there are no statistics the optimizer
> could use

I execute and ANALYZE. The actual timestamps I have are not random; I will order them chronologically.

Thanks

On Saturday, April 24, 2021, Tomas Vondra wrote:
>
> On 4/23/21 10:31 AM, Mohan Radhakrishnan wrote:
> > Hi,
> > I am planning to use a BRIN index as I search based on timestamptz fields.
> > There are millions of records. I refer to
> > https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits
> >
> > I execute this on the AWS RDS instance. Is there something in the plan I
> > should pay attention to? I notice the Execution Time.
> >
> > Thanks,
> > Mohan
> >
> > INSERT INTO testtab (id, date, level, msg) SELECT g, CURRENT_TIMESTAMP +
> > ( g || 'minute' ) :: interval, random() * 6, md5(g::text) FROM
> > generate_series(1,800) as g;
>
> Two things:
>
> a) You need to do ANALYZE, otherwise there are no statistics the optimizer
> could use (which is why the row estimates in the plans are entirely bogus).
>
> b) BRIN indexes don't work on random data, because the whole idea is about
> eliminating large blocks of data (e.g. 1MB). But with random data that's
> not going to happen, because each such range will match anything. Which is
> why seqscan is a bit faster than when using BRIN index.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
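A small sketch of the suggestion above: collect statistics with ANALYZE, and check the physical-order correlation of the timestamp column before deciding whether BRIN is a good fit (table and column names follow the test case in this thread):

ANALYZE testtab;

SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'testtab' AND attname = 'date';
-- a correlation close to 1.0 (or -1.0) is what lets a BRIN index eliminate
-- large block ranges; values near 0 favour a btree index or a seqscan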
Re: BRIN index on timestamptz
> Why not use a btree index for the timestamptz column?

There are some capabilities our team lacks. Due to that, autovacuum tuning mechanisms aren't considered at all. It may be in the future. I know about basic MVCC though. BRIN was an option as the characteristics you describe match the requirements.

1. Only recent rows are updated. One or two ENUMs, 4-5 states per ENUM for each timestamptz.
2. ENUMs are not indexed. Will that help too? That is probably an unrelated question.

Btree may be the default option.

Thanks.

On Saturday, April 24, 2021, Michael Lewis wrote:
>
> On Sat, Apr 24, 2021, 1:27 AM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:
>
> > What's your question exactly? If you have confidence that correlation
> > will remain high (insert only table, or occasional cluster/repack with
> > cluster is done), then BRIN can be a good fit. If you do updates and
> > deletes and new tuples (inserts and updates) come in and fill in those gaps
> > left behind in early pages even though timestamp is high, then correlation
> > will go down and brin will no longer be a good fit.
> >
> > Note- timestamp *with* timezone is recommended.
> >
> > The timestamptz isn't deleted or updated. It is only inserted. Another
> > ENUM column will be updated.
> > It looks like I should use brin. We also have other history tables like
> > this.
> >
> > Thanks.
>
> That's not a correct conclusion. Reply all btw.
>
> Updating any value in the row means a new version of the row is inserted
> and the old one is marked expired and will be cleaned up by vacuum after no
> transactions might need that row version (tuple). Research a bit about how
> MVCC is implemented in Postgres.
>
> If those updates would only be on recent rows and autovacuum is tuned
> aggressively to keep the maintenance task under control, then the
> correlation may remain high as only recent rows are being updated. If the
> updates go back a long ways, then BRIN is unlikely to be a good fit. Maybe
> it still could be if table fillfactor is lowered a bit and the enum is not
> indexed so you get mostly HOT (heap only tuple) updates, but I wouldn't
> count on it.
>
> Why not use a btree index for the timestamptz column?
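A sketch of the two options discussed above, with hypothetical names: lowering fillfactor and leaving the ENUM column unindexed to encourage HOT updates (so physical order, and therefore BRIN, stays usable), versus the plain btree on the timestamptz column:

-- leave free space in each page so updates to the ENUM column can be HOT updates
ALTER TABLE order_lines SET (fillfactor = 90);

-- option 1: BRIN on the timestamptz column, worthwhile only if correlation stays high
CREATE INDEX order_lines_created_brin
    ON order_lines USING brin (created_at);

-- option 2: the default btree suggested in the reply
CREATE INDEX order_lines_created_btree
    ON order_lines (created_at);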
Re: BRIN index on timestamptz
Isn't a btree subject to these effects? So when I update ENUMs for each timestamptz, btree indexes are less susceptible to the effects than BRIN indexes?

Thanks.

On Sat, Apr 24, 2021 at 9:05 PM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:
> > Why not use a btree index for the timestamptz column?
>
> There are some capabilities our team lacks. Due to that, autovacuum tuning
> mechanisms aren't considered at all. It may be in the future.
>
> I know about basic MVCC though. BRIN was an option as the characteristics
> you describe match the requirements.
>
> 1. Only recent rows are updated. One or two ENUMs, 4-5 states per ENUM
> for each timestamptz.
> 2. ENUMs are not indexed. Will that help too? That is probably an
> unrelated question.
>
> Btree may be the default option.
>
> Thanks.