Re: Understanding the differences between Temporal tables, CDC and Time Series.

2021-11-10 Thread SQL Padawan



Hi Peter, and thanks for answering.


> > I'm unclear as to the "philosophical" distinctions here and would be
> > grateful if anybody could explain the diffence(s) between them?

> A time series database contains data records containing a time stamp

> A temporal database contains records that have a timestamp range


OK - so we have two stock "ticker" dbs - one a Time Series db (TS-db) and one a 
Temporal db (T-db).

All of the code below is on a fiddle here

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=207f38e5c7d6e7861c402c2c4926840c

CREATE TABLE ts_portfolio  -- time series
(
  txn_id  INTEGER NOT NULL,
  company TEXT NOT NULL,
  price   INTEGER NOT NULL,
  txn_ts  TIMESTAMP NOT NULL
);

with a few records


INSERT INTO ts_portfolio VALUES
(1234,   'DB', 10,  '07/11/21 00:12:00'),-- record inserted on 
07/11/21
(2345,   'DB', 20,  '08/11/21 00:12:00'),-- record inserted on 
08/11/21
(3456,   'DB', 30,  '09/11/21 00:12:00'),-- record inserted on 
09/11/21
(4567,   'DB', 40,  '10/11/21 00:12:00'),-- record inserted on 
10/11/21
(5678,   'DB', 50,  '11/11/21 00:12:00');-- record inserted on 
11/11/21


So, for example, in order to query the share price AS OF 00:00:00 on the 09/11

SELECT company, price
FROM ts_portfolio
WHERE company = 'DB'
AND txn_ts <= '09/11/21 00:00:00'
ORDER BY txn_ts DESC
LIMIT 1;

We pick up the price from the latest transaction either before or exactly on 
09/11/2021 00:00:00

Things become tricky when we want to see the prices on a given day: 09/11/2021

--
-- UNION better here? See EXPLAIN in fiddle - not important for this debate!
--

WITH cte1 AS
(
  SELECT txn_id AS t_start
  FROM ts_portfolio
  WHERE company = 'DB'
  AND txn_ts <= '09/11/21 00:00:00'
  ORDER BY txn_ts DESC
  LIMIT 1
), cte2 AS
(
  SELECT txn_id AS t_end
  FROM ts_portfolio
  WHERE company = 'DB'
  AND txn_ts <= '09/11/21 23:59:59.99'
  ORDER BY txn_ts DESC
  LIMIT 1
)
SELECT
  t.txn_id, t.company, t.price
FROM
  ts_portfolio t
WHERE txn_id BETWEEN (SELECT t_start FROM cte1) AND (SELECT t_end FROM cte2);


txn_id  company price
2345DB  20
3456DB  30

which is correct - from 08/11 midday till 09/11 midday, the price was 20 and 
then at midday 09/11 it rose to 30!


Becoming complex.

However, a Temporal (versioned) table would only require one extra field:

  txn_t_from TIMESTAMP NOT NULL,
  txn_t_to   TIMESTAMP NOT NULL  -- extra field - see fiddle for data inserts 
and updates!


Each INSERT (behind the scenes) is an INSERT and an UPDATE.

INSERT INTO tdb_portfolio VALUES
(1000, 'DB',  5, '01/01/1900','07/11/21 00:12:00'),  -- inserted in the 
past, updated 07/11 @ 12:00
(1234, 'DB', 10, '07/11/21 00:12:00', '08/11/21 00:12:00'),  -- record inserted 
on 07/11/21 - updated 08/11 @ 12:00
(2345, 'DB', 20, '08/11/21 00:12:00', '09/11/21 00:12:00'),  -- record inserted 
on 08/11/21 - and so on...
(3456, 'DB', 30, '09/11/21 00:12:00', '10/11/21 00:12:00'),  -- record inserted 
on 09/11/21
(4567, 'DB', 40, '10/11/21 00:12:00', '11/11/21 00:12:00'),  -- record inserted 
on 10/11/21
(5678, 'DB', 50, '11/11/21 00:12:00', '19/01/2038');  -- record inserted on 
11/11/21 - never updated (effectively INFINITY until next update)



The queries become simpler:

SELECT txn_id, company, price
FROM tdb_portfolio AS OF '09/11/2021 00:00:00';

and the complex one above collapses to:

SELECT txn_id, company, price FROM tdb_portfolio
FOR SYSTEM_TIME BETWEEN ('09/11/2021 00:00:00' AND '09/11/2021 
23:59:59.99');

or maybe if there was a DATE(SYSTEM_TIME) function, ... FOR DATE(SYSTEM_TIME) = 
'09/11/2021';


I'm assuming that (when properly introduced into PostgreSQL), there will be 
optimisations for these sorts of query.

Do temporal tables bring anything else "to the party" - the augmented 
functionality is a nice-to-have, but hardly earth-shattering?

Have I missed out on anything important?

Thx, SQLP!







Re: E.1.2. Migration to Version 14

2021-11-10 Thread Bruce Momjian
On Wed, Nov 10, 2021 at 12:30:37PM +0500, Дмитрий Иванов wrote:
> Hello!
> E.1.2 Migration to Version 14
>  array_append(), array_prepend(), array_cat(), array_position(), 
> array_positions(), array_remove(), array_replace(), and width_bucket()
> 
> Is this a complete list of functions? Or you need to remove all functions:
> Table 9.52. Array Functions
> 
> Do the functions need to be deleted and restored, or can they be recreated?
> I have a lot of them and they are called from other functions:
> 
> Uchet=# SELECT COUNT(pg_proc.proname) AS "COUNT"
> Uchet-#    FROM pg_proc
> Uchet-#   WHERE pg_proc.prosrc ~~ '%array%'::text AND pg_proc.pronamespace =
> 16443::oid;
>  COUNT
> ---
>     89
> (1 row)
> Trying to estimate labor costs and migration opportunities.
> Thanks!

Uh, I think they can just be recreated on the new server.  I think we
suggested delete then create so that the functions would not be called
accidentally on the new server before being recreated.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Logical Replication - Type messages?

2021-11-10 Thread Tom Lane
Stefen Hillman  writes:
> Currently, I process Relation messages to get most of the information I
> need to work with Inserts, Updates, and Deletes. However, for type
> information I'm currently using a SQL query to get the column type
> information. I wanted to use the Type messages, but I never see them. I see
> Begin, Commit, etc. but never Type.

>- When are the Type messages generated and sent by the server?
>- Is there a way to cause them to be generated?

Hmm.  AFAICS the Type messages are completely undocumented in
protocol.sgml, which is an oversight.  Looking at the code, they
do exist, but they are sent only when a Relation message refers
to a non-built-in type.  There's a presumption that built-in
types have stable OIDs that will be the same at publisher and
subscriber, so the replication traffic needn't tell the subscriber
what those OIDs mean.  Non-built-in types don't have stable OIDs,
so we want to tell the subscriber what those OIDs mean.

(Actually, it looks like the cutoff is FirstGenbkiObjectId, which
means that the Type message is only suppressed for hand-assigned
type OIDs, an even stricter rule than "built in".  So for testing
purposes you could try something like an array type; replicating
a table with an array column should give rise to Type messages.)

>- Are these type id > type mappings set in stone, or does PostgreSQL
>allow for a server to have custom types (I notice that namespace is one of
>the properties sent with the types)?

Some would say that custom types are THE defining feature of
Postgres, compared to other SQL implementations.

regards, tom lane




Re: E.1.2. Migration to Version 14

2021-11-10 Thread Дмитрий Иванов
Хорошо спасибо!
This does not give an understanding of the need to work with all the
functions (Table 9.52) of arrays or specified in the release.
I am working on version 12 of the main kernel. I didn’t see what I’ll get,
other than being up-to-date when upgrading to version 14. I faced
performance degradation when using the array aggregation functions.
I have improved performance by using functions that take arrays as
arguments.
CTEs give the same result, but are difficult to maintain.
If the declarative query structure were built with filtering before
aggregation, this trick would not be needed, but this is not the case in
version 12.
But this is not the case.
Based on the setting of the topic:
Re: Max array size

I have created functions that take an array for later aggregation of
subordinate tuples.
This gave an exponential dependence of performance on selectivity.
I want to understand what I will get from version 14 and at the moment the
benefits are illusory.




ср, 10 нояб. 2021 г. в 21:11, Bruce Momjian :

> On Wed, Nov 10, 2021 at 12:30:37PM +0500, Дмитрий Иванов wrote:
> > Hello!
> > E.1.2 Migration to Version 14
> >  array_append(), array_prepend(), array_cat(), array_position(),
> > array_positions(), array_remove(), array_replace(), and width_bucket()
> >
> > Is this a complete list of functions? Or you need to remove all
> functions:
> > Table 9.52. Array Functions
> >
> > Do the functions need to be deleted and restored, or can they be
> recreated?
> > I have a lot of them and they are called from other functions:
> >
> > Uchet=# SELECT COUNT(pg_proc.proname) AS "COUNT"
> > Uchet-#FROM pg_proc
> > Uchet-#   WHERE pg_proc.prosrc ~~ '%array%'::text AND
> pg_proc.pronamespace =
> > 16443::oid;
> >  COUNT
> > ---
> > 89
> > (1 row)
> > Trying to estimate labor costs and migration opportunities.
> > Thanks!
>
> Uh, I think they can just be recreated on the new server.  I think we
> suggested delete then create so that the functions would not be called
> accidentally on the new server before being recreated.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>


Pause streaming replication

2021-11-10 Thread Rita
Hello.

I am testing alerting on my primary and standby setup. I have async
replication working but I would like to temporarily pause it so the value
of 'state' isn't streaming. (select * from pg_stat_replication).

How can I do that?


-- 
--- Get your facts first, then you can distort them as you please.--


Re: Pause streaming replication

2021-11-10 Thread Ben Chobot

Rita wrote on 11/10/21 1:25 PM:

Hello.

I am testing alerting on my primary and standby setup. I have async 
replication working but I would like to temporarily pause it so the 
value of 'state' isn't streaming. (select * from pg_stat_replication).


How can I do that?


By reading the fine manual: 
https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL-TABLE


The function you are looking for is pg_wal_replay_pause().




Re: Pause streaming replication

2021-11-10 Thread Rita
Yes, I have read the manual and seen this. It pauses the replication
(select pg_is_wal_replay_paused()). But on the primary, when I look at
pg_stat_replication, it still says 'streaming' in the state column. My
question was how do I get it from 'streaming'  to anything else?  (
https://www.postgresql.org/docs/11/monitoring-stats.html#PG-STAT-REPLICATION-VIEW)
?

I want to trigger an alert when 'streaming' isn't there. Or is there a
better way to do it?

On Wed, Nov 10, 2021 at 7:24 PM Ben Chobot  wrote:

> Rita wrote on 11/10/21 1:25 PM:
> > Hello.
> >
> > I am testing alerting on my primary and standby setup. I have async
> > replication working but I would like to temporarily pause it so the
> > value of 'state' isn't streaming. (select * from pg_stat_replication).
> >
> > How can I do that?
>
> By reading the fine manual:
>
> https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL-TABLE
>
> The function you are looking for is pg_wal_replay_pause().
>


-- 
--- Get your facts first, then you can distort them as you please.--


Re: Pause streaming replication

2021-11-10 Thread Ben Chobot

Rita wrote on 11/10/21 5:36 PM:
Yes, I have read the manual and seen this. It pauses the replication 
(select pg_is_wal_replay_paused()). But on the primary, when I look at 
pg_stat_replication, it still says 'streaming' in the state column. My 
question was how do I get it from 'streaming'  to anything else?  
(https://www.postgresql.org/docs/11/monitoring-stats.html#PG-STAT-REPLICATION-VIEW) 
?


I want to trigger an alert when 'streaming' isn't there. Or is there a 
better way to do it?


Oh yes, I'm sorry. I see now that was your original question, and my 
suggestion doesn't help you there.


But I would say that, in my experience with monitoring streaming 
replication, I haven't cared so much about if the replica is streaming 
vs. rebuilding, but rather how far behind it has fallen when it is 
supposedly streaming. Pausing replication lets you see what it will look 
like when a disk failure blocks your replica's writes, or client 
activity blocks wal replay, which are absolutely situations you want to 
monitor for.




Re: Pause streaming replication

2021-11-10 Thread Michael Paquier
On Wed, Nov 10, 2021 at 08:36:45PM -0500, Rita wrote:
> Yes, I have read the manual and seen this. It pauses the replication
> (select pg_is_wal_replay_paused()). But on the primary, when I look at
> pg_stat_replication, it still says 'streaming' in the state column. My
> question was how do I get it from 'streaming'  to anything else?  (
> https://www.postgresql.org/docs/11/monitoring-stats.html#PG-STAT-REPLICATION-VIEW)
> ?

pg_is_wal_replay_paused() pauses WAL *replay* at recovery, but it does
not stop the stream of WAL from the primary to its standbys.

> I want to trigger an alert when 'streaming' isn't there. Or is there a
> better way to do it?

If you want to have some automated test to check a given state of the
replication, you could use a combination of a SIGSTOP on the WAL
receiver of the standby and/or the WAL sender of the primary, with
some pg_terminate_backend() calls, just to throw one idea in the
bucket.
--
Michael


signature.asc
Description: PGP signature


Re: Pause streaming replication

2021-11-10 Thread Laurenz Albe
On Wed, 2021-11-10 at 20:36 -0500, Rita wrote:
> On Wed, Nov 10, 2021 at 7:24 PM Ben Chobot  wrote:
> >Rita wrote on 11/10/21 1:25 PM:
> > > Hello.
> > > 
> > > I am testing alerting on my primary and standby setup. I have async 
> > > replication working but I would like to temporarily pause it so the 
> > > value of 'state' isn't streaming. (select * from pg_stat_replication).
> > > 
> > > How can I do that?
> > 
> > By reading the fine manual: 
> > https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL-TABLE
> > 
> > The function you are looking for is pg_wal_replay_pause().
> 
> Yes, I have read the manual and seen this. It pauses the replication (select 
> pg_is_wal_replay_paused()).
> But on the primary, when I look at pg_stat_replication, it still says 
> 'streaming' in the
> state column. My question was how do I get it from 'streaming'  to anything 
> else?
> (https://www.postgresql.org/docs/11/monitoring-stats.html#PG-STAT-REPLICATION-VIEW)
>  ?
> 
> I want to trigger an alert when 'streaming' isn't there. Or is there a better 
> way to do it?

If the standby is no longer streaming, that certainly should trigger an alert.

But if you want to monitor replication delay, you should instead use a query 
like

   SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
   FROM pg_stat_replication;

That will measure the replay delay in bytes, and you can alert if the value 
exceeds a certain amount.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com