RE: Get COUNT results from two different columns

2021-09-23 Thread SQL Padawan


Hi,


> I have two separate queries that work individually, returning a count from 
> each column.

> I would appreciate any pointers.

> new_sup,  COUNT(new_sup) AS new_sup_count

>     old_sup,  COUNT(old_sup) AS old_sup_count

At least some of your problem is here - in your UNION-ed table, this is one and 
the same field and therefore cannot have two different names.
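If it helps: COUNT(col) only counts non-NULL values, so (assuming both columns live in one table - the table name below is a guess from your fragments) you can get both counts in a single pass, with no UNION at all:

SELECT COUNT(new_sup) AS new_sup_count,
       COUNT(old_sup) AS old_sup_count
FROM   suppliers;  -- table name guessed from your fragments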


HTH,

SQL Padawan






Re: Strange sequences - how to construct?

2021-10-27 Thread SQL Padawan



> --- example: let's get 3 sequences
>
> select next_sequence(), next_sequence(), next_sequence();
> --- inspect the table to see what happened
> select * from my_sequence;


Thanks for your input on this issue.
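For my own notes - a guess (untested) at how a table-backed next_sequence() might be implemented, assuming my_sequence holds a single counter row:

CREATE TABLE my_sequence (current_value bigint NOT NULL);
INSERT INTO my_sequence VALUES (0);

CREATE OR REPLACE FUNCTION next_sequence() RETURNS bigint AS $$
    UPDATE my_sequence
       SET current_value = current_value + 1
    RETURNING current_value;  -- a SQL function returns the last statement's result
$$ LANGUAGE sql;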

SQLP


Sent with ProtonMail Secure Email.




Model clause and

2021-10-28 Thread SQL Padawan
Good evening everybody.

Are there any plans to emulate the Oracle MODEL clause in PostgreSQL?

If not, why not? It doesn't appear to have gained much traction even in the 
Oracle world?

Also, are there any plans to introduce the MATCH_RECOGNIZE functionality?

I read an interview with Dimitri and he said he'd really like to see this in 
Postgres?

I presume that VIRTUAL GENERATED columns are on the to-do list?
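(For reference, PostgreSQL 12+ already has the STORED flavour - a minimal sketch below; I assume a VIRTUAL variant would just compute the value on read rather than writing it to disk:)

-- Stored generated columns exist today (PostgreSQL 12+).
-- A VIRTUAL column would presumably compute net + vat at read
-- time instead of storing it.
CREATE TABLE invoice (
    net   numeric NOT NULL,
    vat   numeric NOT NULL,
    gross numeric GENERATED ALWAYS AS (net + vat) STORED
);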

Just curious.

SQLP

Re: Model clause and

2021-10-28 Thread SQL Padawan
That's Dimitri Fontaine - I don't know him personally and didn't mean to be 
impolite by referring to him by his first name - typo really!

SQLP

Sent with [ProtonMail](https://protonmail.com/) Secure Email.

‐‐‐ Original Message ‐‐‐
On Thursday, October 28th, 2021 at 20:57, SQL Padawan wrote:

> Good evening everybody.
>
> Are there any plans to emulate the Oracle MODEL clause in PostgreSQL?
>
> If not, why not? It doesn't appear to have gained much traction even in the 
> Oracle world?
>
> Also, are there any plans to introduce the MATCH_RECOGNIZE functionality?
>
> I read an interview with Dimitri and he said he'd really like to see this in 
> Postgres?
>
> I presume that VIRTUAL GENERATED columns are on the to-do list?
>
> Just curious.
>
> SQLP

Settings for a new machine - some guidance sought.

2021-10-30 Thread SQL Padawan


Good afternoon everyone,

I have a new laptop - running version 13.4 - simple binary from EnterpriseDB - 
no install!

I plan to have a dual-boot machine - Windows gets 125 GB - I have to be able 
to run Tableau, which doesn't have a Linux desktop version...

Linux gets the rest...


= Spec =

- 8 GB RAM
- 256 GB NVMe disk
- Intel Core i5-1135G7 (11th Gen) Processor

The page here:
https://laptoping.com/cpus/product/intel-core-i5-1135g7/

tells me that this CPU has:

Number of Cores: Quad-core / 2 computing threads per core

===

So, I input these parameters into pgtune and received the output below - I'm 
not simply dumping it here and expecting y'all to do all of the work! :-)

I looked up each parameter and what it does - I'm not into aimless 
button-twiddling!

I looked at:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, the 
EnterpriseDB white paper "Configuring and Tuning PostgreSQL and EDB Postgres 
Advanced Server for Windows & Linux", and various other sites.

Windows

max_connections = 100 - not important - not a server - it's a single-user 
machine.

shared_buffers = 2GB
# Default - 128MB - recommendation is in line with Tuning Your PostgreSQL 
Server - 25% of RAM

effective_cache_size = 6GB
# Default - 4GB   - "aggressive" recommendation - this is fine - I want my 
server to be as performant as possible - I'm not pushed if my browser is 
sluggish!

maintenance_work_mem = 512MB
# Default - 64MB  - says it's OK to set it higher as only one maintenance 
operation can be happening at any one time due to locking


checkpoint_completion_target = 0.9
# Default - 0.5   - 0.9 the "useful maximum" - note to self - read up on links 
on the PostgreSQL tuning page!


wal_buffers = 16MB
# Default - 4MB (512 8kB units)  = 16MB "upper limit" (repeat note to self)

default_statistics_target = 100
# Default - 100 - seems OK - "reasonable starting point"

random_page_cost = 1.1
# Default - 4 - 4 is for HDDs - 1.1 is for SSDs? I've seen this written in 
blogs - reduce from 4 for SSDs because of lower SSD penalty for random I/O

= Puzzle =

work_mem = 2621kB
# Default - 4096kB (4MB) - This one puzzles me - according to the PostgreSQL 
tuning page, it should be biggish, so as to avoid spilling to disk.

From the EnterpriseDB tuning document, I get that a "good starting point" is:
((Total RAM - shared_buffers) / (16 x CPU cores)) which is (according to my 
calculations): (8GB - 2GB) / (16 x 8) = 48MB, call it 50MB - so why 2MB?

==


min_wal_size = 1GB
# Default - 80MB - I've changed this setting before, and I think that larger 
settings are better? I'd go for 4GB min / 8GB max?

Maybe on a server where you want your recovery time to be v. small and your 
recovery point to be seconds ago, you might want to keep these smaller?

max_wal_size = 4GB # Default - 1GB

I'd go with 8? I'm still shaky on what exactly the balance is for these two?



max_worker_processes = 8
# Default - 8 - I have 8 logical cores (quad-core x 2 threads) - but there's 
considerable overhead with parallel queries? Then again, single-user machine - 
does this matter?

max_parallel_workers_per_gather = 4
# Default - 2 - half of max_worker_processes appears to be the consensus?
So, per query, a maximum of 4 workers will be assigned?


max_parallel_workers = 8
# Default - 8 - So, I can't under any circumstances have any more than 8 
parallel workers

max_parallel_maintenance_workers = 4
# Default - 2 - seems reasonable - only applies to VACUUM and CREATE INDEX 
(CONCURRENTLY)

PGTune for Linux gave one extra setting - effective_io_concurrency (it's 0 - 
unsettable - on Windows)

From here:
https://postgresqlco.nf/doc/en/param/effective_io_concurrency/

"SSDs and other memory-based storage can often process many concurrent 
requests, so the best value might be in the hundreds."

So, 200 seems reasonable.
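For convenience, here's everything above collected into one block - just a sketch of how I'd apply it with ALTER SYSTEM (values as discussed, and still open to correction!):

-- ALTER SYSTEM writes these to postgresql.auto.conf; pg_reload_conf()
-- picks up the reloadable ones, but shared_buffers et al. need a restart.
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET default_statistics_target = 100;
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET work_mem = '2621kB';
ALTER SYSTEM SET min_wal_size = '1GB';
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET max_worker_processes = 8;
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
ALTER SYSTEM SET max_parallel_workers = 8;
ALTER SYSTEM SET max_parallel_maintenance_workers = 4;
ALTER SYSTEM SET effective_io_concurrency = 200;  -- Linux only; forced to 0 on Windows
SELECT pg_reload_conf();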


I would be grateful if anyone could comment on the suitability or otherwise of 
any/some/all of these settings.

Good references/URLs/.pdfs/texts much appreciated.

Please let me know if you require any more information.

SQLP!


Sent with ProtonMail Secure Email.






Understanding the differences between Temporal tables, CDC and Time Series.

2021-11-09 Thread SQL Padawan
Good morning to everyone,

I'm looking into Temporal Tables (TT - i.e. System/Application time versioning) 
but I would like to understand the differences between TTs and two other 
features (which appear similar in many ways) - Change Data Capture and Time 
Series.

Are Temporal Tables (in a sense) a query framework around CDC?

And if that's the case, what then are Time Series - which are specifically 
designed to host (time) changing data?

Are Temporal Tables (again, in a sense) Time Series databases which only record 
changes in time(-stamps)?

I'm unclear as to the "philosophical" distinctions here and would be grateful 
if anybody could explain the difference(s) between them?

I did look at this myself and found these posts on StackOverflow, but I don't 
feel as if they've fundamentally explained the difference(s).

https://stackoverflow.com/questions/39165412/why-we-require-temporal-table-in-sql-server-2016-as-we-have-cdc-or-ct

https://stackoverflow.com/questions/800331/why-do-we-need-a-temporal-database

Any ideas, discussion, references, URLs welcome.

Thx,

SQLP!

Sent with [ProtonMail](https://protonmail.com/) Secure Email.

Re: Understanding the differences between Temporal tables, CDC and Time Series.

2021-11-10 Thread SQL Padawan



Hi Peter, and thanks for answering.


> > I'm unclear as to the "philosophical" distinctions here and would be
> > grateful if anybody could explain the difference(s) between them?

> A time series database contains data records containing a time stamp

> A temporal database contains records that have a timestamp range


OK - so we have two stock "ticker" dbs - one a Time Series db (TS-db) and one a 
Temporal db (T-db).

All of the code below is in a fiddle here:

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=207f38e5c7d6e7861c402c2c4926840c

CREATE TABLE ts_portfolio  -- time series
(
  txn_id  INTEGER NOT NULL,
  company TEXT NOT NULL,
  price   INTEGER NOT NULL,
  txn_ts  TIMESTAMP NOT NULL
);

with a few records


INSERT INTO ts_portfolio VALUES
(1234, 'DB', 10, '07/11/21 12:00:00'),  -- record inserted on 07/11/21
(2345, 'DB', 20, '08/11/21 12:00:00'),  -- record inserted on 08/11/21
(3456, 'DB', 30, '09/11/21 12:00:00'),  -- record inserted on 09/11/21
(4567, 'DB', 40, '10/11/21 12:00:00'),  -- record inserted on 10/11/21
(5678, 'DB', 50, '11/11/21 12:00:00');  -- record inserted on 11/11/21


So, for example, in order to query the share price AS OF 00:00:00 on 09/11:

SELECT company, price
FROM ts_portfolio
WHERE company = 'DB'
AND txn_ts <= '09/11/21 00:00:00'
ORDER BY txn_ts DESC
LIMIT 1;

We pick up the price from the latest transaction either before or exactly on 
09/11/2021 00:00:00

Things become tricky when we want to see the prices on a given day: 09/11/2021

--
-- UNION better here? See EXPLAIN in fiddle - not important for this debate!
--

WITH cte1 AS
(
  SELECT txn_id AS t_start
  FROM ts_portfolio
  WHERE company = 'DB'
  AND txn_ts <= '09/11/21 00:00:00'
  ORDER BY txn_ts DESC
  LIMIT 1
), cte2 AS
(
  SELECT txn_id AS t_end
  FROM ts_portfolio
  WHERE company = 'DB'
  AND txn_ts <= '09/11/21 23:59:59.99'
  ORDER BY txn_ts DESC
  LIMIT 1
)
SELECT
  t.txn_id, t.company, t.price
FROM
  ts_portfolio t
WHERE txn_id BETWEEN (SELECT t_start FROM cte1) AND (SELECT t_end FROM cte2);


txn_id  company  price
2345    DB       20
3456    DB       30

which is correct - from 08/11 midday till 09/11 midday, the price was 20 and 
then at midday 09/11 it rose to 30!
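(The same result can be had with a scalar subquery instead of the two CTEs - a sketch only, same idea: grab everything from the last pre-midnight price onwards:

SELECT txn_id, company, price
FROM ts_portfolio
WHERE company = 'DB'
AND txn_ts <= '09/11/21 23:59:59.99'
AND txn_ts >= (SELECT max(txn_ts)
               FROM ts_portfolio
               WHERE company = 'DB'
               AND txn_ts <= '09/11/21 00:00:00');  -- last price set at or before day start
)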


Becoming complex.

However, a Temporal (versioned) table would only require one extra field:

  txn_t_from TIMESTAMP NOT NULL,
  txn_t_to   TIMESTAMP NOT NULL  -- extra field - see fiddle for data inserts and updates!


Each INSERT (behind the scenes) is an INSERT and an UPDATE.

INSERT INTO tdb_portfolio VALUES
(1000, 'DB',  5, '01/01/1900', '07/11/21 12:00:00'),         -- inserted in the past, updated 07/11 @ 12:00
(1234, 'DB', 10, '07/11/21 12:00:00', '08/11/21 12:00:00'),  -- record inserted on 07/11/21 - updated 08/11 @ 12:00
(2345, 'DB', 20, '08/11/21 12:00:00', '09/11/21 12:00:00'),  -- record inserted on 08/11/21 - and so on...
(3456, 'DB', 30, '09/11/21 12:00:00', '10/11/21 12:00:00'),  -- record inserted on 09/11/21
(4567, 'DB', 40, '10/11/21 12:00:00', '11/11/21 12:00:00'),  -- record inserted on 10/11/21
(5678, 'DB', 50, '11/11/21 12:00:00', '19/01/2038');         -- record inserted on 11/11/21 - never updated (effectively INFINITY until next update)



The queries become simpler:

SELECT txn_id, company, price
FROM tdb_portfolio AS OF '09/11/2021 00:00:00';

and the complex one above collapses to:

SELECT txn_id, company, price FROM tdb_portfolio
FOR SYSTEM_TIME BETWEEN '09/11/2021 00:00:00' AND '09/11/2021 23:59:59.99';

or maybe, if there were a DATE(SYSTEM_TIME) function: ... FOR 
DATE(SYSTEM_TIME) = '09/11/2021';


I'm assuming that (when properly introduced into PostgreSQL), there will be 
optimisations for these sorts of query.

Do temporal tables bring anything else "to the party" - the augmented 
functionality is a nice-to-have, but hardly earth-shattering?

Have I missed out on anything important?

Thx, SQLP!







Re: Database Scalability

2021-12-01 Thread SQL Padawan



> > To my knowledge PostgreSQL doesn't support sharding, which is well and
> > good because sharding is mostly useless, at least in my opinion.


> Not only does PostgreSQL natively support table partitioning (which is
> absolutely a form of sharding), there are multiple well-regarded extensions
> that can help with sharding, all of which are orthogonal to how you can
> configure your application to use Postgres in the first place. So to say
> Postgres doesn't support sharding is misleading, at best.
>
> Also, the general concept of sharding to move your scaling challenges
> from vertical ones to horizontal ones has multiple self-evident
> advantages. If your work history has all happened to fit on a single
> server, then bully for you, but not everybody has it so easy.

It supports partitioning out of the box - not sharding, where different tables 
reside on different machines!
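(For anyone unfamiliar, declarative partitioning looks like this - table names invented; all partitions live in the same cluster, on the same machine:

CREATE TABLE measurements (
    logged_at timestamptz NOT NULL,
    reading   numeric
) PARTITION BY RANGE (logged_at);

CREATE TABLE measurements_2021 PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

which is why it's a single-machine feature.)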

CitusData and TimescaleDB provide sharding as extensions - both of which appear 
useful for time-series data. There was also PostgresXL, a general sharding 
(multi-machine) solution, but it appears to have died.

SQLP!






Pgcrypto extension - decrypt(encrypt(... not returning original data?

2021-12-01 Thread SQL Padawan
I tried to use the pgcrypto extension.

From the manual:

https://www.postgresql.org/docs/14/pgcrypto.html

there are two functions - encrypt and decrypt - with signatures as follows:

encrypt(data bytea, key bytea, type text) returns bytea
decrypt(data bytea, key bytea, type text) returns bytea

OK.

I try to run this - see the fiddle:

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d2c102f5fb6e62acb1c70a2dc755fdca

SELECT
encrypt('da'::BYTEA, 'pw'::BYTEA, 'bf'),
pg_typeof(encrypt('da'::BYTEA, 'pw'::BYTEA, 'bf')),
decrypt(encrypt('da'::BYTEA, 'pw'::BYTEA, 'bf'), 'pw', 'bf');

but I get this as my result:

encrypt             pg_typeof  decrypt
\x54027d78b34ac951  bytea      \x6461

Why is my decrypt function not returning the string 'da'? Have I not 
understood something important?

SQLP!