CREATE SUBSCRIPTION not picking up .pgpass while psql does
Hello,

I do not understand why CREATE SUBSCRIPTION does not pick up .pgpass (when psql does):

root@pg.newtest:/# psql 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
oldtest=# \q

root@pg.newtest:/# psql newtest postgres
newtest=# CREATE SUBSCRIPTION sub_pgupgrade CONNECTION 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest' PUBLICATION pub_pgupgrade;
ERROR:  could not connect to the publisher: connection to server at "pg.oldtest" (x.x.x.x), port 5432 failed: fe_sendauth: no password supplied
newtest=#

oldtest is 11.15 (Debian 11.15-1.pgdg90+1)
newtest is 14.5 (Debian 14.5-1.pgdg110+1)

.pgpass is under /root (the home directory inside the docker container), with 0600 permissions and owned by user "postgres". Even providing passfile=/root/.pgpass in the connstring does not work.

.pgpass contains:

*:*:*:pg_replication:password

Any thoughts on how to debug this are much appreciated.

Best regards,
--
Kristjan Mustkivi

Email: kristjan.mustk...@gmail.com
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 04:03:31PM +0300, Kristjan Mustkivi wrote:
> Hello,
>
> I do not understand why CREATE SUBSCRIPTION does not pick up .pgpass
> (when psql does):
>
> root@pg.newtest:/# psql 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
> oldtest=# \q
>
> root@pg.newtest:/# psql newtest postgres
> newtest=# CREATE SUBSCRIPTION sub_pgupgrade CONNECTION
> 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
> PUBLICATION pub_pgupgrade;
> ERROR:  could not connect to the publisher: connection to server at
> "pg.oldtest" (x.x.x.x), port 5432 failed: fe_sendauth: no password
> supplied
> newtest=#
>
> oldtest is 11.15 (Debian 11.15-1.pgdg90+1)
> newtest is 14.5 (Debian 14.5-1.pgdg110+1)
>
> .pgpass is under /root (home inside the docker container) with 0600
> permissions and owned by user "postgres". Even providing
> passfile=/root/.pgpass in the connstring does not work.

How/where do you provide it?

Why would you assume that postgres (presumably running as the postgres user) would look for .pgpass in /root/.pgpass?

postgres should have it in ~postgres/.pgpass, with proper ownership.

depesz
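Aside from fixing the .pgpass location, the CONNECTION string of CREATE SUBSCRIPTION is an ordinary libpq connection string, so the password can also be supplied inline. A minimal sketch (the password value is a placeholder; keeping it in the subscription's stored connection string may not be acceptable in every environment):

    CREATE SUBSCRIPTION sub_pgupgrade
        CONNECTION 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest password=secret'
        PUBLICATION pub_pgupgrade;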
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 4:07 PM hubert depesz lubaczewski wrote:
> How/where do you provide it?
>
> Why would you assume that postgres (presumably running as the postgres
> user) would look for .pgpass in /root/.pgpass?
>
> postgres should have it in ~postgres/.pgpass, with proper ownership.

Hi depesz,

I use the dockerized version of Postgres (in both the old and new cases). The docker-compose.yml file maps the .pgpass file as a volume (- ./v_pgpass:/root/.pgpass). Inside the container the home is /root:

sudo docker exec -ti db bash
root@pg.newtest:/# cd ~
root@pg.newtest:~# pwd
/root
root@pg.newtest:~# ls -la .pgpass
-rw------- 1 postgres root 74 Aug 30 11:38 .pgpass

And as said, the psql utility has no problem finding the .pgpass where it is. If I lie to it about the .pgpass location, e.g. by giving passfile=/root/.pgpassx, it will ask for a password. Why CREATE SUBSCRIPTION does not pick the .pgpass up like psql does - that is what I cannot figure out.

Cheers!
--
Kristjan Mustkivi

Email: kristjan.mustk...@gmail.com
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 04:26:22PM +0300, Kristjan Mustkivi wrote:
> And as said, the psql utility has no problem finding the .pgpass where
> it is. If I lie to it about the .pgpass location, e.g. by giving
> passfile=/root/.pgpassx, it will ask for a password.

Of course it doesn't have a problem, because you run it as root.
But the subscription is run by a pg backend, which runs as the postgres user, not root!

Also - no pg tool cares about a "passfile" env variable, so it is just ignored.

Best regards,

depesz
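A quick way to check which OS user the server-side connection will be made as (and therefore whose ~/.pgpass it consults) is to look at a backend process itself. A hedged sketch - the pid comes from SQL, and the ps call is run from a shell inside the container:

    -- in psql, get the pid of the current backend
    SELECT pg_backend_pid();
    -- then, from a shell inside the container:
    --   ps -o user= -p <pid>    -- typically prints "postgres", not "root"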
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 4:27 PM hubert depesz lubaczewski wrote:
>
> On Wed, Aug 31, 2022 at 04:26:22PM +0300, Kristjan Mustkivi wrote:
> > And as said, the psql utility has no problem finding the .pgpass where
> > it is. If I lie to it about the .pgpass location, e.g. by giving
> > passfile=/root/.pgpassx, it will ask for a password.
>
> Of course it doesn't have a problem, because you run it as root.
> But the subscription is run by a pg backend, which runs as the postgres
> user, not root!
>
> Also - no pg tool cares about a "passfile" env variable, so it is just
> ignored.

Thank you, depesz!

After I mapped the volume as (- ./v_pgpass:/var/lib/postgresql/.pgpass), CREATE SUBSCRIPTION was able to find it. Somehow my initial POC fooled me, and then I kept barking up the wrong tree.

Best Regards,
--
Kristjan Mustkivi

Email: kristjan.mustk...@gmail.com
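For reference, a minimal sketch of the working setup as described above (the compose excerpt is an assumption based on the volume mapping quoted in this thread; /var/lib/postgresql is the postgres user's home directory in the Debian-based images):

    -- docker-compose.yml (excerpt): mount .pgpass into the home of the OS
    -- user the server runs as, not /root, and keep it mode 0600:
    --   volumes:
    --     - ./v_pgpass:/var/lib/postgresql/.pgpass
    --
    -- with that in place the original command succeeds:
    CREATE SUBSCRIPTION sub_pgupgrade
        CONNECTION 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
        PUBLICATION pub_pgupgrade;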
Feature request for count_estimate(samplesize) aggregate or SAMPLE keyword
Dear Postgres Devs,

I use Postgres everywhere I can and for me it is by far the best database system out there. Great job, thank you!

Now I would like to humbly propose a feature that gives an easy way to get a quick count estimate for any condition - index based or not - based on a random sample of rows, and that does not require creating a custom function or a complex SQL statement.

The final syntax could be, for example:

SELECT count_estimate(2000) FROM my_table WHERE ...

or, since peerce in the chat pointed out that aggregate functions are only evaluated after the WHERE condition has already been applied:

SAMPLE count(*) FROM my_table WHERE ... SAMPLESIZE 2000

The idea of this function is to estimate the total number of rows matching the WHERE condition by checking a random sample of rows against it:

1. Get an as-good-as-possible but quick estimate of the total number of rows (rowcount).
2. Select a random sample of all rows (samplesize).
3. Check which of the sampled rows match the WHERE condition (hits).
4. Calculate and return the estimate: rowcount * hits / samplesize.

The higher the sample size, the more precise the count, so the developer can decide on the tradeoff between accuracy and speed. Of course this has its limitations if the sample size is so small, or the actual results so few, that only a few hits - if any at all - make it into the sample. So maybe a min_hits value should be considered as well, so that the sample size is increased until at least that many hits are found or an exact count has been produced (or an optional max_samplesize has been reached).

Typical scenario: highly flexible searching/filtering of data with a paging mechanism and a "total results" preview.

I think this would find a lot of fans. When once more hitting the famous count(*) performance issue, I again read a lot of comments, stackoverflow answers and mail threads discussing it and possible workarounds and solutions, one more complex than the other. I ended up writing my own SQL statement that achieves the above and I think it meets the requirements of many who started those discussions, but I really would love to see this become a supported feature so it is easier to use and most likely far more performant.

Greetings,
Torge.

PS: Here is my current SQL-level statement to achieve this. Due to limitations I could not work around, it works slightly differently: instead of the row count it uses the max ID of the table, which of course has to exist for this to work at all. Using reltuples was suggested, but in my system it is already often off by more than 10%, and I don't know how to select random rows (efficiently) in that case, while I can easily generate random IDs in the space of possible IDs. My statement returns an estimate in ~1 second on a 100M-row table with a WHERE that is not supported by an index and matches 10M entries; the estimate is roughly 5% off, while an exact count takes over 20 seconds.

WITH vars AS (
    SELECT
        (SELECT my_id FROM my_table ORDER BY my_id DESC LIMIT 1) AS max_id, -- the maximum ID currently used in this table
        2000 AS sample_size -- the number of entries to sample; the higher, the more precise the estimate
),
random_rows AS (
    -- Build a sample of up to sample_size rows probed at random IDs
    SELECT * FROM my_table
    WHERE my_id = ANY (ARRAY(
        SELECT (random() * (SELECT max_id FROM vars))::int -- pick a random id between 0 and max_id
        FROM generate_series(1, (SELECT sample_size FROM vars)) -- generate sample_size candidate ids
    ))
)
SELECT (SELECT max_id FROM vars) * count(*) / (SELECT sample_size FROM vars)::int AS estimated_count
FROM random_rows
WHERE ... -- any WHERE condition possible

There was a concern mentioned in chat that the IDs might have holes, especially as the sequence is not reset on failed transactions or when rows are deleted. But this should not matter much: it reduces the accuracy, which can be countered by a bigger sample size. Also, a min_id could easily be added for cases where old rows get deleted.

Example:
- The table has 100M entries.
- The condition would match 10M entries.
- The highest ID is 300,000,000, so only a third of the IDs really exist in the table.
- Sample size: 3000.

We try to fetch 3000 rows by probing random IDs, which yields roughly 1000 actual rows, let's say 1012. We compare those rows against our WHERE condition and match 96. The resulting estimate is

maxId * hits / samplesize = 300,000,000 * 96 / 3000 = 9,600,000

The holes, no matter where they are, reduce the precision but are not a general problem. This becomes clearer if we first calculate the estimated number of existing rows from the sample and then continue from there, which returns the same result in the end:

actualRowEstimate = maxId * fetchedRows / samplesize = 300,000,000 * 1012 / 3000 = 101,200,000
estimate = actualRowEstimate * hits / fetchedRows = 101,200,000 * 96 / 1012 = 9,600,000  => same number

Because:

estimate = actualRowEstimate * hits / fetchedRows
         = (maxId * fetchedRows / samplesize) * hits / fetchedRows
         = maxId * hits / samplesize
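Until something like count_estimate() exists in core, the statement above can be wrapped once into a function so the complex SQL does not have to be repeated per query. This is a hedged sketch only - the function name, table, columns and the hard-coded condition are all placeholders, and it simply packages the max-ID probing approach from this mail:

    CREATE FUNCTION count_estimate_sample(p_sample_size int)
    RETURNS bigint
    LANGUAGE sql AS
    $$
        WITH vars AS (
            SELECT (SELECT my_id FROM my_table ORDER BY my_id DESC LIMIT 1) AS max_id
        ),
        random_rows AS (
            -- probe p_sample_size random IDs in [0, max_id]
            SELECT * FROM my_table
            WHERE my_id = ANY (ARRAY(
                SELECT (random() * (SELECT max_id FROM vars))::int
                FROM generate_series(1, p_sample_size)
            ))
        )
        SELECT (SELECT max_id FROM vars) * count(*) / p_sample_size
        FROM random_rows
        WHERE status = 'active'   -- the condition to estimate, hard-coded here
    $$;

    -- usage:
    SELECT count_estimate_sample(2000);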
Re: Feature request for count_estimate(samplesize) aggregate or SAMPLE keyword
Torge writes:
> Now I would like to humbly propose a feature that gives an easy way to
> get a quick count estimate for any condition - index based or not -
> based on a random sample of rows, that does not require a custom
> function creation or complex SQL statement

Can't you do that already using TABLESAMPLE? For example, to use a 1% sample:

select count(*) * 100 from mytab tablesample system(1) where <condition>;

You do have to remember to multiply by the factor corresponding to your sample rate, but aside from that annoyance this (a) already exists, (b) is SQL-standard, and (c) can be adapted to a lot of other kinds of analysis besides plain count(*).

			regards, tom lane
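For illustration, a minimal sketch of this approach with the scaling factor spelled out (table, column, predicate and the 1% rate are made-up placeholders):

    -- SYSTEM samples whole blocks (fast, but clustering can bias the result);
    -- BERNOULLI samples individual rows at the given percentage.
    -- The multiplier must be 100 / sample_percentage.
    SELECT count(*) * 100 AS estimated_matches
    FROM orders TABLESAMPLE BERNOULLI (1)
    WHERE status = 'shipped';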
Re: Feature request for count_estimate(samplesize) aggregate or SAMPLE keyword
I was not aware of this - sounds like what I want. Will look into it. Thanks!

On 01.09.22 at 03:47, Tom Lane wrote:
> Torge writes:
>> Now I would like to humbly propose a feature that gives an easy way to
>> get a quick count estimate for any condition - index based or not -
>> based on a random sample of rows, that does not require a custom
>> function creation or complex SQL statement
>
> Can't you do that already using TABLESAMPLE? For example, to use a 1% sample:
>
> select count(*) * 100 from mytab tablesample system(1) where <condition>;
>
> You do have to remember to multiply by the factor corresponding to your
> sample rate, but aside from that annoyance this (a) already exists,
> (b) is SQL-standard, and (c) can be adapted to a lot of other kinds of
> analysis besides plain count(*).
>
> regards, tom lane