CREATE SUBSCRIPTION not picking up .pgpass while psql does
Hello,

I do not understand why CREATE SUBSCRIPTION does not pick up .pgpass (when psql does):

root@pg.newtest:/# psql 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
oldtest=# \q

root@pg.newtest:/# psql newtest postgres
newtest=# CREATE SUBSCRIPTION sub_pgupgrade CONNECTION 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest' PUBLICATION pub_pgupgrade;
ERROR:  could not connect to the publisher: connection to server at "pg.oldtest" (x.x.x.x), port 5432 failed: fe_sendauth: no password supplied
newtest=#

oldtest is 11.15 (Debian 11.15-1.pgdg90+1)
newtest is 14.5 (Debian 14.5-1.pgdg110+1)

.pgpass is under /root (the home directory inside the docker container), with 0600 permissions and owned by user "postgres". Even providing passfile=/root/.pgpass in the connstring does not work.

.pgpass contains:

*:*:*:pg_replication:password

Any thoughts on how to debug this are much appreciated.

Best regards,
--
Kristjan Mustkivi

Email: kristjan.mustk...@gmail.com
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 04:03:31PM +0300, Kristjan Mustkivi wrote:
> Hello,
>
> I do not understand why CREATE SUBSCRIPTION does not pick up .pgpass
> (when psql does):
>
> root@pg.newtest:/# psql 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
> oldtest=# \q
>
> root@pg.newtest:/# psql newtest postgres
> newtest=# CREATE SUBSCRIPTION sub_pgupgrade CONNECTION
> 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
> PUBLICATION pub_pgupgrade;
> ERROR:  could not connect to the publisher: connection to server at
> "pg.oldtest" (x.x.x.x), port 5432 failed: fe_sendauth: no password
> supplied
> newtest=#
>
> oldtest is 11.15 (Debian 11.15-1.pgdg90+1)
> newtest is 14.5 (Debian 14.5-1.pgdg110+1)
>
> .pgpass is under /root (home inside the docker container) with 0600
> permissions and owned by user "postgres". Even providing
> passfile=/root/.pgpass in the connstring does not work.

How/where do you provide it?

Why would you assume that postgres (presumably running as the postgres user) would look for .pgpass in /root/.pgpass?

postgres should have it in ~postgres/.pgpass, with proper ownership.

depesz
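Aside from fixing the .pgpass location, the CONNECTION string of CREATE SUBSCRIPTION is an ordinary libpq connection string, so the password can also be supplied inline. A minimal sketch (the password value is a placeholder; keeping it in the subscription's stored connection string may not be acceptable in every environment):

    CREATE SUBSCRIPTION sub_pgupgrade
        CONNECTION 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest password=secret'
        PUBLICATION pub_pgupgrade;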
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 4:07 PM hubert depesz lubaczewski wrote:
> How/where do you provide it?
>
> Why would you assume that postgres (presumably running as the postgres
> user) would look for .pgpass in /root/.pgpass?
>
> postgres should have it in ~postgres/.pgpass, with proper ownership.

Hi depesz,

I use the dockerized version of Postgres (in both the old and new cases). The docker-compose.yml file maps the .pgpass file as a volume (- ./v_pgpass:/root/.pgpass). Inside the container the home is /root:

sudo docker exec -ti db bash
root@pg.newtest:/# cd ~
root@pg.newtest:~# pwd
/root
root@pg.newtest:~# ls -la .pgpass
-rw------- 1 postgres root 74 Aug 30 11:38 .pgpass

And as said, the psql utility has no problem finding the .pgpass where it is. If I lie to it about the .pgpass location, e.g. by giving passfile=/root/.pgpassx, it will ask for a password. Why CREATE SUBSCRIPTION does not pick the .pgpass up like psql does - that is what I cannot figure out.

Cheers!
--
Kristjan Mustkivi

Email: kristjan.mustk...@gmail.com
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 04:26:22PM +0300, Kristjan Mustkivi wrote:
> And as said, the psql utility has no problem finding the .pgpass where
> it is. If I lie to it about the .pgpass location, e.g. by giving
> passfile=/root/.pgpassx, it will ask for a password.

Of course it doesn't have a problem, because you run it as root.
But the subscription is run by a pg backend, which runs as the postgres user, not root!

Also - no pg tool cares about a "passfile" env variable, so it is just ignored.

Best regards,

depesz
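A quick way to check which OS user the server-side connection will be made as (and therefore whose ~/.pgpass it consults) is to look at a backend process itself. A hedged sketch - the pid comes from SQL, and the ps call is run from a shell inside the container:

    -- in psql, get the pid of the current backend
    SELECT pg_backend_pid();
    -- then, from a shell inside the container:
    --   ps -o user= -p <pid>    -- typically prints "postgres", not "root"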
Re: CREATE SUBSCRIPTION not picking up .pgpass while psql does
On Wed, Aug 31, 2022 at 4:27 PM hubert depesz lubaczewski wrote:
>
> On Wed, Aug 31, 2022 at 04:26:22PM +0300, Kristjan Mustkivi wrote:
> > And as said, the psql utility has no problem finding the .pgpass where
> > it is. If I lie to it about the .pgpass location, e.g. by giving
> > passfile=/root/.pgpassx, it will ask for a password.
>
> Of course it doesn't have a problem, because you run it as root.
> But the subscription is run by a pg backend, which runs as the postgres
> user, not root!
>
> Also - no pg tool cares about a "passfile" env variable, so it is just
> ignored.

Thank you, depesz!

After I mapped the volume as (- ./v_pgpass:/var/lib/postgresql/.pgpass), CREATE SUBSCRIPTION was able to find it. Somehow my initial POC fooled me, and then I kept barking up the wrong tree.

Best Regards,
--
Kristjan Mustkivi

Email: kristjan.mustk...@gmail.com
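For reference, a minimal sketch of the working setup as described above (the compose excerpt is an assumption based on the volume mapping quoted in this thread; /var/lib/postgresql is the postgres user's home directory in the Debian-based images):

    -- docker-compose.yml (excerpt): mount .pgpass into the home of the OS
    -- user the server runs as, not /root, and keep it mode 0600:
    --   volumes:
    --     - ./v_pgpass:/var/lib/postgresql/.pgpass
    --
    -- with that in place the original command succeeds:
    CREATE SUBSCRIPTION sub_pgupgrade
        CONNECTION 'host=pg.oldtest port=5432 user=pg_replication dbname=oldtest'
        PUBLICATION pub_pgupgrade;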
Feature request for count_estimate(samplesize) aggregate or SAMPLE keyword
Dear Postgres Devs,

I use Postgres everywhere I can and for me it is by far the best database system out there. Great job, thank you!

Now I would like to humbly propose a feature that gives an easy way to get a quick count estimate for any condition - index based or not - based on a random sample of rows, and that does not require creating a custom function or a complex SQL statement.

The final syntax could be, for example:

SELECT count_estimate(2000) FROM my_table WHERE ...

or, since peerce in the chat pointed out that aggregate functions are only evaluated after the WHERE condition has already been applied:

SAMPLE count(*) FROM my_table WHERE ... SAMPLESIZE 2000

The idea of this function is to estimate the total number of rows matching the WHERE condition by checking a random sample of rows against it:

1. Get an as-good-as-possible but quick estimate of the total number of rows (rowcount).
2. Select a random sample of all rows (samplesize).
3. Check which of the sampled rows match the WHERE condition (hits).
4. Calculate and return the estimate: rowcount * hits / samplesize.

The higher the sample size, the more precise the count, so the developer can decide on the tradeoff between accuracy and speed. Of course this has its limitations if the sample size is so small, or the actual results so few, that only a few hits - if any at all - make it into the sample. So maybe a min_hits value should be considered as well, so that the sample size is increased until at least that many hits are found or an exact count has been produced (or an optional max_samplesize has been reached).

Typical scenario: highly flexible searching/filtering of data with a paging mechanism and a "total results" preview.

I think this would find a lot of fans. When once more hitting the famous count(*) performance issue, I again read a lot of comments, stackoverflow answers and mail threads discussing it and possible workarounds and solutions, one more complex than the other. I ended up writing my own SQL statement that achieves the above and I think it meets the requirements of many who started those discussions, but I really would love to see this become a supported feature so it is easier to use and most likely far more performant.

Greetings,
Torge.

PS: Here is my current SQL-level statement to achieve this. Due to limitations I could not work around, it works slightly differently: instead of the row count it uses the max ID of the table, which of course has to exist for this to work at all. Using reltuples was suggested, but in my system it is already often off by more than 10%, and I don't know how to select random rows (efficiently) in that case, while I can easily generate random IDs in the space of possible IDs. My statement returns an estimate in ~1 second on a 100M-row table with a WHERE that is not supported by an index and matches 10M entries; the estimate is roughly 5% off, while an exact count takes over 20 seconds.

WITH vars AS (
    SELECT
        (SELECT my_id FROM my_table ORDER BY my_id DESC LIMIT 1) AS max_id, -- the maximum ID currently used in this table
        2000 AS sample_size -- the number of entries to sample; the higher, the more precise the estimate
),
random_rows AS (
    -- Build a sample of up to sample_size rows probed at random IDs
    SELECT * FROM my_table
    WHERE my_id = ANY (ARRAY(
        SELECT (random() * (SELECT max_id FROM vars))::int -- pick a random id between 0 and max_id
        FROM generate_series(1, (SELECT sample_size FROM vars)) -- generate sample_size candidate ids
    ))
)
SELECT (SELECT max_id FROM vars) * count(*) / (SELECT sample_size FROM vars)::int AS estimated_count
FROM random_rows
WHERE ... -- any WHERE condition possible

There was a concern mentioned in chat that the IDs might have holes, especially as the sequence is not reset on failed transactions or when rows are deleted. But this should not matter much: it reduces the accuracy, which can be countered by a bigger sample size. Also, a min_id could easily be added for cases where old rows get deleted.

Example:
- The table has 100M entries.
- The condition would match 10M entries.
- The highest ID is 300,000,000, so only a third of the IDs really exist in the table.
- Sample size: 3000.

We try to fetch 3000 rows by probing random IDs, which yields roughly 1000 actual rows, let's say 1012. We compare those rows against our WHERE condition and match 96. The resulting estimate is

maxId * hits / samplesize = 300,000,000 * 96 / 3000 = 9,600,000

The holes, no matter where they are, reduce the precision but are not a general problem. This becomes clearer if we first calculate the estimated number of existing rows from the sample and then continue from there, which returns the same result in the end:

actualRowEstimate = maxId * fetchedRows / samplesize = 300,000,000 * 1012 / 3000 = 101,200,000
estimate = actualRowEstimate * hits / fetchedRows = 101,200,000 * 96 / 1012 = 9,600,000  => same number

Because:

estimate = actualRowEstimate * hits / fetchedRows
         = (maxId * fetchedRows / samplesize) * hits / fetchedRows
         = maxId * hits / samplesize
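Until something like count_estimate() exists in core, the statement above can be wrapped once into a function so the complex SQL does not have to be repeated per query. This is a hedged sketch only - the function name, table, columns and the hard-coded condition are all placeholders, and it simply packages the max-ID probing approach from this mail:

    CREATE FUNCTION count_estimate_sample(p_sample_size int)
    RETURNS bigint
    LANGUAGE sql AS
    $$
        WITH vars AS (
            SELECT (SELECT my_id FROM my_table ORDER BY my_id DESC LIMIT 1) AS max_id
        ),
        random_rows AS (
            -- probe p_sample_size random IDs in [0, max_id]
            SELECT * FROM my_table
            WHERE my_id = ANY (ARRAY(
                SELECT (random() * (SELECT max_id FROM vars))::int
                FROM generate_series(1, p_sample_size)
            ))
        )
        SELECT (SELECT max_id FROM vars) * count(*) / p_sample_size
        FROM random_rows
        WHERE status = 'active'   -- the condition to estimate, hard-coded here
    $$;

    -- usage:
    SELECT count_estimate_sample(2000);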
Re: Feature request for count_estimate(samplesize) aggregate or SAMPLE keyword
Torge writes:
> Now I would like to humbly propose a feature that gives an easy way to
> get a quick count estimate for any condition - index based or not -
> based on a random sample of rows, that does not require a custom
> function creation or complex SQL statement

Can't you do that already using TABLESAMPLE? For example, to use a 1% sample:

select count(*) * 100 from mytab tablesample system(1) where <condition>;

You do have to remember to multiply by the factor corresponding to your sample rate, but aside from that annoyance this (a) already exists, (b) is SQL-standard, and (c) can be adapted to a lot of other kinds of analysis besides plain count(*).

			regards, tom lane
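For illustration, a minimal sketch of this approach with the scaling factor spelled out (table, column, predicate and the 1% rate are made-up placeholders):

    -- SYSTEM samples whole blocks (fast, but clustering can bias the result);
    -- BERNOULLI samples individual rows at the given percentage.
    -- The multiplier must be 100 / sample_percentage.
    SELECT count(*) * 100 AS estimated_matches
    FROM orders TABLESAMPLE BERNOULLI (1)
    WHERE status = 'shipped';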
Re: Feature request for count_estimate(samplesize) aggregate or SAMPLE keyword
I was not aware of this - sounds like what I want. Will look into it. Thanks!

On 01.09.22 at 03:47, Tom Lane wrote:
> Torge writes:
>> Now I would like to humbly propose a feature that gives an easy way to
>> get a quick count estimate for any condition - index based or not -
>> based on a random sample of rows, that does not require a custom
>> function creation or complex SQL statement
>
> Can't you do that already using TABLESAMPLE? For example, to use a 1% sample:
>
> select count(*) * 100 from mytab tablesample system(1) where <condition>;
>
> You do have to remember to multiply by the factor corresponding to your
> sample rate, but aside from that annoyance this (a) already exists,
> (b) is SQL-standard, and (c) can be adapted to a lot of other kinds of
> analysis besides plain count(*).
>
> regards, tom lane