Reconstructing transaction content after the fact

2020-09-02 Thread Lionel Bouton
Hi,

I'm a long-time PostgreSQL user (since 7.x), familiar with the basics of
PostgreSQL internals, and I'm currently participating in an audit of a
voting software solution which should, for obvious reasons, avoid storing
any link between the voter register and the actual votes.

The software uses a modern version of PostgreSQL (9.6 or later) and uses
a transaction to store a vote and the corresponding register entry in
two separate tables.

There's already an obvious angle of attack to correlate a vote and a
register entry: the WAL. That said, the WAL can be configured to be
short-lived to minimize the attack surface. By my understanding, a low
checkpoint_timeout value should allow PostgreSQL to recycle/remove old
WAL files early unless the system is overloaded. This can become a bit
messy, but if the system is tightly controlled I believe the attack
surface can be reduced.
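
For illustration, a sketch of the kind of settings I have in mind
(parameter names from recent PostgreSQL versions; the exact values would
have to be validated on the actual system):

checkpoint_timeout = 1min   # checkpoint often so old WAL segments are recycled quickly
max_wal_size = 256MB        # bound the amount of WAL kept between checkpoints
min_wal_size = 32MB         # don't keep more recycled segments around than necessary
archive_mode = off          # make sure no extra WAL copies are kept by archiving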

Another angle I could think of is the transaction ids. If I'm not
mistaken they are stored on disk inside the files storing table and
index data (my quick read of the documentation leads me to think the
txid is stored in t_xmin in the HeapTupleHeaderData of a row). These
values might be cleaned up once the data is visible to all currently
running transactions, but I think this isn't needed and so probably
isn't done (that information is tracked in the visibility map). So by
reading the raw content from disk you should be able to correlate data
in several tables written by a single transaction by comparing the txid.
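
For illustration, if the txid is also exposed at the SQL level (I
believe it is, as the xmin system column), correlating rows written by
the same transaction could be as simple as this, "votes" and "register"
being hypothetical table names:

select v.id as vote_id, r.id as register_id
from votes v
join register r on r.xmin = v.xmin;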

Are txids in table file data indeed a means by which you can recover the
data written by a single transaction (assuming the txids don't overflow
the 32-bit limit during the life of the cluster)?

Are these t_xmin values ever cleaned up (by VACUUM or another mechanism)?
If so, is there a way to configure the approximate time during which
these values can be recovered?

Is there a way to access these values by connecting to a PostgreSQL
server instead of analyzing in-memory or on-disk data?

Best regards,

-- 
Lionel Bouton
gérant de JTEK SARL
https://www.linkedin.com/in/lionelbouton/





Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

2021-02-11 Thread Lionel Bouton
Hi,

On 11/02/2021 at 01:21, Guy Burgess wrote:
>
> Hello,
>
> Running 13.1 on Windows Server 2019, I am getting the following log
> entries occasionally:
>
>     2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file
> "pg_wal/0001009900D3": Permission denied
>     2021-02-11 12:40:31.377 NZDT [6072] LOG:  could not rename file
> "pg_wal/0001009900D3": Permission denied
>     2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file
> "pg_wal/0001009900D3": Permission denied
>     2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file
> "pg_wal/0001009900DA": Permission denied
>     2021-02-11 12:50:20.917 NZDT [6072] LOG:  could not rename file
> "pg_wal/0001009900D3": Permission denied
>     2021-02-11 12:50:31.098 NZDT [6072] LOG:  could not rename file
> "pg_wal/0001009900DA": Permission denied
>
> What appears to be happening is the affected WAL files (which is
> usually only 2 or 3 WAL files at a time) are somehow "losing" their
> NTFS permissions, so the PG process can't rename them - though of
> course the PG process created them. Even running icacls as admin gives
> "Access is denied" on those files. A further oddity is the affected
> files do end up disappearing after a while.
>
> The NTFS permissions on the pg_wal directory are correct, and most WAL
> files are unaffected. Chkdsk reports no problems, and the database is
> working fine otherwise. Have tried disabling antivirus software in
> case that was doing something but no difference.
>

I haven't dealt with a Windows environment for quite some time, but from
what I remember an antivirus installs a filter driver that intercepts
file accesses, and these drivers remain active even if you disable the
antivirus (I suppose they just call a no-op instead of the content
analysis code), so they can still interfere with your system. For
example, some years ago I saw what looked like a race condition
involving rename for MySQL on Windows that could not be fixed by
disabling the antivirus, but could be fixed by uninstalling it
completely.

You might want to uninstall the antivirus temporarily to check this.

Best regards,

-- 
Lionel Bouton
gérant de JTEK SARL
https://www.linkedin.com/in/lionelbouton/





Re: What have I done!?!?!? :-)

2022-04-06 Thread Lionel Bouton

Hi Perry,

On 07/04/2022 at 00:25, Perry Smith wrote:

[...]
Notice that parent_id is supposed to refer to an id in the same table
(at least, that is what I'm trying to do). I'm trying to create a
"root" entry whose parent points to itself.


Note that you don't usually define a root as having a parent_id equal
to its own id (this is hard to manage, especially when a sequence's
nextval() auto-fills the "id" primary key). The usual way is to make
parent_id nullable; roots are then the rows with a NULL parent_id. This
matches the intuitive idea of a root and makes the code more
maintainable.
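
A minimal sketch of what I mean (column names borrowed from your table,
types guessed):

CREATE TABLE files (
    id        bigserial PRIMARY KEY,
    basename  text NOT NULL,
    parent_id bigint REFERENCES files (id)  -- NULL for root entries
);

-- a root is then simply:
INSERT INTO files (basename, parent_id) VALUES ('pedz', NULL);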



and I botched the code first time around and now I have this:

files_development=# select * from files;
 id | basename | parent_id |    dev    |   ftype   | uid  | gid  | ino | mode  |     mtime      | nlink | size |         created_at         |         updated_at
----+----------+-----------+-----------+-----------+------+------+-----+-------+----------------+-------+------+----------------------------+----------------------------
 11 | pedz     |    *1234* | 687931150 | directory | 1000 | 1002 |   2 | 16877 | 18:43:29.65271 |    31 |   34 | 2022-04-06 21:58:43.570539 | 2022-04-06 21:58:43.570539
 12 | pedz     |        12 | 687931150 | directory | 1000 | 1002 |   2 | 16877 | 18:43:29.65271 |    31 |   34 | 2022-04-06 22:00:29.087417 | 2022-04-06 22:00:29.115021
(2 rows)


The record with id 11 has a parent id of 1234 which doesn’t exist.

My question isn’t how do I fix it, my question is why didn’t Postgres 
back out the botched record?  Why isn’t it complaining?


Disabling referential integrity in Active Record explicitly disables 
triggers that would have made PostgreSQL return an error.




I’m using Active Record with the psql adapter.  It has a 
disable_referential_integrity which takes a block of code.  When the 
block of code exists, the constraints are put back.  At least, that is 
what I thought.


If you look at ActiveRecord's code
(https://www.rubydoc.info/docs/rails/ActiveRecord/ConnectionAdapters/PostgreSQL/ReferentialIntegrity#disable_referential_integrity-instance_method):
the triggers are disabled before the block of code, then the block is
executed, and finally the triggers are enabled again, but only after
the point where they would have fired, so they never get a chance to
check the rows modified inside the block.
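
For PostgreSQL, my understanding is that this boils down to statements
along these lines for every table (a sketch; the exact statements depend
on the Rails version and require sufficient privileges):

ALTER TABLE files DISABLE TRIGGER ALL;  -- also disables the internal FK triggers
-- ... the block runs here, inserts are no longer checked ...
ALTER TABLE files ENABLE TRIGGER ALL;   -- re-enabled only once the block is done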


I don't think this code is meant for general use (I believe I've only
used it in data migrations on rare occasions). I would bet that it isn't
safe to use in many cases: unless I missed something, you could kill
your process before the triggers are enabled again, leaving your
application with no constraints at all until
disable_referential_integrity is used again. What happens when several
processes use it simultaneously is probably not what you want either
(the triggers could be enabled again by another process in the middle of
the execution of your code).




I’m wondering if the disabled constraints are still disabled somehow.


Foreign key constraints are implemented using triggers, so they aren't
meant to ensure a consistent global state; they only check that
modifications are OK at the moment they are made.
If you disable them temporarily, nothing prevents your data from
becoming inconsistent with your constraints.


 If so, how would I check for that and how would I turn them back on? 
 Or am I way off in the weeds?


I'd say the latter: in your case I would use a NULL parent_id for the
root(s). Your way leads you to bend PostgreSQL until its back breaks.
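
That said, if you want to check, here is a sketch of what I would look
at (table name taken from your example):

-- are the FK triggers still disabled? ('O' = enabled, 'D' = disabled)
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'files'::regclass;

-- which rows currently violate the intended constraint?
SELECT c.id, c.parent_id
FROM files c
LEFT JOIN files p ON p.id = c.parent_id
WHERE c.parent_id IS NOT NULL AND p.id IS NULL;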


Best regards,

--
Lionel Bouton
gérant de JTEK SARL
https://www.linkedin.com/in/lionelbouton/


Re: AWS forcing PG upgrade from v9.6 a disaster

2021-05-29 Thread Lionel Bouton
On 29/05/2021 at 02:38, Dean Gibson (DB Administrator) wrote:
> On 2021-05-28 16:51, Ron wrote:
>> On 5/28/21 5:06 PM, Dean Gibson (DB Administrator) wrote:
>>> On 2021-05-28 12:38, Ron wrote:
>>>> On 5/28/21 1:40 PM, Dean Gibson (DB Administrator) wrote:
>>>>> On 2021-05-28 08:12, Adrian Klaver wrote:
>>>>>> On 5/27/21 8:41 PM, Dean Gibson (DB Administrator) wrote:
>>>>>>> I started to use PostgreSQL v7.3 in 2003 on my home Linux
>>>>>>> systems (4 at one point), gradually moving to v9.0 w/
>>>>>>> replication in 2010.  In 2017 I moved my 20GB database to
>>>>>>> AWS/RDS, gradually upgrading to v9.6, & was entirely satisfied
>>>>>>> with the result.
>>>>>>>
>>>>>>> In March of this year, AWS announced that v9.6 was nearing end
>>>>>>> of support, & AWS would forcibly upgrade everyone to v12 on
>>>>>>> January 22, 2022, if users did not perform the upgrade earlier. 
>>>>>>> My first attempt was successful as far as the upgrade itself,
>>>>>>> but complex queries that normally ran in a couple of seconds on
>>>>>>> v9.x, were taking minutes in v12.
>>>>>>
>>>>>> Did you run a plain
>>>>>> ANALYZE(https://www.postgresql.org/docs/12/sql-analyze.html) on
>>>>>> the tables in the new install?
>>>>>
>>>>> After each upgrade (to 10, 11, 12, & 13), I did a "VACUUM FULL
>>>>> ANALYZE".  On 10 through 12, it took about 45 minutes &
>>>>> significant CPU activity, & temporarily doubled the size of the
>>>>> disk space required.  As you know, that disk space is not
>>>>> shrinkable under AWS's RDS.  On v13, it took 10 hours with limited
>>>>> CPU activity, & actually slightly less disk space required.
>>>>
>>>> Under normal conditions, VACUUM FULL is pointless on a
>>>> freshly-loaded database; in RDS, it's *anti-useful*.
>>>>
>>>> That's why Adrian asked if you did a plain ANALYZE.
>>>
>>> Just now did.  No change in EXPLAIN ANALYZE output.
>>
>> Did it run in less than 10 hours?
>>
>
> The original VACUUM FULL ANALYZE ran in 10 hours.  The plain ANALYZE
> ran in 88 seconds.

One possibility is that your data has a distribution that defeats the
ANALYZE sampling strategy.

If that is the case, you can force ANALYZE to do a better job by
increasing the default_statistics_target value (100 by default) and
reloading the configuration. This makes ANALYZE sample more data from
your tables, which should help the planner work out what the value
distribution looks like for a column and why using an index for
conditions involving it is the better plan.
The last time I had to use this setting to solve this kind of problem I
ended up with:

default_statistics_target = 500

But obviously the value suited to your case could be different (I'd
increase it until the planner uses the correct index). Note that raising
it increases the cost of maintaining statistics (so you don't want to
increase it by several orders of magnitude blindly), but the default
value seems fairly conservative to me.
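
For what it's worth, a sketch of how I would apply it from a superuser
session (on RDS you would normally change it through the DB parameter
group instead of ALTER SYSTEM):

ALTER SYSTEM SET default_statistics_target = 500;
SELECT pg_reload_conf();
ANALYZE;  -- rebuild the statistics using the larger sample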

For reference, and for more fine-tuned per-table statistics
configuration and multi-column statistics for complex situations (a
short example follows the links), see:
- https://www.postgresql.org/docs/13/runtime-config-query.html
- https://www.postgresql.org/docs/13/planner-stats.html
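
For example, per-column and multi-column statistics look like this (the
table and column names are hypothetical):

-- larger sample for a single problematic column only:
ALTER TABLE mytable ALTER COLUMN mycolumn SET STATISTICS 500;

-- extended statistics when two columns are correlated (PostgreSQL 10+):
CREATE STATISTICS mytable_dep_stats (dependencies) ON col_a, col_b FROM mytable;

ANALYZE mytable;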

-- 
Lionel Bouton
gérant de JTEK SARL
https://www.linkedin.com/in/lionelbouton/