Re: Where to store Blobs?

2019-03-13 Thread Jamesie Pic
Make dump/restore of database data unnecessarily expensive in terms of time
and space imho.


Re: Where to store Blobs?

2019-04-19 Thread Jamesie Pic
Storing files in the database has already been demonstrated as not an
efficient trade-off [0]. As such, this post provides a subjective story
about files in the database, rather than an objective compare.

So, I have this ridiculous situation where dumping a database with 12k
user-submitted forms with files, made by 3000 users, which I hope we can
agree “is absolutely nothing”, yet, it eats 35G of PG data.

As a result, dumping the database takes already 32 minutes and is extremely
write-intensive, causing more than 5 second disk backlogs during the
operation at the same time.

If I had stored file paths, like I should have, the db would take a few MBs
and the dump a few seconds even less than one second ?

Also, the backup could just read the file tree from the file system and
synchronize only new files: we don’t have programmatic modifications of
written files for legal reasons: which makes it even more ridiculous to
dump them every time we do a backup.

As such, my biggest regret is to have listened to the manager who imposed
this design decision, but I trusted him at the time I was a fresh hire.

Today, we have the choice of two poisons for deployment:

not shutting down the service during the backup, to save 32 minutes of
downtime, but that’s 32 minutes of writes that are at risk in the case of a
faulty destructive operation, yes that doesn’t happen 99% of the time,
shutting down the service during the backup, as we should, but that
means 32 minutes of extra downtime

In my experience, storing files in the database is a critical mistake. I
mean, if you know what you’re doing maybe, and I thought the said manager
would know what he’s doing.

But at the end of a year the manager decided to ignore all monitoring
alarms concerning disk space that were raised during the backup. As such, I
kept going and freeing as much space as possible when a backup failed to
happen, which hopefully canceled the full deployment, leaving the service
online, even though with a full disk.

I have raised the warning to the customer for months and but the manager
kept insisting that we close our eyes on it, and kept on doing palliative
fixes when needed.

Of course, piling up palliatives fixes in production eventually created the
situation where the disk was too full to make a backup. The manager that
had installed the server OS had put 3 disks in a RAID1 array with extremely
poor partitioning.

As such, i had to spend a night repartitioning the RAID1, so that the /
would be on 10G instead of 40G. Which kept us going a bit more, but
unfortunnately one week shorter than expected, because I had forgot to
include the exponential growth in the math of the estimation.

Leading to even more incidents. If you store files in the database, which
you shouldn’t unless you’re 100% sure about what you’re doing, then do not
ignore disk space warnings during backups. Or else … well what do you think
happens when an airplane pilot ignores the alarms on their dashboard ?

99% of incidents are a suite of predictable events.

https://habiletechnologies.com/blog/better-saving-files-database-file-system/
https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database
https://blog.yourlabs.org/post/184290880553/story-of-a-database-with-files-in-it


Re: Where to store Blobs?

2019-04-19 Thread Jamesie Pic
I forgot to mention that my deployments include automated migrations as
often as possible, sometimes destructive for refactoring purpose, as such,
to maintain PostgreSQL on a basic linux box I am:

- for having an automated backup prior in the automated deployment script
that may play destructive migrations,
- against the needless overhead of coupling both binary and relational data
in operations that slows the whole thing down or makes it less reliable

Also got supposedly many new points against, mixed with more detail on the
points briefly exposed in my previous email, going deeper in detail, about
how it fits in the big picture of my personal practice ... and how this has
destabilized my prod for months:
https://blog.yourlabs.org/post/184290880553/storing-hd-photos-in-a-relational-database-recipe

tl;dr
If you store media files in PostgreSQL on a production server, then do take
disk space alarms seriously even if they happen only during backups.
Otherwise I fail to see how to avoid a pattern of recurring incidents,
"manually unblocking automated deployments" (double debt interest cost
because also defeats the purpose of automating deployment), when not
filling up a disk during the nightly backup dump ...

Hope this helps,

Have a great day


Re: Where to store Blobs?

2019-04-22 Thread Jamesie Pic
Thanks for your feedback. In my case, Bob the manager said this would "make
backups easier" xD

The general pro that I see is "transactional". While I can understand that,
it's irrelevant in our case: users upload files through AJAX, that happens
**before** they submit the form. That means, the file gets saved in an HTTP
exchange that happens **before** the actual data insertion transaction.

As such, be careful too that this argument does not be irrelevant in your
case like it is in mine.