On 06.04.22 18:34, [email protected] wrote:
Hi Stefan!
Thank you so much for your answer! I answer inline below, in green bold,
for better distinction...
Very thankful for all your comments Stefan!!! :) :) :)
Cheers!!
On 2022-04-06 17:43, Stefan Esser wrote:
On 06.04.22 at 16:36, [email protected] wrote:
Hi Rainer!
Thank you so much for your help :) :)
Well, I assume they are in a datacenter, so there should not be a power
outage...
About dataset size... yes, ours are big... they can easily be 3-4 TB
each...
We bought them because they are for mailboxes, and mailboxes grow and
grow... so we needed the space to host them...
Which mailbox format (e.g. mbox, maildir, ...) do you use?
*I'm running Cyrus IMAP, so sort of Maildir... too many little files
normally... sometimes directories with tons of little files...*
We knew they had some speed issues, but we thought (as Samsung explains
on the QVO site) those issues only started after exceeding the speed
buffer these disks have. We thought that as long as you didn't exceed
its capacity (the capacity of the speed buffer), no speed problem would
arise. Perhaps we were wrong?
These drives are meant for small loads in a typical PC use case,
i.e. some installations of software in the few GB range, else only
files of a few MB being written, perhaps an import of media files
that range from tens to a few hundred MB at a time, but less often
than once a day.
*We move, you know... lots of little files... and lots of different
concurrent modifications by the 1500-2000 concurrent IMAP connections we
have...*
As the SSD fills, the space available for the single level write
cache gets smaller
*The single-level write cache is the cache these SSD drives have to
compensate for the speed issues caused by using QLC memory? Is that what
you refer to? Sorry, I don't quite understand this paragraph.*
A single flash cell can be thought of as a software-adjustable resistor
as part of a voltage divider with a fixed resistor. Storing just a
single bit per flash cell allows very fast writes and long lifetimes for
each flash cell, at the cost of low data density.

You cheaped out and bought the crappiest type of consumer SSDs. These
SSDs are optimized for one thing: price per capacity (at reasonable read
performance). They accomplish this by exploiting the expected user
behavior of modifying only small subsets of the stored data in short
bursts, and of buying a lot more capacity than is actually used. You
deployed them in a mail server facing at least continuous writes for
hours on end most days of the week. As average load increases and the
cheap SSDs fill up, less and less unallocated flash can be used as
cache, and the fast SLC cache fills up.
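A back-of-the-envelope sketch of why the cache shrinks as the disk fills (all
capacities here are illustrative assumptions, not Samsung's published figures;
the real static/dynamic cache split is firmware-specific):

```python
# Rough sketch of why the SLC write cache of a QLC SSD shrinks as it fills.
# All numbers are illustrative assumptions, not vendor specifications.

QLC_BITS_PER_CELL = 4  # QLC: 4 bits per cell, i.e. 2**4 = 16 voltage levels
SLC_BITS_PER_CELL = 1  # SLC mode: 1 bit per cell, only 2 voltage levels

def slc_cache_gb(capacity_gb, used_gb, static_slc_gb=6):
    """Estimate the available SLC cache: a small static region plus a
    dynamic region borrowed from unallocated QLC flash. Caching a byte
    in SLC mode occupies 4x the raw flash, hence the division by 4."""
    free_gb = max(capacity_gb - used_gb, 0)
    dynamic_gb = free_gb / (QLC_BITS_PER_CELL / SLC_BITS_PER_CELL)
    return static_slc_gb + dynamic_gb

# A hypothetical 2 TB drive at increasing fill levels:
for used in (0, 1000, 1800, 1950):
    print(f"{used:>5} GB used -> ~{slc_cache_gb(2000, used):.0f} GB SLC cache")
```

Once the drive is nearly full, only the small static region is left, and any
sustained write burst spills straight into slow QLC programming.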
The SSD firmware then has to stop accepting new requests from the SATA
port, and because only ~30 operations can be queued per SATA disk, and
because of the ordering requirements between those operations, not even
reads can be satisfied while the cache is slowly written out, storing
four bits per flash cell instead of one. To the user this appears as the
system almost hanging, because every uncached read and sync write takes
tens to hundreds of milliseconds instead of less than 3 ms.

No amount of file system or driver tuning can truly fix this design
flaw/compromise without severely limiting the write throughput in
software to stay below the sustained drain rate of the SLC cache. If you
want to invest the time, pain and suffering to squeeze the most out of
this hardware, look into the ~2015 CAM I/O scheduler work Netflix
upstreamed to FreeBSD. Enabling it requires at least building and
installing your own kernel with this feature enabled, setting acceptable
latency targets, and defining the read/write mix the scheduler should
maintain.
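If one did try to cap write throughput below the cache's sustained drain
rate, the shape of it is a token bucket. A minimal sketch, assuming a drain
rate of 160 MB/s and a 1 GB cache (both made-up figures); a real deployment
would enforce this in the I/O scheduler, not in application code:

```python
import time

class TokenBucket:
    """Toy rate limiter: admit writes only as fast as an assumed
    sustained drain rate, so the SLC cache never fills completely."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s   # assumed sustained drain rate
        self.capacity = burst_bytes    # assumed SLC cache size
        self.tokens = float(burst_bytes)
        self.last = time.monotonic()

    def admit(self, nbytes):
        """Return 0.0 if the write may proceed now, else the number of
        seconds the caller should sleep before issuing it."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes          # tokens may go negative: debt
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate

# Assumed: ~160 MB/s sustained drain rate, 1 GB burst (the SLC cache).
bucket = TokenBucket(160 * 1024**2, 1024**3)
print(bucket.admit(512 * 1024**2))  # fits within the burst allowance
```

The point of the sketch is the trade-off Stefan describes: the limiter keeps
latency bounded only by refusing to exceed the drain rate, i.e. by giving up
most of the drive's advertised burst throughput.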
I don't expect you'll get satisfactory results out of those disks even
with lots of experimentation. If you want to experiment with I/O
scheduling on cheap SSDs start by *migrating all production workloads*
out of your lab environment. The only safe and quick way out of this
mess is for you to replace all QVO SSDs with at least as large SSDs
designed for sustained write workloads.
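For anyone who does go down the experimentation route anyway, enabling the
dynamic CAM I/O scheduler looks roughly like the following. This is a sketch
from memory: the `MAILSTORE` kernel config name is made up, and the exact
sysctl names vary by FreeBSD version, so verify everything against your
version's documentation first.

```shell
# Sketch only -- verify option and sysctl names for your FreeBSD version.

# 1. Build a custom kernel with the dynamic I/O scheduler compiled in:
cd /usr/src/sys/amd64/conf
cp GENERIC MAILSTORE                          # hypothetical config name
echo 'options CAM_IOSCHED_DYNAMIC' >> MAILSTORE

cd /usr/src
make buildkernel KERNCONF=MAILSTORE
make installkernel KERNCONF=MAILSTORE
# ... then reboot into the new kernel.

# 2. The scheduler exposes per-device tunables (latency targets,
#    read/write bias) under the CAM sysctl tree; inspect what your
#    version actually provides:
sysctl kern.cam | grep iosched
```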