On Tue, 09 May 2017, David Guyot wrote:
> I recently loaned a server with NVMe SSD and saw, during my research on
> the relevance of the discard mount option for them, that its use is
> discouraged for NVMe SSDs. Why? Does NVMe SSDs not need trimming? Is it
> integrated in the NVMe driver for Linux?

As a rule, Linux does better with a filesystem-level TRIM every so often
(how often really depends on your write load and level of
overprovisioning) than with online discard, anyway.

That said: NVMe usually dislikes frequent use of TRIM because it
typically plays badly with the many-queue scheduler inside the device.
The device will typically have to issue a device-wide write barrier
internally, which has to drain (and freeze) every [write?] queue up to
the barrier point before the barrier can be cleared. The blocks are then
marked as free for future garbage collection, and all queues are
unfrozen, thus resuming operation.

This hurts multi-stream streaming performance quite a lot... Even if it
had to freeze just one queue, it would still hurt when compared to an
fstrim every hour/day/week/month.

Non-NVMe devices have far less command-path parallelism, so the
device-wide write barrier should typically hurt less (in relative terms)
than it would on an NVMe device.

Besides, I/O latency becomes *utterly unpredictable* when online discard
is active, which can cause all sorts of stuttering on the default I/O
scheduler. Latency will get unpredictable during fstrim as well, but you
can schedule the fstrim for a time of your choice, instead of having it
happen every time the filesystem frees a data or metadata block...

As for flash wear, on a modern SSD (NVMe or otherwise), to keep it at
the bare minimum it should be enough to overprovision the device
properly and issue an fstrim (on average) after writing about 50%[1] of
the size of the overprovisioned area. That might even give you less
flash wear than online discard over time...

[1] 50% is just a hunch. You could test that, but please keep in mind
that it will be device-firmware dependent. I bet there are a few devices
that are going to prioritize copying data around to free almost-empty
erase blocks for erasure, no matter how many fully-erased blocks are
already available...

-- 
Henrique Holschuh
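
For illustration only, the "fstrim after writing about 50% of the
overprovisioned area" heuristic could be sketched roughly as below. This
is not a tested tool: the function names, the idea of remembering the
write counter at the last trim, and the 0.5 default are all assumptions
made for the example. It relies on the "write sectors" field of
/sys/block/<dev>/stat, which counts 512-byte sectors and resets at boot.

```python
import subprocess

SECTOR_BYTES = 512  # /sys/block/<dev>/stat counts 512-byte sectors


def bytes_written(stat_line: str) -> int:
    """Total bytes written, parsed from one /sys/block/<dev>/stat line."""
    fields = stat_line.split()
    return int(fields[6]) * SECTOR_BYTES  # 7th field is "write sectors"


def should_trim(written_now: int, written_at_last_trim: int,
                overprovisioned_bytes: int, fraction: float = 0.5) -> bool:
    """True once the writes since the last trim reach the given fraction
    of the overprovisioned area (0.5 is only the hunch from the text)."""
    if written_now < written_at_last_trim:
        # Counter went backwards (e.g. reboot): restart the baseline at 0.
        written_at_last_trim = 0
    written_since = written_now - written_at_last_trim
    return written_since >= fraction * overprovisioned_bytes


def run_fstrim(mountpoint: str) -> None:
    """Run a one-off fstrim on a mounted filesystem (needs root)."""
    subprocess.run(["fstrim", "-v", mountpoint], check=True)
```

A cron job or timer could read the stat file, call should_trim() with
the stored baseline, and run run_fstrim() during a quiet period, so the
latency hit lands at a time of your choice. The right fraction remains
firmware dependent.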