On Sun, Aug 26, 2012 at 11:14 AM, Alex Schuster <wo...@wonkology.org> wrote:
> Whatever. Then align to 8K instead. But what does this have to do with the
> erasable page size?

Short answer: When a page is written to a block that already contains data, the whole block must be erased first. This is the "erase block size" people talk about. The block size is always a multiple of the page size, so if you align to the erase block size, you will always be okay.

Long answer: NAND flash cells do not operate like normal HDD storage; they can only be written to when they are empty. Empty means erased and devoid of data, not just "filled with zeroes" or "ignored by the filesystem". So any time you want to write to a block that already contains data, the drive controller must erase it and rewrite it.

On most current-generation SSDs the block size is 512k, containing 128 pages of 4k each. On older or slower SSDs, and on other kinds of flash devices such as CompactFlash or SD cards, the erase block is often larger, usually 4M and sometimes even up to 16M. (Definitely check the specs for your specific model of SSD to find the correct values.)

An SSD can write data in page-size chunks, which is very fast, but only in an empty block. If the block has data, that data must be relocated, or erased and rewritten.

The TRIM feature tells the SSD which pages are no longer used, which allows it to do better garbage collection, combine pages, and deallocate the unused blocks. The next time you write to one of those blocks, it will be very fast, because the erase already happened at TRIM time and the unused blocks are available for writing.

This is why SSDs without TRIM become slower once they have filled up. The drive controller has no knowledge of your filesystem, so once the internal NAND free space is used up, erase overhead is added to every write. Instead of writing a 4k page, it is now potentially erasing 512k of data and then writing 512k of data: 256 times more data touched for the same 4k write!

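To make the arithmetic concrete, here is a rough Python sketch of that worst case. It uses the 4k page and 512k erase block figures from above as assumptions; the function names are made up for illustration, and real controllers are smarter than this model.

```python
# Example figures from this mail; check your drive's datasheet for real values.
PAGE_SIZE = 4 * 1024            # 4k page
ERASE_BLOCK_SIZE = 512 * 1024   # 512k erase block


def pages_per_block(block_size=ERASE_BLOCK_SIZE, page_size=PAGE_SIZE):
    """Number of pages that fit in one erase block."""
    return block_size // page_size


def worst_case_touched(block_size=ERASE_BLOCK_SIZE):
    """Bytes erased plus bytes rewritten when a write hits one full block."""
    erased = block_size      # the whole block is erased
    rewritten = block_size   # then the whole block is written back
    return erased + rewritten


def touch_ratio(write_size=PAGE_SIZE, block_size=ERASE_BLOCK_SIZE):
    """How many times more data is touched than the write itself contains."""
    return worst_case_touched(block_size) // write_size


print(pages_per_block())  # 128
print(touch_ratio())      # 256
```

With a 4M erase block instead (as on some SD cards), the same 4k write would touch 2048 times its own size in this simple model.
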
(If you have no TRIM support, the only way to restore performance once a full drive's worth of data has been written is to back up, perform an ATA Secure Erase, which clears the SSD allocation metadata, and then restore your backup.)

Now imagine the alignment is not correct for both page size and erase block size: a write can then straddle a block boundary, causing two blocks to be erased and rewritten instead of only one. Taking the 4k write example above, you can see how performance degrades even further, and the extra erases and writes will potentially reduce the lifetime of your drive.

Additional complexity is added by any further layers: filesystem block size, filesystem alignment (I'm looking at you, FAT32), LVM, RAID stripe size, etc.

A good article with more information on the subject is on the English Wikipedia: https://en.wikipedia.org/wiki/Write_amplification

(disclaimer: all of the above is AFAIK, please correct me if I got any facts or advice wrong)
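
PS: the alignment point can be sketched in a few lines of Python. This is only an illustration under assumptions: the 512k erase block is the example figure from this mail, the sector-63 start is the old DOS-style partition offset (not something from the original question), and the helper names are hypothetical.

```python
ERASE_BLOCK_SIZE = 512 * 1024   # example figure; check your drive's specs
SECTOR_SIZE = 512               # assumed logical sector size


def is_aligned(offset, block_size=ERASE_BLOCK_SIZE):
    """True if a byte offset falls exactly on an erase block boundary."""
    return offset % block_size == 0


def blocks_touched(offset, length, block_size=ERASE_BLOCK_SIZE):
    """Count the erase blocks overlapped by a write of `length` bytes."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


# Partition starting at sector 63 (old DOS-style layout) vs. sector 2048 (1 MiB).
misaligned_start = 63 * SECTOR_SIZE
aligned_start = 2048 * SECTOR_SIZE

for start in (misaligned_start, aligned_start):
    # Write one erase block's worth of data at the start of the partition.
    print(start, is_aligned(start), blocks_touched(start, ERASE_BLOCK_SIZE))
```

The misaligned partition spreads the same 512k write across two erase blocks, while the aligned one stays within a single block.
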