> Therefore, except for the narrow case of writing into a block which has
> never before been written, every write on a SSD *is* an erase+write
> operation.

No, that would lead to terribly poor performance (both in terms of speed
and in terms of wear).

>> So: you read the whole block, blank it, then re-write all the other
>> sectors and your updated sector? No, definitely not, that would be
>> terrible.
> That is exactly what I've always been told *does* happen,

Don't believe everything you hear.

> That's an interesting design approach. Given that I've never seen it
> mentioned before, I imagine it must be comparatively recent as SSD
> controller designs go.

Nope.  It's more like "step 1".  There are rumors that some early cheap
SD cards did not perform any wear-leveling, and I'm quite willing to
believe them, but I'd be very surprised if those still exist.

> I'm also not sure that I'd have chosen to take the trade-off of that
> added complexity for the presumed added performance and lack of need to
> keep track of block size handling.

It's not just that: it's also cheaper.  By spreading the writes around
the whole flash memory, you can extend the life expectancy of your drives
very significantly, which means you can use cheaper flash cells (e.g.
cells which may die even before reaching 10k erase cycles) and still have
a device that will last long enough to avoid embarrassment.

Yes, it costs a bit more on the controller side, but this cost is mostly
*design* cost, i.e. a one-time cost which was already paid more than
a decade ago.

[ The other downside is the complexity of the code running on the
  controller, which leads to a higher risk of encountering bugs (which
  manifest themselves as a dead drive or at least a total loss of your
  data), which is why for a while SSDs were not really more reliable
  than HDDs.  ]

> So, if this is correct, we have both less understanding and less control
> of where and how data is stored on the drives than we think we do.

If you want more control, you need to use flash memory cells directly, as
happens sometimes in some embedded systems (those won't appear as /dev/sd*
devices but as /dev/mtd*).
And then you need to use software on your CPU to do the same wear-leveling
job (e.g. with UBI or jffs2 or more modern variants of that).

FWIW, I liked the UBI design and it would arguably be a good idea for SSDs
to expose such an API rather than to try and pretend they're "normal block
devices", but that's not the way the industry chose.


        Stefan
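
PS: To make the "a write is not an erase+write" point concrete, here is
a toy sketch in Python of the remapping idea.  It is only an illustration
under made-up assumptions: the class name ToyFTL, the 4 blocks of 4 pages,
the single spare block and the victim-selection rule are all invented for
the example, and real controller firmware is far more elaborate.  Each
write goes to the next unwritten page and the logical-to-physical map is
updated; a block is erased only later, when space has to be reclaimed.

```python
# Toy flash translation layer. Hypothetical names and sizes throughout;
# this is NOT how any particular SSD controller is implemented.
PAGES_PER_BLOCK = 4   # assumed: real erase blocks hold far more pages
NUM_BLOCKS = 4        # assumed: one of these always stays free as a spare


class ToyFTL:
    def __init__(self):
        # Physical pages hold (lba, data) once programmed, None when erased.
        self.pages = [[None] * PAGES_PER_BLOCK for _ in range(NUM_BLOCKS)]
        self.fill = [0] * NUM_BLOCKS    # next unwritten page in each block
        self.erases = [0] * NUM_BLOCKS  # erase count per block
        self.map = {}                   # lba -> (block, page)
        self.active = 0                 # block currently receiving writes

    def _live(self, b):
        """Return the (lba, data) pairs in block b that the map still uses."""
        written = self.pages[b][:self.fill[b]]
        return [(lba, data) for p, (lba, data) in enumerate(written)
                if self.map.get(lba) == (b, p)]

    def _new_active_block(self):
        free = [b for b in range(NUM_BLOCKS) if self.fill[b] == 0]
        if len(free) > 1:
            # Plenty of erased blocks: take the least-worn one.
            self.active = min(free, key=lambda b: self.erases[b])
            return
        # Only the spare is left: reclaim the block with the fewest live
        # pages, breaking ties in favour of the least-erased block.
        spare = free[0]
        victim = min((b for b in range(NUM_BLOCKS) if b != spare),
                     key=lambda b: (len(self._live(b)), self.erases[b]))
        live = self._live(victim)
        if len(live) == PAGES_PER_BLOCK:
            raise RuntimeError("no stale pages left: device is full")
        self.active = spare
        for lba, data in live:          # move still-valid data out first
            self.write(lba, data)
        self.pages[victim] = [None] * PAGES_PER_BLOCK   # the actual erase
        self.fill[victim] = 0
        self.erases[victim] += 1

    def write(self, lba, data):
        if self.fill[self.active] == PAGES_PER_BLOCK:
            self._new_active_block()
        b, p = self.active, self.fill[self.active]
        self.pages[b][p] = (lba, data)  # program a fresh page, no erase
        self.fill[b] += 1
        self.map[lba] = (b, p)          # the old copy, if any, is now stale

    def read(self, lba):
        b, p = self.map[lba]
        return self.pages[b][p][1]


if __name__ == "__main__":
    ftl = ToyFTL()
    for i in range(100):
        ftl.write(0, i)                 # hammer one single logical sector
    print("erase counts per block:", ftl.erases)
```

Rewriting the same logical sector over and over ends up erasing each
physical block only now and then, and the erases are spread over all the
blocks instead of landing on one of them.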