On Wed, Jun 28, 2017 at 9:17 AM Peter Maloney <[email protected]> wrote:
> On 06/28/17 16:52, [email protected] wrote: > > We were using HP Helion 2.1.5 ( OpenStack + Ceph ) > > The OpenStack version is *Kilo* and Ceph version is *firefly* > > > > The way we backup VMs is create a snapshot by Ceph commands (rbd snapshot) > then download (rbd export) it. > > > > We found a very high Disk Read / Write latency during creating / deleting > snapshots, it will higher than 10000 ms. > > > > Even not during backup jobs, we often see a more than 4000 ms latency > occurred. > > > > Users start to complain. > > Could you please help us to how to start the troubleshooting? > > > > For creating snaps and keeping them, this was marked wontfix > http://tracker.ceph.com/issues/10823 > > For deleting, see the recent "Snapshot removed, cluster thrashed" thread > for some config to try. > Given he says he's seeing 4 second IOs even without snapshot involvement, I think Keynes must be seeing something else in his cluster. > > And I find this to be a very severe problem. And you haven't even seen the > worst... also make more and it gets slower and slower to do many things > (resize, clone, snap revert, etc.) (but a fully flattened image seen by a > client seems as fast as normal usually). > > Let's pool some money together as a reward for making snapshots work > properly/modern, like on ZFS and btrfs where they don't have to copy so > much....they "redirect on write" rather than literally "copy on write". > (what would be a good way to pool money like that?). If others are > interested, I surely am, but would have to ask the boss about money. Even > if it's only for bluestore, so only for future releases, that's ok with me. > And if it keeps the copy on the same osd/fs as the original, that is > acceptable too. > > > > https://storageswiss.com/2016/04/01/snapshot-101-copy-on-write-vs-redirect-on-write/ > > Consider a *copy-on-write* system, which *copies* any blocks before they > are overwritten with new information (i.e. it copies on writes). In other > words, if a block in a protected entity is to be modified, the system will > copy that block to a separate snapshot area before it is overwritten with > the new information. This approach requires three I/O operations for each > write: one read and two writes. [...] This decision process for each block > also comes with some computational overhead. > > > A *redirect-on-write* system uses pointers to represent all protected > entities. If a block needs modification, the storage system merely > *redirects* the pointer for that block to another block and writes the > data there. [...] There is zero computational overhead of reading a > snapshot in a redirect-on-write system. > > > The redirect-on-write system uses 1/3 the number of I/O operations when > modifying a protected block, and it uses no extra computational overhead > reading a snapshot. Copy-on-write systems can therefore have a big impact > on the performance of the protected entity. The more snapshots are created > and the longer they are stored, the greater the impact to performance on > the protected entity. > > I wouldn't consider that a very realistic depiction of the tradeoffs involved in different snapshotting strategies[1], but BlueStore uses "redirect-on-write" under the formulation presented in those quotes. RBD clones of protected images will remain copy-on-write forever, I imagine. 
-Greg

[1]: There's no reason to expect a copy-on-write system will first copy the
original data and then overwrite it with the new data when it can simply
inject the new data along the way. *Some* systems will copy the "old" block
into a new location and then overwrite in the existing location (it helps
prevent fragmentation), but many don't. And a "redirect-on-write" system
needs to persist all those block metadata pointers, which may be much
cheaper or much, much more expensive than just duplicating the blocks.
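P.S. For anyone wanting to reproduce the backup flow Keynes describes, it
comes down to roughly the following with the librbd Python bindings. This
is a minimal sketch, not the Helion tooling itself; the pool, image,
snapshot, and output-file names are all placeholders:

import rados
import rbd

# Connect to the cluster and open the pool holding the VM disks.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')          # placeholder pool name

image_name, snap_name = 'vm-disk-1', 'backup-20170628'

# "rbd snap create": freeze a point-in-time view of the image.
image = rbd.Image(ioctx, image_name)
image.create_snap(snap_name)
image.close()

# "rbd export": open the image at that snapshot and stream it out in
# fixed-size chunks.
CHUNK = 4 * 1024 * 1024
snap = rbd.Image(ioctx, image_name, snapshot=snap_name)
out = open('/backup/vm-disk-1.img', 'wb')
size, off = snap.size(), 0
while off < size:
    out.write(snap.read(off, min(CHUNK, size - off)))
    off += CHUNK
out.close()
snap.close()

# "rbd snap rm": deleting the snap afterwards is what kicks off snap
# trimming on the OSDs -- the phase the "Snapshot removed, cluster
# thrashed" thread is about.
image = rbd.Image(ioctx, image_name)
image.remove_snap(snap_name)
image.close()

ioctx.close()
cluster.shutdown()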
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
