On Wed, Jun 28, 2017 at 9:17 AM Peter Maloney <
[email protected]> wrote:

> On 06/28/17 16:52, [email protected] wrote:
>
> We were using HP Helion 2.1.5 (OpenStack + Ceph).
>
> The OpenStack version is *Kilo* and the Ceph version is *firefly*.
>
>
>
> The way we back up VMs is to create a snapshot with Ceph commands (rbd
> snapshot) and then download it (rbd export).
>
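> Roughly like this (the pool and image names below are just placeholders):
>
>     # take a point-in-time snapshot of the volume
>     rbd snap create volumes/vm-disk-01@backup-20170628
>
>     # copy the snapshot contents out of the cluster to a local file
>     rbd export volumes/vm-disk-01@backup-20170628 /backup/vm-disk-01.img
>
>     # remove the snapshot once the export has completed
>     rbd snap rm volumes/vm-disk-01@backup-20170628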
>
>
> We found very high disk read/write latency while creating or deleting
> snapshots; it can climb above 10000 ms.
>
>
>
> Even outside of backup jobs, we often see latency of more than 4000 ms.
>
>
>
> Users are starting to complain.
>
> Could you please advise us on how to start troubleshooting?
>
>
>
> For creating snaps and keeping them, this was marked wontfix
> http://tracker.ceph.com/issues/10823
>
> For deleting, see the recent "Snapshot removed, cluster thrashed" thread
> for some config to try.
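>
> The usual knob for that is the OSD snap trim sleep; something along these
> lines, where the value is only a starting point you would tune for your
> cluster:
>
>     # ceph.conf, [osd] section: seconds to sleep between snap trim ops
>     osd snap trim sleep = 0.1
>
>     # or inject it at runtime on all OSDs without restarting them
>     ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'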
>

Given that he says he's seeing 4-second IOs even without snapshot
involvement, I think Keynes must be seeing something else in his cluster.
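
To narrow down where that latency is coming from, I'd start with the stock
commands (nothing here is specific to Helion):

    # overall cluster state and any current warnings
    ceph status
    ceph health detail

    # per-OSD commit/apply latency, to spot slow outliers
    ceph osd perf

    # watch the cluster log for slow request messages as they happen
    ceph -w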


>
> And I find this to be a very severe problem. And you haven't even seen the
> worst of it... create more snapshots and many operations (resize, clone,
> snap revert, etc.) get slower and slower (though a fully flattened image,
> as seen by a client, usually seems as fast as normal).
>
> Let's pool some money together as a reward for making snapshots work
> properly/modern, like on ZFS and btrfs, where they don't have to copy so
> much... they "redirect on write" rather than literally "copy on write".
> (What would be a good way to pool money like that?) If others are
> interested, I certainly am, but I would have to ask the boss about the
> money. Even if it's only for BlueStore, and thus only for future releases,
> that's OK with me. And if it keeps the copy on the same OSD/filesystem as
> the original, that is acceptable too.
>
>
>
> https://storageswiss.com/2016/04/01/snapshot-101-copy-on-write-vs-redirect-on-write/
>
> Consider a *copy-on-write* system, which *copies* any blocks before they
> are overwritten with new information (i.e. it copies on writes). In other
> words, if a block in a protected entity is to be modified, the system will
> copy that block to a separate snapshot area before it is overwritten with
> the new information. This approach requires three I/O operations for each
> write: one read and two writes. [...] This decision process for each block
> also comes with some computational overhead.
>
>
> A *redirect-on-write* system uses pointers to represent all protected
> entities. If a block needs modification, the storage system merely
> *redirects* the pointer for that block to another block and writes the
> data there. [...] There is zero computational overhead of reading a
> snapshot in a redirect-on-write system.
>
>
> The redirect-on-write system uses 1/3 the number of I/O operations when
> modifying a protected block, and it uses no extra computational overhead
> reading a snapshot. Copy-on-write systems can therefore have a big impact
> on the performance of the protected entity. The more snapshots are created
> and the longer they are stored, the greater the impact to performance on
> the protected entity.
>
>
I wouldn't consider that a very realistic depiction of the tradeoffs
involved in different snapshotting strategies[1], but BlueStore uses
"redirect-on-write" under the formulation presented in those quotes. RBD
clones of protected images will remain copy-on-write forever, I imagine.
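
For reference, the clone lifecycle in question looks like this (names are
placeholders); flattening is what ends the copy-on-write relationship with
the parent:

    # clones require a protected snapshot as their parent
    rbd snap protect volumes/base-image@golden
    rbd clone volumes/base-image@golden volumes/vm-disk-02

    # flatten copies the remaining parent data into the clone,
    # detaching it from the parent image
    rbd flatten volumes/vm-disk-02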
-Greg

[1]: There's no reason to expect a copy-on-write system will first copy the
original data and then overwrite it with the new data when it can simply
inject the new data along the way. *Some* systems will copy the "old" block
into a new location and then overwrite in the existing location (it helps
prevent fragmentation), but many don't. And a "redirect-on-write" system
needs to persist all those block metadata pointers, which may be much
cheaper or much, much more expensive than just duplicating the blocks.


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
