>> Bumping this...
>>
>> For now, we occasionally suffer from an unlimited cache growth issue
>> which can be observed on all post-1.4 versions of qemu with the rbd
>> backend in writeback mode and a certain pattern of guest operations.
>> The issue is confirmed for virtio and can be re-triggered by issuing
>> an excessive number of write requests without completing the acks
>> returned from the emulator's cache in a timely manner. Since most
>> applications behave correctly, the OOM issue is very rare (and we
>> developed an ugly workaround for such situations long ago). If
>> anybody is interested in fixing this, I can send a prepared image
>> for reproduction or instructions to make one, whichever is
>> preferable.
>>
>> Thanks!
>
> A gentle bump: for at least the rbd backend with writethrough/writeback
> cache it is possible to achieve unlimited growth with a lot of large
> unfinished ops, which can be considered a DoS. Usually it is
> triggered by poorly written applications in the wild, like proprietary
> KV databases or MSSQL under Windows, but regular applications,
> primarily OSS databases, can trigger RSS growth of hundreds of
> megabytes quite easily. There is probably no straightforward way to
> limit in-flight request size by re-chunking it, as a supposedly
> malicious guest can inflate it to very high numbers, but it's fine to
> crash such a guest; protecting real-world workloads with a simple
> in-flight op count limiter looks like the more achievable option.

Any chance you can provide the reproducer VM image via ceph-post-file
[1]? Using the latest Firefly release with QEMU 2.3.1, I was unable to
reproduce unlimited growth while hammering the VM with a randwrite fio
job with iodepth=256, blocksize=4k.

[1] http://ceph.com/docs/master/man/8/ceph-post-file/

--
Jason
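
For illustration only, the "simple in-flight op count limiter" suggested in the quoted message could look roughly like the sketch below. This is not QEMU or librbd code: the class name InflightLimiter, its methods, and the max_inflight parameter are all hypothetical, and what to do when a submission is refused (stall the queue or fault the guest) is a policy choice the thread leaves open.

```python
import threading


class InflightLimiter:
    """Caps the number of submitted-but-unacknowledged operations.

    Hypothetical sketch: names and behavior are assumptions, not an
    existing QEMU or librbd interface.
    """

    def __init__(self, max_inflight):
        if max_inflight < 1:
            raise ValueError("max_inflight must be at least 1")
        self.max_inflight = max_inflight
        self._inflight = 0
        self._lock = threading.Lock()

    def try_submit(self):
        """Reserve a slot for one op; return False if the cap is reached."""
        with self._lock:
            if self._inflight >= self.max_inflight:
                return False
            self._inflight += 1
            return True

    def complete(self):
        """Release the slot held by one acknowledged op."""
        with self._lock:
            if self._inflight == 0:
                raise RuntimeError("complete() called with no op in flight")
            self._inflight -= 1

    @property
    def inflight(self):
        """Current number of ops that were submitted but not completed."""
        with self._lock:
            return self._inflight
```

Note that the limiter counts operations rather than bytes, so it only bounds memory use if the size of each individual request is also bounded somewhere else.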
