On 03/07/2014 04:13 AM, David Bierce wrote:
Ello —

I’ve been watching the design and features of Ceph with great eagerness, 
especially compared to the distributed file systems I currently use.  One of the 
pains with VM workloads is that when writes stall for more than a few seconds, 
virtual machines that think they are communicating with a real live block 
device generally error out their file systems. In the case of ext? they remount 
read-only; with other file systems and operating systems the behavior in that 
scenario is erratic at best.

It looks like the default write timeout for an OSD is 30 seconds.  With the 
write consistency behavior that Ceph has, does that mean a client's write could 
be stalled for up to 30 seconds in the event of an OSD failing to write, for 
whatever reason?  If that is the case, is there a way around such a long 
timeout in block device terms, short of 1 second checks?

Which timeout are you looking at? By default librados/librbd block forever, so there shouldn't be a timeout.

I've had multiple VMs hang for hours at a time when I broke a Ceph cluster; after I fixed it, the VMs started working again.

They only reported some "task blocked for more than 120 seconds" messages in their dmesg, but that's all.
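
To illustrate the behaviour described above, here is a minimal Python sketch. It is not Ceph or librbd code: the stall is simulated with a threading.Event, and the names blocking_write, watch and demo are made up for this example. It only shows the general pattern of a write with no timeout that waits until the backend recovers, while a separate watchdog merely reports the stall (similar in spirit to the guest's hung-task warning) without cancelling anything.

```python
import threading
import time


def blocking_write(cluster_healthy: threading.Event, log: list) -> None:
    """Simulated write with no timeout: waits until the cluster recovers."""
    cluster_healthy.wait()  # would block forever if the event were never set
    log.append("write completed")


def watch(worker: threading.Thread, warn_after: float, log: list) -> None:
    """Report a stall once; the write itself is left untouched."""
    worker.join(timeout=warn_after)
    if worker.is_alive():
        log.append(f"task blocked for more than {warn_after} seconds")


def demo(stall: float = 0.3, warn_after: float = 0.1) -> list:
    """Break the simulated cluster for `stall` seconds, then fix it."""
    log: list = []
    healthy = threading.Event()  # unset = cluster broken

    writer = threading.Thread(target=blocking_write, args=(healthy, log))
    watcher = threading.Thread(target=watch, args=(writer, warn_after, log))
    writer.start()
    watcher.start()

    time.sleep(stall)  # the outage lasts longer than the warning threshold
    healthy.set()      # cluster fixed; the pending write resumes

    writer.join()
    watcher.join()
    return log


if __name__ == "__main__":
    for line in demo():
        print(line)
```

With the defaults, the watchdog logs its warning first and the write completes only once the simulated cluster is healthy again. The write is never failed back to the caller, which matches what I saw with the hung VMs.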

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on