On 2012. September 14. 20:51:39 Ritesh Raj Sarraf wrote:
> On Friday 14 September 2012 07:01 PM, Laszlo Fekete wrote:
> > The logs report that an iSCSI connection error was detected and that
> > it tries to recover.
>
> Is there an error message/code ?

This is in the initiator logs:

Sep 13 14:40:09 mail01 iscsid: Kernel reported iSCSI connection 4:0 error (1020) state (3)
Sep 13 14:40:20 mail01 iscsid: connection4:0 is operational after recovery (2 attempts)
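
To collect these events from a longer syslog, something like the following could be used. It is only a minimal sketch: the two regular expressions are written against the two iscsid lines quoted above, and the function name `parse_iscsid_line` is made up for illustration. Other iscsid message formats are not handled.

```python
import re

# Matches the "Kernel reported iSCSI connection" error line shown above.
ERROR_RE = re.compile(
    r"Kernel reported iSCSI connection (?P<sid>\d+):(?P<cid>\d+) "
    r"error \((?P<code>\d+)\) state \((?P<state>\d+)\)"
)
# Matches the "is operational after recovery" line shown above.
RECOVERY_RE = re.compile(
    r"connection(?P<sid>\d+):(?P<cid>\d+) is operational after recovery "
    r"\((?P<attempts>\d+) attempts\)"
)


def parse_iscsid_line(line):
    """Return a dict for an error or recovery event, or None otherwise."""
    for event, pattern in (("error", ERROR_RE), ("recovery", RECOVERY_RE)):
        match = pattern.search(line)
        if match:
            fields = {key: int(value) for key, value in match.groupdict().items()}
            return {"event": event, **fields}
    return None


LOG = """\
Sep 13 14:40:09 mail01 iscsid: Kernel reported iSCSI connection 4:0 error (1020) state (3)
Sep 13 14:40:20 mail01 iscsid: connection4:0 is operational after recovery (2 attempts)
"""

events = [e for e in map(parse_iscsid_line, LOG.splitlines()) if e is not None]
for e in events:
    print(e)
```

Counting the error events per session id over a day should show whether the same sessions keep dropping or whether it is spread across all of them.
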

> >> If the iscsi target restart fails, it is random which initiator gets
> >> stuck; I think it only depends on who is fastest to get into the
> >> first 32 sessions.
> >
> >> I am not sure here. The open-iscsi default replacement timeout is 120
> >> secs.
> >
> > Even then, when the target is back, it will poll it.
> >
> > You're right, I meant the 5 sec default for these settings:
> > node.conn[0].timeo.noop_out_interval = 5
> > node.conn[0].timeo.noop_out_timeout = 5
> > and the same settings at 1 sec on those connections that use
> > multipath.
>
> Why do you change it to 1 ? That's a very low value and will just flood
> the target.

As I said, I'm using multipath, so I want a fast response when there is a
connection/session error, in order to switch to the other path. That's why
I'm using these values:

node.session.timeo.replacement_timeout = 5
node.session.err_timeo.abort_timeout = 5
node.session.err_timeo.lu_reset_timeout = 5
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].timeo.logout_timeout = 5
node.conn[0].timeo.login_timeout = 5
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 1
node.conn[0].timeo.noop_out_timeout = 1

But as I said, this also affected the initiators that don't use multipath
and have the default open-iscsi values.

There is an INCOMING_MAX 32 limit in the source, which I wrote about a few
minutes before your last mail (I hope you got that). I think that will be
the problem, and I will check it next week.

> >> I tried to check whether this is a network connection problem, but if
> >> the restart fails and I try to telnet to them, sometimes it also
> >> doesn't answer (tcpdump shows that the target server got the request,
> >> but doesn't send any answer).
> >
> >> I checked whether the iscsi target stop might have got stuck, but it
> >> seems to be okay: the session is closed, the modules are unloaded, and
> >> there isn't any error. I tried to raise the sleep time before start
> >> from 1 sec to 10 sec, but got the same error.
> >
> >> What 1 sec setting are you referring here?
> >
> > Sleep time in the init script, which in restart comes after the stop
> > and before the start.
>
> That sleep won't help.
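
To put numbers on the timeout discussion earlier in this mail, here is a rough sketch that compares the multipath values with the defaults mentioned in the thread (5/5 for the noop settings, 120 for the replacement timeout). The formula is an assumption, not something taken from the open-iscsi source: it treats the worst case before I/O is failed up to multipath as noop_out_interval + noop_out_timeout + replacement_timeout. The function names are made up for illustration.

```python
def parse_settings(text):
    """Parse 'key = value' lines, as printed in the settings list, into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def worst_case_failover_seconds(settings):
    """Rough upper bound (assumed formula): ping interval + ping timeout
    + session replacement timeout."""
    interval = int(settings["node.conn[0].timeo.noop_out_interval"])
    timeout = int(settings["node.conn[0].timeo.noop_out_timeout"])
    replacement = int(settings["node.session.timeo.replacement_timeout"])
    return interval + timeout + replacement


MULTIPATH = parse_settings("""
node.session.timeo.replacement_timeout = 5
node.conn[0].timeo.noop_out_interval = 1
node.conn[0].timeo.noop_out_timeout = 1
""")

# Default values as mentioned in the thread.
DEFAULTS = parse_settings("""
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
""")

print("multipath:", worst_case_failover_seconds(MULTIPATH), "sec")  # 7 sec
print("defaults: ", worst_case_failover_seconds(DEFAULTS), "sec")   # 130 sec
```

Under that assumption the multipath settings give up on a dead path after roughly 7 seconds instead of 130, which is the reason for the low values.
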