Hello Laszlo,

Glad that that solved the problem. I have CCed the upstream devs and the list. If it gets committed upstream, I can consider pulling it.
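For illustration, here is a rough C sketch of the two suggestions quoted below. It chooses the incoming-connection limit when the daemon starts instead of at compile time, and it logs an error when the limit is hit instead of failing silently. This is not IET's code. The table layout, the function names, and passing the limit as a string (for example from a daemon option) are all hypothetical. Only the default of 32 comes from the INCOMING_MAX value discussed in this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Default matches the compile-time INCOMING_MAX of 32 discussed in this thread. */
#define DEFAULT_INCOMING_MAX 32

/* Hypothetical table of connections that are still in the login phase. */
struct incoming_table {
	int *fds;    /* -1 marks a free slot */
	size_t max;  /* limit chosen at daemon start instead of at compile time */
};

/* Parse a limit such as "128" (e.g. from a hypothetical daemon option);
 * fall back to the default on missing, non-numeric or non-positive input. */
static size_t parse_incoming_max(const char *s)
{
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return DEFAULT_INCOMING_MAX;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || v <= 0)
		return DEFAULT_INCOMING_MAX;
	return (size_t)v;
}

/* Allocate max slots and mark them all free. Returns 0 on success. */
static int incoming_init(struct incoming_table *t, size_t max)
{
	t->fds = malloc(max * sizeof(*t->fds));
	if (t->fds == NULL)
		return -1;
	for (size_t i = 0; i < max; i++)
		t->fds[i] = -1;
	t->max = max;
	return 0;
}

/* Store fd in the first free slot and return its index. When the table is
 * full, log an error instead of failing silently, so the administrator can
 * see why logins are being dropped. */
static int incoming_add(struct incoming_table *t, int fd)
{
	for (size_t i = 0; i < t->max; i++) {
		if (t->fds[i] == -1) {
			t->fds[i] = fd;
			return (int)i;
		}
	}
	fprintf(stderr, "incoming connection limit (%zu) reached, rejecting fd %d\n",
		t->max, fd);
	return -1;
}

/* Free a slot once the connection has finished logging in (or was closed). */
static void incoming_remove(struct incoming_table *t, int slot)
{
	if (slot >= 0 && (size_t)slot < t->max)
		t->fds[slot] = -1;
}

static void incoming_free(struct incoming_table *t)
{
	free(t->fds);
	t->fds = NULL;
	t->max = 0;
}
```

With a limit of 2, a third concurrent login prints the error line and gets -1 until a slot is freed. The visible log line is the part that would have shortened the four days of debugging described below.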
On Tuesday 18 September 2012 03:18 PM, Laszlo Fekete wrote:
> Hello!
>
> The INCOMING_MAX parameter change (in the source code) solved the problem.
>
> http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi/
>
> I think it would be very helpful if this parameter could be set with a
> global variable when the iscsitarget daemon starts, or if an error log entry
> were created when the limit is reached. Even a simple warning in the init
> script when there are more than 32 active sessions, saying that the restart
> may fail, would help.
>
> Regards, blackluck
>
> On 2012. September 14. 23:48:27 Laszlo Fekete wrote:
>> On 2012. September 15. 01:06:27 Ritesh Raj Sarraf wrote:
>>> On Friday 14 September 2012 09:07 PM, Laszlo Fekete wrote:
>>>>> Is there an error message/code?
>>>>
>>>> This is in the initiator logs:
>>>> Sep 13 14:40:09 mail01 iscsid: Kernel reported iSCSI connection 4:0 error (1020) state (3)
>>>> Sep 13 14:40:20 mail01 iscsid: connection4:0 is operational after recovery (2 attempts)
>>>
>>> So the connection did recover.
>>
>> Yes, it recovers, but only because of one to five further iSCSI target
>> restarts after the first failed one; the initiator just doesn't see any
>> change when a target restart fails.
>> The connection recovers only after a successful restart, but not every
>> restart succeeds if more than 32 sessions try to recover within a short
>> time.
>>
>>>>> Why do you change it to 1? That's a very low value and will just flood
>>>>> the target.
>>>>
>>>> As I said, I'm using multipath, so I want a fast response to a
>>>> connection/session error in order to switch to the other path. That's
>>>> why I'm using
>>>
>>> The multipath path checker loop triggers every 5 seconds.
>>>
>>>> these values:
>>>> node.session.timeo.replacement_timeout = 5
>>>> node.session.err_timeo.abort_timeout = 5
>>>> node.session.err_timeo.lu_reset_timeout = 5
>>>> node.session.err_timeo.host_reset_timeout = 60
>>>> node.session.iscsi.FastAbort = Yes
>>>> node.session.iscsi.InitialR2T = No
>>>> node.session.iscsi.ImmediateData = Yes
>>>> node.session.iscsi.FirstBurstLength = 262144
>>>> node.session.iscsi.MaxBurstLength = 16776192
>>>> node.conn[0].timeo.logout_timeout = 5
>>>> node.conn[0].timeo.login_timeout = 5
>>>> node.conn[0].timeo.auth_timeout = 45
>>>> node.conn[0].timeo.noop_out_interval = 1
>>>> node.conn[0].timeo.noop_out_timeout = 1
>>>>
>>>> But as I said, this also affected the initiators that don't use
>>>> multipath and had the default open-iscsi values.
>>>>
>>>> There is an INCOMING_MAX limit of 32 in the source; I wrote about it a
>>>> few minutes before your last mail, hope you got that. I think that is
>>>> the problem, and I will check it next week.
>>>
>>> Okay!! Let me know what your findings are. From what you have shared up
>>> till now, I don't see much of a problem with IET or open-iscsi.
>>
>> The problem is that if there are more than 32 active connections when the
>> iSCSI target is restarted, the restart may fail without any error in the
>> logs; the initiators just keep trying to reconnect.
>>
>> You could tell me to raise the timeouts, but that's still a lottery. If I
>> have 80 sessions when restarting the target and 35 of them try to
>> reconnect at the same time, it will also fail, and there is no error
>> message.
>>
>> I hope increasing the default INCOMING_MAX setting of 32 in the source
>> code will solve the problem. (Next week I'm going to test this.)
>>
>> If you say this isn't a bug, that's fine, because it is a limit in the
>> source code (if it really is the problem) and can't be configured
>> dynamically. But this wasn't clear to me, and I spent 4 days debugging
>> before I even suspected that there might be a limit of 32 somewhere.
>>
>> So maybe a warning message in the init script would be helpful when there
>> are more than 32 active sessions, or an error log entry when the
>> INCOMING_MAX limit is reached.

-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System