Hello!

Changing the INCOMING_MAX parameter in the source code solved the problem.
http://blog.wpkg.org/2007/09/09/solving-reliability-and-scalability-problems-with-iscsi/

I think it would be very helpful if this parameter could be changed with a
global variable when the iscsitarget daemon starts, or at least if an error
log entry were created when the limit is reached. Even a simple warning in
the init script when there are more than 32 active sessions, saying that
the restart may fail, would help.

Regards,
blackluck

On 2012. September 14. 23:48:27 Laszlo Fekete wrote:
> On 2012. September 15. 01:06:27 Ritesh Raj Sarraf wrote:
> > On Friday 14 September 2012 09:07 PM, Laszlo Fekete wrote:
> > >> Is there an error message/code?
> > >
> > > This is in the initiator logs:
> > > Sep 13 14:40:09 mail01 iscsid: Kernel reported iSCSI connection 4:0 error (1020) state (3)
> > > Sep 13 14:40:20 mail01 iscsid: connection4:0 is operational after recovery (2 attempts)
> >
> > So the connection did recover.
>
> Yes, it recovers, but only because of another 1-5 iscsi target restarts
> after the first failed one; the initiator just doesn't see any change
> when a target restart fails.
> The connection recovers only after a successful restart, but not every
> restart succeeds if more than 32 sessions try to recover within a short
> time.
>
> > >> Why do you change it to 1? That's a very low value and will just
> > >> flood the target.
> > >
> > > As I said, I'm using multipath, so I want a fast response to switch
> > > to the other path if there is a connection/session error. That's why
> > > I'm using
> >
> > The multipath path checker loop triggers every 5 seconds.
> >
> > > these values:
> > > node.session.timeo.replacement_timeout = 5
> > > node.session.err_timeo.abort_timeout = 5
> > > node.session.err_timeo.lu_reset_timeout = 5
> > > node.session.err_timeo.host_reset_timeout = 60
> > > node.session.iscsi.FastAbort = Yes
> > > node.session.iscsi.InitialR2T = No
> > > node.session.iscsi.ImmediateData = Yes
> > > node.session.iscsi.FirstBurstLength = 262144
> > > node.session.iscsi.MaxBurstLength = 16776192
> > > node.conn[0].timeo.logout_timeout = 5
> > > node.conn[0].timeo.login_timeout = 5
> > > node.conn[0].timeo.auth_timeout = 45
> > > node.conn[0].timeo.noop_out_interval = 1
> > > node.conn[0].timeo.noop_out_timeout = 1
> > >
> > > But as I said, this also affected the initiators that don't use
> > > multipath and had the default open-iscsi values.
> > >
> > > There is an INCOMING_MAX limit of 32 in the source; I wrote about it
> > > a few minutes before your last mail, hope you got that. I think that
> > > is the problem, and I will check it next week.
> >
> > Okay!! Let me know what your findings are. From what you have shared up
> > till now, I don't see much of a problem with IET or open-iscsi.
>
> The problem is that if there are more than 32 active connections when the
> iscsi target is restarted, the restart may fail without any error in the
> logs; the initiators just keep trying to reconnect.
>
> You could say to raise the timeouts, but that's still a lottery. If I have
> 80 sessions when restarting the target and 35 of them try to reconnect at
> the same time, it will also fail, and there is no error message.
>
> I hope increasing the default INCOMING_MAX setting of 32 in the source
> code will solve the problem. (I'm going to test this next week.)
>
> If you say this isn't a bug, that's fine, because this is a limit in the
> source code (if it really is the problem) and can't be configured
> dynamically. But this wasn't clear to me, and I spent 4 days debugging
> before I even suspected that there might be a limit of 32 somewhere.
>
> So maybe a warning message about this in the init script would be helpful
> if there are more than 32 active sessions, or an error log entry when the
> incoming_max limit is reached.
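The two suggestions (choose the limit at daemon start, and log when it is hit) can be sketched in C. This is a minimal, hypothetical illustration, not the target daemon's actual code: it assumes that connections still in the login phase occupy slots in a fixed-size table, as a compile-time INCOMING_MAX of 32 suggests, and the names `incoming_pool`, `pool_add`, `incoming_max_from_env` and the `IET_INCOMING_MAX` environment variable are invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not IET code: a table of connections that are still
 * in the login phase. Its size is chosen at daemon start instead of being
 * fixed at compile time by INCOMING_MAX. */
struct incoming_pool {
	int *fds;  /* socket fd per slot, -1 marks a free slot */
	int size;  /* maximum number of pending connections */
	int used;  /* number of occupied slots */
};

/* Read the limit from a (hypothetical) environment variable; fall back to
 * the default if it is unset, not a number, or outside an arbitrary
 * sanity range of 1..65536. */
int incoming_max_from_env(const char *var, int fallback)
{
	const char *s = getenv(var);
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return fallback;
	v = strtol(s, &end, 10);
	if (*end != '\0' || v <= 0 || v > 65536)
		return fallback;
	return (int)v;
}

/* Allocate the table with every slot marked free. */
int pool_init(struct incoming_pool *p, int size)
{
	assert(size > 0);
	p->fds = malloc(sizeof(*p->fds) * (size_t)size);
	if (p->fds == NULL)
		return -1;
	for (int i = 0; i < size; i++)
		p->fds[i] = -1;
	p->size = size;
	p->used = 0;
	return 0;
}

/* Store a new connection and return its slot, or -1 if the table is full.
 * Instead of failing silently, report the rejected connection (stderr
 * stands in for the daemon's error log in this sketch). */
int pool_add(struct incoming_pool *p, int fd)
{
	for (int i = 0; i < p->size; i++) {
		if (p->fds[i] == -1) {
			p->fds[i] = fd;
			p->used++;
			return i;
		}
	}
	fprintf(stderr,
		"incoming connection limit (%d) reached, rejecting fd %d\n",
		p->size, fd);
	return -1;
}

/* Free a slot once the connection has finished logging in (or failed). */
void pool_remove(struct incoming_pool *p, int slot)
{
	if (slot >= 0 && slot < p->size && p->fds[slot] != -1) {
		p->fds[slot] = -1;
		p->used--;
	}
}

void pool_free(struct incoming_pool *p)
{
	free(p->fds);
	p->fds = NULL;
	p->size = 0;
	p->used = 0;
}
```

At startup a daemon built this way might call `pool_init(&pool, incoming_max_from_env("IET_INCOMING_MAX", 32))`, keeping 32 as the default. The message on the full-table path is what would have made the failed restarts visible; an init-script warning about the number of active sessions would be a separate check.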