In the debug info I see thousands of the following events:

FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149225: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149226: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149227: abort_task_offline called for LPORT: lport abort timed out
emlxs1:0149228: port state change from 11 to 11
FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149228: abort_task_offline called for LPORT: lport abort timed out
:0149228: fct_port_shutdown: port-ffffff1157ff1278, fct_process_logo: unable to clean up I/O. iport-ffffff1157ff1378, icmd-ffffff1195463110
FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed out
FROM STMF:0149229: abort_task_offline called for LPORT: lport abort timed out
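(For reference, the "FROM STMF" lines above come from the COMSTAR trace
buffer; assuming the stmf_trace_buf symbol is present on this build, the
same dump can usually be pulled from the live kernel with mdb:

    echo '*stmf_trace_buf/s' | mdb -k
)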
And then the following as the port recovers:

emlxs1:0150128: port state change from 11 to 11
emlxs1:0150128: port state change from 11 to 0
emlxs1:0150128: port state change from 0 to 11
emlxs1:0150128: port state change from 11 to 0
:0150850: fct_port_initialize: port-ffffff1157ff1278, emlxs initialize
emlxs1:0150950: port state change from 0 to e
emlxs1:0150953: Posting sol ELS 3 (PLOGI) rp_id=fffffd lp_id=22000
emlxs1:0150953: Processing sol ELS 3 (PLOGI) rp_id=fffffd
emlxs1:0150953: Sol ELS 3 (PLOGI) completed with status 0, did/fffffd
emlxs1:0150953: Posting sol ELS 62 (SCR) rp_id=fffffd lp_id=22000
emlxs1:0150953: Processing sol ELS 62 (SCR) rp_id=fffffd
emlxs1:0150953: Sol ELS 62 (SCR) completed with status 0, did/fffffd
emlxs1:0151053: Posting sol ELS 3 (PLOGI) rp_id=fffffc lp_id=22000
emlxs1:0151053: Processing sol ELS 3 (PLOGI) rp_id=fffffc
emlxs1:0151053: Sol ELS 3 (PLOGI) completed with status 0, did/fffffc
emlxs1:0151054: Posting unsol ELS 3 (PLOGI) rp_id=fffc02 lp_id=22000
emlxs1:0151054: Processing unsol ELS 3 (PLOGI) rp_id=fffc02
emlxs1:0151054: Posting unsol ELS 20 (PRLI) rp_id=fffc02 lp_id=22000
emlxs1:0151054: Processing unsol ELS 20 (PRLI) rp_id=fffc02
emlxs1:0151055: Posting unsol ELS 5 (LOGO) rp_id=fffc02 lp_id=22000
emlxs1:0151055: Processing unsol ELS 5 (LOGO) rp_id=fffc02
emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21500
emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21500
emlxs1:0151146: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
emlxs1:0151146: Processing unsol ELS 3 (PLOGI) rp_id=21600
emlxs1:0151146: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
emlxs1:0151146: Processing unsol ELS 20 (PRLI) rp_id=21600
emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21500
emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21500
emlxs1:0151338: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
emlxs1:0151338: Processing unsol ELS 3 (PLOGI) rp_id=21600
emlxs1:0151338: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
emlxs1:0151338: Processing unsol ELS 20 (PRLI) rp_id=21600
emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21500 lp_id=22000
emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21500
emlxs1:0151428: port state change from e to 4
emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21500 lp_id=22000
emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21500
emlxs1:0151428: Posting unsol ELS 3 (PLOGI) rp_id=21600 lp_id=22000
emlxs1:0151428: Processing unsol ELS 3 (PLOGI) rp_id=21600
emlxs1:0151428: Posting unsol ELS 20 (PRLI) rp_id=21600 lp_id=22000
emlxs1:0151428: Processing unsol ELS 20 (PRLI) rp_id=21600

To be honest it does not tell me much, since I do not understand COMSTAR at this depth. It would appear that the link fails, so it is either a driver problem or a hardware issue. I will replace the LPe11002 with a brand-new, unopened one, and then give up on FC on OI.

On Fri, Jun 7, 2013 at 4:54 PM, Heinrich van Riel <[email protected]> wrote:

> I did find this in my inbox from 2009; I have been using FC with ZFS for
> quite some time and only recently retired an install with OI a5 that was
> upgraded from OpenSolaris. It did not do real heavy-duty stuff, but I had
> a similar problem where we were stuck on build 99 for quite some time.
>
> To Jean-Yves Chevallier@emulex
> Any comments on the future of Emulex with regard to the COMSTAR project?
> It seems I am not the only one who has problems using Emulex in later
> builds. For now I am stuck with build 99.
> As always, any feedback would be greatly appreciated, since we have to
> decide between sticking with OpenSolaris & COMSTAR and migrating to
> another solution; we cannot stay on build 99 forever.
> What I am really trying to find out is whether there is a
> roadmap/decision to ultimately only support QLogic HBAs in target mode.
>
> Response:
>
> Sorry for the delay in answering you. I do have news for you.
> First off, the interface used by COMSTAR has changed in recent Nevada
> releases (NV120 and up, I believe). Since it is not a public interface,
> we had no prior indication of this.
> We know of a number of issues, some in our driver, some in the COMSTAR
> stack. Based on the information we have from you and other community
> members, we have addressed all these issues in our next driver version;
> we will know for sure after we run our DVT (driver verification testing)
> next week. Depending on progress, this driver will be part of NV 128 or
> else NV 130.
> I believe it is worth taking another look based on these upcoming
> builds, which I imagine might also include fixes to the rest of the
> COMSTAR stack.
>
> Best regards.
>
> I can confirm that this was fixed in 128: all I did was update from 99
> to 128, and there were no problems.
> It seems the same problem has now returned, and Emulex does not appear
> to be a good fit, since Sun mostly used QLogic.
>
> Guess it is back to iSCSI only for now.
>
> On Fri, Jun 7, 2013 at 4:40 PM, Heinrich van Riel <[email protected]> wrote:
>
>> I changed the settings. I do see it writing all the time now, but the
>> link still dies after a few minutes.
>>
>> Jun 7 16:30:57 emlxs: [ID 349649 kern.info] [ 5.0608]emlxs1: NOTICE: 730: Link reset. (Disabling link...)
>> Jun 7 16:30:57 emlxs: [ID 349649 kern.info] [ 5.0333]emlxs1: NOTICE: 710: Link down.
>> Jun 7 16:33:16 emlxs: [ID 349649 kern.info] [ 5.055D]emlxs1: NOTICE: 720: Link up. (4Gb, fabric, target)
>> Jun 7 16:33:16 fct: [ID 132490 kern.notice] NOTICE: emlxs1 LINK UP, portid 22000, topology Fabric Pt-to-Pt, speed 4G
>>
>> On Fri, Jun 7, 2013 at 3:06 PM, Jim Klimov <[email protected]> wrote:
>>
>>> Comment below
>>>
>>> On 2013-06-07 20:42, Heinrich van Riel wrote:
>>>
>>>> One sec apart, cloning a 150GB VM from a datastore on EMC to OI.
>>>>
>>>> alloc   free   read  write   read  write
>>>> -----  -----  -----  -----  -----  -----
>>>>  309G  54.2T     81     48   452K  1.34M
>>>>  309G  54.2T      0  8.17K      0   258M
>>>>  310G  54.2T      0  16.3K      0   510M
>>>>  310G  54.2T      0      0      0      0
>>>>  310G  54.2T      0      0      0      0
>>>>  310G  54.2T      0      0      0      0
>>>>  310G  54.2T      0  10.1K      0   320M
>>>>  311G  54.2T      0  26.1K      0   820M
>>>>  311G  54.2T      0      0      0      0
>>>>  311G  54.2T      0      0      0      0
>>>>  311G  54.2T      0      0      0      0
>>>>  311G  54.2T      0  10.6K      0   333M
>>>>  313G  54.2T      0  27.4K      0   860M
>>>>  313G  54.2T      0      0      0      0
>>>>  313G  54.2T      0      0      0      0
>>>>  313G  54.2T      0      0      0      0
>>>>  313G  54.2T      0  9.69K      0   305M
>>>>  314G  54.2T      0  10.8K      0   337M
>>>>
>>> ...
>>> Were it not for your complaints about link resets and "unusable"
>>> connections, I'd say this looks like normal behavior for async
>>> writes: they get cached up, and every 5 sec you have a transaction
>>> group (TXG) sync which flushes the writes from cache to disks.
>>>
>>> In fact, the picture still looks like that, and this is possibly the
>>> reason for the hiccups.
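>>> (To put rough numbers on the pattern above: each cycle lands about
>>> 0.8-1.2 GB in two one-second samples and then sits idle for three,
>>> i.e. roughly 150-240 MB/s averaged over the 5-sec TXG interval,
>>> arriving as bursts of 258-860 MB/s at the disks.)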
>>>
>>> The TXG sync may be an IO-intensive process, which may block or
>>> delay many other system tasks; previously, when the interval
>>> defaulted to 30 sec, we got unusable SSH connections and temporarily
>>> "hung" disk requests on the storage server every half a minute when
>>> it was really busy (i.e. during the initial fill-up with data from
>>> older boxes). It cached up about 10 seconds' worth of writes, then
>>> spewed them out and could do nothing else. I don't think I ever saw
>>> network connections timing out or NICs reporting resets due to this,
>>> but I wouldn't be surprised if this were the cause in your case
>>> (i.e. disk IO threads preempting HBA/NIC threads for too long
>>> somehow, leaving the driver very puzzled about the staleness of its
>>> card).
>>>
>>> At the very least, TXG syncs can be tuned by two knobs: the time
>>> limit (5 sec default) and the size limit (when the cache is "this"
>>> full, begin the sync to disk). The latter is a realistic figure that
>>> can allow you to sync in shorter bursts, with fewer interruptions
>>> to smooth IO and process work.
>>>
>>> A somewhat related tunable is the number of requests that ZFS would
>>> queue up for a disk. Depending on its NCQ/TCQ abilities and random
>>> IO abilities (HDD vs. SSD), long or short queues may be preferable.
>>> See also:
>>> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
>>>
>>> These tunables can be set at runtime with "mdb -kw", as well as in
>>> the /etc/system file to survive reboots. One of our storage boxes
>>> has these example values in /etc/system:
>>>
>>> *# default: flush txg every 5sec (may be max 30sec; optimize
>>> *# for 5 sec writing)
>>> set zfs:zfs_txg_synctime = 5
>>>
>>> *# Spool to disk when the ZFS cache is 0x18000000 (384Mb) full
>>> set zfs:zfs_write_limit_override = 0x18000000
>>> *# ...for realtime changes use mdb.
>>> *# Example sets 0x18000000 (384Mb, 402653184 b):
>>> *# echo zfs_write_limit_override/W0t402653184 | mdb -kw
>>>
>>> *# ZFS queue depth per disk
>>> set zfs:zfs_vdev_max_pending = 3
>>>
>>> HTH,
>>> //Jim Klimov
>>>
>>> _______________________________________________
>>> OpenIndiana-discuss mailing list
>>> [email protected]
>>> http://openindiana.org/mailman/listinfo/openindiana-discuss
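
For anyone trying Jim's tunables: it may be worth reading the current
values from the live kernel first, to confirm the variables exist on the
running build (the names have changed across releases). A minimal,
read-only sketch using the same names as above:

    echo 'zfs_txg_synctime/D' | mdb -k          # TXG sync interval, seconds
    echo 'zfs_write_limit_override/E' | mdb -k  # write limit in bytes, 0 = no override
    echo 'zfs_vdev_max_pending/D' | mdb -k      # per-disk queue depth

Without -w, mdb cannot modify anything; if it complains about an unknown
symbol, the tunable does not exist under that name on the running kernel.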
