Hello,
I'm trying to understand a behavior and hoping to learn more about what
fio is doing. In the specific case in question, I'm seeing a seven-minute
wait before IO resumes after a failover. With other variations on this
job file, the seven-minute wait disappears and drops back down to the
roughly 40-second wait I see with the IO loads we usually run.
The setup:
- I have one Windows 2012 R2 host, with two NICs.
- I have one storage array with two controllers, A and B, each with two
10GbE ports (four ports total), and failover capability between the two
sides.
- I have iSCSI and MPIO set up so that there is one login from each NIC to
each side, four sessions total for each volume. The map looks
something like this:
       nic1                      nic2
      /    \                    /    \
     /      \                  /      \
 side A    side B          side A    side B
 port 0    port 0          port 1    port 1
- I have the fio job below. It is basically 256k blocks, iodepth 1, and a
single worker covering all 48 drives.
[global]
do_verify=0
ioengine=windowsaio
numjobs=1
iodepth=1
offset=0
direct=1
thread
[fio-0]
blocksize=256k
readwrite=rw
filename=\\.\PHYSICALDRIVE19
filename=\\.\PHYSICALDRIVE20
<snipped out the other 44 drives>
filename=\\.\PHYSICALDRIVE13
filename=\\.\PHYSICALDRIVE14
size=100%
If I alter the job in any of the following ways, IO resumes after the
usual failover period of about 40 seconds. To summarize:
Doesn't work:
- multiple disks, single job, 1 iodepth
Works:
- Single disk, one job, 1 iodepth
- multiple disks, one job with all disks, iodepth equal to the number of
disks (e.g. with 48 disks, iodepth is set to 48; sketched below)
- multiple disks, one job per disk, 1 iodepth
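
For reference, here is roughly what the second working variant (iodepth
equal to the disk count) looks like. This is only a sketch of the change,
not the exact file I ran; the [global] section is identical to the one
above, with the job-level iodepth overriding the global iodepth=1:

[fio-0]
blocksize=256k
readwrite=rw
# override the global iodepth=1 with one outstanding IO per disk
iodepth=48
filename=\\.\PHYSICALDRIVE19
filename=\\.\PHYSICALDRIVE20
<snipped out the other 44 drives>
filename=\\.\PHYSICALDRIVE13
filename=\\.\PHYSICALDRIVE14
size=100%

The "one job per disk" variant is the same idea in reverse: a separate
[fio-N] section per PHYSICALDRIVE, each with a single filename and
iodepth=1.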
Would anyone have any idea why that one arrangement causes a
significant delay before IO is resumed?
Thanks in advance,
Todd