Public bug reported:

On a customer deployment running focal-ussuri with iSCSI backends and
multipath enabled, iscsiadm fails to mount one of the paths of an iSCSI
volume with the following error:
"iscsiadm: Could not make /etc/iscsi/nodes: File exists\niscsiadm: Error while 
adding record: encountered iSCSI database failure"
(see nova-compute.log for more details)

In terms of impact, every server that mounts an iSCSI volume for the
first time fails to mount the first path of the iSCSI target. The
failure is "silent": the end user is not made aware of it.

The iSCSI database error occurs only for the first iSCSI volume mounted
on each affected server in the deployment, such as the units running the
cinder-volume and nova-compute services.
After some investigation, this appears to be a race condition between
os-brick and the iscsid daemon.
After a deployment or a reboot, iscsid is not running yet, and os-brick
tries to mount the first path of the iSCSI volume before iscsid has
finished initialising, which leads to the error seen in the cinder-volume
or nova-compute logs.
If iscsid is started manually on the server beforehand, the error
disappears and all the target paths are mounted properly for the first
volume.


Here is an example of the processes running on a nova-compute unit:
# before an instance creation with the first iSCSI volume
ubuntu@nova-compute:~$ ps aux | grep iscsi
ubuntu   3705821  0.0  0.0   6304  2624 pts/1    S+   14:18   0:00 grep --color=auto iscsi
# after the instance creation
ubuntu@nova-compute:~$ ps aux | grep iscsi
root     3707866  0.0  0.0   5108   248 ?        Ss   14:21   0:00 /sbin/iscsid
root     3707867  0.0  0.0   5964  5816 ?        S<Ls 14:21   0:00 /sbin/iscsid
root     3707869  0.0  0.0      0     0 ?        I<   14:21   0:00 [iscsi_eh]
root     3707878  0.0  0.0      0     0 ?        I<   14:21   0:00 [iscsi_q_1]
ubuntu   3708321  0.0  0.0   6436  2524 pts/1    S+   14:21   0:00 grep --color=auto iscsi

To avoid the iscsiadm database error, the workaround I have found so far
is to start and enable iscsid on every cinder-volume and nova-compute
unit before mounting any iSCSI volume.
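
The workaround could be scripted roughly as follows. This is only a
sketch: the unit name "iscsid" and the helper name
ensure_iscsid_running are assumptions made for illustration, not
something taken from the charms or from os-brick, and it would need to
run as root on each unit.

```python
import subprocess


def ensure_iscsid_running(unit="iscsid", runner=subprocess.run):
    """Start and enable the given unit if it is not already active.

    Returns True if the unit had to be started, False if it was
    already active. The unit name is an assumption; adjust as needed.
    """
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    status = runner(["systemctl", "is-active", "--quiet", unit])
    if status.returncode == 0:
        return False
    # `enable --now` enables the unit at boot and starts it immediately.
    runner(["systemctl", "enable", "--now", unit], check=True)
    return True


if __name__ == "__main__":
    started = ensure_iscsid_running()
    print("iscsid started" if started else "iscsid already active")
```

The runner parameter exists only so the helper can be exercised without
touching systemd; in normal use it defaults to subprocess.run.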

Looking further, this issue is also mentioned in os-brick bug #1944474,
where a retry mechanism was implemented to attempt the path mount again
when iscsiadm returns the database failure error code (6).
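
To illustrate the idea of retrying on that exit code, here is a minimal
sketch. It is not the actual os-brick patch from #1944474: the function
name, the number of attempts and the delay are all hypothetical, and
only the exit code 6 comes from the report above.

```python
import subprocess
import time

# Exit code reported above for "encountered iSCSI database failure".
ISCSI_DB_FAILURE = 6


def run_iscsiadm_with_retry(args, attempts=3, delay=1.0,
                            runner=subprocess.run, sleep=time.sleep):
    """Run iscsiadm, retrying only when it exits with the database error.

    `attempts` and `delay` are illustrative values, not os-brick's.
    Returns the result of the last invocation.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(["iscsiadm", *args], capture_output=True, text=True)
        if result.returncode != ISCSI_DB_FAILURE:
            # Success or an unrelated failure: do not retry.
            return result
        if attempt < attempts:
            # Give iscsid time to finish initialising before retrying.
            sleep(delay)
    return result
```

A caller would pass the usual iscsiadm arguments as a list and inspect
result.returncode afterwards; any error other than 6 is returned on the
first attempt so that unrelated failures are not masked.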

Would it be possible to backport the fix from #1944474 to the package,
and/or to look into starting iscsid beforehand through a charm
configuration?

** Affects: python-os-brick (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "nova-compute.log"
   
https://bugs.launchpad.net/bugs/1969087/+attachment/5580635/+files/nova-compute.log

** Summary changed:

- os-brick failing to mount iSCSI path
+ failing to mount iSCSI path with first volume

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1969087

Title:
  failing to mount iSCSI path with first volume
