On 27.06.21 at 10:00, Joshua M. Clulow via oi-dev wrote:
On Fri, 25 Jun 2021 at 18:52, Gary Mills <[email protected]> wrote:
On Fri, Jun 25, 2021 at 12:24:52PM -0700, Joshua M. Clulow via oi-dev wrote:
It seems like it would be good to figure out, on the systems that _do_
work, what exactly is performing the mount.  Then we can work
backwards to why that is no longer happening.
Good idea.  I have a system running an older BE where the automount
does work.  I did exactly what you suggested.
     <root@ryzen># dtrace -w -n '
     > syscall::*mount*:entry {
     > raise(SIGSTOP);
     > system("pargs %d; ptree %d; prun %d", pid, pid, pid);
     > }'
     dtrace: description '
     syscall::*mount*:entry ' matched 2 probes
     dtrace: allowing destructive actions
     CPU     ID                    FUNCTION:NAME
      10   8968                    umount2:entry 3951:       /usr/lib/hal/hald-addon-storage
     argv[0]: /usr/lib/hal/hald-addon-storage
     1994   /usr/lib/hal/hald --daemon=yes
       1995   hald-runner
         3951   /usr/lib/hal/hald-addon-storage

      11   8532                      mount:entry 3955:       mount -o nosuid /dev/dsk/c4t0d0p0:1 /media/STORE N GO
     argv[0]: pcfs_mount
     argv[1]: -o
     argv[2]: nosuid
     argv[3]: /dev/dsk/c4t0d0p0:1
     argv[4]: /media/STORE N GO
     1994   /usr/lib/hal/hald --daemon=yes
       1995   hald-runner
         3954   /usr/lib/hal/hal-storage-mount
           3955   mount -o nosuid /dev/dsk/c4t0d0p0:1 /media/STORE N GO

       2   8532                      mount:entry 3951:       /usr/lib/hal/hald-addon-storage
     argv[0]: /usr/lib/hal/hald-addon-storage
     1994   /usr/lib/hal/hald --daemon=yes
       1995   hald-runner
         3951   /usr/lib/hal/hald-addon-storage
Thanks for that!

OK, so I have looked into this a little bit.  It seems like there is a
bug in the HAL code we ship, or in the glib that OI is now using, or
somewhere in between.

With DTrace, I am able to see (in the "hald --daemon=yes" process at
the top of the HAL process tree) that it receives the appropriate
sysevents from the kernel when a USB disk is plugged in or removed.
We get as far as the sysevent_dev_handler() routine:

     
https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/hal/hald/solaris/sysevent.c#L157-L191

In particular, on my system, I see three write(2) calls that look like this:

    EC_devfs ESC_devfs_devi_add /pci@0,0/pci8086,2064@14/storage@2

    EC_devfs ESC_devfs_devi_add /pci@0,0/pci8086,2064@14/storage@2/disk@0,0

    EC_dev_add disk /pci@0,0/pci8086,2064@14/storage@2/disk@0,0 /dev/rdsk/c4t0d0 0

This seems about right.  These writes are into a self-pipe (i.e., both
ends of the pipe are held open within this single hald process) that
is established in the sysevent_init() routine and stored in the
"sysevent_pipe_fds" global.  I was able to confirm with pfiles that
the pipe is still open.
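
For readers unfamiliar with the self-pipe arrangement described above, here is a minimal sketch in plain POSIX C.  It is not the HAL code: the demo_* names are hypothetical stand-ins (demo_pipe_fds plays the role of "sysevent_pipe_fds"), and the reader here is a simple byte-at-a-time loop rather than a glib GIOChannel.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for HAL's "sysevent_pipe_fds" global:
 * [0] is the read end, [1] is the write end. */
static int demo_pipe_fds[2] = { -1, -1 };

/* Create the self-pipe; both ends stay open in this one process. */
static int demo_pipe_init(void)
{
    return pipe(demo_pipe_fds);
}

/* Producer side: write one newline-terminated event line.
 * Returns 0 on success, -1 on error or short write. */
static int demo_pipe_post(const char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);
    ssize_t n = write(demo_pipe_fds[1], line, len);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Consumer side: read up to and including the next '\n'.
 * Returns the number of bytes stored (not counting the NUL), or -1
 * if the pipe reports EOF or an error before any newline is seen. */
static ssize_t demo_pipe_read_line(char *buf, size_t bufsz)
{
    assert(buf != NULL && bufsz > 1);
    size_t used = 0;
    while (used + 1 < bufsz) {
        char c;
        ssize_t n = read(demo_pipe_fds[0], &c, 1);
        if (n <= 0)
            return -1;
        buf[used++] = c;
        if (c == '\n')
            break;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}
```

Calling demo_pipe_init(), then demo_pipe_post() with one of the lines shown earlier, then demo_pipe_read_line() should hand the same line back, which is roughly the round trip the daemon relies on.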

Where things appear to fall down is once we get into the glib area.
The strings that are written into one end of the pipe by the sysevent
consumer, as described above, are meant to then be read through a glib
GIOChannel object in sysevent_iochannel_data():

     
https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/hal/hald/solaris/sysevent.c#L244-L272

Though we do enter sysevent_iochannel_data() on schedule for each
sysevent, it seems like the call to g_io_channel_read_line() always
returns a value of 3 on my system -- which seems like an EOF?  Because
the value is not G_IO_STATUS_NORMAL, we don't even try to call
sscanf() to parse the lines we wrote above.  This means we never call
into sysevent_dev_add() and thus we never pass the hotplug event on to
the rest of HAL.
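
To make the failure mode concrete, here is a small sketch of the "only parse when the status is NORMAL" gate.  Several things in it are assumptions: the demo_* names and struct fields are hypothetical, the sscanf() format is only a guess at the shape of the lines shown earlier, and the enum is a local mirror of GLib's GIOStatus rather than the real type.  As far as I know, upstream GLib declares GIOStatus in the order ERROR, NORMAL, EOF, AGAIN, so a raw value of 3 would correspond to G_IO_STATUS_AGAIN rather than EOF; that is worth checking against the headers actually installed on the system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of GLib's GIOStatus ordering, for illustration only.
 * Real code should include <glib.h> and use the G_IO_STATUS_* names. */
enum demo_io_status {
    DEMO_IO_STATUS_ERROR,   /* 0 */
    DEMO_IO_STATUS_NORMAL,  /* 1 */
    DEMO_IO_STATUS_EOF,     /* 2 */
    DEMO_IO_STATUS_AGAIN    /* 3 */
};

/* Hypothetical field names; the real meaning of each column is
 * defined by the HAL source, not by this sketch. */
struct demo_event {
    char tag[64];
    char subclass[64];
    char phys_path[256];
    char dev_name[256];
    int  dev_inst;
};

/* Parse a line only if the read status was NORMAL.
 * Returns the number of fields converted, or -1 if the line was
 * skipped because of the status (the case described above). */
static int demo_handle_line(enum demo_io_status status, const char *line,
    struct demo_event *ev)
{
    assert(line != NULL && ev != NULL);
    if (status != DEMO_IO_STATUS_NORMAL)
        return -1;
    memset(ev, 0, sizeof (*ev));
    return sscanf(line, "%63s %63s %255s %255s %d",
        ev->tag, ev->subclass, ev->phys_path, ev->dev_name, &ev->dev_inst);
}
```

With a status of 3 the function returns -1 without touching the line at all, which matches the observed behaviour: the data is written into the pipe, but nothing downstream ever sees it.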

I have run out of steam on this for now, but I hope this is enough for
someone to keep digging.  In particular, it seems like it is worth
investigating whether glib has been updated in OI at some point
between when this was last working and now.  Perhaps there is a build
issue or a bug there.  It doesn't seem like there has been a lot of
change in the HAL daemon itself (which is in the gate, as linked
above).

One imagines this may also have an impact on the X11 keyboard/mouse
situation as those changes are presumably communicated via sysevents
to HAL, and HAL is similarly dropping the ball there.

Cheers.

Thanks a lot for your analysis, Josh.

In fact, we have had some minor glib updates in the past. Alas, we have
neither automated tests nor official testers.
So the main test burden is left to me, and I am only able to do limited
manual testing, because I have lots of other things I want to do.
I use USB sticks only very rarely, and while I do change my mouse or
keyboard from time to time, that hasn't been among my test scenarios.
So problems like this will only be detected long after they have been
introduced.

I'd appreciate it if we could find some volunteers for testing, and would
appreciate it even more if you could find somebody to start creating
automated tests.

Regards

