I have tried a couple of versions of a patch to make the activate/
waitactive more atomic but each has had problems.  This note is
to document the issues I have found so far.  I'm looking for some
feedback since this problem spans the kernel, X and a small set of
apps.

I have cut the code sequence from Matt's attachment to doc the
issue.  I have verified the X code (actually in two hw spec places).

This is the key bit of uglyness.  Note that Ajax comments on this in
the IRC log attached...  What follows is an analysis with results of some 
attempted patches.

        console_fd = open ("/dev/tty7", O_RDWR|O_NDELAY);
        chown ("/dev/tty7", 0, 0);
        chown ("/dev/tty0", 0, 0);
        ioctl (console_fd, VT_GETSTATE, &vts);
        ioctl (console_fd, VT_ACTIVATE, 7);

This is the first problem.  We *really* need the ACTIVATE/WAITACTIVE
to be atomic.  My latest change was to turn "want_console" into
a mutex with a cmpxchg, setting it to -1 after the wait.  This would serialize 
the activates so that the waits return.  The problem with
this is twofold.  First, the clearing depends on the other side completing the 
transaction.  Second, there is not always a waiter.
Therefore, it either gets reset too son or the whole of console switching locks 
up.  We don't have waiters for the ctrl+meta+fN, output char sequence, or for 
chvt and friends, the "async" case...
There is also the problem that an ACTIVATE could silently fail to
do anything in the current code (doesn't check return status).

        ioctl (console_fd, VT_WAITACTIVE, 7);

This is the second nasty part of this exchange.  The only way to guarantee it 
happens is to delay the reset until the wait.  If we keep the reset on the 
"complete_change_console" side, there is a
window for the next activate to slip through.  Where we want to reset is here 
but we can't.  The fundamental problem is that we cannot assert that the 
waitactive will ever return.  We can't even test "want_console" reliably.

        ioctl (console_fd, VT_GETMODE, &VT);
        VT.mode = VT_PROCESS;
        VT.relsig = SIGUSR1;
        VT.acqsig = SIGUSR1;
        ioctl (console_fd, VT_SETMODE, &VT);

This is also a race.  Somebody else can come by after the wait and
steal the console before the setmode takes place.  This call completely changes 
the interface to no longer do activate/waits but instead, exchange signals at 
the process level.

        ioctl (console_fd, KDSETMODE, KD_GRAPHICS);

This adds some more fun.  It disables switching.

---------
The problem comes down to our using an outmoded design for
things it was never intended for.  It harkens back to single cpu,
non-preemptive kernels with simple VT and X sessions.  Using
this for suspend signaling is bizarre.

There seem to be three choices:

1.  Try more hacks.  As ajax notes, he handles this in userspace and
simply crashes X, not an option with suspend/resume.  One
problem with the existing api and its use is the lack of returning or
handling error returns.  X could be a bit more graceful and
careful use could be able to re-start failed switches.  This would increase the 
ubuntu maintenance load for an interface that not
many maintainers are enthusiastic about...

2.  Try a new api to existing code.  The big requirement would be to
have all the syscalls atomic wrt console management.  The
downside is it can't mix with the old api and breaks chvt and friends.
This also breaks a kernel api but, fortunately, it is of limited scope, i.e. 
chvt and friends, X, etc.  Ubuntu maintenance issues apply here too.

3.  Step back and try again with a new console subsystem.
There are some good ideas in 
http://homepages.tesco.net/J.deBoynePollard/Proposals/linux-console-daemon.html.
It appears that this has gotten no further than this white paper but
there are some good ideas here.  It *should* be possible to even preserve the 
old API as a compatibility mode for apps outside X and
other ubunto apps.  This is not even a series of bug fixes, it is a project.  
The details would be more appropriate for a whitepaper
or design proposal.

I'll next publish a provisional patch here.  I want some feedback
first so I can better understand requirements on the lpia+ubuntu
side and confirm in my own mind whether such a patch would
really help.


** Attachment added: "IRC chat with Adam Jackson"
   http://launchpadlibrarian.net/19098448/ajax.irclog

-- 
VT switching API is inherently racy
https://bugs.launchpad.net/bugs/290197
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to