Hi,

On Wed, Dec 31, 2008 at 02:42:21PM +0200, Sergiu Ivanov wrote:
> On Mon, Dec 29, 2008 at 8:25 AM, <olafbuddenha...@gmx.net> wrote:
> > On Mon, Dec 22, 2008 at 07:19:50PM +0200, Sergiu Ivanov wrote:
> > The most radical approach would be to actually start a new nsmux
> > instance for each filesystem in the mirrored tree. This might in
> > fact be easiest to implement, though I'm not sure about other
> > consequences... What do you think? Do you see how it could work? Do
> > you consider it a good idea?
>
> I'm not really sure about the details of such implementation, but when
> I consider the recursive magic stuff, I'm rather inclined to come to
> the conclusion that this will be way too much...

Too much what?...

> Probably, this variant could work, but I'll tell you frankly: I cannot
> imagine how to do that, although I feel that it's possible.

Well, it's really almost trivial... Let's take a simple example. Say the directory tree contains a file "/a/b/c". Say nodes "a" and "b" are served by the root filesystem, but "b" is translated, so "b'" (the effective, translated version of "b") and "c" are served by something else. Let's assume we have an nsmux set up at "n", mirroring the root filesystem.

Now a client looks up "/n/a/b/c". (Perhaps indirectly, by looking up "/a/b/c" in a session chroot-ed to "/n".) This results in a lookup for "a/b/c" on the proxy filesystem provided by nsmux. nsmux forwards the lookup to the real directory tree it mirrors, which is the root filesystem in our case. The root filesystem will look up "a/b", and see that "b" is translated. It will obtain the root of the translator, yielding the translated node "b'", and return that as the retry port, along with "c" as the retry name. (And RETRY_REAUTH, I think.)

Now in the traditional "monolithic" implementation, nsmux will create a proxy node for "a/b'", and pass on a port for this proxy node to the client as retry port, along with the other retry parameters. (Which are passed on unchanged.) The client will then do the retry, finishing the lookup.

Now what about the control ports? The client could do a lookup for "/n/a/b" with O_NOTRANS for example, and then invoke file_get_translator_cntl() to get the control port of the translator sitting on "b". nsmux in this case forwards the request to the original "/a/b" node, but doesn't pass the result to the client directly. Instead, it creates a new port: a proxy control port, to be precise. The real control port is stored in the port structure, and the proxy port is passed to the client. Any invocations the client does on this proxy control port are forwarded to the real one as appropriate. The same happens when the client looks up "/n/a/b/c", and then invokes file_getcontrol() on the result: nsmux forwards the request to the real "/a/b/c" node, and proxies the result.

With the "distributed" nsmux, things would work a bit differently. Again there is a lookup for "/n/a/b/c", i.e. "a/b/c" on the proxy filesystem provided by nsmux; again it is forwarded to the root filesystem, and results in "a/b'" being returned along with a retry notification. The distributed nsmux now creates a proxy node of "a/b" (note: the untranslated "b"). It starts another nsmux instance, mirroring "/a/b'", and attaches this to the "a/b" proxy node. Again, the client will finish the lookup. (By doing the retry on the new nsmux instance.)

When the client invokes file_get_translator_cntl() on "/n/a/b", the main nsmux sees that there is a translator attached to this proxy node (namely the second nsmux), and returns its control port to the client -- just like any normal filesystem would do.
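(For reference, from the client's point of view the two cases look roughly like the sketch below. This is only an illustration using the example paths above: no error handling or port deallocation, and the RPC stub signatures are quoted from memory -- check <hurd/fs.h> and fs.defs before relying on them.)

/* Sketch: what a client of nsmux might do to obtain the two kinds of
   control ports discussed above.  Simplified -- no error handling, no
   port deallocation; stub signatures from memory.  */

#include <hurd.h>
#include <hurd/fs.h>     /* file_get_translator_cntl(), file_getcontrol() */
#include <fcntl.h>
#include <mach.h>

int
main (void)
{
  /* Case 1: the control port of the translator sitting on "b".
     Look up the *untranslated* node with O_NOTRANS, then ask for the
     control port of the translator attached to it.  nsmux proxies the
     port it gets back from the real filesystem.  */
  file_t b = file_name_lookup ("/n/a/b", O_NOTRANS, 0);
  mach_port_t trans_cntl;
  file_get_translator_cntl (b, &trans_cntl);

  /* Case 2: the control port of the filesystem that serves "c".
     Look up the node normally (following translators), then ask which
     filesystem it belongs to.  */
  file_t c = file_name_lookup ("/n/a/b/c", O_RDONLY, 0);
  mach_port_t fs_cntl;
  file_getcontrol (c, &fs_cntl);

  return 0;
}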
When the client does file_getcontrol() on "/n/a/b/c", the second nsmux will simply return its own control port, again like any normal filesystem.

Now unfortunately I realized, while thinking about this explanation, that returning the real control port of nsmux, while beautiful in its simplicity, isn't really useful... If someone invoked fsys_goaway() on this port for example, it would make the nsmux instance go away, instead of the actual translator on the mirrored tree. And we can't simply forward requests on the nsmux control port to the mirrored filesystem: The main nsmux instance must handle both requests on its real control port itself (e.g. when someone does "settrans -ga /n"), and forward requests from clients that did file_getcontrol on one of the proxied nodes to the mirrored filesystem. So we can't do without passing some kind of proxy control port to the clients, rather than the nsmux control port.

Considering that we need the proxy control ports anyways, the whole idea of the distributed nsmux seems rather questionable now... Sorry for the noise :-)

> > But let's assume for now we stick with one nsmux instance for the
> > whole tree. With trivfs, it's possible for one translator to serve
> > multiple filesystems -- I guess netfs can do that too...
>
> Could you please explain what do you mean by saying this in a more
> detailed way?

Well, normally each translator is attached to one underlying node. This association is created by fsys_startup(), which is usually invoked through trivfs_startup() for filesystems using libtrivfs.

However, a translator can also create another control port with trivfs_create_control(), and attach it to some filesystem location manually with file_set_translator(). (This is not very common, but the term server for example does it, so the same translator can be attached both to the pty master and the corresponding pty slave node.) Thus we get a translator serving multiple filesystems. I assume that this is possible with libnetfs as well...

Anyways, this variant would probably behave exactly the same as the "distributed" one, only that a single process would serve all nsmux instances, instead of a new process for each instance. It would have the same problems... So we probably don't need to consider it further.

> > The alternative would be to override the default implementations of
> > some of the fsys and other RPCs dealing with control ports, so we
> > would only serve one filesystem from the library's point of view,
> > but still be able to return different control ports.
> >
> > As we override the standard implementations, it would be up to us
> > how we handle things in this case. Easiest probably would be to
> > store a control port to the respective real filesystem in the port
> > structure of every proxy control port we return to clients.
>
> This is variant I was thinking about: custom implementations of some
> RPCs is the fastest way. At least I can imagine quite well what is
> required to do and I can tell that this variant will be, probably, the
> least resource consuming of all.

Well, if this is the variant you feel most comfortable with, it's probably best to implement this one :-) We can still change it if we come up with something better later on...

> > > As for the ``dotdot'' node, nsmux usually knows who is the parent
> > > of the current node; if we are talking about a client using nsmux,
> > > it is their responsibility to know who is the parent of the
> > > current node.
> > >
> > > OTOH, I am not sure at all about the meaning of this argument,
> > > especially since it is normally provided in an unauthenticated
> > > version.
> >
> > AIUI it is returned when doing a lookup for ".." on the node
> > returned by fsys_getroot(). In other words, it normally should be
> > the directory in which the translated node resides.
>
> Yep, this is my understanding, too. I guess I have to take a glimpse
> into the source to figure out *why* this argument is required...

Well, there must be a way for the translator to serve lookups for the ".." node... So we need to tell it what the ".." node is at some point.

One might think that a possible alternative approach would be to provide it once at translator startup, instead of individually on each root lookup. The behaviour would be different however: Imagine a translator sitting on a node that is hardlinked (or firmlinked) from several directories. You can look it up from different directories, and the ".." node should be different for each -- passing it once on startup wouldn't work here.

> > As the authentication is always specific to the client, there is no
> > point in the translator holding anything but an unauthenticated port
> > for "..".
>
> Sorry for the offtopic, but could be please explain what do you mean
> by authentication here? (I would just like to clear out some issues in
> my understanding of Hurd concepts)

I wish someone would clear up *my* understanding of authentication a bit... ;-) Anyways, let's try.

File permissions are generally checked by the filesystem servers. The permission bits and user/group fields in the inode determine which user gets what kind of access permissions on the file. To enforce this, the filesystem server must be able to associate the client's UID and GID with the UIDs and GIDs stored in the inode.

So, how does the filesystem server know that a certain UID capability presented by the user corresponds to, say, UID 1003? This is done through the authentication mechanism. I'm not sure about the details. AIUI, on login, the password server asks the auth server to create an authentication token with all UIDs and GIDs the user possesses. (Well, one UID actually :-) ) This authentication token is (normally) inherited by all processes the user starts.

Now one of the user processes contacts the filesystem server, and wants to access a file. The filesystem server must be told by the auth server what UIDs/GIDs this process has. This is what happens during the reauthentication: A bit simplified, the client process presents its authentication token to the auth server, and tells it to inform the filesystem server which UIDs/GIDs it conveys. From now on, the filesystem server knows which UIDs/GIDs correspond to the port (protid) the client holds. (The actual reauthentication process is a bit tricky, because it is a three-way handshake...)

This process needs to be done for each filesystem server the client contacts. This is why a reauthentication needs to be done after crossing a translator boundary -- or at least that is my understanding of it. The retry port returned to the client when a translator is encountered during lookup is obtained by the server containing the node on which the translator sits, and obviously can't have the client's authentication; the client has to authenticate to the new server itself.
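(To illustrate the three-way handshake, here is a rough sketch of the client's side, modelled on what glibc's lookup code does internally on FS_RETRY_REAUTH. The RPC names io_reauthenticate() and auth_user_authenticate() are real, but the signatures and header names are written down from memory and simplified -- double-check against io.defs/auth.defs before taking them at face value.)

/* Simplified sketch of client-side reauthentication after crossing a
   translator boundary.  Signatures from memory; no claim of being the
   exact glibc implementation.  */

#include <hurd.h>
#include <hurd/io.h>    /* io_reauthenticate() user stub */
#include <hurd/auth.h>  /* auth_user_authenticate() user stub */
#include <mach.h>

static error_t
reauthenticate_port (mach_port_t unauth, mach_port_t *authenticated)
{
  error_t err;
  /* A rendezvous port lets the auth server match up the client's and
     the filesystem server's side of the handshake.  */
  mach_port_t rendezvous = mach_reply_port ();

  /* 1. Ask the filesystem server (holder of the unauthenticated port)
        to contact the auth server, identified via the rendezvous port.  */
  err = io_reauthenticate (unauth, rendezvous, MACH_MSG_TYPE_MAKE_SEND);

  if (!err)
    /* 2. Present our own auth handle to the auth server; the auth
          server tells the filesystem server which UIDs/GIDs we carry,
          and we get back a new, authenticated port to the same object.
          (getauth () returns a fresh reference; deallocation omitted.)  */
    err = auth_user_authenticate (getauth (), rendezvous,
                                  MACH_MSG_TYPE_MAKE_SEND, authenticated);

  mach_port_destroy (mach_task_self (), rendezvous);
  return err;
}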
> > > Well, setting translators in a simple loop is a bit faster, since,
> > > for example, you don't have to consider the possibility of an
> > > escaped ``,,'' every time
> >
> > Totally neglectible...
>
> Of course :-) I've got a strange habit of trying to reduce the number
> of string operations...

I also suffer from this unhealthy tendency to think too much about pointless micro-optimizations... The remedy is to put things in perspective: In an operation that involves various RPCs, a process startup etc., some string operations taking some dozens or hundreds of clock cycles won't even show up in the profile...

Code size optimizations are a different thing of course: Not only are ten bytes saved usually much more relevant than ten clock cycles saved (except in inner loops of course); but also it tends to improve code quality -- if you manage to write something with fewer operations overall, fewer special cases, less redundancy etc., it becomes much more readable, considerably less error-prone, much easier to modify, and altogether more elegant...

> The main reason that makes me feel uneasy is the fact that retries
> actually involve lookups. I cannot really figure out for now what
> should be looked up if I want to add a new translator in the dynamic
> translator stack...

The not yet processed rest of the file name -- just like on any other retry... These can be additional suffixes, or also additional file name components if dealing with directories.

When looking up "foo,,x,,y/bar,,z" for example, the first lookup will process "foo,,x" and return ",,y/bar,,z" as the retry name; the second will process ",,y" and return "bar,,z"; the third will process "bar,,z" and return ""; and the last one will finish by looking up "" (i.e. only get a new port to the same node, after reauthenticating).

I'm not entirely sure, but I think the retry is actually unavoidable for correct operation, as reauthentication should be done with each new translator...

BTW, I just realized there is one area we haven't considered at all so far: Who should be able to start dynamic translators, and as which user should they run?...

> > That's the definition of dynamic translators: They are visible only
> > for their own clients, but not from the underlying node. (No matter
> > whether this underlying node is served by another dynamic
> > translator.)
>
> That seems clear. What makes we wonder, however, is how a filter will
> traverse a dynamic translator stack, if it will not be able to move
> from a dynamic translator to the one which is next in the dynamic
> translator stack?

Well, note that the filter is started on top of the stack. It is a client of the shadow node created when starting it, which is a client of the topmost dynamic translator in the stack, which is a client of its shadow node, which is a client of the second-to-top dynamic translator... So fundamentally, there is no reason why it wouldn't be able to traverse the stack -- being an (indirect) client of all the shadow nodes in the stack, it can get access to all the necessary information. The question is how it obtains that information...

And you are right of course: I said in the past that the translator sitting on a shadow node doesn't see itself, because it has to see the other translators instead (so the filter can work). This is obviously a contradiction.

This is a bit tricky, and I had to think about it for a while. I think the answer is that a dynamic translator should see both the underlying translator stack, *and* itself on top of it.
For this, the shadow node needs to proxy all translator stack traversal requests (forwarding them to the underlying stack), until a request arrives for the node which the shadow node mirrors, in which case it returns the dynamic translator sitting on the shadow node.

Let's look at an example. Say we have a node "file" with translators "a", "b" and "c" stacked on it. The effective resulting node translated through all three is "file'''". When we access it through nsmux, we get a proxy of this node. Now we use a filter to skip "c", so we get a proxy node of "file''" -- the node translated through "a" and "b" only. Having that, we set a dynamic translator "x" on top of it. nsmux creates a shadow node mirroring "file''", and sets the "x" translator on this shadow node. Finally, it obtains the root node of "x" -- this is "file'',,x" -- and returns an (ordinary non-shadow) proxy node of that to the client. (To be clear: the "'" are not actually part of any file name; I'm just using them to distinguish the node translated by static translators from the untranslated node.)

And now, we use another filter on the result. This is the interesting part.

First another shadow node is set up, mirroring "file'',,x"; and the filter attached to it. (So temporarily we get something like "file'',,x,,f".) Now the filter begins its work: It starts by getting the untranslated version of the underlying node. The request doing this (fsys_startup() IIRC?) is sent to the underlying node, i.e. to the second shadow node, mirroring (the proxy of) "file'',,x". This "x" shadow node forwards it to the node it mirrors: which is the aforementioned (non-shadow) proxy node of "file'',,x".

nsmux may be able to derive the completely untranslated "file" node from that, but this would be rushing it: It would put the first, "file''" shadow node out of the loop. (Remember that conceptually, we treat the shadow nodes as if they were provided by distinct translators...) So what it does instead is obtaining the first shadow node (mirroring the proxy of "file''"): This one is the underlying node of the "x" translator, and is considered the untranslated version of the node provided by "x". It asks this shadow node for the untranslated version of the node it shadows...

So the "file''" shadow node in turn forwards the request to the proxy of "file''". This node is handled by nsmux directly, without any further shadow nodes; so nsmux can directly get to the untranslated "file" node, and return a proxy of that. This new proxy node is returned to the first shadow node. The (conceptual) shadow translator now creates another shadow node: this one shadowing (the proxy of) the untranslated "file". (Oh no, even more of these damn shadow nodes!...) A port for this new shadow node is then returned to the requestor.

The requestor in this case was the "file'',,x" proxy node, which also sets up a new proxy node, and passes the result (the proxy of the shadow of the proxy of untranslated "file"...) on to the second (conceptual) shadow translator, mirroring "file'',,x". This one also creates a new shadow node, so we get a shadow of a proxy of a shadow of the "file" proxy node... And this is what the filter sees.

In the next step, the filter invokes file_get_translator_cntl() on this shadow-proxy-shadow-proxy-"file".
The request is passed through the second new shadow node (let's call it the "x" shadow, as it is derived from the original "file'',,x" shadow node), and the new "x" proxy node (derived from the "file'',,x" proxy), and the first new shadow node (the "file" shadow), and through the primary "file" proxy node finally reaches the actual "file". There it gets the control port of "a". A proxy of this control port is created, and passed to the "file" shadow translator, which creates a shadow control... (ARGH!) This is in turn passed to the intermediate "x" proxy, which creates another proxy control port -- only to return that to the "x" shadow translator, which shadows the whole thing again. A shadow-proxy-shadow-proxy-control port for the "a" translator. Lovely.

Now the filter launches fsys_getroot() on this, again passing through shadow and proxy and shadow and proxy, finally looking up the root node of "a" (i.e. "file'"), which is passed up again -- we get shadow-proxy-shadow-proxy-"file'". Again file_get_translator_cntl() resulting in shadow-proxy-shadow-proxy-control of "b", and again fsys_getroot() yielding shadow-proxy-shadow-proxy-"file''".

Then yet another file_get_translator_cntl(), giving us... Can you guess it? Wait, not so hasty :-) This time actually something new happens, something exciting, something wonderful: The request is passed down by the "x" shadow and the "x" proxy as usual, but by the time it reaches the "file''" shadow (the one to which "x" is attached), the routine is broken: The shadow node realizes that the request is not for some faceless node further down, but indeed for the very "file''" node it is shadowing! So instead of passing the request down (which would get the control port of "c"), the shadow node handles the request itself, returning the control port of the translator attached to it: i.e. the control port of "x". This is again proxied by the "x" proxy, and shadowed by the "x" shadow translator.

The following fsys_getroot() is again passed down by the "x" shadow and the "x" proxy, and handled by the "file''" shadow: It returns the root node of "x". The "x" proxy creates another proxy node; the "x" shadow translator creates another shadow node.

The filter invokes file_get_translator_cntl() yet another time, this time handled by the top-most (now only active) shadow node directly (the "x" shadow, to which the filter is attached), returning the control port of the filter itself. We are there at last.

Doesn't all this shadow-proxy-shadow-proxy-fun make your head spin, in at least two directions at the same time? If it does, I might say that I made a point... ;-) While I believe that this approach would work, I'm sure you will agree that it's horribly complicated and confusing. If the shadow nodes were really implemented by different translators, it would be terribly inefficient too, with all this proxying...

And it is actually not entirely correct: Let's say we use a filter to skip "b" (and the rest of the stack on top of it). The filter would traverse the stack through all this shadow machinery, and would end up with a port to the root node of "a" that is still proxied by the various shadow nodes. The filter should return a "clean" node, without any shadowing on top of it; but the shadow nodes don't know when the filter is done and passes the result to the client, so they can't put themselves out of the loop...
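(Just to condense the dispatch rule this walkthrough relies on into code: below is a purely hypothetical sketch. None of the types or helpers exist in nsmux; the names are made up for illustration only.)

/* Hypothetical sketch of the shadow node's dispatch rule described
   above: forward traversal requests down the stack, until a request is
   about the very node we shadow -- then answer it ourselves with the
   dynamic translator attached to us.  */

#include <mach.h>
#include <hurd/hurd_types.h>
#include <errno.h>

struct shadow_node
{
  mach_port_t mirrored_node;   /* the (proxy) node this shadow mirrors */
  mach_port_t dyn_translator;  /* control port of the dynamic translator
                                  attached to this shadow node */
};

static error_t
shadow_get_translator_cntl (struct shadow_node *shadow, mach_port_t target,
                            mach_port_t *cntl)
{
  if (target == shadow->mirrored_node)
    {
      /* The request is about the node we shadow: instead of forwarding
         it down (which would yield the next *static* translator, e.g.
         "c"), return the dynamic translator attached to us ("x").  */
      *cntl = shadow->dyn_translator;
      return 0;
    }

  /* Otherwise the request concerns some node further down the stack.
     Real code would forward it to shadow->mirrored_node and wrap the
     returned port in yet another proxy, so that we stay in the loop for
     later requests -- this proxying is exactly what makes the
     bottom-to-top traversal so convoluted.  Placeholder here: */
  (void) cntl;
  return EOPNOTSUPP;
}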
All these problems (both the correctness issue and the complexity) result from the fact that the stack is traversed bottom-to-top: First all underlying nodes need to be traversed, and only then does the shadow node take action -- that's why it needs to proxy the other operations, so it's still in control when its time finally arrives.

The whole thing would be infinitely simpler if we traversed the stack top to bottom: The shadow node would just handle the first request, and as soon as the client asks for the next lower translator in the stack, it would just hand over to that one entirely, not having any reason to interfere anymore.

And this is what I meant above about making a point: It seems that my initial intuition about traversing top to bottom being simpler/more elegant now proves to have been merited...

Note that unlike what I originally suggested, this probably doesn't even strictly need a new RPC implemented by all the translators, to ask for the underlying node directly: As nsmux inserts proxy nodes at each level of the translator stack, nsmux should always be able to provide the underlying node, without the help of the actual translator... (In fact I used this possibility in the scenario described above, when initially wading down through the layers of dynamic translators, until we finally get to the static stack...)

-antrik-

PS. Your mails contain both plain text and HTML versions of the content. This unnecessarily bloats the messages, and is generally not seen favourably on mailing lists. Please change the configuration of your mail client to send plain text only.