Hi,

On Thu, Jan 22, 2009 at 10:54:53AM +0100, Carl Fredrik Hammar wrote:

> On Fri, Jan 16, 2009 at 01:11:09PM +0100, olafbuddenha...@gmx.net
> wrote:
> > On Sat, Jan 10, 2009 at 06:56:15PM +0100, Carl Fredrik Hammar wrote:
> I actually think we agree on what an object is: a bundle of state and
> code with a specific interface, i.e. what you call abstract objects.
> The interface can be RPCs, function calls, direct state manipulation,
> or some other way of using the object.

I'm not sure we are talking about the same thing... By "abstract
object", I mean a bundle of state and code, but not necessarily bound
to a specific interface. It could have multiple interfaces, or a single
internal one that can be mapped to different external interfaces (RPC,
local function call etc.). Or rather, there is a single interface at an
abstract level, but this can be implemented using different transport
mechanisms or containers or whatever we call them.

(You see, we do not even agree on a definition of "interface" ;-) )

> A /remote/ object is an object that can be called remotely. A /local/
> object is one that can be called locally.

I'm not sure remote object vs. local object is a meaningful
distinction. We have the normal Hurd servers, where the objects are
hard-wired to the RPC transport. We have the store framework (and
hopefully a more generic framework in the future) for mobile objects,
which can reside in the server and be accessed through the RPC
transport, or be loaded into the client and accessed through local
function calls. And we discussed the possibility of objects that can
only reside in the client, and could be hard-wired to a local function
call transport.

> They are the best /pair/ of terms I've found so far. `RPC object' is
> more specific than remote, but I haven't been able to find a good
> substitute for /local/, the best I have mustered is /C/ object.

I'm not sure whether this was clear: By "RPC object", I did not mean a
specific subclass of some larger class I happened to call "abstract
objects", but rather *any* abstract object, when accessed through the
RPC transport/container. The *same* abstract object can be accessed as
an RPC object, or as something else...
But to avoid confusion, I guess it's better not to talk about "RPC
objects" at all, but only about abstract objects and
transports/containers.

> A mobile object is one that can be copied from one process to
> another, code and all. Note that both local and remote objects can
> both be mobile or not.

This statement doesn't make sense to me... I haven't really understood
the definition of local/remote objects you proposed. It's not
intuitive.

> An /object system/ is a framework for implementing objects and
> controls how they may be formed. libstore is a trivial object system
> where all objects have the same single interface. Mach's IPC and MiG
> form the object system for remote objects, which allows objects with
> several interfaces.
>
> A /mob/ is an object specifically implemented through my future
> object system. Unless otherwise mentioned, a mob is assumed to be
> mobile, as it is the framework's primary purpose.

This seems a bit confusing to me: The object system encompasses both
the transport(s), and the methods for mapping an abstract interface to
them?

BTW, I just realized that I never considered the interaction between
MiG and abstract interfaces... Stores support only a single abstract
interface, so providing a matching MiG definition file for the RPC
transport is not a problem; but if we have a generic framework that can
handle any abstract interface, would we still create matching MiG
definitions for each one manually?...

> /Transparent/ in this context means that either a local or remote
> object can be used with the same interface (using a wrapper). This is
> to make it possible to fall back on using the object remotely if the
> object can't be transferred.

Well, when a remote object is migrated to run locally, I would actually
still call it the same (abstract) object; only it uses a different
transport/container now...
Transparent in this case means that neither the client nor the object
itself cares whether it's in the client or in a server -- the mobility
framework completely abstracts the differences between the RPC and
local function call transports.

Anyways, things are becoming clearer now :-)

> I'm trying to avoid making assumptions on how interfaces might look.

Well, to be honest, this discussion is in parts a bit too
abstract/vague for my taste... I'd rather talk about more specific
bits.

> I do, however, suspect that transparent interfaces will be optimized
> for the local object case. For an io interface that would probably
> mean a POSIX style, rather than a Hurd style, interface.

Not sure what you mean by POSIX style vs. Hurd style... But admittedly
I do not really have any idea at all how the abstract interfaces of
transparently migrating objects could look.

However, I tend to think they would rather resemble the standard RPC
interfaces. For one, the RPC transport is obviously the more limiting
one, so it must dictate what is possible. Also, for practical reasons
it seems inevitable that the interfaces should not be too different --
having two completely different approaches for creating Hurd objects
would be too big a burden on programmers.

And last but not least, I actually think the RPC case should probably
have higher priority. I guess migration will be rather the exception
than the rule: While not my primary interest in the Hurd, I think the
potentially better robustness resulting from small isolated components
is a nice bonus, which we shouldn't give up readily, except in cases
where it is really necessary for good performance...

> > So I guess by your definition, the use case I'm interested in for
> > translator stacking would actually not classify under object
> > migration, but under other uses...
> > I guess you remember that I don't consider actual RPC emulation
> > particularly useful :-)
>
> I'm guessing it classifies under partially transparent or
> non-transparent object migration.

This classification really depends on what level you are looking at...
At the transport level, it would be totally non-transparent; on the
abstract level, it would be completely transparent.

(It might be worth considering cases where it's only transparent to the
client but not the object, i.e. there is some special handling in the
server implementation. I think we should try to avoid that however, to
keep the implementations as simple as possible.)

Also note that by transparency I mean that the implementation of a
mobile object doesn't need to know whether it runs at the server side
or the client side at any particular moment. I do not mean that a
mobile object can be implemented in exactly the same manner as a
traditional RPC-only object... (While this property would certainly be
desirable, I don't think a mechanism working that way would really buy
us much -- as I explained before.)

> > > The command line mechanism can ignore many of the issues that
> > > arise in mobility, e.g. consistency between different copies.
> >
> > I must admit that I don't see the difference... Please explain.
>
> Take the copy store as an example. The copy store makes a
> copy-on-write copy of another store and discards changes when closed.
> For instance, a copy store over a zero store is useful for backing
> /tmp.
>
> If a copy store were to migrate, then all modifications would also be
> copied. Writes made to the copy would not be reflected in the
> original and vice versa. Because of this, the copy store has the
> enforced flag set, which makes storeio refuse migration requests.
>
> When creating an object instead, there will only be a single copy,
> which circumvents the problem entirely.
So it's really not because the objects are created in the client in the
first place instead of being migrated, but simply because objects
created from the textual store representation are never shared between
clients... We could get the very same situation with actual migration,
by enforcing an "exclusive" property. Not a fundamental difference, but
really just a special case.

Same situation as the file pointer object we discussed below -- as long
as there is only one client, certain things are possible that become
problematic with multiple clients... I'd consider these to be special
cases of object mobility, rather than completely different use cases of
parts of the framework.

> > > I do not have high hopes for this method though, mostly because
> > > it's hard for the recipient to determine if it can trust the
> > > code.
> >
> > Well, in the simple case -- using the traditional UNIX model --
> > it's pretty trivial: The client trusts the code if it trusts the
> > server, which is the case when the server is run by the same user,
> > or by root. In this case, there is no problem at all.
>
> Ah, but -- as per the Hurd's design goals -- we want to reduce the
> trust needed between normal users to take advantage of this feature
> when cooperating. And the client doesn't need to trust the server if
> it acquires the code from a trusted source, e.g. from /lib, /usr/lib,
> or $LD_LIBRARY_PATH, or even statically linked code.

The problem with the Hurd's design goals is that everyone has a
different opinion on what they are... The only reference on that is
"Towards a New Strategy of OS Design" -- but this is mostly a mixture
of design ideas and nice features resulting from them; the goals are
never really stated very explicitly.

There is only one thing that manifests throughout the paper: Giving
users more control over their computing environment. This is the one
fundamental idea behind the Hurd design.
Everything else in there is either a consequence of this fundamental
goal, or just mentioned as another nice thing incidentally resulting
from the design...

Accessing servers that are run by untrusted users IMHO is one of the
things in the latter category. It is mentioned in the paper, but it
doesn't actually work: As Marcus and Neal pointed out, you *can't*
blindly trust a filesystem server. It could do all kinds of nasty
things, like creating infinite amounts of garbage, or just stalling
indefinitely, both resulting in various kinds of DoS. Or it could
provide a malicious link that tricks the user process into doing
something destructive, like deleting or overwriting a precious file.

(This last scenario is known as the "firmlink problem". AFAIK the
standard firmlink implementation doesn't actually expose this problem,
but it shouldn't be hard to create a corrupted variant that does.)

Probably it wouldn't be hard to come up with other exploit scenarios
that allow spying, manipulating files, and even completely taking over
a user's account.

So what does that mean for the Hurd? IMHO not much. (One of my major
qualms with the "Critique" is that it presents this as a major failure
of the Hurd architecture. I don't agree it is.) It has nothing to do
with the fundamental goal of the Hurd design -- and it's not even a
terribly useful feature.

Why would anyone provide a filesystem server for other users of the
machine? If someone wants to provide a service for others, the usual
way of doing that would be implementing a network service. Not only
does that remove the arbitrary limitation to other users of the same
machine, but also network software is generally well aware of the
possibility of misbehaving servers, and usually can handle it more or
less well.

Back on topic: Essentially, it's out of the question ever to use a
server that is run by an untrusted user.
(In theory we could design clients that are immune against misbehaving
file servers, but the ones we have so far aren't.)

Admittedly, executing untrusted code is a more direct threat. Perhaps
it's still worth trying to prevent it in the mobility framework; I'm
not sure.

> > Admittedly, this is more tricky when leaving the UNIX model, and
> > working with pure capabilities... I'm not sure that an object named
> > through a textual file name is indeed more trustworthy than one
> > named through a port directly -- but I haven't really thought about
> > it yet. I'm curious what you have to say on that in the promised
> > later mails :-)
>
> Using a file name, you can figure out who controls the file, and
> decide whether you trust it based on that. (Or at least I think so,
> I'm not sure yet if a malicious file system can't fool you.)
>
> This might not be impossible with ports, but I imagine it's trickier.

The problem with file names is that they aren't very reliable. For one,
they only work if client and server are in the same name space. (You
even mentioned chroot yourself...) Also, file names aren't stable over
time: The meaning of the name could change between the time the server
passes the name, and the time the client opens it. Symlink attacks, for
example, exploit the unreliability of file names. FDs are less
problematic in general -- they are much closer to pure capabilities.

In the end, I think the only thing we can do with file names is resolve
them, and then do exactly the same checks we do on a directly passed
FD: Checking that the node is owned and writable only by trusted users,
and that it resides on a file system that can be trusted regarding this
information. I can't see how we could derive any additional trust from
the file name itself. It seems only to open additional potential for
failure.
> > In the UNIX case, it is actually quite symmetrical: The client
> > trusts the object code provided by the server, if the server is the
> > same user or root. The server entrusts the client with the content
> > of the object, if the client is the same user or root.
>
> I'm hoping to make it so that the server doesn't need to trust that
> the client doesn't misuse the content of the object. This by
> verifying that the client already has the authority needed to hold
> it, and would thus already be able to acquire the content through
> other means.

You are right: I oversimplified it. Indeed it's not necessary that the
client runs as the same user -- it suffices that it's a user that has
access to all the capabilities the object requires. A simple mechanism
for checking that is called for.

(Note that the mechanism doesn't actually need to be secure: If the
client turns out not to have the necessary capabilities after all, it
will simply fail, harming only itself... Unless of course the object
has some temporary state that must be migrated, and contains classified
information -- but temporary state in translators is problematic in
general, and should be avoided as far as possible.)

> Also note that checking that it's the same user is not enough, a
> process can have its authority limited by chroots and sub-Hurds.

Interesting point. Does chroot normally prevent communication between
processes inside the chroot and outside the chroot having the same
UID?...

> > We could still move the handling to the client in the more common
> > case that there is only one client -- but that wouldn't solve the
> > resource management problem, as there are still the cases where it
> > must remain in the server.
>
> It doesn't need to be in *the* server, though someone must act as a
> server for the file cursor object. This could be the original client,
> the new client, the server, or a third-party server in the system/per
> user/per login/whatever.
>
> My thoughts mostly revolve around clients pushing the cursor to a
> third-party server and reloading if it becomes the sole client again.

There is a somewhat similar situation with pipes: As long as there is
only one reader and one writer, there is really no need for a server
process -- the users could just communicate directly. When there are
more readers and/or writers, an explicit server again becomes
necessary.

Note however that both in the FD case and the pipe case, the object
migration is only an optimisation. The actual problem with the resource
management is making sure that the (possibly shared) client state is
accounted to the clients, not to the server.

Depending on how the resource accounting framework works, your
suggestion to keep the file pointer in an extra server could indeed
help with that. But this doesn't require any object mobility framework.
Introducing an explicit FD server is something that can be trivially
done. All the object mobility framework does here is move the object to
the client in the single-client case. This is again just a specific use
of the standard mobility mechanism BTW, not a different use of some
components of the framework :-)

As I already said, I don't discourage conceptually considering the
various ingredients of the mobility framework as independent
components. But as long as we don't actually have other users, you
shouldn't try to make them any more generic than is strictly required
for the standard mobility mechanism -- anything else would just be
overengineering.

> > I like the translator concept, because it allows intuitively naming
> > objects through filesystem locations; and the objects are
> > standalone, i.e. can be accessed directly from the command line,
> > typically through a filesystem interface.
>
> I'm not sure what you mean by an object being standalone...
Well, saying that they can be accessed from the command line is not
exactly a precise definition, but I thought it would be sufficient to
show what I mean... Standalone means that it is usable on its own. It
doesn't require any external framework to use it; it doesn't need to be
loaded in a special way or anything like that.

> > An obvious use case are ioctl handlers: I believed for a long time
> > that rather than being hardcoded in libc, they should be handled by
> > some kind of loadable modules. This was actually discussed as part
> > of the channel concept, but I discarded it back then, as it doesn't
> > fulfill the transparency requirement, and thus didn't seem useful
> > to me back then.
>
> I never did look into how ioctls are handled, so I can't tell offhand
> whether this is a good idea.

This is a bit surprising, considering that you explicitly mentioned
that in your original libchannel design...

Anyways, it's pretty simple really. Every ioctl is mapped to a distinct
RPC. For simple ioctls, the mapping is systematic, and is done
automatically by some crazy preprocessor magic. For more complex ones
this isn't possible, however. libc has explicit stubs for these,
transforming the parameters as necessary (dereferencing pointers etc.),
and then invoking the actual RPC. These stubs are individual for every
ioctl: to support a new type of device, new stubs need to be added to
libc -- which is obviously painful. It would be nice to have a
mechanism that loads the stubs dynamically from some external source.

> > Now I see that it might be still useful to implement this using a
> > common mobility framework, so they can be handled like something
> > akin to translators -- providing objects that are not really
> > standalone, but are named through filesystem locations.
>
> They should be implementable as mobs. However, as they are more
> specialized I don't think they need more than a single interface, so
> they might want to use a separate object system.
Well, is it better to use a generic framework that does more than is
strictly necessary in this case, or to create a specific framework for
this particular use case? This is a very hard question.

A too-generic framework is problematic, because in addition to
understanding the framework itself, you need to decide how to make good
use of it for a particular use case; and you have to implement a lot of
code on top of the framework to achieve the desired properties. Too
specific frameworks are problematic, because you have to learn a lot of
specifics for each use case; and for new use cases, you either need to
create new frameworks, or use some existing ones that aren't really
suitable for the purpose.

Finding a good middle ground in any particular area is one of the main
challenges of good design...

-antrik-