http://lwn.net/Articles/305740/Linux and object storage devicesThe developers of OSDs were driven by the idea that traditional, block-based disk drives offer an overly low-level interface. With contemporary hardware, it should be possible to push more intelligence into storage devices, offloading work from the host while maintaining (or improving) performance and security. So the interface offered by an OSD does not deal in blocks; instead, the OSD provides "objects" to the host system. Most objects will simply be files, but a few other types of objects (partitions, for example) are supported as well. The host manipulates these objects, but need not (and cannot) concern itself with how those objects are implemented within the device. A file object is identified by two 64-bit numbers. It contains whatever data the creator chooses to put in there; an OSD does not interpret the data in any way. Files also have a collection of attributes and metadata; this includes much of the information stored in an on-disk inode in a traditional filesystem - but without the block layout information, which the OSD hides from the rest of the world. All of the usual operations can be performed on files - reading, writing, appending, truncating, etc. - but, again, the implementation of those operations is handled by the OSD. One thing that is not handled by the OSD, though, is the creation of a directory hierarchy or the naming of files. It is expected that the host filesystem will use file objects to store its directory structure, providing a suitable interface to the filesystem's users. One could, presumably, also use an OSD as a sort of hardware-implemented object database without a whole lot of high-level code, but that is not where the focus of work with OSDs is now. [PULL QUOTE: The OSD designers decided to offload another task from the host systems: security. END QUOTE] The OSD protocol [PDF] is a T10-sanctioned extension to the SCSI protocol. It is thus expected that OSD devices will be directly attached to host systems; the protocol has been designed to perform well in that mode. It is also expected, though, that OSDs will be used in network-attached storage environments. For such deployments, the OSD designers decided to offload another task from the host systems: security. To that end, the OSD protocol includes an extensive set of security-related commands. Every operation on an object must be accompanied by a "capability," a cryptographically-signed ticket which names the object and the access rights possessed by the owner of the capability. In the absence of a suitable capability, the drive will deny access. It is expected that capabilities will be handed out by a security policy daemon running somewhere on the network. That daemon may be in possession of the drive's root key, which allows unrestricted access to the drive, or it may have a separate, partition-level key instead. Either way, it can use that key to sign capabilities given out to processes elsewhere in the system. (Drives also have a "master" key, used primarily to change the root key. Loss of the master key is probably a restore-from-backup sort of event.) Capabilities last for a while (they include an expiration time) and describe all of the allowed operations. So the act of actually obtaining a capability should be relatively rare; most OSD operations will be performed using a capability which the system already has in hand. That is an important design feature; adding "ask a daemon for a capability" to the filesystem I/O path would not be a performance-enhancing move. In theory, it should be relatively easy to make a standard Linux filesystem support an OSD. It's mostly a matter of hacking out much of the low-level block layout and inode management code, replacing it with the appropriate object operations. The osdfs filesystem was created in this way; the developers started with ext2. After taking out all the code they no longer needed, the osdfs developers simply added code translating VFS-level requests into operations understood by the OSD. Those requests are then executed by way of the low-level osd-initiator code (which was also recently submitted for consideration). Directories are implemented as simple files containing names and associated object IDs. There is no separate on-disk inode; all of that information is stored as attributes to the file itself. The end result is that the osdfs code is relatively small; it is mostly concerned with remapping VFS operations into OSD operations. Anybody wanting to test this code may run into one small problem: there are few OSDs to be found in the neighborhood computer store. It would appear that most of the development work so far has been done using OSD simulators. The OSC software OSD is, like osdfs, part of the open-osd project; it implements the OSD protocol over an SQLite database. There is also an OSD simulator hosted at IBM, but it would not appear to be under current development. Simulator-based development and testing may not be as rewarding as having a shiny new device implementing OSD in hardware, but it will help to insure that both the software and the protocol are in good shape by the time such hardware is available. It should be noted that the success of OSDs is not entirely assured. An OSD takes much of the work normally done in an operating system kernel and shoves it into a hardware firmware blob where it cannot be inspected or fixed. A poor implementation will, at best, not perform well; at worst, the chances of losing data could increase considerably. It may yet prove best to insist that storage devices just concentrate on placing bits where the operating system tells them to and leave the higher-level decisions to higher-level code. Or it may turn out that OSDs are the next step forward in smarter, more capable hardware. Either way, it is an interesting experiment. See this article at Sun for more information on how OSD works. |
- [linuxkernelnewbies] Linux and object storage devices [LWN.net] Peter Teoh
- [linuxkernelnewbies] Linux and object storage devices [LWN... Peter Teoh
- [linuxkernelnewbies] Linux and object storage devices [LWN... Peter Teoh
