Applied, thanks! jbra...@dismail.de, on Sun. 26 May 2024 21:09:23 -0400, wrote: > * hurd/bootstrap.mdwn: I inlined the what_is_an_os_bootstrap page, and > wrote that the current bootstrap page is out of date and does not > include pci-arbiter or rumpdisk. > * hurd/what_is_an_os_bootstrap.mdwn: a new web page that is not meant > to be viewed directly. Instead hurd/bootstrap and > open_issues/serverbootv2 are meant to inline the content. > * open_issues/serverbootv2.mdwn: Sergey proposed this new bootstrap > for the Hurd. This is a draft RFC document that explains the > reasoning behind it. Note that "Serverboot V2" is a working name. We > have yet to find a better name for it. > --- > hurd/bootstrap.mdwn | 7 + > hurd/what_is_an_os_bootstrap.mdwn | 24 + > open_issues/serverbootv2.mdwn | 899 ++++++++++++++++++++++++++++++ > 3 files changed, 930 insertions(+) > create mode 100644 hurd/what_is_an_os_bootstrap.mdwn > create mode 100644 open_issues/serverbootv2.mdwn > > diff --git a/hurd/bootstrap.mdwn b/hurd/bootstrap.mdwn > index fbce3bc1..c77682b9 100644 > --- a/hurd/bootstrap.mdwn > +++ b/hurd/bootstrap.mdwn > @@ -15,8 +15,15 @@ this text. --> > > [[!toc]] > > +[[!inline pagenames=hurd/what_is_an_os_bootstrap raw=yes feeds=no]] > + > # State at the beginning of the bootstrap > > +Please note that as of May 2024 this document is out of date. It does > +not explain how rumpdisk or the pci-arbiter is started. Also consider > +reading about [[Serverboot V2|open_issues/serverbootv2]], which > +is a new bootstrap proposal. > + > After initializing itself, GNU Mach sets up tasks for the various bootstrap > translators (which were loader by the GRUB bootloader).
It notably makes > variables replacement on their command lines and boot script function calls > (see > diff --git a/hurd/what_is_an_os_bootstrap.mdwn > b/hurd/what_is_an_os_bootstrap.mdwn > new file mode 100644 > index 00000000..b2db2554 > --- /dev/null > +++ b/hurd/what_is_an_os_bootstrap.mdwn > @@ -0,0 +1,24 @@ > +[[!meta copyright="Copyright © 2020 Free Software Foundation, Inc."]] > + > +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable > +id="license" text="Permission is granted to copy, distribute and/or modify > this > +document under the terms of the GNU Free Documentation License, Version 1.2 > or > +any later version published by the Free Software Foundation; with no > Invariant > +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the > license > +is included in the section entitled [[GNU Free Documentation > +License|/fdl]]."]]"""]] > + > +[[!meta title="What is an OS bootstrap"]] > + > +# What is an OS bootstrap? > + > +An operating system's bootstrap is the process that happens shortly > +after you press the power on button, as shown below: > + > +Power-on -> Bios -> Bootloader -> **OS Bootstrap** -> service manager > + > +Note that in this context the OS bootstrap is not [building a > +distribution and packages from source > +code](https://guix.gnu.org/manual/en/html_node/Bootstrapping.html). > +The OS bootstrap has nothing to do with [reproducible > +builds](https://reproducible-builds.org/). 
> diff --git a/open_issues/serverbootv2.mdwn b/open_issues/serverbootv2.mdwn > new file mode 100644 > index 00000000..9702183e > --- /dev/null > +++ b/open_issues/serverbootv2.mdwn > @@ -0,0 +1,899 @@ > +[[!meta copyright="Copyright © 2024 Free Software > +Foundation, Inc."]] > + > +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable > +id="license" text="Permission is granted to copy, distribute and/or modify > this > +document under the terms of the GNU Free Documentation License, Version 1.2 > or > +any later version published by the Free Software Foundation; with no > Invariant > +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the > license > +is included in the section entitled [[GNU Free Documentation > +License|/fdl]]."]]"""]] > + > + > +# ServerBootV2 RFC Draft > + > +[[!inline pagenames=hurd/what_is_an_os_bootstrap raw=yes feeds=no]] > + > +The Hurd's current bootstrap, [[Quiet-Boot|hurd/bootstrap]] (a biased > +and made-up name), is fragile, hard to debug, and complicated: > + > +* `Quiet-boot` chokes on misspelled or missing boot arguments. When > + this happens, the Hurd bootstrap will likely hang and display > + nothing. This is tricky to debug. > +* `Quiet-Boot` is hard to change. For instance, when the Hurd > + developers added `acpi`, the `pci-arbiter`, and `rumpdisk`, they > + struggled to get `Quiet-Boot` working again. > +* `Quiet-Boot` forces each bootstrap task to include special bootstrap > + logic to work. This limits what is possible during the > + bootstrap. For instance, it should be trivial for the Hurd to > + support netboot, but `Quiet-Boot` makes it hard to add `nfs`, > + `pfinet`, and `isofs` to the bootstrap. > +* `Quiet-Boot` hurts other Hurd distributions too. When Guix > + developers updated their packaged version of the Hurd, that included > + support for SATA drives, a simple misspelled boot argument halted > + their progress for a few weeks. 
> + > +The alternative `ServerBoot V2` proposal (which was discussed on > +[irc](https://logs.guix.gnu.org/hurd/2023-07-18.log) and is similar to > +the previously discussed [bootshell > +proposal](https://mail-archive.com/bug-hurd@gnu.org/msg26341.html)) > +aims to code all or most of the bootstrap specific logic into one > +single task (`/hurd/serverboot`). `Serverboot V2` has a number > +of enticing advantages: > + > +* It simplifies the hierarchical dependency of translators during > + bootstrap. Developers should be able to re-order and add new > + bootstrap translators with minimal work. > +* It gives early bootstrap translators like `auth` and `ext2fs` > + standard input and output which lets them display boot errors. It > + also lets signals work. > +* One can trivially use most Hurd translators during the > + bootstrap. You just have to link them statically. > +* `libmachdev` could be simplified to only expose hardware to > + userspace; it might even be possible to remove it entirely. Also > + the `pci-arbiter`, `acpi`, and `rumpdisk` could be simplified. > +* Developers could remove any bootstrap logic from `libdiskfs`, which > + detects the bootstrap filesystem, starts the `exec` server, and > + spawns `/hurd/startup`. Instead,`libdiskfs` would only focus on > + providing filesystem support. > +* If an error happens during early boot, the user could be dropped > + into a REPL or mini-console, where he can try to debug the issue. > + We might call this `Bootshell V2`, in reference to the original > + proposal. This could be written in lisp. Imagine having an > + extremely powerful programming language available during bootstrap > + that is only [436 bytes!](https://justine.lol/sectorlisp2) > +* It would simplify the code for subhurds by removing the logic from > + each task that deals with the OS bootstrap. > + > +Now that you know why we should use `Serverboot V2`, let's get more > +detailed. What is `Serverboot V2` ? 
> + > +`Serverboot V2` would be an empty filesystem dynamically populated > +during bootstrap. It would use a `netfs`-like filesystem that will > +populate as various bootstrap tasks are started. For example, > +`/servers/socket/2` will be created once `pfinet` starts. It also > +temporarily pretends to be the Hurd process server, `exec`, and `/` > +filesystem while providing signals and `stdio`. Let's explain how > +`Serverboot V2` will bootstrap the Hurd. > + > +**FIXME The rest of this needs work.** > + > +Any bootstrap that the Hurd uses will probably be a little odd, > +because there is an awkward and circular startup-dance between > +`exec`, `ext2fs`, `startup`, `proc`, `auth`, the `pci-arbiter`, > +`rumpdisk`, and `acpi` in which each translator oddly depends on the > +others during the bootstrap, as this ascii art shows. > + > + > + pci-arbiter > + | > + acpi > + | > + rumpdisk > + | > + ext2fs -- storeio > + / \ > + exec startup > + / \ > + auth proc > + > + > +This means that there is no *perfect* Hurd bootstrap design. Some > +designs are better in some ways and worse in others. `Serverboot V2` > +would simplify other early bootstrap tasks, but all that complicated > +logic would be in one binary. One valid criticism of `Serverboot V2` > +is that it may be a hassle to develop and maintain. In any case, > +trying to code the *best* Hurd bootstrap may be a waste of time. In > +fact, the Hurd bootstrap has been rewritten several times already. > +Our fearless leader, Samuel, feels that rewriting the Hurd bootstrap > +every few years may be a waste of time. Now that you understand why > +Samuel discourages a Hurd bootstrap rewrite, let's consider why we > +should develop `Serverboot V2`. > + > +# How ServerBoot V2 will work > + > +Bootstrap begins when Grub and GNU Mach start some tasks, and then GNU > +Mach resumes the not-yet-written > +`/hurd/serverboot`.
`/hurd/serverboot` is the only task to accept > +special ports from the kernel via command line arguments like > +`--kernel-task`; `/hurd/serverboot` tries to implement/emulate as much > +of the normal Hurd environment for the other bootstrap translators. > +In particular, it provides the other translators with `stdio`, which > +lets them read/write without having to open the Mach console device. > +This means that the various translators will be able to complain about > +their bad arguments or other startup errors, which they cannot > +currently do. > + > +`/hurd/serverboot` will provide a basic filesystem with netfs, which > +gives the other translators a working `/` directory and `cwd` > +ports. For example, `/hurd/serverboot`, would store its port at > +`/dev/netdde`. When `/hurd/netdde` starts, it will reply to its > +parent with `fsys_startup ()` as normal. > + > +`/hurd/serverboot` will also emulate the native Hurd process server to > +early bootstrap tasks. This will allow early bootstrap tasks to get > +the privileged (device master and kernel task) ports via the normal > +glibc function `get_privileged_ports (&host_priv, &device_master).` > +Other tasks will register their message ports with the emulated > +process server. This will allow signals and messaging during the > +bootstrap. We can even use the existing mechanisms in glibc to set and > +get init ports. For example, when we start the `auth` server, we will > +give every task started thus far, their new authentication port via > +glibc's `msg_set_init_port ()`. When we start the real proc server, > +we query it for proc ports for each of the tasks, and set them the > +same way. This lets us migrate from the emulated proc server to the > +real one. > + > +**Fix me: Where does storeio (storeio with** > +`device:@/dev/rumpdisk:wd0`**), rumpdisk, and the pci-arbiter come > +in?** > + > +Next, we start `ext2fs`. 
We reattach all the running translators from > +our `netfs` bootstrap filesystem onto the new root. We then send > +those translators their new root and cwd ports. This should happen > +transparently to the translators themselves! > + > +# Supporting Netboot > + > +`Serverboot V2` could trivially support netboot by adding `netdde`, > +`pfinet` (or `lwip`), and `isofs` as bootstrap tasks. The bootstrap > +task will start the `pci-arbiter` and `acpi` (FIXME add some more > +detail to this sentence). The bootstrap task starts `netdde`, which > +will look up any `eth` devices (using the device master port, which it > +queries via the fake process server interface), and sends its fsys > +control port to the bootstrap task in the regular `fsys_startup > +()`. The bootstrap task sets the fsys control port as the translator > +on the `/dev/netdde` node in its `netfs` bootstrap fs. Then > +`/hurd/serverboot` resumes `pfinet`, which looks up > +`/dev/netdde`. Then `pfinet` returns its `fsys` control port to the > +bootstrap task, which it sets on `/servers/socket/2`. Then bootstrap > +resumes `nfs`, and `nfs` just creates a socket using the regular glibc > +`socket ()` call, and that looks up `/servers/socket/2`, and it just > +works. **FIXME where does isofs fit in here?** > + > +Then `nfs` gives its `fsys` control port to `/hurd/serverboot`, which > +knows it's the real root filesystem, so it takes netdde's and > +pfinet's fsys control ports. Then it calls `file_set_translator ()` > +on the nfs on the same paths, so now `/dev/netdde` and > +`/servers/socket/2` exist and are accessible both on our bootstrap fs, > +and on the new root fs. The bootstrap can then take the root fs to > +broadcast a root and cwd port to all other tasks via > +`msg_set_init_port ()`. Now every task is running on the real root fs, > +and our little bootstrap fs is no longer used.
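The hand-over described above (servers register on the bootstrap fs, then every node is re-attached to the real root) can be sketched as a toy model. This is only an illustration, not Hurd code: a "port" is a plain integer here, and `toy_fs`, `toy_set_translator`, `toy_lookup`, and `toy_migrate` are hypothetical names that stand in for the bootstrap `netfs`, `file_set_translator ()`, a path lookup, and the migration step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: a "port" is just a number, not a Mach port.  */
typedef unsigned int toy_port_t;
#define TOY_PORT_NULL 0u
#define TOY_MAX_NODES 16

/* One node of the toy filesystem: a path and the control port of the
   translator sitting on it.  The path string is borrowed, not copied.  */
struct toy_node
{
  const char *path;
  toy_port_t control;
};

struct toy_fs
{
  struct toy_node nodes[TOY_MAX_NODES];
  size_t count;
};

/* Record (or replace) the translator on PATH, as the bootstrap task
   would after a server hands over its control port.
   Returns 0 on success, -1 if the table is full.  */
static int
toy_set_translator (struct toy_fs *fs, const char *path, toy_port_t control)
{
  assert (fs != NULL && path != NULL);
  for (size_t i = 0; i < fs->count; i++)
    if (strcmp (fs->nodes[i].path, path) == 0)
      {
        fs->nodes[i].control = control;
        return 0;
      }
  if (fs->count == TOY_MAX_NODES)
    return -1;
  fs->nodes[fs->count].path = path;
  fs->nodes[fs->count].control = control;
  fs->count++;
  return 0;
}

/* Look up PATH; returns TOY_PORT_NULL when nothing is attached.  */
static toy_port_t
toy_lookup (const struct toy_fs *fs, const char *path)
{
  for (size_t i = 0; i < fs->count; i++)
    if (strcmp (fs->nodes[i].path, path) == 0)
      return fs->nodes[i].control;
  return TOY_PORT_NULL;
}

/* Re-attach every translator from the bootstrap fs onto the real root
   under the same path.  Returns the number of nodes moved, or -1.  */
static int
toy_migrate (const struct toy_fs *bootfs, struct toy_fs *rootfs)
{
  int moved = 0;
  for (size_t i = 0; i < bootfs->count; i++)
    {
      if (toy_set_translator (rootfs, bootfs->nodes[i].path,
                              bootfs->nodes[i].control) != 0)
        return -1;
      moved++;
    }
  return moved;
}
```

In this model, registering `/dev/netdde` and `/servers/socket/2` on the bootstrap table and then calling `toy_migrate` makes both paths resolve on the new root as well. The real design would additionally have to broadcast the new root and cwd ports to every task, which the sketch leaves out.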
> + > +`/hurd/serverboot` can resume the exec server (which is the first > +dynamically-linked task) with the real root fs. Then we just > +`file_set_translator ()` on the exec server to `/servers/exec`, so > +that `nfs` doesn't have to care about this. The bootstrap can now > +spawn tasks, instead of resuming ones loaded by Mach and grub, so it > +next spawns the `auth` and `proc` servers and gives everyone their > +`auth` and `proc` ports. By that point, we have enough of a Unix > +environment to call `fork()` and `exec()`. Then the bootstrap task > +would do the things that `/hurd/startup` used to do, and finally > +spawn (or exec) `init / PID 1`. > + > +With this scheme you will be able to start your root fs with ext2fs > +as `/hurd/ext2fs.static /dev/wd0s1`. This eliminates boot > +arguments like `--magit-port` and `--next-task`. > + > +This also simplifies `libmachdev`, which exposes devices to userspace > +via some Mach `device_*` RPC calls, which lets the Hurd contain device > +drivers instead of GNU Mach. Everything that connects to hardware can > +be a `machdev`. > + > +Additionally, during the `Quiet-Boot` bootstrap, `libmachdev` awkwardly > +uses `libtrivfs` to create a transient `/` directory, so that the > +`pci-arbiter` can mount a netfs on top of it at bootstrap. > +`libmachdev` needs `/servers/bus` to mount `/pci`, and it also > +needs `/servers` and `/servers/bus` (and `/dev`, and > +`/servers/socket`). That complexity could be moved to `Serverboot V2`, > +which will create directory nodes at those locations. > + > +`libmachdev` provides a trivfs that intercepts the `device_open` rpc, > +which the `/dev` node uses. It also fakes a root filesystem node, so > +you can mount a `netfs` onto it. You still have to implement > +`device_read` and `device_write` yourself, but that code runs in > +userspace. An example of this can be found in > +`rumpdisk/block-rump.c`.
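To give a rough feel for what "implement `device_read` and `device_write` yourself, in userspace" means, here is a minimal sketch of a driver that fills in a table of callbacks. It is an assumption-laden toy: `toy_dev_ops`, `toy_ramdev`, and the block sizes are made up for illustration, the backing store is plain memory, and none of this is the actual `libmachdev` or `rumpdisk` interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 512u
#define TOY_NUM_BLOCKS 8u

/* Hypothetical callback table; the real libmachdev structure differs.  */
struct toy_dev_ops
{
  int (*read) (void *dev, size_t block, void *buf, size_t nblocks);
  int (*write) (void *dev, size_t block, const void *buf, size_t nblocks);
};

/* A "disk" that is just memory, so the sketch needs no hardware.  */
struct toy_ramdev
{
  unsigned char data[TOY_NUM_BLOCKS * TOY_BLOCK_SIZE];
};

/* True when [block, block + nblocks) lies inside the device.  */
static int
toy_range_ok (size_t block, size_t nblocks)
{
  return block <= TOY_NUM_BLOCKS && nblocks <= TOY_NUM_BLOCKS - block;
}

/* Copy NBLOCKS blocks starting at BLOCK into BUF.  0 on success, -1 on
   an out-of-range request.  */
static int
toy_ram_read (void *dev, size_t block, void *buf, size_t nblocks)
{
  struct toy_ramdev *d = dev;
  assert (d != NULL && buf != NULL);
  if (!toy_range_ok (block, nblocks))
    return -1;
  memcpy (buf, d->data + block * TOY_BLOCK_SIZE, nblocks * TOY_BLOCK_SIZE);
  return 0;
}

/* Copy NBLOCKS blocks from BUF onto the device starting at BLOCK.  */
static int
toy_ram_write (void *dev, size_t block, const void *buf, size_t nblocks)
{
  struct toy_ramdev *d = dev;
  assert (d != NULL && buf != NULL);
  if (!toy_range_ok (block, nblocks))
    return -1;
  memcpy (d->data + block * TOY_BLOCK_SIZE, buf, nblocks * TOY_BLOCK_SIZE);
  return 0;
}

/* The driver publishes its callbacks; a dispatcher calls through them.  */
static const struct toy_dev_ops toy_ram_ops = { toy_ram_read, toy_ram_write };
```

A dispatcher would call `toy_ram_ops.read` and `toy_ram_ops.write` when a request arrives. Under the proposal, the open question is whether such a table is still worth keeping once the bootstrap logic has moved out of `libmachdev`.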
> + > +`libpciaccess` is a special case: it has two modes. The first time it > +runs, via `pci-arbiter`, it acquires the pci config IO ports and runs > +in x86 mode. Every subsequent access of pci becomes a hurdish user of > +pci-arbiter. > + > +`rumpdisk` exposes `/dev/rumpdisk`: > + > +``` > +$ showtrans /dev/rumpdisk > + /hurd/rumpdisk > +``` > + > + > +# FAQ > + > +## `Serverboot V2` looks like a ramdisk + a script...? > + > +It's not quite a ramdisk, it's more a netfs translator that > +creates a temporary `/`. It's a statically linked binary. I don't > +think it differs from a multiboot module. > + > +## How are the device nodes on the bootstrap netfs attached to each translator? > +## How does the first non-bootstrap task get invoked? > +## Does bootstrap resume it? > +## Could we just use a ram disk instead? > +## One could stick an unionfs on top of it to load the rest of the system after bootstrap. > + > +It looks similar to a ramdisk in principle, i.e. it exposes a fs which > +lives only in ram, but a ramdisk would not help with early bootstrap. > +Namely during early bootstrap, there are no signals or console. > +Passing control from one server to the next via a bootstrap port > +is a kludge at best. How many times have you seen the bootstrap > +process hang and just sit there? `Serverboot V2` would solve that. > +Also, it would allow subhurds to be full hurds without special casing > +each task with bootstrap code. It would also clean up `libmachdev`, > +and Damien, its author, is in full support. > + > +## A ramdisk could implement signals and stdio. Isn't that more flexible? > + > +But if it's a ramdisk, essentially you have to provide it with a tar > +image. Having it live inside a bootstrap task only is > +preferable. Also the task could even exit when it's done whether you > +use an actual ramdisk or not. You still need to write the task that > +boots the system. That is different than how it works currently.
Also > +a ramdisk would have to live in mach, and we want to move things out > +of mach. > + > +Additionally, the bootstrap task will be loaded as the first multiboot > +module by grub. It's not a ramdisk, because a ramdisk has to contain > +some fs image (with data), and we'd need to parse that format. It > +might make sense to steer it more into that direction (and Samuel > +seems to have preferred it), because there could potentially be some > +config files, or other files that the servers may need to run. I'm not > +super fond of that idea. I'd prefer the bootstrap fs to be just a > +place where ports (translators) can be placed and looked up. Actually > +in my current code it doesn't even use `netfs`, it just implements the > +RPCs directly. I'll possibly switch to `netfs` later, or if the > +implementation stays simple, I won't use `netfs`. > + > +## Serverboot V2 just rewrites proc and exec. Why reimplement so much code? > + > +I don't want to exactly reimplement full `proc` and `exec` servers in the > +bootstrap task, it's more of providing very minimal emulation of some > +of their functions. I want to implement the two RPCs from the > +`proc` interface, one to give a task the privileged ports on request and > +one to let the task give me its msg port. That seems fairly simple to > +me. > + > +While we were talking of using netfs, my actual implementation doesn't > +even use that, it just implements the RPCs directly (not to suggest I > +have anything resembling a complete implementation). 
Here's some > +sample code to give you an idea of what it is like > + > + > + error_t > + S_proc_getprivports (struct bootstrap_task *task, > + mach_port_t *host_priv, > + mach_port_t *device_master) > + { > + if (!task) > + return EOPNOTSUPP; > + > + if (bootstrap_verbose) > + fprintf (stderr, "S_proc_getprivports from %s\n", task->name); > + > + *host_priv = _hurd_host_priv; > + *device_master = _hurd_device_master; > + > + return 0; > + } > + > + error_t > + S_proc_setmsgport (struct bootstrap_task *task, > + mach_port_t reply_port, > + mach_msg_type_name_t reply_portPoly, > + mach_port_t newmsgport, > + mach_port_t *oldmsgport, > + mach_msg_type_name_t *oldmsgportPoly) > + { > + if (!task) > + return EOPNOTSUPP; > + > + if (bootstrap_verbose) > + fprintf (stderr, "S_proc_setmsgport for %s\n", > task->name); > + > + *oldmsgport = task->msgport; > + *oldmsgportPoly = MACH_MSG_TYPE_MOVE_SEND; > + > + task->msgport = newmsgport; > + > + return 0; > + } > + > +Yes, it really is just letting tasks fetch the priv ports (so > +`get_privileged_ports ()` in glibc works) and set their message ports. > +So much for a slippery slope of reimplementing the whole process > +server :) > + > + > +## Let's bootstrap like this: initrd, proc, exec, acpi, pci, drivers, > +## unionfs+fs with every server executable included in the initrd tarball? > + > +I don't see how that's better, but you would be able to try something > +like that with my plan too. The OS bootstrap needs to start servers > +and integrate them into the eventual full hurd system later when the > +rest of the system is up. When early servers start, they're running > +on bare Mach with no processes, no `auth`, no files or file > +descriptors, etc. I plan to make files available immediately (if not > +the real fs), and make things progressively more "real" as servers > +start up. When we start the root fs, we send everyone their new root > +`dir` port. When we start `proc`, we send everyone their new `proc` > +port. 
And so on. At the end, all those tasks we have started in > +early boot are full real hurd processes that are not any different to > +the ones you start later, except that they're statically linked, and > +not actually `io_map`'ed from the root fs, but loaded by Mach/grub > +into wired memory. > + > +# IRC Logs > + > + <damo22> showtrans /dev/wd0 and you can open() that node and it will > + act as a device master port, so you can then `device_open ()` devices > + (like wd0) inside of it, right? > + > + oh it's a storeio, that's… cute. that's another translator we'd need > + in early boot if we want to boot off /hurd/ext2fs.static /dev/wd0 > + > + <damo22> We implemented it as a storeio with > + device:@/dev/rumpdisk:wd0 > + > + so the `@` sign makes it use the named file as the device master, right? > + > + <damo22> the `@` symbol means it looks up the file as the device > + master yes. Instead of mach, but the code falls back to looking up > + mach, if it cant be found. > + > + I see it's even implemented in libstore, not in storeio, so it just > + does `file_name_lookup ()`, then `device_open` on that. > + > + <damo22> pci-arbiter also needs acpi because the only way to know the > + IRQ of a pci device reliably is to use ACPI parser, so it totally > + implements the Mach `device_*` functions. But instead of handling the > + RPCs directly, it sets the callbacks into the > + `machdev_device_emulations_ops` structure and then libmachdev calls > + those. Instead of implementing the RPCs themselves, it abstracts them, > + in case you wanted to merge drivers. This would help if you wanted > + multiple different devices in the same translator, which is of course > + the case inside Mach, the single kernel server does all the devices. > + > + but that shouldn't be the case for the Hurd translators, right? we'd > + just have multiple different translators like your thing with rumpdisk > + and rumpusb.
> + > + `<damo22>` i dont know > + > + ok, so other than those machdev emulation dispatch, libmachdev uses > + trivfs and does early bootstrap. pci-arbiter uses it to centralize the > + early bootstrap so all the machdevs can use the same code. They chain > + together. pci-arbiter creates a netfs on top of the trivfs. How > + well does this work if it's not actually used in early bootstrap? > + > + <damo22> and rumpdisk opens device ("pci"), when each task is resumed, > + it inherits a bootstrap port > + > + and what does it do with that? what kind of device "pci" is? > + > + <damo22> its the device master for pci, so rumpdisk can call > + pci-arbiter rpcs on it > + > + hm, so I see from the code that it returns the port to the root of its > + translator tree actually. Does pci-arbiter have its own rpcs? does it > + not just expose an fs tree? > + > + <damo22> it has rpcs that can be called on each fs node called > + "config" per device: hurd/pci.defs. libpciaccess uses these. > + > + how does that compare to reading and writing the fs node with regular > read and write? > + > + <damo22> so the second and subsequent instances of pciaccess end up > + calling into the fs tree of pci-arbiter. you can't call read/write on > + pci memory its MMIO, and the io ports need `inb`, `inw`, etc. They > + need to be accessed using special accessors, not a bitstream. > + > + but I can do $ hexdump /servers/bus/pci/0000/00/02/0/config > + > + <damo22> yes you can on the config file > + > + how is that different from `pci_conf_read` ? it calls that. > + > + <damo22> the `pci fs` is implemented to allow these things. > + > + why is there a need for `pci_conf_read ()` as an RPC then, if you can > + instead use `io_read` on the "config" node? > + > + <damo22> i am not 100% sure. I think it wasn't fully implemented from > + the beginning, but you definitely cannot use `io_read ()` on IO > + ports. These have explicit x86 instructions to access them > + MMIO. 
maybe, im not sure, but it has absolute physical addressing. > + > + I don't see how you would do this via `pci.defs` either? > + > + <damo22> We expose all the device tree of pci as a netfs > + filesystem. It is a bus of devices. you may be right. It would be best > + to implement pciaccess to just read/write from the filesystem once its > + exposed on the netfs. > + > + yes, the question is: > + > + 1 is there anything that you can do by using the special RPCs from > + pci.defs that you cannot do by using the regular read/write/ls/map > + on the exported filsystem tree, > + 2 if no, why is there even a need for `pci.defs`, why not always use > + the fs? But anyway, that's irrelevant for the question of bootstrap > + and libmachdev > + > + <damo22> There is a need for rpcs for IO ports. > + > + Could you point me to where rumpdisk does `device_open ("pci")`? grep > + doesn't show anything. which rpcs are for the IO ports? > + > + <damo22> They're not implemented yet we are using raw access I > + think. The way it works, libmachdev uses the next port, so it all > + chains together: `libmachdev/trivfs_server.c`. > + > + but where does it call `device_open ("pci")` ? > + > + <damo22> when the pci task resumes, it has a bootstrap port, which is > + passed from previous task. There is no `device_open ("pci")`. or if > + its the first task to be resumed, it grabs a bootstrap port from > + glibc? im not sure > + > + ok, so if my plan is implemented how much of `libmachdev` functionality > + will still be used / useful? > + > + <damo22> i dont know. The mach interface? device interface\*. maybe > + it will be useless. > + > + I'd rather you implemented the Mach device RPCs directly, without the > + emulation structure, but that's an unrelated change, we can leave that > + in for now. > + > + <damo22> I kind of like the emulation structure as a list of function > + pointers, so i can see what needs to be implemented, but that's > + neither here nor there. 
`libmachdev` was a hack to make the bootstrap > + work to be honest.…and we'd no longer need that. I would be happy if > + it goes away. the new one would be so much better. > + > + is there anything else I should know about this all? What else could > + break if there was no libmachdev and all that? > + > + <damo22> acpi, pci-arbiter, rumpdisk, rumpusbdisk > + > + right, let's go through these > + > + <damo22> The pci-arbiter needs to start first to claim the x86 config > + io ports. Then gnumach locks these ports. No one else can use them. > + > + so it starts and initializes **something** what does it need? the > + device master port, clearly, right? that it will get through the > + glibc function / the proc API > + > + <damo22> it needs a /servers/bus and the device master > + > + <solid_black> > + right, so then it just does fsys_startup, and the bootstrap task > + places it onto `/servers/bus` (it's not expected to do > + `file_set_translator ()` itself, just as when running as a normal > + translator) > + > + <damo22> it exposes a netfs on `/servers/bus/pci` > + > + <solid_black> so will pci-arbiter still expose mach devices? a mach > + device master? or will it only expose an fs tree + pci.defs? > + > + <damo22> i think just fs tree and pci.defs. should be enough > + > + <solid_black> ok, so we drop mach dev stuff from pci-arbiter > + completely. then acpi starts up, right? what does it need? > + > + <damo22> It needs access to `pci.defs` and the pci tree. It > + accesses that via libpciaccess, which calls a new mode that > + accesses the fstree. It looks up `servers/bus/pci`. > + > + ok, but how does that work now then? > + > + <damo22> It looks up the right nodes and calls pci.defs on them. > + > + <solid_black> looks up the right node on what? 
there's no root > + filesystem at that point (in the current scheme) > + > + `<damo22>` It needs pci access > + > + that's why I was wondering how it does `device_open ("pci")` > + > + <damo22> I think libmachdev from pci gives acpi the fsroot. there is a > + doc on this. > + > + so does it set the root node of pci-arbiter as the root dir of acpi? > + as in, is acpi effectively chrooted to `/servers/bus/pci`? > + > + <damo22> i think acpi is chrooted to the parent of /servers. It shares > + the same root as pci's trivfs. > + > + i still don't quite understand how netfs and trivfs within pci-arbiter > interact. > + > + <damo22> you said there would be a fake /. Can't acpi use that? > + > + <solid_black> yeah, in my plan / the new bootstrap scheme, there'll be > + a / from the very start. > + > + <damo22> ok so acpi can look up /servers/bus/pci, and it will exist. > + > + and pci-arbiter can really sit on `/servers/bus/pci` (no need for > + trivfs there at all) and acpi will just look up > + `/servers/bus/pci`. And we do not need to change anything in acpi to > + get it to do that. > + > + And how does it do it now? maybe we'd need to remove some > + no-longer-required logic from acpi then? > + > + <damo22> it looks up device ("pci") if it exists, otherwise it falls > + back to `/servers/bus/pci`. > + > + Ah hold on, maybe I do understand now. currently pci-arbiter exposes > + its mach dev master as acpi-s mach dev master. So it looks up > + device("pci") and finds it that way. > + > + <damo22> correct, but it doesnt need that if the `/` exists. > + > + yeah, we could remove this in the new bootstrap scheme, and just > + always open the fs node (or leave it in for compatibility, we'll see > + about that). acpi just sits on `/servers/acpi/tables`. > + > + `rumpdisk` runs next and it needs `/servers/bus/pci`, `pci.defs`, and > + `/servers/acpi/tables`, and `acpi.defs`. It exposes `/dev/rumpdisk`. 
> + > + Would it make sense to make rumpdisk expose a tree/directory of Hurd > + files and not Mach devices? This is not necessary for anything, but > + just might be a nice little cleanup. > + > + <damo22> well, it could expose a tree of block devices, like > + `/dev/rumpdisk/ide/1`. > + > + <solid_black> and then `ln -s /rumpdisk/ide/1 /dev/wd1`. and no need > + for an intermediary storeio. plus the Hurd file interface is much > + richer than Mach device, you can do fsync for instance. > + > + <damo22> the rump kernel is bsd under the hood, so needs to be > + `/dev/rumpdisk/ide/wd0` > + > + <solid_black> You can just convert "ide/0" to "/dev/wd0" when > + forwarding to the rump part. Not that I object to ide/wd0, but we can > + have something more hierarchical in the exposed tree than old-school > + unix device naming? Let's not have /dev/sda1. Instead let's have > + /dev/sata/0/1, but then we'd still keep the bsd names as symlinks into > + the *dev/rumpdisk*… tree > + > + <damo22> sda sda1 > + > + <solid_black> good point > + > + <damo22> 0 0/1 > + > + <solid_black> well, you can on the Hurd :D and we won't be doing that > + either, rumpdisk only exposes the devices, not partitions > + > + <damo22> well you just implement a block device on the directory? but > + that would be confusing for users. > + > + <solid_black> I'd expect rumpdisk to only expose device nodes, like > + /dev/rumpdisk/ide/0, and then we'd have /dev/wd0 being a symlink to > + that. And /dev/wd0s1 being a storeio of type part:1:/dev/wd0 or > + instead of using that, you could pass that as an option to your fs, > + like ext2fs -T typed part:1/dev/wd0 > + > + <damo22> where is the current hurd bootstrap (QuietBoot) docs hosted? 
> + here: > + https://git.savannah.gnu.org/cgit/hurd/web.git/plain/hurd/bootstrap.mdwn > + > + <solid_black> so yeah, you could do the device tree thing I'm > + proposing in rumpdisk, or you could leave it exposing Mach devices and > + have a bunch of storeios pointing to that. So anyway, let's say > + rumpdisk keeps exposing a single node that acts as a Mach device > + master and it sits on /dev/rumpdisk. > + > + <solid_black> Then we either need a storeio, or we could make ext2fs > + use that directly. So we start `/hurd/ext2fs.static -T typed > + part:1:@/dev/rumpdisk:wd0`. > + > + <solid_black> I'll drop all the logic in libdiskfs for detecting if > + it's the bootstrap filesystem, and starting the exec server, and > + spawning /hurd/startup. It'll just be a library to help create > + filesystems. > + > + <solid_black> After that the bootstrap task migrates all those > + translator nodes from the temporary / onto the ext2fs, broadcasts the > + root and cwd ports to everyone, and off we go to starting auth and > + proc and unix. sounds like it all would work indeed. so we're just > + removing libmachdev completely, right? > + > + <damo22> netdde links to it too. I think it has libmachdevdde > + > + <solid_black> Also how would you script this thing. Like ideally we'd > + want the bootstrap task to follow some sort of script which would say, > + for example, > + > + mkdir /servers > + mkdir /servers/bus > + settrans /servers/bus/pci ${pci-task} --args-to-pci > + mkdir /dev > + settrans /dev/netdde ${netdde-task} --args-to-netdde > + setroot ${ext2fs-task} --args-to-ext2fs > + > + <solid_black> and ideally the bootstrap task would implement a REPL > + where you'd be able to run these commands interactively (if the > + existing script fails for instance). It can be like grub, where it has > + a predefined script, and you can do something (press a key combo?) to > + instead run your own commands in a repl. 
or if it fails, it bails out > + and drops you into the repl, yes. this gives you **so much more** > + visibility into the boot process, because currently it's all scattered > + across grub, libdiskfs (resuming exec, spawning /hurd/startup), > + /hurd/startup, and various tricky pieces of logic in all of these > + servers. > + > + <solid_black> We could call the mini-repl hurdhelper? If something > + fails, you're on your own, at best it prints an error message (if the > + failing task manages to open the mach console at that point) Perhaps > + we call the new bootstrap proposal Bootstrap. > + > + <solid_black> When/if this is ready, we'll have to remove libmachdev > + and port everything else to work without it. > + > + <damo22> yes it's a great idea. I'm not a fan of lisp either. If i > + keep in mind that `/` is available early, then I can just clean up the > + other stuff. and assume i have `/`, and the device master can be > + accessed with the regular glibc function, and you can printf freely > + (no need to open the console). Do i need to run `fsys_startup`? > + > + yes, exactly like all translators always do. Well you probably run > + netfs_startup or whatever, and it calls that. you're not supposed to > + call fsys_getpriv or fsys_init > + > + <damo22> i think my early attempts at writing translators did not use > + these, because i assumed i had `/`. Then i realised i didn't. And > + libmachdev was born. > + > + <solid_black> Yes, you should assume you have /, and just do all the > + regular things you would do. and if something that you would usually > + do doesn't work, we should think of a way to make it work by adding > + more stuff in the bootstrap task when it's reasonable to, of > + course. and please consider exposing the file tree from rumpdisk, > + though that's orthogonal. > + > + <damo22> you mean a tree of block devices? > + > + <solid_black> Yes, but each device node would be just a Hurd (device) > + file, not a Mach device. i.e.
it'd support io_read and io_write, not > + device_read and device_write. well I guess you could make it support > + both. > + > + <damo22> isnt that storeio's job? > + > + <solid_black> if a node only implements the device RPCs, we need a > + storeio to turn it into a Hurd file, yes. but if you would implement > + the file RPCs directly, there wouldn't be a need for the intermediary > + storeio, not that it's important. > + > + <damo22> but thats writing storeio again. thing is, i dont know at > + runtime which devices are exposed by rump. It auto probes them and > + prints them out but i cant tell programmatically which ones were > + detected, because rump knows which devices exist but doesn't expose > + it over API in any way. Because it runs as a kernel would with just > + one driver set. > + > + <damo22> Rump is a decent set of drivers. It does not have better > + hardware support than Linux drivers (of modern Linux)? Instead Rump is > + netbsd in a can, and it's essentially unmaintained upstream > + too. However, it is still used to test kernel modules, but it lacks > + makefiles to separate all drivers into modules. BUT using rump is > + better than updating / redoing the linux drivers port of DDE, because > + netbsd internal kernel API is much much more stable than linux. We > + would fall behind in a week with linux. No one would maintain the > + linux driver -> hurd port. Also, there is a framework that lets you > + compile the netbsd drivers as userspace unikernels: rump. Such a > + thing does not exist for modern Linux. Rump is already good > + enough for some things. It could replace netdde. It already works for > + ide/sata. > + > + <damo22> Rump has its own /dev nodes on a rumpfs, so you can do > + something like `rump_ls` it. > + > + <damo22> Rump is a minimal netbsd kernel. It is just the device > + drivers, and a bit of pthreading, and has only the drivers that you > + link.
So rumpdisk only has the ahci and ide drivers and nothing > + else. Additionally rump can detect them off the pci bus. > + > + <damo22> I will create a branch on > + <http://git.zammit.org/hurd-sv.git> with cleaned translators. > + > + <damo22> solid_black: i almost cleaned up acpi and pci-arbiter but > + realised they are missing the shutdown notification when i strip out > + libmachdev. > + > + <solid_black>: "how are the device nodes on the bootstrap netfs > attached to > + each translator?" – I don't think I understand the question, please > + clarify. > + > + <damo22> I was wondering if the new bootstrap process can resume a fs > + task and have all the previous translators wake up and serve their > + rpcs. without needing to resume them. we have a problem with the > + current design, if you implement what we discussed yesterday, the IO > + ports wont work because they are not exposed by pci-arbiter yet. I am > + working on it, but its not ready. > + > + <solid_black> I still don't understand the problem. the bootstrap > + task resumes others in order. the root fs task too, eventually, but > + not before everything that has to come up before the root fs task is > + ready. > + > + <damo22> I don't think it needs to be a disk. Literally a trivfs is > enough. > + > + <solid_black> why are I/O ports not exposed by pci-arbiter? why isn't > + that an issue with how it works currently then? > + > + <damo22> solid_black: we are using ioperm() in userspace, but i want > + to refactor the io port usage to be granularly accessed. so one day > + gnumach can store a bitmap of all io ports and reject more than one > + range that overlaps ports that are in use. since only one user of any > + port at any time is allowed. i dont know if that will allow users to > + share the same io ports, but at least it will prevent users from > + clobbering each other's hw access.
> + > + <solid_black> damo22: (again, sorry for not understanding the hardware > + details), so what would be the issue? when the pci arbiter starts, > + doesn't it do all the things it has to do with the I/O ports? > + > + <damo22> io ports are only accessed in raw method now. Any user can do > + ioperm(0, 0xffff, 1) and get access to all of them > + > + <solid_black> doesn't that require host priv or something like that? > + > + <damo22> yeh probably. maybe only root can. But i want to allow > + unprivileged users to access io ports by requesting exclusive access > + to a range. > + > + <solid_black> I see that ioperm () in glibc uses the device master > + port, so yeah, root-only (good) > + > + <damo22> first in locks the port range > + > + <solid_black> but you're saying that there's something about these I/O > + ports that works today, but would break if we implemented what we > + discussed yesterday? what is it, and why? > + > + <damo22> well it might still work. but there's a lot of changes to > + be done in general > + > + <solid_black> let me try to ask it in a different way then > + > + <damo22> i just know a few of the specifics because i worked on them. > + > + <solid_black> As I understand it, you're saying that 1: currently any > + root process can request access to any range of I/O ports, and you > + also want to allow **unprivileged** processes to get access to ranges > + of I/O ports, via a new API of the PCI arbiter (but this is not > + implemented yet, right?) > + > + <damo22> yes > + > + <solid_black> 2: you're saying that something about this would break / > + be different in the new scheme, compared to the current scheme. i > + don't understand the 2, and the relation between 1 and 2. > + > + <damo22> 2 not really, I may have been mistaken, it probably will > + continue working fine. until i try to implement 1. ioperm calls > + `i386_io_perm_create` and `i386_io_perm_modify` in the same system > + call.
I want to separate these into the arbiter so the request goes > + into pci-arbiter and if it succeeds, then the port is returned to the > + caller and the caller can change the port access. > + > + <solid_black> yes, so what about 2 will break 1 when you try to > implement it? > + > + <damo22> with your new bootstrap, we need `i386_io_perm_*` to be > + accessible. im not sure how. is that a mach rpc? > + > + <solid_black> these are mach rpcs. i386_io_perm_create is an rpc that > + you do on device master. > + > + <damo22> should be ok then > + > + <solid_black> i386_io_perm_modify you do on your task port. yes, I > + don't see how this would be problematic. > + > + <damo22>: you might find this branch useful > + <http://git.zammit.org/hurd-sv.git/log/?h=feat-simplify-bootstrap> > + > + <solid_black> although: > + > + 1. I'm not sure whether the task itself should be wiring its memory, > + or if the bootstrap task should do it. > + 2. why do you request startup notifications if you then never do > + anything in `S_startup_dosync`? > + > + <solid_black> same for essential tasks actually, that should probably > + be done by the bootstrap task and not the translator itself (but we'll > + see) > + > + <solid_black> 3. don't `mach_print`, just `fprintf (stderr, "")` > + <solid_black> 4. please always verify the return result of > + `mach_port_deallocate` (and similar functions), > + typically like this: > + > + err = mach_port_deallocate (…); > + assert_perror_backtrace (err); > + > + this helps catch nasty bugs. > + > + <solid_black> 5. I wonder why both acpi and pci have their own > + `pcifs_startup` and `acpifs_startup`; can't they use `netfs_startup > + ()`? > + > + <damo22> 1. no idea, 2. rumpdisk needed it, but these might > + not, 3. ACK, 4. ACK, 5. I think they couldnt use the `netfs_startup ()` > + before but might be able to now. Anyway, this should get you booting > + with your bootstrap translator (without rumpdisk).
Rumpdisk seems to > + use the `device_*` RPCs from `libmachdev` to expose its device. > + whereas pci and acpi dont use them for anything except `device_open` > + to pass their port to the next translator. I think my latest patch > + for io ports will work. but i need to rebuild glibc and libpciaccess > + and gnumach. Why does libhurduser need to be in glibc? It's quite > + annoying to add an rpc. > + > + I think i have done gnumach io port locking, and pciaccess, but hurd > + part needs work and then to merge it needs a rebuild of glibc because > + of hurduser > + > + <damo22> Why cant libhurduser be part of the hurd package? > + > + I don't think I understand enough of this to do a review, but I'd > + still like to see the patch if it's available anywhere. > + > + <damo22> ok i can push to my repos > + > + <solid_black> glibc needs to use the Hurd RPCs (and implement some, > + too), and glibc cannot depend on the Hurd package because the Hurd > + package depends on glibc. > + > + <damo22> lol ok > + > + <solid_black> As things currently stand, glibc depends on the Hurd > + **headers** (including mig defs), but not any Hurd binaries. still, > + the cross build process is quite convoluted. I posted about it > + somewhere: https://floss.social/@bugaevc/109383703992754691 > + > + <jpoiret> the manual patching of the build system that's needed to > + bootstrap everything is a bit suboptimal. > + > + <damo22> what if you guys submit patches upstream to glibc to add a > + build target to copy the headers or whatever is needed? solid_black: > + see > + > http://git.zammit.org/{libpciaccess.git,gnumach.git} > + on fix-ioperm branches > -- > 2.45.1 > > > >
-- Samuel --- For an independent, transparent and rigorous evaluation! I support Inria's Commission d'Évaluation.