On 13/03/2019 17.31, Jason J. Herne wrote:
> Allows guest to boot from a vfio configured real dasd device.
>
> Signed-off-by: Jason J. Herne <[email protected]>
> Reviewed-by: Cornelia Huck <[email protected]>
> ---
[...]
> diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
> new file mode 100644
> index 0000000..236428a
> --- /dev/null
> +++ b/docs/devel/s390-dasd-ipl.txt
> @@ -0,0 +1,133 @@
> +*****************************
> +***** s390 hardware IPL *****
> +*****************************
> +
> +The s390 hardware IPL process consists of the following steps.
> +
> +1. A READ IPL ccw is constructed in memory location 0x0.
> +    This ccw, by definition, reads the IPL1 record which is located on the disk
> +    at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
> +    so when it is complete another ccw will be fetched and executed from memory
> +    location 0x08.
> +
> +2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
> +    IPL1 data is 24 bytes in length and consists of the following pieces of
> +    information: [psw][read ccw][tic ccw]. When the machine executes the Read
> +    IPL ccw it read the 24-bytes of IPL1 to be read into memory starting at
> +    location 0x0. Then the ccw program at 0x08 which consists of a read
> +    ccw and a tic ccw is automatically executed because of the chain flag from
> +    the original READ IPL ccw. The read ccw will read the IPL2 data into memory
> +    and the TIC (Tranfer In Channel) will transfer control to the channel

s/Tranfer/Transfer/ ?

[...]
> +**********************************************************
> +***** How this all pertains to QEMU (and the kernel) *****
> +**********************************************************
> +
> +In theory we should merely have to do the following to IPL/boot a guest
> +operating system from a DASD device:
> +
> +1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
> +2. Execute channel program at 0x0.
> +3. LPSW 0x0.
> +
> +However, our emulation of the machine's channel program logic within the kernel
> +is missing one key feature that is required for this process to work:
> +non-prefetch of ccw data.
> +
> +When we start a channel program we pass the channel subsystem parameters via an
> +ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
> +bit is on then the vfio-ccw kernel driver is allowed to read the entire channel
> +program from guest memory before it starts executing it. This means that any
> +channel commands that read additional channel commands will not work as expected
> +because the newly read commands will only exist in guest memory and NOT within
> +the kernel's channel subsystem memory. The kernel vfio-ccw driver currently
> +requires this bit to be on for all channel programs. This is a problem because
> +the IPL process consists of transferring control from the "Read IPL" ccw
> +immediately to the IPL1 channel program that was read by "Read IPL".
> +
> +Not being able to turn off prefetch will also prevent the TIC at the end of the
> +IPL1 channel program from transferring control to the IPL2 channel program.
> +
> +Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
> +tansfers control to another channel program segment immediately after reading it

s/tansfers/transfers/

> +from the disk. So we need to be able to handle this case.
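
For readers who want to see the prefetch problem in isolation, here is a toy model of it. This is only a sketch and has nothing to do with the real vfio-ccw code: the opcodes, `PROG_LEN` and `run_program()` are all made up for illustration. With prefetch, the whole program is snapshotted before execution, so a command that loads the next command into guest memory has no effect on what actually runs.

```c
#include <stdbool.h>
#include <string.h>

/* Made-up opcodes for a toy channel program (not real CCW command codes). */
enum { OP_END, OP_LOAD_NEXT, OP_STALE, OP_FRESH };

#define PROG_LEN 3

/*
 * Execute the toy program in guest_mem and return the opcode that was
 * actually executed from slot 1. OP_LOAD_NEXT models a read that stores
 * a new command (OP_FRESH) into the following slot of guest memory.
 */
static int run_program(int guest_mem[PROG_LEN], bool prefetch)
{
    int snapshot[PROG_LEN];
    const int *fetch_from = guest_mem;
    int executed_slot1 = OP_END;

    if (prefetch) {
        /* Prefetch: copy the entire program before running any of it. */
        memcpy(snapshot, guest_mem, sizeof(snapshot));
        fetch_from = snapshot;
    }

    for (int i = 0; i < PROG_LEN && fetch_from[i] != OP_END; i++) {
        int op = fetch_from[i];

        if (op == OP_LOAD_NEXT && i + 1 < PROG_LEN) {
            /* The data transfer always lands in guest memory. */
            guest_mem[i + 1] = OP_FRESH;
        }
        if (i == 1) {
            executed_slot1 = op;
        }
    }
    return executed_slot1;
}
```

Running this on the program { OP_LOAD_NEXT, OP_STALE, OP_END } executes the stale command in slot 1 when prefetch is on and the freshly loaded one when it is off. That is the same situation as "Read IPL" chaining into the IPL1 program it has just read.
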
> +
> +**************************
> +***** What QEMU does *****
> +**************************
> +
> +Since we are forced to live with prefetch we cannot use the very simple IPL
> +procedure we defined in the preceding section. So we compensate by doing the
> +following.
> +
> +1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
> +2. Execute "Read IPL" at 0x0.
> +
> +   So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
> +
> +4. Write a custom channel program that will seek to the IPL2 record and then
> +   execute the READ and TIC ccws from IPL1. Normamly the seek is not required

s/Normamly/Normally/

[...]
> diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
> new file mode 100644
> index 0000000..1a44469
> --- /dev/null
> +++ b/pc-bios/s390-ccw/dasd-ipl.c
> @@ -0,0 +1,249 @@
> +/*
> + * S390 IPL (boot) from a real DASD device via vfio framework.
> + *
> + * Copyright (c) 2019 Jason J. Herne <[email protected]>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include "libc.h"
> +#include "s390-ccw.h"
> +#include "s390-arch.h"
> +#include "dasd-ipl.h"
> +#include "helper.h"
> +
> +static char prefix_page[PAGE_SIZE * 2]
> +            __attribute__((__aligned__(PAGE_SIZE * 2)));
> +
> +static void enable_prefixing(void)
> +{
> +    memcpy(&prefix_page, (void *)0, 4096);

You could use the "lowcore" variable from s390-arch.h here instead of "(void *)0", I guess.

> +    set_prefix(ptr2u32(&prefix_page));
> +}
> +
> +static void disable_prefixing(void)
> +{
> +    set_prefix(0);
> +    /* Copy io interrupt info back to low core */
> +    memcpy((void *)0xB8, prefix_page + 0xB8, 12);

Maybe use &lowcore->subchannel_id instead of 0xB8 ? ... not sure whether that's nicer here, though...
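
To make the idea a bit more concrete, here is a rough, untested sketch of the copy-back with named fields instead of a magic offset. `LowCoreSketch` is a hypothetical stand-in, not the real struct from s390-arch.h. I am assuming the I/O interrupt information occupies 0xB8 to 0xC3 (subchannel id, subchannel number, interruption parameter, interruption word), which matches the 12 bytes copied above. The helper name and the `IO_INT_INFO_*` macros are made up as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of one 4K lowcore page. Only the I/O interrupt
 * fields are spelled out; the field names and offsets are assumptions.
 */
typedef struct LowCoreSketch {
    uint8_t  pad_0x00[0xb8];          /* 0x000: psw, ccws, ... (not modelled) */
    uint16_t subchannel_id;           /* 0x0b8 */
    uint16_t subchannel_nr;           /* 0x0ba */
    uint32_t io_int_parm;             /* 0x0bc */
    uint32_t io_int_word;             /* 0x0c0 */
    uint8_t  pad_0xc4[4096 - 0xc4];   /* 0x0c4: rest of the page */
} LowCoreSketch;

/* Start and length of the I/O interrupt info, derived from the layout. */
#define IO_INT_INFO_OFF offsetof(LowCoreSketch, subchannel_id)
#define IO_INT_INFO_LEN \
    (offsetof(LowCoreSketch, io_int_word) + sizeof(uint32_t) - IO_INT_INFO_OFF)

/* Copy the I/O interrupt info from the prefix page back into lowcore. */
static void copy_io_int_info(LowCoreSketch *lowcore,
                             const LowCoreSketch *prefix)
{
    memcpy((uint8_t *)lowcore + IO_INT_INFO_OFF,
           (const uint8_t *)prefix + IO_INT_INFO_OFF,
           IO_INT_INFO_LEN);
}
```

In the real code the destination would simply be `&lowcore->subchannel_id`. The sketch goes through `offsetof` only so that the 12 is derived from the layout rather than hard-coded.
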

> +}
> +
> +static bool is_read_tic_ccw_chain(Ccw0 *ccw)
> +{
> +    Ccw0 *next_ccw = ccw + 1;
> +
> +    return ((ccw->cmd_code == CCW_CMD_DASD_READ ||
> +            ccw->cmd_code == CCW_CMD_DASD_READ_MT) &&
> +            ccw->chain && next_ccw->cmd_code == CCW_CMD_TIC);
> +}
> +
> +static bool dynamic_cp_fixup(uint32_t ccw_addr, uint32_t *next_cpa)
> +{
> +    Ccw0 *cur_ccw = (Ccw0 *)(uint64_t)ccw_addr;
> +    Ccw0 *tic_ccw;
> +
> +    while (true) {
> +        /* Skip over inline TIC (it might not have the chain bit on) */
> +        if (cur_ccw->cmd_code == CCW_CMD_TIC &&
> +            cur_ccw->cda == ptr2u32(cur_ccw) - 8) {
> +            cur_ccw += 1;
> +            continue;
> +        }
> +
> +        if (!cur_ccw->chain) {
> +            break;
> +        }
> +        if (is_read_tic_ccw_chain(cur_ccw)) {
> +            /*
> +             * Breaking a chain of CCWs may alter the semantics or even the
> +             * validity of a channel program. The heuristic implemented below
> +             * seems to work well in practice for the channel programs
> +             * generated by zipl.
> +             */
> +            tic_ccw = cur_ccw + 1;
> +            *next_cpa = tic_ccw->cda;
> +            cur_ccw->chain = 0;
> +            return true;
> +        }
> +        cur_ccw += 1;
> +    }
> +    return false;
> +}
> +
> +static int run_dynamic_ccw_program(SubChannelId schid, uint16_t cutype,
> +                                   uint32_t cpa)
> +{
> +    bool has_next;
> +    uint32_t next_cpa = 0;
> +    int rc;
> +
> +    do {
> +        has_next = dynamic_cp_fixup(cpa, &next_cpa);
> +
> +        print_int("executing ccw chain at ", cpa);
> +        enable_prefixing();
> +        rc = do_cio(schid, cutype, cpa, CCW_FMT0);
> +        disable_prefixing();
> +
> +        if (rc) {
> +            break;
> +        }
> +        cpa = next_cpa;
> +    } while (has_next);
> +
> +    return rc;
> +}
> +
> +static void make_readipl(void)
> +{
> +    Ccw0 *ccwIplRead = (Ccw0 *)0x00;
> +
> +    /* Create Read IPL ccw at address 0 */
> +    ccwIplRead->cmd_code = CCW_CMD_READ_IPL;
> +    ccwIplRead->cda = 0x00; /* Read into address 0x00 in main memory */
> +    ccwIplRead->chain = 0; /* Chain flag */
> +    ccwIplRead->count = 0x18; /* Read 0x18 bytes of data */
> +}
> +
> +static void run_readipl(SubChannelId schid, uint16_t cutype)
> +{
> +    if (do_cio(schid, cutype, 0x00, CCW_FMT0)) {
> +        panic("dasd-ipl: Failed to run Read IPL channel program\n");
> +    }
> +}
> +
> +/*
> + * The architecture states that IPL1 data should consist of a psw followed by
> + * format-0 READ and TIC CCWs. Let's sanity check.
> + */
> +static void check_ipl1(void)
> +{
> +    Ccw0 *ccwread = (Ccw0 *)0x08;
> +    Ccw0 *ccwtic = (Ccw0 *)0x10;
> +
> +    if (ccwread->cmd_code != CCW_CMD_DASD_READ ||
> +        ccwtic->cmd_code != CCW_CMD_TIC) {
> +        panic("dasd-ipl: IPL1 data invalid. Is this disk really bootable?\n");
> +    }
> +}
> +
> +static void check_ipl2(uint32_t ipl2_addr)
> +{
> +    Ccw0 *ccw = u32toptr(ipl2_addr);
> +
> +    if (ipl2_addr == 0x00) {
> +        panic("IPL2 address invalid. Is this disk really bootable?\n");
> +    }
> +    if (ccw->cmd_code == 0x00) {
> +        panic("IPL2 ccw data invalid. Is this disk really bootable?\n");
> +    }
> +}
> +
> +static uint32_t read_ipl2_addr(void)
> +{
> +    Ccw0 *ccwtic = (Ccw0 *)0x10;
> +
> +    return ccwtic->cda;
> +}
> +
> +static void ipl1_fixup(void)
> +{
> +    Ccw0 *ccwSeek = (Ccw0 *) 0x08;
> +    Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
> +    Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
> +    Ccw0 *ccwRead = (Ccw0 *) 0x20;
> +    CcwSeekData *seekData = (CcwSeekData *) 0x30;
> +    CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
> +
> +    /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
> +    memcpy(ccwRead, (void *)0x08, 16);

lowcore->ccw1 ?
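
What I have in mind is roughly the following, as an untested sketch. `LowMemSketch` is a hypothetical model of the first 0x40 bytes of low memory as ipl1_fixup() uses them. The member names other than ccw1/ccw2 are invented here, and the real lowcore struct may declare these areas with different types. The point is only that one 16-byte copy starting at ccw1 moves both IPL1 ccws to 0x20.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of low memory 0x00..0x3f during the IPL1 fixup. */
typedef struct LowMemSketch {
    uint8_t ipl_psw[8];      /* 0x00: IPL1 psw */
    uint8_t ccw1[8];         /* 0x08: IPL1 read ccw, later the seek ccw */
    uint8_t ccw2[8];         /* 0x10: IPL1 tic ccw, later the search ccw */
    uint8_t search_tic[8];   /* 0x18: tic back to the search ccw */
    uint8_t moved_ipl1[16];  /* 0x20: relocated IPL1 read + tic ccws */
    uint8_t seek_data[8];    /* 0x30: slot for the seek argument */
    uint8_t search_data[8];  /* 0x38: slot for the search id argument */
} LowMemSketch;

/* Move the IPL1 read and tic ccws out of the way of the new ccws. */
static void move_ipl1_ccws(LowMemSketch *low)
{
    /* ccw1 and ccw2 are adjacent, so one copy of both sizes moves them. */
    memcpy(low->moved_ipl1,
           (const uint8_t *)low + offsetof(LowMemSketch, ccw1),
           sizeof(low->ccw1) + sizeof(low->ccw2));
}
```

Whether that reads better than the plain 0x08 is a matter of taste, of course.
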

> +    /* Disable chaining so we don't TIC to IPL2 channel program */
> +    ccwRead->chain = 0x00;
> +
> +    ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
> +    ccwSeek->cda = ptr2u32(seekData);
> +    ccwSeek->chain = 1;
> +    ccwSeek->count = sizeof(*seekData);
> +    seekData->reserved = 0x00;
> +    seekData->cyl = 0x00;
> +    seekData->head = 0x00;
> +
> +    ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
> +    ccwSearchID->cda = ptr2u32(searchData);
> +    ccwSearchID->chain = 1;
> +    ccwSearchID->count = sizeof(*searchData);
> +    searchData->cyl = 0;
> +    searchData->head = 0;
> +    searchData->record = 2;
> +
> +    /* Go back to Search CCW if correct record not yet found */
> +    ccwSearchTic->cmd_code = CCW_CMD_TIC;
> +    ccwSearchTic->cda = ptr2u32(ccwSearchID);
> +}
> +
> +static void run_ipl1(SubChannelId schid, uint16_t cutype)
> + {
> +    uint32_t startAddr = 0x08;
> +
> +    if (do_cio(schid, cutype, startAddr, CCW_FMT0)) {
> +        panic("dasd-ipl: Failed to run IPL1 channel program\n");
> +    }
> +}
> +
> +static void run_ipl2(SubChannelId schid, uint16_t cutype, uint32_t addr)
> +{
> +    if (run_dynamic_ccw_program(schid, cutype, addr)) {
> +        panic("dasd-ipl: Failed to run IPL2 channel program\n");
> +    }
> +}
> +
> +static void lpsw(void *psw_addr)
> +{
> +    PSWLegacy *pswl = (PSWLegacy *) psw_addr;
> +
> +    pswl->mask |= PSW_MASK_EAMODE; /* Force z-mode */
> +    pswl->addr |= PSW_MASK_BAMODE;
> +    asm volatile(" llgtr 0,0\n llgtr 1,1\n"     /* Some OS's expect to be */
> +                 " llgtr 2,2\n llgtr 3,3\n"     /* in 32-bit mode. Clear */
> +                 " llgtr 4,4\n llgtr 5,5\n"     /* high part of regs to */
> +                 " llgtr 6,6\n llgtr 7,7\n"     /* avoid messing up */
> +                 " llgtr 8,8\n llgtr 9,9\n"     /* instructions that work */
> +                 " llgtr 10,10\n llgtr 11,11\n" /* in both addressing */
> +                 " llgtr 12,12\n llgtr 13,13\n" /* modes, like servc. */
> +                 " llgtr 14,14\n llgtr 15,15\n"
> +                 " lpsw %0\n"
> +                 : : "Q" (*pswl) : "cc");
> +}

Have you already tried using jump_to_low_kernel() here? It might be cleaner to do the diag 0x308 reset here, too, so that no part of the machine is left in an unexpected state...

 Thomas
