Hi Maciej, > > How can one reasonably prevent the bug when compiling a whole Linux > > distribution with thousands of packages if GAS merely warns and proceeds > > in many cases? > > I think the tools must not set a policy. By using `.set noreorder' the > user told the toolchain he knows better and asked to keep its hands away. > > As I say you can use `-Werror' for code auditing. And in most cases > manually scheduled delay slots in handcoded assembly are a sign of a > problem with the piece of code affected, regardless of the R5900 erratum.
Well, it seems practical to use unofficial tools and a patched GAS to fix this assembly bug, then. It's a grave problem for the R5900 and it needs to be reliably detected. > > > > [ In theory, GAS could actually insert NOPs prior to the noreorder > > > > directive, > > > > to make the loop longer that six instructions, but GAS does not have > > > > that > > > > kind of capability. Another option that GCC prevents is to place a NOP > > > > in > > > > the delay slot. ] > > > > > > Well, GAS does have that capability, although of course it is not > > > enabled > > > for `noreorder' code. > > > > Hmm? I'm unable to make sense of that, since such NOPs are unaffected by > > noreorder. This is what I meant: > > > > loop: addiu $5,$5,1 > > addiu $4,$4,1 > > lb $2,-1($5) > > nop /* These two NOPs would prevent the */ > > nop /* bug while preserving noreorder. */ > > .set noreorder > > .set nomacro > > bne $2,$3,loop > > sb $2,-1($4) > > .set macro > > .set reorder > > See `nops_for_insn'. However again, as I noted, `.set noreorder' tells > GAS not to analyse the dependencies for code within. There is no need to > schedule this delay slot manually, and if this is generated code, then the > producer (GCC) should have taken care of the hazards, be it architectural > or errata. If this is manually written code, then the author asked for > trouble. I'm using the principle above to unobtrusively adjust problematic kernel code, via a short_loop_war macro. Here is one patched instance: --- a/arch/mips/boot/compressed/head.S +++ b/arch/mips/boot/compressed/head.S @@ -13,6 +13,7 @@ */ #include <asm/asm.h> +#include <asm/asmmacro.h> #include <asm/regdef.h> .set noreorder @@ -29,6 +30,7 @@ start: PTR_LA a0, _edata PTR_LA a2, _end 1: sw zero, 0(a0) + short_loop_war(1b) bne a2, a0, 1b addiu a0, a0, 4 @@ -48,6 +50,7 @@ start: jr k0 nop 3: + short_loop_war(3b) b 3b nop END(start) The short_loop_war macro is designed to be placed just before the branch, and it inserts NOPs as necessary for the R5900: #ifdef CONFIG_CPU_R5900 .macro short_loop_war loop_target .set instruction_count,2 + (. - \loop_target) / 4 .ifgt 7 - instruction_count .rept 7 - instruction_count nop .endr .endif .endm #else .macro short_loop_war loop_target .endm #endif > Well, BogoMIPS is just an arbitrary number. So presumably the noreorder BogoMIPS variant can be replaced with a single reorder variant that works with all MIPS implementations? > There is a data dependency here, so manual delay slot scheduling was > unavoidable. I'd suggest using this as a functional equivalent for the > R5900: > > addiu a0,a0,1 > 1: addiu a0,a0,-1 > bnez a0,1b > > and avoiding the irregularity for a0==0. Thanks! > > I used the suggested patch, and recompiled the kernel with it, and verified > > with the machine code tool that there were no undefined loops. Wouldn't that > > be enough? > > It would break if GAS was invoked without `-mfix-r5900' (e.g. by using an > explicit `-Wa,-mno-fix-r5900' option). I see, hmm. > > I tried to limit the case to branch delays only, which is a rough > > approximation. Call and jump delay slots are acceptable. Are you referring > > to this piece? > > > > ;; Can the instruction be put into a delay slot? > > (define_attr "can_delay" "no,yes" > > (if_then_else (and (eq_attr "type" "!branch,call,jump") > > (eq_attr "hazard" "none") > > (match_test "get_attr_insn_count (insn) == 1")) > > (const_string "yes") > > (const_string "no"))) > > Yes. My suggestion does not preclude limiting the workaround to branches > only while being precise about what the situation is, i.e. branches still > require a delay slot instruction (unlike MIPS16 or uMIPS compact branches) > but no real instruction is suitable. I'd prefer if GCC actually scheduled > the required dummy NOP instruction explicitly. OK. Fredrik