Hi Maciej,
> > How can one reasonably prevent the bug when compiling a whole Linux
> > distribution with thousands of packages if GAS merely warns and proceeds
> > in many cases?
>
> I think the tools must not set a policy. By using `.set noreorder' the
> user told the toolchain he knows better and asked to keep its hands away.
>
> As I say you can use `-Werror' for code auditing. And in most cases
> manually scheduled delay slots in handcoded assembly are a sign of a
> problem with the piece of code affected, regardless of the R5900 erratum.
Well, it seems practical to use unofficial tools and a patched GAS to fix
this assembly bug, then. It's a grave problem for the R5900 and it needs to
be reliably detected.
> > > > [ In theory, GAS could actually insert NOPs prior to the noreorder
> > > > directive,
> > > > to make the loop longer that six instructions, but GAS does not have
> > > > that
> > > > kind of capability. Another option that GCC prevents is to place a NOP
> > > > in
> > > > the delay slot. ]
> > >
> > > Well, GAS does have that capability, although of course it is not
> > > enabled
> > > for `noreorder' code.
> >
> > Hmm? I'm unable to make sense of that, since such NOPs are unaffected by
> > noreorder. This is what I meant:
> >
> > loop: addiu $5,$5,1
> > addiu $4,$4,1
> > lb $2,-1($5)
> > nop /* These two NOPs would prevent the */
> > nop /* bug while preserving noreorder. */
> > .set noreorder
> > .set nomacro
> > bne $2,$3,loop
> > sb $2,-1($4)
> > .set macro
> > .set reorder
>
> See `nops_for_insn'. However again, as I noted, `.set noreorder' tells
> GAS not to analyse the dependencies for code within. There is no need to
> schedule this delay slot manually, and if this is generated code, then the
> producer (GCC) should have taken care of the hazards, be it architectural
> or errata. If this is manually written code, then the author asked for
> trouble.
I'm using the principle above to unobtrusively adjust problematic kernel
code, via a short_loop_war macro. Here is one patched instance:
--- a/arch/mips/boot/compressed/head.S
+++ b/arch/mips/boot/compressed/head.S
@@ -13,6 +13,7 @@
*/
#include <asm/asm.h>
+#include <asm/asmmacro.h>
#include <asm/regdef.h>
.set noreorder
@@ -29,6 +30,7 @@ start:
PTR_LA a0, _edata
PTR_LA a2, _end
1: sw zero, 0(a0)
+ short_loop_war(1b)
bne a2, a0, 1b
addiu a0, a0, 4
@@ -48,6 +50,7 @@ start:
jr k0
nop
3:
+ short_loop_war(3b)
b 3b
nop
END(start)
The short_loop_war macro is designed to be placed just before the branch,
and it inserts NOPs as necessary for the R5900:
#ifdef CONFIG_CPU_R5900
.macro short_loop_war loop_target
.set instruction_count,2 + (. - \loop_target) / 4
.ifgt 7 - instruction_count
.rept 7 - instruction_count
nop
.endr
.endif
.endm
#else
.macro short_loop_war loop_target
.endm
#endif
> Well, BogoMIPS is just an arbitrary number.
So presumably the noreorder BogoMIPS variant can be replaced with a
single reorder variant that works with all MIPS implementations?
> There is a data dependency here, so manual delay slot scheduling was
> unavoidable. I'd suggest using this as a functional equivalent for the
> R5900:
>
> addiu a0,a0,1
> 1: addiu a0,a0,-1
> bnez a0,1b
>
> and avoiding the irregularity for a0==0.
Thanks!
> > I used the suggested patch, and recompiled the kernel with it, and verified
> > with the machine code tool that there were no undefined loops. Wouldn't that
> > be enough?
>
> It would break if GAS was invoked without `-mfix-r5900' (e.g. by using an
> explicit `-Wa,-mno-fix-r5900' option).
I see, hmm.
> > I tried to limit the case to branch delays only, which is a rough
> > approximation. Call and jump delay slots are acceptable. Are you referring
> > to this piece?
> >
> > ;; Can the instruction be put into a delay slot?
> > (define_attr "can_delay" "no,yes"
> > (if_then_else (and (eq_attr "type" "!branch,call,jump")
> > (eq_attr "hazard" "none")
> > (match_test "get_attr_insn_count (insn) == 1"))
> > (const_string "yes")
> > (const_string "no")))
>
> Yes. My suggestion does not preclude limiting the workaround to branches
> only while being precise about what the situation is, i.e. branches still
> require a delay slot instruction (unlike MIPS16 or uMIPS compact branches)
> but no real instruction is suitable. I'd prefer if GCC actually scheduled
> the required dummy NOP instruction explicitly.
OK.
Fredrik