Hi Maciej,

> > How can one reasonably prevent the bug when compiling a whole Linux
> > distribution with thousands of packages if GAS merely warns and proceeds
> > in many cases?
> 
>  I think the tools must not set a policy.  By using `.set noreorder' the 
> user told the toolchain he knows better and asked to keep its hands away.
> 
>  As I say you can use `-Werror' for code auditing.  And in most cases 
> manually scheduled delay slots in handcoded assembly are a sign of a 
> problem with the piece of code affected, regardless of the R5900 erratum.

Well, it seems practical to use unofficial tools and a patched GAS to fix
this assembly bug, then. It's a grave problem for the R5900 and it needs to
be reliably detected.

> > > > [ In theory, GAS could actually insert NOPs prior to the noreorder 
> > > > directive,
> > > > to make the loop longer that six instructions, but GAS does not have 
> > > > that
> > > > kind of capability. Another option that GCC prevents is to place a NOP 
> > > > in
> > > > the delay slot. ]
> > > 
> > >  Well, GAS does have that capability, although of course it is not 
> > > enabled 
> > > for `noreorder' code.
> > 
> > Hmm? I'm unable to make sense of that, since such NOPs are unaffected by
> > noreorder. This is what I meant:
> > 
> > loop:       addiu   $5,$5,1
> >     addiu   $4,$4,1
> >     lb      $2,-1($5)
> >     nop                     /* These two NOPs would prevent the */
> >     nop                     /* bug while preserving noreorder. */
> >     .set    noreorder
> >     .set    nomacro
> >     bne     $2,$3,loop
> >     sb      $2,-1($4)
> >     .set    macro
> >     .set    reorder
> 
>  See `nops_for_insn'.  However again, as I noted, `.set noreorder' tells 
> GAS not to analyse the dependencies for code within.  There is no need to 
> schedule this delay slot manually, and if this is generated code, then the 
> producer (GCC) should have taken care of the hazards, be it architectural 
> or errata.  If this is manually written code, then the author asked for 
> trouble.

I'm using the principle above to unobtrusively adjust problematic kernel
code, via a short_loop_war macro. Here is one patched instance:

--- a/arch/mips/boot/compressed/head.S
+++ b/arch/mips/boot/compressed/head.S
@@ -13,6 +13,7 @@
  */
 
 #include <asm/asm.h>
+#include <asm/asmmacro.h>
 #include <asm/regdef.h>
 
        .set noreorder
@@ -29,6 +30,7 @@ start:
        PTR_LA  a0, _edata
        PTR_LA  a2, _end
 1:     sw      zero, 0(a0)
+       short_loop_war(1b)
        bne     a2, a0, 1b
         addiu  a0, a0, 4
 
@@ -48,6 +50,7 @@ start:
        jr      k0
         nop
 3:
+       short_loop_war(3b)
        b       3b
         nop
        END(start)

The short_loop_war macro is designed to be placed just before the branch,
and it inserts NOPs as necessary for the R5900:

#ifdef CONFIG_CPU_R5900
        .macro  short_loop_war loop_target
        .set    instruction_count,2 + (. - \loop_target) / 4
        .ifgt   7 - instruction_count
        .rept   7 - instruction_count
        nop
        .endr
        .endif
        .endm
#else
        .macro  short_loop_war loop_target
        .endm
#endif

>  Well, BogoMIPS is just an arbitrary number.

So presumably the noreorder BogoMIPS variant can be replaced with a
single reorder variant that works with all MIPS implementations?

>  There is a data dependency here, so manual delay slot scheduling was 
> unavoidable.  I'd suggest using this as a functional equivalent for the 
> R5900:
> 
>       addiu   a0,a0,1
> 1:    addiu   a0,a0,-1
>       bnez    a0,1b
> 
> and avoiding the irregularity for a0==0.

Thanks!

> > I used the suggested patch, and recompiled the kernel with it, and verified
> > with the machine code tool that there were no undefined loops. Wouldn't that
> > be enough?
> 
>  It would break if GAS was invoked without `-mfix-r5900' (e.g. by using an 
> explicit `-Wa,-mno-fix-r5900' option).

I see, hmm.

> > I tried to limit the case to branch delays only, which is a rough
> > approximation. Call and jump delay slots are acceptable. Are you referring
> > to this piece?
> > 
> > ;; Can the instruction be put into a delay slot?
> > (define_attr "can_delay" "no,yes"
> >   (if_then_else (and (eq_attr "type" "!branch,call,jump")
> >                  (eq_attr "hazard" "none")
> >                  (match_test "get_attr_insn_count (insn) == 1"))
> >             (const_string "yes")
> >             (const_string "no")))
> 
>  Yes.  My suggestion does not preclude limiting the workaround to branches 
> only while being precise about what the situation is, i.e. branches still 
> require a delay slot instruction (unlike MIPS16 or uMIPS compact branches) 
> but no real instruction is suitable.  I'd prefer if GCC actually scheduled 
> the required dummy NOP instruction explicitly.

OK.

Fredrik

Reply via email to