On Saturday 18 June 2011, Nicolas Pitre wrote:
> On Fri, 17 Jun 2011, Arnd Bergmann wrote:

> I played with this a bit. 

It looks like much more than a bit, thanks for the detailed analysis!

> I think that GCC might have improved on this 
> front.  It at least doesn't seem to produce the much suboptimal code 
> compared to the plain C implementation as it used to do in the past.  
> See my explanation and experiments below.

If everything else fails, we can make it depend on the gcc version and
e.g. use the inline assembly with 4.5 and higher, while we use the C code
that used to be fine with older versions.

> > The function in question is 
> > 
> > #define __raw_writel(v,a) (*(volatile unsigned int __force   *)(a) = (v))
> > 
> > AFAICT, this should really be written as
> > 
> > static inline void __raw_writel(unsigned int v, volatile __iomem unsigned 
> > int *a)
> > {
> >     asm volatile("str %0, %1" : "=o" (*a) : "r" (v) : "memory");
> > }
> > 
> > What I don't know is whether this form is ideal in all cases (or even 
> > correct),
> > and what the constraints should be. 
> 
> The problem here is that the assembly constraint doesn't let us specify 
> which type of access this constraint is for.
> 
> Let's take the str instruction as an example.  It may use many different 
> addressing modes:
> 
>       str     r0, [r1]        # destination address in r1
>       str     r0, [r1, r2]    # base address in r1 with offset in r2
>       str     r0, [r1, #<i>]  # base address in r1 with an immediate offset 

I believe that they are all part of the "o" constraint, which includes all
offsettable memory addresses.

> The result is the same as gcc is smart enough to share the common iobase 
> and spread the immediate offsets. This also provides better type 
> checking.  But last time this was attempted, a large amount of code was 
> then complaining due to the fact that sometimes the address is provided 
> as a pointer, sometimes as an integer.  this deserves to be cleaned, but 
> this should really be orthogonal to this issue and the move to a static 
> inline be done first, and separately from the actual access 
> implementation.

Good point. A lot of architectures have strict type checking here already,
so I don't think this will be much of a problem.

>  Looks like GCC is doing the right thing here.  
> I can't say for sure, but I have a vague recollection of GCC doing it 
> wrong in some distant past.  At the time the only thing that seemed to 
> be OK in all cases with an inline asm implementation was:
> 
> static inline void writeh(unsigned int v, volatile unsigned short  *a)
> {
>         asm volatile ("strh\t%1, [%0]" : : "r" (a), "r" (v));
> }

This would better use the '"=Q" (a)' form, to tell the compiler that
we are also writing into a. We cannot use the "=m" form because that
contains the pre/post-increment variants though.

> which produced this:
> 
> foobar_init:
>         mov     r1, #0
>         strh    r1, [r0]
>         mov     ip, #0
>         add     r2, r0, #8
>         strh    ip, [r2]
>         add     r3, r0, #2048
>         strh    ip, [r3]
>         mvn     r2, #0
>         add     r1, r0, #4
>         strh    r2, [r1]
>         mvn     r3, #0
>         add     r0, r0, #12
>         strh    r3, [r0]
>         bx      lr
> 
> As you can see, this is far less optimal and larger.  And then GCC even 
> gets confused wrt liveness of the value to store despite using -O2.  
> Hence the implementation using pure C which produced far better code.

Yes, it would be good to avoid this.

> > If I understood Uli correctly, it should be possible to express this 
> > with gcc syntax, but of course there may be a catch somewhere.
> 
> Yes.  You don't want to use a memory clobber.  That tells GCC to forget 
> everything about what it has cached into registers and forces it to 
> reload everything from memory, including the iobase pointer if it comes 
> from a struct device's private data.

Well, in case of readl/writel, we need the memory clobber for other reasons:
They might trigger a DMA operation, or the readl might be used to wait for
completion of a DMA operation. In either case, you need to have a barrier
to prevent the reordering of the MMIO instruction w.r.t the DMA memory that
it is referring to. On in-order CPUs, this does not always require a
synchronisation instruction, but it strill requires a compiler barrier,
which is typically expressed as a full memory clobber.

As far as I can tell, we don't need the memory clobber in case of
readl_relaxed()/writel_relaxed(), but we need it for the non-relaxed
versions.

> As you can see, the iobase value is reloaded all the time needlessly.  
> Another recollection of mine is that a while ago GCC was considering 
> static inline functions like full memory barriers too, which doesn't 
> seem to be the case anymore.
> 
> But for the constraints the following should be best:
> 
> static inline void writel(unsigned int v, volatile unsigned int *a)
> {
>         asm volatile ("str\t%1, %0" : : "+o" (*a) : "r" (v));
> }
> 
> static inline unsigned int readl(volatile unsigned int *a)
> {
>       unsigned int v;
>         asm volatile ("ldr\t%1, %0" : : "+o" (*a), "=r" (v));
>       return v;
> }
> 
> The bidirectional qualifier indicates that we're going to modify the 
> memory pointed by a, and yet it is going to be different after that, 
> which pretty much matches how hardware registers behave.

I don't see how the "+" is needed here, because the compiler won't
be accessing the pointers otherwise, and doesn't have the option of
omitting any accesses either.

        Arnd

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to