Re: [RFC] timers, pointers to functions and type safety

2006-12-02 Thread Arnd Bergmann
On Friday 01 December 2006 18:21, Al Viro wrote:
>  There is a tempting
> possibility to do that gradually, with all intermediates still in working
> state, provided that on all platforms currently supported by the kernel
> unsigned long and void * are indeed passed the same way (again, only for
> current kernel targets and existing gcc versions).  Namely, we could
> * introduce SETUP_TIMER variant with cast to void (*)(unsigned long)
> * switch to its use, converting callback instances to take pointers
> at the same time.  That would be done in reasonably sized groups.
> * once it's over, flip ->function to void (*)(void *), ->data to
> void * and update SETUP_TIMER accordingly; deal with remaining few instances
> of ->function.

Another alternative might be to define an intermediate version of
timer_list that avoids adding new macros, like

struct timer_list {
	struct list_head entry;
	unsigned long expires;

	void (*function)(union { unsigned long l; void *p; }
			 __attribute__((transparent_union)));
	union {
		unsigned long data __deprecated;
		void *obj;
	};

	struct tvec_t_base_s *base;
};

Unfortunately, this relies more on gcc-specific behavior than we
may want, and it opens up new ways to abuse the interface, like defining
a callback that takes an unsigned long argument while passing the data
through the void *obj member.
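
For example, a hypothetical driver fragment (struct my_dev and the
function names are made up, just to illustrate the mismatch) that the
union would let through, and that only works because unsigned long and
void * happen to be passed identically on all current targets:

struct my_dev {
	struct timer_list timer;
	/* ... */
};

/* callback still written against the old unsigned long convention */
static void my_timeout(unsigned long arg)
{
	struct my_dev *dev = (struct my_dev *)arg;
	/* ... handle the timeout ... */
}

static void my_start_timer(struct my_dev *dev)
{
	dev->timer.function = (typeof(dev->timer.function))my_timeout;
	dev->timer.obj = dev;	/* but the data is stored as a pointer */
	add_timer(&dev->timer);
}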

Arnd <><


ARM unaligned MMIO access with attribute((packed))

2011-02-02 Thread Arnd Bergmann
As noticed by Peter Maydell, the EHCI device driver in Linux gets
miscompiled by some versions of arm-gcc (still need to find out which)
due to a combination of problems:

1. In include/linux/usb/ehci_def.h, struct ehci_caps is defined
with __attribute__((packed)), for no good reason. This is clearly
a bug and needs to get fixed, but other drivers have the same bug
and it used to work. The attribute forces byte access on all members
accessed through pointer dereference, which is not allowed on
MMIO accesses in general. The specific code triggering the problem
in Peter's case is in ehci-omap.c:
	omap->ehci->regs = hcd->regs
		+ HC_LENGTH(readl(&omap->ehci->caps->hc_capbase));


2. The ARM version of the readl() function is implemented as a macro
doing a direct pointer dereference with a typecast:

#define __raw_readl(a)		(__chk_io_ptr(a), *(volatile unsigned int __force *)(a))
#define readl_relaxed(c)	({ u32 __v = le32_to_cpu((__force __le32) \
					__raw_readl(__mem_pci(c))); __v; })
#define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })

On other architectures, readl() is implemented using inline assembly,
specifically to prevent gcc from issuing anything but a single 32-bit
load instruction. readl() only makes sense on aligned memory, so a
misaligned pointer argument should cause a trap anyway.

3. gcc does not seem to clearly define what happens when casting between
aligned and packed pointers. In this case, the original pointer is packed
(byte aligned), while the access is done through a 32-bit aligned
volatile unsigned int pointer. In gcc-4.4, casting from "unsigned int
__attribute__((packed))" to "volatile unsigned int" resulted in a single
32-bit aligned access, while casting to "unsigned int" (without volatile)
resulted in four byte accesses. gcc-4.5 seems to have changed this to
always do byte accesses in both cases, but still does not document the
behavior. (need to confirm this).
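
A reduced example of the cast in question (hypothetical struct, not the
actual ehci definition):

struct caps {
	unsigned int hc_capbase;	/* 32-bit aligned at run time */
} __attribute__((packed));

unsigned int read_capbase(struct caps *caps)
{
	/*
	 * The packed attribute gives &caps->hc_capbase an alignment of 1,
	 * so whether this compiles to a single ldr or to four ldrb
	 * instructions depends on how gcc treats the cast.
	 */
	return *(volatile unsigned int *)&caps->hc_capbase;
}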

I would suggest fixing this by:

1. auditing all uses of __attribute__((packed)) in the Linux USB code
and other drivers, removing the ones that are potentially harmful.

2. Changing the ARM MMIO functions to use inline assembly instead of
direct pointer dereference.

3. Documenting the gcc behavior as undefined.

Other suggestions?

Arnd


Re: ARM unaligned MMIO access with attribute((packed))

2011-02-02 Thread Arnd Bergmann
On Wednesday 02 February 2011 17:37:02 Russell King - ARM Linux wrote:
> We used to use inline assembly at one point, but that got chucked out.
> The problem is that using asm() for this causes GCC to generate horrid
> code.
> 
> 1. there's no way to tell GCC that the inline assembly is a load
>instruction and therefore it needs to schedule the following
>instructions appropriately.
> 
> 2. GCC will needlessly reload pointers from structures and other such
>behaviour because it can't be told clearly what the inline assembly
>is doing, so the inline asm needs to have a "memory" clobber.
> 
> 3. It seems to miss out on using pre-indexed addressing, preferring to
>create add/sub instructions prior to each inline assembly load/store.
> 
> 4. There are no (documented) constraints in GCC to allow you to represent
>the offset format for the half-word instructions.
> 
> Overall, it means greater register pressure, more instructions, larger
> functions, greater instruction cache pressure, etc.

Another solution would be to declare the readl function extern and define
it out of line, but I assume that this would be at least as bad as inline
assembly on all the points you brought up, right?
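
For reference, a minimal sketch (not the code that was removed back then)
of what such an asm-based readl might look like, including the "memory"
clobber you mention:

static inline u32 asm_readl(const volatile void __iomem *addr)
{
	u32 val;

	/* the "memory" clobber is what forces gcc to reload pointers
	   around every access, as described in points 1 and 2 above */
	asm volatile("ldr %0, [%1]" : "=r" (val) : "r" (addr) : "memory");
	return val;
}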

Would it be possible to add the proper constraints for defining readl
in an efficient way to a future version of gcc? That wouldn't help us
in the near future, but we could at some point start using them in a
number of places.

Arnd


Re: ARM unaligned MMIO access with attribute((packed))

2011-02-03 Thread Arnd Bergmann
On Thursday 03 February 2011 00:08:01 Måns Rullgård wrote:
> > But you really need that memory clobber there whether you like it or
> > not, see above.
> 
> I don't know of any device where the side-effects are not explicitly
> indicated by other means in the code triggering them, so it probably
> is safe without the clobber, as Russell says.

On configurations that have CONFIG_ARM_DMA_MEM_BUFFERABLE set, this is
definitely true, since they use rmb() and wmb(), which include both an
I/O memory barrier instruction where required and a compiler barrier
(i.e. __asm__ __volatile__ ("" : : : "memory")):

8<-- from arch/arm/include/asm/io.h

#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#define __iormb()   rmb()
#define __iowmb()   wmb()
#else
#define __iormb()   do { } while (0)
#define __iowmb()   do { } while (0)
#endif

#define readb(c)	({ u8  __v = readb_relaxed(c); __iormb(); __v; })
#define readw(c)	({ u16 __v = readw_relaxed(c); __iormb(); __v; })
#define readl(c)	({ u32 __v = readl_relaxed(c); __iormb(); __v; })

#define writeb(v,c) ({ __iowmb(); writeb_relaxed(v,c); })
#define writew(v,c) ({ __iowmb(); writew_relaxed(v,c); })
#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

8<---

Also, as Russell mentioned, anything using the streaming DMA mapping API
is fine because of the barriers included in the function calls there.

However, I would think that this fictional piece of code would be valid
for a possible PCI device driver (though inefficient) and not require
any additional synchronizing operations:

void foo_perform_operation(struct foo_dev *d, u32 in, u32 *out)
{
	dma_addr_t dma_addr;
	u32 *cpu_addr;

	/*
	 * get memory from the consistent DMA mapping API, typically
	 * uncached memory on ARM, but could be anywhere if the DMA
	 * is coherent.
	 */
	cpu_addr = dma_alloc_coherent(&d->dev, sizeof(*cpu_addr),
				      &dma_addr, GFP_KERNEL);

	/*
	 * lock the device, not required for the example, but normally
	 * needed in practice for SMP operation.
	 */
	spin_lock(&d->lock);

	/* initialize the DMA data */
	*cpu_addr = in;

	/*
	 * send a posted 32 bit write to the device, triggering the
	 * start of the DMA read from *cpu_addr, which is followed by
	 * a DMA write to *cpu_addr. writel includes a barrier that
	 * makes sure that the previous store to *cpu_addr is visible
	 * to the DMA, but does not block until the completion like
	 * outl() would.
	 */
	writel(dma_addr, d->mmio_addr);

	/*
	 * synchronize the outbound posted write, wait for the device
	 * to complete the DMA and synchronize the DMA data on its
	 * inbound path.
	 */
	(void)readl(d->mmio_addr);

	/*
	 * *cpu_addr contains data just written to by the device, and
	 * the readl includes all the necessary barriers to ensure
	 * it's really there when we access it.
	 */
	*out = *cpu_addr;

	/* unlock the device */
	spin_unlock(&d->lock);

	/* free the DMA memory */
	dma_free_coherent(&d->dev, sizeof(*cpu_addr), cpu_addr, dma_addr);
}

However, if readl contains no explicit or implicit synchronization, the
load from *cpu_addr might get pulled in front of the load from mmio_addr,
resulting in invalid output data. If that is the case, it would be a
problem on almost all architectures (though not x86, powerpc or sparc64).

Am I missing something here?

Arnd


Re: ARM unaligned MMIO access with attribute((packed))

2011-02-03 Thread Arnd Bergmann
On Wednesday 02 February 2011, Russell King - ARM Linux wrote:
> On Wed, Feb 02, 2011 at 05:00:20PM +0100, Arnd Bergmann wrote:
> > I would suggest fixing this by:
> >
> > 2. Changing the ARM MMIO functions to use inline assembly instead of
> > direct pointer dereference.
>
> We used to use inline assembly at one point, but that got chucked out.
> The problem is that using asm() for this causes GCC to generate horrid
> code.

Here is an alternative suggestion, would that work?

8<---
arm: avoid unaligned arguments to MMIO functions

The readw/writew/readl/writel functions require naturally aligned
arguments, which are accessed only using atomic load and store
instructions, never using byte or read-modify-write accesses.

At least one driver (ehci-hcd) annotates its MMIO registers as
__attribute__((packed)), which causes some versions of gcc to generate
byte accesses, due to undefined behavior when casting a pointer to a
packed u32 to a normally aligned u32 pointer.

There does not seem to be an efficient way to force gcc to do word
accesses, but we can detect the problem and refuse to build broken
code: add a check to these functions to ensure that their arguments
are either naturally aligned or void pointers.

Signed-off-by: Arnd Bergmann 
---

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 20e0f7c..b98f0bc 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -180,17 +180,36 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
  * IO port primitives for more information.
  */
 #ifdef __mem_pci
+
+/*
+ * Ensure natural alignment of arguments:
+ * It is not allowed to pass a pointer to a packed variable into
+ * the readl/writel family of functions, because gcc may decide
+ * to create byte accesses that are illegal on multi-byte MMIO
+ * registers.
+ * A lot of code uses void pointers, which is fine.
+ */
+#define __checkalign(p) BUILD_BUG_ON(					\
+	!__builtin_types_compatible_p(__typeof__(p), void *) &&	\
+	(__alignof__(*(p)) < sizeof(*(p))))
+
 #define readb_relaxed(c) ({ u8  __v = __raw_readb(__mem_pci(c)); __v; })
-#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16) \
-					__raw_readw(__mem_pci(c))); __v; })
-#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32) \
-					__raw_readl(__mem_pci(c))); __v; })
+#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16)	\
+					__raw_readw(__mem_pci(c)));	\
+			    __checkalign(c); __v; })
+#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32)	\
+					__raw_readl(__mem_pci(c)));	\
+			    __checkalign(c); __v; })
 
 #define writeb_relaxed(v,c)	((void)__raw_writeb(v,__mem_pci(c)))
-#define writew_relaxed(v,c)	((void)__raw_writew((__force u16) \
-					cpu_to_le16(v),__mem_pci(c)))
-#define writel_relaxed(v,c)	((void)__raw_writel((__force u32) \
-					cpu_to_le32(v),__mem_pci(c)))
+#define writew_relaxed(v,c)	do { (void)__raw_writew((__force u16)	\
+					cpu_to_le16(v),__mem_pci(c));	\
+				     __checkalign(c);			\
+				} while (0)
+#define writel_relaxed(v,c)	do { (void)__raw_writel((__force u32)	\
+					cpu_to_le32(v),__mem_pci(c));	\
+				     __checkalign(c);			\
+				} while (0)
 
 #ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
 #define __iormb()	rmb()
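
To illustrate what the new check is meant to catch, here is a
hypothetical caller (struct and function names made up); the first call
is the kind of use the check is intended to reject at build time, while
the other two should be unaffected:

struct packed_regs {
	u32 status;
} __attribute__((packed));

void checkalign_example(struct packed_regs __iomem *pregs,
			u32 __iomem *reg, void __iomem *base)
{
	u32 v;

	v = readl_relaxed(&pregs->status);	/* intended to trip the BUILD_BUG_ON */
	v = readl_relaxed(reg);			/* naturally aligned u32 pointer: ok */
	v = readl_relaxed(base);		/* void pointer: always allowed */
}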


Re: X32 psABI status

2011-02-13 Thread Arnd Bergmann
On Saturday 12 February 2011 20:41:01 H.J. Lu wrote:
> Hi,
> 
> We made lots of progress on the x32 psABI:
> 
> https://sites.google.com/site/x32abi/
> 
> 1. Kernel interface with syscall is close to being finalized.

Really? I haven't seen this being posted for review yet ;-)

The basic concept looks entirely reasonable to me, but I'm
curious what drove the decision to start out with the x86_64
system calls instead of the generic ones.

Since tile was merged, we now have support for compat syscalls
in the generic syscall ABI. I would have assumed that it
was possible to just use those if you decide to do a new
ABI in the first place.

The other option that would have appeared natural to me is
to just use the existing 32 bit compat ABI with the few
necessary changes done based on the personality.

Arnd


Re: X32 psABI status

2011-02-13 Thread Arnd Bergmann
On Sunday 13 February 2011, H. Peter Anvin wrote:
> The actual idea is to use the i386 compat ABI for memory layout, but
> with a 64-bit register convention.  That means that system calls that
> don't make references to memory structures can simply use the 64-bit
> system calls, otherwise we're planning to reuse the i386 compat system
> calls, but invoke them via the syscall instruction (which requires a new
> system call table) and to pass 64-bit arguments in single registers.

As far as I know, any task can already call both the 32 and 64 bit syscall
entry points on x86. Is there anything you can't do just as well by
using a combination of the two methods, without introducing a third one?

Arnd


Re: X32 psABI status

2011-02-13 Thread Arnd Bergmann
On Sunday 13 February 2011, H. Peter Anvin wrote:
> We prototyped using the int $0x80 system call entry point.  However,
> there are two disadvantages:
> 
> a. the int $0x80 instruction is much slower than syscall.  An actual
>i386 process can use the syscall instruction which is disambiguated
>by the CPU based on mode, but an x32 process is in the same CPU mode
>as a normal 64-bit process.

Well, you could simply change entry.S so that syscalls with high numbers
have the same effect as int $0x80, without introducing another table
to solve this.

> b. 64-bit arguments have to be split between two registers for the
>i386 entry points, requiring user-space stubs.

64 bit arguments are very rare, and most of those syscalls are not
performance critical, so this could be dealt with on a case-by-case
basis, possibly by introducing a new syscall number for a variant that
passes the 64 bit argument in a single register.

> All in all, the cost of an extra system call table is quite modest.

The memory size overhead may be small, but auditing another table
for every change could become a noticeable burden (yours though, not mine).

> The cost of an entire different ABI layer (supporting a new memory layout)
> would be enormous, a.k.a. "not worth it", which is why the memory layout
> of kernel objects needs to be compatible with i386.

Right, this makes sense, you certainly can't redefine all the data
structures. 

What would probably be a good idea is to compare the set of syscalls
in X32 and asm-generic, and to either eliminate or document the
differences. You can probably even take the asm-generic syscall numbers,
even if you keep the i386 data structures.

Arnd


Re: ARM unaligned MMIO access with attribute((packed))

2011-04-27 Thread Arnd Bergmann
On Wednesday 27 April 2011 18:25:40 Alan Stern wrote:
> On Wed, 27 Apr 2011, Rabin Vincent wrote:
> 
> > On Wed, Apr 27, 2011 at 00:21, Alan Stern  wrote:
> > > On Tue, 26 Apr 2011, Rabin Vincent wrote:
> > >> In my case it's this writel() in ehci-hub.c that gets chopped into
> > >> strbs:
> > >>
> > >>       /* force reset to complete */
> > >>       ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET),
> > >>                   status_reg);
> > >
> > > Why would that get messed up?  The status_reg variable doesn't have any
> > > __attribute__((packed)) associated with it.
> > 
> > The initialization of status_reg is:
> > 
> >   u32 __iomem *status_reg
> >   = &ehci->regs->port_status[(wIndex & 0xff) - 1];
> > 
> > where ehci->regs is a pointer to the packed struct ehci_regs.  So, this
> > is the same problem of casting pointers to stricter alignment.
> 
> Right.  I can understand the compiler complaining about the cast to 
> stricter alignment during the initialization.  But I don't understand 
> why that would affect the code generated for the writel function.

The compiler does not complain, it just silently assumes that it needs
to do byte accesses. There is no way to tell the compiler to ignore
what it knows about the alignment, other than using inline assembly
for the actual pointer dereference. Most architectures today do that,
but on ARM it comes down to "*(u32 *)status_reg = temp".
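
As a reduced sketch of what happens in Rabin's case (hypothetical struct
name, only the relevant member kept):

struct regs {
	u32 port_status[8];
} __attribute__((packed));

void clear_reset(struct regs __iomem *regs, u32 temp)
{
	/* status_reg is a plain u32 pointer, but gcc still knows it was
	   derived from a packed struct member */
	u32 __iomem *status_reg = &regs->port_status[0];

	/* on ARM this is just *(volatile u32 *)status_reg = temp, which
	   gcc may split into four strb instructions */
	writel(temp, status_reg);
}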

Arnd


Re: ARM unaligned MMIO access with attribute((packed))

2011-04-28 Thread Arnd Bergmann
On Thursday 28 April 2011, Alan Stern wrote:
> > The compiler does not complain, it just silently assumes that it needs
> > to do byte accesses. There is no way to tell the compiler to ignore
> > what it knows about the alignment, other than using inline assembly
> > for the actual pointer dereference. Most architectures today do that,
> > but on ARM it comes down to "*(u32 *)status_reg = temp".
> 
> Ah -- so the compiler associates the alignment attribute with the data 
> value and not with the variable's type?  I didn't know that.

The behavior here is unspecified because the underlying typecast
is not valid. Gcc apparently uses some heuristics to try to do
the right thing, and in recent versions that heuristic seems to
have changed.

Arnd


Re: X32 project status update

2011-05-21 Thread Arnd Bergmann
On Saturday 21 May 2011 17:01:33 H.J. Lu wrote:
> This is the x32 project status update:
> 
> https://sites.google.com/site/x32abi/
> 

I've had another look at the kernel patch. It basically
looks all good, but the system call table appears to
diverge from the x86_64 list for no (documented) reason,
in the calls above 302. Is that intentional?

I can see why you might want to keep the numbers identical,
but if they are already different, why not use the generic
system call table from asm-generic/unistd.h for the new
ABI?

Arnd


Re: [RFC patch] spindep: add cross cache lines checking

2012-03-07 Thread Arnd Bergmann
On Wednesday 07 March 2012, Alex Shi wrote:

> Understood. Thanks. So is the following check what you wanted?
> ===
> diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
> index bc2994e..64828a3 100644
> --- a/include/linux/rwlock.h
> +++ b/include/linux/rwlock.h
> @@ -21,10 +21,12 @@
>  do { \
>   static struct lock_class_key __key; \
>   \
> + BUILD_BUG_ON(__alignof__(lock) == 1);   \
>   __rwlock_init((lock), #lock, &__key);   \
>  } while (0)
>  #else
>  # define rwlock_init(lock)   \
> + BUILD_BUG_ON(__alignof__(lock) == 1);   \
>   do { *(lock) = __RW_LOCK_UNLOCKED(lock); } while (0)
>  #endif

I think the check should be (__alignof__(lock) < __alignof__(rwlock_t));
otherwise it will still pass when you have a structure with
__attribute__((packed,aligned(2))).

> 1. Is this an alignof bug in the default gcc on my fc15 and Ubuntu 11.10 etc.?
> 
> struct sub {
> int  raw_lock;
> char a;
> };
> struct foo {
> struct sub z;
> int slk;
> char y;
> }__attribute__((packed));
> 
> struct foo f1;
> 
> __alignof__(f1.z.raw_lock) is 4, but its address can actually be aligned
> on one byte.

That looks like correct behavior, because the alignment of raw_lock inside of
struct sub is still 4. But it does mean that there can be cases where the
compile-time check is not sufficient, so we might want the run-time check
as well, at least under some config option.
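
A possible sketch of such a run-time check (the macro name is made up,
not an existing kernel interface), warning when a lock straddles a
cache line:

#define check_lock_cacheline(lock)					\
	WARN_ON_ONCE(((unsigned long)(lock) % SMP_CACHE_BYTES) +	\
		     sizeof(*(lock)) > SMP_CACHE_BYTES)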

Arnd


Re: Deprecation/removal of nios2 target support

2024-04-18 Thread Arnd Bergmann via Gcc
On Thu, Apr 18, 2024, at 17:44, Joseph Myers wrote:
> On Wed, 17 Apr 2024, Sandra Loosemore wrote:
>
>> Therefore I'd like to mark Nios II as obsolete in GCC 14 now, and remove
>> support from all toolchain components after the release is made.  I'm not 
>> sure
>> there is an established process for obsoleting/removing support in other
>> components; besides binutils, GDB, and GLIBC, there's QEMU, newlib/libgloss,
>> and the Linux kernel.  But, we need to get the ball rolling somewhere.
>
> CC:ing Arnd Bergmann regarding the obsolescence in the Linux kernel.

We have not yet marked nios2 as deprecated in the kernel, but that
is mostly because the implementation does not get in the way too
much and Dinh Nguyen is still around as a maintainer and merging
bugfixes.

Almost all nios2 kernel changes I have seen in the past decade were
done blindly, without testing on hardware, either as part of treewide
changes or by code inspection. The only notable exceptions I could
find are from Andreas Oetken and Bernd Weiberg at Siemens and
from Marek Vasut (all added to Cc in case they have something to add).

We should probably remove nios2 from the kernel in the near future,
but even if we decide not to, I think deprecating it from gcc is the
right idea: If there are a few remaining users that still plan
to update their kernels, gcc-14 will still be able to build new
kernels for several years.

 Arnd