RE: Memory corruption due to word sharing
On Mit, 2012-02-01 at 21:04 +, Boehm, Hans wrote: [...]

> The C11 memory model potentially adds overhead in only two cases:
>
> 1. When current code involves touching a field that wouldn't otherwise
> be touched. There are odd cases in which this measurably slows down
> code, but I think all agree that we need it. In addition to
> bitfields, it can affect speculatively promoting a value to a register
> in a loop, which at least older versions of gcc also do.

What about just adding an -f option for this and/or activating it only at -O5 (or whatever the highest level is), and - in case that feature is activated - emitting warnings if bitfields (and/or any other data types that might be affected) are involved?

Kind regards,
	Bernd
--
Bernd Petrovitsch  Email : be...@petrovitsch.priv.at
LUGA : http://www.luga.at
Re: Memory corruption due to word sharing
On Wed, 1 Feb 2012, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 9:41 AM, Michael Matz wrote: > > > > One problem is that it's not a new problem, GCC emitted similar code since > > about forever, and still they turned up only now (well, probably because > > ia64 is dead, but sparc64 should have similar problems). The bitfield > > handling code is _terribly_ complex and fixing it is quite involved. So > > don't expect any quick fixes. > > I agree that bitfields are nasty, I've had to handle them myself in > sparse. And we have actually had problems with bitfields before, to > the point where we've replaced use of bitfields with masking and > setting bits by hand. > > But I also think that gcc is simply *buggy*, and has made them much > nastier than they should be. What gcc *should* have done is to turn > bitfield accesses into shift-and-masking of the underlying field as > early as possible, and then do all optimizations at that level. > > In fact, there is another gcc bug outstanding (48696) where I complain > about absolutely horrible code generation, and that one was actually > the exact same issue except in reverse: gcc wouldn't take the > underlying size of the bitfield into account, and use the wrong > (smaller) size for the access, causing absolutely horrendous code > generation that mixes byte and word accesses, and causes slowdowns by > orders of magnitude. Yeah, sorry for dropping the ball on this one (and missing my original GCC 4.7 target ...) > And it really is the same issue: gcc has forgotten what the base type > is, and tries to "compute" some type using the actual bits. Which is > simply *wrong*. Always has been. > > It's stupid, it generates bad code (both from performance *and* from a > correctness angle), and it has actually resulted in gcc having *extra* > complexity because it keeps the bitfield data around much too late. > > > The other problem is specification. 
While you think that the code you > > wrote is totally obvious it might not actually be so. For instance, what > > about this struct: > > > > {long l:32; int i1:16; short s; int i2:1; char c:7; short s2:8; short s3;} > > > > What are the obviously correct accesses for various writes into this > > struct? > > I really don't think that example matters. You should instead turn the > question around, and look at the real code examples, make *those* > generate good and obviously correct code, and then *after* you've done > that, you start looking at the mixed case where people do odd things. Well, it matters because it shows that simply using the declared type isn't going to work in the end. What we instead need to do is remember the underlying objects the bitfield packing algorithm uses. It then simply becomes obvious how to implement the bitfield accesses. I hope to get back to this for GCC 4.8. > Quite frankly, the layout that makes *sense* for something like the > above is not to pack them. You'd just do Heh, but of course the ABI specifies they are supposed to be packed ... In the end we all agree GCC does something nasty (and I would call it a bug even), but any solution we find in GCC won't be backportable to earlier releases so you have to deal with the GCC bug for quite some time and devise workarounds in the kernel. You'll hit the bug for all structure fields that share the largest aligned machine word with a bitfield (thus the size depends on the alignment of the full object, not that of the struct containing the bitfield in case that struct is nested inside another more aligned one). This situation should be easily(?) detectable with sparse. Thanks, Richard.
Re: Memory corruption due to word sharing
On Wed, 1 Feb 2012, Linus Torvalds wrote:

> Just out of morbid curiosity, what happens if you have totally
> *separate* variables that just happen to link together? IOW, something
> like
>
>    static struct { unsigned bit:1; } onebit;
>    static volatile int var;
>
> and they just *happen* to link next to each other (because they were
> declared next to each other) in the same 8-byte aligned block?

Just to make you run away screaming ...

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48124

where exactly such a situation happens ...

Richard.
Re: Memory corruption due to word sharing
On Wed, 1 Feb 2012, Linus Torvalds wrote:

> IT IS A GCC BUG.

I think you are wrong when you assume that we think it is not a bug. Repeating it in all-caps doesn't make the fix appear any faster, though.

Richard.
Re: Dealing with compilers that pretend to be GCC
Hi,

Chris Lattner writes:

> On Jan 31, 2012, at 5:15 AM, Ludovic Courtès wrote:
>
>>> Interestingly enough:
>>> $ cat q.c
>>> __has_builtin
>>> $ clang -E q.c
>>
>> Yes, that’s what I was asking.
>>
>> It makes me think that the old CPP predicates (info "(gcc) Obsolete
>> Features") would be more appropriate than compiler magic, with the
>> caveat that they’re presumably not widely supported.
>
> They are similar in spirit. The major difference between the two is that it
> is easy for __has_feature and friends to gracefully degrade when a compiler
> doesn't support them, because you can do:
>
> #ifndef __has_builtin
> #define __has_builtin(x) 0
> #endif
>
> in your code, but you can't do anything like this for #assertion. That and
> assertions don't have any coverage over the features that you're interested
> in, and the grammar doesn't have a good way to handle multiple cases like
> features/attributes/extensions/etc.

Rather than assertions, one could use predicates:

#if #has_builtin(foo) && #has_attribute(bar)
...
#endif

The difference being that (1) predicates were designed specifically for this purpose, and (2) there’s no magic involved.

Thanks,
Ludo’.
Re: Dealing with compilers that pretend to be GCC
ludovic.cour...@inria.fr (Ludovic Courtès) writes:

> Rather than assertions, one could use predicates:
>
> #if #has_builtin(foo) && #has_attribute(bar)
> ...
> #endif
>
> The difference being that (1) predicates were designed specifically for
> this purpose, and (2) there’s no magic involved.

but ... predicates are (1) less portable, since they involve magic syntax (whereas it's obvious how to make "magic macros" portable) and (2) deprecated even in gcc...

-miles

--
You can hack anything you want, with TECO and DDT.
Re: Memory corruption due to word sharing
On 1 February 2012 15:19, Jan Kara wrote:
> Hello,
>
> we've spotted the following mismatch between what kernel folks expect
> from a compiler and what GCC really does, resulting in memory corruption on
> some architectures. Consider the following structure:
>
> struct x {
>     long a;
>     unsigned int b1;
>     unsigned int b2:1;
> };
>
> We have two processes P1 and P2 where P1 updates field b1 and P2 updates
> bitfield b2. The code GCC generates for b2 = 1 e.g. on ia64 is:
>
>    0:   09 00 21 40 00 21   [MMI]   adds r32=8,r32
>    6:   00 00 00 02 00 e0           nop.m 0x0
>    c:   11 00 00 90                 mov r15=1;;
>   10:   0b 70 00 40 18 10   [MMI]   ld8 r14=[r32];;
>   16:   00 00 00 02 00 c0           nop.m 0x0
>   1c:   f1 70 c0 47                 dep r14=r15,r14,32,1;;
>   20:   11 00 38 40 98 11   [MIB]   st8 [r32]=r14
>   26:   00 00 00 02 00 80           nop.i 0x0
>   2c:   08 00 84 00                 br.ret.sptk.many b0;;
>
> Note that gcc used a 64-bit read-modify-write cycle to update b2. Thus if P1
> races with P2, the update of b1 can get lost. BTW: I've just checked on x86_64
> and there GCC uses an 8-bit bitop to modify the bitfield.
>
> We actually spotted this race in practice in btrfs on structure
> fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> corrupted due to update of the following bitfield, and there seem to be other
> places in the kernel where this could happen.
>
> I've raised the issue with our GCC guys and they said to me that: "C does
> not provide such guarantee, nor can you reliably lock different
> structure fields with different locks if they share naturally aligned
> word-size memory regions. The C++11 memory model would guarantee this,
> but that's not implemented nor do you build the kernel with a C++11
> compiler."
>
> So it seems what C/GCC promises does not quite match with what the kernel
> expects. I'm not really an expert in this area so I wanted to report it
> here so that more knowledgeable people can decide how to solve the issue...
>
> Honza
> --
> Jan Kara
> SUSE Labs, CR

What is the recommended work around for this problem?
Re: Memory corruption due to word sharing
On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote:
> We actually spotted this race in practice in btrfs on structure
> fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> corrupted due to update of following bitfield and there seem to be other
> places in kernel where this could happen.

Here's the list of structures where a bitfield is shared with a spinlock or atomic/kref within an 8B word, generated from 3.3-rc2:

spinlock+bitfield:

Struct: struct ak4113; Field: init
Struct: struct ak4114; Field: init
Struct: struct ak4117; Field: init
Struct: struct btrfs_block_rsv; Field: full
Struct: struct cm109_dev; Field: buzzer_pending
Struct: struct pch_udc_dev; Field: active
Struct: struct rds_iw_device; Field: dma_local_lkey
Struct: struct sierra_intf_private; Field: suspended
Struct: struct sm501_gpio; Field: registered
Struct: struct unix_sock; Field: gc_candidate
Struct: struct usb_anchor; Field: poisoned
Struct: struct usb_wwan_intf_private; Field: suspended

atomic/kref+bitfield:

Struct: struct dlm_lock_resource; Field: migration_pending
Struct: struct extent_map; Field: in_tree
Struct: struct kobject; Field: state_initialized
Struct: struct page; Field: inuse
Struct: struct rds_ib_connection; Field: i_flowctl
Struct: struct rds_iw_connection; Field: i_flowctl
Struct: struct sctp_transport; Field: dead
Struct: struct transaction_s; Field: t_synchronous_commit
Struct: struct xfs_ioend; Field: io_isasync

Not all listed structs are necessarily subject to the bug. There may be another mechanism preventing concurrent access to the bitfield and spinlock/atomic, or the bitfield is modified from a single cpu, or is not used. But all of them need to be reviewed of course.

david
Re: Memory corruption due to word sharing
* Linus Torvalds wrote:

> [...]
>
> And I realize that compiler people tend to think that loop
> hoisting etc is absolutely critical for performance, and some
> big hammer like "barrier()" makes a compiler person wince. You
> think it results in horrible code generation problems.
>
> It really doesn't. Loops are fairly unusual in the kernel to
> begin with, and the compiler barriers are a total non-issue.
> We have much more problems with the actual CPU barriers that
> can be *very* expensive on some architectures, and we work a
> lot at avoiding those and avoiding cacheline ping-pong issues
> etc.

Just to underline this point: if barriers caused optimization problems when GCC builds the kernel, then we'd expect to see various code generation problems - for example, the compiler would not be able to cache things well enough and reorder code to make it faster and (often) more compact.

So to test the effect of Linus's claim I picked up a fairly bleeding-edge version of GCC:

  gcc version 4.7.0 20120112 (Red Hat 4.7.0-0.6) (GCC)

and performed a test build of the kernel with the majority of optimization barriers removed (using the v3.2 kernel, x86 defconfig, 64-bit, -O2 optimization level): 1600 barriers were removed (!) and GCC's hands were thus freed to create more optimal code [and a very broken kernel], if it could.

I compared the resulting kernel image to an unmodified kernel image:

     text     data      bss       dec     hex filename
  9781555   982328  1118208  11882091  b54e6b vmlinux.vanilla
  9780972   982328  1118208  11881508  b54c24 vmlinux.no-barriers

So the barriers are costing us only a ~0.006% size increase - 583 bytes on an almost 10 MB kernel image. To put that into perspective: a *single* inline-function inlining decision by the compiler has a larger effect than that. Just a couple of days ago we uninlined a function, which had an order of magnitude larger effect than this.

The other possible dimension would be the ordering of instructions.
To test for that effect I disassembled the two kernel images and performed a function-by-function, instruction-by-instruction comparison of instruction ordering. The summary is that GCC was able to remove only 86 instructions (0.005%) and reordered around 2400 instructions (0.15%) - out of about 1,570,000 instructions.

Or, put differently, for the 1600 barriers in this particular kernel build, there's about 1.5 instructions reordered and 0.05 instructions removed per barrier.

I also inspected the type of reordering: the overwhelming majority of it happened within a jump-free basic block of instructions and did not affect any loops.

Thus much of the effect of barriers in the kernel is exactly the crucial effect we want them to have - forcing a specific program-order sequence - and in the process the barrier()s cause only very, very small optimization-quality side effects.

So the numbers support Linus's claim: the kernel incurs very little optimization cost from barriers.

Thanks,

	Ingo
Re: Memory corruption due to word sharing
On Thu, 2 Feb 2012, David Sterba wrote:

> On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote:
> > We actually spotted this race in practice in btrfs on structure
> > fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> > corrupted due to update of following bitfield and there seem to be other
> > places in kernel where this could happen.
>
> Here's the list of structures where a bitfield is shared with spinlock
> or atomic/kref within an 8B word, generated from 3.3-rc2:
>
> spinlock+bitfield:
>
> Struct: struct ak4113; Field: init
> Struct: struct ak4114; Field: init
> Struct: struct ak4117; Field: init
> Struct: struct btrfs_block_rsv; Field: full
> Struct: struct cm109_dev; Field: buzzer_pending
> Struct: struct pch_udc_dev; Field: active
> Struct: struct rds_iw_device; Field: dma_local_lkey
> Struct: struct sierra_intf_private; Field: suspended
> Struct: struct sm501_gpio; Field: registered
> Struct: struct unix_sock; Field: gc_candidate
> Struct: struct usb_anchor; Field: poisoned
> Struct: struct usb_wwan_intf_private; Field: suspended
>
> atomic/kref+bitfield:
>
> Struct: struct dlm_lock_resource; Field: migration_pending
> Struct: struct extent_map; Field: in_tree
> Struct: struct kobject; Field: state_initialized
> Struct: struct page; Field: inuse
> Struct: struct rds_ib_connection; Field: i_flowctl
> Struct: struct rds_iw_connection; Field: i_flowctl
> Struct: struct sctp_transport; Field: dead
> Struct: struct transaction_s; Field: t_synchronous_commit
> Struct: struct xfs_ioend; Field: io_isasync
>
> Not all listed structs are necessarily subject to the bug. There may be
> another mechanism preventing concurrent access to the bitfield and
> spinlock/atomic, or the bitfield is modified from a single cpu, or is
> not used. But all of them need to be reviewed of course.

To hit this bug the containing objects also have to be at least 8-byte aligned.

Richard.
Re: Memory corruption due to word sharing
On Thu, 2 Feb 2012, James Courtier-Dutton wrote:

> On 1 February 2012 15:19, Jan Kara wrote:
> > Hello,
> >
> > we've spotted the following mismatch between what kernel folks expect
> > from a compiler and what GCC really does, resulting in memory corruption on
> > some architectures. Consider the following structure:
> >
> > struct x {
> >     long a;
> >     unsigned int b1;
> >     unsigned int b2:1;
> > };
> >
> > We have two processes P1 and P2 where P1 updates field b1 and P2 updates
> > bitfield b2. The code GCC generates for b2 = 1 e.g. on ia64 is:
> >
> >    0:   09 00 21 40 00 21   [MMI]   adds r32=8,r32
> >    6:   00 00 00 02 00 e0           nop.m 0x0
> >    c:   11 00 00 90                 mov r15=1;;
> >   10:   0b 70 00 40 18 10   [MMI]   ld8 r14=[r32];;
> >   16:   00 00 00 02 00 c0           nop.m 0x0
> >   1c:   f1 70 c0 47                 dep r14=r15,r14,32,1;;
> >   20:   11 00 38 40 98 11   [MIB]   st8 [r32]=r14
> >   26:   00 00 00 02 00 80           nop.i 0x0
> >   2c:   08 00 84 00                 br.ret.sptk.many b0;;
> >
> > Note that gcc used 64-bit read-modify-write cycle to update b2. Thus if P1
> > races with P2, update of b1 can get lost. BTW: I've just checked on x86_64
> > and there GCC uses 8-bit bitop to modify the bitfield.
> >
> > We actually spotted this race in practice in btrfs on structure
> > fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> > corrupted due to update of following bitfield and there seem to be other
> > places in kernel where this could happen.
> >
> > I've raised the issue with our GCC guys and they said to me that: "C does
> > not provide such guarantee, nor can you reliably lock different
> > structure fields with different locks if they share naturally aligned
> > word-size memory regions. The C++11 memory model would guarantee this,
> > but that's not implemented nor do you build the kernel with a C++11
> > compiler."
> >
> > So it seems what C/GCC promises does not quite match with what kernel
> > expects.
> > I'm not really an expert in this area so I wanted to report it
> > here so that more knowledgeable people can decide how to solve the issue...
>
> What is the recommended work around for this problem?

The recommended workaround is to re-layout your structures.

Richard.
Re: Memory corruption due to word sharing
Hi,

On Wed, 1 Feb 2012, Linus Torvalds wrote:

> But I also think that gcc is simply *buggy*, and has made them much
> nastier than they should be. What gcc *should* have done is to turn
> bitfield accesses into shift-and-masking of the underlying field as
> early as possible, and then do all optimizations at that level.

Indeed. And there's even work going on towards that (stalled at the moment), but you know how it is: should, could, would.

> In fact, there is another gcc bug outstanding (48696) where I complain
> about absolutely horrible code generation, and that one was actually
> the exact same issue except in reverse: gcc wouldn't take the
> underlying size of the bitfield into account, and use the wrong
> (smaller) size for the access,

That's actually one example of why not everything is so totally obvious. We really don't want to store into more bytes than "allowed" (and, while we are at it, extend this even to reading for volatile members). For some definition of "allowed". For the ia64 problem at hand it surely has to exclude non-adjacent bitfield members. For PR48124 (writing beyond the end of a decl) it has to be constrained to the size of a decl of such struct type (seems obvious, but it's surprising how long you get away with a less strict rule, namely only excluding everything after the next alignment border).

Well, those are all sensible restrictions. But for your optimization problem you have to extend the allowed range a bit again (to at least contain all adjacent bitfields), but not so much as to break the next guy screaming "but here you obviously do a wild write". For instance, given these three structs:

struct s1 {short a; short b; unsigned c:1;};
struct s2 {short a:16; short b:16; unsigned c:1;};
struct s3 {unsigned a:16; unsigned b:16; unsigned c:1;};

Are the writes to x.b allowed to touch x.a? And writing x.c?
I think I know what you will claim (no crossing writes, except that s3.a and s3.b can be combined), but let's say I claim that there's no difference between all three structures, and therefore there should be no difference in what's allowed to be touched and what not. In particular the declared type is not what matters in allowing certain accesses. So, if you want to combine s3.a and s3.b (which is implied by your PR48696), you'll have to give me combining s2.a and s2.b as well. (I'm not so nasty as to also want combining s1.a and s1.b, because there are other, deeper reasons why s1 and the rest differ.)

The point is, there are simply different opinions and points of view. Using exclamation marks, bold typography and hyperbole doesn't make anyone more correct, nor does it make the problem simpler than it is.

> struct {long l:32; int i1:16; short s; int i2:1; char c:7; short
> s2:8; short s3;} dummy;
>
> int main(int argc, char **argv)
> {
>         dummy.l = 1;
>         dummy.i1 = 2;
>
> and then do a test-linearize (with "-m64" to show the difference
> between "long" and "int") I get
>
> t.c:2:48: error: dubious one-bit signed bitfield
> main:
> .L0x7f928e378010:
>
>         load.64     %r1 <- 0[dummy]
>         and.64      %r2 <- %r1, $0x
>         or.64       %r3 <- %r2, $1
>         store.64    %r3 -> 0[dummy]

Here you store 8 bytes, touching ...

>         load.32     %r4 <- 4[dummy]

... this int. Both correspond to the members "long l:32; int i1:16;". They don't have the same base type, hence per your own rule they shouldn't be allowed to touch each other (later on you also talk about the definedness of bit-fields only with int, which also hints that you think one should better always use int containers).

Dang, still thinking this is easy?

> And yes, look at how you can see how sparse mixes different access sizes
> for different fields. But that is damn well what the programmer asked
> for.
>
> If programmers write stupid code, the compiler might as well generate
> odd code.

But what exactly constitutes "stupid" code?
Are you really unable to see that we're exactly struggling to find rules to distinguish between "stupid" and supposedly "non-stupid" code? What if I'm saying that anyone using bitfields writes "stupid" code, and hence the compiler can as well generate odd code?

> Actually, "int:96" isn't ok last I saw. Not in original C, at least.

GCC supports multiple languages. For C++ it's valid (well, you get a warning, and you can't do much with that member, but there we are).

> Just out of morbid curiosity, what happens if you have totally
> *separate* variables that just happen to link together? IOW, something
> like
>
>    static struct { unsigned bit:1; } onebit;
>    static volatile int var;
>
> and they just *happen* to link next to each other (because they were
> declared next to each other) in the same 8-byte aligned block?

Non-struct decls always work. It's only the bit-field expander that willy-nilly invents access modes (for some targets), and sometimes block moves have problems.
Re: [Patch, fortran] Fix ICE with class array references.
On 02/02/2012 02:51 PM, Paul Richard Thomas wrote: > Many thanks for the patch. This makes for a rather complete > implementation of OOP. Excellent! It used to be said that the determined Real Programmer can write Fortran programs in any language; now it seems that they can write any language in Fortran. ;-) Andrew.
Re: Memory corruption due to word sharing
Linus Torvalds writes: > Seriously - is there any real argument *against* just using the base > type as a hint for access size? If I'm on the hook for attempting to fix this again, I'd also like to know if there are any arguments against using the base type.
Re: Memory corruption due to word sharing
Hi, On Thu, 2 Feb 2012, Aldy Hernandez wrote: > > Seriously - is there any real argument *against* just using the base > > type as a hint for access size? > > If I'm on the hook for attempting to fix this again, I'd also like to > know if there are any arguments against using the base type. Sure. Simplest example: struct s {int i:24;} __attribute__((packed)). You must access only three bytes, no matter what. The basetype (int) is four bytes. Ciao, Michael.
Re: Memory corruption due to word sharing
On Thu, Feb 2, 2012 at 8:28 AM, Michael Matz wrote:
>
> Sure. Simplest example: struct s {int i:24;} __attribute__((packed)).
>
> You must access only three bytes, no matter what. The basetype (int) is
> four bytes.

Ok, so here's a really *stupid* (but also really really simple) patch attached.

What it does is to simply change the meaning of the SLOW_BYTE_ACCESS thing. What SLOW_BYTE_ACCESS currently means is not just that byte accesses are slow: it means that *everything* but full-word accesses is slow!

Which is (a) not what the name really implies, (b) not even the historical meaning of that #define (why? the meaning of "full word" has changed since: it used to be 32 bits, now it is 64 bits) and (c) stupid, because that's not how hardware with slow byte accesses really works anyway. Finally, it's what causes gcc to do 64-bit accesses to bitfields that aren't 64 bits in size.

So because of this, I propose gcc just change the rules for what SLOW_BYTE_ACCESS means. I propose to just change it to mean "accesses smaller than 32 bits are slow". That's actually much closer to what the hardware definition tends to be.

It doesn't fix the generic issue, it doesn't fix any C++11/C11 issues, it doesn't really change anything, but what it *does* do is make for a hell of a lot saner model. And it avoids bugs in practice.

NOTE! On at least some machines with SLOW_BYTE_ACCESS, accesses smaller than a word cannot possibly be atomic anyway (well, not without jumping through hoops), so the fact that we still extend to 32 bits and the problem of touching too much still exists with 'char' and 'short' variables that are in the same 32-bit word as a bitfield is kind of unavoidable.

So this suggested patch doesn't really "fix" anything fundamental, but it is (a) simple, (b) totally untested and (c) at least fixes *some* cases. For example, it might fix the 'sig_atomic_t' shadowing, since that is usually 'int'.
It obviously can never fix the issue with volatile char/short, but at least it works around it for 'long'.

In other words - I'm not trying to say that this patch really "fixes" anything (except the sig_atomic_t interaction, which really does get fixed for the normal 'int' case). But what it does do is to limit the damage of a bad situation. And it is small, and should hopefully be easy to accept even for stable versions of gcc.

So can we please do something like this for maintenance releases, and consider the much more complex C++11/C11 issues to be a separate, much bigger issue for the future?

Because the current SLOW_BYTE_ACCESS meaning really is crap. The *only* thing that architecture define seems to do is literally the bitfield extract semantics, and it does that horribly, horribly badly, as shown by the bug on both 64-bit POWER and Sparc. For no good reason - both of those have perfectly fine word operations afaik.

I did not look at other architectures that define SLOW_BYTE_ACCESS, but if they have a 32-bit integer type, I'm pretty sure they support fast loads/stores of it.

Hacky? Yes. Pretty? No. But better than the disaster that is there now? Hell yes.

Linus

PS. The only reason I hardcoded "32" was that I didn't know how to ask the question "is this mode at least as wide as 'int'" in the proper gcc way. So I'm really not suggesting you apply this patch as-is, I'm just sending it out as a "hey, this is a pretty obvious way to work around part of the problem, somebody who really knows what they are doing should probably improve on it".
 gcc/stor-layout.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index ea0d44d64d26..1387c0e9e060 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -2486,7 +2486,12 @@ get_best_mode (int bitsize, int bitpos, unsigned int align,
 	  && unit <= MIN (align, BIGGEST_ALIGNMENT)
 	  && (largest_mode == VOIDmode
 	      || unit <= GET_MODE_BITSIZE (largest_mode)))
+	{
 	  wide_mode = tmode;
+	  /* Wide enough? */
+	  if (unit >= 32)
+	    break;
+	}
     }

   if (wide_mode != VOIDmode)
Re: Memory corruption due to word sharing
On Wed, Feb 01, 2012 at 03:11:00PM -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 2:45 PM, Paul E. McKenney > wrote: > > > > My (perhaps forlorn and naive) hope is that C++11 memory_order_relaxed > > will eventually allow ACCESS_ONCE() to be upgraded so that (for example) > > access-once increments can generate a single increment-memory instruction > > on x86. > > I don't think that is a semantic issue. > > gcc could do it *today* with volatile accesses. It doesn't, because > volatiles are scary and basically disables a lot of optimizations. Why > would memory ordering be substantially different just because it has a > different name? I too would much prefer that gcc volatile worked more sanely. But several people, including me, pushed on that and consistently got back "the standard doesn't say we have to do that". So I got together with the standards people and now there is something (memory_order_relaxed atomics) that is specified to work the way we want it to. Of course, it will likely be quite some time before it appears in usable form in gcc, but probably quite a bit less time than we have been pushing on the gcc folks about volatile. > > New architectures might eventually might define things like atomic_inc() > > in terms of C++11 atomics, but let's start with the straightforward stuff > > as and if it makes sense. > > SMP-atomic or percpu atomic? Or both? Only SMP-atomic. > We need both variants in the kernel. If the compiler generates one of > them for us, that doesn't really much help. I must admit that the non-x86 per-CPU atomics are, ummm, "interesting". Thanx, Paul
Re: Memory corruption due to word sharing
On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney wrote:
>>
>> SMP-atomic or percpu atomic? Or both?
>
> Only SMP-atomic.

And I assume that since the compiler does them, that would now make it impossible for us to gather a list of all the 'lock' prefixes so that we can undo them if it turns out that we are running on a UP machine.

When we do SMP operations, we don't just add a "lock" prefix to it. We do this:

#define LOCK_PREFIX_HERE \
	".section .smp_locks,\"a\"\n" \
	".balign 4\n" \
	".long 671f - .\n" /* offset */ \
	".previous\n" \
	"671:"

#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "

and I'm sure you know that, but I'm not sure the gcc people realize the kinds of games we play to make things work better.

Sure, "everything will be SMP" some day, but running UP kernels is likely still going to remain a good idea in virtualized environments, for example. And having to make it a compile-time option is *not* a good idea.

So compiler intrinsics are often *worse* than doing it by hand for us for all these kinds of reasons. They aren't generally geared towards the very specialized needs that a kernel has.

Of course, maybe even user space would want some kind of way to automatically strip 'lock' prefixes from a binary, so maybe the compiler would have some kind of support like this too.

(And no, disassembling the binary in order to find lock prefixes is *not* the answer, at least not for the kernel.)

>> We need both variants in the kernel. If the compiler generates one of
>> them for us, that doesn't really much help.
>
> I must admit that the non-x86 per-CPU atomics are, ummm, "interesting".

Most non-x86 cpu's would probably be better off treating them the same as smp-atomics (load-locked + store-conditional), but right now we have this insane generic infrastructure for having versions that are irq-safe by disabling interrupts etc. Ugh. Mainly because nobody really is willing to work on and fix up the 25 architectures that really don't matter.

Linus
Re: Memory corruption due to word sharing
On Thu, Feb 02, 2012 at 11:08:25AM -0800, Linus Torvalds wrote:
> On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney wrote:
> >>
> >> SMP-atomic or percpu atomic? Or both?
> >
> > Only SMP-atomic.
>
> And I assume that since the compiler does them, that would now make it
> impossible for us to gather a list of all the 'lock' prefixes so that
> we can undo them if it turns out that we are running on a UP machine.
>
> When we do SMP operations, we don't just add a "lock" prefix to it. We do
> this:
>
> #define LOCK_PREFIX_HERE \
> 	".section .smp_locks,\"a\"\n" \
> 	".balign 4\n" \
> 	".long 671f - .\n" /* offset */ \
> 	".previous\n" \
> 	"671:"
>
> #define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
>
> and I'm sure you know that, but I'm not sure the gcc people realize
> the kinds of games we play to make things work better.
>
> Sure, "everything will be SMP" some day, but running UP kernels is
> likely still going to remain a good idea in virtualized environments,
> for example. And having to make it a compile-time option is *not* a
> good idea.
>
> So compiler intrinsics are often *worse* than doing it by hand for us
> for all these kinds of reasons. They aren't generally geared towards
> the very specialized needs that a kernel has.
>
> Of course, maybe even user space would want some kind of way to
> automatically strip 'lock' prefixes from a binary, so maybe the
> compiler would have some kind of support like this too.
>
> (And no, disassembling the binary in order to find lock prefixes is
> *not* the answer, at least not for the kernel)

So if the gcc guys want the Linux kernel to use their atomics, they must give it some useful way of finding all the lock prefixes on x86. Should be a fun conversation. ;-)

> >> We need both variants in the kernel. If the compiler generates one of
> >> them for us, that doesn't really much help.
> >
> > I must admit that the non-x86 per-CPU atomics are, ummm, "interesting".
> > Most non-x86 cpu's would probably be better off treating them the same > as smp-atomics (load-locked + store-conditional), but right now we > have this insane generic infrastructure for having versions that are > irq-safe by disabling interrupts etc. Ugh. Mainly because nobody > really is willing to work on and fix up the 25 architectures that > really don't matter. Again, fair point! Thanx, Paul
gcc-4.5-20120202 is now available
Snapshot gcc-4.5-20120202 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20120202/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.5 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 183851 You'll find: gcc-4.5-20120202.tar.bz2 Complete GCC MD5=19e9ef3f7b2862c4f51063bd46c63e7b SHA1=445f40ee0d4e4b53886b0c9dcf41c98fa9b70ac4 Diffs from 4.5-20120126 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Memory corruption due to word sharing
Jan Kara writes: > we've spotted the following mismatch between what kernel folks expect > from a compiler and what GCC really does, resulting in memory corruption on > some architectures. Consider the following structure: > struct x { > long a; > unsigned int b1; > unsigned int b2:1; > }; If this structure were volatile, you could try -fstrict-volatile-bitfields, which forces GCC to use the C type to define the access width, instead of doing whatever it thinks is optimal. Note: that flag is enabled by default for some targets already, most notably ARM.