RE: Memory corruption due to word sharing
On Mit, 2012-02-01 at 21:04 +, Boehm, Hans wrote: [...]

> The C11 memory model potentially adds overhead in only two cases:
>
> 1. When current code involves touching a field that wouldn't otherwise
> be touched. There are odd cases in which this measurably slows down
> code, but I think all agree that we need it. In addition to
> bitfields, it can affect speculatively promoting a value to a register
> in a loop, which at least older versions of gcc also do.

What about just adding an -f option for this and/or activating it only at -O5 (or whatever the highest level is), and - in case that feature is activated - emitting warnings if bitfields (and/or any other data types that might be affected) are involved?

Kind regards,
	Bernd
--
Bernd Petrovitsch  Email : be...@petrovitsch.priv.at
LUGA : http://www.luga.at
Re: Memory corruption due to word sharing
On Wed, 1 Feb 2012, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 9:41 AM, Michael Matz wrote: > > > > One problem is that it's not a new problem, GCC emitted similar code since > > about forever, and still they turned up only now (well, probably because > > ia64 is dead, but sparc64 should have similar problems). The bitfield > > handling code is _terribly_ complex and fixing it is quite involved. So > > don't expect any quick fixes. > > I agree that bitfields are nasty, I've had to handle them myself in > sparse. And we have actually had problems with bitfields before, to > the point where we've replaced use of bitfields with masking and > setting bits by hand. > > But I also think that gcc is simply *buggy*, and has made them much > nastier than they should be. What gcc *should* have done is to turn > bitfield accesses into shift-and-masking of the underlying field as > early as possible, and then do all optimizations at that level. > > In fact, there is another gcc bug outstanding (48696) where I complain > about absolutely horrible code generation, and that one was actually > the exact same issue except in reverse: gcc wouldn't take the > underlying size of the bitfield into account, and use the wrong > (smaller) size for the access, causing absolutely horrendous code > generation that mixes byte and word accesses, and causes slowdowns by > orders of magnitude. Yeah, sorry for dropping the ball on this one (and missing my original GCC 4.7 target ...) > And it really is the same issue: gcc has forgotten what the base type > is, and tries to "compute" some type using the actual bits. Which is > simply *wrong*. Always has been. > > It's stupid, it generates bad code (both from performance *and* from a > correctness angle), and it has actually resulted in gcc having *extra* > complexity because it keeps the bitfield data around much too late. > > > The other problem is specification. 
While you think that the code you > > wrote is totally obvious it might not actually be so. For instance, what > > about this struct: > > > > {long l:32; int i1:16; short s; int i2:1; char c:7; short s2:8; short s3;} > > > > What are the obviously correct accesses for various writes into this > > struct? > > I really don't think that example matters. You should instead turn the > question around, and look at the real code examples, make *those* > generate good and obviously correct code, and then *after* you've done > that, you start looking at the mixed case where people do odd things. Well, it matters because it shows that simply using the declared type isn't going to work in the end. What we instead need to do is remember the underlying objects the bitfield packing algorithm uses. It then simply becomes obvious how to implement the bitfield accesses. I hope to get back to this for GCC 4.8. > Quite frankly, the layout that makes *sense* for something like the > above is not to pack them. You'd just do Heh, but of course the ABI specifies they are supposed to be packed ... In the end we all agree GCC does something nasty (and I would call it a bug even), but any solution we find in GCC won't be backportable to earlier releases so you have to deal with the GCC bug for quite some time and devise workarounds in the kernel. You'll hit the bug for all structure fields that share the largest aligned machine word with a bitfield (thus the size depends on the alignment of the full object, not that of the struct containing the bitfield in case that struct is nested inside another more aligned one). This situation should be easily(?) detectable with sparse. Thanks, Richard.
Re: Memory corruption due to word sharing
On Wed, 1 Feb 2012, Linus Torvalds wrote:

> Just out of morbid curiosity, what happens if you have totally
> *separate* variables that just happen to link together? IOW, something
> like
>
>    static struct { unsigned bit:1; } onebit;
>    static volatile int var;
>
> and they just *happen* to link next to each other (because they were
> declared next to each other) in the same 8-byte aligned block?

Just to make you run away screaming ...

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48124

where exactly such a situation happens ...

Richard.
Re: Memory corruption due to word sharing
On Wed, 1 Feb 2012, Linus Torvalds wrote:

> IT IS A GCC BUG.

I think you are wrong when you assume that we think it is not a bug. Repeating it in all-caps doesn't make the fix appear any faster, though.

Richard.
Re: Dealing with compilers that pretend to be GCC
Hi,

Chris Lattner writes:

> On Jan 31, 2012, at 5:15 AM, Ludovic Courtès wrote:
>
>>> Interestingly enough:
>>> $ cat q.c
>>> __has_builtin
>>> $ clang -E q.c
>>
>> Yes, that’s what I was asking.
>>
>> It makes me think that the old CPP predicates (info "(gcc) Obsolete
>> Features") would be more appropriate than compiler magic, with the
>> caveat that they’re presumably not widely supported.
>
> They are similar in spirit. The major difference between the two is that it
> is easy for __has_feature and friends to gracefully degrade when a compiler
> doesn't support them, because you can do:
>
> #ifndef __has_builtin
> #define __has_builtin(x) 0
> #endif
>
> in your code, but you can't do anything like this for #assertion. That and
> assertions don't have any coverage over the features that you're interested
> in, and the grammar doesn't have a good way to handle multiple cases like
> features/attributes/extensions/etc.

Rather than assertions, one could use predicates:

#if #has_builtin(foo) && #has_attribute(bar)
...
#endif

The difference being that (1) predicates were designed specifically for this purpose, and (2) there’s no magic involved.

Thanks,
Ludo’.
Re: Dealing with compilers that pretend to be GCC
ludovic.cour...@inria.fr (Ludovic Courtès) writes:

> Rather than assertions, one could use predicates:
>
> #if #has_builtin(foo) && #has_attribute(bar)
> ...
> #endif
>
> The difference being that (1) predicates were designed specifically for
> this purpose, and (2) there’s no magic involved.

but ... predicates are (1) less portable, since they involve magic syntax (whereas it's obvious how to make "magic macros" portable) and (2) deprecated even in gcc...

-miles

--
You can hack anything you want, with TECO and DDT.
Re: Memory corruption due to word sharing
On 1 February 2012 15:19, Jan Kara wrote:
> Hello,
>
> we've spotted the following mismatch between what kernel folks expect
> from a compiler and what GCC really does, resulting in memory corruption on
> some architectures. Consider the following structure:
>
> struct x {
>     long a;
>     unsigned int b1;
>     unsigned int b2:1;
> };
>
> We have two processes P1 and P2 where P1 updates field b1 and P2 updates
> bitfield b2. The code GCC generates for b2 = 1 e.g. on ia64 is:
>
>    0:   09 00 21 40 00 21   [MMI]   adds r32=8,r32
>    6:   00 00 00 02 00 e0           nop.m 0x0
>    c:   11 00 00 90                 mov r15=1;;
>   10:   0b 70 00 40 18 10   [MMI]   ld8 r14=[r32];;
>   16:   00 00 00 02 00 c0           nop.m 0x0
>   1c:   f1 70 c0 47                 dep r14=r15,r14,32,1;;
>   20:   11 00 38 40 98 11   [MIB]   st8 [r32]=r14
>   26:   00 00 00 02 00 80           nop.i 0x0
>   2c:   08 00 84 00                 br.ret.sptk.many b0;;
>
> Note that gcc used a 64-bit read-modify-write cycle to update b2. Thus if P1
> races with P2, the update of b1 can get lost. BTW: I've just checked on x86_64
> and there GCC uses an 8-bit bitop to modify the bitfield.
>
> We actually spotted this race in practice in btrfs on structure
> fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> corrupted due to update of the following bitfield, and there seem to be other
> places in the kernel where this could happen.
>
> I've raised the issue with our GCC guys and they said to me that: "C does
> not provide such guarantee, nor can you reliably lock different
> structure fields with different locks if they share naturally aligned
> word-size memory regions. The C++11 memory model would guarantee this,
> but that's not implemented nor do you build the kernel with a C++11
> compiler."
>
> So it seems what C/GCC promises does not quite match with what the kernel
> expects. I'm not really an expert in this area so I wanted to report it
> here so that more knowledgeable people can decide how to solve the issue...
>
> Honza
> --
> Jan Kara
> SUSE Labs, CR

What is the recommended work around for this problem?
Re: Memory corruption due to word sharing
On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote:
> We actually spotted this race in practice in btrfs on structure
> fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> corrupted due to update of following bitfield and there seem to be other
> places in kernel where this could happen.

Here's the list of structures where a bitfield is shared with a spinlock or atomic/kref within an 8B word, generated from 3.3-rc2:

spinlock+bitfield:

Struct: struct ak4113; Field: init
Struct: struct ak4114; Field: init
Struct: struct ak4117; Field: init
Struct: struct btrfs_block_rsv; Field: full
Struct: struct cm109_dev; Field: buzzer_pending
Struct: struct pch_udc_dev; Field: active
Struct: struct rds_iw_device; Field: dma_local_lkey
Struct: struct sierra_intf_private; Field: suspended
Struct: struct sm501_gpio; Field: registered
Struct: struct unix_sock; Field: gc_candidate
Struct: struct usb_anchor; Field: poisoned
Struct: struct usb_wwan_intf_private; Field: suspended

atomic/kref+bitfield:

Struct: struct dlm_lock_resource; Field: migration_pending
Struct: struct extent_map; Field: in_tree
Struct: struct kobject; Field: state_initialized
Struct: struct page; Field: inuse
Struct: struct rds_ib_connection; Field: i_flowctl
Struct: struct rds_iw_connection; Field: i_flowctl
Struct: struct sctp_transport; Field: dead
Struct: struct transaction_s; Field: t_synchronous_commit
Struct: struct xfs_ioend; Field: io_isasync

Not all listed structs are necessarily subject to the bug. There may be another mechanism preventing concurrent access to the bitfield and spinlock/atomic, or the bitfield is modified from a single cpu, or is not used. But all of them need to be reviewed of course.

david
Re: Memory corruption due to word sharing
* Linus Torvalds wrote:

> [...]
>
> And I realize that compiler people tend to think that loop
> hoisting etc is absolutely critical for performance, and some
> big hammer like "barrier()" makes a compiler person wince. You
> think it results in horrible code generation problems.
>
> It really doesn't. Loops are fairly unusual in the kernel to
> begin with, and the compiler barriers are a total non-issue.
> We have much more problems with the actual CPU barriers that
> can be *very* expensive on some architectures, and we work a
> lot at avoiding those and avoiding cacheline ping-pong issues
> etc.

Just to underline this point: if barriers caused optimization problems when GCC builds the kernel, then we'd expect to see various code generation problems - for example, the compiler would not be able to cache things well enough and reorder code to make it faster and (often) more compact.

So to test the effect of Linus's claim I picked up a fairly bleeding-edge version of GCC:

  gcc version 4.7.0 20120112 (Red Hat 4.7.0-0.6) (GCC)

and performed a test build of the kernel with the majority of optimization barriers removed (using the v3.2 kernel, x86 defconfig, 64-bit, -O2 optimization level): 1600 barriers were removed (!) and GCC's hands were thus freed to create more optimal code [and a very broken kernel], if it could.

I compared the resulting kernel image to an unmodified kernel image:

     text     data      bss       dec     hex filename
  9781555   982328  1118208  11882091  b54e6b vmlinux.vanilla
  9780972   982328  1118208  11881508  b54c24 vmlinux.no-barriers

So the barriers are costing us only a ~0.006% size increase - 583 bytes on an almost 10 MB kernel image. To put that into perspective: a *single* inline-function inlining decision by the compiler has a larger effect than that. Just a couple of days ago we uninlined a function, which had an order of magnitude larger effect than this.

The other possible dimension would be the ordering of instructions.
To test for that effect I disassembled the two kernel images and performed a function-by-function, instruction-by-instruction comparison of instruction ordering. The summary is that GCC was able to remove only 86 instructions (0.005%) and reordered around 2400 instructions (0.15%) - out of about 1,570,000 instructions.

Or, put differently, for the 1600 barriers in this particular kernel build, there's about 1.5 instructions reordered and 0.05 instructions removed per barrier.

I also inspected the type of reordering: the overwhelming majority of it happened within a jump-free basic block of instructions and did not affect any loops.

Thus much of the effect of barriers in the kernel is exactly the crucial effect we want them to have - forcing a specific program-order sequence - and in the process the barrier()s cause only very, very small optimization-quality side effects.

So the numbers support Linus's claim: the kernel incurs very little optimization cost from barriers.

Thanks,

	Ingo
Re: Memory corruption due to word sharing
On Thu, 2 Feb 2012, David Sterba wrote:

> On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote:
> > We actually spotted this race in practice in btrfs on structure
> > fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> > corrupted due to update of following bitfield and there seem to be other
> > places in kernel where this could happen.
>
> Here's the list of structures where a bitfield is shared with spinlock
> or atomic/kref within an 8B word, generated from 3.3-rc2:
>
> spinlock+bitfield:
>
> Struct: struct ak4113; Field: init
> Struct: struct ak4114; Field: init
> Struct: struct ak4117; Field: init
> Struct: struct btrfs_block_rsv; Field: full
> Struct: struct cm109_dev; Field: buzzer_pending
> Struct: struct pch_udc_dev; Field: active
> Struct: struct rds_iw_device; Field: dma_local_lkey
> Struct: struct sierra_intf_private; Field: suspended
> Struct: struct sm501_gpio; Field: registered
> Struct: struct unix_sock; Field: gc_candidate
> Struct: struct usb_anchor; Field: poisoned
> Struct: struct usb_wwan_intf_private; Field: suspended
>
> atomic/kref+bitfield:
>
> Struct: struct dlm_lock_resource; Field: migration_pending
> Struct: struct extent_map; Field: in_tree
> Struct: struct kobject; Field: state_initialized
> Struct: struct page; Field: inuse
> Struct: struct rds_ib_connection; Field: i_flowctl
> Struct: struct rds_iw_connection; Field: i_flowctl
> Struct: struct sctp_transport; Field: dead
> Struct: struct transaction_s; Field: t_synchronous_commit
> Struct: struct xfs_ioend; Field: io_isasync
>
> Not all listed structs are necessarily subject to the bug. There may be
> another mechanism preventing concurrent access to the bitfield and
> spinlock/atomic, or the bitfield is modified from a single cpu, or is
> not used. But all of them need to be reviewed of course.

To hit this bug the containing objects also have to be at least 8-byte aligned.

Richard.
Re: Memory corruption due to word sharing
On Thu, 2 Feb 2012, James Courtier-Dutton wrote:

> On 1 February 2012 15:19, Jan Kara wrote:
> > Hello,
> >
> > we've spotted the following mismatch between what kernel folks expect
> > from a compiler and what GCC really does, resulting in memory corruption on
> > some architectures. Consider the following structure:
> >
> > struct x {
> >     long a;
> >     unsigned int b1;
> >     unsigned int b2:1;
> > };
> >
> > We have two processes P1 and P2 where P1 updates field b1 and P2 updates
> > bitfield b2. The code GCC generates for b2 = 1 e.g. on ia64 is:
> >
> >    0:   09 00 21 40 00 21   [MMI]   adds r32=8,r32
> >    6:   00 00 00 02 00 e0           nop.m 0x0
> >    c:   11 00 00 90                 mov r15=1;;
> >   10:   0b 70 00 40 18 10   [MMI]   ld8 r14=[r32];;
> >   16:   00 00 00 02 00 c0           nop.m 0x0
> >   1c:   f1 70 c0 47                 dep r14=r15,r14,32,1;;
> >   20:   11 00 38 40 98 11   [MIB]   st8 [r32]=r14
> >   26:   00 00 00 02 00 80           nop.i 0x0
> >   2c:   08 00 84 00                 br.ret.sptk.many b0;;
> >
> > Note that gcc used 64-bit read-modify-write cycle to update b2. Thus if P1
> > races with P2, update of b1 can get lost. BTW: I've just checked on x86_64
> > and there GCC uses 8-bit bitop to modify the bitfield.
> >
> > We actually spotted this race in practice in btrfs on structure
> > fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got
> > corrupted due to update of following bitfield and there seem to be other
> > places in kernel where this could happen.
> >
> > I've raised the issue with our GCC guys and they said to me that: "C does
> > not provide such guarantee, nor can you reliably lock different
> > structure fields with different locks if they share naturally aligned
> > word-size memory regions. The C++11 memory model would guarantee this,
> > but that's not implemented nor do you build the kernel with a C++11
> > compiler."
> >
> > So it seems what C/GCC promises does not quite match with what kernel
> > expects.
> > I'm not really an expert in this area so I wanted to report it
> > here so that more knowledgeable people can decide how to solve the issue...
>
> What is the recommended work around for this problem?

The recommended workaround is to re-layout your structures.

Richard.
Re: Memory corruption due to word sharing
Hi,

On Wed, 1 Feb 2012, Linus Torvalds wrote:

> But I also think that gcc is simply *buggy*, and has made them much
> nastier than they should be. What gcc *should* have done is to turn
> bitfield accesses into shift-and-masking of the underlying field as
> early as possible, and then do all optimizations at that level.

Indeed. And there's even work going on towards that (stalled at the moment), but you know how it is: should, could, would.

> In fact, there is another gcc bug outstanding (48696) where I complain
> about absolutely horrible code generation, and that one was actually
> the exact same issue except in reverse: gcc wouldn't take the
> underlying size of the bitfield into account, and use the wrong
> (smaller) size for the access,

That's actually one example of why not everything is so totally obvious. We really don't want to store into more bytes than "allowed" (and, while we are at it, extend this even to reading for volatile members). For some definition of "allowed". For the ia64 problem at hand it surely has to exclude non-adjacent bitfield members. For PR48124 (writing beyond the end of a decl) it has to be constrained to the size of a decl of such struct type (seems obvious, but it's surprising how long you get away with a less strict rule, namely only excluding everything after the next alignment border).

Well, those are all sensible restrictions. But for your optimization problem you have to extend the allowed range a bit again (to at least contain all adjacent bitfields), but not so much as to break the next guy screaming "but here you obviously do a wild write". For instance, given these three structs:

struct s1 {short a; short b; unsigned c:1;};
struct s2 {short a:16; short b:16; unsigned c:1;};
struct s3 {unsigned a:16; unsigned b:16; unsigned c:1;};

Are the writes to x.b allowed to touch x.a? And writing x.c?
I think I know what you will claim (no crossing writes, except that s3.a and s3.b can be combined), but let's say I claim that there's no difference between all three structures, and therefore there should be no difference in what's allowed to be touched and what not. In particular the declared type is not what matters in allowing certain accesses. So, if you want to combine s3.a and s3.b (which is implied by your PR48696), you'll have to give me combining s2.a and s2.b as well. (I'm not so nasty as to also want combining s1.a and s1.b, because there are other, deeper reasons why s1 and the rest differ.)

The point is, there are simply different opinions and points of view. Using exclamation marks, bold typography and hyperbole doesn't make anyone more correct, nor does it make the problem simpler than it is.

> struct {long l:32; int i1:16; short s; int i2:1; char c:7; short
> s2:8; short s3;} dummy;
>
> int main(int argc, char **argv)
> {
>         dummy.l = 1;
>         dummy.i1 = 2;
>
> and then do a test-linearize (with "-m64" to show the difference
> between "long" and "int") I get
>
> t.c:2:48: error: dubious one-bit signed bitfield
> main:
> .L0x7f928e378010:
>
>         load.64     %r1 <- 0[dummy]
>         and.64      %r2 <- %r1, $0x
>         or.64       %r3 <- %r2, $1
>         store.64    %r3 -> 0[dummy]

Here you store 8 bytes, touching ...

>         load.32     %r4 <- 4[dummy]

... this int. Both correspond to the members "long l:32; int i1:16;". They don't have the same base type, hence per your own rule they shouldn't be allowed to touch each other (later on you also talk about the definedness of bit-fields only with int, which also hints that you think one should better always use int containers).

Dang, still thinking this is easy?

> And yes, look at how you can see how sparse mixes different access sizes
> for different fields. But that is damn well what the programmer asked
> for.
>
> If programmers write stupid code, the compiler might as well generate
> odd code.

But what exactly constitutes "stupid" code?
Are you really unable to see that we're exactly struggling to find rules to distinguish between "stupid" and supposedly "non-stupid" code? What if I'm saying that anyone using bitfields writes "stupid" code, and hence the compiler can as well generate odd code?

> Actually, "int:96" isn't ok last I saw. Not in original C, at least.

GCC supports multiple languages. For C++ it's valid (well, you get a warning, and you can't do much with that member, but there we are).

> Just out of morbid curiosity, what happens if you have totally
> *separate* variables that just happen to link together? IOW, something
> like
>
>    static struct { unsigned bit:1; } onebit;
>    static volatile int var;
>
> and they just *happen* to link next to each other (because they were
> declared next to each other) in the same 8-byte aligned block?

Non-struct decls always work. It's only the bit-field expander that willy-nilly invents access modes (for some targets), and sometimes block moves have problems.
Re: [Patch, fortran] Fix ICE with class array references.
On 02/02/2012 02:51 PM, Paul Richard Thomas wrote: > Many thanks for the patch. This makes for a rather complete > implementation of OOP. Excellent! It used to be said that the determined Real Programmer can write Fortran programs in any language; now it seems that they can write any language in Fortran. ;-) Andrew.
Re: Memory corruption due to word sharing
Linus Torvalds writes: > Seriously - is there any real argument *against* just using the base > type as a hint for access size? If I'm on the hook for attempting to fix this again, I'd also like to know if there are any arguments against using the base type.
Re: Memory corruption due to word sharing
Hi, On Thu, 2 Feb 2012, Aldy Hernandez wrote: > > Seriously - is there any real argument *against* just using the base > > type as a hint for access size? > > If I'm on the hook for attempting to fix this again, I'd also like to > know if there are any arguments against using the base type. Sure. Simplest example: struct s {int i:24;} __attribute__((packed)). You must access only three bytes, no matter what. The basetype (int) is four bytes. Ciao, Michael.
Re: Memory corruption due to word sharing
On Thu, Feb 2, 2012 at 8:28 AM, Michael Matz wrote:
>
> Sure. Simplest example: struct s {int i:24;} __attribute__((packed)).
>
> You must access only three bytes, no matter what. The basetype (int) is
> four bytes.

Ok, so here's a really *stupid* (but also really really simple) patch attached.

What it does is to simply change the meaning of the SLOW_BYTE_ACCESS thing. What SLOW_BYTE_ACCESS currently means is not just that byte accesses are slow: it means that *everything* but full-word accesses is slow!

Which is (a) not what the name really implies, (b) not even the historical meaning of that #define (why? the meaning of "full word" has changed since: it used to be 32 bits, now it is 64 bits) and (c) stupid, because that's not how hardware with slow byte accesses really works anyway. Finally, it's what causes gcc to do 64-bit accesses to bitfields that aren't 64 bits in size.

So because of this, I propose gcc just change the rules for what SLOW_BYTE_ACCESS means. I propose to just change it to mean "accesses smaller than 32 bits are slow". That's actually much closer to what the hardware definition tends to be.

It doesn't fix the generic issue, it doesn't fix any C++11/C11 issues, it doesn't really change anything, but what it *does* do is make for a hell of a lot saner model. And it avoids bugs in practice.

NOTE! On at least some machines with SLOW_BYTE_ACCESS, accesses smaller than a word cannot possibly be atomic anyway (well, not without jumping through hoops), so the fact that we still extend to 32 bits and the problem of touching too much still exists with 'char' and 'short' variables that are in the same 32-bit word as a bitfield is kind of unavoidable.

So this suggested patch doesn't really "fix" anything fundamental, but it is (a) simple, (b) totally untested and (c) at least fixes *some* cases. For example, it might fix the 'sig_atomic_t' shadowing, since that is usually 'int'.
It obviously can never fix the issue with volatile char/short, but at least it works around it for 'long'.

In other words - I'm not trying to say that this patch really "fixes" anything (except the sig_atomic_t interaction, which really does get fixed for the normal 'int' case). But what it does do is to limit the damage of a bad situation. And it is small, and should hopefully be easy to accept even for stable versions of gcc.

So can we please do something like this for maintenance releases, and consider the much more complex C++11/C11 issues to be a separate, much bigger issue for the future?

Because the current SLOW_BYTE_ACCESS meaning really is crap. The *only* thing that architecture define seems to do is literally the bitfield extract semantics, and it does that horribly, horribly badly, as shown by the bug on both 64-bit POWER and Sparc. For no good reason - both of those have perfectly fine word operations afaik.

I did not look at other architectures that define SLOW_BYTE_ACCESS, but if they have a 32-bit integer type, I'm pretty sure they support fast loads/stores of it.

Hacky? Yes. Pretty? No. But better than the disaster that is there now? Hell yes.

Linus

PS. The only reason I hardcoded "32" was that I didn't know how to ask the question "is this mode at least as wide as 'int'" in the proper gcc way. So I'm really not suggesting you apply this patch as-is, I'm just sending it out as a "hey, this is a pretty obvious way to work around part of the problem, somebody who really knows what they are doing should probably improve on it".
 gcc/stor-layout.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index ea0d44d64d26..1387c0e9e060 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -2486,7 +2486,12 @@ get_best_mode (int bitsize, int bitpos, unsigned int align,
 	  && unit <= MIN (align, BIGGEST_ALIGNMENT)
 	  && (largest_mode == VOIDmode
 	      || unit <= GET_MODE_BITSIZE (largest_mode)))
+	{
 	  wide_mode = tmode;
+	  /* Wide enough? */
+	  if (unit >= 32)
+	    break;
+	}
     }

   if (wide_mode != VOIDmode)
Re: Memory corruption due to word sharing
On Wed, Feb 01, 2012 at 03:11:00PM -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 2:45 PM, Paul E. McKenney > wrote: > > > > My (perhaps forlorn and naive) hope is that C++11 memory_order_relaxed > > will eventually allow ACCESS_ONCE() to be upgraded so that (for example) > > access-once increments can generate a single increment-memory instruction > > on x86. > > I don't think that is a semantic issue. > > gcc could do it *today* with volatile accesses. It doesn't, because > volatiles are scary and basically disables a lot of optimizations. Why > would memory ordering be substantially different just because it has a > different name? I too would much prefer that gcc volatile worked more sanely. But several people, including me, pushed on that and consistently got back "the standard doesn't say we have to do that". So I got together with the standards people and now there is something (memory_order_relaxed atomics) that is specified to work the way we want it to. Of course, it will likely be quite some time before it appears in usable form in gcc, but probably quite a bit less time than we have been pushing on the gcc folks about volatile. > > New architectures might eventually might define things like atomic_inc() > > in terms of C++11 atomics, but let's start with the straightforward stuff > > as and if it makes sense. > > SMP-atomic or percpu atomic? Or both? Only SMP-atomic. > We need both variants in the kernel. If the compiler generates one of > them for us, that doesn't really much help. I must admit that the non-x86 per-CPU atomics are, ummm, "interesting". Thanx, Paul
Re: Memory corruption due to word sharing
On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney wrote:
>>
>> SMP-atomic or percpu atomic? Or both?
>
> Only SMP-atomic.

And I assume that since the compiler does them, that would now make it impossible for us to gather a list of all the 'lock' prefixes so that we can undo them if it turns out that we are running on a UP machine.

When we do SMP operations, we don't just add a "lock" prefix to it. We do this:

#define LOCK_PREFIX_HERE \
	".section .smp_locks,\"a\"\n" \
	".balign 4\n" \
	".long 671f - .\n" /* offset */ \
	".previous\n" \
	"671:"

#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "

and I'm sure you know that, but I'm not sure the gcc people realize the kinds of games we play to make things work better.

Sure, "everything will be SMP" some day, but running UP kernels is likely still going to remain a good idea in virtualized environments, for example. And having to make it a compile-time option is *not* a good idea.

So compiler intrinsics are often *worse* than doing it by hand for us for all these kinds of reasons. They aren't generally geared towards the very specialized needs that a kernel has.

Of course, maybe even user space would want some kind of way to automatically strip 'lock' prefixes from a binary, so maybe the compiler would have some kind of support like this too.

(And no, disassembling the binary in order to find lock prefixes is *not* the answer, at least not for the kernel.)

>> We need both variants in the kernel. If the compiler generates one of
>> them for us, that doesn't really much help.
>
> I must admit that the non-x86 per-CPU atomics are, ummm, "interesting".

Most non-x86 cpu's would probably be better off treating them the same as smp-atomics (load-locked + store-conditional), but right now we have this insane generic infrastructure for having versions that are irq-safe by disabling interrupts etc. Ugh. Mainly because nobody really is willing to work on and fix up the 25 architectures that really don't matter.

Linus
Re: Memory corruption due to word sharing
On Thu, Feb 02, 2012 at 11:08:25AM -0800, Linus Torvalds wrote:
> On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney wrote:
> >>
> >> SMP-atomic or percpu atomic? Or both?
> >
> > Only SMP-atomic.
>
> And I assume that since the compiler does them, that would now make it
> impossible for us to gather a list of all the 'lock' prefixes so that
> we can undo them if it turns out that we are running on a UP machine.
>
> When we do SMP operations, we don't just add a "lock" prefix to it. We do
> this:
>
> #define LOCK_PREFIX_HERE \
> 	".section .smp_locks,\"a\"\n" \
> 	".balign 4\n" \
> 	".long 671f - .\n" /* offset */ \
> 	".previous\n" \
> 	"671:"
>
> #define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
>
> and I'm sure you know that, but I'm not sure the gcc people realize
> the kinds of games we play to make things work better.
>
> Sure, "everything will be SMP" some day, but running UP kernels is
> likely still going to remain a good idea in virtualized environments,
> for example. And having to make it a compile-time option is *not* a
> good idea.
>
> So compiler intrinsics are often *worse* than doing it by hand for us
> for all these kinds of reasons. They aren't generally geared towards
> the very specialized needs that a kernel has.
>
> Of course, maybe even user space would want some kind of way to
> automatically strip 'lock' prefixes from a binary, so maybe the
> compiler would have some kind of support like this too.
>
> (And no, disassembling the binary in order to find lock prefixes is
> *not* the answer, at least not for the kernel)

So if the gcc guys want the Linux kernel to use their atomics, they must give it some useful way of finding all the lock prefixes on x86. Should be a fun conversation. ;-)

> >> We need both variants in the kernel. If the compiler generates one of
> >> them for us, that doesn't really much help.
> >
> > I must admit that the non-x86 per-CPU atomics are, ummm, "interesting".
> > Most non-x86 cpu's would probably be better off treating them the same > as smp-atomics (load-locked + store-conditional), but right now we > have this insane generic infrastructure for having versions that are > irq-safe by disabling interrupts etc. Ugh. Mainly because nobody > really is willing to work on and fix up the 25 architectures that > really don't matter. Again, fair point! Thanx, Paul
gcc-4.5-20120202 is now available
Snapshot gcc-4.5-20120202 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20120202/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.5 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 183851 You'll find: gcc-4.5-20120202.tar.bz2 Complete GCC MD5=19e9ef3f7b2862c4f51063bd46c63e7b SHA1=445f40ee0d4e4b53886b0c9dcf41c98fa9b70ac4 Diffs from 4.5-20120126 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Memory corruption due to word sharing
Jan Kara writes: > we've spotted the following mismatch between what kernel folks expect > from a compiler and what GCC really does, resulting in memory corruption on > some architectures. Consider the following structure: > struct x { > long a; > unsigned int b1; > unsigned int b2:1; > }; If this structure were volatile, you could try -fstrict-volatile-bitfields, which forces GCC to use the C type to define the access width, instead of doing whatever it thinks is optimal. Note: that flag is enabled by default for some targets already, most notably ARM.