Re: this code in fold-const.c:fold_single_bit_test looks wrong to me
Quoting Kenneth Zadeck:

>   if (TREE_CODE (inner) == RSHIFT_EXPR
>       && TREE_CODE (TREE_OPERAND (inner, 1)) == INTEGER_CST
>       && TREE_INT_CST_HIGH (TREE_OPERAND (inner, 1)) == 0
>       && bitnum < TYPE_PRECISION (type)
>       && 0 > compare_tree_int (TREE_OPERAND (inner, 1),
>                                bitnum - TYPE_PRECISION (type)))
>     {
>       bitnum += TREE_INT_CST_LOW (TREE_OPERAND (inner, 1));
>       inner = TREE_OPERAND (inner, 0);
>     }
>
> in particular, in the last stanza of the test TREE_OPERAND (inner, 1) is
> a positive number from the second stanza. bitnum is also always positive
> and less than the TYPE_PRECISION (type) from the third stanza, so
> bitnum - TYPE_PRECISION (type) is always negative,

Not when you pass it as an "unsigned HOST_WIDE_INT", but then, this doesn't really make for sane code...

> so the compare will always be positive, so this code will never be
> executed.

... compare will almost always be negative, so this code will be executed, regardless of the validity of the shift or the bit test being in range.

> it is hard to believe that this is what you want.

I see that this code lived previously in expr.c:store_flag_value, and was modified by a big omnibus patch there:

Mon Mar  6 15:22:29 2000  Richard Kenner

        * expr.c
        ..
        (do_store_flag): Use compare_tree_int.
        ..

@@ -10204,8 +10204,9 @@ do_store_flag (exp, target, mode, only_c
       if (TREE_CODE (inner) == RSHIFT_EXPR
           && TREE_CODE (TREE_OPERAND (inner, 1)) == INTEGER_CST
           && TREE_INT_CST_HIGH (TREE_OPERAND (inner, 1)) == 0
-          && (bitnum + TREE_INT_CST_LOW (TREE_OPERAND (inner, 1))
-              < TYPE_PRECISION (type)))
+          && bitnum < TYPE_PRECISION (type)
+          && 0 > compare_tree_int (TREE_OPERAND (inner, 1),
+                                   bitnum - TYPE_PRECISION (type)))
         {
           bitnum += TREE_INT_CST_LOW (TREE_OPERAND (inner, 1));
           inner = TREE_OPERAND (inner, 0);

I suppose that should be "TYPE_PRECISION (type) - bitnum" instead.
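To make the wraparound argument above concrete, here is a minimal standalone sketch. It does not use GCC's tree API: `uhwi` stands in for "unsigned HOST_WIDE_INT", `compare_uint` stands in for compare_tree_int, and the three guard functions are hypothetical names for the condition before the 2000 patch, as committed, and as suggested in the reply.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's "unsigned HOST_WIDE_INT". */
typedef unsigned long long uhwi;

/* Stand-in for compare_tree_int: returns -1, 0 or 1 as SHIFT is less
   than, equal to, or greater than U. */
static int compare_uint(uhwi shift, uhwi u)
{
    return (shift > u) - (shift < u);
}

/* Condition before the 2000 patch: bitnum + shift < precision. */
static int guard_original(uhwi shift, int bitnum, int precision)
{
    assert(bitnum >= 0 && precision > 0);
    return (uhwi)bitnum + shift < (uhwi)precision;
}

/* Condition as committed: bitnum - precision is negative, so converting
   it to the unsigned type wraps it to a huge value and almost any shift
   count compares as "less than". */
static int guard_as_written(uhwi shift, int bitnum, int precision)
{
    assert(bitnum >= 0 && precision > 0);
    return bitnum < precision
           && 0 > compare_uint(shift, (uhwi)(bitnum - precision));
}

/* Condition as suggested in the reply: precision - bitnum. */
static int guard_as_suggested(uhwi shift, int bitnum, int precision)
{
    assert(bitnum >= 0 && precision > 0);
    return bitnum < precision
           && 0 > compare_uint(shift, (uhwi)(precision - bitnum));
}
```

Under these assumptions, a shift count of 100 with bitnum 3 in a 32-bit type passes the committed guard, while both the pre-patch condition and the suggested form reject it.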
Re: 4.8.2 -Og vs. -O1
On 07/01/2013 02:05 AM, Joel Sherrill wrote:

> Have you compared it to -Os? That seems to produce assembly closer to
> what you would likely write by hand. I haven't benchmarked it much but
> it gives 7-10% smaller code in general. In many cases, fewer
> instructions is also a performance win.

Hi Joel,

I'm looking for every variable and every C/C++ source line to be visible and steppable in the debugger, with the fastest possible code. A difficult trade-off, no doubt. Currently this only happens at -O0, but the speed is really bad. I haven't seen any other feedback on this list about the new -Og option.

-gene

Gene Smith wrote:

> I tried -Og optimization on a recent svn snapshot of 4.8 and don't see
> much difference in the code compared to -O1. If anything, at least for
> one case, -Og is actually less debuggable than -O1, e.g., for a simple
> buffer selection like this:
>
>     uint8_t* buffer;
>     if (condx == true)
>         buffer = buf1; // buf1 is a static external buffer
>     else
>         buffer = buf2; // buf2 is a static external buffer
>     uint8_t foo = buffer[1];
>
> With -O1 there is assembly code associated with each buffer assignment
> statement. But with -Og there is no code under the first buffer = buf1;
> it is all under the second, buffer = buf2. So, with -Og, when stepping
> through the code with condx true, it appears that the wrong line is
> executing, since the first buffer = buf1 has no code and never occurs.
> Of course, the result is still correct and is maybe even more efficient
> than, or at least equal to, the -O1 code, but there is no improved debug
> experience in this case. In this case, the debug experience with -O1 is
> closer to -O0 than -Og is.
>
> Also, with -Og some variables are still optimized away, as with -O1 and
> higher, unlike -O0, where all variables are, of course, visible in the
> debugger (gdb).
>
> -gene
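For anyone who wants to repeat the comparison, the quoted fragment can be wrapped into a self-contained translation unit along the following lines. This is only a sketch: the buffer sizes and contents and the wrapper function `select_byte` are assumptions, since the original message only says that buf1 and buf2 are static external buffers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed definitions: sizes and contents are made up for illustration. */
uint8_t buf1[4] = {10, 11, 12, 13};
uint8_t buf2[4] = {20, 21, 22, 23};

/* select_byte is a hypothetical wrapper around the quoted fragment. */
uint8_t select_byte(bool condx)
{
    uint8_t *buffer;
    if (condx == true)
        buffer = buf1;   /* line reported to have no code at -Og */
    else
        buffer = buf2;
    uint8_t foo = buffer[1];
    return foo;
}
```

Compiling this once with `-O1 -g -S` and once with `-Og -g -S` and comparing the line-number annotations in the two assembly outputs should show whether a given GCC version attributes code to both assignments.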
DONT_BREAK_DEPENDENCIES bitmask for scheduling
Hi,

Near the start of schedule_block, find_modifiable_mems is called if DONT_BREAK_DEPENDENCIES is not enabled for this scheduling pass. It seems only the c6x backend currently uses this flag. However, it's quite strange that it is not a requirement for all backends, since find_modifiable_mems moves all my dependencies from SD_LIST_HARD_BACK to SD_LIST_SPEC_BACK even though I don't have DO_SPECULATION enabled.

Since dependencies are accessed later on (from try_ready, for example), I would have thought that it would always be good not to call find_modifiable_mems, given that it seems to 'literally' break dependencies.

Is the behaviour of find_modifiable_mems a bug, or somehow expected?

Cheers,
Paulo Matos
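The gating described in the message can be modelled in a few lines. This is a toy sketch, not the scheduler's code: the flag values, the `sketch_` names and the call counter are all hypothetical, and the stub does nothing except record that it was called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real ones are defined by GCC's scheduler. */
enum {
    SKETCH_DO_SPECULATION          = 1 << 0,
    SKETCH_DONT_BREAK_DEPENDENCIES = 1 << 1
};

struct sketch_sched_info {
    unsigned flags;
    int find_modifiable_mems_calls;   /* counter used only by this sketch */
};

/* Stub standing in for find_modifiable_mems. */
static void sketch_find_modifiable_mems(struct sketch_sched_info *info)
{
    info->find_modifiable_mems_calls++;
}

/* Models the check near the start of schedule_block: the call is skipped
   only when the DONT_BREAK_DEPENDENCIES bit is set, whatever the state of
   the speculation bit. */
static void sketch_schedule_block(struct sketch_sched_info *info)
{
    assert(info != NULL);
    if ((info->flags & SKETCH_DONT_BREAK_DEPENDENCIES) == 0)
        sketch_find_modifiable_mems(info);
}
```

In this model a backend that sets neither bit still gets the call, which is the situation the question is about.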
Re: Plan for removing global state from GCC's internals
On Thu, 2013-06-27 at 20:23 +, Joseph S. Myers wrote:
> On Thu, 27 Jun 2013, David Malcolm wrote:
>
> > I want to focus on "removal of global state", and I want that to be
> > separate from "cleanups of internal APIs".

There are several interpretations of the word "global" in this conversation, and I think I was unclear what I meant; sorry about that.

The word "global" can refer both to visibility, and to lifetime. I'm interested in *lifetime*: when do variables get written to? Where does the value of a variable get used?

For example, consider the three variable declarations in tracer.c:

  static int probability_cutoff;
  static int branch_ratio_cutoff;
  sbitmap bb_seen;

It turns out that "bb_seen" is only used in that file, so it can be made "static" there. With that, these variables have file-local visibility, but currently have "global lifetime": they live in the .bss section of the built code, and further study of the code would be needed before a new reader (be they human or a compiler) can say when these variables change state, and where their state is used, beyond just saying "sometime during the lifetime of the process".

As it happens, these variables follow a very common read/write pattern: they are initialized near the start of the "execute" hook of a pass, and cleaned up at the end of the hook (in this case, within the "tail_duplicate" function called within the "tracer" execute hook of pass_tracer). Also, none of the variables are GTY-marked. This is one of the common state-management patterns in GCC's passes:
http://dmalcolm.fedorapeople.org/gcc/global-state/pass-patterns.html#per-invocation-state-with-no-gty-markings

The plan there gives a way of moving this state to the stack for the shared-library case (to allow thread-safe usage), whilst keeping it in the .bss section for the traditional build case (for maximum performance, or, at least, consistent performance), with relatively little patching.
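As a rough illustration of the "per-invocation state" pattern, the stack-allocated variant might look something like the sketch below. It is not taken from the linked plan: the struct, the `_sketch` function names, the cutoff values and the byte array standing in for an sbitmap are all assumptions, and the mechanism for keeping the state in .bss in the traditional build is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bundle of the three tracer.c variables. */
struct tracer_state {
    int probability_cutoff;
    int branch_ratio_cutoff;
    unsigned char *bb_seen;   /* stand-in for the sbitmap */
};

/* Stand-in for the work done by tail_duplicate: marks each block once
   and returns how many blocks were newly marked. */
static int tail_duplicate_sketch(struct tracer_state *st, int n_blocks)
{
    int marked = 0;
    for (int i = 0; i < n_blocks; i++) {
        if (!st->bb_seen[i]) {
            st->bb_seen[i] = 1;
            marked++;
        }
    }
    return marked;
}

/* Stand-in for the pass's execute hook: the state is set up at the
   start, used, and torn down at the end, so nothing outlives the call. */
int tracer_execute_sketch(int n_blocks)
{
    assert(n_blocks >= 0);
    struct tracer_state st;                  /* lives on the stack */
    st.probability_cutoff = 50;              /* made-up value */
    st.branch_ratio_cutoff = 10;             /* made-up value */
    st.bb_seen = calloc((size_t)n_blocks + 1, 1);
    if (st.bb_seen == NULL)
        return -1;
    int marked = tail_duplicate_sketch(&st, n_blocks);
    free(st.bb_seen);
    return marked;
}
```

Because every invocation owns its own `tracer_state`, two calls cannot observe each other's `bb_seen` contents, which is the property needed for thread-safe use from a shared library.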
By contrast, consider this declaration from tree-ssa.c:

  static struct pointer_map_t *edge_var_maps;

Although this is marked as "static", it's used in the implementation of an internal API "redirect_edge_var_map_*" used in 5 other source files. So although this has file-local visibility, the "lifetime" of the underlying state is considerably more complicated.

> Whereas I'm thinking of global state as being a symptom of a problem -
> messy interfaces that have accreted over time - rather than the problem in
> itself. And moving things into "universe" allows a proof-of-concept of a
> shared library build (much like Joern's multi-target patches with
> namespaces three years ago provided a proof-of-concept of a multi-target
> build) without really addressing the real problem (basically, I think of
> state in "universe" as effectively being global state, and moving state
> into something passed down to the places needing it - only the relevant
> bits of state, not a whole universe pointer if there's a smaller logical
> unit - rather than just accessed through TLS, as being the point where
> global state is *really* eliminated).

From a "lifetime" meaning of "global", the universe ceases to be "global state" in a shared-library build: there are zero or more "parallel universes" within one process, all independent of each other; such parallel universes can be created and destroyed by client code.

You raise a concern about restricting where state can be used: the universe object will indeed contain a grab-bag of pointers to various other objects, and so it's possible to write code that pokes at one aspect of state from another unrelated aspect of state, using the universe object as a nexus.
I wrote about this in:
http://dmalcolm.fedorapeople.org/gcc/global-state/plan.html#parallel-universes-vs-modularity
where my view is that for an initial iteration of this work we *need* to have such a nexus: we have a spaghetti of interactions already; I'm merely trying to support having multiple, independent plates of spaghetti, if you will, prior to disentangling.

I think Andrew MacLeod's proposal is really the answer here for these concerns, and I see our proposals as compatible.

There are various places in my plan where I use classes to restrict access: for example, I have a "class frontend" and "class backend"; presumably stuff could be placed there in an effort to hide things (either a "ravioli" or "lasagna" model, if that's not stretching the metaphor too far: encapsulation and layering respectively).

> Now, the bulk conversion to universes seems a lot more maintainable than
> Joern's multi-target patches, and a lot more plausibly an incremental step
> to a proper fix, and so a lot more reasonable to go in as an incremental
> step, but I'd still think of it as one of the infamous partial transitions
> in the absence of a reason to believe, for each formerly-global object
> being accessed via the universe (or some other piece of context), that
> it's being accessed via
Re: Plan for removing global state from GCC's internals
On Mon, 1 Jul 2013, David Malcolm wrote:

> > As for accessing globals directly versus via APIs: yes, I suppose you do
> > still have an access to a global class instance in each place you formerly
> > had a global variable (that's now a member of that class), so by itself
> > such a conversion to a better API doesn't reduce the number of global
> > variable accesses, just improves the interface in other ways - and it's
> > the changes to pass a pointer to an instance around that reduce the global
> > state usage. In the case of dump files, pass-local state may be a better
> > place than the universe to keep the instance - it is after all passes.c
> > that calls dump_start / dump_finish.
>
> So a pass instance should have its own dump_flags, and various dump
> methods? Perhaps, but as before, I'd prefer to fix the state issue

Yes (or rather, the pass instance should contain an instance of the dumper class, which in turn has dump_flags and dump_file members) - as far as I can tell, the lifetime of dump_file and dump_flags is already basically per-pass rather than global.

> Would you be in favor killing off these macros:
>   #define input_line LOCATION_LINE (input_location)
>   #define input_filename LOCATION_FILE (input_location)
>   #define in_system_header (in_system_header_at (input_location))
> with patches that make the usage of "input_location" explicit? (by
> replacing all uses of these macros with their expansions, cleaning up
> line-wraps as needed).

Yes.

> The only other macro that implicitly uses input_location is
> EXPR_LOC_OR_HERE; that could be removed in favor of:
>   EXPR_LOC_OR_LOC(expr, input_location)
> again making input_location explicit.

(I suspect then eliminating the input_location from this - ensuring all expressions have meaningful locations so EXPR_LOC_OR_LOC isn't needed at all - will depend on Andrew MacLeod's proposal. It doesn't explicitly mention this, but one thing that would be desirable as part of making front ends generate internal representation closer to the source would be explicitly representing locations for constants, and for references to declarations within expressions, so that everywhere that wants a location for an expression can reliably extract one from it rather than finding there is no location because certain expressions are shared.)

-- 
Joseph S. Myers
jos...@codesourcery.com
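The effect of replacing the macros with their expansions can be shown with a toy model. Everything here is a stand-in: in GCC, location_t is not a struct and LOCATION_LINE / LOCATION_FILE are defined elsewhere; the struct, the sample values and the three helper functions are assumptions made only so the sketch compiles on its own.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for GCC's location machinery (assumed, not the real types). */
typedef struct { const char *file; int line; } location_t;
static location_t input_location = { "demo.c", 42 };
#define LOCATION_LINE(LOC) ((LOC).line)
#define LOCATION_FILE(LOC) ((LOC).file)

/* Before: the macro hides the read of the global. */
#define input_line LOCATION_LINE (input_location)
static int line_via_macro(void)
{
    return input_line;
}
#undef input_line

/* After: the expansion is written out, so the use of input_location
   is visible at the call site. */
static int line_explicit(void)
{
    return LOCATION_LINE (input_location);
}

/* A possible later step (hypothetical): take the location as a
   parameter so the function no longer touches the global at all. */
static int line_from(location_t loc)
{
    assert(loc.file != NULL);
    return LOCATION_LINE (loc);
}
```

The first two functions behave identically; the difference is only that a search for input_location now finds the second one, which is what makes later removal of the global tractable.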
Re: Plan for removing global state from GCC's internals
I started doing this, beginning with converting the C++ parser into classes, but no one was interested.

On 1 July 2013 20:43, Joseph S. Myers wrote:
> On Mon, 1 Jul 2013, David Malcolm wrote:
>
>> > As for accessing globals directly versus via APIs: yes, I suppose you do
>> > still have an access to a global class instance in each place you formerly
>> > had a global variable (that's now a member of that class), so by itself
>> > such a conversion to a better API doesn't reduce the number of global
>> > variable accesses, just improves the interface in other ways - and it's
>> > the changes to pass a pointer to an instance around that reduce the global
>> > state usage. In the case of dump files, pass-local state may be a better
>> > place than the universe to keep the instance - it is after all passes.c
>> > that calls dump_start / dump_finish.
>>
>> So a pass instance should have its own dump_flags, and various dump
>> methods? Perhaps, but as before, I'd prefer to fix the state issue
>
> Yes (or rather, the pass instance should contain an instance of the dumper
> class, which in turn has dump_flags and dump_file members) - as far as I
> can tell, the lifetime of dump_file and dump_flags is already basically
> per-pass rather than global.
>
>> Would you be in favor killing off these macros:
>>   #define input_line LOCATION_LINE (input_location)
>>   #define input_filename LOCATION_FILE (input_location)
>>   #define in_system_header (in_system_header_at (input_location))
>> with patches that make the usage of "input_location" explicit? (by
>> replacing all uses of these macros with their expansions, cleaning up
>> line-wraps as needed).
>
> Yes.
>
>> The only other macro that implicitly uses input_location is
>> EXPR_LOC_OR_HERE; that could be removed in favor of:
>>   EXPR_LOC_OR_LOC(expr, input_location)
>> again making input_location explicit.
>
> (I suspect then eliminating the input_location from this - ensuring all
> expressions have meaningful locations so EXPR_LOC_OR_LOC isn't needed at
> all - will depend on Andrew MacLeod's proposal. It doesn't explicitly
> mention this, but one thing that would be desirable as part of making
> front ends generate internal representation closer to the source would be
> explicitly representing locations for constants, and for references to
> declarations within expressions, so that everywhere that wants a location
> for an expression can reliably extract one from it rather than finding
> there is no location because certain expressions are shared.)
>
> --
> Joseph S. Myers
> jos...@codesourcery.com