Re: [RISC-V] vector segment load/store width as a riscv_tune_param
> I am revisiting an effort to make the number of lanes for vector segment
> load/store a tunable parameter. A year ago, Robin added minimal and
> not-yet-tunable common_vector_cost::segment_permute_[2-8]

But it is tunable, just not a param? :)

We have our own cost structure in our downstream repo, adjusted to our uarch.
I suggest you do the same or upstream a separate cost structure. I don't
think anybody would object to having several of those, one for each uarch
(as long as they are sufficiently distinct).

BTW, just tangentially related and I don't know how sensitive your uarch is
to scheduling, but with the x264 SAD and other sched issues we have seen you
might consider disabling sched1 as well for your uarch? I know that for our
uarch we want to keep it on but we surely could have another generic-like
mtune option that disables it (maybe even generic-ooo and change the current
generic-ooo to generic-in-order?). I would expect this to get more common in
the future anyway.

> Some issues & questions:
>
> * Since this pertains only to segment load/store, why is the word
>   "permute" in the name?

The vectorizer already performs costing for the segment loads/stores (IIRC
as simple loads, though). At some point the idea was to explicitly model the
"segment permute/transpose" as a separate operation, i.e.

  v0, v1, v2 = segmented_load3x3 (...)
  {
    load vtmp0;
    load vtmp1;
    load vtmp2;
    v0 = {vtmp0[0], vtmp1[0], vtmp2[0]};
    v1 = {vtmp0[1], vtmp1[1], vtmp2[1]};
    v2 = {vtmp0[2], vtmp1[2], vtmp2[2]};
  }

and that permute is the expensive part of the operation in 99% of the cases.
That's where the wording comes from.

> * Nit: why are these defined as individual members rather than an array
>   referenced as segment_permute[NF-2]?

No real reason. I guess an array is preferable in several ways so feel free
to change that.

> * I implemented tuning as a simple threshold for max NF where segment
>   load/store is profitable. Test cases for vector segment store pass, but
>   tests for load fail. I found that common_cost_vector::segment_permute is
>   properly honored in the store case, but not even inspected in the load
>   case. I will need to spelunk the autovec cost model. Clues are welcome.

Could you give an example for that? Might just be a bug.
Looking at gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c, however, I
see that the cost is adjusted for loads.

--
Regards
Robin
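The segment permute/transpose being costed here is easy to see in scalar C. Below is a minimal sketch of the 3-field de-interleave that the segmented_load3x3 pseudo-code describes; the name segment_load3 and the fixed VL are made up for illustration and are not vectorizer internals.

```c
#include <assert.h>

/* Scalar model of a 3-field segment load: memory holds interleaved
   records {a,b,c}, and the load "de-interleaves" them into one vector
   per field.  The per-field gather is the "segment permute/transpose"
   that the segment_permute_* cost members try to model.  */
#define VL 3
#define NF 3

static void segment_load3 (const int *mem, int v0[VL], int v1[VL], int v2[VL])
{
  for (int i = 0; i < VL; i++)
    {
      v0[i] = mem[NF * i + 0];  /* field a of record i */
      v1[i] = mem[NF * i + 1];  /* field b of record i */
      v2[i] = mem[NF * i + 2];  /* field c of record i */
    }
}
```

For the interleaved input {10,11,12, 20,21,22, 30,31,32}, v0 collects the first field of each record, i.e. {10, 20, 30}, and so on for v1 and v2.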
Re: Proposed GSoC project to enhance the GCC GNAT Ada front end
Hi Martin,

We actually are planning a large project (350 hours), as Ethan plans to work
essentially full time over the summer. Sorry for the confusion. I had the
terminology a bit confused, and thought "large" was closer to a 6-month
commitment.

We will be submitting the proposal shortly.

Take care,
-Tuck

On Mon, Mar 24, 2025 at 10:02 PM Tucker Taft wrote:
> Thanks, Martin. We expect it to be a medium size project. And we will be
> sure that Ethan can build, test, and debug the GNAT-FE and other GCC
> components.
>
> Take care,
> -Tuck
>
> On Mon, Mar 24, 2025 at 7:23 PM Martin Jambor wrote:
>> Hello,
>>
>> in general the project proposal looks very good. A few comments inline:
>>
>> On Tue, Mar 18 2025, Tucker Taft via Gcc wrote:
>> [...]
>>> The GNAT front end is organized into three basic phases, a parser, a
>>> semantic analyzer, and an expander. In the sources, these are
>>> represented by source files par-ch*.adb, sem_ch*.{ads,adb}, and
>>> exp_ch*.{ads,adb}, where "ch*" is short-hand for the various chapters
>>> of the Ada reference manual. For this project the intent is to augment
>>> primarily the "chapter 5" components, given that loop and parallel-do
>>> statements are defined in chapter 5 of the Ada reference manual. For
>>> example, the function Par.Ch5.P_Loop_Statement is where most of the
>>> parsing of the loop statement takes place, while
>>> Par.Ch5.P_Parallel_Do_Statement would be a new local function of
>>> Par.Ch5. Similarly, Sem_Ch5.Analyze_Loop_Statement would be augmented,
>>> while Sem_Ch5.Analyze_Parallel_Do_Statement would be new. And finally,
>>> Exp_Ch5.Expand_Loop_Statement would be augmented, while
>>> Exp_Ch5.Expand_Parallel_Do_Statement would be new.
>>>
>>> The project would begin with the parser, simply recognizing the new
>>> syntax, and producing debug output to report on whether the parse was
>>> successful.
>>>
>>> The next phase would be to construct the appropriate tree
>>> representations, which would mean updating the gen_il* set of files to
>>> include new fields and node types to support the parallel for-loop and
>>> parallel do statement. As part of doing this, pseudo-code would be
>>> developed for the semantic and expansion phases, to ensure that the new
>>> tree components created meet the needs of these later phases. Finally
>>> the semantic and expansion phases would be implemented in Ada, using
>>> the new tree components available.
>>>
>>> For the expansion phase, the job is primarily to transform the syntax
>>> into the appropriate calls on out-of-line runtime components providing
>>> light-weight-thread management. These routines have already been
>>> developed and debugged, and are available on the GitHub site for the
>>> ParaSail parallel programming language
>>> (https://github.com/parasail-lang/parasail/tree/main/lwt).
>>>
>>> As far as effort involved, the parsing functions in the GNAT Front End
>>> tend to be quite short, often less than a page of code, making heavy
>>> use of shared parsing components. The semantic phase tends to be
>>> significantly more complex, but for this effort relatively little
>>> additional code will be required, because the additional static
>>> semantics for these new features are pretty simple. The major effort
>>> will be in the expansion phase, primarily because the out-of-line
>>> light-weight-threading library expects the loop body to be represented
>>> as a procedure. In fact the GNAT-FE frequently creates simple
>>> out-of-line routines as part of implementing certain existing Ada
>>> features, so there should be plenty of models to work from to turn the
>>> loop body into an out-of-line routine. Similarly, generating calls to
>>> run-time routines is also quite common as part of various GNAT-FE
>>> expansion routines, so we can use those as models as well.
>>>
>>> One feature of the light-weight-threading library we will be
>>> interfacing with is that it sits on top of the existing Ada runtime, as
>>> well as the GCC implementation of OpenMP, without any need to alter
>>> these underlying components.
>>>
>>> Overall we would estimate a couple of weeks for the parsing and initial
>>> semantic analysis for these two features, with the bulk of the summer
>>> spent on the expansion phase, finalizing the semantic analysis, and if
>>> there is time, working on error recovery.
>>
>> Do you envision this as a medium-sized (approx. 175 hours) or large (350
>> hours) project?
>>
>>> As far as mentors, Tucker Taft had agreed to be a mentor. Tucker was
>>> the lead designer of Ada 2022, and has implemented the
>>> light-weight-threading library that we will be interfacing with.
>>> Richard Wai has also agreed to be a mentor. Richard is an active member
>>> of the ISO-WG9 committee on Ada and its Ada Rapporteur Group (ARG), and
>>> has also made contributions to GCC
Re: [RISC-V] vector segment load/store width as a riscv_tune_param
On 3/25/25 3:47 AM, Robin Dapp via Gcc wrote:
>> I am revisiting an effort to make the number of lanes for vector segment
>> load/store a tunable parameter. A year ago, Robin added minimal and
>> not-yet-tunable common_vector_cost::segment_permute_[2-8]
>
> But it is tunable, just not a param? :) We have our own cost structure in
> our downstream repo, adjusted to our uarch. I suggest you do the same or
> upstream a separate cost structure. I don't think anybody would object to
> having several of those, one for each uarch (as long as they are
> sufficiently distinct).

Yea, strongly recommend this. From a user interface standpoint it's better
to have the compiler select something sensible based on -mcpu or -mtune
rather than having to specify a bunch of --params. In general --params
should be seen as knobs compiler developers use to adjust behavior, say
change a limit. They're not really meant to be something we expect users to
twiddle much.

For internal testing, sure, a param is great since it allows you to sweep
through a bunch of values without rebuilding the toolchain. Once a sensible
value is determined, you put that into the costing table for your uarch.

> BTW, just tangentially related and I don't know how sensitive your uarch
> is to scheduling, but with the x264 SAD and other sched issues we have
> seen you might consider disabling sched1 as well for your uarch? I know
> that for our uarch we want to keep it on but we surely could have another
> generic-like mtune option that disables it (maybe even generic-ooo and
> change the current generic-ooo to generic-in-order?). I would expect this
> to get more common in the future anyway.

To expand a bit. The first scheduling pass is important for x264's SAD
because by exposing the load->use latency, it will tend to create
overlapping lifetimes for the pseudo registers holding the input vectors.
With the overlapping lifetimes the register allocator is then forced to
allocate the pseudos to different physical registers, thus ensuring
parallelism.

The more out-of-order capable the target design is, the less useful sched1
will be.

Jeff
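For context, the SAD kernel being discussed boils down to a reduction like the following. This is a simplified scalar sketch, not x264's actual code; in the vectorized form each row load feeds an independent subtract/abs/accumulate chain, and it is those loads whose lifetimes sched1 makes overlap.

```c
#include <stdlib.h>

/* Simplified scalar form of a sum-of-absolute-differences kernel.
   When vectorized, the loads of a[] and b[] rows are independent;
   hoisting them early (sched1) exposes the load->use latency, gives
   the loaded values overlapping lifetimes, and so forces the register
   allocator to place them in distinct physical registers.  */
static int sad (const unsigned char *a, const unsigned char *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs (a[i] - b[i]);
  return sum;
}
```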
Re: message has lines too long for transport - Was: Fwd: Mail delivery failed: returning message to sender
On 24/03/2025 22:08, Toon Moene wrote: > On 3/24/25 17:27, Richard Earnshaw (lists) wrote: > >> On 23/03/2025 20:26, Toon Moene wrote: > >>> I had the following message when sending test results to gcc-testresults >>> *starting* today (3 times): >>> >>> Note that the message is generated by *my* exim4 "mail delivery software" >>> (Debian Testing) - it is not the *receiving* side that thinks the lines are >>> too long. >>> >>> Is anyone else seeing this ? >>> >>> >>> Forwarded Message >>> Subject: Mail delivery failed: returning message to sender >>> Date: Sun, 23 Mar 2025 18:18:28 +0100 >>> From: Mail Delivery System >>> To: t...@moene.org >>> >>> This message was created automatically by mail delivery software. >>> >>> A message that you sent could not be delivered to one or more of its >>> recipients. This is a permanent error. The following address(es) failed: >>> >>> gcc-testresu...@gcc.gnu.org >>> message has lines too long for transport >> >> https://datatracker.ietf.org/doc/html/rfc5322#section-2.1.1 >> >> So there's a 998 (sic) character limit on the length of any line. I'm >> guessing your mail posting tool is not reformatting the message and trying >> to pass it straight through. Your SMTP server then rejects it. >> >> I guess your options are to request a 'flowed' encoding type or to reformat >> the message with a filter before passing it on to your SMTP server. >> >> R. > > Thanks. > > I placed a > > fold -s -w 300 | > > filter in front of the > > mail -s > > command. > > At least I got one simple test passed to the mailing list: > > https://gcc.gnu.org/pipermail/gcc-testresults/2025-March/841892.html There's nothing in that post that looks even remotely close to 300 chars, let alone 998. So I'm not sure what would be going on. R. > > Still wonder how I could have used the original for years without having this > problem ... > > Kind regards, >
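Before handing mail to the MTA, the RFC 5322 limit mentioned above can be checked with a small filter. The helper below is hypothetical (not part of exim or mailutils); only the 998-octet line limit comes from the RFC.

```c
#include <stddef.h>

/* Return the length of the longest line in a NUL-terminated buffer.
   A result above 998 flags a message that SMTP transports honoring
   RFC 5322 section 2.1.1 may reject.  */
static size_t max_line_len (const char *s)
{
  size_t max = 0, cur = 0;
  for (; *s; s++)
    {
      if (*s == '\n')
        cur = 0;          /* line ends: restart the running count */
      else if (++cur > max)
        max = cur;        /* track the longest line seen so far */
    }
  return max;
}
```

Piping test results through such a check (or through `fold -s` as done above) before `mail -s` would catch the failure locally instead of at delivery time.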
Re: Status of PDB support
On 3/24/25 9:51 PM, Tom Kacvinsky via Gcc wrote:
> Hi,
>
> I know that PDB support has been worked on in binutils. I think the
> missing piece is to get GCC to emit CodeView debug information that
> binutils will turn into a PDB (not sure if the work is complete in
> binutils, either). What's the status of this?
>
> I ask because our Windows offering is built with MSVC for C and C++, but
> a MinGW-w64 built GCC for Ada compilation. Having GCC + binutils generate
> PDB information would enable us to use Visual Studio for one-stop
> debugging. As it is we have to use gdb for Ada code (DWARF debug symbols)
> and Visual Studio for C/C++ debugging (because PDB), which is a pain.

gcc-15 will have a fair amount of CodeView support thanks to Mark
Harmstone. I don't offhand recall if any of the early work made it into
gcc-14 or not. I'm also not at all familiar with gaps in capabilities. You
could always reach out to Mark to get the most up-to-date status.

Jeff
Re: bounds checking / VLA types
On Tuesday, 2025-03-25 at 19:09 -0700, Bill Wendling wrote:
> On Tue, Mar 18, 2025 at 3:04 PM Martin Uecker wrote:
>
> It seems clear that using "__self" is most likely going to be part of
> any solution we come up with. What we need to avoid is feature skew
> between GCC and Clang. It'll revert projects back into the "compiler
> war" days, which isn't beneficial to anyone. Is there a compromise
> that's acceptable for both GCC and Clang beyond adding "__self", but
> not requiring it?

Having the syntax but not requiring it does not solve the problem,
because it would still require compilers to implement late parsing
(a problem for many C compilers).

If we require __self__ now, this would allow us to move on while leaving
the door open to finding consensus later, e.g. after WG14 has considered
it in the next meeting.

Martin

> -bw
>
> > > That said this is just my own impression and whether we can make the
> > > usable model this way may need more study.
> > >
> > > At least now I understand what is your vision. Thanks.
> > >
> > > > For GNU C (but not ISO C) there could be size expression in
> > > > the declaration referring to automatic variables.
> > > >
> > > > int foo()
> > > > {
> > > >   int n = 1;
> > > >   struct {
> > > >     char buf[n]; // evaluated when the declaration is reached
> > > >   } x;
> > > > };
> > > >
> > > > This is something which can not currently happen for ISO C where
> > > > this can only refer to named constants and where the size is
> > > > evaluated at compile time.
> > > >
> > > > int foo()
> > > > {
> > > >   enum { N = 1 };
> > > >   struct {
> > > >     char buf[N];
> > > >   } x;
> > > > }
> > > >
> > > > Still, I think this is another reason why it would be much
> > > > better to have new syntax for this. We also need to restrict
> > > > the size expressions in all three scenarios in different ways.
> > > >
> > > > int foo()
> > > > {
> > > >   int n = 1;
> > > >   struct {
> > > >     int m;
> > > >     char buf1[n];  // evaluated when the declaration is reached
> > > >     char buf2[.m]; // evaluated on access to buf2, restricted syntax
> > > >   } x = { .m = 10 };
> > > > }
> > > >
> > > > Martin
> > >
> > > Yeoul
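The "evaluated when the declaration is reached" rule discussed in this thread is directly observable with sizeof. A minimal sketch (the helper name is made up for illustration):

```c
/* The size expression of a VLA is evaluated once, when the declaration
   is reached; assigning to n afterwards does not change buf's type.
   Returns nonzero if the type stayed fixed, which ISO C requires.  */
static int vla_size_is_fixed (void)
{
  int n = 4;
  char buf[n];        /* type fixed here: char[4] */
  n = 100;            /* no effect on buf's type */
  return sizeof buf == 4;
}
```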
Re: [RISC-V] vector segment load/store width as a riscv_tune_param
On Tue, Mar 25, 2025 at 2:47 AM Robin Dapp wrote: > > A year ago, Robin added minimal and not-yet-tunable > > common_vector_cost::segment_permute_[2-8] > > But it is tunable, just not a param? :) I meant "param" generically, not necessarily a command-line --param=thingy, though point taken! :) > We have our own cost structure in our > downstream repo, adjusted to our uarch. I suggest you do the same or > upstream > a separate cost structure. I don't think anybody would object to having > several of those, one for each uarch (as long as they are sufficiently > distinct). > Yes, this is what I meant by not-yet-tunable, there is currently no datapath between -mcpu/-mtune and common_vector_cost::segment_permute_*. All CPUs get the same hard-coded value of 1 for all segment_permute_* costs. > BTW, just tangentially related and I don't know how sensitive your uarch > is to > scheduling, but with the x264 SAD and other sched issues we have seen you > might > consider disabling sched1 as well for your uarch? I know that for our > uarch we > want to keep it on but we surely could have another generic-like mtune > option > that disables it (maybe even generic-ooo and change the current > generic-ooo to > generic-in-order?). I would expect this to get more common in the future > anyway. Thanks for the tip. We will look into it. > > Some issues & questions: > > > > * Since this pertains only to segment load/store, why is the word > "permute" > > in the name? > > The vectorizer already performs costing for the segment loads/stores (IIRC > as > simple loads, though). At some point the idea was to explicitly model the > "segment permute/transpose" as a separate operation i.e. > This is a different concept, so I ought to introduce a new cost param which is the threshold value of NF for fast vs. slow. > * I implemented tuning as a simple threshold for max NF where segment > > load/store is profitable. Test cases for vector segment store pass, but > > tests for load fail. 
> > I found that common_cost_vector::segment_permute is properly honored in
> > the store case, but not even inspected in the load case. I will need to
> > spelunk the autovec cost model. Clues are welcome.
>
> Could you give an example for that? Might just be a bug.
> Looking at gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c, however I
> see that the cost is adjusted for loads, though.

You won't see failures in the testsuite. The failures only show up when I
attempt to impose huge costs on NF above threshold. A quick & dirty way to
expose the bug is to apply the appended patch, then observe that you get
output from this only for mask_struct_store-*.c and not for
mask_struct_load-*.c

G

--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1140,6 +1140,7 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, loop_vec_info loop,
   int group_size = segment_loadstore_group_size (kind, stmt_info);
   if (group_size > 1)
     {
+      fprintf (stderr, "segment_loadstore_group_size = %d\n", group_size);
       switch (group_size)
	{
	case 2:
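The threshold tuning described in this thread could be sketched as a cost helper like the one below. The names and values are hypothetical; the real hook is costs::adjust_stmt_cost in riscv-vector-costs.cc, and the actual threshold would come from the per-uarch tuning structure rather than a macro.

```c
#include <limits.h>

/* Hypothetical sketch: penalize segment load/store with NF above a
   per-uarch threshold by returning a prohibitive cost, so the
   vectorizer falls back to another strategy.  */
#define SEGMENT_NF_THRESHOLD 4   /* made-up tunable, per -mtune */

static int segment_permute_cost (int nf)
{
  if (nf < 2 || nf > 8)
    return INT_MAX;             /* NF outside the valid 2..8 range */
  if (nf > SEGMENT_NF_THRESHOLD)
    return 1000;                /* "never profitable" above threshold */
  return 1;                     /* cheap at or below threshold */
}
```

For the bug above, the key point is that this cost has to be consulted on both the load and the store path of adjust_stmt_cost, or the threshold only takes effect for stores.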
Re: bounds checking / VLA types
On Tue, Mar 18, 2025 at 3:04 PM Martin Uecker wrote: > > Am Dienstag, dem 18.03.2025 um 14:03 -0700 schrieb Yeoul Na via Gcc: > > > > > On Mar 18, 2025, at 12:48 PM, Martin Uecker wrote: > > > > > > Am Dienstag, dem 18.03.2025 um 09:52 -0700 schrieb Yeoul Na via Gcc: > > > > > > > > > On Mar 18, 2025, at 12:16 AM, Martin Uecker wrote: > > > > > > > > > > When xp->ptr is accessed, the size expression > > > > > is evaluated and the lvalue which is formed at this point in > > > > > time gets the type int(*)[xp->count], similar to how the size > > > > > expression of function arguments are evaluated when the > > > > > function is entered. The type of this expression > > > > > then does not change anymore. You could implement it > > > > > by rewriting it to the following. > > > > > > > > Yes, that’s effectively similar to how the size of bounds annotations > > > > work too. > > > > > > > > Though my question was, VLA types currently don't work this way. > > > > > > > > The size is evaluated when it’s declared as you all discussed earlier. > > > > > > this is the case only for automatic variables. For function arguments > > > they are evaluated at function entry. > > > > Well, I think saying this way sounds a bit misleading. This works this > > way because function parameters are automatic variables of the function > > and their declarations are reached at the entry of the function. > > There is no control flow which goes through the declarations > of the arguments in the function prototype, which would define > exactly when in what order these "declarations are reached", > while this is clear for function-scope declarations which > are actually reached by control flow. > > The C standard just explicitely states that on function > entry those size expressions are evaluated. > > > And by “only for automatic variables’, then you mean global > > and static variables are not or won't be working in the same > > way as the automatic variables in the future? 
> > I should have said objects defined at function-scope. > static variables already work like this. > > > For global and static variables, I meant "pointers" to VLA types. > > > > Also, what about a pointer to pointer to an array? Should it > > be fixed to be evaluated at the time of use? > > According to the C standard, the size expression > for the example below is evaluated on function entry. > > We try to not break existing code in C without a very > good reason, so I do not think this can be changed > (but it also seems just fine to me how it is). > > > > int foo(int len, int (*(*inout_buf))[len]) { > > // use *inout_buf > > // ... > > int new_len = 10; > > *inout_buf = malloc(new_len); // should it be fixed to be evaluated at the > > time of use? > > return new_len; > > } > > > > > > > > So do you mean you want to fix the current behavior or make the > > > > > > I think the current behavior is fine, and changing it > > > would also be problematic for existing code. > > > > behavior inconsistent between if they are struct members, > > > > indirect pointers vs. local variables like below? > > > I think the nature of the problem requires that the size > > > expression must be evaluated at different time points > > > depending on the scenario. > > > For automatic variables, this is when the declaration is > > > reached in the control flow and for function arguments when > > > the function is entered. For structure members where the size > > > expression refers to other members, they need to be > > > evaluated when the member is accessed. > > > > > > > > void test(struct X *xp) { > > > >int len = 10; > > > >int (*ptr)[len] = malloc(sizeof(int) * len); // (1) int (*ptr)[len] > > > > is evaluated here and fixed. > > > >xp->count = 100; > > > >xp->ptr = malloc(sizeof(int) * 100); // size of xp->ptr is > > > > dynamically evaluated with xp->count, hence, this works fine. 
> > > > > > > >len = 5; // (2) > > > >ptr = malloc(sizeof(int) * len); // size of ptr is still fixed to be > > > > (1), so this traps at run time. > > > > } > > > > > > Yes. > > > > I think the fact that the behavior diverges in this subtle > > way will make things very confusing. > > > I use it in this way (using a macro for access) and think > it works very nicely. > > > Not sure adding “dot” actually helps that much with the users > > comprehending this code. > > > Based on my humble experience, I would say that it helps people > to avoid mistakes when things that do behave differently do not > look identically. > > > That’s why I was quite skeptical if VLA in C can ever be evolved > > to model the size with a struct member safely, given that VLA made > > a wrong choice from the beginning. > > I don't think a wrong choice was made. > > The alternative would be to have the type of a variable > change whenever a variable in the size expression changes. > I think this would be much harder to understand and caus
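The "evaluated on function entry" behaviour for parameters that this thread keeps returning to can likewise be demonstrated directly. A minimal sketch (the helper name is made up):

```c
#include <stddef.h>

/* For a parameter of variably modified type, the size expression is
   evaluated on function entry; assigning to len afterwards does not
   change the type of *buf, so sizeof keeps reporting the entry value.  */
static size_t elems_at_entry (int len, int (*buf)[len])
{
  len = 1000;                     /* no effect: type already fixed */
  return sizeof *buf / sizeof (int);
}
```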
Re: message has lines too long for transport - Was: Fwd: Mail delivery failed: returning message to sender
On 3/25/25 11:32, Richard Earnshaw (lists) wrote:
> On 24/03/2025 22:08, Toon Moene wrote:
>> On 3/24/25 17:27, Richard Earnshaw (lists) wrote:
>>> On 23/03/2025 20:26, Toon Moene wrote:
>>>> I had the following message when sending test results to
>>>> gcc-testresults *starting* today (3 times):
>>>>
>>>> Note that the message is generated by *my* exim4 "mail delivery
>>>> software" (Debian Testing) - it is not the *receiving* side that
>>>> thinks the lines are too long.
>>>>
>>>> Is anyone else seeing this ?
>>>>
>>>> Forwarded Message
>>>> Subject: Mail delivery failed: returning message to sender
>>>> Date: Sun, 23 Mar 2025 18:18:28 +0100
>>>> From: Mail Delivery System
>>>> To: t...@moene.org
>>>>
>>>> This message was created automatically by mail delivery software.
>>>>
>>>> A message that you sent could not be delivered to one or more of its
>>>> recipients. This is a permanent error. The following address(es)
>>>> failed:
>>>>
>>>> gcc-testresu...@gcc.gnu.org
>>>> message has lines too long for transport
>>>
>>> https://datatracker.ietf.org/doc/html/rfc5322#section-2.1.1
>>>
>>> So there's a 998 (sic) character limit on the length of any line. I'm
>>> guessing your mail posting tool is not reformatting the message and
>>> trying to pass it straight through. Your SMTP server then rejects it.
>>>
>>> I guess your options are to request a 'flowed' encoding type or to
>>> reformat the message with a filter before passing it on to your SMTP
>>> server.
>>>
>>> R.
>>
>> Thanks.
>>
>> I placed a
>>
>> fold -s -w 300 |
>>
>> filter in front of the
>>
>> mail -s
>>
>> command.
>>
>> At least I got one simple test passed to the mailing list:
>>
>> https://gcc.gnu.org/pipermail/gcc-testresults/2025-March/841892.html
>
> There's nothing in that post that looks even remotely close to 300 chars,
> let alone 998. So I'm not sure what would be going on.
>
> R.

But this one has:

https://gcc.gnu.org/pipermail/gcc-testresults/2025-March/841925.html
(the failing Fortran tests).

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Re: message has lines too long for transport - Was: Fwd: Mail delivery failed: returning message to sender
On 25/03/2025 10:52, Toon Moene wrote:
> On 3/25/25 11:32, Richard Earnshaw (lists) wrote:
>> On 24/03/2025 22:08, Toon Moene wrote:
>>> On 3/24/25 17:27, Richard Earnshaw (lists) wrote:
>>>> On 23/03/2025 20:26, Toon Moene wrote:
>>>>> I had the following message when sending test results to
>>>>> gcc-testresults *starting* today (3 times):
>>>>> [...]
>>>>> gcc-testresu...@gcc.gnu.org
>>>>> message has lines too long for transport
>>>>
>>>> https://datatracker.ietf.org/doc/html/rfc5322#section-2.1.1
>>>>
>>>> So there's a 998 (sic) character limit on the length of any line. I'm
>>>> guessing your mail posting tool is not reformatting the message and
>>>> trying to pass it straight through. Your SMTP server then rejects it.
>>>>
>>>> I guess your options are to request a 'flowed' encoding type or to
>>>> reformat the message with a filter before passing it on to your SMTP
>>>> server.
>>>>
>>>> R.
>>>
>>> Thanks.
>>>
>>> I placed a
>>>
>>> fold -s -w 300 |
>>>
>>> filter in front of the
>>>
>>> mail -s
>>>
>>> command.
>>>
>>> At least I got one simple test passed to the mailing list:
>>>
>>> https://gcc.gnu.org/pipermail/gcc-testresults/2025-March/841892.html
>>
>> There's nothing in that post that looks even remotely close to 300
>> chars, let alone 998. So I'm not sure what would be going on.
>>
>> R.
>
> But this one has:
>
> https://gcc.gnu.org/pipermail/gcc-testresults/2025-March/841925.html
> (the failing Fortran tests).

LOL.

Yeah, not the most memorable of test names :)

R.