Re: RFC: Formalization of the Intel assembly syntax (PR53929)
On Thu, Jan 18, 2024 at 5:42 PM LIU Hao wrote:
>
> On 2024-01-18 17:02, Fangrui Song wrote:
> > Thanks for the proposal. I hope that -masm=intel becomes more useful:)
> >
> > Do you have a list of assembly in the unambiguous cases that fail to
> > be parsed today as a gas PR?
> > For example,
>
> Not really. Most of these are results from high-level languages. For example:
>
>    # Expected: `movl shr(%rip), %eax`
>    # Actual: error: invalid use of operator "shr"
>    mov eax, DWORD PTR shr[rip]
>
>    # Expected: `movl dword(%rip), %eax`
>    # Actual: accepted as `movl 4(%rip), %eax`
>    mov eax, DWORD ptr dword[rip]

GCC seems to print a symbol displacement, possibly with a modifier (for
a relocation), before the left bracket.

    mov edx, DWORD PTR bx@GOT[eax]
    mov edx, DWORD PTR bx[eax]
    mov edx, DWORD PTR and[eax]   # Error: invalid use of operator "and"

Technically, assemblers (gas and LLVM integrated assembler) can be made
to parse "bx" as a symbol, even if it matches a register name or an
operator name ("and").
However, a straightforward approach using one lookahead token cannot
disambiguate the following two cases.

    mov edx, DWORD PTR fs:[eax]   # segment override prefix
    mov edx, DWORD PTR fs[eax]    # symbol

So, we would need two lookahead tokens...
(https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L2534-L2550
needs more code to parse `fs:` correctly.)

It is also unfortunate that whether the displacement is an immediate or
not changes the behavior of brackets.

    mov eax, DWORD PTR 0       # mov $0x0,%eax
    mov eax, DWORD PTR [0]     # mov 0x0,%eax
    mov eax, DWORD PTR sym     # mov 0x0,%eax with relocation
    mov eax, DWORD PTR [sym]   # mov 0x0,%eax with relocation

The above reveals yet another inconsistency. For a memory reference, it
seems that we should use [], but [sym] could be ambiguous if sym matches
a register name or operator name.

Does the proposal change the placement of the displacement depending
on whether it is an immediate?
This is inconsistent, but perhaps there is not much we can improve...

    extern int a[2];
    int foo() { return a[1]+a[2]; }

GCC's PIC -masm=intel output

    mov eax, DWORD PTR a[rip+8]
    add eax, DWORD PTR a[rip+4]

The displacements (a+8 and a+4) involve a plus expression, and `a` and
`8`/`4` are printed in two places.

> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to
> `.intel_syntax noprefix`:
>
>    $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
>    {standard input}: Assembler messages:
>    {standard input}:1: Error: invalid use of register
>
>    $ as <<< '.intel_syntax noprefix; mov eax, DWORD PTR gs:0x48' -o a.o &&
>    objdump -Mintel -d a.o
>    ...
>    <.text>:
>    0: 65 8b 04 25 48 00 00    mov    eax,DWORD PTR gs:0x48

Confirmed by Jan.
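
To make the `fs:[eax]` versus `fs[eax]` point above concrete, here is a
toy C++ sketch that classifies an operand by peeking at its first two
tokens. It is not the gas or LLVM parser: `OperandKind`, `tokenize`,
`isSegmentRegister` and `classify` are hypothetical names made up for
illustration, only lowercase register names are recognized, and
modifiers such as `@GOT` are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification used only in this sketch.
enum class OperandKind { SegmentOverride, SymbolDisplacement, Other };

static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Split an operand such as "fs:[eax]" into identifier-like tokens and
// single-character punctuation tokens; whitespace is skipped.
std::vector<std::string> tokenize(const std::string &text) {
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < text.size()) {
    if (std::isspace(static_cast<unsigned char>(text[i]))) {
      ++i;
    } else if (isIdentChar(text[i])) {
      std::size_t start = i;
      while (i < text.size() && isIdentChar(text[i]))
        ++i;
      tokens.push_back(text.substr(start, i - start));
    } else {
      tokens.push_back(std::string(1, text[i]));
      ++i;
    }
  }
  return tokens;
}

// Simplification: lowercase x86 segment register names only.
bool isSegmentRegister(const std::string &name) {
  static const char *const regs[] = {"cs", "ds", "es", "fs", "gs", "ss"};
  for (const char *reg : regs)
    if (name == reg)
      return true;
  return false;
}

// Decide from the first two tokens whether the leading name is a
// segment override ("fs" then ":") or a symbol displacement (any
// identifier, even a register name, directly followed by "[").
OperandKind classify(const std::string &operand) {
  const std::vector<std::string> t = tokenize(operand);
  if (t.size() < 2 || !isIdentChar(t[0][0]))
    return OperandKind::Other;
  if (t[1] == ":" && isSegmentRegister(t[0]))
    return OperandKind::SegmentOverride;
  if (t[1] == "[")
    return OperandKind::SymbolDisplacement;
  return OperandKind::Other;
}
```

With this sketch, `classify("fs:[eax]")` yields `SegmentOverride` while
`classify("fs[eax]")` and `classify("bx[eax]")` yield
`SymbolDisplacement`. A real parser would of course have to make this
decision inside its expression grammar rather than on a pre-split
operand string.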
Re: RFC: Formalization of the Intel assembly syntax (PR53929)
On 18.01.2024 17:40, LIU Hao wrote:
> On 2024-01-18 20:54, Jan Beulich wrote:
>> I'm sorry, but most of your proposal may even be considered for being
>> acceptable only if you would gain buy-off from the MASM guys. Anything
>> MASM treats as valid ought to be permitted by gas as well (within the
>> scope of certain divergence that cannot be changed in gas without
>> risking to break people's code). It could probably be considered to
>> introduce a "strict" mode of Intel syntax, following some / most of
>> what you propose; making this the default cannot be an option.
>
> Thanks for your reply.
>
> I have attached the Markdown source for that page, modified a few hours ago.
> I am planning to make some updates according to your advice tomorrow.

Just to mention it: Attaching is in no way better than providing a link,
commenting-wise.

> And yes, I am proposing a 'strict' mode, however not for humans, only for
> compilers.
>
> My first message references a GCC bug report, where the problematic symbol
> `bx` comes from C source. I have been aware of the `/APP` and `/NO_APP`
> markers in generated assembly, so I suspect that GAS should be able to tell
> which parts are generated from a compiler and which parts are composed by
> hand. The proposed strict mode may apply only to the output from GCC, which
> are much more likely to contain bad symbols, but are also more controllable
> on the GCC side.
>
> I believe that skillful people who write x86 assembly have known that
> `offset`, `shr`, `si` etc. are 'bad' names for symbols. Therefore, it's like
> an issue there.
>
>
>> Commenting on individual aspects of your proposal is a little difficult,
>> as you didn't provide the proposal inline (and hence it cannot be easily
>> used as context in a reply). But to mention the imo worst aspect:
>> Declaring
>>
>>     mov eax, [rcx]
>>
>> as invalid is a no-go.
>
> I agree. I am considering to declare the lack of a symbol as a special case.

Well, I took this as the simplest example. But clearly there should never
be a need for an assembly programmer to needlessly write "dword ptr" or
alike, when operand size is unambiguous. Limiting "strict mode" to
compiler output would take away concerns in this regard (as machine
generated assembly has no issue with uniformly adding such redundant
specifiers, much like in AT&T mode suffixes would typically be emitted
even when not needed).

But I see a severe issue with your aim at confining strict mode to
compiler generated code only: In inline assembly (see your mentioning of
APP / NO_APP above) you still potentially reference C symbols. So the
ambiguities don't disappear in APP / NO_APP regions.

>> I also don't see how this would be related to the
>> issue at hand. What's in the square brackets may as well be a symbol
>> name, so requiring the "mode specifier" doesn't disambiguate things at
>> all.
>
> If someone declares a variable called `rcx` in C, it has to be translated to
>
>     mov eax, DWORD PTR rcx     # `movl rcx, %eax`
>
> instead of
>
>     mov eax, DWORD PTR [rcx]   # `movl (%rcx), %eax`

And an array happening to be indexed by rcx would then result in

    mov eax, DWORD PTR rcx[rcx]   # `movl rcx(%rcx), %eax`

? That's going to be confusing at best.

I think this whole issue needs taking care of differently, and iirc I
did already suggest an alternative in one of the bugzilla entries
involved: Potentially ambiguous names (which to a compiler may mean: all
symbol names) ought to simply be quoted, and it ought to be specified that
quoted symbols are never registers. Iirc this will require gas changes,
yes, but it'll address all ambiguities afaict.

Jan
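
The quoting rule suggested above ("quoted symbols are never registers")
can be sketched in a few lines of C++. This is only an illustration
under stated assumptions: `NameKind`, `isRegisterName`, `quoteSymbol`
and `resolve` are hypothetical helpers, the register list is a tiny
subset, and escaping of quote characters inside a name is ignored. It
does not describe what gas does today.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch.
enum class NameKind { Register, Symbol };

// Tiny illustrative subset of register names, lowercase only.
bool isRegisterName(const std::string &name) {
  static const char *const regs[] = {"eax", "rcx", "bx", "si", "fs"};
  for (const char *reg : regs)
    if (name == reg)
      return true;
  return false;
}

// Compiler side: wrap every symbol name in double quotes before printing.
std::string quoteSymbol(const std::string &name) {
  return "\"" + name + "\"";
}

// Assembler side: a quoted token is always a symbol; only an unquoted
// token is looked up in the register table.
NameKind resolve(const std::string &token) {
  if (token.size() >= 2 && token.front() == '"' && token.back() == '"')
    return NameKind::Symbol;
  return isRegisterName(token) ? NameKind::Register : NameKind::Symbol;
}
```

Under this rule, `resolve("rcx")` is a register while
`resolve(quoteSymbol("rcx"))` is a symbol, so a C variable named `rcx`
no longer collides with the register of the same name.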
Re: GCC GSoC 2024: Call for project ideas and mentors
Hi Martin,

On 1/15/24 18:48, Martin Jambor wrote:
> Hello,
>
> another year has passed, Google has announced there will be again Google
> Summer of Code (GSoC) in 2024 and the deadline for organizations to
> apply is already approaching (February 6th). I'd like to volunteer to
> be the main org-admin for GCC again but let me know if you think I
> shouldn't or that someone else should or if you want to do it instead.
> Otherwise I'll assume that I will and I hope that I can continue to rely
> on David Edelsohn and Thomas Schwinge to back me up and help me with
> some decision making along the way as my co-org-admins.

I think that'd be good :) we've really appreciated all the work you've
done for the past editions.

We'll be discussing project ideas with the rest of the gccrs team and
will update the page shortly. We'd love to mentor again this year.

Thanks for getting in touch,

Best,

Arthur

> The most important bit:
>
> I would like to ask all (moderately) seasoned GCC contributors to
> consider mentoring a contributor this year and ideally also come up with
> a project that they would like to lead. I'm collecting proposals on our
> wiki page https://gcc.gnu.org/wiki/SummerOfCode - feel free to add yours
> to the top list there. Or, if you are unsure, post your offer and
> project idea as a reply here to the mailing list.
>
> Additionally, if you have added an idea to the list in the recent years,
> please review it whether it is still up-to-date or needs adjusting or
> should be removed altogether.
>
> At this point, we need to collect a list of project ideas.
> Eventually, each listed project idea should have:
>
>   a) a project title,
>   b) more detailed description of the project (2-5 sentences),
>   c) expected outcomes (we do have a catch-almost-all formulation that
>      outcome is generally patches at the bottom of the list on the
>      wiki),
>   d) skills required/preferred,
>   e) project size - whether it is expected to take approximately 350,
>      175 or just 90 hours (the last option is new in 2024, see below),
>   f) difficulty (easy, hard or medium, but we don't really have easy
>      projects), and
>   g) expected mentors.
>
> Project ideas that come without an offer to also mentor them are always
> fun to discuss, by all means feel free to reply to this email with yours
> and I will attempt to find a mentor, but please be aware that we can
> only use the suggestion if we actually find one or ideally two.
>
> Everybody in the GCC community is invited to go over
> https://gcc.gnu.org/wiki/SummerOfCode and remove any outdated or
> otherwise bad project suggestions and help improve viable ones.
>
> Finally, please continue helping (prospective) students figure stuff out
> about GCC like you have always done in the past.
>
> As far as I know, GSoC 2024 should be quite similar to the last year,
> the most important parameters probably are these:
>
>   - Contributors (formerly students) must either be full-time students
>     or be "beginners to open source."
>
>   - There are now three project sizes: roughly 90 hours (small), roughly
>     175 hours (medium-sized) and roughly 350 hours (large) of work in
>     total. The small option is new this year but because our projects
>     usually have a lengthy learning period, I think we will usually
>     want to stick to the medium and large variants.
>
>   - Timing should be pretty much as flexible as last year. The
>     recommended "standard" duration is 12 weeks but depending on
>     contributor's and mentor's needs and circumstances, projects can
>     take anywhere between 10 and 22 weeks. There will be one mid-term
>     and one final evaluation.
>
> For further details you can see:
>
>   - The announcement of GSoC 2024:
>     https://opensource.googleblog.com/2023/11/google-summer-of-code-2024-celebrating-20th-year.html
>
>   - GSoC rules:
>     https://summerofcode.withgoogle.com/rules
>
>   - The detailed GSoC 2024 timeline:
>     https://developers.google.com/open-source/gsoc/timeline
>
>   - Elaborate project idea guidelines:
>     https://google.github.io/gsocguides/mentor/defining-a-project-ideas-list
>
> Thank you very much for your participation and help. Let's hope we
> attract some great contributors again this year.
>
> Martin

-- 
Arthur Cohen
Toolchain Engineer

Embecosm GmbH
Managing Director: Jeremy Bennett
Branch office: Nürnberg
Commercial register: HR-B 36368
www.embecosm.de

Fürther Str. 27
90429 Nürnberg

Tel.: 091 - 128 707 040
Fax: 091 - 128 707 077
gcc-12-20240119 is now available
Snapshot gcc-12-20240119 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20240119/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options:
  git://gcc.gnu.org/git/gcc.git branch releases/gcc-12
  revision a7f6fe38d5421e5426da2b3ec34278b9e4146eb2

You'll find:

  gcc-12-20240119.tar.xz    Complete GCC

  SHA256=d102b0d222360c73fd22309b04c5b3d0fea12d7d31ebaf109583f060ef1f4f2a
  SHA1=2acbfc02fdb19baf6f61c975b27b691e0dce38b0

Diffs from 12-20240112 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: lambda coding style
On Wed, 10 Jan 2024, Jason Merrill via Gcc wrote:

> On 1/10/24 15:59, Marek Polacek wrote:
> > On Wed, Jan 10, 2024 at 02:58:03PM -0500, Jason Merrill via Gcc wrote:
> > > What formatting style do we want for non-trivial lambdas in GCC sources?
> > > I'm thinking the most consistent choice would be
> > >
> > > auto l = [&] (parms) // space between ] (
> > >   {                  // brace on new line, indented two spaces
> > >     return stuff;
> > >   };
> >
> > Sure, why not. Consistency is what matters. Thus far we seem
> > to have been very inconsistent. ;)
> >
> > > By default, recent emacs lines up the { with the previous line, like an
> > > in-class function definition; I talked it into the above indentation with
> > >
> > > (defun lambda-offset (elem)
> > >   (if (assq 'inline-open c-syntactic-context) '+ 0))
> > > (add-to-hook 'c++-mode-hook '(c-set-offset 'inlambda 'lambda-offset))
> > >
> > > I think we probably want the same formatting for lambdas in function
> > > argument lists, e.g.
> > >
> > > algorithm ([] (parms)
> > >   {
> > >     return foo;
> > >   });
> >
> > And what about lambdas in conditions:
> >
> > if (foo ()
> >     && [&] (params) mutable
> >        {
> >          return 42;
> >        } ())
> >
> > should the { go just below [?

Also, what about trailing-type and mutable (above) when needing a
line-break? (FTR: in technical terms, trailing-type is known as the
pointy-arrow-declaring-return-type thing :) the optional "-> type"
between "(parms)" and "{ body }")

I suggest the obvious (to me): line up stuff after (params) with the
opening brace for the body, when needing a line-break before that item,
and line-break *before* "->".

brgds, H-P
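
For illustration, one possible reading of the suggestion above is
sketched below: the trailing return type moves to its own line, the
break comes before "->", and that line is aligned with the opening
brace of the body. The function `count_calls` is a made-up example, not
code from the GCC sources, and nothing here is an agreed-upon style.

```cpp
#include <cassert>

// Hypothetical example function, laid out per the suggestion above.
int
count_calls ()
{
  int calls = 0;
  // "mutable" stays on the first line; the trailing return type is
  // broken before "->" and lined up with the opening brace.
  auto next = [calls] (int step) mutable
    -> int
    {
      calls += step;   // modifies the lambda's own copy of calls
      return calls;
    };
  next (1);
  return next (2);
}
```

Since `calls` is captured by value and the lambda is `mutable`, the two
invocations accumulate in the lambda's own copy, and the outer variable
is left untouched.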