[Bug c++/56715] New: Explicit Reg Vars are being ignored for consts when using g++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56715

Bug #: 56715
Summary: Explicit Reg Vars are being ignored for consts when using g++
Classification: Unclassified
Product: gcc
Version: 4.7.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: goswin-...@web.de

Created attachment 29714
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29714
example source that experiences the bug

I'm trying to pass a value to an `asm' operand using a specific register for
ARM with a freestanding compiler. Following the example from the info pages I
have the following code:

void foo() {
    register const int r4 asm("r4") = 0x1000;
    asm volatile("swi #1" : : "r"(r4));
}

void bar() {
    register int r4 asm("r4") = 0x1000;
    asm volatile("swi #1" : : "r"(r4));
}

Both foo() and bar() compile correctly when using gcc. But when using g++ the
foo() function suddenly uses the "r3" register instead of "r4". The bar()
function remains correct.

% arm-none-eabi-g++ -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-g++
COLLECT_LTO_WRAPPER=/usr/local/cross/libexec/gcc/arm-none-eabi/4.7.2/lto-wrapper
Target: arm-none-eabi
Configured with: ../gcc-4.7.2/configure --target=arm-none-eabi --prefix=/usr/local/cross --disable-nls --enable-languages=c,c++ --without-headers
Thread model: single
gcc version 4.7.2 (GCC)

% arm-none-eabi-gcc -O2 -save-temps -S bug.c
good code

% arm-none-eabi-g++ -O2 -save-temps -S bug.c
bad code
--
_Z3foov:
        .fnstart
.LFB0:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov     r3, #4096
@ 3 "bug.c" 1
        swi #1
@ 0 "" 2
        bx      lr
--

The source explicitly asked for "r4" but g++ uses r3 instead.
[Bug c++/56715] Explicit Reg Vars are being ignored for consts when using g++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56715

--- Comment #2 from Goswin von Brederlow 2013-03-25 00:07:19 UTC ---
(In reply to comment #1)
> const is a bit special in C++, it can be used as part of a const integer
> expression which is what is happening here.

How does that make it right to ignore the register specification? Or how do you
specify which register to use to pass the constant to asm in a specific
register?

To me it seems wrong to ignore the asm("r4") without even a warning. This does
break asm() statements that expect specific registers to be used.
[Bug c++/56715] Explicit Reg Vars are being ignored for consts when using g++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56715

Goswin von Brederlow changed:

What      |Removed   |Added
Status    |RESOLVED  |UNCONFIRMED
Resolution|INVALID   |

--- Comment #5 from Goswin von Brederlow 2013-03-25 11:11:52 UTC ---
If it is invalid, as in not allowed, then I would expect an error.
If it is undefined behaviour then I would expect a warning. For example:

register const int r4 asm("r4") = 0x1000;
Warning: const expression won't be bound to a specific register.
[Bug target/66960] Add interrupt attribute to x86 backend
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66960 --- Comment #20 from Goswin von Brederlow --- So it's been a year since my last comment. Is this dead or is someone still working on it? It would be a nice addition to gcc.
[Bug c/65668] New: gcc does not know how to use __eabi_uldivmod properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65668

Bug ID: 65668
Summary: gcc does not know how to use __eabi_uldivmod properly
Product: gcc
Version: 4.9.2
URL: https://gist.github.com/mrvn/0c79b146f74c28da401f
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
Build: arm-none-eabi

I have a uint64_t free-running counter with a frequency of 1 MHz and I want to
print that as hours, minutes, seconds and fraction:

volatile uint64_t count = 0x62a54bc4; // for example

uint64_t t = count;
uint32_t frac, seconds, minutes, hours;
frac = t % 100;
t /= 100;
seconds = t % 60;
t /= 60;
minutes = t % 60;
t /= 60;
hours = t;

This results in 6 calls to __aeabi_uldivmod, one for every modulo and every
division, instead of just 3 calls. Since that helper returns both the quotient
and the remainder, and long division is rather expensive, that is a substantial
waste of time.
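A possible source-level workaround, sketched under the assumption that the compiler does not merge each paired `/` and `%` on its own: divide once per step and derive the remainder with a multiply and a subtract. The names `DivMod`, `divmod`, `Uptime` and `split_uptime` are made up for this illustration and do not appear in the report; whether this really yields three library calls depends on the compiler and target.

```cpp
#include <cassert>
#include <cstdint>

// Quotient and remainder of one unsigned 64-bit division.
struct DivMod {
    std::uint64_t quot;
    std::uint64_t rem;
};

// Performs a single division; the remainder costs a multiply and a subtract.
inline DivMod divmod(std::uint64_t n, std::uint64_t d)
{
    assert(d != 0);
    const std::uint64_t q = n / d;
    return { q, n - q * d };
}

// Hypothetical result type for the breakdown used in the report.
struct Uptime {
    std::uint32_t hours;
    std::uint32_t minutes;
    std::uint32_t seconds;
    std::uint32_t frac;
};

// Same arithmetic as the report's snippet, with one division per step.
inline Uptime split_uptime(std::uint64_t t)
{
    const DivMod a = divmod(t, 100);      // a.rem = frac
    const DivMod b = divmod(a.quot, 60);  // b.rem = seconds
    const DivMod c = divmod(b.quot, 60);  // c.rem = minutes, c.quot = hours
    return { static_cast<std::uint32_t>(c.quot),
             static_cast<std::uint32_t>(c.rem),
             static_cast<std::uint32_t>(b.rem),
             static_cast<std::uint32_t>(a.rem) };
}
```

For the example value 0x62a54bc4 from the report, `split_uptime` gives the same fields as the original sequence of `%` and `/=` statements.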
[Bug web/65699] New: online docs lacks version that it documents
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65699 Bug ID: 65699 Summary: online docs lacks version that it documents Product: gcc Version: unknown URL: https://gcc.gnu.org/onlinedocs/gccint/ Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: web Assignee: unassigned at gcc dot gnu.org Reporter: goswin-v-b at web dot de CC: goswin-v-b at web dot de The online docs do not mention what version of the compiler they document. When something doesn't work as documented this makes it hard to see if that something is no longer valid in the local version or describes something not yet present in the local version.
[Bug web/65700] New: Documentation of internals is inconsistent in itself and diverges from reality
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65700

Bug ID: 65700
Summary: Documentation of internals is inconsistent in itself and diverges from reality
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: web
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de

https://gcc.gnu.org/onlinedocs/gccint/Collect2.html says that when collect2 is
used it generates a temporary file listing the constructors and destructors and
that the actual calls to the constructors are done from __main().

https://gcc.gnu.org/onlinedocs/gccint/Initialization.html now tells a quite
different story, including the .ctors/.dtors that are actually used on
x86/x86_64. It still mentions __main() in connection with collect2 being used.

On ARM what actually happens is that there is a .init_array section and the
libc startup files are supposed to process that themselves. Despite collect2
being used there is no __main() function that gets called for this. There is no
.init section but still gcc does NOT insert a call to __main() when compiling
main() like the docs say it would.

Further, the .init_array does not hold the constructors in reverse order. It
actually holds an automatic constructor generated by gcc first and then all the
functions manually declared as constructors. Care must be taken by the linker
script to sort them by priority or their order is random. So in the case of ARM
the constructors need to be called in order, not in reverse order.

Overall I have to say the documentation confuses things more than it actually
helps. I don't know if that is because it hasn't been updated in a long time or
never was complete or internally consistent in the first place. But it sure
could use some love. If the docs can't be improved please at least add a
comment where they are outdated or when they were last synced against the
source so it becomes clear to the reader where they are lacking.
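The ordering by priority can be observed on a hosted toolchain with a small experiment. This is only a sketch: it assumes GCC or Clang on an ELF target whose default linker script already sorts the prioritized init entries, which is exactly the step a custom bare-metal linker script has to provide itself. The helper names (`record`, `constructor_count`, `constructor_tag`) are invented for the illustration.

```cpp
#include <cassert>

namespace {

// Zero-initialized statically, so both are valid before any constructor runs.
int g_order[4];
int g_count = 0;

void record(int tag)
{
    if (g_count < 4)
        g_order[g_count++] = tag;
}

// Deliberately declared in the "wrong" order: the priority, not the
// declaration order, is expected to decide which one runs first.
__attribute__((constructor(102))) void init_late()  { record(102); }
__attribute__((constructor(101))) void init_early() { record(101); }

} // namespace

// Accessors so the recorded order can be inspected after startup.
int constructor_count()    { return g_count; }
int constructor_tag(int i) { return g_order[i]; }
```

If the entries are sorted as assumed, the lower priority number runs first, i.e. the constructors are called in ascending order rather than in reverse.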
[Bug web/65699] online docs lacks version that it documents
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65699 --- Comment #4 from Goswin von Brederlow --- Yes, a simple statement like that was exactly what I had in mind.
[Bug c++/65199] New: Linker failure with -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65199

Bug ID: 65199
Summary: Linker failure with -flto
Product: gcc
Version: 4.8.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Build: arm-none-eabi

I'm building a bare-metal kernel for a Raspberry Pi 2 (armv7) in C++. At some
point this failed with "undefined reference to `memcpy'" so I implemented one
as

extern "C" void * memcpy(void *dest, const void *src, uint32_t n)

But that gives a different error:

% arm-none-eabi-g++ -O2 -W -Wall -fPIE -flto -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -ffreestanding -nostdlib -std=gnu++11 -fno-exceptions -fno-rtti -c -o main.o main.cc
% arm-none-eabi-g++ -fPIE -nostdlib -O2 -flto boot.o font.o main.o -lgcc -Tlink-arm-eabi.ld -o kernel.elf
`memcpy' referenced in section `.text' of /tmp/cc7IkgU6.ltrans0.ltrans.o: defined in discarded section `.text' of main.o (symbol from plugin)
collect2: error: ld returned 1 exit status

Running the same command to link but without -flto succeeds:

% arm-none-eabi-g++ -fPIE -nostdlib -O2 boot.o font.o main.o -lgcc -Tlink-arm-eabi.ld -o kernel.elf
[Bug c++/65199] Linker failure with -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65199 --- Comment #2 from Goswin von Brederlow --- That fixes it. Isn't it a gcc bug though not to detect that itself?
[Bug lto/65252] New: Link time optimization breaks use of filenames in linker scripts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65252

Bug ID: 65252
Summary: Link time optimization breaks use of filenames in linker scripts
Product: gcc
Version: 4.8.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
CC: goswin-v-b at web dot de
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Build: arm-none-eabi

I'm building a kernel for a Raspberry Pi 2 with -flto. Most of the code will be
linked to 0x8000. The kernel image will be loaded to 0x8000 and I have set up
LMA and VMA in my linker script accordingly. But I have some bootstrap code
(boot.S and early.cc) that needs to be at the physical address. So I put the
following in my linker script:

ENTRY(_start)

PHYS_TO_VIRT = 0x8000;

SECTIONS
{
    . = 0x8000;
    .early : {
        boot.o(.*)
        early.o(.*)
    }

    /* rest of the code runs in higher half virtual address */
    . = . + PHYS_TO_VIRT;
    .text : AT(ADDR(.text) - PHYS_TO_VIRT) {
    ...

Using objdump -d I see the boot.o contents show up at 0x8000 exactly as they
should. But all the code from early.o only appears later in the .text section
and at the virtual address.

If I drop the -flto then everything works as expected.

It would be nice if -flto could preserve which file each function and variable
comes from so the linker can place them properly.
[Bug lto/65252] Link time optimization breaks use of filenames in linker scripts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65252 --- Comment #2 from Goswin von Brederlow --- As long as it's only one C/C++ file that works. But if one has multiple files then -fno-lto would optimize less. I was thinking of a more general case than mine.
[Bug lto/65262] New: Link time optimization breaks use __attribute__((section("..."))) in templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65262

Bug ID: 65262
Summary: Link time optimization breaks use __attribute__((section("..."))) in templates
Product: gcc
Version: 4.8.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
CC: goswin-v-b at web dot de

Created attachment 34911
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34911&action=edit
Simple testcase

I'm trying to put a template member function of a class into a different
section. Without -flto this works but with -flto the function reverts to the
.text section.

g++ -O2 -W -Wall -fvisibility=hidden -fno-inline -Tlink.ld -c -o main.o main.cc
g++ -O2 -W -Wall -fvisibility=hidden -fno-inline -Tlink.ld -o main main.o
g++ -O2 -W -Wall -fvisibility=hidden -fno-inline -Tlink.ld -flto -c -o main.lto.o main.cc
g++ -O2 -W -Wall -fvisibility=hidden -fno-inline -Tlink.ld -flto -o main.lto main.lto.o

Without link time optimization:
0200 ld .text.foo .text.foo
0210 g F .text.foo 0006 .hidden foo()
0200 wF .text.foo 0006 .hidden int foobar()

With link time optimization:
0820 ld .text.foo .text.foo
0100 l F .text 0006 int foobar()
0820 l F .text.foo 0006 foo()
[Bug lto/65262] Link time optimization breaks use __attribute__((section("..."))) in templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65262

--- Comment #2 from Goswin von Brederlow ---
The linker script is only there because the default script combines all .text.*
into one, hiding the effect. One could use different section names that the
default script does not combine and work without a custom linker script.

LTO is free to privatize template instantiations. But if it doesn't inline the
template then it should preserve the section attribute on it like it does for
normal functions. All optimized clones of a normal function are still in the
same section the original function was in.

I could understand if a template would end up in the section of the function
causing the instantiation (although what if functions from different sections
use the same instance?). But templates simply end up in the .text section no
matter what section they were in originally or where they get instantiated.

I don't know the internals but it looks to me like something should copy the
section attribute from the template to the privatized function in LTO mode.

You can't set a section on the template, you can't use a file scope in the
linker, you can't even use __attribute__((always_inline)) and the behaviour
differs from without -flto. How is that a WONTFIX?
[Bug target/66960] Add interrupt attribute to x86 backend
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66960

Goswin von Brederlow changed:

What|Removed |Added
CC| |goswin-v-b at web dot de

--- Comment #11 from Goswin von Brederlow ---
I think the design is fundamentally lacking in the following points:

1. interrupt handler must be declared with a mandatory pointer argument:

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame)
{
...
}

and user must properly define the structure the pointer pointing to.

First, how does one define the struct interrupt_frame properly? What is in
there? Is that just the data the CPU pushes to the stack? If so then gcc should
define the structure somewhere so code can be written CPU independent.

Since the frame pointer is passed as argument I assume the function prologue
will save the first argument register (on amd64) to stack. Is that to be
included in the struct interrupt_frame?

Secondly, how does one access the original register contents? Some kernel
designs use a single kernel stack and switch tasks when returning to user
space. That means that one has to copy all the user registers into the thread
structure and reload a new set of user registers from the new thread on exit
from the interrupt handler. The above interface would not allow this.

2. exception handler:

The exception handler is very similar to the interrupt handler with a different
mandatory function signature:

typedef unsigned int uword_t __attribute__ ((mode (__word__)));

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame, uword_t error_code)
{
...
}

and compiler pops the error code off stack before the 'IRET' instruction.

In a kernel there will always be some exception that simply prints a register
dump and stack backtrace. So again, how do you access the original register
contents?

Secondly, why pass error_code as argument if it is already on the stack and
could be accessed through the frame pointer?
The argument register (amd64) would have to be saved on the stack too, causing
an extra push/pop. But if it is passed as argument then why not pop it before
the call to keep the stack frame the same as for interrupts (assuming the saved
registers are not included in the frame)?

If it is not popped or saved registers are included in the frame then the
exception stack frame differs from the interrupt frame (extra error_code and
extra register). They should not use the same structure, that's just confusing.

'no_caller_saved_registers' attribute

Use this attribute to indicate that the specified function has no caller-saved
registers. That is, all registers are callee-saved.

Does that include the argument registers (if the function takes arguments)?
Wouldn't it be more flexible to define a list of registers that the function
will clobber?
[Bug target/66960] Add interrupt attribute to x86 backend
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66960

--- Comment #13 from Goswin von Brederlow ---
(In reply to H.J. Lu from comment #12)
> (In reply to Goswin von Brederlow from comment #11)
> > I think the design is fundamentally lacking in the following points:
> >
> > 1. interrupt handler must be declared with a mandatory pointer argument:
> >
> > struct interrupt_frame;
> >
> > __attribute__ ((interrupt))
> > void
> > f (struct interrupt_frame *frame)
> > {
> > ...
> > }
> >
> > and user must properly define the structure the pointer pointing to.
> >
> > First how does one define the struct interrupt_frame properly? What is in
> > there? Is that just the data the CPU pushes to the stack? If so then gcc
> > should define the structure somewhere so code can be written cpu
> > independent.
>
> interrupt data is pushed onto stack by CPU:
>
> struct interrupt_frame
> {
> uword_t ip;
> uword_t cs;
> uword_t flags;
> uword_t sp;
> uword_t ss;
> };
>
> However, void * works if you need to access interrupt data. Interrupt
> handler should provide its working definition.
>
> > Since the frame pointer is passed as argument I assume the function prolog
> > will save the first argument register (on amd64) to stack. Is that to be
> > included in the struct interrupt_frame?
>
> No. The interrupt frame pointer points to interrupt data on stack
> pushed by CPU.
>
> > Secondly how does one access the original register contents? Some kernel
> > designs use a single kernel stack and switch tasks when returning to user
> > space. That means that one has to copy all the user registers into the
> > thread structure and reload a new set of user registers from the new thread
> > on exit from the interrupt handler. The above interface would not allow
> > this.
>
> The interrupt attribute provides a way to access interrupt data on stack
> pushed by CPU, nothing more and nothing less.

That design seriously limits the usability of this feature.

> >
> > 2.
exception handler:
> >
> > The exception handler is very similar to the interrupt handler with a
> > different mandatory function signature:
> >
> > typedef unsigned int uword_t __attribute__ ((mode (__word__)));
> >
> > struct interrupt_frame;
> >
> > __attribute__ ((interrupt))
> > void
> > f (struct interrupt_frame *frame, uword_t error_code)
> > {
> > ...
> > }
> >
> > and compiler pops the error code off stack before the 'IRET'
> > instruction.
> >
> > In a kernel there will always be some exception that simply prints a
> > register dump and stack backtrace. So again how do you access the original
> > register contents?
>
> You need to do that yourself.

Which means __attribute__ ((interrupt)) can't be used for exceptions half the
time.

> > Secondly why pass error_code as argument if it is already on the stack and
> > could be accessed through the frame pointer? The argument register (amd64)
> > would have to be saved on the stack too causing an extra push/pop. But if it
> > is passed as argument then why not pop it before the call to keep the stack
> > frame the same as for interrupts (assuming the saved registers are not
> > included in the frame)?
>
> error_code is a pseudo parameter, which is mapped to error code on stack
> pushed by CPU. You can write a simple code to see it yourself.

Couldn't the same trick be used for registers? Pass them as pseudo parameters
and they either resolve to the location on the stack where gcc did save them or
become the original register unchanged.

> > If it is not popped or saved registers are included in the frame then the
> > exception stack frame differs from the interrupt frame (extra error_code and
> > extra register). They should not use the same structure, that's just
> > confusing.
> >
> > 'no_caller_saved_registers' attribute
> >
> > Use this attribute to indicate that the specified function has no
> > caller-saved registers. That is, all registers are callee-saved.
> > > > Does that include the argument registers (if the function takes arguments)? > > Yes. > > > Wouldn't it be more flexible to define a list of registers that the function > > will clobber? > > How do programmer know which registers will be clobbered? The programmer writes the function. He declares what registers will be clobbered and gcc will add the necessary code to preserve any other registers it uses inside the function.
[Bug target/66960] Add interrupt attribute to x86 backend
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66960

--- Comment #15 from Goswin von Brederlow ---
(In reply to H.J. Lu from comment #14)
> (In reply to Goswin von Brederlow from comment #13)
> > > > Secondly why pass error_code as argument if it is already on the stack and
> > > > could be accessed through the frame pointer? The argument register
> > > > (amd64)
> > > > would have to be saved on the stack too causing an extra push/pop. But
> > > > if it
> > > > is passed as argument then why not pop it before the call to keep the
> > > > stack
> > > > frame the same as for interrupts (assuming the saved registers are not
> > > > included in the frame)?
> > >
> > > error_code is a pseudo parameter, which is mapped to error code on stack
> > > pushed by CPU. You can write a simple code to see it yourself.
> >
> > Couldn't the same trick be used for registers? Pass them as pseudo
> > parameters and they either resolve to the location on the stack where gcc
> > did save them or become the original register unchanged.
>
> No. We only do it for data pushed onto stack by CPU.

I was thinking of something like:

__attribute__ ((interrupt("save_regs")))
void
f (struct interrupt_frame *frame, uword_t error_code, struct regs regs)
{
    kprintf("user SP = %#016x\n", regs.sp);
}
[Bug target/66960] Add interrupt attribute to x86 backend
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66960

--- Comment #17 from Goswin von Brederlow ---
(In reply to H.J. Lu from comment #16)
> (In reply to Goswin von Brederlow from comment #15)
>
> > > No. We only do it for data pushed onto stack by CPU.
> >
> > I was thinking of something like:
> >
> > __attribute__ ((interrupt("save_regs")))
> > void
> > f (struct interrupt_frame *frame, uword_t error_code, struct regs regs)
> > {
> > kprintf("user SP = %#016x\n", regs.sp);
> > }
>
> It is an interesting idea. But frame and err_code are created by caller,
> which is CPU, not by callee. You want to not only save all original
> registers of interrupted process, but also make them available to interrupt
> handler. This won't be supported without significant changes in
> infrastructure.

Is it a significant change? On a normal function gcc creates a stack frame and
pushes callee-saved registers that it later uses onto the stack. I'm suggesting
doing much the same with 2 small changes:

1) push all registers unconditionally
2) make the address where the registers got pushed to known to the function
[Bug c/104828] New: Wrong out-of-bounds array access warning on literal pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104828

Bug ID: 104828
Summary: Wrong out-of-bounds array access warning on literal pointers
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
Target Milestone: ---

Trying to access a pointer cast from an integer literal gives an out-of-bounds
warning:

--
#define UART0_BASE 0x3F201000
void putc(char c) {
  volatile unsigned int *UART0_DR = (volatile unsigned int *)(UART0_BASE);
  volatile unsigned int *UART0_FR = (volatile unsigned int *)(UART0_BASE + 0x18);
  while (*UART0_FR & (1 << 5) ) { }
  *UART0_DR = c;
}
--
<source>:5:3: warning: array subscript 0 is outside array bounds of 'volatile unsigned int [0]' [-Warray-bounds]
    5 |   *UART0_DR = c;
      |   ^

The warning goes away if the pointer is global or static. It remains if the
pointer is returned from a function with the alloc_size attribute:

--
#include <stddef.h>
#include <stdint.h>
#define UART0_BASE 0x3F201000

volatile uint32_t * make(uintptr_t addr, size_t size = 4) __attribute__ ((alloc_size (2)));

volatile uint32_t * make(uintptr_t addr, size_t size) {
  (void)size;
  return (volatile uint32_t *)addr;
}

void putc(char c) {
  volatile uint32_t *UART0_DR = make(UART0_BASE);
  volatile uint32_t *UART0_FR = make(UART0_BASE + 0x18);
  while (*UART0_FR & (1 << 5) ) { }
  *UART0_DR = c;
}
--
<source>:16:3: warning: array subscript 0 is outside array bounds of 'volatile uint32_t [0]' [-Warray-bounds]
   16 |   *UART0_DR = c;
      |   ^

The warning goes away if the "make" helper is extern and can't be inlined.

Gcc 11.2 and before do not give this warning.
[Bug middle-end/99578] [11/12 Regression] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578

--- Comment #29 from Goswin von Brederlow ---
(In reply to Jakub Jelinek from comment #26)
> That is nonsense. The amount of code in the wild that relies on (type
> *)CONSTANT
> working is insane, you can't annotate it all. And it has worked fine for
> decades. The pointers aren't invalid, they point to valid objects in the
> address space.
> POSIX supports MAP_FIXED for a reason (and in many embedded cases one
> doesn't even have an MMU and I/O or other special areas are mapped directly).

A cast from integer to pointer is implementation-defined behavior except for

1) 0, which must cast to NULL / nullptr
2) if the integer value was constructed by casting a pointer to an integer of
   suitable size

There is no guarantee in the C standard that '(type *)CONSTANT' will actually
point to the hardware address 'CONSTANT'. It's just how gcc happens to do it in
most cases. So no, your code is not fine. It is fragile. It relies on
implementation details of gcc. But let's not argue about that.

Detecting NULL pointer access and offsets to it is a good thing, except where
it isn't. It's unfortunate it also catches other stuff. Under AmigaOS the
pointer to the exec.library (no program can work without that) is stored at
address 4. So there isn't a universal value of "this is big enough not to be an
offset to NULL".

Detecting if an expression involves NULL might be hard. If it starts as
NULL->member then it's easy. What about (&x - &x)+offsetof(X.member) or
(uintptr_t)&x.member - (uintptr_t)&x or similar stuff you easily get with
macros? On the other side (T*)0x45634534 should be easy to detect as not being
NULL+offset. It's a literal. But the grey zone in between the easy cases might
be too big to be useful.

Alternatively an annotation for this would actually go nicely with another bug
I reported: 'add feature to create a pointer to a fixed address as constexpr'
[1].
The annotation would avoid the warning and also make it a pointer literal that
can be used in constexpr (apart from accessing the address). It could also
cause gcc to handle the case where CONSTANT can't just be cast to pointer and
work. Like when using address authentication on ARMv8 CPUs, to name something
modern. And the size of the object the pointer points to can be taken from its
type, i.e. the pointer is to a single object and never an (infinite) array. If
you want a pointer to an array then cast it to an array of the right size.

--
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104514
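The two kinds of cast discussed above can be put side by side in a short sketch. The function names `survives_round_trip` and `from_address` are hypothetical. The first relies on the round trip through `std::uintptr_t` that the language guarantees; the second only builds a pointer from a bare address, whose meaning is implementation-defined, and the pointer is never dereferenced here.

```cpp
#include <cassert>
#include <cstdint>

// Case 2 from the list: an integer obtained from a valid pointer converts
// back to the same pointer value.
template <typename T>
bool survives_round_trip(T* p)
{
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    T* back = reinterpret_cast<T*>(bits);
    return back == p;
}

// A pointer made from a bare address constant. How the integer maps to an
// address is implementation-defined; GCC documents the mapping it uses.
inline volatile std::uint32_t* from_address(std::uintptr_t addr)
{
    return reinterpret_cast<volatile std::uint32_t*>(addr);
}
```

On a typical flat-address target `from_address(0x45634534)` yields a pointer whose integer value is again 0x45634534, but that is a property of the implementation rather than of the language.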
[Bug middle-end/99578] [11 Regression] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578

--- Comment #38 from Goswin von Brederlow ---
(In reply to Jonathan Wakely from comment #34)
> (In reply to Goswin von Brederlow from comment #29)
> > There is no guarantee in the C standard that '(type *)CONSTANT' will actually
> > point to the hardware address 'CONSTANT'. It's just how gcc happens to do it
> > in most cases. So no, your code is not fine. It is fragile. It relies on
> > implementation details of gcc. But let's not argue about that.
>
> Actually, let's. It relies on guaranteed behaviour of GCC:
> https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html
> That's not going to change, and neither is the fact that the Linux kernel
> depends on implementation-defined properties of GCC (where
> "implementation-defined" is used in the formal sense, not "just an
> implementation detail that might change tomorrow").

Thank you for agreeing with me that "It relies on implementation details of
gcc". That's exactly what I said.
[Bug c++/104514] New: add feature to create a pointer to a fixed address as constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104514

Bug ID: 104514
Summary: add feature to create a pointer to a fixed address as constexpr
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
Target Milestone: ---

In the embedded and microcontroller world memory-mapped registers are very
common. They can be declared as external objects and fudged in using linker
scripts, which prevents a lot of optimizations. Or they can be declared as
pointers, in the most reduced form like this:

int *p = (int*)0x12345678;

My problem now is that this isn't a constexpr and can't be used in any
constexpr code:

foo.cc:1:20: error: ‘reinterpret_cast’ from integer to pointer
    1 | constexpr int *p = (int*)0x12345678;
      |                    ^~~~

While this is the right thing in general there should be a way to allow this
special case: a way to tell the compiler that an object exists at a fixed
address and still have the pointer be a constexpr.
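Until such a feature exists, one common workaround is to keep the address, rather than the pointer, as the compile-time constant and to perform the cast only at the point of use. The sketch below is one way to do that; the `MmioReg` template and the `ExampleReg` alias are hypothetical names, not an existing GCC or library facility.

```cpp
#include <cassert>
#include <cstdint>

// The address is a template parameter and therefore usable in constant
// expressions; the integer-to-pointer cast happens only inside ptr().
template <std::uintptr_t Address, typename T = std::uint32_t>
struct MmioReg {
    static constexpr std::uintptr_t address = Address;

    static volatile T* ptr() noexcept
    {
        return reinterpret_cast<volatile T*>(Address);
    }

    // Only meaningful on hardware that really maps a register here.
    static T read() noexcept { return *ptr(); }
    static void write(T value) noexcept { *ptr() = value; }
};

// The address from the report, used purely as an illustration.
using ExampleReg = MmioReg<0x12345678, int>;

// Address arithmetic stays available at compile time.
static_assert(ExampleReg::address + 4 == 0x1234567C,
              "register addresses can be computed in constant expressions");
```

This keeps offsets and register maps usable in constexpr code, but it does not make the pointer itself a constant expression, which is what the report asks for.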
[Bug c/105521] New: missed optimization in modulo arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105521

Bug ID: 105521
Summary: missed optimization in modulo arithmetic
Product: gcc
Version: 11.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
Target Milestone: ---

I'm trying to compute (a*a)%n for uint64_t types on x86_64 using
"gcc -O2 -W -Wall" like this:

#include <assert.h>
#include <stdint.h>

uint64_t sqrmod(uint64_t a, uint64_t n) {
    assert(a < n);
    unsigned __int128 x = a;
    x *= a;
    return x % n;
}

I expected to get the following code:

sqrmod:
        cmpq    %rsi, %rdi
        jnb     .L13            // assert(a < n) failure
        movq    %rdi, %rax
        mul     %rdi
        div     %rsi
        movq    %rdx, %rax
        ret

The compiler does get the "mul" right but instead of the "div" it throws in a
call to "__umodti3". The "__umodti3" function is horribly long code that will
be worlds slower than a simple div.

Note: The "assert(a < n);" should tell the compiler that the "div" instruction
can not overflow and will not cause a #DivisionError. Without the assert the
compiler could (conditionally) add "a %= n;" for the same effect.

https://godbolt.org/z/cd57Wd4oo
[Bug target/105521] missed optimization in modulo arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105521

--- Comment #3 from Goswin von Brederlow ---
(In reply to Andrew Pinski from comment #1)
> This requires having a, 64bit/32bit (and 128bit/64bit) pattern really. So
> this is both a middle-end issue and a target issue.
>
> Note there might be another bug asking for the same optimization.
>
> Also note x86_64 might be the only popular target which has this kind of div
> instruction so this might not get any attention as it is also a small
> peephole where most people don't use 128bit integers either (they are
> non-standard even).

I know m68k had a 64bit/32bit pattern but it is indeed rare.

On x86_64 a (32bit * 32bit = 64bit) % 32bit uses the 128bit/64bit DIV
instruction and two extra truncations to 32bit for the input registers. On many
CPUs that is significantly (factor 3-10) slower than the 64bit/32bit version.

This could potentially affect every / and % operation and preceding *, allowing
for the faster opcodes with fewer bits to be used where the compiler can reason
about the magnitude of the arguments.
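For reference, the narrower case mentioned above looks like this at the source level. This is a minimal sketch with a made-up function name (`sqrmod32`); it only shows the shape of the computation, and which division instruction a given compiler picks for it is not claimed here.

```cpp
#include <cassert>
#include <cstdint>

// (a * a) % n for 32-bit operands: the product needs 64 bits, while the
// remainder always fits in 32 bits again because it is smaller than n.
inline std::uint32_t sqrmod32(std::uint32_t a, std::uint32_t n)
{
    assert(n != 0);
    const std::uint64_t x = static_cast<std::uint64_t>(a) * a;
    return static_cast<std::uint32_t>(x % n);
}
```

When `a < n` holds, the quotient of the 64-bit product by `n` also fits in 32 bits, which is the condition a 64bit/32bit divide instruction needs in order not to fault.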
[Bug libstdc++/105844] New: std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844

Bug ID: 105844
Summary: std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: goswin-v-b at web dot de
Target Milestone: ---

Running "gcc-12.1 -std=c++20 -O2 -W -Wall" on

#include <numeric>
constinit int t = std::lcm(50000, 49999);

produces

t:
        .long -1795017296

The standard says:

> The behavior is undefined if |m|, |n|, or the least common multiple of |m|
> and |n| is not representable as a value of type std::common_type_t<M, N>.

Which is the case here, the lcm overflows and is undefined. The negative number
produced is not correct and the compile should fail.

The problem is the __absu function casting the arguments to an unsigned type:

// std::abs is not constexpr, doesn't support unsigned integers,
// and std::abs(std::numeric_limits<T>::min()) is undefined.
template<typename _Up, typename _Tp>
  constexpr _Up
  __absu(_Tp __val)
  {
    static_assert(is_unsigned<_Up>::value, "result type must be unsigned");
    static_assert(sizeof(_Up) >= sizeof(_Tp),
                  "result type must be at least as wide as the input type");
    return __val < 0 ? -(_Up)__val : (_Up)__val;
  }

/// Least common multiple
template<typename _Mn, typename _Nn>
  constexpr common_type_t<_Mn, _Nn>
  lcm(_Mn __m, _Nn __n) noexcept
  {
    static_assert(is_integral_v<_Mn>, "std::lcm arguments must be integers");
    static_assert(is_integral_v<_Nn>, "std::lcm arguments must be integers");
    static_assert(_Mn(2) == 2, "std::lcm arguments must not be bool");
    static_assert(_Nn(2) == 2, "std::lcm arguments must not be bool");
    using _Up = make_unsigned_t<common_type_t<_Mn, _Nn>>;
    return __detail::__lcm(__detail::__absu<_Up>(__m),
                           __detail::__absu<_Up>(__n));
  }

__lcm is called with unsigned arguments which do not overflow for the given
numbers. And any unsigned overflow would not be undefined behavior. The result
of __lcm is then converted back to the signed type, which is not UB.
I suggest the following changes:

    // LCM implementation
    template<typename _Tp>
      constexpr _Tp
      __lcm(_Tp __m, _Tp __n)
      {
        static_assert(__m == 0 || __n == 0
            || __m / __detail::__gcd(__m, __n)
                 <= std::numeric_limits<_Tp>::max() / __n,
            "std::lcm not representable in common type");
        return (__m != 0 && __n != 0)
          ? (__m / __detail::__gcd(__m, __n)) * __n
          : 0;
      }

    /// Least common multiple
    template<typename _Mn, typename _Nn>
      constexpr common_type_t<_Mn, _Nn>
      lcm(_Mn __m, _Nn __n) noexcept
      {
        static_assert(is_integral_v<_Mn>, "std::lcm arguments must be integers");
        static_assert(is_integral_v<_Nn>, "std::lcm arguments must be integers");
        static_assert(_Mn(2) == 2, "std::lcm arguments must not be bool");
        static_assert(_Nn(2) == 2, "std::lcm arguments must not be bool");
        using _Cp = common_type_t<_Mn, _Nn>;
        using _Up = make_unsigned_t<common_type_t<_Mn, _Nn>>;
        _Up t = __detail::__lcm(__detail::__absu<_Up>(__m),
                                __detail::__absu<_Up>(__n));
        static_assert(t <= (_Up)std::numeric_limits<_Cp>::max(),
            "std::lcm not representable in common type");
        return t;
      }
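The arithmetic behind the report can be checked with a small standalone sketch. It is not libstdc++ code: gcd_u32, lcm_u32, fits_in_int32 and as_wrapped_int32 are hypothetical helpers made up for illustration, and fixed 32-bit types stand in for int and unsigned on the reported target. It shows that the product fits in 32 unsigned bits, exceeds the signed maximum, and wraps to the value seen in the assembly output.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration; these are not libstdc++ functions.
constexpr std::uint32_t gcd_u32(std::uint32_t a, std::uint32_t b)
{
    while (b != 0) {
        const std::uint32_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// LCM computed entirely in an unsigned 32-bit type: it can wrap modulo 2^32
// but never overflows in the undefined-behavior sense.
constexpr std::uint32_t lcm_u32(std::uint32_t m, std::uint32_t n)
{
    return (m != 0 && n != 0) ? (m / gcd_u32(m, n)) * n : 0;
}

// True if the unsigned value is representable as a signed 32-bit integer.
constexpr bool fits_in_int32(std::uint32_t v)
{
    return v <= static_cast<std::uint32_t>(
                    std::numeric_limits<std::int32_t>::max());
}

// The value a two's-complement conversion to a signed 32-bit int yields,
// computed in 64 bits so the sketch itself stays well defined.
constexpr std::int64_t as_wrapped_int32(std::uint32_t v)
{
    return fits_in_int32(v)
        ? static_cast<std::int64_t>(v)
        : static_cast<std::int64_t>(v) - (std::int64_t{1} << 32);
}

static_assert(lcm_u32(50000, 49999) == 2499950000u,
              "the product fits in 32 unsigned bits");
static_assert(!fits_in_int32(lcm_u32(50000, 49999)),
              "but it exceeds the signed 32-bit maximum");
static_assert(as_wrapped_int32(lcm_u32(50000, 49999)) == -1795017296,
              "and wraps to the value from the report");

// Runtime counterpart for a case that is representable.
inline void check_small_case()
{
    assert(lcm_u32(4, 6) == 12u);
    assert(fits_in_int32(lcm_u32(4, 6)));
}
```

The fits_in_int32 test is the same comparison the second proposed static_assert tries to make; the sketch only demonstrates the numbers and makes no claim about how the library should report the failure.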
[Bug libstdc++/105844] std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844

Goswin von Brederlow changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |goswin-v-b at web dot de

--- Comment #1 from Goswin von Brederlow ---
Created attachment 53081
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53081&action=edit
Patch for numeric

Patch for the proposed changes
[Bug libstdc++/105844] std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844

--- Comment #2 from Goswin von Brederlow ---
I know the patch doesn't work yet: the static_asserts aren't constexpr (their conditions use function parameters, which are not constant expressions). But hopefully it gives someone enough of an idea to fix it.
[Bug libstdc++/105844] std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844

--- Comment #3 from Goswin von Brederlow ---
Created attachment 53082
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53082&action=edit
Working patch for detecting UB

Instead of using static_assert, this aborts if the arguments are too large. That was the best approach I could find that works.
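The general technique of aborting inside a constexpr function can be sketched as follows. This is not the attached patch, which is not reproduced here; abs_u, gcd_u and checked_lcm are hypothetical names, and the sketch is fixed to int rather than templated. Because std::abort is not a constexpr function, reaching that branch during constant evaluation makes the compiler reject the expression, while at run time the program aborts.

```cpp
#include <cassert>
#include <cstdlib>
#include <limits>

// Hypothetical helpers for illustration; not the code from the attachment.
constexpr unsigned abs_u(int v)
{
    // Negate in the unsigned domain so that INT_MIN is handled without UB.
    return v < 0 ? 0u - static_cast<unsigned>(v) : static_cast<unsigned>(v);
}

constexpr unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        const unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

constexpr int checked_lcm(int m, int n)
{
    const unsigned um = abs_u(m);
    const unsigned un = abs_u(n);
    if (um == 0 || un == 0)
        return 0;
    const unsigned q = um / gcd_u(um, un);
    const unsigned max = static_cast<unsigned>(std::numeric_limits<int>::max());
    // q * un is representable as int exactly when q <= max / un.
    if (q > max / un)
        std::abort();  // not constexpr: a compile error in constant
                       // evaluation, a program abort at run time
    return static_cast<int>(q * un);
}

static_assert(checked_lcm(4, 6) == 12, "representable results still work");
static_assert(checked_lcm(-4, 6) == 12, "sign is dropped before computing");
static_assert(checked_lcm(0, 5) == 0, "zero arguments yield zero");

// Assuming a 32-bit int, uncommenting the next line should be rejected at
// compile time because constant evaluation would reach std::abort():
// constexpr int bad = checked_lcm(50000, 49999);

// Runtime counterpart of the compile-time checks above.
inline void runtime_check()
{
    assert(checked_lcm(21, 6) == 42);
}
```

The trade-off is that a non-constant call with oversized arguments terminates the program rather than producing a diagnostic, which matches the behavior described in the comment above.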