[Bug target/118342] `a == 0 ? 32 : __builtin_ctz(a)` for Intel and AMD cores could be implemented even without BMI1

2025-01-10 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118342 --- Comment #13 from Mayshao-oc at zhaoxin dot com --- (In reply to Jakub Jelinek from comment #4) > Well, there is also the > "On some older processors, use of a 32-bit operand size may clear the upper > 32 bits of a 64-bit destination while lea
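For context, the expression in the bug title can be sketched in plain C. This is only an illustration of the pattern being discussed, not code from the bug report: `__builtin_ctz(0)` is undefined, so without BMI1's tzcnt (which returns the operand width for a zero input) the guard is needed.

```c
#include <assert.h>

/* The pattern from the bug title: trailing-zero count of a 32-bit
 * value, with the a == 0 case defined as 32.  __builtin_ctz(0) is
 * undefined behavior, hence the explicit guard on pre-BMI1 targets. */
static unsigned ctz_or_32(unsigned a)
{
    return a == 0 ? 32u : (unsigned)__builtin_ctz(a);
}
```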

[Bug target/118342] `a == 0 ? 32 : __builtin_ctz(a)` for Intel and AMD cores could be implemented even without BMI1

2025-01-09 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118342 --- Comment #3 from Mayshao-oc at zhaoxin dot com --- Zhaoxin can confirm that for the bsf instruction, if the source register is zero, the destination register is unchanged.

[Bug target/118342] `a == 0 ? 32 : __builtin_ctz(a)` for Intel and AMD cores could be implemented even without BMI1

2025-01-09 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118342 --- Comment #2 from Mayshao-oc at zhaoxin dot com --- Zhaoxin can confirm that for the bsf instruction, if the source register is zero, the destination register is unchanged.
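The confirmed bsf behavior enables the branch-free implementation the bug asks for: preload the destination with 32, then issue bsf. A hedged sketch follows; it is x86-only, uses GCC inline asm, and relies on behavior the Intel SDM documents as undefined but which Intel, AMD, and (per this comment) Zhaoxin cores all exhibit.

```c
#include <assert.h>

/* Sketch only: exploit the vendor-confirmed bsf behavior that a zero
 * source leaves the destination unchanged.  Preloading r with 32
 * yields `a == 0 ? 32 : __builtin_ctz(a)` with one bsf, no branch,
 * and no BMI1.  Not portable; the SDM calls the a == 0 case undefined. */
static unsigned ctz_or_32_bsf(unsigned a)
{
    unsigned r = 32;                           /* fallback for a == 0 */
    __asm__ ("bsfl %1, %0" : "+r"(r) : "rm"(a) : "cc");
    return r;
}
```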

[Bug rtl-optimization/117438] New: pass_align_tight_loops may cause performance regression in nested loops

2024-11-03 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117438 Bug ID: 117438 Summary: pass_align_tight_loops may cause performance regression in nested loops Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal
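To make the report concrete, this is the general shape of code pass_align_tight_loops targets: a small hot inner loop nested inside an outer loop, where padding the inner loop head to a cache-line boundary can perturb the surrounding outer-loop code. The function below is purely illustrative, not a reproducer from the bug.

```c
#include <assert.h>

/* Illustrative nested-loop shape only (hypothetical, not from the
 * bug report): pass_align_tight_loops may pad the tight inner loop,
 * and the report is that this can regress the enclosing loop nest
 * on some cores. */
static long nested_sum(int n, int m)
{
    long s = 0;
    for (int i = 0; i < n; i++)          /* outer loop */
        for (int j = 0; j < m; j++)      /* tight inner loop */
            s += (long)i * j;
    return s;
}
```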

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-15 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #38 from Mayshao-oc at zhaoxin dot com --- vmovdqu is also atomic in Zhaoxin processors if it meets three requirements: 1. the address of its memory operand must be 16-byte aligned 2. vmovdqu is vex.128 not vex.256 3. the memory type

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-15 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #37 from Mayshao-oc at zhaoxin dot com --- vmovdqu is also atomic in Zhaoxin processors if it meets three requirements: 1. the address of its memory operand must be 16-byte aligned 2. vmovdqu is vex.128 not vex.256 3. the memory type

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-09 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #34 from Mayshao-oc at zhaoxin dot com --- (In reply to Jakub Jelinek from comment #17) > Fixed for AMD on the library side too. > We need a statement from Zhaoxin and VIA for their CPUs. Sorry for the late reply. We guarantee that V
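The first of the three vmovdqu requirements quoted above (a 16-byte-aligned memory operand) can be sketched in C. The type and helper names below are ours for illustration, not from GCC or libatomic; the point is only how a 16-byte-aligned 128-bit slot is declared and checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit storage slot satisfying requirement 1 from the
 * comment: a vex.128 vmovdqu load is only guaranteed atomic on these
 * parts if the operand address is 16-byte aligned. */
typedef struct {
    long long lo, hi;
} u128_storage __attribute__((aligned(16)));

/* Alignment check for requirement 1 (illustrative helper). */
static int is_16b_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```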

[Bug target/100758] __builtin_cpu_supports does not (always) detect "sse2"

2023-02-20 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100758 --- Comment #24 from Mayshao-oc at zhaoxin dot com --- Hi Jakub: Thanks for your patch. We tested it and it works on all Zhaoxin platforms. We find the same bug still exi
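For reference, this is the builtin under discussion in its minimal form. On any x86-64 target "sse2" is part of the baseline ISA, so the call should always report true there; the bug was that detection nevertheless failed on some CPUs.

```c
#include <assert.h>

/* Minimal use of the builtin from this report: query runtime CPU
 * feature support.  GCC (and Clang) resolve the string at compile
 * time against the cpuid-based model in libgcc. */
static int cpu_has_sse2(void)
{
    return __builtin_cpu_supports("sse2") != 0;
}
```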