[Bug ipa/94656] New: target_clones on alias leads to segfault in the compiler

2020-04-18 Thread yyc1992 at gmail dot com
Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Compiling the following code with `gcc -c` leads to a segfault in the compiler targetclone pass

[Bug lto/94659] New: Missing symbol with LTO and target_clones

2020-04-19 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- This is basically the same as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 except now it only happens with LTO enabled. It seems

[Bug target/95775] New: Command line argument for target_clones?

2020-06-19 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Would it make sense to add a command line argument that is roughly equivalent to adding `target_clones` to all functions? In terms of usefulness, I believe it will be a very

[Bug lto/95776] New: Reduce indirection with target_clones at link time (with LTO)

2020-06-19 Thread yyc1992 at gmail dot com
Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Currently, if a function is not visible outside the final library (static, or internal or

[Bug c/95777] New: Allow specifying more than one target options at the same time in target and target_clones attribute

2020-06-19 Thread yyc1992 at gmail dot com
: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Currently it seems that (from the documentation and my own tests) only a single option is allowed for each version of the

[Bug other/95778] New: target_clones indirection eliminates requires noinline

2020-06-19 Thread yyc1992 at gmail dot com
Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Compiling ``` static __attribute__((noinline,target_clones("default,avx2"))) int f2(int *p) { asm volatile ("" :: "r"

[Bug other/95779] New: Unnecessary dispatch function for static target_clones function.

2020-06-19 Thread yyc1992 at gmail dot com
Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Using the code in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 the full assembly generated (the version with both noinline) is

[Bug other/95780] New: target_clones treats internal visibility different from static functions

2020-06-19 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Again using the code in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778. If the static function `f2` is changed to

[Bug other/95781] New: Missing dead code elimination when a recursive function is inlined.

2020-06-19 Thread yyc1992 at gmail dot com
Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Code, ``` static int 2(int *p, int k) { int res = 0; if (k > 0) res += 2(p, k - 1); return *p +

[Bug other/95778] target_clones indirection eliminates requires noinline

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 --- Comment #1 from Yichao Yu --- Ah, I think this might be the fix for both this issue and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95780 . I'll test more and will try to submit it later. ``` diff --git a/gcc/multiple_target.c b/gcc/multipl

[Bug other/95778] target_clones indirection eliminates requires noinline

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 --- Comment #2 from Yichao Yu --- Also, the original code example had an error, the code that works properly was ``` static __attribute__((noinline,target_clones("default,avx2"))) int f2(int *p) { asm volatile ("" :: "r"(p) : "memory");

[Bug tree-optimization/95786] New: Too aggressive target indirection elimination

2020-06-20 Thread yyc1992 at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I realized this issue while debugging PR95778 and PR95780 (ref https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548631.html) It seems that the indirection

[Bug ipa/95790] New: Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- The indirection elimination code currently only checks for a match of the target for the specific version but doesn't check if all the target

[Bug other/95778] target_clones indirection eliminates requires noinline

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95778 --- Comment #4 from Yichao Yu --- Yeah, after digging further the two issues are indeed the same. I initially didn't think they were since I didn't realize PR95786 (that the visibility attribute is simply ignored completely...) and thought static w

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #2 from Yichao Yu --- The C++ code attached above produces the following incorrect code with `g++ -O2 -S` .file "a.c" .text .p2align 4 .globl _Z3barv .type _Z3barv, @function _Z3barv: .LFB

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #3 from Yichao Yu --- And the assembly showing the correct dispatch is .file "a.c" .text .p2align 4 .type _ZL3fooPKcj, @function _ZL3fooPKcj: .LFB0: .cfi_startproc movl $1, %eax

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #5 from Yichao Yu --- It’s wrong when running on a target that has avx512f. The unoptimized version will call the correct foo but the optimized case won’t. As I said, this is an issue when the total targets are different between th

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #7 from Yichao Yu --- > Your testcase has nested function multi-versioning. I don't think it works at all. I opened PR 95793. I'm sorry, but what is nested function multi-versioning? And what's the difference between the test case h

[Bug ipa/95790] Incorrect static target dispatch

2020-06-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790 --- Comment #8 from Yichao Yu --- And the reason I reported this as a mis-optimization rather than something completely unsupported is that the following code. ``` #include // #define disable_opt __attribute__((flatten)) #define disable_opt d

[Bug ipa/95796] New: Inlining works between functions with the same target attribute but not target_clones

2020-06-20 Thread yyc1992 at gmail dot com
Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- If two functions with the same target attribute call each other, GCC

[Bug ipa/95775] Command line argument for target_clones?

2020-06-22 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95775 --- Comment #2 from Yichao Yu --- > But it will blow up code-size considerably. > So without some major work I don't think simply slapping target_clones on > each function is going to fly in practice. I mean, it'll blow up not much more than th

[Bug c/95777] Allow specifying more than one target options at the same time in target and target_clones attribute

2020-06-22 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95777 --- Comment #2 from Yichao Yu --- I only tested this with `target_clones` and it seems that I misread the documentation for `target`. So this is only an issue with the `target_clones` attribute. `target` supports this just fine. So to be more clear, using

[Bug c/95777] Allow specifying more than one target options at the same time in target and target_clones attribute

2020-06-22 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95777 --- Comment #3 from Yichao Yu --- And for backward compatibility maybe `target_clones("(sse4.1,arch=core2),default")` would work?
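
For context, here is a minimal sketch of the clone-list form that GCC accepts today, using the same `target_clones("default,avx2")` string quoted in PR 95778. The names `sum_ints` and `MULTIVERSION` are hypothetical and only for illustration. The preprocessor guard is an assumption, added so the file still builds where the attribute, or the ifunc support it relies on, may be unavailable.

```c
#include <stddef.h>

/* Assumption: only enable function multi-versioning on x86-64 Linux with
   GCC, where target_clones and ifunc are expected to be available. */
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__) && !defined(__clang__)
#define MULTIVERSION __attribute__((target_clones("default,avx2")))
#else
#define MULTIVERSION
#endif

/* Hypothetical example function: GCC emits one clone per listed target
   plus a resolver that picks a clone when the program is loaded. */
MULTIVERSION
int sum_ints(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Callers use `sum_ints` like any other function; which clone runs is decided by the resolver, not by the call site.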

[Bug ipa/95775] Command line argument for target_clones?

2020-06-23 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95775 --- Comment #4 from Yichao Yu --- > Hey. My opinion is similar to Richi's. If you really want a highly optimized > library, you should rather use a dlopen mechanism with pre-built set of > options. Well, a few things, 1. That sounds like an a

[Bug fortran/96069] New: -ffile-prefix-map does not affect print in gfortran

2020-07-05 Thread yyc1992 at gmail dot com
Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Compiling the following code `a.f` ``` subroutine f(name) implicit none character*(*) name print *,name return end ``` with

[Bug fortran/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #2 from Yichao Yu --- Why should this feature be C only?

[Bug preprocessor/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #4 from Yichao Yu --- > Apparently it is. Yes, but my question is about why this should be "WONTFIX". This feature (reproducible build) is certainly as useful in Fortran as it is in the C family. > Let move the component to 'preprocesso

[Bug preprocessor/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #6 from Yichao Yu --- https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549411.html and https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549413.html

[Bug preprocessor/96069] -ffile-prefix-map does not affect print in gfortran

2020-07-08 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96069 --- Comment #8 from Yichao Yu --- OK, done. It would be nice to mention it on https://gcc.gnu.org/contribute.html#patches

[Bug rtl-optimization/96539] New: Unnecessary no-op copy with Os and tail call with struct argument

2020-08-08 Thread yyc1992 at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Test C code, ``` struct A { int a; int b; int c; int d; int e; int f; void *p1; void *p2

[Bug rtl-optimization/96539] Unnecessary no-op copy with Os and tail call with struct argument

2020-08-11 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96539 --- Comment #4 from Yichao Yu --- Wow that was fast... thx.

[Bug c/96629] New: spurious uninitialized variable warning with branches at -O1 and higher

2020-08-15 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Reduced test code: ``` int mem(char *data); int cond(void); void f(char *data, unsigned idx, unsigned inc) { char *d2; int c

[Bug c/96629] spurious maybe uninitialized variable warning with difficult control-flow analysis

2020-09-03 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96629 --- Comment #3 from Yichao Yu --- Just curious, is it some particular structure that is upsetting it or did it simply hit some depth limit?

[Bug c/96990] New: Regression in aarch64 struct vector member initialization

2020-09-08 Thread yyc1992 at gmail dot com
Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code used to work on gcc 9.3 but stops working with 10.2 with an error ``` a.c: In function ‘test_aa64_vec_2’: a.c:19:24: error

[Bug libstdc++/92759] New: Typo in libstdcxx/v6/xmethods.py

2019-12-02 Thread yyc1992 at gmail dot com
++ Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I get the following warning when running gdb/rr. ``` /usr/lib/../share/gcc-9.2.0/python/libstdcxx/v6/xmethods.py:731: SyntaxWarning: list indices must be integers or slices, not str

[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows

2019-08-25 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 --- Comment #29 from Yichao Yu --- See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412#c25 GCC is fully capable of aligning the stack. It just seems that different parts of it disagree on what the current stack alignment is and whether a real

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM (aarch32)

2018-01-30 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #20 from Yichao Yu --- Just want to mention that the lack of a way to locally change the arch settings without lying to the compiler is exactly why I reported this issue.

[Bug c/89485] New: Support vectorcall calling convention on windows

2019-02-24 Thread yyc1992 at gmail dot com
: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I'm very surprised that I didn't find an issue for this so sorry if this is discussed/rejected somewhere else. It appears that both MSVC and clang support a vectorca

[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows

2019-02-27 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 Yichao Yu changed: What|Removed |Added CC||yyc1992 at gmail dot com --- Comment #23

[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows

2019-02-27 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 --- Comment #24 from Yichao Yu --- Oh, and the test case above was compiled with -O3 (and -g -Wall -Wextra).

[Bug target/89581] New: Unneeded stack alignment on windows x86

2019-03-04 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- On windows, when compiling the following code with ` gcc -mavx2 a.c -o - -S -O3 -g0 -fno-asynchronous-unwind-tables -fomit-frame-pointer -Wall -Wextra` ``` typedef struct

[Bug target/89582] New: Suboptimal code generated for floating point struct in -O3 compare to -O2

2019-03-04 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- When testing the code for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89581 on linux, I noticed that the code seems suboptimum

[Bug target/89581] Unneeded stack alignment on windows x86

2019-03-04 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89581 --- Comment #1 from Yichao Yu --- The problem is still there when compiled with -O2 ``` f: pushq %rbp vmovq (%r8), %xmm1 movq %rcx, %rax vmovq 8(%r8), %xmm0 vaddsd (%rdx), %xmm1, %xmm1 va

[Bug target/89597] New: Inconsistent vector calling convention on windows with Clang and MSVC

2019-03-05 Thread yyc1992 at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- For 256bit and 512bit vector return values, Clang and MSVC always return them in the corresponding registers even without

[Bug target/89606] New: Extra mov after structure load instructions on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Code to reproduce, ``` #include #ifdef __aarch64__ float64x2x2_t f(const double *p1, const double *p2) { float64x2x2_t v = vld2q_f64(p1); return

[Bug target/89607] New: Missing optimization for store of multiple registers on arm and aarch64

2019-03-06 Thread yyc1992 at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Test code, Compiled for arm/aarch64 with -O1/-O2/-O3/-Os/-Ofast ``` #include void f4(float32x4x2_t *p, const float *p1

[Bug target/89607] Missing optimization for store of multiple registers on arm and aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #2 from Yichao Yu --- Sure. I'll do that.

[Bug target/89614] New: Missing optimization for store of multiple registers on arm

2019-03-06 Thread yyc1992 at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Separated from pr89607 as requested. Test code and result compiled with any non-zero optimization levels, ``` #include void f4

[Bug target/89607] Missing optimization for store of multiple registers on arm and aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #3 from Yichao Yu --- Done pr89614

[Bug target/89607] Missing optimization for store of multiple registers on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #5 from Yichao Yu --- I just compiled the 9-20190303 snapshot and this indeed seems to be fixed. Should this be closed now or after GCC 9 is released?

[Bug target/89607] Missing optimization for store of multiple registers on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #6 from Yichao Yu --- > For aarch64, there was talk about adding stp for q registers. What do you mean? I was initially unsure about it too but I assume it already exists since clang (and now GCC 9) emits it and the arm arch reference

[Bug target/89607] Missing optimization for store of multiple registers on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89607 --- Comment #8 from Yichao Yu --- I see. I don't imagine this to cause a major local speed up though I assume it should at least not be slower? That's also why I mentioned that this should at least be done for `-Os`.

[Bug target/89606] Extra mov after structure load instructions on aarch64

2019-03-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89606 --- Comment #1 from Yichao Yu --- Compiled a GCC 9 snapshot for pr89607 and the issue is still present.

[Bug tree-optimization/89582] Suboptimal code generated for floating point struct in -O3 compare to -O2

2019-04-04 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89582 --- Comment #6 from Yichao Yu --- For the vfloat test case, isn't the optimum code just ``` addps %xmm2, %xmm0 addps %xmm3, %xmm1 retq ``` It's not making full use of the vector but I assume not having to spill is a

[Bug c/90728] New: False positive Wmemset-elt-size with zero size array

2019-06-03 Thread yyc1992 at gmail dot com
Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The code below comes from a template expansion (when a certain cache feature is disabled) and all the operations on the `buff` member are no-ops. ``` #include struct A

[Bug target/90826] New: Weak symbol does not work reliably on windows

2019-06-10 Thread yyc1992 at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code does not link correctly with all optimization levels on windows with the mingw-w64-x86_64-g++ compiler. ``` #include extern "C" void f() __a

[Bug target/90826] Weak symbol does not work reliably on windows

2019-06-10 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90826 --- Comment #1 from Yichao Yu --- Oh, forgot to mention that the first assembly was generated with -O3 and adding `.weak f` to the generated file fixes the issue as well.
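
As a rough illustration of the idiom this report is about, the sketch below declares a function weak and tests its address before calling it. The names `example_optional_hook` and `call_hook_if_present` are hypothetical and not taken from the report. The sketch assumes an ELF target such as Linux, where an undefined weak symbol resolves to a null address; the report above concerns this kind of code not linking reliably with mingw-w64.

```c
/* Hypothetical optional hook: declared weak so that the program can still
   link when no definition is provided (assumes an ELF target). */
extern void example_optional_hook(void) __attribute__((weak));

/* Returns 1 if a definition of the hook was linked in and called,
   0 if the weak symbol was left undefined. */
int call_hook_if_present(void)
{
    if (example_optional_hook) {
        example_optional_hook();
        return 1;
    }
    return 0;
}
```

With no definition of `example_optional_hook` in the link, `call_hook_if_present()` is expected to return 0 on such targets.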

[Bug target/90826] Weak symbol does not work reliably on windows

2019-06-10 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90826 --- Comment #2 from Yichao Yu --- Also, I just upgraded the compiler on this computer from 7.x to 9.1.0. The issue appeared before the upgrade as well but I didn't investigate until the upgrade finished.

[Bug c++/69550] Need a way to disable "flexible array member in an otherwise empty struct" error on GCC 6

2016-01-29 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550 Yichao Yu changed: What|Removed |Added CC||yyc1992 at gmail dot com --- Comment #17

[Bug target/67458] x86: atomic store with memory_order_release doesn't order other stores

2016-02-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67458 Yichao Yu changed: What|Removed |Added CC||yyc1992 at gmail dot com --- Comment #1

[Bug target/70814] New: atomic store of __int128 is not lock free on aarch64

2016-04-26 Thread yyc1992 at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- `std::atomic<__int128>::store` on aarch64 is not lock free and generates a function call `__atomic_store_16`. However, required atomic instructions (`stlxp`) exi

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-04-27 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #2 from Yichao Yu --- As an update, it seems that the llvm trunk recently switched to using `__atomic_*` function call for int128 on all platforms (that I can test anyway). I'm not sure how that decision is made or if there's any comm
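
One way to see what a given compiler and target promise for 16-byte atomics is the `__atomic_always_lock_free` builtin, sketched below. The function name `int128_atomics_always_lock_free` is hypothetical, and the `__SIZEOF_INT128__` guard is an assumption so the code still builds on targets without `__int128`. The result depends on the target and flags, so no particular value is claimed here.

```c
#include <stdbool.h>

/* Reports whether the compiler guarantees, at compile time, that atomic
   operations on a 16-byte object are lock free for this target.
   A false result means such operations may go through library calls
   like the __atomic_store_16 mentioned in the report. */
bool int128_atomics_always_lock_free(void)
{
#ifdef __SIZEOF_INT128__
    /* The second argument 0 asks about an object with typical alignment. */
    return __atomic_always_lock_free(sizeof(__int128), 0);
#else
    return false;
#endif
}
```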

[Bug target/71056] New: __builtin_bswap32 NEON instruction error with -O3

2016-05-10 Thread yyc1992 at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code generates a NEON instruction not available error when compiling with `gcc -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O3 -o /dev/null -c a.c` on ARM

[Bug target/71056] [6/7 Regression] __builtin_bswap32 NEON instruction error with -O3

2016-05-21 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71056 --- Comment #4 from Yichao Yu --- (Sorry I'm not sure how to understand that cross link). Is the fix merged?

[Bug other/71414] New: 2x slower than clang summing small float array

2016-06-04 Thread yyc1992 at gmail dot com
: other Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Ref https://llvm.org/bugs/show_bug.cgi?id=28002 C source code. ```c __attribute__((noinline)) float sum32(float *a, size_t n) { /* a = (float*)__builtin_assume_aligned

[Bug other/71414] 2x slower than clang summing small float array

2016-06-06 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414 --- Comment #4 from Yichao Yu --- The C code is in the linked gist. `a` is a cacheline-aligned pointer and `n` is 1024, so `a` should even fit in L1d, which is 32kB on both processors I benchmarked. More precise timing (ns per loop) 6700K ``` %

[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions

2016-06-07 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414 --- Comment #7 from Yichao Yu --- If I add `-fvariable-expansion-in-unroller` (omg this option is like half the command line ;-p ...), the performance matches the clang one after the clang 3.8 regression. ``` % gcc -funroll-loops -fvariable-exp

[Bug target/77728] [5/6 Regression] Miscompilation multiple vector iteration on ARM

2017-04-25 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #48 from Yichao Yu --- Thanks for fixing this. I didn't follow all the comments since I'm not familiar with the C++ ABI, so just to make sure I understand what's happening: is it that the bug is caused by an inconsistency in C++ ABI for

[Bug target/80732] New: target_clones does not work with dlsym

2017-05-12 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Compiling the code below to an executable with `gcc -Wall -Wextra -O3 -fPIC -ldl -rdynamic`. On a haswell+ system, the output is ``` 1: 0, 4.93038e-32, 0 2: 4.93038e-32, 4.93038e-32

[Bug target/80732] target_clones does not work with dlsym

2017-05-17 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 --- Comment #4 from Yichao Yu --- `double (*pf1)(double, double, double) = dlsym(hdl, "f1.ifunc");` Wouldn't it be better if GCC generates local functions `f1.default`, `f1.fma` as implementation and `f1` to replace `f1.ifunc`? It's quite incont

[Bug target/80732] target_clones does not work with dlsym

2017-05-17 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 --- Comment #6 from Yichao Yu --- Good to know. Thanks.

[Bug target/80732] target_clones does not work with dlsym

2017-06-19 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80732 --- Comment #9 from Yichao Yu --- Thanks for the fix! Does it fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78366 at the same time?

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2017-01-13 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #5 from Yichao Yu --- Ping again? Anything new, or anything I can help with here?

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2017-03-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #6 from Yichao Yu --- Anything new here?

[Bug target/82641] New: Unable to enable crc32 for a certain function with target attribute on ARM

2017-10-20 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The assembler complains about the target not supporting CRC32 instructions for certain (generic) targets on ARM and AArch64

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM

2017-10-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #1 from Yichao Yu --- I've found a workaround in https://sourceware.org/ml/binutils/2017-04/msg00171.html but it's extremely ugly (albeit also very clever...).

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM (aarch32)

2017-10-24 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #3 from Yichao Yu --- > ARMv8-a is the only architecture variant where the CRC extension is optional Not really. There's also armv8-r and armv8-m. Also, I believe code compiled for armv7-a can run on armv8-a hardware and can also opt

[Bug target/82641] Unable to enable crc32 for a certain function with target attribute on ARM (aarch32)

2017-11-02 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82641 --- Comment #7 from Yichao Yu --- It would be great if `+crc` can work if it's not ambiguous. Requiring `arch=armv8-a+crc` works for me too, and it'll just require more preprocessor checks.

[Bug target/83110] New: Relocation error when taking address of protected function in shared library.

2017-11-22 Thread yyc1992 at gmail dot com
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- This is very similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 although that one is marked as fixed. (This

[Bug target/83110] Relocation error when taking address of protected function in shared library.

2017-11-23 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83110 --- Comment #2 from Yichao Yu --- What might be invalid about the source?

[Bug target/77728] New: Miscompilation multiple vector iteration on ARM

2016-09-24 Thread yyc1992 at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- Code to reproduce is at https://gist.github.com/yuyichao/a66edb9d05d18755fb7587b12e021a8a. The two cpp files are ```c++ #include #include typedef std::vector

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2016-09-26 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #2 from Yichao Yu --- I should add that turning on lto works around the issue both in the simple code attached and for the original issue I was having in julia (i.e. compiling llvm with LTO makes the issue go away).

[Bug lto/77996] New: Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I'm seeing a miscompilation of LLVM's tablegen on AArch64 by gcc 6.2.1 when LTO is enabled. I've tried very hard to reduce it but unfortunately it wasn't very successf

[Bug lto/77997] New: Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- I'm seeing a miscompilation of LLVM's tablegen on AArch64 by gcc 6.2.1 when LTO is enabled. I've tried very hard to reduce it but unfortunately it wasn't very successf

[Bug lto/77997] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77997 --- Comment #2 from Yichao Yu --- Sorry, the first submission gave me a timeout so I submitted it again.

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #3 from Yichao Yu --- > What exact version of LLVM are you trying to compile? Revision of the LLVM > sources including revision of clang, etc. I was compiling the trunk version. The version I started reducing from was https://githu

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #5 from Yichao Yu --- Compiling current llvm trunk (r284322) still shows the same error. The script I used to compile LLVM is here https://github.com/yuyichao/arch-pkg/blob/master/pkg/all/llvm-svn/PKGBUILD. Compiling gcc 951db45 now

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #6 from Yichao Yu --- I've compiled a gcc at 951db45 using the same configuration as archlinux arm PKGBUILD and I can reproduce the problem using the `code/` in https://gist.github.com/yuyichao/6c24d4a4bc374425906138359a44479c/raw/f5e

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #8 from Yichao Yu --- > Can you try with -fno-strict-aliasing ? That seems to fix it for both the original case (LLVM) and the reduced case (the linked tarball). Is there a way to figure out the problematic (either bug in LLVM's code

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-15 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #10 from Yichao Yu --- That does look like a violation (this particular one should be hidden behind a shared library boundary in the reduced case though). Reported to LLVM at https://llvm.org/bugs/show_bug.cgi?id=30711 .

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-16 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #11 from Yichao Yu --- The case pointed out is fixed in https://reviews.llvm.org/rL284336 although, as expected, that doesn't fix the error. Still not sure whose bug this is...

[Bug target/77728] [5/6/7 Regression] Miscompilation multiple vector iteration on ARM

2016-10-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728 --- Comment #4 from Yichao Yu --- Ping. Anything I can help with debugging this?

[Bug middle-end/77996] Miscompilation due to LTO on aarch64

2016-10-20 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77996 --- Comment #12 from Yichao Yu --- Since the LLVM miscompilation isn't fixed, is there any way to check the alias assumptions more programmatically? (I can see that the TrailingObject might easily introduce something like this but given the compl

[Bug c++/61400] New: suffix return type doesn't work for member functions

2014-06-02 Thread yyc1992 at gmail dot com
y: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com The following code compiles on clang++ but not on g++ 4.9.0. ``` struct A { template inline int a() { return 0; } template inline auto b() -> de

[Bug c++/61400] suffix return type doesn't work for template member functions with explicit specialization

2014-06-02 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61400 --- Comment #1 from Yichao Yu --- Additional info. It seems that G++ only raises this error when all of the following are met: 1. Both of the functions (the one that uses suffix return type and the one used in the argument of decltype) must be no

[Bug c++/61400] suffix return type doesn't work for template member functions with explicit specialization

2014-06-02 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61400 --- Comment #2 from Yichao Yu --- Sorry I have just noticed that I forgot to include the error message... ``` gcc.cpp: In function 'int main()': gcc.cpp:20:12: error: no matching function for call to 'A::b()' a.b<2>(); ^ gcc.cpp

[Bug middle-end/68336] New: False positive Wreturn-type warning

2015-11-13 Thread yyc1992 at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Target Milestone: --- The following code gives a warning about control flow reaching the end of function without returning a value even though the function will always reach the `return 1;` statement and

[Bug middle-end/68336] False positive Wreturn-type warning

2015-11-13 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68336 --- Comment #1 from Yichao Yu --- Ref clang bug report https://llvm.org/bugs/show_bug.cgi?id=25521

[Bug c++/65255] New: std::thread does not work for cross compiling on ARM

2015-02-28 Thread yyc1992 at gmail dot com
Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: yyc1992 at gmail dot com Created attachment 34903 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34903&action=edit PKGBUILD used to compile the cross compiling version of gcc Duplicate of the problem repo

[Bug c++/65255] std::thread does not work for cross compiling on ARM

2015-02-28 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65255 --- Comment #1 from Yichao Yu --- Created attachment 34904 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34904&action=edit Source and output programs
