[Bug c/101261] New: clones for target_clones attribute cannot be created when compiling with -fno-semantic-interposition

2021-06-29 Thread johnnybit at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101261

Bug ID: 101261
   Summary: clones for target_clones attribute cannot be created
when compiling with -fno-semantic-interposition
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: johnnybit at gmail dot com
  Target Milestone: ---

A darktable user reported a problem building darktable with "optimized" flags,
and I could confirm that the error is present with GCC 10.3.0.

When compiling the whole of darktable with CFLAGS set to
`-march=native -O3 -fno-semantic-interposition -flto=3 -fno-plt
-fgraphite-identity -floop-nest-optimize -fuse-linker-plugin -pipe -Wl,-O1
-Wl,--as-needed`

the compiler errors with:

src/common/iop_profile.c:1015:6: error: clones for «target_clones» attribute cannot be created
 1015 | void dt_ioppr_transform_image_colorspace(struct dt_iop_module_t *self, const float *const image_in,
      |      ^

I've managed to narrow it down: setting just the -fno-semantic-interposition
flag is enough to make darktable compilation error out at the same point.

I haven't been able to create a minimal reproducer for this problem (a
single-file project compiles and runs OK).

for details please check https://github.com/darktable-org/darktable/issues/9303
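
For reference, a single-file use of target_clones of the kind that compiles
without trouble looks roughly like the sketch below. The function name
scale_sum, the macro EXAMPLE_CLONE_TARGETS and the shortened clone list are
illustrative assumptions, not darktable code. The guard makes the attribute
expand to nothing where function multiversioning is unlikely to be available.

```c
#include <stddef.h>

/* Hypothetical guard (not from darktable): request clones only with GCC on
   x86-64 GNU/Linux; elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) \
  && defined(__gnu_linux__)
#define EXAMPLE_CLONE_TARGETS \
  __attribute__((target_clones("default", "sse4.2", "avx2")))
#else
#define EXAMPLE_CLONE_TARGETS
#endif

/* With the attribute active, GCC emits one version of this function per
   listed target, plus a resolver that selects one of them at load time. */
EXAMPLE_CLONE_TARGETS
float scale_sum(const float *in, size_t n, float scale)
{
  float sum = 0.0f;
  for(size_t i = 0; i < n; ++i)
    sum += in[i] * scale;
  return sum;
}
```

Compiling such a file on its own, with or without -fno-semantic-interposition,
is the kind of single-file test referred to above; it says nothing about what
differs in the full darktable build.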

[Bug c/101262] New: GCC11 OpenMP optimization causes sigsegv on aligned constant array in darktable

2021-06-29 Thread johnnybit at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101262

Bug ID: 101262
   Summary: GCC11 OpenMP optimization causes sigsegv on aligned
constant array in darktable
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: johnnybit at gmail dot com
  Target Milestone: ---

A loop over constant-length arrays with OpenMP and SIMD optimization crashes
with SIGSEGV when compiled with GCC 11. The same code works with previous GCC
versions and with various Clang versions.

the code in question looks like

#define PIXEL_CHAN 8

static const float centers_ops[PIXEL_CHAN] DT_ALIGNED_ARRAY
    = { -56.0f / 7.0f, // = -8.0f
        -48.0f / 7.0f,
        -40.0f / 7.0f,
        -32.0f / 7.0f,
        -24.0f / 7.0f,
        -16.0f / 7.0f,
         -8.0f / 7.0f,
          0.0f / 7.0f };

#pragma omp simd aligned(centers_ops, factors:64) safelen(PIXEL_CHAN) \
  reduction(+:result)
  for(int i = 0; i < PIXEL_CHAN; ++i)
    result += gaussian_func(expo - centers_ops[i], gauss_denom) * factors[i];


centers_ops is static const float [PIXEL_CHAN]
factors is const float *const restrict factors (an array of length PIXEL_CHAN)

the crash is on

result += gaussian_func(expo - centers_ops[i], gauss_denom) * factors[i];

line.

The reports in darktable are 
https://github.com/darktable-org/darktable/issues/9340
https://github.com/darktable-org/darktable/issues/9002
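
The aligned(centers_ops, factors:64) clause is a promise to the compiler that
both objects are 64-byte aligned; nothing verifies it at run time. Whether a
broken promise is involved in this crash is not established by this report, but
it is cheap to rule out. A minimal sketch, using a hypothetical helper
is_aligned_to that is not part of darktable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from darktable): returns nonzero if p is aligned
   to `alignment` bytes. `alignment` must be a nonzero power of two. */
static int is_aligned_to(const void *p, size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

In a debug build, assert(is_aligned_to(factors, 64)) placed just before the
loop would show whether the pointer passed in actually meets the alignment
stated in the pragma.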

[Bug middle-end/101262] GCC11 OpenMP optimization causes sigsegv on aligned constant array in darktable

2021-06-29 Thread johnnybit at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101262

--- Comment #2 from Hubert Kowalski  ---
I've tried producing a minimal reproducer in the form of the code below;
however, I am running GCC 10.3, and the behavior depends on the optimization
level.

According to user reports - it's enough to compile darktable using GCC 11 with
RelWithDebInfo target (it applies -O2). Builds with Release target (-O3) are
apparently "fine"

(below code theoretically reproduces issue, but afaik it might not reliably
reproduce the problem)

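#include <math.h>   // fmaxf, fminf, expf, log2f
// the archive stripped the header names; math.h is required by the code
// below, the remaining two are assumed
#include <stdio.h>
#include <stdlib.h>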

#if defined(__GNUC__)
#pragma GCC optimize ("unroll-loops", "tree-loop-if-convert", \
                      "tree-loop-distribution", "no-strict-aliasing", \
                      "loop-interchange", "loop-nest-optimize", "tree-loop-im", \
                      "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", \
                      "split-ivs-in-unroller", "variable-expansion-in-unroller", \
                      "split-loops", "ivopts", "predictive-commoning", \
                      "tree-loop-linear", "loop-block", "loop-strip-mine", \
                      "finite-math-only", "fp-contract=fast", "fast-math")
#endif

#define dt_omp_firstprivate(...) firstprivate(__VA_ARGS__)
#define __DT_CLONE_TARGETS__ __attribute__((target_clones("default", "sse2", \
  "sse3", "sse4.1", "sse4.2", "popcnt", "avx", "avx2", "avx512f", "fma4")))
#define DT_ALIGNED_ARRAY __attribute__((aligned(64)))
#define PIXEL_CHAN 8
#define UI_SAMPLES 256

// radial distances used for pixel ops
static const float centers_ops[PIXEL_CHAN] DT_ALIGNED_ARRAY
    = { -56.0f / 7.0f, // = -8.0f
        -48.0f / 7.0f,
        -40.0f / 7.0f,
        -32.0f / 7.0f,
        -24.0f / 7.0f,
        -16.0f / 7.0f,
         -8.0f / 7.0f,
          0.0f / 7.0f };

typedef struct dt_iop_toneequalizer_gui_data_t
{
  // Mem arrays 64-bits aligned - contiguous memory
  float factors[PIXEL_CHAN] DT_ALIGNED_ARRAY;
  float gui_lut[UI_SAMPLES] DT_ALIGNED_ARRAY; // LUT for the UI graph
  float sigma;
} dt_iop_toneequalizer_gui_data_t;

#pragma omp declare simd
__DT_CLONE_TARGETS__
static inline float fast_clamp(const float value, const float bottom,
                               const float top)
{
  // vectorizable clamping between bottom and top values
  return fmaxf(fminf(value, top), bottom);
}

#pragma omp declare simd
__DT_CLONE_TARGETS__
static float gaussian_denom(const float sigma)
{
  // Gaussian function denominator such that y = exp(- radius^2 / denominator)
  // this is the constant factor of the exponential, so we don't need to
  // recompute it for every single pixel
  return 2.0f * sigma * sigma;
}

#pragma omp declare simd
__DT_CLONE_TARGETS__
static float gaussian_func(const float radius, const float denominator)
{
  // Gaussian function without normalization
  // this is the variable part of the exponential
  // the denominator should be evaluated with `gaussian_denom`
  // ahead of the array loop for optimal performance
  return expf(- radius * radius / denominator);
}

__DT_CLONE_TARGETS__
static inline float pixel_correction(const float exposure,
                                     const float *const restrict factors,
                                     const float sigma)
{
  // build the correction for the current pixel
  // as the sum of the contribution of each luminance channel
  float result = 0.0f;
  const float gauss_denom = gaussian_denom(sigma);
  const float expo = fast_clamp(exposure, -8.0f, 0.0f);

#pragma omp simd aligned(centers_ops, factors:64) safelen(PIXEL_CHAN) \
  reduction(+:result)
  for(int i = 0; i < PIXEL_CHAN; ++i)
    result += gaussian_func(expo - centers_ops[i], gauss_denom) * factors[i];

  return fast_clamp(result, 0.25f, 4.0f);
}

__DT_CLONE_TARGETS__
static inline void compute_lut_correction(struct dt_iop_toneequalizer_gui_data_t *g,
                                          const float offset,
                                          const float scaling)
{
  // Compute the LUT of the exposure corrections in EV,
  // offset and scale it for display in GUI widget graph

  float *const restrict LUT = g->gui_lut;
  const float *const restrict factors = g->factors;
  const float sigma = g->sigma;

#pragma omp parallel for simd schedule(static) default(none) \
  dt_omp_firstprivate(factors, sigma, offset, scaling, LUT) \
  aligned(LUT, factors:64)
  for(int k = 0; k < UI_SAMPLES; k++)
  {
    // build the inset graph curve LUT
    // the x range is [-14;+2] EV
    const float x = (8.0f * (((float)k) / ((float)(UI_SAMPLES - 1)))) - 8.0f;
    LUT[k] = offset - log2f(pixel_correction(x, factors, sigma)) / scaling;
  }
}

int main() {
dt_iop_tone

[Bug middle-end/101262] GCC11 OpenMP optimization causes sigsegv on aligned constant array in darktable

2021-06-29 Thread johnnybit at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101262

--- Comment #4 from Hubert Kowalski  ---
I tried to generate a reproducer based on the code that the linked reports
describe as "guaranteed to crash" when compiled with GCC 11. My feeling is that,
since the crash doesn't reproduce in isolation, there are more moving parts to
it. The common points in the linked reports are: the same piece of code crashes
when compiled with GCC 11 for the RelWithDebInfo target, and the same code
works correctly when compiled with GCC 10.