[Bug libstdc++/81806] Split in pbds works in O(n) instead of O(log n)

2024-06-08 Thread adamant.pwn at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81806

--- Comment #14 from Oleksandr Kulkov  ---
Hi all, I wanted to bump this thread very briefly.

I think going with the fix at
https://gcc.gnu.org/bugzilla/attachment.cgi?id=47615&action=diff from Xi would
make sense, unless someone proposes a solution that also improves the
performance further (which is unlikely). Even though, according to Xi, it's
still quite slow, at least this would fix the discrepancy with official
documentation that says "red-black trees are split and joined in
poly-logarithmic complexity"
(https://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/tree_based_containers.html).

I think if the patch is accepted, it would be sufficient to close this bug,
even if the result is still not that useful for competitive programming.

[Bug c++/115399] New: std::tr2::dynamic_bitset shift behaves differently from std::bitset

2024-06-08 Thread adamant.pwn at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115399

Bug ID: 115399
   Summary: std::tr2::dynamic_bitset shift behaves differently
from std::bitset
   Product: gcc
   Version: 14.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: adamant.pwn at gmail dot com
  Target Milestone: ---
Target: x86-64 Linux
 Build: 20240522

Posting a bug on behalf of someone who didn't manage to create an account at
GCC bugzilla due to automatic registration being disabled...

The bug is due to the line
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/tr2/dynamic_bitset.tcc#L63
being commented out in dynamic_bitset implementation (unlike std::bitset, where
it is present). Uncommenting it should fix the problem.

A simple example program to highlight the discrepancy:

#include 
#include 
#include 

int main() {
std::tr2::dynamic_bitset<> test(91);
test[50] = true;
std::cout << test << "\n";
std::cout << (test << 78) << "\n";
std::cout << "\n";

std::bitset<91> test2;
test2[50] = true;
std::cout << test2 << "\n";
std::cout << (test2 << 78) << "\n";
}

Output:

100
100

100
000

[Bug libstdc++/115399] std::tr2::dynamic_bitset shift behaves differently from std::bitset

2024-06-08 Thread adamant.pwn at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115399

--- Comment #1 from Oleksandr Kulkov  ---
The bug also affects right shift for a similar reason
(https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/tr2/dynamic_bitset.tcc#L91):

#include 
#include 
#include 
#include 

int main() {
std::tr2::dynamic_bitset<> test(128);
for (int i = 64; i < 128; i++) test[i] = 1;
std::cout << test << "\n";
std::cout << (test >> 65) << "\n\n";

boost::dynamic_bitset<> test2(128);
for (int i = 64; i < 128; i++) test2[i] = 1;
std::cout << test2 << "\n";
std::cout << (test2 >> 65) << "\n\n";

std::bitset<128> test3;
for (int i = 64; i < 128; i++) test3[i] = 1;
std::cout << test3 << "\n";
std::cout << (test3 >> 65) << "\n\n";
}

Output:


0111


0111


0111

See also https://github.com/emsr/tr2/issues/1.

[Bug c++/120456] New: __builtin_shuffle produces unnecessary vperm2i128

2025-05-28 Thread adamant.pwn at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120456

Bug ID: 120456
   Summary: __builtin_shuffle produces unnecessary vperm2i128
   Product: gcc
   Version: 14.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: adamant.pwn at gmail dot com
  Target Milestone: ---

Consider the following snippet:

auto test1(uint32_t bits) {
auto bytes = u8x32(u32x8() + bits);
u8x32 shuffler = {
0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3
};
auto shuffle = __builtin_shuffle(bytes, shuffler);
return shuffle;
}

It produces an unnecessary vperm2i128 command in the output. Same if I try to
use __builtin_shufflevector. Also if I try to change one of the bytes to the
second half, e.g. 3 to 31, it produces an unnecessary vpermq instead of
vperm2i128.

See https://godbolt.org/z/eEnd673e6 for details, and comparison with manual
implementation of the same function with intrinsics.