adamreeve commented on PR #47998:
URL: https://github.com/apache/arrow/pull/47998#issuecomment-3465950738
I'm not sure this is the best approach for fixing this, because it slows down
the RLE encoding benchmarks on my machine, although it also seems to speed up
some benchmark cases:
```
----------------------------------------------------------------------------------------------------
Non-regressions: (9)
----------------------------------------------------------------------------------------------------
benchmark                                   baseline        contender       change %  counters
BM_RleEncoding/32768/1                      2.866 GiB/sec   3.402 GiB/sec     18.680  {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BM_RleEncoding/32768/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 33348}
BM_RleEncoding/4096/1                       2.857 GiB/sec   3.363 GiB/sec     17.740  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BM_RleEncoding/4096/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 263675}
BM_RleEncoding/65536/1                      2.895 GiB/sec   3.381 GiB/sec     16.763  {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BM_RleEncoding/65536/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 16491}
BM_RleEncoding/1024/1                       2.775 GiB/sec   3.114 GiB/sec     12.186  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BM_RleEncoding/1024/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1047638}
BM_RleEncodingSpacedBoolean/32768/10000     62.305 GiB/sec  62.721 GiB/sec     0.668  {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BM_RleEncodingSpacedBoolean/32768/10000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1427195, 'null_percent': 100.0}
BM_RleEncodingBoolean/1024                  410.025 MiB/sec 406.502 MiB/sec   -0.859  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BM_RleEncodingBoolean/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 294385}
BM_RleEncodingBoolean/4096                  426.461 MiB/sec 421.442 MiB/sec   -1.177  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BM_RleEncodingBoolean/4096', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 76216}
BM_RleEncodingBoolean/32768                 435.241 MiB/sec 425.014 MiB/sec   -2.350  {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BM_RleEncodingBoolean/32768', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9715}
BM_RleEncodingBoolean/65536                 435.624 MiB/sec 425.030 MiB/sec   -2.432  {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BM_RleEncodingBoolean/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4871}
----------------------------------------------------------------------------------------------------
Regressions: (12)
----------------------------------------------------------------------------------------------------
benchmark                                   baseline        contender       change %  counters
BM_RleEncoding/4096/16                      644.740 MiB/sec 604.668 MiB/sec   -6.215  {'family_index': 0, 'per_family_instance_index': 9, 'run_name': 'BM_RleEncoding/4096/16', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 57555}
BM_RleEncoding/65536/16                     646.133 MiB/sec 601.672 MiB/sec   -6.881  {'family_index': 0, 'per_family_instance_index': 11, 'run_name': 'BM_RleEncoding/65536/16', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3644}
BM_RleEncoding/32768/16                     649.870 MiB/sec 601.706 MiB/sec   -7.411  {'family_index': 0, 'per_family_instance_index': 10, 'run_name': 'BM_RleEncoding/32768/16', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7281}
BM_RleEncoding/1024/16                      650.748 MiB/sec 601.229 MiB/sec   -7.610  {'family_index': 0, 'per_family_instance_index': 8, 'run_name': 'BM_RleEncoding/1024/16', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 230323}
BM_RleEncodingSpacedBoolean/32768/5000      228.351 MiB/sec 207.755 MiB/sec   -9.020  {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BM_RleEncodingSpacedBoolean/32768/5000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5085, 'null_percent': 50.0}
BM_RleEncodingSpacedBoolean/32768/1000      172.863 MiB/sec 154.948 MiB/sec  -10.364  {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BM_RleEncodingSpacedBoolean/32768/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3833, 'null_percent': 10.0}
BM_RleEncodingSpacedBoolean/32768/1         175.361 MiB/sec 155.387 MiB/sec  -11.390  {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BM_RleEncodingSpacedBoolean/32768/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3953, 'null_percent': 0.01}
BM_RleEncodingSpacedBoolean/32768/100       173.656 MiB/sec 153.307 MiB/sec  -11.718  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BM_RleEncodingSpacedBoolean/32768/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3910, 'null_percent': 1.0}
BM_RleEncoding/32768/8                      650.268 MiB/sec 573.097 MiB/sec  -11.868  {'family_index': 0, 'per_family_instance_index': 6, 'run_name': 'BM_RleEncoding/32768/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7267}
BM_RleEncoding/1024/8                       644.999 MiB/sec 566.453 MiB/sec  -12.178  {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'BM_RleEncoding/1024/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 232475}
BM_RleEncoding/4096/8                       653.119 MiB/sec 570.380 MiB/sec  -12.668  {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'BM_RleEncoding/4096/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 56561}
BM_RleEncoding/65536/8                      651.448 MiB/sec 567.123 MiB/sec  -12.944  {'family_index': 0, 'per_family_instance_index': 7, 'run_name': 'BM_RleEncoding/65536/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3611}
```
I tried some alternative approaches that avoid the cast to int64 and operate
on an int number of bytes instead of bits, but those were about as slow and
could also still overflow if the maximum buffer size was close to int32 max.
Eg:
```C++
int new_bit_offset = bit_offset_ + num_bits;
if (ARROW_PREDICT_FALSE(
        byte_offset_ + (new_bit_offset == 0 ? 0 : 1 + (new_bit_offset - 1) / 8) >
        max_bytes_))
```
Another alternative would be to limit the maximum buffer size to something
like `int32_max / 8 - 8`, which I think would also prevent any overflow
without needing to change this if condition.
I also noticed that the return value of `BitWriter::PutValue` only appears to
be used in debug builds, so it could make sense to enable this check only in
debug builds too. But removing the check from release builds would mean that,
rather than silently failing to write some data, there could be invalid memory
writes. The encoders are also public, so they could be used outside this
codebase by consumers that do check that writes succeed, and removing the
check would be a breaking change for them.