daniel-adam-tfs opened a new issue, #472:
URL: https://github.com/apache/arrow-go/issues/472
### Describe the enhancement requested
We’re migrating some data from an internal format to Parquet, and while
profiling the new code I discovered a source of a large number of allocations:
the conversion of a `[]T` to an `interface{}`.
Some of our columns are RLE-encoded, and for our data the
`DictionaryConverter` methods get called a lot. (I’ve tested this with a
Parquet file of 10,000,000 rows overall; for example, the `Fill` method gets
called 60,000+ times on one column and 100,000+ times on another.) The
interface methods all take `interface{}` parameters, and converting a typed
slice to an `interface{}` results in a heap allocation. Overall this causes
significant overhead.
Using generics for the interface avoids these allocations. In my fork
(https://github.com/daniel-adam-tfs/arrow-go) I tested some modifications:
* I’ve created a generic interface:
```go
type DictionaryConverter2[T int32 | int64 | float32 | float64] interface {
	Copy2([]T, []IndexType) error
	Fill2([]T, IndexType) error
	FillZero2([]T)
	IsValid(...IndexType) bool
	IsValidSingle(IndexType) bool
}
```
* I’ve created `IsValidSingle` because passing a single value to a variadic
parameter makes the runtime allocate a slice on the heap, and this method
again gets called a lot for our data.
* I’ve added the methods from `DictionaryConverter2` to `dictConverter[T]`,
then simply type-asserted the `DictionaryConverter` to `DictionaryConverter2`
before use and invoked the new methods instead of the old ones.
I’ve used PyArrow to generate a Parquet file with 10,000,000 rows, at most
1,000 unique values, and repeating runs of 1–50 per value. Here are the
results:
Implementation using `DictionaryConverter`:
```
% benchstat testdata/BenchmarkInt32Column.txt
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow-go/v18/parquet/file
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
              │ testdata/BenchmarkInt32Column.txt │
              │              sec/op               │
Int32Column-8                        86.50m ± 6%
              │ testdata/BenchmarkInt32Column.txt │
              │               B/op                │
Int32Column-8                       12.88Mi ± 0%
              │ testdata/BenchmarkInt32Column.txt │
              │             allocs/op             │
Int32Column-8                        712.9k ± 0%
```
Implementation using `DictionaryConverter2`:
```
% benchstat testdata/BenchmarkInt32Column.new.txt
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow-go/v18/parquet/file
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
              │ testdata/BenchmarkInt32Column.new.txt │
              │                sec/op                 │
Int32Column-8                          61.27m ± 7%
              │ testdata/BenchmarkInt32Column.new.txt │
              │                 B/op                  │
Int32Column-8                         2.932Mi ± 0%
              │ testdata/BenchmarkInt32Column.new.txt │
              │               allocs/op               │
Int32Column-8                          1.301k ± 0%
```
Comparison by benchstat:
```
% benchstat testdata/BenchmarkInt32Column.txt testdata/BenchmarkInt32Column.new.txt
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow-go/v18/parquet/file
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
              │ testdata/BenchmarkInt32Column.txt │ testdata/BenchmarkInt32Column.new.txt │
              │              sec/op               │    sec/op      vs base                │
Int32Column-8                        86.50m ± 6%     61.27m ± 7%  -29.16% (p=0.000 n=8)
              │ testdata/BenchmarkInt32Column.txt │ testdata/BenchmarkInt32Column.new.txt │
              │               B/op                │     B/op       vs base                │
Int32Column-8                       12.878Mi ± 0%    2.932Mi ± 0%  -77.23% (p=0.000 n=8)
              │ testdata/BenchmarkInt32Column.txt │ testdata/BenchmarkInt32Column.new.txt │
              │             allocs/op             │   allocs/op    vs base                │
Int32Column-8                       712.870k ± 0%    1.301k ± 0%  -99.82% (p=0.000 n=8)
```
So it gets rid of 99%+ of allocations for this use case, which is pretty good
I’d say.
I can take a stab at it, but rewriting `DictionaryConverter` to
`DictionaryConverter[T]` might end up being a significant amount of code
changes, so I’d like to know what the maintainers think about it.
### Component(s)
Parquet
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]