willtemperley opened a new issue, #154:
URL: https://github.com/apache/arrow-swift/issues/154
### Describe the enhancement requested
Arrow Swift as it stands has significant architectural problems.
# The problems
### Buffers can only be a single type
Every ArrowArray holds an ArrowData instance, which holds an array of
ArrowBuffers. An ArrowBuffer allocates its own memory. This means that IPC
cannot be zero-copy, because the buffer cannot be any other type, such as a
view over another binary object like Data.
This is compounded by the fact that ArrowBuffer exposes its memory via
`public let rawPointer: UnsafeMutableRawPointer`. This causes two problems.
First, pointer arithmetic is then done within the subscript of each array,
and that logic is duplicated in every array type.
Second, having a public raw pointer like that is dangerous, e.g. it
could easily cause use-after-free errors.
Allowing multiple buffer types, e.g. variable-length buffers and fixed-width
buffers, can improve performance and memory safety.
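
As a rough illustration of what "multiple buffer types" could look like, here is a minimal sketch. All names in it (`FixedWidthBuffer`, `OwnedBuffer`, `Int32DataView`, `total`) are hypothetical and are not taken from Arrow Swift or Swift Arrow. It assumes little-endian Int32 values and Swift 5.7 or later for `loadUnaligned`. The bounds-checked subscript is the only access path, so no raw pointer escapes.

```swift
import Foundation

// Hypothetical protocol: the name and shape are assumptions for illustration only.
protocol FixedWidthBuffer {
    associatedtype Element
    var count: Int { get }
    subscript(index: Int) -> Element { get }
}

// A buffer that owns its storage.
struct OwnedBuffer<Element>: FixedWidthBuffer {
    private let storage: [Element]
    init(_ elements: [Element]) { storage = elements }
    var count: Int { storage.count }
    subscript(index: Int) -> Element { storage[index] }
}

// A buffer that reads Int32 values out of a byte range of an existing Data value.
// It keeps the Data alive itself, so the bytes cannot be freed underneath it.
struct Int32DataView: FixedWidthBuffer {
    private let data: Data
    private let byteOffset: Int
    let count: Int

    init(data: Data, byteOffset: Int, count: Int) {
        precondition(byteOffset >= 0 && count >= 0, "negative offset or count")
        precondition(byteOffset + count * MemoryLayout<Int32>.size <= data.count,
                     "view exceeds the underlying data")
        self.data = data
        self.byteOffset = byteOffset
        self.count = count
    }

    subscript(index: Int) -> Int32 {
        precondition(index >= 0 && index < count, "index out of range")
        let offset = byteOffset + index * MemoryLayout<Int32>.size
        return data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Int32 in
            // loadUnaligned tolerates offsets that are not 4-byte aligned.
            Int32(littleEndian: raw.loadUnaligned(fromByteOffset: offset, as: Int32.self))
        }
    }
}

// Code written against the protocol works with either buffer type.
func total<B: FixedWidthBuffer>(_ buffer: B) -> Int where B.Element == Int32 {
    var sum = 0
    for i in 0..<buffer.count { sum += Int(buffer[i]) }
    return sum
}
```

With something along these lines, an array type could be generic over its buffer, and the offset arithmetic would live in one place rather than in every array's subscript.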
### ArrowTypes
Currently two reference types are used to describe an ArrowType. There is
no reason I can think of for an Arrow type to be mutable. Arrow types can be
represented as a single value type, using an enum. This is likely to improve
performance and make the library far more usable, with enums offering much
better ergonomics and safety, with exhaustiveness checking and simpler
equality checks, for example.
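
To make the exhaustiveness and equality points concrete, here is a heavily reduced sketch. `ArrowTypeSketch` and its cases are hypothetical; this is not the enum from Swift Arrow or from arrow-rs, only an illustration of the idea.

```swift
// Hypothetical, heavily reduced Arrow type enum (an assumption for illustration).
// `indirect` lets a case hold another ArrowTypeSketch, as nested types need.
indirect enum ArrowTypeSketch: Equatable {
    case boolean
    case int32
    case float64
    case utf8
    case list(ArrowTypeSketch)

    /// Byte width for the fixed-width cases in this sketch, nil otherwise.
    /// Booleans are bit-packed in Arrow, so they have no whole-byte width here.
    var byteWidth: Int? {
        // No `default` clause: adding a new case makes this switch fail to
        // compile until the case is handled.
        switch self {
        case .int32: return 4
        case .float64: return 8
        case .boolean, .utf8, .list: return nil
        }
    }
}
```

Because `Equatable` is synthesized by the compiler, `ArrowTypeSketch.list(.int32) == .list(.int32)` compares nested types structurally with no hand-written equality code.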
# Feasibility
This is definitely feasible because I have already done most of this in
Swift Arrow:
https://github.com/willtemperley/swift-arrow/
This version includes support for all types that Arrow Swift covers, plus
schema metadata, map types, binary views, binary with 64-bit offsets and nested
types with 64-bit offsets.
It also passes Arrow Gold integration tests for these types.
I have completely rewritten the buffer infrastructure. Zero-copy IPC is
available using views over memory-mapped Data objects.
I have replaced ArrowType with an enum, which was adapted from arrow-rs.
# The challenge
Changing ArrowType to an enum would completely change the public API.
Replacing the buffer infrastructure means an almost complete rewrite of
arrays and array builders.
# Caveats
My version does not include C import / export, and Arrow Flight support is
pending.
# Suggested way forward
I think this would simply need to be a clean-break fork, with a major
version bump and a migration guide.
I'm happy to donate any code I've written in Swift Arrow back to Arrow Swift.