willtemperley opened a new issue, #154:
URL: https://github.com/apache/arrow-swift/issues/154

   ### Describe the enhancement requested
   
   Arrow Swift as it stands has significant architectural problems.
   
   # The problems
   
   ### Buffers can only be a single type
   
   Every ArrowArray holds an ArrowData instance, which holds an array of 
ArrowBuffers. An ArrowBuffer always allocates its own memory, so IPC cannot be 
zero-copy: a buffer cannot be backed by anything else, such as a view over 
another binary object like Data. 
   
   This is compounded by the fact that ArrowBuffer exposes its memory via:

    public let rawPointer: UnsafeMutableRawPointer

   This causes two problems. First, pointer arithmetic is done inside the 
subscript of each array type, so the same logic is duplicated in every array. 
Second, a public raw pointer is dangerous: it could easily lead to 
use-after-free errors.

   
   Allowing multiple buffer types, e.g. separate buffers for variable-length 
and fixed-width types, could improve both performance and memory safety.
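
   As a rough illustration of what multiple buffer types could look like, the 
sketch below puts an owning buffer and a non-copying view over Data behind one 
protocol. All names here (`FixedWidthBuffer`, `OwnedBuffer`, `DataViewBuffer`, 
`total`) are hypothetical and are not taken from Arrow Swift or Swift Arrow. It 
assumes little-endian data and Swift 5.7 or later for `loadUnaligned`.

```swift
import Foundation

// Hypothetical protocol: readers see typed elements, never a raw pointer.
protocol FixedWidthBuffer {
    associatedtype Element
    var count: Int { get }
    subscript(index: Int) -> Element { get }
}

/// A buffer that owns its storage.
struct OwnedBuffer<Element>: FixedWidthBuffer {
    private let storage: [Element]
    init(_ elements: [Element]) { storage = elements }
    var count: Int { storage.count }
    subscript(index: Int) -> Element { storage[index] }
}

/// A buffer that reads elements out of an existing Data value.
/// Holding the Data keeps its bytes alive, so the view cannot dangle.
struct DataViewBuffer<Element: FixedWidthInteger>: FixedWidthBuffer {
    private let data: Data
    private let byteOffset: Int
    let count: Int

    init(data: Data, byteOffset: Int, count: Int) {
        let stride = MemoryLayout<Element>.stride
        precondition(byteOffset >= 0 && count >= 0, "negative offset or count")
        precondition(byteOffset + count * stride <= data.count, "view exceeds data")
        self.data = data
        self.byteOffset = byteOffset
        self.count = count
    }

    subscript(index: Int) -> Element {
        precondition(index >= 0 && index < count, "index out of range")
        let offset = byteOffset + index * MemoryLayout<Element>.stride
        return data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
            // Unaligned load, then convert from little-endian byte order.
            Element(littleEndian: raw.loadUnaligned(fromByteOffset: offset, as: Element.self))
        }
    }
}

/// Works with any buffer type; the offset arithmetic lives in one place.
func total<B: FixedWidthBuffer>(_ buffer: B) -> Int where B.Element: BinaryInteger {
    var sum = 0
    for i in 0..<buffer.count { sum += Int(buffer[i]) }
    return sum
}

// Usage: encode three Int32 values, then view the last two without copying.
let encoded: [Int32] = [1, 2, 3]
var bytes = Data()
for value in encoded {
    withUnsafeBytes(of: value.littleEndian) { bytes.append(contentsOf: $0) }
}
let view = DataViewBuffer<Int32>(data: bytes, byteOffset: 4, count: 2)
let owned = OwnedBuffer<Int32>([2, 3])
```

   In this sketch the pointer arithmetic is written once, inside the view, and 
callers such as `total` cannot tell whether the memory is owned or borrowed.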
   
   ### ArrowTypes
   
   Currently two reference types are used to describe an ArrowType. I can 
think of no reason for an Arrow type to be mutable. Arrow types can be 
represented as a single value type, using an enum. This is likely to improve 
performance and make the library far more usable: enums offer much better 
ergonomics and safety, with exhaustiveness checking and simpler equality 
checks, for example.
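
   To make the enum argument concrete, here is a heavily reduced sketch. The 
names `ArrowTypeSketch`, `FieldSketch` and `fixedByteWidth` are hypothetical 
and the case list is illustrative only; it does not mirror the enum in Swift 
Arrow or the one in arrow-rs.

```swift
// Hypothetical, reduced type description as an immutable value type.
indirect enum ArrowTypeSketch: Equatable {
    case int32
    case float64
    case utf8
    case list(ArrowTypeSketch)
    case structure([FieldSketch])
}

// Hypothetical field description; Equatable conformance is synthesized.
struct FieldSketch: Equatable {
    let name: String
    let type: ArrowTypeSketch
    let nullable: Bool
}

/// Returns the element width in bytes for fixed-width types, nil otherwise.
/// There is no `default` branch, so adding a new case to the enum makes this
/// switch fail to compile until the new case is handled.
func fixedByteWidth(of type: ArrowTypeSketch) -> Int? {
    switch type {
    case .int32: return 4
    case .float64: return 8
    case .utf8, .list, .structure: return nil
    }
}

// Nested types compare structurally with plain ==.
let a = ArrowTypeSketch.list(.int32)
let b = ArrowTypeSketch.list(.int32)
let c = ArrowTypeSketch.structure([FieldSketch(name: "id", type: .int32, nullable: false)])
```

   Equality here comes from the compiler-synthesized `Equatable` conformance, 
so no hand-written comparison code is needed for nested types.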
   
   # Feasibility
   
   This is definitely feasible because I have already done most of this in 
Swift Arrow:
   https://github.com/willtemperley/swift-arrow/
   
   This version includes support for all types that Arrow Swift covers, plus 
schema metadata, map types, binary views, binary with 64-bit offsets and nested 
types with 64-bit offsets. 
   
   It also passes Arrow Gold integration tests for these types. 
   
   I have completely rewritten the buffer infrastructure. Zero-copy IPC is 
available using views over memory-mapped Data objects.
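
   For readers unfamiliar with memory-mapped Data, the sketch below shows the 
general idea using only Foundation. It is not the Swift Arrow implementation: 
the function names are hypothetical, the file holds bare little-endian Int32 
values rather than an Arrow IPC stream, and `.alwaysMapped` is a request that 
Foundation map the file where it can.

```swift
import Foundation

/// Sums `count` little-endian Int32 values starting at `byteOffset` in a file.
/// The file is opened as mapped Data and values are decoded in place, without
/// first copying the file's bytes into a separately allocated buffer.
func sumMappedInt32s(at url: URL, byteOffset: Int, count: Int) throws -> Int {
    let mapped = try Data(contentsOf: url, options: .alwaysMapped)
    let width = MemoryLayout<Int32>.size
    precondition(byteOffset >= 0 && count >= 0, "negative offset or count")
    precondition(byteOffset + count * width <= mapped.count, "range exceeds file")
    return mapped.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Int in
        var sum = 0
        for i in 0..<count {
            let value = raw.loadUnaligned(fromByteOffset: byteOffset + i * width,
                                          as: Int32.self)
            sum += Int(Int32(littleEndian: value))
        }
        return sum
    }
}

/// Writes three Int32 values to a temporary file, then sums the last two
/// through the mapped view. The temporary file is removed afterwards.
func mappedSumDemo() throws -> Int {
    let values: [Int32] = [10, 20, 30]
    var payload = Data()
    for value in values {
        withUnsafeBytes(of: value.littleEndian) { payload.append(contentsOf: $0) }
    }
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("arrow-sketch-\(UUID().uuidString).bin")
    try payload.write(to: url)
    defer { try? FileManager.default.removeItem(at: url) }
    return try sumMappedInt32s(at: url, byteOffset: 4, count: 2)
}
```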
   
   I have replaced ArrowType with an enum which was adapted from arrow-rs. 
   
   # The challenge
   
   Changing ArrowType to an enum would completely change the public API.
   Replacing the buffer infrastructure means an almost complete rewrite of 
arrays and array builders.
   
   # Caveats
   
   My version does not include C import / export and Arrow Flight support is 
pending. 
   
   # Suggested way forward
   
   I think this would simply need to be a clean-break fork, with a major 
version bump and a migration guide.
   I'm happy to donate any code I've written in Swift Arrow back to Arrow Swift.
   

