ollemartensson opened a new issue, #564: URL: https://github.com/apache/arrow-julia/issues/564
### Overview

Adding tensor support would help position Arrow.jl as a strong library for machine learning and scientific computing. Multi-dimensional data structures are defined in the Arrow format specification but are not yet supported by Arrow.jl. The specification provides distinct formats for dense and sparse multi-dimensional arrays; this issue focuses on dense multi-dimensional arrays.

### Dense Tensors

The canonical extension for dense tensors is [arrow.fixed_shape_tensor](https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor). This format represents a column in which every element is a tensor of the same shape and element type.

- Storage Type: The underlying storage is a FixedSizeList. The elements of each tensor are flattened into a single contiguous list in row-major (C-style) order, so the list_size of the FixedSizeList is the total number of elements in the tensor (the product of its dimensions).
- Metadata: The multi-dimensional structure is described in the extension metadata, a JSON string containing:
  - shape: A required array of integers defining the dimensions of the tensor.
  - dim_names: An optional array of strings naming each dimension.
  - permutation: An optional array of integers describing a logical layout that is a permutation of the physical row-major layout.

### Proposed Design

A new struct type could be defined that acts as a zero-copy view over the underlying Arrow memory buffers. The struct would not own the data itself; instead, it would provide a multi-dimensional interpretation of an underlying Arrow.FixedSizeList.
```julia
using Arrow

# Proposed zero-copy view: holds a reference to the Arrow column, never a copy.
struct DenseTensor{T, N} <: AbstractArray{T, N}
    parent::Arrow.FixedSizeList{T}                # underlying Arrow storage
    shape::NTuple{N, Int}                         # from the `shape` metadata key
    dim_names::Union{Nothing, NTuple{N, Symbol}}  # from the optional `dim_names` key
end
```

Once `Base.size` and `Base.getindex` are defined for it, this design lets DenseTensor use Julia's AbstractArray interface for slicing, iteration, and other operations, while the data is never copied out of the Arrow buffer.
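Julia arrays are column-major, whereas arrow.fixed_shape_tensor stores each tensor in row-major order, so a view cannot simply reinterpret the flat buffer with the same shape. The following is a minimal sketch of the index mapping such a view would need, assuming a plain AbstractVector stands in for the FixedSizeList's flat child values rather than any Arrow.jl internals. `RowMajorTensor` and `tensor_at` are hypothetical names, and the sketch ignores validity bitmaps, dim_names, and permutation.

```julia
# Minimal sketch (hypothetical names): a zero-copy, row-major view of one
# element of an arrow.fixed_shape_tensor column. `parent` stands in for the
# flat child values of the FixedSizeList; no Arrow.jl internals are used.

struct RowMajorTensor{T,N,V<:AbstractVector{T}} <: AbstractArray{T,N}
    parent::V             # flat storage shared with the column, never copied
    offset::Int           # elements preceding this tensor in `parent`
    shape::NTuple{N,Int}  # physical shape from the `shape` metadata key
end

Base.size(t::RowMajorTensor) = t.shape

Base.@propagate_inbounds function Base.getindex(
    t::RowMajorTensor{T,N}, I::Vararg{Int,N}
) where {T,N}
    @boundscheck checkbounds(t, I...)
    # Row-major (C-order) linear index: the last dimension varies fastest,
    # unlike Julia's native column-major layout.
    lin = 0
    for d in 1:N
        lin = lin * t.shape[d] + (I[d] - 1)
    end
    return t.parent[t.offset + lin + 1]
end

# Hypothetical helper: view the `i`-th tensor (1-based) of a column whose
# tensors are stored back to back, each occupying prod(shape) elements
# (the FixedSizeList's list_size).
function tensor_at(flat::AbstractVector{T}, shape::NTuple{N,Int}, i::Integer) where {T,N}
    len = prod(shape)
    1 <= i <= length(flat) ÷ len || throw(BoundsError(flat, i))
    return RowMajorTensor{T,N,typeof(flat)}(flat, (i - 1) * len, shape)
end

flat = collect(1:12)             # two 2×3 tensors, row-major, back to back
t = tensor_at(flat, (2, 3), 1)
t[2, 1]                          # → 4 (second row, first column)
```

Because `RowMajorTensor` only defines `size` and `getindex`, generic AbstractArray operations such as `view(t, 1, :)`, iteration, and broadcasting work on it unchanged. An alternative zero-copy approach is to `reshape` the flat chunk to `reverse(shape)` and wrap it in a `PermutedDimsArray` that reverses the dimension order; that trades the explicit index arithmetic for two lazy wrappers.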
