ollemartensson opened a new issue, #564:
URL: https://github.com/apache/arrow-julia/issues/564

   ### Overview
   
   Adding tensor support would help position Arrow.jl as a strong library for 
machine learning and scientific computing. Multi-dimensional arrays are 
currently unsupported in Arrow.jl but are defined within the Arrow format 
specification, which provides distinct formats for dense and sparse 
multi-dimensional arrays. This issue focuses on dense multi-dimensional 
arrays.
   
   ### Dense Tensors
   
   The canonical extension for dense tensors is 
[arrow.fixed_shape_tensor](https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor).
 This format is designed to represent a column where every element is a tensor 
of the same shape and element type.
   - Storage Type: The underlying storage is a FixedSizeList. The elements of 
the tensor are flattened into a single, contiguous list in row-major (C-style) 
order. The list_size of the FixedSizeList is therefore the total number of 
elements in the tensor (i.e., the product of its dimensions).
   - Metadata: The multi-dimensional structure is described in the extension 
metadata. This metadata is a JSON string containing:
     - shape: A required array of integers defining the dimensions of the 
tensor.
     - dim_names: An optional array of strings providing names for each 
dimension.
     - permutation: An optional array of integers to describe a logical layout 
that is a permutation of the physical row-major layout.
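   
   As a rough illustration of the storage layout described above, the sketch
   below computes the list_size and the row-major position of an element for a
   made-up shape. The helper names `row_major_strides` and `flat_index`, the
   example shape, and the dim_names in the metadata string are assumptions for
   illustration only; they are not part of Arrow.jl or the specification.
   
   ```julia
   # Hypothetical example shape, chosen only for illustration.
   shape = (2, 3, 4)
   
   # list_size of the FixedSizeList storage is the product of the dimensions.
   list_size = prod(shape)
   
   # Row-major (C-style) strides: the last dimension varies fastest.
   function row_major_strides(shape::NTuple{N,Int}) where {N}
       strides = ones(Int, N)
       for d in (N - 1):-1:1
           strides[d] = strides[d + 1] * shape[d + 1]
       end
       return Tuple(strides)
   end
   
   # Map 1-based tensor indices to a 1-based position in the flattened list.
   function flat_index(shape::NTuple{N,Int}, idx::NTuple{N,Int}) where {N}
       strides = row_major_strides(shape)
       return 1 + sum((idx .- 1) .* strides)
   end
   
   # Extension metadata as a JSON string (dim_names are made up here).
   metadata = """{"shape": [2, 3, 4], "dim_names": ["C", "H", "W"]}"""
   ```
   
   With this shape, `flat_index(shape, (1, 1, 2))` is the second stored
   element, since the last dimension is contiguous in row-major order.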
   
   ### Proposed Design
   
   A new struct type could be defined that acts as a zero-copy view over the 
underlying Arrow memory buffers.
   This struct would not own the data itself but would provide a 
multi-dimensional interpretation of an underlying Arrow.FixedSizeList.
   
   ```
   struct DenseTensor{T, N} <: AbstractArray{T, N}
       parent::Arrow.FixedSizeList{T}
       shape::NTuple{N, Int}
       dim_names::Union{Nothing, NTuple{N, Symbol}}
   end
   ```
   
   This design allows the DenseTensor to leverage Julia's rich AbstractArray 
interface for slicing and other operations while ensuring data is never copied 
from the Arrow buffer.
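   
   To make the AbstractArray idea concrete, here is a minimal, self-contained
   sketch of how `size` and `getindex` might be implemented over row-major
   storage. It is not the proposed implementation: the name `DenseTensorView`
   is hypothetical, and a plain `Vector` stands in for the flattened values of
   one Arrow.FixedSizeList element so the example runs without Arrow.jl.
   
   ```julia
   # Sketch only: `flat` stands in for the flattened values of one tensor;
   # a real implementation would wrap the Arrow type directly.
   struct DenseTensorView{T,N,A<:AbstractVector{T}} <: AbstractArray{T,N}
       flat::A
       shape::NTuple{N,Int}
   
       function DenseTensorView(flat::AbstractVector{T},
                                shape::NTuple{N,Int}) where {T,N}
           length(flat) == prod(shape) ||
               throw(DimensionMismatch("flat length must equal prod(shape)"))
           return new{T,N,typeof(flat)}(flat, shape)
       end
   end
   
   Base.size(t::DenseTensorView) = t.shape
   
   # Row-major lookup: the last index varies fastest in the flat storage.
   function Base.getindex(t::DenseTensorView{T,N},
                          I::Vararg{Int,N}) where {T,N}
       @boundscheck checkbounds(t, I...)
       pos = 0
       for d in 1:N
           pos = pos * t.shape[d] + (I[d] - 1)
       end
       return t.flat[pos + 1]
   end
   
   flat = collect(1:6)                # stand-in for flattened tensor values
   t = DenseTensorView(flat, (2, 3))  # logical 2x3 tensor, no copy of `flat`
   ```
   
   Because only `size` and scalar `getindex` are defined, generic operations
   such as `t[1, :]` or `sum(t)` fall back to Julia's AbstractArray methods,
   and writes to `flat` are visible through `t` since the view holds a
   reference rather than a copy.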
   

