[PR] Add array data type support for Python [fluss-rust]

via GitHub Wed, 01 Apr 2026 16:43:21 -0700


qzyu999 opened a new pull request, #474:
URL: https://github.com/apache/fluss-rust/pull/474


   
   ### Purpose
   
   <!-- Linking this pull request to the issue -->
   Linked issue: close #469 
   
   <!-- What is the purpose of the change -->
   The purpose of this change is to complete the Python implementation for 
Array types by adding support for fixed-length arrays (`FixedSizeList`). 
This ensures the Python client can interface with all standard Arrow list 
layouts while providing an idiomatic, programmatic way to construct nested 
schemas.
   
   ### Brief change log
   * **Core Engine Refinement (`crates/fluss/src/row/column.rs`):** Added 
explicit downcasting for `FixedSizeListArray`. This preserves performance by 
calculating element positions via direct multiplication, avoiding the memory 
fetch overhead of an offsets buffer.
   * **Programmatic Schema API (`bindings/python/src/metadata.rs`):** 
Introduced a `DataTypes` factory class for Python. This enables the 
construction of nested types (e.g., `DataTypes.array(DataTypes.int())`) which 
was previously impossible for FFI-backed types.
   * **Idiomatic FFI Bindings:** Replaced inherent `.to_string()` methods with 
the standard Rust `fmt::Display` trait for `DataType`. This standardizes string 
representation across Rust and Python (`__str__` and `__repr__`) and should be 
reusable when adding `Map<Key, Value>` or Struct/Row types.
   * **Arrow Translator Update (`crates/fluss/src/record/arrow.rs`):** Updated 
the type translator to unify `List`, `LargeList`, and `FixedSizeList` into a 
single logical Fluss `Array` type.
   * **Linter & Precision Pass:** Cleaned up `clippy::clone_on_copy` warnings 
in integration tests and replaced hardcoded float approximations with 
`std::f32::consts::PI` and `std::f64::consts::E` to prevent precision drift in 
streaming aggregations.
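
To illustrate the first item in the change log, here is a minimal sketch of how element positions differ between an offsets-based list layout and a fixed-size one. It uses plain slices rather than the `arrow` crate, and the function names `variable_list_range` and `fixed_list_range` are hypothetical, not taken from `column.rs`.

```rust
use std::ops::Range;

/// Child-value range for `row` in an offsets-based layout (as used by
/// Arrow's `List`): requires two reads from the offsets buffer.
fn variable_list_range(offsets: &[i32], row: usize) -> Range<usize> {
    offsets[row] as usize..offsets[row + 1] as usize
}

/// Child-value range for `row` in a fixed-size layout (as used by Arrow's
/// `FixedSizeList`): computed by multiplication, with no offsets buffer.
fn fixed_list_range(list_size: usize, row: usize) -> Range<usize> {
    let start = row * list_size;
    start..start + list_size
}
```

With a child buffer of six values and a list size of 3, both functions return the range `3..6` for row 1, but only the first one has to read the offsets `[0, 3, 6]` to get there.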
   
   <!-- Please describe the changes made in this pull request and explain how 
they address the issue -->
   
   ### Tests
   * **Rust Unit Tests:** Added `test_from_arrow_type_fixed_size_list` to 
verify the Arrow-to-Fluss type translation.
   * **Python Metadata Tests:** Added tests in `test_schema.py` to verify the 
`DataTypes` factory and string representation logic.
   * **Integration Tests:**
       * `test_append_and_scan_with_array`: Verifies round-trip for 
variable-length arrays.
       * `test_append_and_scan_with_fixed_size_array`: Verified client-side, but 
currently **SKIPPED** in CI because it requires a Fluss server version >= 0.9.1 
to handle the new storage layout.
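
For context on what the Rust unit test checks, the sketch below shows the general shape of a translation that collapses several physical list layouts into one logical array type. The enums `ArrowType` and `FlussType` are simplified, hypothetical stand-ins, and the function name is borrowed from the test name; the real signature in `crates/fluss/src/record/arrow.rs` may differ.

```rust
/// Hypothetical stand-in for the Arrow types involved.
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int32,
    Utf8,
    List(Box<ArrowType>),
    LargeList(Box<ArrowType>),
    FixedSizeList(Box<ArrowType>, i32),
}

/// Hypothetical stand-in for the logical Fluss types.
#[derive(Debug, Clone, PartialEq)]
enum FlussType {
    Int,
    String,
    Array(Box<FlussType>),
}

/// Maps every list layout to the single logical `Array` type.
fn from_arrow_type(arrow: &ArrowType) -> FlussType {
    match arrow {
        ArrowType::Int32 => FlussType::Int,
        ArrowType::Utf8 => FlussType::String,
        // The three physical layouts share one arm; this sketch drops the
        // fixed length, which is an assumption about the real translator.
        ArrowType::List(elem)
        | ArrowType::LargeList(elem)
        | ArrowType::FixedSizeList(elem, _) => {
            FlussType::Array(Box::new(from_arrow_type(elem)))
        }
    }
}
```

Under these assumptions, a `FixedSizeList` of `Int32` with length 3 and a plain `List` of `Int32` translate to the same `FlussType::Array` value.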
   <!-- List UT and IT cases to verify this change -->
   
   ### API and Format
   * **API:** This change adds a new `DataTypes` factory to the Python API. It 
also improves the `__repr__` output for schemas, making nested types 
human-readable.
   * **Format:** This PR introduces support for the `FixedSizeList` Arrow 
storage format within the Fluss engine, which is more space-efficient and 
performant for fixed-length vector data (like coordinates or embeddings).
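
The human-readable rendering of nested types mentioned under **API** follows naturally from a recursive `fmt::Display` implementation. The sketch below uses a simplified, hypothetical `DataType` enum, and the exact output format (`ARRAY<INT>`) is an assumption rather than the format this PR produces.

```rust
use std::fmt;

/// Simplified, hypothetical data type with one nested variant.
enum DataType {
    Int,
    String,
    Array(Box<DataType>),
}

impl fmt::Display for DataType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataType::Int => write!(f, "INT"),
            DataType::String => write!(f, "STRING"),
            // Formatting the element recurses through its own Display impl,
            // so arbitrarily deep nesting needs no extra code.
            DataType::Array(elem) => write!(f, "ARRAY<{elem}>"),
        }
    }
}
```

Implementing `Display` also provides `.to_string()` for free through the standard library's blanket `ToString` implementation, so a binding layer can forward `__str__` and `__repr__` to the same code path.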
   
   <!-- Does this change affect API or storage format -->
   
   ### Documentation
   This change introduces a new feature: Support for the `Array` data type in 
the Python client, including support for `FixedSizeList`. Users can now define, 
write, and read array-based columns using the Python SDK.
   <!-- Does this change introduce a new feature -->
   


