Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23485 )

Change subject: KUDU-1261 Add array type support to Python client
......................................................................


Patch Set 1: Code-Review+1

(9 comments)

Thanks a lot for implementing this!  I took a quick look: overall it looks 
good, just a few questions and nits.

http://gerrit.cloudera.org:8080/#/c/23485/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23485/1//COMMIT_MSG@27
PS1, Line 27: - Decimal arrays: blocked on C++ API int128 overload support
            :   (existing decimal uses int128, but SetArrayDecimal only has
            :   int32/int64 overloads - needs API design review)
Is it a practical restriction (i.e. something Cython-related) or that's 
something else?


http://gerrit.cloudera.org:8080/#/c/23485/1//COMMIT_MSG@30
PS1, Line 30: - Fix: Empty BOOL array read triggers C++ assertion failure
            :   (requires C++ client fix)
The current C++'s client's test coverage is spotty, indeed.  I'll work on 
adding more tests in coming days.

Interesting, if that were something that's addressed by 
https://gerrit.cloudera.org/#/c/23486/ , that would affect all the types.  
Anyway, I'll take a closer look, adding more scenarios in the set of test cases 
in https://gerrit.cloudera.org/#/c/23220

I'm planning to keep you posted on my findings and on the fix once it's 
available.


http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/client.pyx
File python/kudu/client.pyx:

http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/client.pyx@3234
PS1, Line 3234: value
Is this supposed to be Python list in case of arrays?


Maybe a topic for a separate patch, but does it make sense to provide an 
interface where Python array (not list) is used as a parameter to write into 
array columns?  That might be quite beneficial from performance reasons.


http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/client.pyx@3344
PS1, Line 3344:         for elem in value:
              :             cpp_validity.push_back(elem is not None)
For performance reasons, is it possible to reserve the necessary space in the 
cpp_validity and the typed arrays before calling 'push_back' for all the 
elements?


http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/client.pyx@3472
PS1, Line 3472:
nit: remove stray spaces


http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/libkudu_client.pxd
File python/kudu/libkudu_client.pxd:

http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/libkudu_client.pxd@473
PS1, Line 473:
nit: remove the trailing space


http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/tests/test_array_datatype.py
File python/kudu/tests/test_array_datatype.py:

http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/tests/test_array_datatype.py@47
PS1, Line 47: API inconsistency
Does it mean it's not possible to supply int32_t and int64_t from Python to 
reach the C++ client API with overloaded methods for decimal arrays (e.g., due 
to Cython-related issues or alike) or that's something else?


http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/tests/test_schema.py
File python/kudu/tests/test_schema.py:

http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/tests/test_schema.py@628
PS1, Line 628:             self.assertEqual(col.type.name, 'nested')
nit: in C++ tests there also verification for the type of array elements (at 
least, in string representation, e.g. TestSchema.SimpleSchemaWithScalarArrays 
from schema-test.cc); does it make sense to do here as well?


http://gerrit.cloudera.org:8080/#/c/23485/1/python/kudu/tests/test_schema.py@683
PS1, Line 683:         empty_test_data = [
             :             ('arr_int8', []),
             :             ('arr_int16', []),
             :             ('arr_int32', []),
             :             ('arr_int64', []),
             :             ('arr_float', []),
             :             ('arr_double', []),
             :             ('arr_bool', []),
             :             ('arr_string', []),
             :             ('arr_binary', []),
             :             ('arr_unixtime_micros', []),
             :             ('arr_date', []),
             :             ('arr_varchar', []),
             :         ]
nit: does it make sense to add a similar bunch, but with nulls instead of empty 
arrays?



--
To view, visit http://gerrit.cloudera.org:8080/23485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2329c7466534bd4961860c05b600e7d4b4a11507
Gerrit-Change-Number: 23485
Gerrit-PatchSet: 1
Gerrit-Owner: Marton Greber <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Fri, 03 Oct 2025 00:17:16 +0000
Gerrit-HasComments: Yes

Reply via email to