korbit-ai[bot] commented on code in PR #34808:
URL: https://github.com/apache/superset/pull/34808#discussion_r2292629334
##########
superset/result_set.py:
##########
@@ -135,21 +135,21 @@ def __init__(  # pylint: disable=too-many-locals  # noqa: C901
         if data and (not isinstance(data, list) or not isinstance(data[0], tuple)):
             data = [tuple(row) for row in data]
         array = np.array(data, dtype=numpy_dtype)
-        if array.size > 0:
-            for column in column_names:
-                try:
-                    pa_data.append(pa.array(array[column].tolist()))
-                except (
-                    pa.lib.ArrowInvalid,
-                    pa.lib.ArrowTypeError,
-                    pa.lib.ArrowNotImplementedError,
-                    ValueError,
-                    TypeError,  # this is super hackey,
-                    # https://issues.apache.org/jira/browse/ARROW-7855
-                ):
-                    # attempt serialization of values as strings
-                    stringified_arr = stringify_values(array[column])
-                    pa_data.append(pa.array(stringified_arr.tolist()))
+
+        for column in column_names:
+            try:
+                pa_data.append(pa.array(array[column].tolist()))
+            except (
+                pa.lib.ArrowInvalid,
+                pa.lib.ArrowTypeError,
+                pa.lib.ArrowNotImplementedError,
+                ValueError,
+                TypeError,  # this is super hackey,
+                # https://issues.apache.org/jira/browse/ARROW-7855
+            ):
+                # attempt serialization of values as strings
+                stringified_arr = stringify_values(array[column])
+                pa_data.append(pa.array(stringified_arr.tolist()))
Review Comment:
### Inefficient Array Type Conversions
<details>
<summary>Tell me more</summary>
###### What is the issue?
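Each column is converted from a NumPy array to a Python list and then to an
Arrow array. When that conversion fails, the values are stringified and
converted to a list a second time. Every step allocates an intermediate copy
of the column.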
###### Why this matters
Each conversion copies the whole column, so the extra `.tolist()` calls and
the string fallback add memory and time proportional to the size of the
result set. On large datasets this can noticeably hurt performance.
###### Suggested change ∙ *Feature Preview*
Minimize conversions by building Arrow arrays directly from the NumPy columns
where possible, falling back to strings only when the direct conversion fails:
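```python
# Convert the NumPy column directly to an Arrow array when possible.
# NOTE: infer_arrow_type is not defined in this diff; treat it as a hypothetical helper.
try:
    pa_data.append(pa.array(array[column], type=infer_arrow_type(column)))
except (pa.lib.ArrowInvalid, pa.lib.ArrowTypeError,
        pa.lib.ArrowNotImplementedError, ValueError, TypeError):
    # Fall back to string conversion only when necessary
    stringified_arr = stringify_values(array[column])
    pa_data.append(pa.array(stringified_arr))
```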
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]