lesterfan opened a new issue, #45534:

URL: https://github.com/apache/arrow/issues/45534
### Describe the bug, including details regarding any error messages, version, and platform.

This is half a bug report about the [RunEndEncodeTableColumns](https://github.com/apache/arrow/blob/6a47e4d28cdc4592fe6a458dbe5efe3b17a090e5/cpp/src/arrow/testing/gtest_util.cc#L476-L492) gtest util and half a usage question.

If a string column in an `arrow::Table` is run-end encoded, should the corresponding schema type be `arrow::utf8()` or `arrow::run_end_encoded(arrow::int32(), arrow::utf8())`?

The [RunEndEncodeTableColumns](https://github.com/apache/arrow/blob/6a47e4d28cdc4592fe6a458dbe5efe3b17a090e5/cpp/src/arrow/testing/gtest_util.cc#L476-L492) gtest util currently returns a table like this:

```
ree_table =
col: string
----
col: [
  -- run_ends: [ 1, 2, 3, 4 ]
  -- values: [ "a", "b", "c", "d" ]
]
```

whereas I would have expected a table like this:

```
ree_table =
col: run_end_encoded<run_ends: int32, values: string>
  child 0, run_ends: int32 not null
  child 1, values: string
----
col: [
  -- run_ends: [ 1, 2, 3, 4 ]
  -- values: [ "a", "b", "c", "d" ]
]
```

I'm not sure which is more correct. My instinct is that the second one is, since I see in the codebase that certain features are disabled for run-end encoded types ([example](https://github.com/apache/arrow/blob/6a47e4d28cdc4592fe6a458dbe5efe3b17a090e5/python/pyarrow/src/arrow/python/arrow_to_pandas.cc#L1383)), so we would want the schema to accurately reflect what the library currently supports for the column. I definitely don't have a lot of context here, though, so I may be missing something 🙂

### Component(s)

C++