JazJaz426 opened a new issue, #44944: URL: https://github.com/apache/arrow/issues/44944
### Describe the usage question you have. Please include as many useful details as possible.

Hi folks, I read online that in PyArrow a string column has a column-level size limit of 2 GB. However, in my work I noticed this doesn't seem to hold.

```
import pyarrow as pa
import polars as pl

def some_function(
    self,
    raw_table: pa.Table,
):
    # raw_table has a `document` column whose buffers total more than 2 GB
    schema = raw_table.schema
    df = pl.DataFrame(raw_table)
```

In the code above, the table `raw_table` has a `document` column whose data is over 2 GB; I measured it with `sum(buf.size if buf is not None else 0 for buf in arrow_array.buffers())`. Yet when I check the schema, that column is reported as String type.

I then converted the table to Polars and back to Arrow, which automatically turns every string column into a large string because of Polars' default setting. When I try to cast the `document` column back to the plain String type, I get a casting error, and it always stops at the same fixed number of rows. I added up the amount of data processed to that point and it is roughly 2 GB.

Considering both observations, I'm fairly sure the column really is over 2 GB and there was no calculation error in the first place. So I'm very curious why it shows up as String type at all. Is there something subtle I'm not aware of?

### Component(s)

Python
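For reference, here is the buffer-size check written out per chunk and for the whole column. This is only an illustrative sketch of how I'm measuring the sizes; `column_size_report` and the `"document"` default are names made up for this post. A Table column is a ChunkedArray, so the sketch distinguishes the size of each chunk from the total across chunks (the 2 GiB cap that comes with the 32-bit offsets of the `string` type applies to each chunk individually, not to the column as a whole).

```
import pyarrow as pa

def column_size_report(table: pa.Table, name: str = "document") -> int:
    """Sum the Arrow buffer sizes of one column, per chunk and in total."""
    col = table.column(name)  # a Table column is a pa.ChunkedArray
    print("type:", col.type, "| chunks:", col.num_chunks)
    total = 0
    for i, chunk in enumerate(col.chunks):
        # Same check as in the issue body, applied to a single chunk (pa.Array)
        chunk_bytes = sum(buf.size for buf in chunk.buffers() if buf is not None)
        total += chunk_bytes
        print(f"  chunk {i}: {len(chunk)} rows, {chunk_bytes / 2**30:.3f} GiB")
    # The per-chunk 2 GiB limit of 32-bit string offsets does not bound this
    # total, so it can exceed 2 GiB while the column type is still `string`.
    print(f"  total: {total / 2**30:.3f} GiB across {col.num_chunks} chunks")
    return total
```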