el-hult opened a new issue, #45087:
URL: https://github.com/apache/arrow/issues/45087

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The R arrow library cannot load a file with schema
   ```
   schema: codes: large_list<element: dictionary<values=string, indices=int32, ordered=0>>
     child 0, element: dictionary<values=string, indices=int32, ordered=0>
   ```
   if the table is chunked (the script below forces this by writing one row per row group). To reproduce, run the Python script below in an environment that also has R with the arrow package installed.
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq
   import subprocess
   
   def test_load_parquet(table,label):
       pq.write_table(table, "t.parquet", row_group_size=1)
       res = subprocess.run(
           ["Rscript", "-e", 'library(arrow);t=arrow::read_parquet("t.parquet");'],
           capture_output=True,
       )
       print(f'{label}\n#####')
       if res.returncode != 0:
           stdErr = res.stderr.decode()
           assert "NotImplemented: Nested data conversions not implemented for chunked array outputs" in stdErr
           print('R  failed')
       else:
           print('R      ok')
   
       pq.read_table("t.parquet") # no error!
       print("python ok")
       print("schema:",pq.read_schema("t.parquet"))
   
   codes = [["a"],["a"]]
   t1 = pa.table({"codes": codes})
   t2 = pa.table({"codes": codes}).cast(
       pa.schema({"codes": pa.large_list(pa.dictionary(pa.int32(), pa.string()))})
   )
   t3 = pa.table({"codes": codes}).cast(
       pa.schema({"codes": pa.list_(pa.dictionary(pa.int32(), pa.string()))})
   )
   test_load_parquet(t1,'t1')
   test_load_parquet(t2,'t2')
   test_load_parquet(t3,'t3')
   
   ```
   
   
   This produces the following output:
   ```
   t1
   #####
   R      ok
   python ok
   schema: codes: list<element: string>
     child 0, element: string
   t2
   #####
   R  failed
   python ok
   schema: codes: large_list<element: dictionary<values=string, indices=int32, ordered=0>>
     child 0, element: dictionary<values=string, indices=int32, ordered=0>
   t3
   #####
   R  failed
   python ok
   schema: codes: list<element: dictionary<values=string, indices=int32, ordered=0>>
     child 0, element: dictionary<values=string, indices=int32, ordered=0>
   ```
   
   I have verified this is an issue in R library versions 13.0.0.0 and 18.1.0. Both `list_` and `large_list` fail.
   
   The error reported by the R library is discussed in #32723, but since this works in pyarrow, I guess this is a separate issue from that C++ issue.
   
   ### Component(s)
   
   Parquet, R

