asfimport opened a new issue, #352:
URL: https://github.com/apache/arrow-java/issues/352

   I am facing an endianness issue on s390x (big endian) when converting data
read through Flight to a pandas DataFrame.
   
   (1) table.validate() fails with error
   ```text
   
   Traceback (most recent call last):
     File "/tmp/2.py", line 51, in <module>
       table.validate()
     File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
     File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in binary array
   ```
   
   (2) table.to_pandas() gives a segmentation fault
   ____________
   Here is the sample code I am using:
   ```python
   
   from pyarrow import flight
   import os
   import json
   
   flight_endpoint = os.environ.get("flight_server_url", "grpc+tls://...local:443")
   print(flight_endpoint)
   
   #
   class TokenClientAuthHandler(flight.ClientAuthHandler):
       """An example implementation of authentication via handshake.
          With the default constructor, the user token is read from the
          environment: TokenClientAuthHandler().
          You can also pass a user token as a parameter to the constructor:
          TokenClientAuthHandler(yourtoken).
       """
       def __init__(self, token: str = None):
           super().__init__()
           if token is not None:
               strToken = 'Bearer {}'.format(token)
           else:
               strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
           self.token = strToken.encode('utf-8')
           #print(self.token)
   
       def authenticate(self, outgoing, incoming):
           outgoing.write(self.token)
           self.token = incoming.read()
   
       def get_token(self):
           return self.token
       
   readClient = flight.FlightClient(flight_endpoint)
   readClient.authenticate(TokenClientAuthHandler())
   
   cmd = json.dumps({...})
   
   descriptor = flight.FlightDescriptor.for_command(cmd)
   flightInfo = readClient.get_flight_info(descriptor)
   
   reader = readClient.do_get(flightInfo.endpoints[0].ticket)
   table = reader.read_all()
   
   print(table)
   print(table.num_columns)
   print(table.num_rows)
   table.validate()
   table.to_pandas()
   ```
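
For illustration only, and not a confirmed diagnosis of this failure: a minimal sketch of how 32-bit offsets written in one byte order can turn negative when read in the other, which would be consistent with the "Negative offsets in binary array" message. The `reinterpret_int32` helper and the offset values are hypothetical and are not taken from the failing table.

```python
import struct
import sys


def reinterpret_int32(value: int, written: str = "<", read: str = ">") -> int:
    """Pack `value` as an int32 in one byte order and unpack it in the other."""
    return struct.unpack(read + "i", struct.pack(written + "i", value))[0]


# Hypothetical offsets buffer of a 3-element binary array.
offsets = [0, 5, 128, 300]

# Offsets written little endian, then misread as big endian.
swapped = [reinterpret_int32(o) for o in offsets]

print("native byte order:", sys.byteorder)
print("offsets as written:", offsets)
print("offsets as misread:", swapped)
```

Any offset whose lowest byte has its top bit set (128 here) becomes a negative number once the bytes are swapped, and the remaining offsets become very large and non-monotonic.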
   
   **Environment**: Linux s390x (big endian)
   **Reporter**: [Ravi Gummadi](https://issues.apache.org/jira/browse/ARROW-15645)
   
   <sub>**Note**: *This issue was originally created as 
[ARROW-15645](https://issues.apache.org/jira/browse/ARROW-15645). Please see 
the [migration documentation](https://github.com/apache/arrow/issues/14542) for 
further details.*</sub>

