danepitkin opened a new issue, #37943:
URL: https://github.com/apache/arrow/issues/37943

   ### Describe the enhancement requested
   
   Arrow and Parquet do not have exhaustive integration testing for all 
possible Parquet data types. 
   
   For example, it would be useful to have a single, simple sample Parquet 
file with only 1 or 2 rows of data that covers as much of the type feature 
space as possible. Such a file would also be useful for testing backwards 
compatibility across versions, e.g. to help catch issues like these [1].
   
   The Arrow testing data currently lives in a separate repo [2].
   
   We should:
   * Put together a directory/list/repo of Parquet file(s) that cover the 
cross section of features/types/encodings well enough to serve as a good test suite
   * Create the infrastructure for actually testing against them, e.g. Parquet 
reader tests
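
As a rough illustration of the first bullet, one way to track which part of the feature space a set of sample files covers is a small manifest plus a coverage check. This is only a sketch: the file names, the `MANIFEST` and `REQUIRED` contents, and the `uncovered` helper are hypothetical and not part of Arrow or arrow-testing. Only the physical type and encoding names are taken from the Parquet format.

```python
# Physical types as named in the Parquet format specification.
PHYSICAL_TYPES = {
    "BOOLEAN", "INT32", "INT64", "INT96",
    "FLOAT", "DOUBLE", "BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY",
}

# Hypothetical target: the (physical type, encoding) pairs the suite should hit.
# Not every pair is valid in Parquet, so the set is listed explicitly.
REQUIRED = {(t, "PLAIN") for t in PHYSICAL_TYPES} | {
    ("INT32", "DELTA_BINARY_PACKED"),
    ("INT64", "DELTA_BINARY_PACKED"),
    ("BYTE_ARRAY", "DELTA_BYTE_ARRAY"),
}

# Hypothetical manifest: sample file name -> pairs that file exercises.
MANIFEST = {
    "all_types_plain.parquet": {(t, "PLAIN") for t in PHYSICAL_TYPES},
    "ints_delta.parquet": {
        ("INT32", "DELTA_BINARY_PACKED"),
        ("INT64", "DELTA_BINARY_PACKED"),
    },
}


def uncovered(manifest, required):
    """Return the required (type, encoding) pairs that no listed file covers."""
    covered = set().union(*manifest.values())
    return sorted(required - covered)


missing = uncovered(MANIFEST, REQUIRED)
for physical_type, encoding in missing:
    print(f"no sample file covers {physical_type} with {encoding}")
```

A reader test (the second bullet) could then iterate over the manifest, read each file, and fail if `uncovered` returns anything.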
   
   [1] https://lists.apache.org/thread/4sw2vfmdx60kl2psolwvch8h2297zdkb
   
   [2] https://github.com/apache/arrow-testing/tree/47f7b56b25683202c1fd957668e13f2abafc0f12
   
   ### Component(s)
   
   Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.