Re: [I] Add test coverage for grouping on dictionary encoded columns [datafusion]

via GitHub Tue, 07 Apr 2026 08:19:07 -0700


Rich-T-kid commented on issue #8791:
URL: https://github.com/apache/datafusion/issues/8791#issuecomment-4200102916


   @alamb I’m a bit confused by what you meant by:
   
   "However, those are not reading from Parquet files (as I could not figure out how to make a parquet file have the same schema I wanted)"
   
   In `dictionary.slt` I see this statement:
   ```
   statement ok
   COPY (SELECT arrow_cast(column1, 'Dictionary(Int32, Utf8)') AS column1, column2 FROM test0)
   TO 'test_files/scratch/dictionary/part_dict_test'
   STORED AS PARQUET PARTITIONED BY (column1);
   ```
   which writes out the table as a partitioned Parquet dataset.
   
   I’ve also included my own test in a [draft PR](https://github.com/apache/datafusion/pull/21444) that creates a dictionary-encoded column, writes it out to a Parquet file, reads it back in, checks that the schema is still `Dictionary(...)`, and runs a basic filter.
   Is this what you meant by “reading from a Parquet file”? Or were you referring to a more specific Parquet scenario (e.g., a particular schema, layout, or encoding that you weren’t able to reproduce) for the coverage you’re looking for?
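   
   For reference, the round trip described above could be sketched in sqllogictest form roughly as follows. This is only an illustration under assumptions, not the test from the draft PR: the table names `dict_src` and `dict_roundtrip` and the scratch file name `dict_roundtrip.parquet` are made up for the example, and every step uses `statement ok` because the expected results have not been checked against an actual run.
   ```sql
   # Hypothetical source table; VALUES columns are named column1, column2.
   statement ok
   CREATE TABLE dict_src AS VALUES ('a', 1), ('b', 2), ('a', 3);

   # Cast column1 to a dictionary type and write a single Parquet file
   # (assumed scratch path, chosen only for this sketch).
   statement ok
   COPY (SELECT arrow_cast(column1, 'Dictionary(Int32, Utf8)') AS column1, column2 FROM dict_src)
   TO 'test_files/scratch/dictionary/dict_roundtrip.parquet'
   STORED AS PARQUET;

   # Read the file back through an external table.
   statement ok
   CREATE EXTERNAL TABLE dict_roundtrip
   STORED AS PARQUET
   LOCATION 'test_files/scratch/dictionary/dict_roundtrip.parquet';

   # Schema check: in a real test this would be a `query T` block whose
   # expected type is filled in from an actual run.
   statement ok
   SELECT DISTINCT arrow_typeof(column1) FROM dict_roundtrip;

   # Basic filter on the column that was written as a dictionary.
   statement ok
   SELECT column1, column2 FROM dict_roundtrip WHERE column1 = 'a';

   # Grouping on the same column, which is the subject of this issue.
   statement ok
   SELECT column1, count(*) FROM dict_roundtrip GROUP BY column1;
   ```
   Whether the column still reports a dictionary type after the round trip is exactly what the schema check is meant to reveal, so the sketch does not assert it.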


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
