Copilot commented on code in PR #60938:
URL: https://github.com/apache/doris/pull/60938#discussion_r2870216576
##########
regression-test/suites/external_table_p0/tvf/test_parquet_meta_tvf.groovy:
##########
@@ -33,37 +33,51 @@ suite("test_parquet_meta_tvf",
"p0,external,external_docker,tvf") {
// parquet_metadata (S3)
// Note: Prefer asserting on stable metadata columns; avoid relying on
host-specific/local-only paths.
order_qt_parquet_metadata_s3 """
- select * from parquet_meta(
+ select
+ row_group_id, row_group_num_rows, row_group_num_columns,
row_group_bytes, column_id,
+ file_offset, num_values, path_in_schema, type, stats_min,
stats_max, stats_null_count,
+ stats_distinct_count, stats_min_value
+ from parquet_meta(
"uri" = "${basePath}/meta.parquet",
"s3.access_key" = "${ak}",
"s3.secret_key" = "${sk}",
"endpoint" = "${endpoint}",
"region" = "${region}",
"mode" = "parquet_metadata"
- );
+ )
+ order by row_group_id, column_id;
"""
// default mode: parquet_metadata
order_qt_parquet_metadata_default_mode """
- select * from parquet_meta(
+ select
+ row_group_id, row_group_num_rows, row_group_num_columns,
row_group_bytes, column_id,
+ file_offset, num_values, path_in_schema, type, stats_min,
stats_max, stats_null_count,
+ stats_distinct_count, stats_min_value
Review Comment:
The stable column projection list is duplicated across the S3 and
default-mode parquet_metadata queries. To reduce maintenance overhead (and
avoid future drift), consider extracting this shared column list into a single
Groovy variable/string and interpolating it into both query strings.
##########
regression-test/suites/external_table_p0/tvf/test_parquet_meta_tvf.groovy:
##########
@@ -33,37 +33,51 @@ suite("test_parquet_meta_tvf",
"p0,external,external_docker,tvf") {
// parquet_metadata (S3)
// Note: Prefer asserting on stable metadata columns; avoid relying on
host-specific/local-only paths.
order_qt_parquet_metadata_s3 """
- select * from parquet_meta(
+ select
+ row_group_id, row_group_num_rows, row_group_num_columns,
row_group_bytes, column_id,
+ file_offset, num_values, path_in_schema, type, stats_min,
stats_max, stats_null_count,
+ stats_distinct_count, stats_min_value
Review Comment:
The stable column projection list is duplicated across the S3 and
default-mode parquet_metadata queries. To reduce maintenance overhead (and
avoid future drift), consider extracting this shared column list into a single
Groovy variable/string and interpolating it into both query strings.
##########
regression-test/suites/external_table_p0/tvf/test_parquet_meta_tvf.groovy:
##########
@@ -104,7 +119,7 @@ suite("test_parquet_meta_tvf",
"p0,external,external_docker,tvf") {
// file metadata (S3 glob)
order_qt_parquet_file_metadata_s3_glob """
- select file_name from parquet_meta(
+ select count(*) from parquet_meta(
"uri" = "${basePath}/*meta.parquet",
Review Comment:
Switching the glob assertion from `select file_name` to `count(*)` makes the
test significantly weaker (it can pass even if the *wrong* set of files is
returned, as long as the total count matches). A more robust, bucket-stable
assertion is to project a stable derived value from `file_name` (e.g., the
basename via a regex) and order by it, so the test still validates which files
are present without depending on the bucket/URI prefix.
##########
regression-test/suites/external_table_p0/tvf/test_parquet_meta_tvf.groovy:
##########
@@ -33,37 +33,51 @@ suite("test_parquet_meta_tvf",
"p0,external,external_docker,tvf") {
// parquet_metadata (S3)
// Note: Prefer asserting on stable metadata columns; avoid relying on
host-specific/local-only paths.
order_qt_parquet_metadata_s3 """
- select * from parquet_meta(
+ select
+ row_group_id, row_group_num_rows, row_group_num_columns,
row_group_bytes, column_id,
+ file_offset, num_values, path_in_schema, type, stats_min,
stats_max, stats_null_count,
+ stats_distinct_count, stats_min_value
Review Comment:
The projection includes `stats_min_value` but not the corresponding
`stats_max_value`. If the intent is to keep the assertion symmetric and
interpretable (especially for non-numeric types), consider selecting both
`stats_min_value` and `stats_max_value` (or neither) so the test coverage of
stats doesn’t become lopsided.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]