Re: [PR] [feature](filecache) persist table/partition context for cache meta for external table [doris]

via GitHub Tue, 02 Jun 2026 08:15:51 -0700


github-actions[bot] commented on code in PR #61518:
URL: https://github.com/apache/doris/pull/61518#discussion_r3342311626



##########
fe/be-java-extensions/java-common/src/main/java/org/apache/doris/common/jni/vec/VectorColumn.java:
##########
@@ -1605,6 +1605,7 @@ public byte[] getBytesVarbinary(int rowId) {
     }
 
     public void updateMeta(VectorColumn meta) {
+        meta.appendLong(isConst ? 1L : 0L);

Review Comment:
   This changes the per-column metadata layout that BE reads from 
`VectorTable.getMetaAddress()`, but the BE reader still expects the first slot 
for each column to be the null-map pointer. `JniDataBridge::fill_column()` 
calls `address.next_meta_as_ptr()` as `null_map_ptr` before reading any 
type-specific data, so a non-const column now sends `0` and BE returns 
`Unsupported type ... in java side`; a const column sends address `1`. Please 
update the BE metadata reader/writer contract in the same change, or do not 
prepend this field on the Java-to-BE path.



##########
fe/fe-core/src/main/java/org/apache/doris/catalog/SchemaTable.java:
##########
@@ -738,6 +738,8 @@ public class SchemaTable extends Table {
                                     .column("TABLET_ID", 
ScalarType.createType(PrimitiveType.BIGINT))
                                     .column("SIZE", 
ScalarType.createType(PrimitiveType.BIGINT))
                                     .column("TYPE", 
ScalarType.createStringType())
+                                    .column("TABLE_NAME", 
ScalarType.createStringType())

Review Comment:
   Adding these columns in the middle of `information_schema.file_cache_info` 
shifts the ordinal positions of the existing `REMOTE_PATH`, `CACHE_PATH`, and 
`BE_ID` columns. `SELECT *` consumers and clients reading by column index will 
now see different data in previously stable positions, and mixed FE/BE 
expectations are harder during upgrades. Please append the new columns after 
the existing columns instead.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/FileScanNode.java:
##########
@@ -93,12 +94,93 @@ protected void toThrift(TPlanNode planNode) {
         TFileScanNode fileScanNode = new TFileScanNode();
         fileScanNode.setTupleId(desc.getId().asInt());
         if (desc.getTable() != null) {
-            fileScanNode.setTableName(desc.getTable().getName());
+            
fileScanNode.setTableName(desc.getTable().getNameWithFullQualifiers());
         }
         planNode.setFileScanNode(fileScanNode);
         super.toThrift(planNode);
     }
 
+    public static void fillTablePartitionContext(TFileRangeDesc range, TableIf 
table, String partitionName) {
+        if (range == null) {
+            return;
+        }
+        if (partitionName != null && !partitionName.isEmpty()) {
+            range.setPartitionName(partitionName);
+        }
+    }
+
+    public static String buildPartitionName(List<String> partitionKeys, 
List<String> partitionValues) {
+        List<TPartitionKeyValue> partitionKeyValues = 
buildPartitionKeyValues(partitionKeys, partitionValues);
+        if (partitionKeyValues.isEmpty()) {
+            return "";
+        }
+        return buildPartitionName(partitionKeyValues);
+    }
+
+    public static List<TPartitionKeyValue> buildPartitionKeyValues(
+            List<String> partitionKeys, List<String> partitionValues) {
+        if (partitionKeys == null || partitionValues == null) {
+            return Collections.emptyList();
+        }
+        if (partitionKeys.isEmpty() || partitionKeys.size() != 
partitionValues.size()) {
+            return Collections.emptyList();
+        }
+        List<TPartitionKeyValue> partitionKeyValues = 
Lists.newArrayListWithCapacity(partitionKeys.size());
+        for (int i = 0; i < partitionKeys.size(); i++) {

Review Comment:
   The list-based partition-value path does not handle null values, unlike the 
map-based overload below. `PluginDrivenSplit.getPartitionValuesInKeyOrder()` 
can add `partValues.get(partitionKey)` directly, including null, and this 
constructor sets `TPartitionKeyValue.value` (a required thrift string) without 
setting `is_null`. That can either serialize a null required field or make 
partition columns/cache partition names treat SQL NULL as an ordinary 
empty/missing value. Please normalize nulls here the same way as the map 
overload (`value = ""`, `is_null = true`).



##########
fe/fe-filesystem/fe-filesystem-cos/src/main/java/org/apache/doris/filesystem/cos/CosObjStorage.java:
##########
@@ -111,6 +111,18 @@ static Map<String, String> toS3Props(Map<String, String> 
cosProps) {
         return s3Props;
     }
 
+    private static Map<String, String> toValidatedS3Props(Map<String, String> 
cosProps) {
+        Map<String, String> s3Props = toS3Props(cosProps);

Review Comment:
   This validation runs before `S3FileSystemProperties.of()` normalizes aliases 
and derives a region from the endpoint. As a result, COS configurations using 
S3-compatible aliases accepted by the parent storage path, such as `s3.region` 
or `region`, fail here because `toS3Props()` only maps `COS_REGION` to 
`AWS_REGION`. Please validate through the normalized `S3FileSystemProperties` 
result or include the same aliases/derivation rules, otherwise previously valid 
COS/S3-compatible properties are rejected at construction time.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -362,21 +365,29 @@ private void setIcebergParams(TFileRangeDesc rangeDesc, 
IcebergSplit icebergSpli
         tableFormatFileDesc.setIcebergParams(fileDesc);
         Map<String, String> partitionValues = 
icebergSplit.getIcebergPartitionValues();
         if (partitionValues != null) {
-            List<String> fromPathKeys = new ArrayList<>();
-            List<String> fromPathValues = new ArrayList<>();
-            List<Boolean> fromPathIsNull = new ArrayList<>();
-            for (Map.Entry<String, String> entry : partitionValues.entrySet()) 
{
-                fromPathKeys.add(entry.getKey());
-                fromPathValues.add(entry.getValue() != null ? entry.getValue() 
: "");
-                fromPathIsNull.add(entry.getValue() == null);
-            }
-            rangeDesc.setColumnsFromPathKeys(fromPathKeys);

Review Comment:
   This new cache partition-context fill only runs when 
`icebergSplit.getIcebergPartitionValues()` was populated, but 
`createIcebergSplit()` only calls `setIcebergPartitionValues()` inside 
`sessionVariable.isEnableRuntimeFilterPartitionPrune()`. If that session 
variable is false, partitioned Iceberg scans still read the same files but 
never send `partition_values`, and `FileQueryScanNode` cannot recover them from 
the path because Iceberg path partition keys are empty. File-cache 
observability should not depend on enabling runtime-filter partition pruning; 
please populate identity partition values for the cache context independently 
of that optimization flag.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [feature](filecache) persist table/partition context for cache meta for external table [doris]

Reply via email to