morningman commented on PR #59984:
URL: https://github.com/apache/doris/pull/59984#issuecomment-3980358713

   ## Improvement Suggestions
   
   ### High Priority
   
   1. **Raw pointer safety for `_equality_delete_blocks`**
   
      
[EqualityDeleteBase](cci:1://file:///Users/morningman/workspace/git/doris/be/src/vec/exec/format/table/equality_delete.h:44:4-45:77)
 holds a raw `Block*` pointer with an implicit constraint that the 
`_equality_delete_blocks` vector must not be modified (and thus reallocated) 
after the 
[EqualityDeleteBase](cci:1://file:///Users/morningman/workspace/git/doris/be/src/vec/exec/format/table/equality_delete.h:44:4-45:77)
 instances are created. While the current code is safe, this implicit 
dependency is fragile for future maintainers.
   
      **Suggestion**: Add a comment documenting this constraint, or consider 
using `std::unique_ptr<Block>` with reference passing to make lifetime 
management explicit.
   
   2. **Code duplication between Parquet and ORC paths**
   
      `IcebergParquetReader::_process_equality_delete()` and 
`IcebergOrcReader::_process_equality_delete()` share approximately 70% similar 
logic, including:
      - Building `data_file_id_to_field` mappings
      - Iterating over delete files, validating field IDs, constructing 
`eq_file_node`
      - Merging blocks and creating `equality_delete_impl`
   
      **Suggestion**: Extract common logic into the base class 
[IcebergTableReader](cci:1://file:///Users/morningman/workspace/git/doris/be/src/vec/exec/format/table/iceberg_reader.h:72:4-75:50),
 keeping only format-specific differences in the subclasses.
   
   3. **Similar duplication in `init_reader()` methods**
   
      Both `IcebergParquetReader::init_reader()` and 
`IcebergOrcReader::init_reader()` contain highly duplicated branching logic for 
`exist_field_id` / `!exist_field_id`. Consider applying the Template Method 
pattern to reduce redundancy.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to