dataroaring commented on PR #59646:
URL: https://github.com/apache/doris/pull/59646#issuecomment-3774165268

   # Code Review: [fix](filecache) add check and exception handle for empty 
block file
   
   ## Summary
   
   This PR adds defensive handling for cases where a file cache block has zero 
downloaded bytes but attempts to transition to `DOWNLOADED` state. This could 
happen when a download fails silently or is interrupted.
   
   ## Analysis
   
   ### Changes in `file_block.cpp`
   
   **1. `set_downloaded()` - Early exit for empty blocks:**
   ```cpp
   if (_downloaded_size == 0) {
       _download_state = State::EMPTY;
       _downloader_id = 0;
       return Status::InternalError("Try to set empty block {} as downloaded",
                                    _block_range.to_string());
   }
   ```
   
   **2. `finalize()` - Handle zero-byte finalization:**
   ```cpp
   if (_downloaded_size == 0) {
       std::lock_guard block_lock(_mutex);
       _download_state = State::EMPTY;
       _downloader_id = 0;
       _cv.notify_all();  // Wake up waiters
       return Status::InternalError("Try to finalize an empty file block {}",
                                    _block_range.to_string());
   }
   ```
   
   ### Positive Aspects
   
   1. **Proper State Transition**: Sets state to `EMPTY` and clears 
`_downloader_id`, allowing the block to be re-acquired
   2. **Notifies Waiters**: Calls `_cv.notify_all()` so other threads waiting 
for this block don't deadlock
   3. **Good Test Coverage**: Includes 3 test cases covering:
      - `finalize_empty_block` - finalize without any data
      - `finalize_partial_block` - finalize with partial data (verifies 
shrinking works)
      - `set_downloaded_empty_block_branch` - direct test of the early exit
   
   ### Potential Issue
   
   **In `finalize()`, the condition check order changed:**
   
   Before:
   ```cpp
   if (_downloaded_size != 0 && _downloaded_size != _block_range.size()) {
   ```
   
   After:
   ```cpp
   if (_downloaded_size == 0) { /* error */ }
   if (_downloaded_size != _block_range.size()) { /* shrink */ }
   ```
   
   This is semantically equivalent but now the shrinking logic will also run 
when `_downloaded_size == _block_range.size()` (though it's a no-op in that 
case). This is fine but slightly less efficient.
   
   ### Question
   
   The `EMPTY` state is used for both:
   1. Initial state (block not yet downloaded)
   2. Error state (download failed with zero bytes)
   
   Is there value in distinguishing these with a separate `FAILED` state? This 
would allow cleaner error handling and potentially different retry policies.
   
   ## Verdict
   
   **LGTM** - Good defensive fix with proper test coverage. The changes 
correctly handle the edge case of empty block finalization while maintaining 
proper synchronization semantics.
   
   **Approved** for merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to