dataroaring commented on PR #59646:
URL: https://github.com/apache/doris/pull/59646#issuecomment-3774165268
# Code Review: [fix](filecache) add check and exception handle for empty
block file
## Summary
This PR adds defensive handling for cases where a file cache block has zero
downloaded bytes but attempts to transition to `DOWNLOADED` state. This could
happen when a download fails silently or is interrupted.
## Analysis
### Changes in `file_block.cpp`
**1. `set_downloaded()` - Early exit for empty blocks:**
```cpp
if (_downloaded_size == 0) {
_download_state = State::EMPTY;
_downloader_id = 0;
return Status::InternalError("Try to set empty block {} as downloaded",
_block_range.to_string());
}
```
**2. `finalize()` - Handle zero-byte finalization:**
```cpp
if (_downloaded_size == 0) {
std::lock_guard block_lock(_mutex);
_download_state = State::EMPTY;
_downloader_id = 0;
_cv.notify_all(); // Wake up waiters
return Status::InternalError("Try to finalize an empty file block {}",
_block_range.to_string());
}
```
### Positive Aspects
1. **Proper State Transition**: Sets state to `EMPTY` and clears
`_downloader_id`, allowing the block to be re-acquired
2. **Notifies Waiters**: Calls `_cv.notify_all()` so other threads waiting
for this block don't deadlock
3. **Good Test Coverage**: Includes 3 test cases covering:
- `finalize_empty_block` - finalize without any data
- `finalize_partial_block` - finalize with partial data (verifies
shrinking works)
- `set_downloaded_empty_block_branch` - direct test of the early exit
### Potential Issue
**In `finalize()`, the condition check order changed:**
Before:
```cpp
if (_downloaded_size != 0 && _downloaded_size != _block_range.size()) {
```
After:
```cpp
if (_downloaded_size == 0) { /* error */ }
if (_downloaded_size != _block_range.size()) { /* shrink */ }
```
This is semantically equivalent but now the shrinking logic will also run
when `_downloaded_size == _block_range.size()` (though it's a no-op in that
case). This is fine but slightly less efficient.
### Question
The `EMPTY` state is used for both:
1. Initial state (block not yet downloaded)
2. Error state (download failed with zero bytes)
Is there value in distinguishing these with a separate `FAILED` state? This
would allow cleaner error handling and potentially different retry policies.
## Verdict
**LGTM** - Good defensive fix with proper test coverage. The changes
correctly handle the edge case of empty block finalization while maintaining
proper synchronization semantics.
**Approved** for merge.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]