dataroaring commented on PR #53540:
URL: https://github.com/apache/doris/pull/53540#issuecomment-3284902437

   **⚠️ MAJOR: Debug Point Production Risk**
   
   **File**: `be/src/cloud/cloud_internal_service.cpp` lines 260-270
   
   ```cpp
   DBUG_EXECUTE_IF("CloudInternalServiceImpl::warm_up_rowset.download_segment", 
{
       auto sleep_time = dp->param<int>("sleep", 3);
       std::this_thread::sleep_for(std::chrono::seconds(sleep_time));  // ❌ 
PRODUCTION RISK
   });
   ```
   
   **Issue**: This debug point can cause **indefinite thread blocking** in 
production if accidentally enabled.
   
   **Production Risks**:
   - **Thread Pool Exhaustion**: Long sleeps can exhaust the thread pool
   - **Query Timeouts**: Clients will experience unexpected query failures
   - **Cascading Failures**: Blocked threads can cause resource starvation
   - **Accidental Activation**: Debug points can be accidentally enabled in 
production
   - **DoS Vector**: If exposed, could be used to DoS the system
   
   **Specific Concerns**:
   1. **No Maximum Sleep Limit**: `sleep_time` could be set to hours/days
   2. **No Timeout Protection**: Sleep runs unconditionally once activated
   3. **Resource Lock**: Holds thread resources during sleep
   4. **Hard to Debug**: Production issues would be very hard to trace back to 
debug points
   
   **Mitigation Options**:
   
   **Option 1** - Add safety limits:
   ```cpp
   DBUG_EXECUTE_IF("CloudInternalServiceImpl::warm_up_rowset.download_segment", 
{
       auto sleep_time = dp->param<int>("sleep", 3);
       // ✅ Add safety limits
       sleep_time = std::min(sleep_time, 30); // Max 30 seconds
       if (sleep_time > 0) {
           std::this_thread::sleep_for(std::chrono::seconds(sleep_time));
       }
   });
   ```
   
   **Option 2** - Remove from production builds:
   ```cpp
   #ifndef NDEBUG  // Only in debug builds
   DBUG_EXECUTE_IF("CloudInternalServiceImpl::warm_up_rowset.download_segment", 
{
       auto sleep_time = dp->param<int>("sleep", 3);
       sleep_time = std::min(sleep_time, 10); // Even in debug, limit to 10s
       std::this_thread::sleep_for(std::chrono::seconds(sleep_time));
   });
   #endif
   ```
   
   **Option 3** - Use non-blocking alternatives:
   ```cpp
   // Instead of sleeping, just add delay to metrics or logging
   DBUG_EXECUTE_IF("CloudInternalServiceImpl::warm_up_rowset.download_segment", 
{
       LOG(INFO) << "Debug: Simulated delay in rowset warmup";
       // Don't actually block the thread
   });
   ```
   
   **Recommendation**: Either add strict time limits (≤ 30 seconds) or remove 
from production builds entirely. The current implementation poses too much risk 
for production environments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to