dang-stripe opened a new pull request, #14469:
URL: https://github.com/apache/pinot/pull/14469

   We set the `invalidRecords` threshold too low on one of our tables and 
noticed that the same segments were being compacted repeatedly as new 
upserts came in. This PR adds logging around segment selection and task 
execution to make it easier to determine what the right threshold should be.
   
   We've deployed this internally and have these logs on our clusters:
   
   ```
   [2024-11-15 20:40:00.441000] INFO [UpsertCompactionTaskGenerator] [DefaultQuartzScheduler_Worker-8:164] Segment test_segment1 contains 163 invalid records out of 14549 total records (count threshold: 1, percent threshold: 0.0), adding it to the compaction list
   ```
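As a rough illustration of how this log line can help pick a threshold, here is a minimal Python sketch that extracts the counts from the message and computes the invalid-record percentage. The function names `invalid_percent` and `would_select` are hypothetical, and the rule that a segment must meet both the count and the percent threshold is an assumption made for illustration, not something confirmed by this PR.

```python
import re

# Matches the message format shown in the log above.
# \s+ tolerates the line wrapping introduced by email clients.
SELECTION_RE = re.compile(
    r"Segment\s+(?P<segment>\S+)\s+contains\s+(?P<invalid>\d+)\s+invalid\s+records\s+"
    r"out\s+of\s+(?P<total>\d+)\s+total\s+records"
)


def invalid_percent(message):
    """Return (segment_name, invalid_percent), or None if the message does not match."""
    m = SELECTION_RE.search(message)
    if m is None:
        return None
    invalid, total = int(m["invalid"]), int(m["total"])
    if total == 0:
        return m["segment"], 0.0
    return m["segment"], 100.0 * invalid / total


def would_select(invalid, total, count_threshold, percent_threshold):
    """Assumed selection rule (hypothetical): both thresholds must be met."""
    if total == 0:
        return False
    return invalid >= count_threshold and 100.0 * invalid / total >= percent_threshold


line = (
    "Segment test_segment1 contains 163 invalid records out of 14549 total "
    "records (count threshold: 1, percent threshold: 0.0), adding it to the "
    "compaction list"
)
segment, pct = invalid_percent(line)
print(f"{segment}: {pct:.2f}% invalid")
```

For the segment in the log, 163 of 14549 records works out to roughly 1.12% invalid, so under the assumed rule any percent threshold above that value would have skipped this segment.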
   
   ```
   [2024-11-15 20:40:59.865849] INFO [UpsertCompactionTaskExecutor] [TaskStateModelFactory-task_thread-33:17] Finished task: UpsertCompactionTask with configs: {uploadURL=http://controller1:9000/segments, crc=1736789201, validDocIdsType=SNAPSHOT, authToken=null, downloadURL=s3://deep_store/test_segment1, segmentName=test_segment1, TASK_ID=Task_UpsertCompactionTask_31961e82-0ccb-445f-8bd9-f1d88c339ac0_1731703200437_24, tableName=test_Table1}. Total time: 25077ms. Total docs before compaction: 26624. Total docs after compaction: 25804. Valid doc IDs count: 25804
   ```
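The executor log can be summarized in a similar way. The sketch below is a hypothetical helper (`compaction_stats` is not part of Pinot) that parses the trailing statistics of the message and reports how many documents a compaction run removed and what fraction of the segment that was, assuming the message format stays as shown above.

```python
import re

# Matches the trailing statistics of the executor message shown above.
# \s+ tolerates the line wrapping introduced by email clients.
COMPACTION_RE = re.compile(
    r"Total\s+time:\s+(?P<ms>\d+)ms\.\s+"
    r"Total\s+docs\s+before\s+compaction:\s+(?P<before>\d+)\.\s+"
    r"Total\s+docs\s+after\s+compaction:\s+(?P<after>\d+)"
)


def compaction_stats(message):
    """Return time taken, docs removed, and percent removed, or None if no match."""
    m = COMPACTION_RE.search(message)
    if m is None:
        return None
    before, after = int(m["before"]), int(m["after"])
    removed = before - after
    return {
        "time_ms": int(m["ms"]),
        "removed": removed,
        "removed_percent": 100.0 * removed / before if before else 0.0,
    }


message = (
    "Total time: 25077ms. Total docs before compaction: 26624. "
    "Total docs after compaction: 25804. Valid doc IDs count: 25804"
)
stats = compaction_stats(message)
print(stats)
```

For the run in the log, this gives 820 documents removed (about 3.08% of the segment) in 25077 ms, which is the kind of cost-versus-benefit figure that can inform a higher threshold.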
   
   cc @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

