dang-stripe opened a new pull request, #14469:
URL: https://github.com/apache/pinot/pull/14469

We set the `invalidRecords` threshold too low on one of our tables and noticed the same segments were getting compacted over and over since new upserts were coming in. This adds some logs around segment selection and task execution to make it easier to figure out what the right threshold should be.

We've deployed this internally and have these logs on our clusters:

```
[2024-11-15 20:40:00.441000] INFO [UpsertCompactionTaskGenerator] [DefaultQuartzScheduler_Worker-8:164] Segment test_segment1 contains 163 invalid records out of 14549 total records (count threshold: 1, percent threshold: 0.0), adding it to the compaction list
```

```
[2024-11-15 20:40:59.865849] INFO [UpsertCompactionTaskExecutor] [TaskStateModelFactory-task_thread-33:17] Finished task: UpsertCompactionTask with configs: {uploadURL=http://controller1:9000/segments, crc=1736789201, validDocIdsType=SNAPSHOT, authToken=null, downloadURL=s3://deep_store/test_segment1, segmentName=test_segment1, TASK_ID=Task_UpsertCompactionTask_31961e82-0ccb-445f-8bd9-f1d88c339ac0_1731703200437_24, tableName=test_Table1}. Total time: 25077ms. Total docs before compaction: 26624. Total docs after compaction: 25804. Valid doc IDs count: 25804
```

cc @Jackie-Jiang

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: commits-h...@pinot.apache.org