freemandealer opened a new pull request, #12866:
URL: https://github.com/apache/doris/pull/12866

   Implement segmentwise compaction during rowset write to reduce the number of 
segments produced by load jobs. Without it, a load that produces too many 
segments can fail with OLAP_ERR_TOO_MANY_SEGMENTS (-238).
   
   Signed-off-by: freemandealer <freeman.zhang1...@gmail.com>
   
   # Proposed changes
   
   Issue Number: close #12609
   
   ## Problem summary
   
   ## Intro
   
   The default limit is 200 segments per rowset. Exceeding it may fail the 
whole load process (OLAP_ERR_TOO_MANY_SEGMENTS -238). If we increase the limit, 
the load will succeed but the pressure is transferred to the subsequent 
rowsetwise compaction. Things get worse when the user issues a query (e.g. an 
insert into select stmt) right after the load job but before rowsetwise 
compaction: the query will suffer severe performance degradation or may even 
end up with OOM.
   
   So we are introducing segmentwise compaction, which compacts data DURING 
the write process instead of leaving all the work to rowsetwise compaction 
after the txn has been committed.
   
   
   
   ## Design
   
   ### Trigger
   
   Whenever a rowset writer has produced more than N (e.g. 10) segments, we 
trigger segment compaction. Note that only one segment compaction job runs for 
a single rowset at a time, to avoid recursive or queued compaction jobs.
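   
   The trigger rule can be sketched as follows. This is a minimal Python model, 
not the Doris BE code: the class and method names are hypothetical, and firing 
once N segments have accumulated (rather than strictly more than N) is an 
assumption.

```python
class RowsetWriterSketch:
    """Hypothetical model of the trigger rule; all names are illustrative."""

    def __init__(self, trigger_threshold=10):
        self.trigger_threshold = trigger_threshold  # N in the text
        self.uncompacted = 0             # segments produced since the last trigger
        self.compaction_running = False  # at most one job per rowset

    def on_segment_flushed(self):
        """Call after each finished segment; returns True if a job is started."""
        self.uncompacted += 1
        if self.uncompacted < self.trigger_threshold:
            return False
        if self.compaction_running:
            # A job is already in flight: skip instead of queuing another one.
            return False
        self.compaction_running = True
        self.uncompacted = 0
        return True

    def on_compaction_done(self):
        """Call when the in-flight segment compaction job finishes."""
        self.compaction_running = False
```

   In this sketch a trigger that arrives while a job is running is dropped 
rather than queued; the next flushed segment after the job finishes starts the 
next job.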
   
   ### Target Selection
   
   We collect candidate segments at every trigger. We skip big segments whose 
row count is > M (e.g. 10000) because compacting them yields little benefit 
relative to the effort. Hence, we only pick the "Longest Consecutive Small" 
segment group for the actual compaction.
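   
   A minimal sketch of this selection, assuming each segment is represented 
only by its row count. The function name and the tie-breaking rule (the 
earliest run wins) are assumptions for illustration.

```python
def pick_longest_consecutive_small(row_counts, max_rows=10000):
    """Return (first, last) inclusive indices of the longest run of
    consecutive segments with row count <= max_rows, or None if none exist."""
    best = None
    run_start = None
    for i, rows in enumerate(row_counts):
        if rows <= max_rows:
            if run_start is None:
                run_start = i  # a new run of small segments begins here
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
        else:
            run_start = None   # a big segment ends the current run
    return best


# Segments 3..5 form the longest run of small segments.
group = pick_longest_consecutive_small([100, 200, 50000, 10, 20, 30, 99999])
```

   A caller would probably ignore a group that contains a single segment, 
since compacting one segment into itself gains nothing.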
   
   ### Compaction Process
   
   A new thread pool is introduced to do the job. We submit the 
above-mentioned "Longest Consecutive Small" segment group to the pool. Then the 
worker thread does the following:
   
   - build a MergeIterator from the target segments
   - create a new segment writer
   - for each block read from the MergeIterator, the writer appends it to the 
new segment
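   
   The worker steps above can be sketched in Python, with sorted lists standing 
in for segments, `heapq.merge` standing in for the MergeIterator, and a plain 
list standing in for the segment writer. `compact_segments` and `block_rows` 
are hypothetical names; the real implementation is C++ inside the BE.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def compact_segments(segments, block_rows=2):
    """Merge already-sorted segments into one sorted output, block by block."""
    merged = heapq.merge(*segments)  # stand-in for the MergeIterator
    output = []                      # stand-in for the new segment writer
    while True:
        block = list(islice(merged, block_rows))  # read the next block
        if not block:
            break
        output.extend(block)         # the writer appends the block
    return output


# Submit the selected segment group to a pool, as the design describes.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(compact_segments, [[1, 4, 7], [2, 5], [3, 6, 8]])
    compacted = future.result()
```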
   
   ### SegID handling
   
   SegID must remain consecutive after segment compaction. 
   
   If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big 
segment seg_4:
   
   - we create a segment named "seg_0-3" to save compacted data for seg_0, 
seg_1, seg_2 and seg_3
   - delete seg_0, seg_1, seg_2 and seg_3
   - rename seg_0-3 to seg_0
   - rename seg_4 to seg_1
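   
   The renaming scheme generalizes to any compacted range. The sketch below 
only computes the list of file operations; `plan_segid_renames` is a 
hypothetical helper, and the file names follow the example above rather than 
the actual on-disk naming.

```python
def plan_segid_renames(num_segments, begin, end):
    """List the operations needed after compacting seg_begin..seg_end
    (inclusive) so that segment IDs stay consecutive."""
    tmp = f"seg_{begin}-{end}"
    ops = [("create", tmp)]
    for i in range(begin, end + 1):
        ops.append(("delete", f"seg_{i}"))
    ops.append(("rename", tmp, f"seg_{begin}"))
    removed = end - begin  # number of IDs freed by the compaction
    for i in range(end + 1, num_segments):
        # Shift every later segment down to close the gap.
        ops.append(("rename", f"seg_{i}", f"seg_{i - removed}"))
    return ops


# The example from the text: seg_0..seg_3 are small, seg_4 is big.
ops = plan_segid_renames(5, 0, 3)
```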
   
   It is worth noting that we must wait for in-flight segment compaction tasks 
to finish before building rowset meta and committing this txn.
   
   
   
   ## Test results
   
   ### The amount of data Doris can load 
   
   First, we test the amount of data that can be successfully loaded into 
Doris with segment compaction disabled and enabled. Tests are based on TPCH. 
The table is created with 1 bucket and loaded without parallelism. We trigger 
segment compaction every 10 segments produced by the rowset writer.
   
   | cases                 | data amount                |
   | --------------------- | -------------------------- |
   | Disable SegCompaction | 1.12 million rows, 18.67GB |
   | Enable SegCompaction  | 11 million rows, 183GB     |
   
   The result shows that the amount of data we can load into Doris increases 
about 10 times after enabling segment compaction. The ratio corresponds to the 
triggering segment number.
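   
   A back-of-the-envelope check of that ratio, assuming the ideal case in which 
every group of N freshly written segments is compacted into exactly one 
segment and the default limit of 200 segments per rowset applies. The function 
name is hypothetical.

```python
def max_original_segments(segment_limit=200, trigger_every=10):
    """Upper bound on segments a load can write before hitting the limit,
    if every `trigger_every` segments are compacted into one."""
    return segment_limit * trigger_every


ideal_ratio = max_original_segments() / 200  # capacity gain in the ideal case
measured_ratio = 11 / 1.12                   # rows loaded, from the table above
```

   The ideal gain equals the triggering segment number (10), and the measured 
gain of roughly 9.8 is close to it.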
   
   ### Impact on latency
   
   When segment compaction is disabled, a load job finishes in 1260s during 
the test, and the subsequent rowsetwise compaction costs 151s.
   
   The test results with segment compaction enabled, for different triggering 
segment numbers:
   
   | triggering segment number    | Load Latency | RowsetCompaction Latency |
   | ---------------------------- | ------------ | ------------------------ |
   | 5 (trigger every 5 segments) | 1089s (-13%) | 242s (+60%)              |
   | 10                           | 1053s (-16%) | 166s (+9%)               |
   | 20                           | 960s (-23%)  | 172s (+13%)              |
   | 40                           | 1320s (+4%)  | 169s (+11%)              |
   
   We loaded without segment compaction several times, and the latency of each 
run varied within (-25%, +25%). So we believe that segment compaction has 
little impact on the load latency.
   
   In addition to the above costs, we wait for in-flight segment compaction 
tasks to finish before building rowset meta and publishing the data. The 
length of the wait depends on when the build takes place, but it has a 
theoretical range that is related to the time each segment compaction task 
costs:
   
   | triggering segment number | Single SegCompaction Task Latency |
   | ------------------------- | --------------------------------- |
   | 5                         | 5s                                |
   | 10                        | 9s                                |
   | 20                        | 20s                               |
   | 40                        | 60s                               |
   
   ### Impact on memory usage
   
   Compaction itself will consume memory. The following test results show the 
memory footprint when enabling segment compaction.
   
   When segment compaction is disabled, a load job uses 4.83% of 128GB memory, 
and the subsequent rowsetwise compaction takes 8.21%.
   
   When segment compaction is enabled:
   
   | triggering segment number    | Load Memory Usage        | RowsetCompaction Memory Usage |
   | ---------------------------- | ------------------------ | ----------------------------- |
   | 5 (trigger every 5 segments) | Avg.: 6.62%, Peak: 7.01% | 6.9% (-16%)                   |
   | 10                           | Avg.: 7.63%, Peak: 9.61% | 6.56% (-20%)                  |
   | 20                           | Avg.: 7.6%, Peak: 9.8%   | 6.9% (-16%)                   |
   | 40                           | Avg.: 5.09%, Peak: 9%    | 6.62% (-19%)                  |
   
   Segment compaction uses more memory because we add another segment writer to 
write compacted data and multiple segment readers to read source data.
   
   However, since the data is more ordered and the number of segments is 
reduced after segment compaction, RowsetCompaction uses less memory.
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [X] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [X] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [X] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [X] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [X] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

