[ 
https://issues.apache.org/jira/browse/OAK-12134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064540#comment-18064540
 ] 

Rishabh Daim edited comment on OAK-12134 at 3/10/26 4:09 PM:
-------------------------------------------------------------

The workaround mentioned in OAK-11895 (-Doak.compaction.legacy=true) works as expected.


||Metric||1.88 without legacy (GC#3)||1.78 GC#2||1.88 with legacy=true GC#2||
|Compaction time|35.4s (6 cycles + force)| | |
|Data written by compaction|~800 MB|~35 MB|~35 MB|
|Post-compaction size|1.3 GB|541 MB|555 MB|
|Force compact needed|Yes|No|No|

1.88 with -Doak.compaction.legacy=true now matches 1.78 in both compaction 
speed and output size (~35 MB written; 555 MB vs. 541 MB after compaction). 
The regression from OAK-11895 is completely bypassed.
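
As a quick sanity check on the table above, the following sketch recomputes the ratios between the three runs. The numbers are copied from the table; treating 1.3 GB as 1300 MB is an assumption, and the variable names are illustrative only. Note that the default 1.88 run is GC#3 while the other two are GC#2, so the comparison is indicative rather than exact.

```python
# Figures copied from the comparison table, in MB.
# Assumption: "1.3 GB" is taken as 1300 MB; all "~" values are approximate.
runs = {
    "1.88 default": {"written_mb": 800, "size_mb": 1300},
    "1.78": {"written_mb": 35, "size_mb": 541},
    "1.88 legacy=true": {"written_mb": 35, "size_mb": 555},
}


def ratio(a, b, key):
    """Return runs[a][key] / runs[b][key]."""
    return runs[a][key] / runs[b][key]


# How close is 1.88 with the legacy flag to 1.78?
written_vs_178 = ratio("1.88 legacy=true", "1.78", "written_mb")
size_vs_178 = ratio("1.88 legacy=true", "1.78", "size_mb")

# How much more does default 1.88 write than 1.88 with the legacy flag?
default_vs_legacy = ratio("1.88 default", "1.88 legacy=true", "written_mb")

print(f"legacy vs 1.78, data written:         {written_vs_178:.2f}x")
print(f"legacy vs 1.78, post-compaction size: {size_vs_178:.3f}x")
print(f"default vs legacy, data written:      {default_vs_legacy:.1f}x")
```

With these inputs the legacy run writes the same amount as 1.78 and ends up under 3% larger, which is consistent with calling the two runs equivalent.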

cc [~jsedding] 


was (Author: JIRAUSER299730):
The workaround mentioned in OAK-11895 is working fine.


||Metric||1.88 without legacy (GC#3)||1.78 GC#2||1.88 with legacy=true GC#2||
|Compaction time|35.4s (6 cycles + force)| | |
|Data written by compaction|~800 MB|~35 MB|~35 MB|
|Post-compaction size|1.3 GB|541 MB|555 MB|
|Force compact needed|Yes|No|No|


1.88 with -Doak.compaction.legacy=true is now identical to 1.78 in both 
compaction speed and output size. The regression from OAK-11895 is completely 
bypassed.

> tail compaction produces more data now
> --------------------------------------
>
>                 Key: OAK-12134
>                 URL: https://issues.apache.org/jira/browse/OAK-12134
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>    Affects Versions: 1.88.0
>            Reporter: Rishabh Daim
>            Assignee: Julian Sedding
>            Priority: Major
>         Attachments: 1.78.GC#2.log, 1.78.GC#3.log, 1.88.GC#3.log
>
>
> Cleanup behaves identically on both branches (1.78 and 1.88): both show 
> 0 bytes in post-compaction cleanup. That was never the real difference.
> The actual regression is in compaction itself:
> ||Metric||1.78 GC#3||1.88 GC#3||
> |Compaction time |2.4s (3 cycles)|35.4s (6 cycles + force)|
> |Data written by compaction|~65 MB (489→554 MB)|~800 MB (511→1300 MB)|
> |Initial checkpoints|~66|~46|
> |Force compact needed|No|Yes|
> 1.88 writes about 12x more data during compaction (~800 MB vs. ~65 MB) 
> despite having fewer checkpoints (~46 vs. ~66). That is the smoking gun.
> Root cause: OAK-11895
> The CheckpointCompactor change (onto vs after) modified what paths get 
> compacted per checkpoint:
>  - 1.78 (old): collectSuperRootPaths returns "root" and "checkpoints/X/root" 
> — only compacts the repository root and each checkpoint's content root
>  - 1.88 (new): returns "" and "checkpoints/X" — compacts the entire 
> super-root and the full checkpoint subtree (including metadata, not just the 
> root)
> More paths traversed per checkpoint → more nodes copied → more segments 
> written → more data produced.
> This has a cascading effect: more data written means compaction takes longer, 
> which means more concurrent commits happen during compaction, which means 
> more retry cycles, which eventually forces a blocking compaction.
> Summary
> The problem is not "cleanup doesn't reclaim space" — cleanup works 
> identically on both branches. The problem is that 1.88 TAIL compaction 
> produces ~12x more output data than 1.78 due to OAK-11895, causing the
> store to grow significantly after each GC cycle instead of shrinking. This is 
> worth raising as a regression against OAK-11895 in Apache JIRA.
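
The path change described in the issue above can be illustrated with a toy model. This is a minimal sketch, not Oak code: the nested dict stands in for the super-root, the node names `content`, `index`, `properties` and `created` are hypothetical placeholders, and `paths_old` / `paths_new` only mirror the path strings quoted in the description ("root" and "checkpoints/X/root" versus "" and "checkpoints/X"). It shows only that the new paths cover a strict superset of nodes; it does not model segment sizes or reproduce the measured 12x.

```python
def make_super_root(checkpoints):
    """Build a toy super-root; child names other than 'root' are hypothetical."""
    def content_root():
        return {"content": {}, "index": {}}

    return {
        "root": content_root(),
        "checkpoints": {
            cp: {"root": content_root(), "properties": {}, "created": {}}
            for cp in checkpoints
        },
    }


def paths_old(checkpoints):
    # Paths as described for 1.78: repository root plus each checkpoint's root.
    return ["root"] + [f"checkpoints/{cp}/root" for cp in checkpoints]


def paths_new(checkpoints):
    # Paths as described for 1.88: whole super-root plus each full checkpoint.
    return [""] + [f"checkpoints/{cp}" for cp in checkpoints]


def subtree(tree, path):
    """Return the set of node paths at or below `path` ('' is the super-root)."""
    node = tree
    for name in filter(None, path.split("/")):
        node = node[name]
    found = set()
    stack = [(path, node)]
    while stack:
        current, children = stack.pop()
        found.add(current)
        for name, child in children.items():
            stack.append((f"{current}/{name}" if current else name, child))
    return found


def covered(tree, paths):
    """Union of all node paths reachable from the given start paths."""
    result = set()
    for p in paths:
        result |= subtree(tree, p)
    return result


checkpoints = ["cp1", "cp2"]
tree = make_super_root(checkpoints)
old = covered(tree, paths_old(checkpoints))
new = covered(tree, paths_new(checkpoints))

print(f"nodes covered (old paths): {len(old)}")
print(f"nodes covered (new paths): {len(new)}")
print("only covered by new paths:", sorted(new - old))
```

In this toy tree the extra nodes are the super-root itself, the `checkpoints` node, and each checkpoint's own node and metadata children, which matches the description's point that the new paths include more than the content roots.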



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
