Hello, I am looking for advice for (1). There is quite a quality archaeology to go through. I have asked around already (thanks for that!) but nobody seems to ultimately remember / tell what it is for.
There is (2). "force" can be true / false. When true, then it will not check if a snapshot exists. If it exists and "force" is true, this has never worked - that is what CASSANDRA-20490 is for. It would try to hard link twice and it would error out. This "force" logic is called here (3). If ParentRepairSession (prs) is global, "force" is false, but if it is not (not global means that some dc was specified), then it is true. Notice that both branches of that if/else use desc.parentSessionId.toString() as snapshots' name. This "force" logic was introduced in (4) but it seems to me that it never worked. AFAICT this was just broken on delivery. One commit before (4), it looked like (5). The first branch of "if" used parentSessionId as a snapshot name and it checked if it exists or not before taking it, the else branch took sessionId (not parent session id) and took a snapshot of that name without any check if it exists or not. >From javadoc on TableRepairManager, "force" means: (introduced in (4) CASSANDRA-14116) "If force is true, a snapshot should be taken even if one already exists with that name.". Questions: 1) do we want to fix the behaviour of "force"? (patch 20490) if the answer is yes, what is the "semantics" about this. What does "force" mean exactly? It can mean a) we will remove existing snapshot and take a new one b) we will add more SSTables onto existing snapshot, we just hardlink more SSTables into same snapshot (that is what 20490 is doing) 2) if we do not want to fix "force", then it is most probably just redundant and we would remove it but it order to not fail, we would need to go back to naming snapshots not parent id / parent id but parent id / session id for global and non-global repair respectively (basically we would return to pre-14116 behavior). I think the introduction of "force" flag was there because there might be multiple calls to "else" branch in RepairMessageVerbHandler and calling it all over again with parentSessionId would be erroneous when already there, so "force" was there for not caring and just taking a snapshot again. If we go back to sessionId only, then we do not have this problem I guess. Is it important for people to keep the name of a snapshot in both cases (global / not global) equal to parentSessionId? Can we just go back? If we want to keep naming snapshots as done now, we might just do 20490, I was trying to fix older branches as well but it is very difficult, we might just fix the trunk and keep the rest broken, not ideal but ... I am just reinventing a wheel in 4.0+, it is just too cumbersome / complex to copy trunk's behaviour in 20490 to older stuff. Thanks and regards (1) https://issues.apache.org/jira/browse/CASSANDRA-20490 (2) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/repair/CassandraTableRepairManager.java#L84 (3) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L185-L192 (4) https://issues.apache.org/jira/browse/CASSANDRA-14116 (5) https://github.com/apache/cassandra/blob/478c1a9fdf027af0f373f9e26521803ae8775fdb/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L106-L121