DieterDP-ng commented on code in PR #7523:
URL: https://github.com/apache/hbase/pull/7523#discussion_r2599777144


##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupBoundaries.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.backup.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.backup.master.BackupLogCleaner;
+import org.apache.hadoop.hbase.net.Address;
+import org.apache.hadoop.hbase.wal.AbstractFSWALProvider;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class BackupBoundaries {
+  private static final Logger LOG = LoggerFactory.getLogger(BackupBoundaries.class);
+  private static final BackupBoundaries EMPTY_BOUNDARIES =
+    new BackupBoundaries(new HashMap<>(), Long.MAX_VALUE);
+
+  // This map tracks, for every RegionServer, the least recent (= oldest / lowest timestamp)
+  // inclusion in any backup. In other words, it is the timestamp boundary up to which all backup
+  // roots have included the WAL in their backup.
+  private final Map<Address, Long> boundaries;
+  private final long oldestStartCode;

Review Comment:
   This field deserves a description similar to the one above, since it's not just the oldest start code: it's the oldest start code of the most recent completed backup of each backup root.
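   
   For example, a rough sketch of such a comment (wording is only a suggestion):
   
   ```java
   // The oldest start code across all backup roots, considering for each backup root
   // only its most recent completed backup. For servers without an entry in the
   // boundaries map, WAL files with a timestamp at or before this value are deletable.
   private final long oldestStartCode;
   ```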



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupBoundaries.java:
##########


Review Comment:
   I'm not opposed to splitting this logic into a separate class, but then it needs some more documentation to make clear it is meant for the backup log cleaner. Some suggestions:
   
   - class javadoc
   - `isDeletable(Path path)` -> `isDeletable(Path walLogPath)`
   - `BackupBoundariesBuilder#add` -> `BackupBoundariesBuilder#addBackupTimestamps`
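   
   For the class javadoc, a rough sketch of the kind of thing I mean (exact wording up to you):
   
   ```java
   /**
    * Tracks the per-RegionServer timestamp boundaries up to which WAL files have been
    * included in a backup by every backup root. Used by {@link BackupLogCleaner} to
    * decide which WAL files are safe to delete.
    */
   @InterfaceAudience.Private
   public class BackupBoundaries {
   ```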
   
   



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupBoundaries.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.backup.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.backup.master.BackupLogCleaner;
+import org.apache.hadoop.hbase.net.Address;
+import org.apache.hadoop.hbase.wal.AbstractFSWALProvider;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class BackupBoundaries {
+  private static final Logger LOG = LoggerFactory.getLogger(BackupBoundaries.class);
+  private static final BackupBoundaries EMPTY_BOUNDARIES =
+    new BackupBoundaries(new HashMap<>(), Long.MAX_VALUE);

Review Comment:
   You can reuse an existing empty collection object here: `Collections.emptyMap()`.
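   
   I.e., with the matching import:
   
   ```java
   import java.util.Collections;
   
   private static final BackupBoundaries EMPTY_BOUNDARIES =
     new BackupBoundaries(Collections.emptyMap(), Long.MAX_VALUE);
   ```
   
   As a bonus, `Collections.emptyMap()` is immutable, so the shared `EMPTY_BOUNDARIES` instance can't be mutated by accident.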



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupBoundaries.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.backup.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.backup.master.BackupLogCleaner;
+import org.apache.hadoop.hbase.net.Address;
+import org.apache.hadoop.hbase.wal.AbstractFSWALProvider;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class BackupBoundaries {
+  private static final Logger LOG = LoggerFactory.getLogger(BackupBoundaries.class);
+  private static final BackupBoundaries EMPTY_BOUNDARIES =
+    new BackupBoundaries(new HashMap<>(), Long.MAX_VALUE);
+
+  // This map tracks, for every RegionServer, the least recent (= oldest / lowest timestamp)
+  // inclusion in any backup. In other words, it is the timestamp boundary up to which all backup
+  // roots have included the WAL in their backup.
+  private final Map<Address, Long> boundaries;
+  private final long oldestStartCode;
+
+  private BackupBoundaries(Map<Address, Long> boundaries, long oldestStartCode) {
+    this.boundaries = boundaries;
+    this.oldestStartCode = oldestStartCode;
+  }
+
+  public boolean isDeletable(Path path) {
+    try {
+      String hostname = BackupUtils.parseHostNameFromLogFile(path);

Review Comment:
   The original code had a null check here. In this case, the NullPointerException would be caught, but it's probably cleaner to explicitly check for null here already.
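   
   Something along these lines (the log message is just a placeholder):
   
   ```java
   String hostname = BackupUtils.parseHostNameFromLogFile(path);
   if (hostname == null) {
     LOG.warn("Cannot parse hostname from WAL file {}. Ignoring cleanup of this log", path);
     return false;
   }
   Address address = Address.fromString(hostname);
   ```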



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupBoundaries.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.backup.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.backup.master.BackupLogCleaner;
+import org.apache.hadoop.hbase.net.Address;
+import org.apache.hadoop.hbase.wal.AbstractFSWALProvider;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class BackupBoundaries {
+  private static final Logger LOG = LoggerFactory.getLogger(BackupBoundaries.class);
+  private static final BackupBoundaries EMPTY_BOUNDARIES =
+    new BackupBoundaries(new HashMap<>(), Long.MAX_VALUE);
+
+  // This map tracks, for every RegionServer, the least recent (= oldest / lowest timestamp)
+  // inclusion in any backup. In other words, it is the timestamp boundary up to which all backup
+  // roots have included the WAL in their backup.
+  private final Map<Address, Long> boundaries;
+  private final long oldestStartCode;
+
+  private BackupBoundaries(Map<Address, Long> boundaries, long oldestStartCode) {
+    this.boundaries = boundaries;
+    this.oldestStartCode = oldestStartCode;
+  }
+
+  public boolean isDeletable(Path path) {
+    try {
+      String hostname = BackupUtils.parseHostNameFromLogFile(path);
+      Address address = Address.fromString(hostname);
+      long ts = AbstractFSWALProvider.getTimestamp(path.getName());
+
+      if (!boundaries.containsKey(address)) {
+        boolean isDeletable = ts <= oldestStartCode;
+        if (LOG.isDebugEnabled()) {
+          LOG.debug(
+            "Boundary for {} not found. isDeletable = {} based on oldestStartCode = {} and WAL ts of {}",
+            path, isDeletable, oldestStartCode, ts);
+        }
+        return isDeletable;
+      }
+
+      long rollTs = boundaries.get(address);
+      if (ts <= rollTs) {
+        if (LOG.isDebugEnabled()) {
+          LOG.debug(
+            "WAL cleanup time-boundary found for server {}: {}. Ok to delete older file: {}",
+            address.getHostName(), ts, path);
+        }
+        return true;
+      }
+
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("WAL cleanup time-boundary found for server {}: {}. Keeping younger file: {}",
+          address.getHostName(), rollTs, path);
+      }
+
+      return false;
+    } catch (Exception e) {
+      LOG.warn("Error occurred while filtering file: {}. Ignoring cleanup of this log", path, e);
+      return false;
+    }
+  }
+
+  public Map<Address, Long> getBoundaries() {
+    return boundaries;
+  }
+
+  public long getOldestStartCode() {
+    return oldestStartCode;
+  }
+
+  public static BackupBoundariesBuilder builder(Configuration conf) {
+    return new BackupBoundariesBuilder(conf);
+  }
+
+  public static class BackupBoundariesBuilder {
+    private final Map<Address, Long> boundaries = new HashMap<>();
+    private final Configuration conf;
+
+    private long oldestStartCode = 0L;
+
+    private BackupBoundariesBuilder(Configuration conf) {
+      this.conf = conf;
+    }
+
+    public BackupBoundariesBuilder add(String host, long hostLogRollTs, long backupStartCode)
+      throws IOException {
+      Address address = Address.fromString(host);
+      Long storedTs = boundaries.get(address);
+      if (storedTs == null || hostLogRollTs < storedTs) {
+        boundaries.put(address, hostLogRollTs);
+      }
+
+      if (oldestStartCode == 0L || oldestStartCode > backupStartCode) {
+        oldestStartCode = backupStartCode;
+      }
+
+      return this;
+    }
+
+    public BackupBoundaries build() {
+      if (boundaries.isEmpty()) {
+        return EMPTY_BOUNDARIES;
+      }
+
+      long buffer =
+        conf.getLong(BackupLogCleaner.TS_BUFFER_KEY, BackupLogCleaner.TS_BUFFER_DEFAULT);

Review Comment:
   I'd move this out of the builder and simply take it as a constructor argument. That way, you keep the usage and definition of that config key in the BackupLogCleaner.
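   
   Roughly like this (a sketch; names are illustrative):
   
   ```java
   // In BackupLogCleaner, which already defines TS_BUFFER_KEY / TS_BUFFER_DEFAULT:
   long tsBuffer = conf.getLong(TS_BUFFER_KEY, TS_BUFFER_DEFAULT);
   BackupBoundaries.BackupBoundariesBuilder builder = BackupBoundaries.builder(tsBuffer);
   
   // BackupBoundaries then no longer needs a Configuration at all:
   public static BackupBoundariesBuilder builder(long tsBuffer) {
     return new BackupBoundariesBuilder(tsBuffer);
   }
   ```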



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupBoundaries.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.backup.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.backup.master.BackupLogCleaner;
+import org.apache.hadoop.hbase.net.Address;
+import org.apache.hadoop.hbase.wal.AbstractFSWALProvider;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class BackupBoundaries {
+  private static final Logger LOG = LoggerFactory.getLogger(BackupBoundaries.class);
+  private static final BackupBoundaries EMPTY_BOUNDARIES =
+    new BackupBoundaries(new HashMap<>(), Long.MAX_VALUE);
+
+  // This map tracks, for every RegionServer, the least recent (= oldest / lowest timestamp)
+  // inclusion in any backup. In other words, it is the timestamp boundary up to which all backup
+  // roots have included the WAL in their backup.
+  private final Map<Address, Long> boundaries;
+  private final long oldestStartCode;
+
+  private BackupBoundaries(Map<Address, Long> boundaries, long oldestStartCode) {
+    this.boundaries = boundaries;
+    this.oldestStartCode = oldestStartCode;
+  }
+
+  public boolean isDeletable(Path path) {
+    try {
+      String hostname = BackupUtils.parseHostNameFromLogFile(path);
+      Address address = Address.fromString(hostname);
+      long ts = AbstractFSWALProvider.getTimestamp(path.getName());
+
+      if (!boundaries.containsKey(address)) {
+        boolean isDeletable = ts <= oldestStartCode;
+        if (LOG.isDebugEnabled()) {
+          LOG.debug(
+            "Boundary for {} not found. isDeletable = {} based on oldestStartCode = {} and WAL ts of {}",
+            path, isDeletable, oldestStartCode, ts);
+        }
+        return isDeletable;
+      }
+
+      long rollTs = boundaries.get(address);
+      if (ts <= rollTs) {
+        if (LOG.isDebugEnabled()) {
+          LOG.debug(
+            "WAL cleanup time-boundary found for server {}: {}. Ok to delete older file: {}",
+            address.getHostName(), ts, path);
+        }
+        return true;
+      }
+
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("WAL cleanup time-boundary found for server {}: {}. Keeping younger file: {}",
+          address.getHostName(), rollTs, path);
+      }
+
+      return false;
+    } catch (Exception e) {
+      LOG.warn("Error occurred while filtering file: {}. Ignoring cleanup of this log", path, e);
+      return false;
+    }
+  }
+
+  public Map<Address, Long> getBoundaries() {
+    return boundaries;
+  }
+
+  public long getOldestStartCode() {
+    return oldestStartCode;
+  }
+
+  public static BackupBoundariesBuilder builder(Configuration conf) {
+    return new BackupBoundariesBuilder(conf);
+  }
+
+  public static class BackupBoundariesBuilder {
+    private final Map<Address, Long> boundaries = new HashMap<>();
+    private final Configuration conf;
+
+    private long oldestStartCode = 0L;

Review Comment:
   If you change this to `Long.MAX_VALUE`, you can simplify the `add` method a bit without any other change to the result, since if `add` is never called, `build` returns `EMPTY_BOUNDARIES` anyway.
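   
   I.e.:
   
   ```java
   private long oldestStartCode = Long.MAX_VALUE;
   
   // in add(...), the 0L special case disappears:
   oldestStartCode = Math.min(oldestStartCode, backupStartCode);
   ```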



##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/BackupBoundaries.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.backup.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.backup.master.BackupLogCleaner;
+import org.apache.hadoop.hbase.net.Address;
+import org.apache.hadoop.hbase.wal.AbstractFSWALProvider;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class BackupBoundaries {
+  private static final Logger LOG = LoggerFactory.getLogger(BackupBoundaries.class);
+  private static final BackupBoundaries EMPTY_BOUNDARIES =
+    new BackupBoundaries(new HashMap<>(), Long.MAX_VALUE);
+
+  // This map tracks, for every RegionServer, the least recent (= oldest / lowest timestamp)
+  // inclusion in any backup. In other words, it is the timestamp boundary up to which all backup
+  // roots have included the WAL in their backup.
+  private final Map<Address, Long> boundaries;
+  private final long oldestStartCode;
+
+  private BackupBoundaries(Map<Address, Long> boundaries, long oldestStartCode) {
+    this.boundaries = boundaries;
+    this.oldestStartCode = oldestStartCode;
+  }
+
+  public boolean isDeletable(Path path) {
+    try {
+      String hostname = BackupUtils.parseHostNameFromLogFile(path);
+      Address address = Address.fromString(hostname);
+      long ts = AbstractFSWALProvider.getTimestamp(path.getName());
+
+      if (!boundaries.containsKey(address)) {
+        boolean isDeletable = ts <= oldestStartCode;
+        if (LOG.isDebugEnabled()) {
+          LOG.debug(
+            "Boundary for {} not found. isDeletable = {} based on oldestStartCode = {} and WAL ts of {}",
+            path, isDeletable, oldestStartCode, ts);
+        }
+        return isDeletable;
+      }
+
+      long rollTs = boundaries.get(address);
+      if (ts <= rollTs) {

Review Comment:
   Both names are ambiguous; suggestion: `pathTs` and `backupTs`.
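   
   I.e.:
   
   ```java
   long pathTs = AbstractFSWALProvider.getTimestamp(path.getName());
   // ...
   long backupTs = boundaries.get(address);
   if (pathTs <= backupTs) {
   ```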



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
