Copilot commented on code in PR #8270:
URL: https://github.com/apache/hbase/pull/8270#discussion_r3379455314


##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java:
##########
@@ -1018,7 +1022,15 @@ static String 
serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor, St
     BloomType bloomType = familyDescriptor.getBloomFilterType();
     String bloomParam = "";
     if (bloomType == BloomType.ROWPREFIX_FIXED_LENGTH) {
-      bloomParam = 
familyDescriptor.getConfigurationValue(BloomFilterUtil.PREFIX_LENGTH_KEY);
+      String prefixLength =
+        
familyDescriptor.getConfigurationValue(BloomFilterUtil.PREFIX_LENGTH_KEY);
+      String startOffset =
+        
familyDescriptor.getConfigurationValue(BloomFilterUtil.PREFIX_START_OFFSET_KEY);
+      if (startOffset != null && Integer.parseInt(startOffset) > 0) {
+        bloomParam = prefixLength + ":" + startOffset;
+      } else {
+        bloomParam = prefixLength;
+      }

Review Comment:
   `Integer.parseInt(startOffset)` can throw `NumberFormatException` during job 
configuration, which makes failures harder to diagnose and prevents the later, 
more descriptive validation in `BloomFilterUtil` from running. Consider 
avoiding parsing here and only omitting the offset when it is exactly "0"/empty.



##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java:
##########
@@ -423,8 +423,12 @@ private WriterLength getNewWriter(byte[] tableName, byte[] 
family, Configuration
         BloomType bloomType = bloomTypeMap.get(tableAndFamily);
         bloomType = bloomType == null ? BloomType.NONE : bloomType;
         String bloomParam = bloomParamMap.get(tableAndFamily);
-        if (bloomType == BloomType.ROWPREFIX_FIXED_LENGTH) {
-          conf.set(BloomFilterUtil.PREFIX_LENGTH_KEY, bloomParam);
+        if (bloomType == BloomType.ROWPREFIX_FIXED_LENGTH && bloomParam != 
null) {
+          String[] parts = bloomParam.split(":");
+          conf.set(BloomFilterUtil.PREFIX_LENGTH_KEY, parts[0]);
+          if (parts.length > 1) {
+            conf.set(BloomFilterUtil.PREFIX_START_OFFSET_KEY, parts[1]);
+          }
         }

Review Comment:
   In incremental-load MR jobs, `conf` is shared across column families. If one 
family sets `RowPrefixBloomFilter.prefix_start_offset` and a later 
`ROWPREFIX_FIXED_LENGTH` family only specifies a prefix length (no ":offset"), 
the previous offset will remain in `conf` and be incorrectly applied to 
subsequent writers.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterUtil.java:
##########
@@ -264,17 +265,25 @@ public static byte[] getBloomFilterParam(BloomType 
bloomFilterType, Configuratio
       int prefixLength;
       try {
         prefixLength = Integer.parseInt(prefixLengthString);
-        if (prefixLength <= 0 || prefixLength > HConstants.MAX_ROW_LENGTH) {
-          message +=
-            "the value of " + PREFIX_LENGTH_KEY + " must >=0 and < " + 
HConstants.MAX_ROW_LENGTH;
+        if (prefixLength <= 0) {
+          message += "the value of " + PREFIX_LENGTH_KEY + " must be > 0";
           throw new IllegalArgumentException(message);
         }
       } catch (NumberFormatException nfe) {
         message = "Number format exception when parsing " + PREFIX_LENGTH_KEY 
+ " for BloomType "
           + bloomFilterType.toString() + ":" + prefixLengthString;
         throw new IllegalArgumentException(message, nfe);
       }
-      bloomParam = Bytes.toBytes(prefixLength);
+      int startOffset = conf.getInt(PREFIX_START_OFFSET_KEY, 0);
+      if (startOffset < 0) {
+        throw new IllegalArgumentException(
+          message + "the value of " + PREFIX_START_OFFSET_KEY + " must be >= 
0");
+      }

Review Comment:
   `Configuration#getInt` will throw `NumberFormatException` if 
`RowPrefixBloomFilter.prefix_start_offset` is set to a non-integer value. This 
bypasses the existing error-message formatting used for `prefix_length`. Parse 
the start offset explicitly and throw an `IllegalArgumentException` with a 
clear message (consistent with the prefix length handling).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to