Copilot commented on code in PR #8270:
URL: https://github.com/apache/hbase/pull/8270#discussion_r3379455314
##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java:
##########
@@ -1018,7 +1022,15 @@ static String
serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor, St
BloomType bloomType = familyDescriptor.getBloomFilterType();
String bloomParam = "";
if (bloomType == BloomType.ROWPREFIX_FIXED_LENGTH) {
- bloomParam =
familyDescriptor.getConfigurationValue(BloomFilterUtil.PREFIX_LENGTH_KEY);
+ String prefixLength =
+
familyDescriptor.getConfigurationValue(BloomFilterUtil.PREFIX_LENGTH_KEY);
+ String startOffset =
+
familyDescriptor.getConfigurationValue(BloomFilterUtil.PREFIX_START_OFFSET_KEY);
+ if (startOffset != null && Integer.parseInt(startOffset) > 0) {
+ bloomParam = prefixLength + ":" + startOffset;
+ } else {
+ bloomParam = prefixLength;
+ }
Review Comment:
`Integer.parseInt(startOffset)` can throw `NumberFormatException` during job
configuration, which makes failures harder to diagnose and prevents the later,
more descriptive validation in `BloomFilterUtil` from running. Consider
avoiding parsing here and only omitting the offset when it is exactly "0"/empty.
##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java:
##########
@@ -423,8 +423,12 @@ private WriterLength getNewWriter(byte[] tableName, byte[]
family, Configuration
BloomType bloomType = bloomTypeMap.get(tableAndFamily);
bloomType = bloomType == null ? BloomType.NONE : bloomType;
String bloomParam = bloomParamMap.get(tableAndFamily);
- if (bloomType == BloomType.ROWPREFIX_FIXED_LENGTH) {
- conf.set(BloomFilterUtil.PREFIX_LENGTH_KEY, bloomParam);
+ if (bloomType == BloomType.ROWPREFIX_FIXED_LENGTH && bloomParam !=
null) {
+ String[] parts = bloomParam.split(":");
+ conf.set(BloomFilterUtil.PREFIX_LENGTH_KEY, parts[0]);
+ if (parts.length > 1) {
+ conf.set(BloomFilterUtil.PREFIX_START_OFFSET_KEY, parts[1]);
+ }
}
Review Comment:
In incremental-load MR jobs, `conf` is shared across column families. If one
family sets `RowPrefixBloomFilter.prefix_start_offset` and a later
`ROWPREFIX_FIXED_LENGTH` family only specifies a prefix length (no ":offset"),
the previous offset will remain in `conf` and be incorrectly applied to
subsequent writers.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterUtil.java:
##########
@@ -264,17 +265,25 @@ public static byte[] getBloomFilterParam(BloomType
bloomFilterType, Configuratio
int prefixLength;
try {
prefixLength = Integer.parseInt(prefixLengthString);
- if (prefixLength <= 0 || prefixLength > HConstants.MAX_ROW_LENGTH) {
- message +=
- "the value of " + PREFIX_LENGTH_KEY + " must >=0 and < " +
HConstants.MAX_ROW_LENGTH;
+ if (prefixLength <= 0) {
+ message += "the value of " + PREFIX_LENGTH_KEY + " must be > 0";
throw new IllegalArgumentException(message);
}
} catch (NumberFormatException nfe) {
message = "Number format exception when parsing " + PREFIX_LENGTH_KEY
+ " for BloomType "
+ bloomFilterType.toString() + ":" + prefixLengthString;
throw new IllegalArgumentException(message, nfe);
}
- bloomParam = Bytes.toBytes(prefixLength);
+ int startOffset = conf.getInt(PREFIX_START_OFFSET_KEY, 0);
+ if (startOffset < 0) {
+ throw new IllegalArgumentException(
+ message + "the value of " + PREFIX_START_OFFSET_KEY + " must be >=
0");
+ }
Review Comment:
`Configuration#getInt` will throw `NumberFormatException` if
`RowPrefixBloomFilter.prefix_start_offset` is set to a non-integer value. This
bypasses the existing error-message formatting used for `prefix_length`. Parse
the start offset explicitly and throw an `IllegalArgumentException` with a
clear message (consistent with the prefix length handling).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]