hgromer opened a new pull request, #7354:
URL: https://github.com/apache/hbase/pull/7354

   We started seeing errors like the following when running incremental backups:
   
   ```
   2025-09-29 19:31:19.527  [main] INFO  org.apache.hadoop.mapreduce.Job - Task Id : attempt_1759167518321_0143_m_000000_0, Status : FAILED
   Error: java.lang.IllegalArgumentException: Can't read partitions file
        at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:117)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:715)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
   Caused by: java.io.IOException: Wrong number of partitions in keyset
        at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:91)
        ... 10 more
   ```
   
   The root cause is that `SnapshotRegionLocator`, which is used as the `RegionLocator` for modern backups, can return duplicate start keys after a region split (for example, a split parent and its first daughter region share the same start key).
   
   `HFileOutputFormat2` then does two things:
   
   1. Configures the number of reducers based on the number of start keys returned from _all_ region locations
   2. De-duplicates the start keys and writes the partitions file from the de-duplicated set
   
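   When the two counts differ, `TotalOrderPartitioner` fails because it expects the number of partitions to match the number of reducers (it checks that the partitions file holds exactly one fewer split point than there are reduce tasks).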
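
The mismatch described above can be reproduced with a small, self-contained model. This is only a sketch: the class and method names are hypothetical, string keys stand in for byte-array row keys, and the logic is a simplification of what `HFileOutputFormat2` and `TotalOrderPartitioner` do rather than their actual code. It assumes a parent region starting at `""` that was split into daughters starting at `""` and `"g"`, plus an unrelated region starting at `"m"`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PartitionMismatchSketch {

  // Reducer count in this model: one per start key, duplicates included.
  static int reducerCount(List<String> startKeys) {
    return startKeys.size();
  }

  // Split points in this model: start keys are de-duplicated and sorted,
  // then the lowest key is dropped because the first partition needs no
  // lower bound.
  static TreeSet<String> splitPoints(List<String> startKeys) {
    TreeSet<String> sorted = new TreeSet<>(startKeys);
    sorted.pollFirst();
    return sorted;
  }

  // The partitioner accepts the job only when there is exactly one fewer
  // split point than there are reducers.
  static boolean partitionerAccepts(List<String> startKeys) {
    return splitPoints(startKeys).size() == reducerCount(startKeys) - 1;
  }

  public static void main(String[] args) {
    // Split parent ("") is still listed next to its daughters ("" and "g").
    List<String> withSplitParent = Arrays.asList("", "", "g", "m");
    // Same table with the split parent filtered out.
    List<String> liveOnly = Arrays.asList("", "g", "m");

    System.out.println("with split parent accepted: " + partitionerAccepts(withSplitParent));
    System.out.println("live regions only accepted: " + partitionerAccepts(liveOnly));
  }
}
```

With the split parent present, the model yields four reducers but only two split points, which is the situation that produces `Wrong number of partitions in keyset`.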
   
   We should filter out regions that are either offline or have been split, which is what [MetaTableAccessor](https://github.com/HubSpot/hbase/blob/70f6120227f9050c8b3cb7c6bb33a768264cf5c4/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java#L1261C13-L1261C19) does.
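
A minimal sketch of that filtering idea follows. It is not the code in this pull request: `Region` is a hypothetical stand-in for a region descriptor (in HBase, `RegionInfo` exposes `isOffline()` and `isSplit()`), and `liveRegions` is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionFilterSketch {

  // Hypothetical stand-in for a region descriptor; not an HBase class.
  static final class Region {
    final String startKey;
    final boolean offline;
    final boolean split;

    Region(String startKey, boolean offline, boolean split) {
      this.startKey = startKey;
      this.offline = offline;
      this.split = split;
    }
  }

  // Keep only regions that are neither offline nor already split, so each
  // remaining start key is contributed by a single live region.
  static List<Region> liveRegions(List<Region> all) {
    List<Region> live = new ArrayList<>();
    for (Region r : all) {
      if (r.offline || r.split) {
        continue;
      }
      live.add(r);
    }
    return live;
  }

  public static void main(String[] args) {
    List<Region> all = new ArrayList<>();
    all.add(new Region("", true, true));    // split parent, now offline
    all.add(new Region("", false, false));  // first daughter
    all.add(new Region("g", false, false)); // second daughter
    all.add(new Region("m", false, false)); // unrelated region

    for (Region r : liveRegions(all)) {
      System.out.println("live region start key: '" + r.startKey + "'");
    }
  }
}
```

Filtering before counting keeps the reducer count and the de-duplicated partition set derived from the same list of regions.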
   
   

