[ 
https://issues.apache.org/jira/browse/HBASE-28513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885868#comment-17885868
 ] 

Ray Mattingly commented on HBASE-28513:
---------------------------------------

In the issue above I describe the idea of "balancer conditionals":
{quote}For example, maybe we could have two types of balancer considerations: 
costs (as we do now), and conditionals (for the more discrete considerations, 
like ">1 replica of the same region should not exist on a single host"). This 
would allow us to prioritize replica distribution _and_ maintain consideration 
for things like storefile balance.
{quote}
I think it would be nice if these balancer conditionals were extensible. For 
example, at my day job we are exploring the idea of prefixing our otherwise 
randomly distributed rowkeys so that a RegionServer outage only affects 1/_n_ 
of our customers, where _n_ is the prefix cardinality. This works most easily 
if we can tell the balancer which regions should not share a server: for 
example, regions whose rowkeys differ in their first few bytes (the prefix) 
should be placed on different servers, cluster size allowing.
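
As a rough sketch of what such an extensible conditional could look like for the prefix case, assuming the balancer could ask a yes/no question before co-locating regions: the names {{BalancerConditional}}, {{PrefixIsolationConditional}} and {{isViolated}} are made up for illustration and are not taken from the HBase codebase. The sketch also ignores the "cluster size allowing" fallback.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical extension point, named for illustration only (not taken from HBase).
// A conditional answers a discrete yes/no question instead of returning a weighted cost.
interface BalancerConditional {
  // True if placing the candidate region on a server that already hosts
  // regions with the given start keys would break the rule.
  boolean isViolated(byte[] candidateStartKey, List<byte[]> hostedStartKeys);
}

// Hypothetical conditional: regions whose rowkey prefixes differ must not share a server.
final class PrefixIsolationConditional implements BalancerConditional {
  private final int prefixLength;

  PrefixIsolationConditional(int prefixLength) {
    if (prefixLength <= 0) {
      throw new IllegalArgumentException("prefixLength must be positive");
    }
    this.prefixLength = prefixLength;
  }

  // Keys shorter than prefixLength are compared in full.
  private byte[] prefixOf(byte[] key) {
    return Arrays.copyOf(key, Math.min(prefixLength, key.length));
  }

  @Override
  public boolean isViolated(byte[] candidateStartKey, List<byte[]> hostedStartKeys) {
    byte[] candidatePrefix = prefixOf(candidateStartKey);
    for (byte[] hosted : hostedStartKeys) {
      if (!Arrays.equals(candidatePrefix, prefixOf(hosted))) {
        return true; // a region with a different prefix already lives here
      }
    }
    return false;
  }
}

final class PrefixIsolationDemo {
  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    BalancerConditional conditional = new PrefixIsolationConditional(1);
    List<byte[]> hosted = Arrays.asList(bytes("a-001"), bytes("a-002"));
    System.out.println(conditional.isViolated(bytes("a-777"), hosted)); // false: same prefix
    System.out.println(conditional.isViolated(bytes("b-123"), hosted)); // true: different prefix
  }
}
```

A balancer that supported this could reject any candidate move for which a registered conditional reports a violation, and only then fall back to the usual cost comparison.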

> Secondary replica balancing squashes all other cost considerations
> ------------------------------------------------------------------
>
>                 Key: HBASE-28513
>                 URL: https://issues.apache.org/jira/browse/HBASE-28513
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ray Mattingly
>            Priority: Major
>
> I have a larger write-up available 
> [here|https://gist.github.com/rmdmattingly/9db0885f9977234e56325beff566e749].
> Basically, a few cost functions have relatively huge default multipliers. 
> For example, `PrimaryRegionCountSkewCostFunction` has a default multiplier 
> of 100,000, while `StoreFileCostFunction` has a multiplier of 5. When one 
> multiplier is 100k and the others are single digits, the latter become 
> effectively irrelevant to the balancer's decisions.
> I understand that it's critical to distribute a region's replicas across 
> multiple hosts/racks, but I don't think we should do this at the expense of 
> all other balancer considerations.
> For example, maybe we could have two types of balancer considerations: costs 
> (as we do now), and conditionals (for the more discrete considerations, like 
> ">1 replica of the same region should not exist on a single host"). This 
> would allow us to prioritize replica distribution _and_ maintain 
> consideration for things like storefile balance.
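
To make the multiplier imbalance in the description concrete, here is a small arithmetic sketch. It assumes the total cost is a plain weighted sum of per-function costs, each scaled to the range 0 to 1; the class name, method name and the two example cluster states are invented for illustration, and only the multipliers 100,000 and 5 come from the description.

```java
// Illustrative only: assumes total cost = sum(multiplier * cost), with each cost in [0, 1].
final class WeightedCostDemo {
  static double totalCost(double[] multipliers, double[] costs) {
    double sum = 0.0;
    for (int i = 0; i < multipliers.length; i++) {
      sum += multipliers[i] * costs[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    // Index 0: the replica-related cost (multiplier 100,000).
    // Index 1: the storefile cost (multiplier 5).
    double[] multipliers = {100_000.0, 5.0};

    // State A: replica placement is 1% off, storefile balance is perfect.
    double[] stateA = {0.01, 0.0};
    // State B: replica placement is perfect, storefile balance is as bad as possible.
    double[] stateB = {0.0, 1.0};

    System.out.println("State A total: " + totalCost(multipliers, stateA));
    System.out.println("State B total: " + totalCost(multipliers, stateB));
    // State B scores far lower, so the balancer prefers it despite the worst
    // possible storefile balance.
  }
}
```

Under these assumptions, the worst possible storefile imbalance contributes at most 5 to the total, while a 1% deviation in the heavily weighted function contributes about 1,000.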



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
