jackjlli opened a new issue #8306:
URL: https://github.com/apache/pinot/issues/8306


   We’re planning to upgrade the Apache Helix version from 0.9.8 to 1.0+ in the Pinot repo. This will not only address the issues we’ve seen in the current 0.9.8 release (which have been fixed in Helix 1.0+ and which the Helix project does not plan to back-port to 0.9.x), but also provide opportunities to build more Pinot features on top of the new features in Helix 1.0+. An [initial attempt](https://github.com/apache/pinot/pull/7500) at moving to 1.0 surfaced some test failures in Pinot, so this time we’ve sought help from the Helix team.
   
   ### What we get from Helix 1.0+
   Here are a few items that can be addressed after bumping up the Helix 
version to 1.0+.
   **ZNRecord serialization issue**
   In the `serialize()` method of [ZNRecordSerializer](https://github.com/apache/pinot/pull/7500), an expensive Jackson `ObjectMapper` is constructed on every call; this is already fixed in Helix 1.0+.
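   The fix boils down to constructing the mapper once and reusing it, since a configured Jackson `ObjectMapper` is thread-safe. A minimal sketch of the two patterns, using a stand-in for the expensive mapper (this is illustrative only, not the actual Helix code):

   ```java
   // Illustrative sketch: ExpensiveMapper stands in for Jackson's ObjectMapper,
   // whose construction is costly relative to a single serialize() call.
   class SerializerSketch {
       static int constructions = 0;

       static class ExpensiveMapper {
           ExpensiveMapper() { constructions++; }  // costly setup happens here
           byte[] write(String record) { return record.getBytes(); }
       }

       // Pre-1.0 pattern: a new mapper is built on every serialize() call.
       static byte[] serializeNaive(String record) {
           return new ExpensiveMapper().write(record);
       }

       // Helix 1.0+ pattern: one shared mapper, built once and reused.
       private static final ExpensiveMapper SHARED = new ExpensiveMapper();

       static byte[] serializeShared(String record) {
           return SHARED.write(record);
       }
   }
   ```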
   **Burst ZK write issue during broker startup in large clusters**
   In a large Pinot cluster with thousands of tables, a broker restart generates a burst of ZK access to the current states, and the Helix controller can take a long time (around 20 minutes) to compute the ideal state. The ideal-state computation algorithm is improved in later Helix 1.0+ releases.
   
   **Lack of pagination support**
   Because of the lack of pagination support in Helix 0.9.8, a huge number of ZNodes must be read from ZK into Pinot during startup, causing a large burst of ZK read and write access, especially in big clusters that maintain thousands of Pinot tables. This pain point can be addressed by the [Zk Client API pagination](https://github.com/apache/helix/wiki/Helix-ZkClient-API-to-Support-Getting-a-Large-Number-of-Children-Using-Pagination) feature in Helix 1.0+, which is needed to support Pinot clusters with a large number of tables and segments.
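   The general shape of paginated child reads is sketched below; the method and constant names are illustrative, not the actual Helix ZkClient API described on the wiki page above.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Illustrative sketch: read a huge child list in bounded pages instead of
   // one giant ZK call. The real Helix API differs; this only shows the pattern.
   class PaginationSketch {
       static final int PAGE_SIZE = 1000;

       static List<String> readAllPaged(List<String> zkChildren) {
           List<String> result = new ArrayList<>();
           for (int offset = 0; offset < zkChildren.size(); offset += PAGE_SIZE) {
               int end = Math.min(offset + PAGE_SIZE, zkChildren.size());
               // In real code, each iteration would be one bounded ZK request.
               result.addAll(zkChildren.subList(offset, end));
           }
           return result;
       }
   }
   ```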
   
   **State transition task prioritization**
   Currently, Helix tasks are picked up by participants based on their in-queue time, but in some scenarios tasks that were queued later need to be picked up first (due to constraints such as disk usage). In Pinot, the custom Helix state model "SegmentOnlineOfflineStateModel" is used for segment-level state transitions: the "OFFLINE->ONLINE" transition downloads a new segment to local disk, while "OFFLINE->DROPPED" deletes the segment from local disk. We have noticed that "OFFLINE->ONLINE" transitions are always processed before "OFFLINE->DROPPED" ones, which keeps the Pinot server busy downloading new segments and can fill up the disk. This [issue](https://github.com/apache/helix/issues/1889) will be addressed only in Helix 1.0+.
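   Conceptually, the prioritization Pinot needs looks like this: transitions that free disk ("OFFLINE->DROPPED") should be drained before transitions that consume disk ("OFFLINE->ONLINE"), rather than in strict in-queue order. The sketch below only illustrates that ordering; it is not the Helix messaging API:

   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.PriorityQueue;

   // Illustrative sketch: drain disk-freeing transitions before disk-consuming
   // ones, falling back to FIFO (in-queue time) within the same priority class.
   class TransitionOrderSketch {
       static class Transition {
           final String from, to;
           final long inQueueTimeMs;
           Transition(String from, String to, long inQueueTimeMs) {
               this.from = from; this.to = to; this.inQueueTimeMs = inQueueTimeMs;
           }
       }

       static List<Transition> drain(List<Transition> pending) {
           PriorityQueue<Transition> queue = new PriorityQueue<>(
               Comparator.<Transition>comparingInt(t -> "DROPPED".equals(t.to) ? 0 : 1)
                         .thenComparingLong(t -> t.inQueueTimeMs));
           queue.addAll(pending);
           List<Transition> order = new ArrayList<>();
           while (!queue.isEmpty()) order.add(queue.poll());
           return order;
       }
   }
   ```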
   
   ### What opportunities Helix 1.0+ provides
   Beyond the fixes above, several features in Helix 1.0+ can serve as building blocks for future Pinot features.
   
   **Leverage weight based rebalancer**
   The new [weight based 
rebalancing](https://github.com/apache/helix/wiki/Weight-aware-Globally-even-distribute-Rebalancer)
 algorithms can be added to Pinot to support features like weight-based segment 
assignment and weight-based routing assignment. 
   _Weight-based segment assignment_
   
   Right now, Pinot treats all segments as having the same weight; that assumption no longer needs to hold once we land on Helix 1.0+. With the new algorithm, newer Pinot segments, which tend to be queried more frequently than older ones, could be assigned to hardware with more resources (more RAM, larger SSDs, etc.), while older segments can be moved to cheaper hardware to reduce cost.
   _Weight-based broker routing assignment_
   
   Currently, all brokers with the same tag are treated identically, so queries with totally different query patterns may be routed to the same host. With the new algorithm, brokers with different resources can handle different kinds of query patterns.
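   The core idea behind both use cases can be sketched as greedy weight-aware placement: put each weighted item (segment or query workload) on the instance with the most remaining capacity. This is only an illustration of the principle, not Helix's actual rebalancer algorithm:

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Illustrative sketch of weight-aware placement: greedily assign each
   // weighted item to the instance with the most headroom left.
   class WeightAwareSketch {
       static Map<String, List<String>> assign(Map<String, Integer> capacity,
                                               Map<String, Integer> itemWeight) {
           Map<String, Integer> remaining = new HashMap<>(capacity);
           Map<String, List<String>> assignment = new HashMap<>();
           for (String instance : capacity.keySet()) {
               assignment.put(instance, new ArrayList<>());
           }
           for (Map.Entry<String, Integer> item : itemWeight.entrySet()) {
               // Pick the instance with the largest remaining capacity.
               String best = Collections.max(remaining.entrySet(),
                                             Map.Entry.comparingByValue()).getKey();
               assignment.get(best).add(item.getKey());
               remaining.merge(best, -item.getValue(), Integer::sum);
           }
           return assignment;
       }
   }
   ```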
   
   **Leverage 
[FederatedZkClient](https://github.com/apache/helix/wiki/FederatedZkClient)**
   FederatedZkClient can maintain multiple ZK connections to different ZK realms, which gives Pinot the option of splitting a large Pinot cluster into multiple ones.
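   The concept can be illustrated as routing each ZK path to the connection for its realm; the class and method names below are hypothetical and do not reflect FederatedZkClient's actual API:

   ```java
   import java.util.Map;
   import java.util.NoSuchElementException;

   // Illustrative sketch only: one connection per ZK realm, with each request
   // routed by path prefix. FederatedZkClient's real API differs.
   class FederatedRoutingSketch {
       private final Map<String, String> prefixToRealm; // e.g. "/pinot-a" -> "zk-realm-1"

       FederatedRoutingSketch(Map<String, String> prefixToRealm) {
           this.prefixToRealm = prefixToRealm;
       }

       // Resolve which ZK realm (and hence which connection) serves this path.
       String realmFor(String zkPath) {
           for (Map.Entry<String, String> entry : prefixToRealm.entrySet()) {
               if (zkPath.startsWith(entry.getKey())) {
                   return entry.getValue();
               }
           }
           throw new NoSuchElementException("No ZK realm configured for " + zkPath);
       }
   }
   ```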
   
   Helix 1.0+ has been in use in production at LinkedIn for years and is considered stable by the Helix team. At LinkedIn, a wide variety of Pinot use cases cover all the scenarios that interact with Helix.
   
   ### Approach
   We’re going to follow the steps below to make the upgrade smooth. A step cannot proceed while any earlier step is blocked.
   
   - Step 1: Create a branch and change the Pinot source code in that branch to use Helix 1.x
   
   - Step 2: Deploy Pinot with Helix 1.x (from the branch) on LinkedIn staging 
and production environments and validate (this step may take a few weeks 
depending on the problems encountered)
   
   - Step 3: Merge the open source branch back to the master branch
   
   We’ll also update this issue with the status as each step completes.

