[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631714#comment-13631714
 ] 

Jiannan Wang commented on BOOKKEEPER-606:
-----------------------------------------

I think we can start this work with the following two problems:
   * In current Hedwig, all of a topic's subscription metadata is loaded into 
memory (by scanning the metadata storage) when a hub acquires the topic. This 
is somewhat heavy, since not all subscribers are online at that time, and from 
my understanding it is also the major obstacle to scaling subscriptions.
   * How do we handle the case where a really huge number of subscribers are 
online (on the same topic) at the same time?

*Problem 1. Load All Subscriptions into Memory when Acquiring Topic.*

Actually, this behavior is meant to obtain the minimum consume pointer across 
all subscribers of the same topic, which enables the hub to GC outdated 
ledgers. Under this strong requirement it would be very tough to achieve good 
scalability. Instead, other retention policies can be provided, such as 
time-based retention or a max-messages bound (current Hedwig's message bound 
is still tied to the code that computes the minimum consume pointer), which 
work much better in practice.

With a new policy, we no longer need to load all subscriptions into memory, 
and we can then focus on scaling subscriptions.
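To make the idea concrete, here is a minimal sketch of a time-based retention policy: instead of computing the minimum consume pointer over every subscriber, the hub simply deletes any ledger whose last entry is older than a configured window. All class and method names below are illustrative, not existing Hedwig APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical time-based retention: GC any ledger whose last-modified time
// falls outside the retention window, with no per-subscriber state needed.
public class TimeBasedRetention {
    private final long retentionMillis;

    public TimeBasedRetention(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    /**
     * Returns the ids of ledgers that fell out of the retention window.
     * Each input entry is {ledgerId, lastModifiedMillis}.
     */
    public List<Long> ledgersToDelete(List<long[]> ledgers, long nowMillis) {
        List<Long> expired = new ArrayList<>();
        for (long[] l : ledgers) {
            if (nowMillis - l[1] > retentionMillis) {
                expired.add(l[0]);
            }
        }
        return expired;
    }
}
```

The point is that the decision depends only on per-ledger timestamps the hub already knows, not on loading every subscriber's consume pointer.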

*Problem 2. Scale Subscriptions.*

For this problem, if the hub takes responsibility for subscription metadata 
maintenance (updating the consume pointer for each subscriber), it would be 
very hard to scale. Another idea is to move subscription metadata maintenance 
from the hub to the client, so that the hub only deals with message 
persistence and delivery.
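A rough sketch of what the client side could look like under this split (again, these are hypothetical names, not the current hedwig-client API): the client tracks its own consume pointer locally and supplies the resume position itself when it re-subscribes, so the hub holds no per-subscriber metadata.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side consume tracking: the subscriber, not the hub,
// owns the consume pointer.
public class ClientConsumeTracker {
    // -1 means nothing consumed yet.
    private final AtomicLong consumed = new AtomicLong(-1L);

    /** Advance the local consume pointer after the application handles a message. */
    public void markConsumed(long seqId) {
        // Keep the maximum seen, so out-of-order acks cannot move it backwards.
        consumed.accumulateAndGet(seqId, Math::max);
    }

    /** Position the client asks delivery to resume from after a reconnect. */
    public long resumeFrom() {
        return consumed.get() + 1;
    }
}
```

The client would persist this pointer to its own storage; the hub's subscribe path then becomes stateless with respect to individual subscribers.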

After eliminating subscription metadata from the hub, it seems the hub could 
theoretically serve a huge number of subscriptions. However, messages are only 
delivered by the topic owner, which limits delivery throughput.

We can then consider separating the "read only ledgers" and the "last 
in-writing ledger" of each topic on the hub side:
   * A "read only ledger" can be read by many clients, so many hubs can serve 
reads for these ledgers. We could even let the hedwig-client read these 
ledgers directly.
   * The "last in-writing ledger" is still a problem, because a ledger can 
have only one writer and only the writer can see all of its messages. A 
possible workaround is to create some special subscribers on other hubs that 
subscribe to the topic and act as message replication servers, enlarging the 
delivery throughput.
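The split read path above can be sketched as a simple routing rule (hypothetical types, for illustration only): entries in sealed ledgers can be served by any hub or read by clients directly, while reads on the open ledger must go to the topic owner (or one of the replicating subscribers).

```java
// Hypothetical read routing for the ledger split described above.
public class ReadRouter {
    public enum Target { ANY_HUB_OR_CLIENT, TOPIC_OWNER }

    // Id of the topic's "last in-writing" (still open) ledger.
    private final long inWritingLedgerId;

    public ReadRouter(long inWritingLedgerId) {
        this.inWritingLedgerId = inWritingLedgerId;
    }

    /**
     * Sealed ("read only") ledgers are safe to fan out; the open ledger is
     * only fully visible to its single writer, the topic owner.
     */
    public Target route(long ledgerId) {
        return ledgerId == inWritingLedgerId
                ? Target.TOPIC_OWNER
                : Target.ANY_HUB_OR_CLIENT;
    }
}
```

Once the owner closes a ledger and rolls to a new one, that ledger's reads can immediately be spread across hubs and clients.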

Anyway, there are several things that need to change if we want to scale 
subscriptions.
                
> Scale Subscriptions
> -------------------
>
>                 Key: BOOKKEEPER-606
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-606
>             Project: Bookkeeper
>          Issue Type: New Feature
>          Components: hedwig-client, hedwig-server
>    Affects Versions: 4.2.0, 4.2.1
>            Reporter: Jiannan Wang
>
> At present, Hedwig is able to scale topics but not subscriptions, so one 
> topic can only serve a few subscribers. However, there are many use cases 
> with huge numbers of subscribers in reality: lots of users are interested 
> in the same things, such as a specific sport game, a famous person's 
> activity (updates on twitter/facebook), etc.
> If a Hedwig user wants to scale subscriptions, the only way I know of is to 
> transform each subscription into a topic: for each subscription, create a 
> new topic "topic#subId", so that each topic has only one subscriber. This 
> does resolve the scalability issue, but it is not an ideal solution:
>    * The topic count grows a lot, which increases metadata usage and 
> demands more hub servers.
>    * Each message is replicated for each subscriber. In other words, if 
> there are S subscribers and M messages on the topic, then the actual number 
> of messages in the system is S x M!
> This JIRA aims to find a better solution for Hedwig to support subscription 
> scalability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
