[ https://issues.apache.org/jira/browse/MRESOLVER-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Cservenak updated MRESOLVER-404:
--------------------------------------
    Summary: New strategy may be needed for Hazelcast named locks  (was: New 
strategy for Hazelcast named locks)

> New strategy may be needed for Hazelcast named locks
> ----------------------------------------------------
>
>                 Key: MRESOLVER-404
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-404
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>            Reporter: Tamas Cservenak
>            Priority: Major
>
> Originally (for the current behavior, see below) the Hazelcast NamedLock 
> implementation worked like this (see the sketch after this list):
> * on lock acquire, an ISemaphore DO named after the lock is created (or just 
> obtained, if it already exists) and is refCounted
> * on lock release, if the refCount drops to 0 (no more uses), the ISemaphore 
> is destroyed (releasing HZ cluster resources)
> * if, after some time, a new lock acquire happens for the same name, the 
> ISemaphore DO is simply re-created.
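> For illustration only, that original lifecycle could look roughly like the 
> following sketch. The class and method names, the refCounting helper and the 
> permit count are assumptions made up for this sketch (and it is written 
> against the current Hazelcast CP API for readability), not the actual 
> resolver code:
> {code:java}
> // Hypothetical sketch of the original acquire/release lifecycle (not actual resolver code).
> import com.hazelcast.core.HazelcastInstance;
> import com.hazelcast.cp.ISemaphore;
> 
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.ConcurrentMap;
> import java.util.concurrent.atomic.AtomicInteger;
> 
> class RefCountingSemaphoreProviderSketch {
>     private final ConcurrentMap<String, AtomicInteger> refCounts = new ConcurrentHashMap<>();
> 
>     ISemaphore acquireSemaphore(HazelcastInstance hz, String lockName) {
>         // the ISemaphore DO is created on first use, or just obtained if it already exists
>         refCounts.computeIfAbsent(lockName, k -> new AtomicInteger()).incrementAndGet();
>         ISemaphore semaphore = hz.getCPSubsystem().getSemaphore(lockName);
>         semaphore.init(Integer.MAX_VALUE); // no-op if already initialized
>         return semaphore;
>     }
> 
>     void releaseSemaphore(HazelcastInstance hz, String lockName) {
>         AtomicInteger refCount = refCounts.get(lockName);
>         if (refCount != null && refCount.decrementAndGet() == 0) {
>             refCounts.remove(lockName);
>             // nobody uses this name anymore: destroy the DO to release HZ cluster resources
>             hz.getCPSubsystem().getSemaphore(lockName).destroy();
>         }
>     }
> }
> {code}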
> Today, the HZ NamedLocks implementation works in the following way:
> * there is only one semaphore provider implementation, the 
> {{DirectHazelcastSemaphoreProvider}}, which maps the lock name 1:1 onto the 
> ISemaphore Distributed Object (DO) name and never destroys the DO
> The reason for this is historical: originally, the named locks precursor code 
> was written for Hazelcast 2/3, which used "unreliable" distributed operations, 
> so recreating a previously destroyed DO was possible (at the cost of that 
> "unreliability").
> Since Hazelcast 4.x switched to the RAFT consensus algorithm and made things 
> reliable, the cost is that a DO, once created and then destroyed, cannot be 
> recreated anymore. This change was applied to 
> {{DirectHazelcastSemaphoreProvider}} as well, by simply never dropping unused 
> ISemaphores (the release-semaphore method is a no-op), roughly as in the 
> sketch below.
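> In contrast, the current behavior roughly corresponds to this sketch (again 
> with made-up class and method names, not copied from the resolver sources):
> {code:java}
> // Hypothetical sketch of the current direct 1:1 mapping with a no-op release.
> import com.hazelcast.core.HazelcastInstance;
> import com.hazelcast.cp.ISemaphore;
> 
> class DirectSemaphoreProviderSketch {
>     ISemaphore acquireSemaphore(HazelcastInstance hz, String lockName) {
>         // lock name maps 1:1 onto the ISemaphore DO name; the DO is created on first use
>         ISemaphore semaphore = hz.getCPSubsystem().getSemaphore(lockName);
>         semaphore.init(Integer.MAX_VALUE); // no-op if already initialized
>         return semaphore;
>     }
> 
>     void releaseSemaphore(HazelcastInstance hz, String lockName) {
>         // intentionally a no-op: since Hazelcast 4.x a destroyed CP DO cannot be recreated,
>         // so the DO is kept forever -- this is exactly what makes the cluster grow over time
>     }
> }
> {code}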
> But this has an important consequence: a long-running Hazelcast cluster 
> accumulates more and more ISemaphore DOs (basically one per distinct Artifact 
> encountered by all the builds that use this cluster to coordinate). The count 
> of Artifacts out there is not infinite, but it is large enough -- especially 
> if the cluster is shared across many different/unrelated builds -- to grow 
> beyond any sane limit.
> So the current recommendation is to have a "large enough" dedicated Hazelcast 
> cluster and use {{semaphore-hazelcast-client}} (a "thin client" that connects 
> to the cluster) instead of {{semaphore-hazelcast}} (a "thick client" that 
> makes the JVM process running it a cluster node, putting that burden onto 
> Maven as well). But even then, a regular reboot of the cluster may be needed.
> A proper but somewhat complicated solution would be to introduce some sort of 
> indirection: create only as many ISemaphores as are needed at the moment, and 
> map them onto the lock names currently in use (reusing semaphores that fall 
> out of use). The problem is that this mapping would need to be distributed as 
> well (so that all clients pick it up, or perform the new mapping themselves), 
> and that may cause a performance penalty; only exhaustive perf testing could 
> prove it either way.
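> One possible shape of such an indirection is sketched below, with a 
> distributed IMap holding the name-to-slot mapping and an IQueue of reusable 
> slots. All names are made up, refCounting of concurrent users of the same 
> name is omitted for brevity, and whether this performs acceptably is exactly 
> the open question above:
> {code:java}
> // Hypothetical indirection sketch: lock names are mapped onto a small, reusable pool of
> // ISemaphore DOs instead of creating one DO per distinct lock name (names are made up).
> import com.hazelcast.collection.IQueue;
> import com.hazelcast.core.HazelcastInstance;
> import com.hazelcast.cp.ISemaphore;
> import com.hazelcast.map.IMap;
> 
> class IndirectSemaphoreProviderSketch {
>     private final HazelcastInstance hz;
> 
>     IndirectSemaphoreProviderSketch(HazelcastInstance hz) {
>         this.hz = hz;
>     }
> 
>     ISemaphore acquireSemaphore(String lockName) {
>         IMap<String, String> mapping = hz.getMap("resolver-lock-mapping"); // shared by all clients
>         IQueue<String> freeSlots = hz.getQueue("resolver-free-slots");     // slots no longer in use
>         String slot = mapping.get(lockName);
>         if (slot == null) {
>             String candidate = freeSlots.poll(); // reuse an idle ISemaphore if one is available
>             if (candidate == null) {
>                 // otherwise grow the pool by one slot, using a distributed counter for unique names
>                 candidate = "slot-" + hz.getCPSubsystem().getAtomicLong("resolver-slot-counter").incrementAndGet();
>             }
>             String raced = mapping.putIfAbsent(lockName, candidate);
>             if (raced != null) {
>                 freeSlots.offer(candidate); // another client mapped the name first; recycle ours
>                 slot = raced;
>             } else {
>                 slot = candidate;
>             }
>         }
>         ISemaphore semaphore = hz.getCPSubsystem().getSemaphore(slot);
>         semaphore.init(Integer.MAX_VALUE); // no-op if already initialized
>         return semaphore;
>     }
> 
>     void releaseSemaphore(String lockName) {
>         // unmap the name and return the slot to the pool; the ISemaphore DO itself is kept and reused
>         String slot = hz.<String, String>getMap("resolver-lock-mapping").remove(lockName);
>         if (slot != null) {
>             hz.<String>getQueue("resolver-free-slots").offer(slot);
>         }
>     }
> }
> {code}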
> The benefit would be obvious: today the cluster holds as many ISemaphores as 
> there are Artifacts encountered by all the builds using the cluster since 
> cluster boot. With indirection, the number of DOs would be lowered to the 
> "maximum concurrently used": if you have a large build farm able to juggle 
> 1000 artifacts at any given moment, your cluster would hold 1000 ISemaphores.
> Still, with proper "segmenting" of the clusters (for example, splitting them 
> per "related" job groups so that the Artifacts coming through them stay 
> within somewhat limited boundaries), with some automation for regular cluster 
> reboots, or simply by creating "huge enough" clusters, users may never hit 
> these issues (cluster OOM).
> And the current code is most probably the fastest solution; hence, I created 
> this issue just to have this documented, but I plan no meritorious work on 
> this topic.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
