[ https://issues.apache.org/jira/browse/MRESOLVER-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tamas Cservenak updated MRESOLVER-404:
--------------------------------------
    Summary: New strategy may be needed for Hazelcast named locks  (was: New strategy for Hazelcast named locks)

> New strategy may be needed for Hazelcast named locks
> ----------------------------------------------------
>
>                 Key: MRESOLVER-404
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-404
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>            Reporter: Tamas Cservenak
>            Priority: Major
>
> Originally (for how it works today, see below), the Hazelcast NamedLock implementation worked like this:
> * on lock acquire, an ISemaphore DO named after the lock was created (or just obtained, if it already existed) and reference-counted
> * on lock release, if the refCount dropped to 0 users, the ISemaphore was destroyed (releasing HZ cluster resources)
> * if, some time later, a new lock acquire happened for the same name, the ISemaphore DO was simply re-created.
> Today, the HZ NamedLocks implementation works in the following way:
> * there is only one semaphore provider implementation, the {{DirectHazelcastSemaphoreProvider}}, which maps a lock name 1:1 onto an ISemaphore Distributed Object (DO) name and never destroys the DO
> The reason for this is historical: the precursor of the named locks code was written for Hazelcast 2/3, which used "unreliable" distributed operations, so re-creating a previously destroyed DO was possible (at the cost of that "unreliability").
> Since Hazelcast 4.x moved to the RAFT consensus algorithm and made these operations reliable, the price is that a DO, once created and then destroyed, cannot be re-created anymore. This change was applied to {{DirectHazelcastSemaphoreProvider}} as well, simply by never dropping unused ISemaphores (the release-semaphore method is a no-op).
> But this has an important consequence: a long-running Hazelcast cluster accumulates more and more ISemaphore DOs (essentially one per Artifact encountered by all the builds that use the cluster for coordination). The number of Artifacts out there is not infinite, but it is large enough -- especially if the cluster is shared across many different/unrelated builds -- to grow beyond any sane limit.
> So the current recommendation is to run a "large enough" dedicated Hazelcast cluster and use {{semaphore-hazelcast-client}} (a "thin client" that connects to the cluster) instead of {{semaphore-hazelcast}} (a "thick client" that joins as a cluster node and thus puts the burden onto the JVM process running it, hence onto Maven as well). But even then, a regular reboot of the cluster may be needed.
> A proper but somewhat complicated solution would be to introduce some sort of indirection: create only as many ISemaphores as are needed at the moment, and map them onto the lock names currently in use (reusing semaphores that become unused). The problem is that this mapping would itself need to be distributed (so all clients pick it up, or perform a new mapping), and that may cause a performance penalty. Only exhaustive perf testing could prove this either way; a rough sketch of the idea is below.
> The benefit would be obvious: today the cluster holds as many ISemaphores as Artifacts encountered by all the builds using the given cluster since cluster boot. With indirection, the number of DOs would be lowered to the "maximum concurrently used" count, so if you have a large build farm able to juggle 1000 artifacts at any one moment, your cluster would hold 1000 ISemaphores.
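> A minimal sketch of such an indirection, for illustration only: the class, map and slot names are hypothetical (this is not the existing resolver SPI), and it assumes Hazelcast 4.x/5.x CP subsystem semaphores plus an IMap as the distributed name-to-slot mapping described above.
> {code:java}
> import com.hazelcast.core.HazelcastInstance;
> import com.hazelcast.cp.ISemaphore;
> import com.hazelcast.cp.lock.FencedLock;
> import com.hazelcast.map.IMap;
>
> /**
>  * Sketch only: maps lock names onto a fixed pool of reusable ISemaphore slots,
>  * so the CP subsystem never holds more than poolSize semaphore DOs.
>  */
> public class IndirectHazelcastSemaphoreProvider {
>
>     private final HazelcastInstance hz;
>     private final int poolSize;                    // "maximum concurrently used" lock names
>     private final IMap<String, Integer> slots;     // distributed lock-name -> slot index mapping
>     private final IMap<String, Integer> refCounts; // distributed per-name reference counts
>     private final FencedLock assignLock;           // serializes slot assignment cluster-wide
>
>     public IndirectHazelcastSemaphoreProvider(HazelcastInstance hz, int poolSize) {
>         this.hz = hz;
>         this.poolSize = poolSize;
>         this.slots = hz.getMap("resolver-lock-slots");
>         this.refCounts = hz.getMap("resolver-lock-refcounts");
>         this.assignLock = hz.getCPSubsystem().getLock("resolver-slot-assign");
>     }
>
>     /** Returns the ISemaphore backing the given lock name, assigning a pool slot if needed. */
>     public ISemaphore acquireSemaphore(String lockName) {
>         slots.lock(lockName);                      // guard this name's mapping cluster-wide
>         try {
>             Integer slot = slots.get(lockName);
>             if (slot == null) {
>                 assignLock.lock();                 // extra distributed round-trip: the feared penalty
>                 try {
>                     slot = findFreeSlot();
>                     slots.put(lockName, slot);
>                 } finally {
>                     assignLock.unlock();
>                 }
>             }
>             Integer count = refCounts.get(lockName);
>             refCounts.put(lockName, count == null ? 1 : count + 1);
>             // Slot names are reused, so at most poolSize ISemaphore DOs ever exist.
>             return hz.getCPSubsystem().getSemaphore("resolver-slot-" + slot);
>         } finally {
>             slots.unlock(lockName);
>         }
>     }
>
>     /** Drops the name -> slot mapping once nobody uses it; the ISemaphore DO itself stays and is reused. */
>     public void releaseSemaphore(String lockName) {
>         slots.lock(lockName);
>         try {
>             Integer count = refCounts.get(lockName);
>             int remaining = (count == null ? 0 : count) - 1;
>             if (remaining <= 0) {
>                 refCounts.remove(lockName);
>                 slots.remove(lockName);            // the slot becomes reusable by another lock name
>             } else {
>                 refCounts.put(lockName, remaining);
>             }
>         } finally {
>             slots.unlock(lockName);
>         }
>     }
>
>     private int findFreeSlot() {
>         // Naive linear scan; a real implementation would keep a free-list instead.
>         java.util.Set<Integer> used = new java.util.HashSet<>(slots.values());
>         for (int i = 0; i < poolSize; i++) {
>             if (!used.contains(i)) {
>                 return i;
>             }
>         }
>         throw new IllegalStateException("more than " + poolSize + " lock names in use at once");
>     }
> }
> {code}
> Whether the extra IMap and FencedLock round-trips on every acquire/release are acceptable is exactly what the exhaustive perf testing mentioned above would have to show.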
> Still, with proper "segmenting" of the clusters (for example, splitting them by "related" job groups so that the Artifacts passing through each cluster remain within somewhat limited boundaries), or with some automation for regular cluster reboots, or by simply creating "huge enough" clusters, users may never hit these issues (cluster OOM) at all.
> Also, the current code is most probably the fastest solution, so I created this issue just to have the topic documented; I plan no substantive work on it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)