[
https://issues.apache.org/jira/browse/GEODE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirk Lund reassigned GEODE-29:
------------------------------
Assignee: Kirk Lund (was: Jens Deppe)
> Fix all functional/behavioral differences between cache.xml and the public
> Java API.
> ------------------------------------------------------------------------------------
>
> Key: GEODE-29
> URL: https://issues.apache.org/jira/browse/GEODE-29
> Project: Geode
> Issue Type: Improvement
> Components: configuration
> Affects Versions: 1.0.0-incubating
> Environment: Apache Geode configured either with cache.xml, public
> Java API or Gfsh (+Cluster Config, an extension of cache.xml).
> Reporter: John Blum
> Assignee: Kirk Lund
> Priority: Critical
> Labels: ApacheGeode, CacheXML, PublicJavaAPI
>
> Certain _Apache Geode_ functions/behaviors are encapsulated in "internal"
> classes. Therefore, when a developer initially uses {{cache.xml}} to
> configure _Geode_ and then (perhaps) switches to configuring a node
> programmatically using the public Java API with seemingly equivalent and
> complementary configuration logic, certain things cease to "work as expected."
> For example...
> 1. Premature {{GatewayReceiver}} start before Regions exist, resulting in
> event/data loss:
> In {{cache.xml}}, if a developer defines a {{GatewayReceiver}} along with
> Regions that may potentially be updated by the {{GatewayReceiver}}, _Geode_
> is careful not to "start" the {{GatewayReceiver}} until all the Regions have
> been created while processing (parsing and initializing _Geode_ components
> from) the {{cache.xml}}.
> If _Geode_ were to start the {{GatewayReceiver}} "prematurely", and events
> from the remote WAN site arrived before the Regions targeted by those
> events were created, then _Geode_ would drop those events, thus causing data
> loss. Therefore, _Geode's_ logic when processing {{cache.xml}} prevents this
> from happening.
> However, if a developer uses the public Java API to define the same
> configuration, no out-of-the-box protection is offered to prevent event
> (data) loss, thus requiring application developers using the _Geode_ API
> to know how _Geode_ functions "internally".
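> For instance, with the public Java API the developer must order the startup
> explicitly. The following is a minimal sketch of one safe ordering, assuming
> current _Apache Geode_ package names ({{org.apache.geode}}); the class name
> and Region name are illustrative only...
> {code:java}
> import java.io.IOException;
>
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.CacheFactory;
> import org.apache.geode.cache.RegionShortcut;
> import org.apache.geode.cache.wan.GatewayReceiver;
> import org.apache.geode.cache.wan.GatewayReceiverFactory;
>
> public class SafeGatewayReceiverStartup {
>
>   public static void main(String[] args) throws IOException {
>     Cache cache = new CacheFactory().create();
>
>     // Define the GatewayReceiver with manual start so it does not begin
>     // accepting remote WAN events before the target Regions exist.
>     GatewayReceiverFactory receiverFactory = cache.createGatewayReceiverFactory();
>     receiverFactory.setManualStart(true);
>     GatewayReceiver receiver = receiverFactory.create();
>
>     // Create every Region the remote WAN site may update first...
>     cache.createRegionFactory(RegionShortcut.PARTITION).create("Example");
>
>     // ...and only then start accepting WAN traffic.
>     receiver.start();
>   }
> }
> {code}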
> Fortunately, application developers are not completely left to fend for
> themselves and be privy to all the details. Technologies such as _Spring
> Data GemFire_, which also consume and adhere to the _Geode_ public Java API
> (and +only+ the "public" Java API; "internal" classes are not used given they
> are subject to change), are able to handle this using Spring's robust bean
> container lifecycle management features. However, other application
> consumers using the API will not fare as well.
> 2. Another problem stems from the poorly conceived and "imposed" ordering of
> persistent Regions.
> For instance, suppose I have 2 Members, each defining 2 persistent Regions,
> where each Member is the "primary" for 1 of the 2 Regions and the other
> Member hosts the redundant (secondary) copy, like so...
> Member   Regions
> -----------------
> X        B, A'
> Y        A, B'
> Tick (') - indicates the Member (e.g. X) is the primary for a particular
> Region (e.g. A).
> Then, the system can result in a distributed deadlock due to the non-apparent,
> non-arbitrary dependency between the Members caused by an improper
> configuration order of the Regions.
> In this situation, the primary Member for a Region must start before the
> Member hosting the redundant (secondary) Region copy because it is a property
> of _Geode_ that the primary will have the most recent, correct copy of the
> data.
> But, as illustrated above, because I have defined the Regions in an improper
> (arbitrary) order, this system will deadlock on startup. That is, when
> Member X starts, it will attempt to create Region B first. However, Member X
> must wait for Member Y to start since Member Y is the "primary" for Region B.
> Likewise, when Member Y starts, because it tries to create Region A first,
> it too will wait on Member X, which hosts the "primary" copy of Region A,
> thereby leading to a situation where each Member waits for the other,
> resulting in a distributed deadlock.
> This example is fairly scaled down and only gets more complex as you add
> Members and additional Regions in a complex system.
> Of course, the "easy" solution is to ensure that all Members in the cluster
> declaring the Regions define them in their configuration in the "same
> order". This is made even easier with the use of a cluster-wide, shared
> configuration (e.g. using the _Cluster Configuration Service_). So, by
> defining all Regions in the same order on every Member (e.g. A followed by
> B), a developer/user can avoid the distributed deadlock, as sketched below.
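> The following is a minimal sketch of that convention, again assuming current
> _Apache Geode_ package names; the class name and Region shortcut are
> illustrative only...
> {code:java}
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.CacheFactory;
> import org.apache.geode.cache.RegionShortcut;
>
> public class ConsistentRegionOrder {
>
>   // Run the same startup code on both Member X and Member Y: each creates
>   // persistent Region "A" before Region "B", so neither Member ever waits
>   // on the other for a Region it has not reached yet.
>   public static void main(String[] args) {
>     Cache cache = new CacheFactory().create();
>     cache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT).create("A");
>     cache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT).create("B");
>   }
> }
> {code}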
> However, it is naive for _Geode_ to assume users will know/conform to this
> restriction and to impose a non-arbitrary order to work around what is,
> basically, a technical limitation of the code.
> In other environments, such as Spring, you cannot necessarily guarantee what
> the order will be at runtime, especially if application components (e.g.
> DAOs) inject references to GemFire components (e.g. Regions) in combination
> with other advanced Spring container features, like CLASSPATH
> component-scanning, used to wire up the entire application.
> Even "collocation" has an impact on the Region creation order since Spring
> must logically satisfy the "dependency" order of the beans first. This is
> both logical and makes sense, where as Geode's ordering is non-arbitrary and
> non-apparent since any Member could host the redundant copy. Therefore, this
> problem is an implementation detail leaked.
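> To illustrate, consider a hypothetical Spring Java configuration (class and
> bean names are illustrative, and this is plain Spring rather than _Spring
> Data GemFire_): Spring orders bean creation by declared dependencies, not by
> declaration order...
> {code:java}
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.CacheFactory;
> import org.apache.geode.cache.Region;
> import org.apache.geode.cache.RegionShortcut;
> import org.springframework.context.annotation.Bean;
> import org.springframework.context.annotation.Configuration;
>
> @Configuration
> public class ExampleGeodeConfiguration {
>
>   @Bean
>   Cache gemfireCache() {
>     return new CacheFactory().create();
>   }
>
>   // Spring only guarantees a Region bean is created before the beans that
>   // inject it (e.g. a DAO); nothing orders "A" relative to "B" unless an
>   // explicit bean dependency says so.
>   @Bean
>   Region<Object, Object> regionA(Cache gemfireCache) {
>     return gemfireCache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT).create("A");
>   }
>
>   @Bean
>   Region<Object, Object> regionB(Cache gemfireCache) {
>     return gemfireCache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT).create("B");
>   }
> }
> {code}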
> Technically, the same problem can be reproduced in {{cache.xml}} for that
> matter, with no Spring present. And this problem is even more likely to
> happen using the public Java API since, again, there is no special *magic*
> being handled by "internal" Geode classes (in this case), even w.r.t.
> {{cache.xml}}. Users/developers just have to know the correct ordering.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)