[
https://issues.apache.org/jira/browse/IGNITE-27016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070487#comment-18070487
]
Kirill Anisimov edited comment on IGNITE-27016 at 4/2/26 4:52 AM:
------------------------------------------------------------------
[~zstan], I added and validated a reproducer,
{{org.apache.ignite.spi.discovery.zk.ZookeeperDiscoveryMultiJvmNodeRunnerReproducerTest}}.
It targets the serialization path where the discovery SPI can be lost or reset
before the remote JVM starts. The test explicitly applies preprocessing and then
forces the problematic remote startup branch.
The root cause is in *IgniteNodeRunner.storeToFile(..)*: before the fix,
*resetDiscovery=true* unconditionally reset discovery
(*cfg0.setDiscoverySpi(null)*), which dropped the preprocessed
+*ZookeeperDiscoverySpi*+ before the config was transferred to the remote JVM.
The issue {color:#de350b}was masked on master by the fallback{color} in
*IgniteNodeRunner.readCfgFromFileAndDeleteFile(..)* (which replaces a missing SPI with
+*TcpDiscoverySpi*+) and by flows that did not force the problematic branch.
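The store/read hand-off described above can be sketched as a toy model. This is not the Ignite code: the classes below are simplified stand-ins that only borrow the names from this comment, and the method names {{storeBeforeFix}} and {{readWithFallback}} are hypothetical. It only illustrates how a reset to null followed by a fallback silently turns a ZooKeeper configuration into a TCP one.

```java
/** Toy model of the config hand-off to a remote JVM; these are NOT the Ignite classes. */
public class DiscoveryResetSketch {
    /** Stand-in for a discovery SPI; only the concrete type matters here. */
    interface DiscoverySpi { }

    static final class TcpDiscoverySpi implements DiscoverySpi { }

    static final class ZookeeperDiscoverySpi implements DiscoverySpi { }

    /** Stand-in for IgniteConfiguration with just the one field of interest. */
    static final class Config {
        DiscoverySpi discoverySpi;

        Config(DiscoverySpi discoverySpi) {
            this.discoverySpi = discoverySpi;
        }
    }

    /** Models the pre-fix store step: resetDiscovery=true always drops the SPI. */
    static Config storeBeforeFix(Config cfg, boolean resetDiscovery) {
        Config cfg0 = new Config(cfg.discoverySpi); // copy that would be serialized

        if (resetDiscovery)
            cfg0.discoverySpi = null; // the preprocessed ZooKeeper SPI is lost here

        return cfg0;
    }

    /** Models the read step on the remote side: a missing SPI falls back to TCP. */
    static Config readWithFallback(Config stored) {
        if (stored.discoverySpi == null)
            stored.discoverySpi = new TcpDiscoverySpi();

        return stored;
    }

    public static void main(String[] args) {
        Config local = new Config(new ZookeeperDiscoverySpi());
        Config remote = readWithFallback(storeBeforeFix(local, true));

        System.out.println("local:  " + local.discoverySpi.getClass().getSimpleName());
        System.out.println("remote: " + remote.discoverySpi.getClass().getSimpleName());
    }
}
```

In this model the local side keeps {{ZookeeperDiscoverySpi}} while the remote side ends up with {{TcpDiscoverySpi}}, which matches the {{discoSpi=TcpDiscoverySpi}} line in the external node's log.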
The reproducer forces this exact branch (*startRemoteGrid(..)* override +
*resetDiscovery=true* + preprocessing before serialization) and validates it in
*testRemoteNodeStartWithPreprocessedDiscovery*: without the fix, the remote node
does not join ({{Remote node has not joined}}); with the fix, the remote node uses
+*TestZookeeperDiscoverySpi*+ and joins successfully.
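The before/after contrast that the reproducer checks can be illustrated with a small self-contained model. Everything here is an assumption made for illustration: the {{keepPreprocessedSpi}} flag merely stands in for "with fix" versus "without fix" (the actual condition used by the fix is not stated in this comment), and the join rule is reduced to "the remote node must use the same discovery kind as the local node".

```java
/** Toy model of what the reproducer contrasts; NOT the actual test or the actual fix. */
public class ReproducerShapeSketch {
    /** Discovery kinds in this toy model. */
    enum Discovery { ZOOKEEPER, TCP }

    /**
     * Models the store step. keepPreprocessedSpi is a hypothetical switch:
     * false = behaviour without the fix, true = behaviour with the fix.
     */
    static Discovery store(Discovery preprocessed, boolean resetDiscovery, boolean keepPreprocessedSpi) {
        if (resetDiscovery && !keepPreprocessedSpi)
            return null; // discovery is dropped before the hand-off

        return preprocessed;
    }

    /** Models the remote read step: missing discovery falls back to TCP. */
    static Discovery read(Discovery stored) {
        return stored == null ? Discovery.TCP : stored;
    }

    /** Toy join rule: the remote node joins only if it uses the local node's discovery kind. */
    static boolean remoteJoins(Discovery local, boolean keepPreprocessedSpi) {
        Discovery remote = read(store(local, true, keepPreprocessedSpi));

        return remote == local;
    }

    public static void main(String[] args) {
        System.out.println("without fix, joined: " + remoteJoins(Discovery.ZOOKEEPER, false));
        System.out.println("with fix, joined:    " + remoteJoins(Discovery.ZOOKEEPER, true));
    }
}
```

With {{resetDiscovery=true}} forced in both runs, the model reports no join without the fix and a successful join with it, mirroring the two outcomes described for the test.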
> Test GridCacheAtomicMultiJvmFullApiSelfTest fails when started with ZooKeeper
> Discovery
> ---------------------------------------------------------------------------------------
>
> Key: IGNITE-27016
> URL: https://issues.apache.org/jira/browse/IGNITE-27016
> Project: Ignite
> Issue Type: Task
> Components: zookeeper
> Reporter: Sergey Chugunov
> Assignee: Kirill Anisimov
> Priority: Major
> Labels: MakeTeamcityGreenAgain, Newbie, ignite-2, ise
> Fix For: 2.19
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The test
> [fails|https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-1189263754594131880&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E]
> in the master branch with a 100% failure rate.
> An exception in the "main" test logs says that the node in the remote JVM
> did not join the topology in time:
> {code:java}
> java.lang.AssertionError: Remote node has not joined [id=3a39096c-c2ad-4467-a68f-0f031c900001]
>     at org.apache.ignite.testframework.junits.multijvm.IgniteProcessProxy.<init>(IgniteProcessProxy.java:201)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startRemoteGrid(GridAbstractTest.java:1483)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startRemoteGrid(GridAbstractTest.java:1378)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1339)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1240)
>     at org.apache.ignite.internal.processors.cache.GridCacheAbstractFullApiSelfTest.startGrid(GridCacheAbstractFullApiSelfTest.java:349)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1216)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1047)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrids(GridAbstractTest.java:889)
>     at org.apache.ignite.internal.processors.cache.GridCacheAbstractSelfTest.beforeTestsStarted(GridCacheAbstractSelfTest.java:97)
>     at org.apache.ignite.internal.processors.cache.GridCacheAbstractFullApiSelfTest.beforeTestsStarted(GridCacheAbstractFullApiSelfTest.java:213)
> {code}
> The logs of the external node indicate that something goes wrong when the
> configuration is passed from the main test to the external JVM, because that
> node is started with TcpDiscoverySpi instead of ZookeeperDiscoverySpi:
> {noformat}
> [2025-11-11T10:25:19,351][INFO ][Thread-2925][jvm-3a39096c] [2025-11-11T10:25:19,350][INFO ][main][GridCacheAtomicMultiJvmFullApiSelfTest1] IgniteConfiguration [igniteInstanceName=multijvm.GridCacheAtomicMultiJvmFullApiSelfTest1..., discoSpi=TcpDiscoverySpi [addrRslvr=null, addressFilter=null,...]
> {noformat}
> We need to figure out why the configuration is passed incorrectly and fix it.