[ 
https://issues.apache.org/jira/browse/IGNITE-27016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070487#comment-18070487
 ] 

Kirill Anisimov edited comment on IGNITE-27016 at 4/2/26 4:52 AM:
------------------------------------------------------------------

[~zstan], I added and validated a reproducer:
{{org.apache.ignite.spi.discovery.zk.ZookeeperDiscoveryMultiJvmNodeRunnerReproducerTest}}.

It targets exactly the serialization path where the discovery SPI may be lost or
reset before the remote JVM starts. The test explicitly applies preprocessing and
then forces the problematic remote startup branch.

The root cause is in *IgniteNodeRunner.storeToFile(..)*: before the fix,
*resetDiscovery=true* unconditionally reset discovery
(*cfg0.setDiscoverySpi(null)*), which dropped the preprocessed
+*ZookeeperDiscoverySpi*+ before the config was transferred to the remote JVM.

The issue {color:#de350b}was masked on master by a fallback{color} in
*IgniteNodeRunner.readCfgFromFileAndDeleteFile(..)* (which replaces a missing SPI
with +*TcpDiscoverySpi*+) and by flows that did not force the problematic branch.

The reproducer forces this exact branch (*startRemoteGrid(..)* override +
*resetDiscovery=true* + preprocessing before serialization) and validates it in
*testRemoteNodeStartWithPreprocessedDiscovery*: without the fix, the remote node
does not join ({{Remote node has not joined}}); with the fix, the remote node uses
+*TestZookeeperDiscoverySpi*+ and joins successfully.
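
To make the masking effect easier to follow, here is a minimal, self-contained model of the two steps described above. It is only a sketch and not the actual Ignite code: the {{Config}} class, the in-memory {{FILES}} map and the string SPI names are hypothetical stand-ins, and only the method names mirror *storeToFile(..)* and *readCfgFromFileAndDeleteFile(..)*. It models the behavior before the fix and deliberately does not try to reproduce the fix itself.

```java
import java.util.HashMap;
import java.util.Map;

public class DiscoveryResetSketch {
    /** Hypothetical stand-in for a node configuration; it only tracks the discovery SPI name. */
    static final class Config {
        String discoverySpi;

        Config(String discoverySpi) {
            this.discoverySpi = discoverySpi;
        }
    }

    /** In-memory stand-in for the temporary file used to pass the config to the remote JVM. */
    static final Map<String, Config> FILES = new HashMap<>();

    /** Models the pre-fix behavior: resetDiscovery=true always drops the SPI from the stored copy. */
    static void storeToFile(Config cfg, String path, boolean resetDiscovery) {
        Config cfg0 = new Config(cfg.discoverySpi);

        if (resetDiscovery)
            cfg0.discoverySpi = null; // The preprocessed SPI is lost here.

        FILES.put(path, cfg0);
    }

    /** Models the fallback on the remote side: a missing SPI is silently replaced. */
    static Config readCfgFromFileAndDeleteFile(String path) {
        Config cfg = FILES.remove(path);

        if (cfg.discoverySpi == null)
            cfg.discoverySpi = "TcpDiscoverySpi";

        return cfg;
    }

    /** Returns the SPI name the modeled remote node ends up with. */
    static String remoteSpi(boolean resetDiscovery) {
        Config preprocessed = new Config("ZookeeperDiscoverySpi");

        storeToFile(preprocessed, "cfg.tmp", resetDiscovery);

        return readCfgFromFileAndDeleteFile("cfg.tmp").discoverySpi;
    }

    public static void main(String[] args) {
        System.out.println(remoteSpi(true));  // prints "TcpDiscoverySpi"
        System.out.println(remoteSpi(false)); // prints "ZookeeperDiscoverySpi"
    }
}
```

In this model the remote side never sees an error: with the reset enabled it simply comes up with a different discovery SPI than the local node, which matches the symptom in the issue description (the remote node starts with TcpDiscoverySpi and never joins the ZooKeeper-based topology).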



> Test GridCacheAtomicMultiJvmFullApiSelfTest fails when started with ZooKeeper 
> Discovery
> ---------------------------------------------------------------------------------------
>
>                 Key: IGNITE-27016
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27016
>             Project: Ignite
>          Issue Type: Task
>          Components: zookeeper
>            Reporter: Sergey Chugunov
>            Assignee: Kirill Anisimov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, Newbie, ignite-2, ise
>             Fix For: 2.19
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The test 
> [fails|https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-1189263754594131880&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E]
>  in the master branch with a 100% failure rate.
> An exception in the "main" test logs says that the node in the remote JVM 
> hasn't joined the topology in time:
> {code:java}
> java.lang.AssertionError: Remote node has not joined [id=3a39096c-c2ad-4467-a68f-0f031c900001]
>       at org.apache.ignite.testframework.junits.multijvm.IgniteProcessProxy.<init>(IgniteProcessProxy.java:201)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startRemoteGrid(GridAbstractTest.java:1483)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startRemoteGrid(GridAbstractTest.java:1378)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1339)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1240)
>       at org.apache.ignite.internal.processors.cache.GridCacheAbstractFullApiSelfTest.startGrid(GridCacheAbstractFullApiSelfTest.java:349)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1216)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:1047)
>       at org.apache.ignite.testframework.junits.GridAbstractTest.startGrids(GridAbstractTest.java:889)
>       at org.apache.ignite.internal.processors.cache.GridCacheAbstractSelfTest.beforeTestsStarted(GridCacheAbstractSelfTest.java:97)
>       at org.apache.ignite.internal.processors.cache.GridCacheAbstractFullApiSelfTest.beforeTestsStarted(GridCacheAbstractFullApiSelfTest.java:213)
> {code}
> Logs of the external node indicate that something goes wrong when the 
> configuration is passed from the main test to the external JVM, because that 
> node is started with TcpDiscoverySpi instead of ZookeeperDiscoverySpi:
> {noformat}
> [2025-11-11T10:25:19,351][INFO ][Thread-2925][jvm-3a39096c] [2025-11-11T10:25:19,350][INFO ][main][GridCacheAtomicMultiJvmFullApiSelfTest1] IgniteConfiguration [igniteInstanceName=multijvm.GridCacheAtomicMultiJvmFullApiSelfTest1..., discoSpi=TcpDiscoverySpi [addrRslvr=null, addressFilter=null,...]
> {noformat}
> We need to figure out why the configuration is passed incorrectly and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
