This is an automated email from the ASF dual-hosted git repository.

github-merge-queue[bot] pushed a commit to branch 
gh-readonly-queue/main/pr-5027-7bd65500562724eced01d87c254fdab1e9474e7b
in repository https://gitbox.apache.org/repos/asf/texera.git

commit 469e3874c9fb0b167ca190bd57078622488cb721
Author: nathant27 <[email protected]>
AuthorDate: Thu May 28 10:37:19 2026 -0700

    fix(amber): avoids false negative on _check_heartbeat when connection is 
established but close raises exception (#5027)
    
    <!--
    Thanks for sending a pull request (PR)! Here are some tips for you:
    1. If this is your first time, please read our contributor guidelines:
    [Contributing to
    Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
      2. Ensure you have added or run the appropriate tests for your PR
      3. If the PR is work in progress, mark it a draft on GitHub.
      4. Please write your PR title to summarize what this PR proposes, we
        are following Conventional Commits style for PR titles as well.
      5. Be sure to keep the PR description updated to reflect all changes.
    -->
    
    ### What changes were proposed in this PR?
    <!--
    Please clarify what changes you are proposing. The purpose of this
    section
    is to outline the changes. Here are some tips for you:
      1. If you propose a new API, clarify the use case for a new API.
      2. If you fix a bug, you can clarify why it is a bug.
      3. If it is a refactoring, clarify what has been changed.
      3. It would be helpful to include a before-and-after comparison using
         screenshots or GIFs.
      4. Please consider writing useful notes for better and faster reviews.
    -->
    Old version of Heartbeat._check_heartbeat(self) wraps the socket
    connection and socket close in a single try except, and handles with the
    same try except. The issue is that if close raises after a successful
    connection, there is a false negative since it will handle it as if
    socket connection had failed. In Heartbeat.run(self), this could cause
    the server to shutdown on close raising exception since _check_heartbeat
    will return False in this case.
    New version separates the connection handling and the close in two
    adjacent try except blocks, and returns true as long as connection is
    successfully established as well, getting rid of the false negative.
    
    As per discussed in the issue, will keep the double check for
    _check_heartbeat in Heartbeat.run.
    
    Old Version:
    ```
        def _check_heartbeat(self) -> bool:
            """
            Attempt to connect to JVM on the specific port. If succeeds, it 
means the
            socket is still available and the JVM is still alive. Otherwise, 
the JVM
            might have been gone.
    
            :return: bool, indicating if the socket is available.
            """
            try:
                temp_socket = socket.create_connection(
                    (self._parsed_server_host, self._parsed_server_port), 
timeout=1
                )
                temp_socket.close()
                return True
            except Exception as e:
                logger.warning(f"Server is down with exception: {e}")
                return False
    ```
    
    New Version:
    ```
        def _check_heartbeat(self) -> bool:
            """
            Attempt to connect to the JVM port. If connection failure, then JVM 
is dead, or if connection success
            then JVM is alive even if close() raises. Logs on connection 
failure and on close error.
    
            :return: bool, True if connect succeeded, False if connect failed.
            """
            try:
                temp_socket = socket.create_connection(
                    (self._parsed_server_host, self._parsed_server_port), 
timeout=1
                )
            except Exception as e:
                logger.warning(f"Server is down with exception: {e}")
                return False
    
            try:
                temp_socket.close()
            except Exception as e:
                logger.warning(f"Failed to close socket: {e}")
    
            return True
    ```
    
    ### Any related issues, documentation, discussions?
    Closes #4726
    
    ### How was this PR tested?
    Changed old test to reflect new behavior(True on connection succeeds but
    socket close rasises)
    OLD TEST(REMOVED):
    ```
        def test_returns_false_when_socket_close_raises(self):
            # Pins the false-negative path: connect succeeds but the subsequent
            # close() throws (e.g. broken pipe on a half-open socket). The bare
            # `except Exception` in _check_heartbeat() catches it and reports
            # "server down", and a regression that narrows that handler would be
            # caught here.
            hb = make_heartbeat()
            fake_sock = MagicMock()
            fake_sock.close.side_effect = OSError("close failed")
            with patch(
                "core.runnables.heartbeat.socket.create_connection",
                return_value=fake_sock,
            ):
                assert hb._check_heartbeat() is False
    ```
    NEW TEST:
    ```
        def 
test_returns_true_when_connection_succeeds_but_socket_close_raises(self):
            hb = make_heartbeat()
            fake_sock = MagicMock()
            fake_sock.close.side_effect = OSError("close failed")
            with patch(
                "core.runnables.heartbeat.socket.create_connection",
                return_value=fake_sock,
            ):
                assert hb._check_heartbeat() is True
    ```
    Passes on existing old tests
    
    ### Was this PR authored or co-authored using generative AI tooling?
    No
---
 amber/src/main/python/core/runnables/heartbeat.py      | 16 ++++++++++------
 amber/src/test/python/core/runnables/test_heartbeat.py | 10 +++-------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/amber/src/main/python/core/runnables/heartbeat.py 
b/amber/src/main/python/core/runnables/heartbeat.py
index 1e4c45837d..941782f699 100644
--- a/amber/src/main/python/core/runnables/heartbeat.py
+++ b/amber/src/main/python/core/runnables/heartbeat.py
@@ -79,22 +79,26 @@ class Heartbeat(Runnable, Stoppable):
 
     def _check_heartbeat(self) -> bool:
         """
-        Attempt to connect to JVM on the specific port. If succeeds, it means 
the
-        socket is still available and the JVM is still alive. Otherwise, the 
JVM
-        might have been gone.
+        Attempt to connect to the JVM port. Returns True iff 
socket.create_connection succeeds;
+        a close() failure after a successful connection is logged but DOES NOT 
flip the return value.
 
-        :return: bool, indicating if the socket is available.
+        :return: bool, True if connect succeeded, False if connect failed.
         """
         try:
             temp_socket = socket.create_connection(
                 (self._parsed_server_host, self._parsed_server_port), timeout=1
             )
-            temp_socket.close()
-            return True
         except Exception as e:
             logger.warning(f"Server is down with exception: {e}")
             return False
 
+        try:
+            temp_socket.close()
+        except Exception as e:
+            logger.warning(f"Failed to close socket: {e}")
+
+        return True
+
     @overrides
     def stop(self):
         # clean up every process under the python worker
diff --git a/amber/src/test/python/core/runnables/test_heartbeat.py 
b/amber/src/test/python/core/runnables/test_heartbeat.py
index 76dc93071a..0fe5b321f5 100644
--- a/amber/src/test/python/core/runnables/test_heartbeat.py
+++ b/amber/src/test/python/core/runnables/test_heartbeat.py
@@ -79,12 +79,7 @@ class TestCheckHeartbeat:
         ):
             assert hb._check_heartbeat() is False
 
-    def test_returns_false_when_socket_close_raises(self):
-        # Pins the false-negative path: connect succeeds but the subsequent
-        # close() throws (e.g. broken pipe on a half-open socket). The bare
-        # `except Exception` in _check_heartbeat() catches it and reports
-        # "server down", and a regression that narrows that handler would be
-        # caught here.
+    def 
test_returns_true_when_connection_succeeds_but_socket_close_raises(self):
         hb = make_heartbeat()
         fake_sock = MagicMock()
         fake_sock.close.side_effect = OSError("close failed")
@@ -92,7 +87,8 @@ class TestCheckHeartbeat:
             "core.runnables.heartbeat.socket.create_connection",
             return_value=fake_sock,
         ):
-            assert hb._check_heartbeat() is False
+            assert hb._check_heartbeat() is True
+            fake_sock.close.assert_called_once()
 
 
 class TestRunEarlyExit:

Reply via email to