This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e837ff9707ef [SPARK-54123][PYTHON] Add timezone to make the timestamp an absolute time
e837ff9707ef is described below

commit e837ff9707ef5d08cde53c338dfaf7d81a5c55aa
Author: Takuya Ueshin <[email protected]>
AuthorDate: Sun Nov 2 16:49:41 2025 +0900

    [SPARK-54123][PYTHON] Add timezone to make the timestamp an absolute time
    
    ### What changes were proposed in this pull request?
    
    Appends the timezone offset to the timestamp in the log record JSON string so that it represents an absolute time.
    
    ### Why are the changes needed?
    
    Without the timezone offset, the timestamp of each log record is displayed unchanged regardless of the session timezone, which is confusing.
    
    <details>
    <summary>example</summary>
    
    ```python
    >>> from pyspark.sql.functions import *
    >>> import logging
    >>>
    >>> @udf
    ... def logging_test_udf(x):
    ...     logger = logging.getLogger("test")
    ...     logger.warning(f"message")
    ...     return str(x)
    ...
    >>>
    >>> spark.conf.set("spark.sql.pyspark.worker.logging.enabled", True)
    >>>
    >>> spark.range(1).select(logging_test_udf("id")).show()
    ...
    ```
    
    </details>
    
    - Before
    
    ```python
    >>> spark.conf.get('spark.sql.session.timeZone')
    'America/Los_Angeles'
    >>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
    +--------------------------+
    |ts                        |
    +--------------------------+
    |2025-10-31 17:17:59.495541|
    +--------------------------+
    
    >>> spark.conf.set('spark.sql.session.timeZone', 'UTC')
    >>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
    +--------------------------+
    |ts                        |
    +--------------------------+
    |2025-10-31 17:17:59.495541|
    +--------------------------+
    ```
    
    - After
    
    ```python
    >>> spark.conf.get('spark.sql.session.timeZone')
    'America/Los_Angeles'
    >>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
    +--------------------------+
    |ts                        |
    +--------------------------+
    |2025-10-31 17:19:52.152868|
    +--------------------------+
    
    >>> spark.conf.set('spark.sql.session.timeZone', 'UTC')
    >>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
    +--------------------------+
    |ts                        |
    +--------------------------+
    |2025-11-01 00:19:52.152868|
    +--------------------------+
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, the timestamp of each log record is now an absolute time.
    
    ### How was this patch tested?
    
    Manually.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #52823 from ueshin/issues/SPARK-54123/timezone.
    
    Authored-by: Takuya Ueshin <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/logger/worker_io.py | 1 +
 1 file changed, 1 insertion(+)
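
The one-line change below appends the `%z` UTC offset to the formatted timestamp. As a rough, standalone illustration of the same idea (not the actual `JSONFormatterWithMarker` code), the sketch below uses a hypothetical `OffsetFormatter` built on the standard `logging.Formatter`. It assumes the platform's `strftime` renders `%z` as a numeric offset such as `-0700`, and unlike the patch it appends the offset even when a custom `datefmt` is given.

```python
import logging
import time
from datetime import datetime


class OffsetFormatter(logging.Formatter):
    """Hypothetical formatter that appends the UTC offset to the log time."""

    def formatTime(self, record, datefmt=None):
        # Default formatting, e.g. '2025-10-31 17:19:52,152' (no offset).
        s = super().formatTime(record, datefmt)
        # Same converter the base class uses (time.localtime by default).
        ct = self.converter(record.created)
        # '%z' renders the offset of that struct_time, e.g. '-0700'.
        return f"{s}{time.strftime('%z', ct)}"


# Build a record by hand so the sketch runs without configuring handlers.
record = logging.LogRecord("test", logging.WARNING, "example.py", 0, "message", None, None)
stamp = OffsetFormatter().formatTime(record)
print(stamp)

# With the offset attached, the string parses to a timezone-aware datetime,
# i.e. an absolute point in time rather than a wall-clock reading.
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f%z")
print(parsed.tzinfo is not None)
```

Because the parsed value carries its own offset, a reader such as Spark can convert it into whichever session timezone is active, which matches the difference between the "Before" and "After" outputs above.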

diff --git a/python/pyspark/logger/worker_io.py b/python/pyspark/logger/worker_io.py
index 2e5ced2e84ad..79684b7aca62 100644
--- a/python/pyspark/logger/worker_io.py
+++ b/python/pyspark/logger/worker_io.py
@@ -164,6 +164,7 @@ class JSONFormatterWithMarker(JSONFormatter):
                 )
             elif self.default_msec_format:
                 s = self.default_msec_format % (s, record.msecs)
+            s = f"{s}{time.strftime('%z', ct)}"
         return s
 
 

