This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e837ff9707ef [SPARK-54123][PYTHON] Add timezone to make the timestamp
an absolute time
e837ff9707ef is described below
commit e837ff9707ef5d08cde53c338dfaf7d81a5c55aa
Author: Takuya Ueshin <[email protected]>
AuthorDate: Sun Nov 2 16:49:41 2025 +0900
[SPARK-54123][PYTHON] Add timezone to make the timestamp an absolute time
### What changes were proposed in this pull request?
Appends the timezone offset to the timestamp in the log record JSON string,
so that the timestamp denotes an absolute time.
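
A minimal sketch of the idea, assuming a plain `logging.Formatter` subclass rather than Spark's actual `JSONFormatterWithMarker`. The class name `OffsetFormatter` and the sample record are hypothetical. Only the `time.strftime('%z', ct)` call mirrors the change in this commit.

```python
import logging
import time


class OffsetFormatter(logging.Formatter):
    """Hypothetical formatter that appends the UTC offset to the timestamp."""

    def formatTime(self, record, datefmt=None):
        # Same conversion the base class uses (time.localtime by default).
        ct = self.converter(record.created)
        s = super().formatTime(record, datefmt)
        # %z renders the offset of ct in +HHMM / -HHMM form.
        return f"{s}{time.strftime('%z', ct)}"


# Build a throwaway record just to exercise formatTime.
record = logging.LogRecord(
    "test", logging.WARNING, "example.py", 0, "message", None, None
)
stamp = OffsetFormatter().formatTime(record)
print(stamp)
```

The printed value depends on the clock and the local timezone of the machine running it. With the offset attached, a consumer can map the string to a single instant regardless of where it is read.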
### Why are the changes needed?
Without the timezone, the timestamp of each log record is treated as a
wall-clock value and doesn't shift with the session timezone, which is
confusing.
<details>
<summary>example</summary>
```python
>>> from pyspark.sql.functions import *
>>> import logging
>>>
>>> @udf
... def logging_test_udf(x):
...     logger = logging.getLogger("test")
...     logger.warning("message")
...     return str(x)
...
>>>
>>> spark.conf.set("spark.sql.pyspark.worker.logging.enabled", True)
>>>
>>> spark.range(1).select(logging_test_udf("id")).show()
...
```
</details>
- Before
```python
>>> spark.conf.get('spark.sql.session.timeZone')
'America/Los_Angeles'
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-10-31 17:17:59.495541|
+--------------------------+
>>> spark.conf.set('spark.sql.session.timeZone', 'UTC')
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-10-31 17:17:59.495541|
+--------------------------+
```
- After
```python
>>> spark.conf.get('spark.sql.session.timeZone')
'America/Los_Angeles'
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-10-31 17:19:52.152868|
+--------------------------+
>>> spark.conf.set('spark.sql.session.timeZone', 'UTC')
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-11-01 00:19:52.152868|
+--------------------------+
```
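
The "After" output can be checked by hand. The sketch below assumes the log record carries the string `2025-10-31 17:19:52.152868-0700`. The exact serialized form is an assumption; `-0700` is the offset for `America/Los_Angeles` on that date. It parses the string and converts it to UTC with the standard library only.

```python
from datetime import datetime, timezone

# Assumed shape of the timestamp once the offset is appended.
raw = "2025-10-31 17:19:52.152868-0700"

# %z makes the parsed datetime timezone-aware, i.e. a single instant.
instant = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f%z")

# Rendering the same instant in UTC moves the wall-clock value by 7 hours.
as_utc = instant.astimezone(timezone.utc)
utc_text = as_utc.strftime("%Y-%m-%d %H:%M:%S.%f")
print(utc_text)  # 2025-11-01 00:19:52.152868
```

This matches the value shown after switching `spark.sql.session.timeZone` to `UTC`.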
### Does this PR introduce _any_ user-facing change?
Yes, the timestamp of each log record is now an absolute time.
### How was this patch tested?
Manually.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52823 from ueshin/issues/SPARK-54123/timezone.
Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/logger/worker_io.py | 1 +
1 file changed, 1 insertion(+)
diff --git a/python/pyspark/logger/worker_io.py b/python/pyspark/logger/worker_io.py
index 2e5ced2e84ad..79684b7aca62 100644
--- a/python/pyspark/logger/worker_io.py
+++ b/python/pyspark/logger/worker_io.py
@@ -164,6 +164,7 @@ class JSONFormatterWithMarker(JSONFormatter):
)
elif self.default_msec_format:
s = self.default_msec_format % (s, record.msecs)
+ s = f"{s}{time.strftime('%z', ct)}"
return s