bito-code-review[bot] commented on code in PR #35042:
URL: https://github.com/apache/superset/pull/35042#discussion_r2330828139
##########
superset/utils/core.py:
##########
@@ -1858,6 +1860,62 @@ def get_legacy_time_column(
)
+def _process_datetime_column(
+    df: pd.DataFrame,
+    col: DateColumn,
+) -> None:
+    """Process a single datetime column with format detection."""
+    if col.timestamp_format in ("epoch_s", "epoch_ms"):
+        dttm_series = df[col.col_label]
+        if is_numeric_dtype(dttm_series):
+            # Column is formatted as a numeric value
+            unit = col.timestamp_format.replace("epoch_", "")
+            df[col.col_label] = pd.to_datetime(
+                dttm_series,
+                utc=False,
+                unit=unit,
+                origin="unix",
+                errors="coerce",
+                exact=False,
+            )
+        else:
+            # Column has already been formatted as a timestamp.
+            try:
+                df[col.col_label] = dttm_series.apply(
+                    lambda x: pd.Timestamp(x) if pd.notna(x) else pd.NaT
+                )
+            except ValueError:
+                logger.warning(
+                    "Unable to convert column %s to datetime, ignoring",
+                    col.col_label,
+                )
+    else:
+        # Try to detect format if not specified
+        format_to_use = col.timestamp_format or detect_datetime_format(
+            df[col.col_label]
+        )
+
+        # Parse with or without format (suppress warning if no format)
+        if format_to_use:
+            df[col.col_label] = pd.to_datetime(
+                df[col.col_label],
+                utc=False,
+                format=format_to_use,
+                errors="coerce",
+                exact=False,
+            )
+        else:
+            with warnings.catch_warnings():
+                warnings.filterwarnings("ignore", message=".*Could not infer format.*")
+                df[col.col_label] = pd.to_datetime(
+                    df[col.col_label],
+                    utc=False,
+                    format=None,
+                    errors="coerce",
+                    exact=False,
+                )
Review Comment:
<div>
<div id="suggestion">
<div id="issue"><b>Inconsistent datetime parsing</b></div>
<div id="fix">
When no format is detected, the fallback call to `pd.to_datetime` passes
`format=None` together with `exact=False`, which can lead to inconsistent
parsing and potential data corruption. If `detect_datetime_format` returns
`None` (no consistent format was found), pandas is left to infer the format
flexibly, so the same data can parse differently in different contexts. This
affects every caller on the path
`normalize_dttm_col` -> `_process_datetime_column` -> `pandas.to_datetime`.
Change `exact=False` to `exact=True` so that parsing is consistent when no
format is specified.
</div>
<details>
<summary>
<b>Code suggestion</b>
</summary>
<blockquote>Check the AI-generated fix before applying</blockquote>
<div id="code">
```suggestion
                    df[col.col_label],
                    utc=False,
                    format=None,
                    errors="coerce",
                    exact=True,
                )
```
</div>
</details>
</div>
<small><i>Code Review Run <a
href=https://github.com/apache/superset/pull/35042#issuecomment-3267139563>#5caaa0</a></i></small>
</div>
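
For reviewers weighing this suggestion, here is a minimal sketch of what the `exact` flag does in `pd.to_datetime`. The sample values and the `%d/%m/%Y` format are illustrative assumptions, not taken from the PR. The sketch passes an explicit `format`, because that is the case where the effect of `exact` is documented. How `exact` interacts with `format=None` may depend on the pandas version, so the fallback branch is worth checking separately before applying the change.

```python
import pandas as pd

# Hypothetical sample: one clean date and one with trailing text.
raw = pd.Series(["15/01/2024", "15/01/2024 (approx)"])

# exact=False: the format may match anywhere in the string,
# so the trailing text is ignored and both rows parse.
loose = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce", exact=False)

# exact=True: the whole string must match the format,
# so the second row is coerced to NaT.
strict = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce", exact=True)

print(loose.tolist())
print(strict.tolist())
```

With `errors="coerce"`, a row rejected under `exact=True` becomes `NaT` without raising, so the stricter setting turns values that would parse loosely into missing values.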
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]