Re: [PR] fix(utils): Suppress pandas date parsing warnings in normalize_dttm_col [superset]

via GitHub Mon, 08 Sep 2025 10:15:32 -0700


bito-code-review[bot] commented on code in PR #35042:
URL: https://github.com/apache/superset/pull/35042#discussion_r2330828139



##########
superset/utils/core.py:
##########
@@ -1858,6 +1860,62 @@ def get_legacy_time_column(
         )
 
 
+def _process_datetime_column(
+    df: pd.DataFrame,
+    col: DateColumn,
+) -> None:
+    """Process a single datetime column with format detection."""
+    if col.timestamp_format in ("epoch_s", "epoch_ms"):
+        dttm_series = df[col.col_label]
+        if is_numeric_dtype(dttm_series):
+            # Column is formatted as a numeric value
+            unit = col.timestamp_format.replace("epoch_", "")
+            df[col.col_label] = pd.to_datetime(
+                dttm_series,
+                utc=False,
+                unit=unit,
+                origin="unix",
+                errors="coerce",
+                exact=False,
+            )
+        else:
+            # Column has already been formatted as a timestamp.
+            try:
+                df[col.col_label] = dttm_series.apply(
+                    lambda x: pd.Timestamp(x) if pd.notna(x) else pd.NaT
+                )
+            except ValueError:
+                logger.warning(
+                    "Unable to convert column %s to datetime, ignoring",
+                    col.col_label,
+                )
+    else:
+        # Try to detect format if not specified
+        format_to_use = col.timestamp_format or detect_datetime_format(
+            df[col.col_label]
+        )
+
+        # Parse with or without format (suppress warning if no format)
+        if format_to_use:
+            df[col.col_label] = pd.to_datetime(
+                df[col.col_label],
+                utc=False,
+                format=format_to_use,
+                errors="coerce",
+                exact=False,
+            )
+        else:
+            with warnings.catch_warnings():
+                warnings.filterwarnings("ignore", message=".*Could not infer format.*")
+                df[col.col_label] = pd.to_datetime(
+                    df[col.col_label],
+                    utc=False,
+                    format=None,
+                    errors="coerce",
+                    exact=False,
+                )

Review Comment:
   
   <div>
   
   
   <div id="suggestion">
   <div id="issue"><b>Inconsistent datetime parsing</b></div>
   <div id="fix">
   
   The fallback branch, taken when no format is specified or detected, calls 
`pd.to_datetime` with `exact=False`, which can lead to inconsistent parsing 
and potential data corruption. When `detect_datetime_format` returns None 
(indicating no consistent format was found), pandas infers the format itself, 
and non-exact matching can produce different results for the same data in 
different contexts. Every caller on the path `normalize_dttm_col` -> 
`_process_datetime_column` -> `pandas.to_datetime` is affected. Change 
`exact=False` to `exact=True` in this branch so that parsing stays consistent 
when no format is specified.
   </div>
   <details>
   <summary>
   <b>Code suggestion</b>
   </summary>
   <blockquote>Check the AI-generated fix before applying</blockquote>
   <div id="code">
   
   
   ```suggestion
                   df[col.col_label],
                   utc=False,
                   format=None,
                   errors="coerce",
                   exact=True,
               )
   ```
   
   </div>
   </details>
   </div>
   
   
   
   <small><i>Code Review Run <a href=https://github.com/apache/superset/pull/35042#issuecomment-3267139563>#5caaa0</a></i></small>
   </div>
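
To make the `exact` flag discussed above concrete, here is a minimal sketch of how it changes matching when an explicit format is passed. The sample values and the `%d/%m/%Y` format are assumptions chosen for illustration and do not come from the PR. The sketch does not cover the `format=None` path, where the outcome also depends on the format pandas infers.

```python
import pandas as pd

# Hypothetical sample: one value carries trailing text after the date.
values = pd.Series(["15/01/2024 trailing", "16/01/2024"])

# exact=True: the whole string must match the format, so the first value
# fails to parse and errors="coerce" turns it into NaT.
strict = pd.to_datetime(values, format="%d/%m/%Y", errors="coerce", exact=True)

# exact=False: the format may match anywhere in the string, so the date
# embedded in the first value is still extracted.
loose = pd.to_datetime(values, format="%d/%m/%Y", errors="coerce", exact=False)

print(strict)
print(loose)
```

With `errors="coerce"`, the strict variant reports a mismatch as a missing value rather than raising, so a caller would need to check for `NaT` to notice it.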
   
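For context, the fallback branch scopes its warning filter with `warnings.catch_warnings()`. The sketch below shows that pattern in isolation using only the standard library. `noisy_parse` is a hypothetical stand-in for the parsing call, and its warning texts are invented for the example.

```python
import warnings


def noisy_parse(value: str) -> str:
    """Hypothetical stand-in for a parser that emits two warnings."""
    warnings.warn("Could not infer format, falling back", UserWarning)
    warnings.warn("Some unrelated notice", UserWarning)
    return value


with warnings.catch_warnings(record=True) as caught:
    # Record everything by default so the effect of the filter is visible.
    warnings.simplefilter("always")
    # Same pattern as the diff: ignore only messages matching this regex.
    warnings.filterwarnings("ignore", message=".*Could not infer format.*")
    result = noisy_parse("2024-01-15")

# Only the warning that did not match the ignore filter was recorded.
messages = [str(w.message) for w in caught]
print(messages)
```

The filter applies only inside the `with` block, and warnings that do not match the message pattern still surface.
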



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


