korbit-ai[bot] commented on code in PR #35318:
URL: https://github.com/apache/superset/pull/35318#discussion_r2386409370


##########
superset/config.py:
##########
@@ -837,6 +837,9 @@ class D3TimeFormat(TypedDict, total=False):
 }
 THUMBNAIL_ERROR_CACHE_TTL = int(timedelta(days=1).total_seconds())
 
+# Cache warmup user
+SUPERSET_CACHE_WARMUP_USER = "admin"

Review Comment:
   ### Hardcoded cache warmup user may not exist <sub>![category 
Functionality](https://img.shields.io/badge/Functionality-0284c7)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The hardcoded cache warmup user 'admin' may not exist in all Superset 
installations, potentially causing cache warmup operations to fail.
   
   
   ###### Why this matters
   If the 'admin' user doesn't exist in a particular Superset deployment, any 
cache warmup operations that rely on this user will fail, breaking the new 
WebDriver-based cache warmup functionality.
   
   ###### Suggested change ∙ *Feature Preview*
   Make the cache warmup user configurable with a fallback mechanism or 
validation. For example:
   
   ```python
   # Cache warmup user - should be configured per deployment
   # Ensure this user exists and has appropriate permissions
   SUPERSET_CACHE_WARMUP_USER = os.environ.get("SUPERSET_CACHE_WARMUP_USER", 
"admin")
   ```
   
   Or add validation logic to ensure the user exists before attempting cache 
warmup operations.
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/6cfa437d-2a6f-4d54-b38d-4f08a2048966/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/6cfa437d-2a6f-4d54-b38d-4f08a2048966?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/6cfa437d-2a6f-4d54-b38d-4f08a2048966?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/6cfa437d-2a6f-4d54-b38d-4f08a2048966?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/6cfa437d-2a6f-4d54-b38d-4f08a2048966)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:37495b7c-8543-4931-b3a6-4ba8703c031f -->
   
   
   [](37495b7c-8543-4931-b3a6-4ba8703c031f)



##########
superset/utils/webdriver.py:
##########
@@ -243,7 +246,30 @@ def get_screenshot(  # pylint: disable=too-many-locals, 
too-many-statements  # n
 
 
 class WebDriverSelenium(WebDriverProxy):
-    def create(self) -> WebDriver:
+    def __init__(
+        self,
+        driver_type: str,
+        window: WindowSize | None = None,
+        user: User | None = None,
+    ):
+        super().__init__(driver_type, window)
+        self._user = user
+        self._driver = None
+
+    def __del__(self) -> None:
+        self._destroy()

Review Comment:
   ### Unreliable resource cleanup with __del__ <sub>![category 
Performance](https://img.shields.io/badge/Performance-4f46e5)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   Using `__del__` for resource cleanup of WebDriver instances is unreliable 
and can cause resource leaks.
   
   
   ###### Why this matters
   Python's garbage collector doesn't guarantee when or if `__del__` will be 
called, potentially leaving WebDriver processes running indefinitely and 
consuming system resources.
   
   ###### Suggested change ∙ *Feature Preview*
   Implement a context manager protocol (`__enter__`/`__exit__`) or provide an 
explicit `close()` method that users must call, and document the resource 
management requirements clearly.
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/161cd273-8db8-4d92-bd3e-efb4a66a9e45/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/161cd273-8db8-4d92-bd3e-efb4a66a9e45?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/161cd273-8db8-4d92-bd3e-efb4a66a9e45?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/161cd273-8db8-4d92-bd3e-efb4a66a9e45?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/161cd273-8db8-4d92-bd3e-efb4a66a9e45)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:0a98da93-e3c2-4b2c-b82b-5e54d55b8856 -->
   
   
   [](0a98da93-e3c2-4b2c-b82b-5e54d55b8856)



##########
superset/tasks/cache.py:
##########
@@ -118,8 +109,12 @@ class DummyStrategy(Strategy):  # pylint: 
disable=too-few-public-methods
 
     name = "dummy"
 
-    def get_tasks(self) -> list[CacheWarmupTask]:
-        return [get_task(chart) for chart in db.session.query(Slice).all()]
+    def get_urls(self) -> list[str]:
+        dashboards = (
+            
db.session.query(Dashboard).filter(Dashboard.published.is_(True)).all()
+        )
+
+        return [get_dash_url(dashboard) for dashboard in dashboards if 
dashboard.slices]

Review Comment:
   ### Unbounded memory usage in dashboard query <sub>![category 
Performance](https://img.shields.io/badge/Performance-4f46e5)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   Loading all published dashboards into memory at once using .all() without 
pagination or batching.
   
   
   ###### Why this matters
   This approach can consume excessive memory and cause performance degradation 
or out-of-memory errors when there are thousands of dashboards. The entire 
result set is loaded into memory before filtering.
   
   ###### Suggested change ∙ *Feature Preview*
   Use database-level filtering and implement batching or streaming:
   
   ```python
   # Filter at database level to avoid loading unnecessary data
   dashboards = (
       db.session.query(Dashboard)
       .filter(Dashboard.published.is_(True))
       .filter(Dashboard.slices.any())  # Only dashboards with slices
   )
   
   # Process in batches to control memory usage
   batch_size = 100
   urls = []
   for offset in range(0, dashboards.count(), batch_size):
       batch = dashboards.offset(offset).limit(batch_size).all()
       urls.extend([get_dash_url(dashboard) for dashboard in batch])
   return urls
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/1de87a6a-f9f5-4830-a2a6-c9b206c68a81/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/1de87a6a-f9f5-4830-a2a6-c9b206c68a81?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/1de87a6a-f9f5-4830-a2a6-c9b206c68a81?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/1de87a6a-f9f5-4830-a2a6-c9b206c68a81?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/1de87a6a-f9f5-4830-a2a6-c9b206c68a81)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:11f43670-85f7-4233-9830-33bcb5a2b6ff -->
   
   
   [](11f43670-85f7-4233-9830-33bcb5a2b6ff)



##########
superset/utils/webdriver.py:
##########
@@ -243,7 +246,30 @@ def get_screenshot(  # pylint: disable=too-many-locals, 
too-many-statements  # n
 
 
 class WebDriverSelenium(WebDriverProxy):
-    def create(self) -> WebDriver:
+    def __init__(
+        self,
+        driver_type: str,
+        window: WindowSize | None = None,
+        user: User | None = None,
+    ):
+        super().__init__(driver_type, window)
+        self._user = user
+        self._driver = None
+
+    def __del__(self) -> None:
+        self._destroy()
+
+    @property
+    def driver(self) -> WebDriver:
+        if not self._driver:
+            self._driver = self._create()
+            assert self._driver  # for mypy
+            self._driver.set_window_size(*self._window)
+            if self._user:
+                self._auth(self._user)
+        return self._driver

Review Comment:
   ### No error recovery in expensive driver initialization <sub>![category 
Performance](https://img.shields.io/badge/Performance-4f46e5)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The driver property performs expensive operations (driver creation, window 
sizing, authentication) on every first access without any caching validation or 
error recovery.
   
   
   ###### Why this matters
   If driver creation fails partially (e.g., after `_create()` but before 
`set_window_size()`), subsequent calls will return a malformed driver instance, 
causing cascading failures without retry mechanisms.
   
   ###### Suggested change ∙ *Feature Preview*
   Wrap the entire initialization block in try-catch, reset `self._driver = 
None` on any failure, and consider adding a flag to track successful 
initialization completion.
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7659419f-dc90-4e0e-96bd-1f1dd80381b5/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7659419f-dc90-4e0e-96bd-1f1dd80381b5?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7659419f-dc90-4e0e-96bd-1f1dd80381b5?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7659419f-dc90-4e0e-96bd-1f1dd80381b5?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7659419f-dc90-4e0e-96bd-1f1dd80381b5)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:f0206eab-bce1-4e73-a494-a25d2d2372b3 -->
   
   
   [](f0206eab-bce1-4e73-a494-a25d2d2372b3)



##########
superset/utils/webdriver.py:
##########
@@ -243,7 +246,30 @@ def get_screenshot(  # pylint: disable=too-many-locals, 
too-many-statements  # n
 
 
 class WebDriverSelenium(WebDriverProxy):
-    def create(self) -> WebDriver:
+    def __init__(
+        self,
+        driver_type: str,
+        window: WindowSize | None = None,
+        user: User | None = None,
+    ):
+        super().__init__(driver_type, window)
+        self._user = user
+        self._driver = None

Review Comment:
   ### Missing thread safety in lazy driver initialization <sub>![category 
Performance](https://img.shields.io/badge/Performance-4f46e5)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   WebDriverSelenium now stores driver instance as class state with lazy 
initialization, but lacks proper thread safety mechanisms for concurrent access.
   
   
   ###### Why this matters
   Multiple threads could simultaneously check `if not self._driver` and create 
multiple driver instances, leading to resource leaks and unpredictable behavior 
in concurrent environments.
   
   ###### Suggested change ∙ *Feature Preview*
   Add thread synchronization using `threading.Lock()` around the driver 
creation check and initialization in the `driver` property to ensure only one 
thread creates the driver instance.
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c46e220-ca9f-4634-910a-d14548c384cd/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c46e220-ca9f-4634-910a-d14548c384cd?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c46e220-ca9f-4634-910a-d14548c384cd?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c46e220-ca9f-4634-910a-d14548c384cd?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/4c46e220-ca9f-4634-910a-d14548c384cd)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:4a080eae-3c7e-4fd2-b0d0-be5fecaa30bc -->
   
   
   [](4a080eae-3c7e-4fd2-b0d0-be5fecaa30bc)



##########
superset/tasks/cache.py:
##########
@@ -307,25 +239,20 @@ def cache_warmup(
         logger.exception(message)
         return message
 
-    results: dict[str, list[str]] = {"scheduled": [], "errors": []}
-    for task in strategy.get_tasks():
-        username = task["username"]
-        payload = json.dumps(task["payload"])
-        if username:
-            try:
-                user = security_manager.get_user_by_username(username)
-                cookies = MachineAuthProvider.get_auth_cookies(user)
-                headers = {
-                    "Cookie": f"session={cookies.get('session', '')}",
-                    "Content-Type": "application/json",
-                }
-                logger.info("Scheduling %s", payload)
-                fetch_url.delay(payload, headers)
-                results["scheduled"].append(payload)
-            except SchedulingError:
-                logger.exception("Error scheduling fetch_url for payload: %s", 
payload)
-                results["errors"].append(payload)
-        else:
-            logger.warn("Executor not found for %s", payload)
+    results: dict[str, list[str]] = {"success": [], "errors": []}
+
+    user = security_manager.find_user(
+        username=current_app.config["SUPERSET_CACHE_WARMUP_USER"]
+    )
+    wd = WebDriverSelenium(current_app.config["WEBDRIVER_TYPE"], user=user)
+
+    for url in strategy.get_urls():
+        try:
+            logger.info("Fetching %s", url)
+            wd.get_screenshot(url, "grid-container")
+            results["success"].append(url)
+        except URLError:
+            logger.exception("Error warming up cache!")
+            results["errors"].append(url)

Review Comment:
   ### WebDriver resource leak in cache warmup loop <sub>![category 
Performance](https://img.shields.io/badge/Performance-4f46e5)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   WebDriver instance is created once and reused across all URLs without proper 
cleanup or session management.
   
   
   ###### Why this matters
   This can lead to memory leaks, stale browser sessions, and potential crashes 
when processing many URLs. Browser resources accumulate without being released, 
and connection timeouts or browser crashes can affect subsequent URL processing.
   
   ###### Suggested change ∙ *Feature Preview*
   Implement proper WebDriver lifecycle management by either creating a new 
instance per URL or adding explicit cleanup in a try-finally block:
   
   ```python
   for url in strategy.get_urls():
       wd = None
       try:
           wd = WebDriverSelenium(current_app.config["WEBDRIVER_TYPE"], 
user=user)
           logger.info("Fetching %s", url)
           wd.get_screenshot(url, "grid-container")
           results["success"].append(url)
       except URLError:
           logger.exception("Error warming up cache!")
           results["errors"].append(url)
       finally:
           if wd:
               wd.quit()  # or appropriate cleanup method
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a0acf75f-54b8-4755-8de3-20f6618f1625/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a0acf75f-54b8-4755-8de3-20f6618f1625?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a0acf75f-54b8-4755-8de3-20f6618f1625?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a0acf75f-54b8-4755-8de3-20f6618f1625?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a0acf75f-54b8-4755-8de3-20f6618f1625)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:d7c5ece6-04a2-496f-bf93-00339323fb03 -->
   
   
   [](d7c5ece6-04a2-496f-bf93-00339323fb03)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to