mistercrunch commented on PR #23454:
URL: https://github.com/apache/superset/pull/23454#issuecomment-3212866369

   Found this PR researching why things that should be cached seemed to be 
cached ... Claude's analysis:
   
   # Cache Architecture Analysis: PR #23454 Side Effects
   
   ## Summary
   
   PR #23454 introduced `cachedSupersetGet` for drill-by performance 
optimization. While this provided session-scoped performance benefits, it 
inadvertently created a cache invalidation architecture that causes data 
staleness issues in multi-user scenarios and cross-component data mutations.
   
   ## Original Implementation Context
   
   **Commit**: `9fbfd1c1d883f983ef96b8812297721e2a1a9695`  
   **Author**: Kamil Gabryjelski  
   **Date**: March 29, 2023  
   **PR**: #23454 - "feat: Implement context menu for drill by"
   
   ### Cache Implementation Details
   
   **Location**: `superset-frontend/src/utils/cachedSupersetGet.ts`
   
   ```typescript
   export const supersetGetCache = new Map<string, any>();
   export const cachedSupersetGet = cacheWrapper(
     SupersetClient.get,
     supersetGetCache,
     ({ endpoint }) => endpoint || '',
   );
   ```
   
   **Design Characteristics**:
   - **Client-side JavaScript Map** - Lives in browser memory
   - **Endpoint-based keying** - Cache key = API endpoint URL
   - **Session-scoped persistence** - Survives React Router navigation
   - **Manual-only invalidation** - No automatic expiration or refresh
   - **No TTL mechanism** - Data cached indefinitely until page refresh
   
   ## Identified Issues
   
   ### 1. **Cache Invalidation Gap**
   
   **Problem**: Dataset metadata changes (like `drill_through_chart_id`) don't 
invalidate related cached endpoints.
   
   **Affected Flow**:
   1. User edits dataset in Dataset Editor → Saves `drill_through_chart_id`
   2. User navigates to dashboard (React Router navigation - cache persists)
   3. User clicks drill-through → Uses stale cached 
`/api/v1/dataset/{id}/drill_info/` response
   4. **Result**: Old drill-through behavior despite configuration change
   
   **Current Invalidation**: Only happens on API errors 
(`supersetGetCache.delete(endpoint)`)
   
   ### 2. **Multi-User Data Consistency**
   
   **Problem**: Client-side cache has no awareness of server-side data changes 
by other users.
   
   **Scenario**:
   - User A caches dataset drill info
   - User B updates dataset drill configuration  
   - User A continues seeing stale data until page refresh
   
   ### 3. **Cross-Component State Synchronization**
   
   **Problem**: Components that modify data don't coordinate with components 
that display cached versions of that data.
   
   **Current Architecture**:
   ```
   DatasourceModal (saves data) ←→ NO COORDINATION ←→ DrillDetailPane (uses 
cached data)
   ```
   
   **Missing**: Cache invalidation bridge between data mutations and data 
consumers.
   
   ## Why This Wasn't a Problem Initially
   
   ### Historical Context
   
   **Original drill-by scope**: The cache was likely designed for **read-only 
drill operations** within a single session:
   
   1. **Load dashboard** → Cache dataset drill info
   2. **Perform drill-by operations** → Use cached data (performance benefit)
   3. **Session ends** → Cache naturally cleared
   
   **No data mutation**: Original drill-by didn't involve **editing** dataset 
metadata during the session.
   
   ### SPA Navigation Evolution
   
   **Pre-SPA behavior**: Page refreshes between dataset editing and dashboard 
viewing would naturally clear caches.
   
   **Current SPA behavior**: React Router navigation preserves JavaScript 
state, exposing the cache staleness issue.
   
   ## Current Cache Usage Analysis
   
   **Cache is used in 4 locations**:
   1. `hooks/apiResources/datasets.ts` - Dataset drill info loading
   2. `nativeFilters/.../DatasetSelect.tsx` - Dataset selection for filters  
   3. `nativeFilters/.../ColumnSelect.tsx` - Column selection for filters
   4. `nativeFilters/.../FiltersConfigForm.tsx` - Filter configuration
   
   **Cache invalidation**: Only on errors, no systematic invalidation strategy.
   
   ## Recommended Solutions
   
   ### Short-term (Current PR)
   Add cache invalidation to `DatasourceModal.onConfirmSave()`:
   ```typescript
   import { supersetGetCache } from 'src/utils/cachedSupersetGet';
   
   // After successful dataset save:
   const drillInfoEndpoint = 
`/api/v1/dataset/${currentDatasource.id}/drill_info/`;
   supersetGetCache.delete(drillInfoEndpoint);
   ```
   
   ### Medium-term (Architecture Fix)
   Replace custom cache with proper cache management:
   - **React Query** with automatic invalidation
   - **Server-side ETags** for cache validation  
   - **Cache tags system** for related data invalidation
   
   ### Long-term (Systematic Solution)
   Implement cache invalidation coordination:
   - **Event-driven invalidation** when datasets are modified
   - **Cache versioning** with server-side coordination
   - **Optimistic updates** with rollback on conflicts
   
   ## Impact Assessment
   
   **Performance Impact**: The cache provides meaningful performance benefits 
for drill operations by avoiding repeated API calls.
   
   **Data Consistency Impact**: Cache staleness creates confusing UX where 
configuration changes don't take effect until hard refresh.
   
   **Trade-off**: Session performance vs. data freshness - current 
implementation prioritizes performance over consistency.
   
   ## Conclusion
   
   PR #23454's caching implementation represents a **performance optimization 
that didn't account for data mutation scenarios**. The cache works well for 
read-only operations but breaks down when dataset metadata becomes editable 
during the same session.
   
   This is a classic example of how **performance optimizations can introduce 
subtle data consistency bugs** that only manifest in specific user workflows 
involving cross-component data mutations.
   
   The issue is **architectural rather than implementation-specific** - any 
feature that modifies cached data will face similar staleness problems until 
systematic cache invalidation is implemented.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to