mistercrunch commented on PR #23454:
URL: https://github.com/apache/superset/pull/23454#issuecomment-3212866369
Found this PR researching why things that should be cached seemed to be
cached ... Claude's analysis:
# Cache Architecture Analysis: PR #23454 Side Effects
## Summary
PR #23454 introduced `cachedSupersetGet` for drill-by performance
optimization. While this provided session-scoped performance benefits, it
inadvertently created a cache invalidation architecture that causes data
staleness issues in multi-user scenarios and cross-component data mutations.
## Original Implementation Context
**Commit**: `9fbfd1c1d883f983ef96b8812297721e2a1a9695`
**Author**: Kamil Gabryjelski
**Date**: March 29, 2023
**PR**: #23454 - "feat: Implement context menu for drill by"
### Cache Implementation Details
**Location**: `superset-frontend/src/utils/cachedSupersetGet.ts`
```typescript
export const supersetGetCache = new Map<string, any>();
export const cachedSupersetGet = cacheWrapper(
SupersetClient.get,
supersetGetCache,
({ endpoint }) => endpoint || '',
);
```
**Design Characteristics**:
- **Client-side JavaScript Map** - Lives in browser memory
- **Endpoint-based keying** - Cache key = API endpoint URL
- **Session-scoped persistence** - Survives React Router navigation
- **Manual-only invalidation** - No automatic expiration or refresh
- **No TTL mechanism** - Data cached indefinitely until page refresh
## Identified Issues
### 1. **Cache Invalidation Gap**
**Problem**: Dataset metadata changes (like `drill_through_chart_id`) don't
invalidate related cached endpoints.
**Affected Flow**:
1. User edits dataset in Dataset Editor → Saves `drill_through_chart_id`
2. User navigates to dashboard (React Router navigation - cache persists)
3. User clicks drill-through → Uses stale cached
`/api/v1/dataset/{id}/drill_info/` response
4. **Result**: Old drill-through behavior despite configuration change
**Current Invalidation**: Only happens on API errors
(`supersetGetCache.delete(endpoint)`)
### 2. **Multi-User Data Consistency**
**Problem**: Client-side cache has no awareness of server-side data changes
by other users.
**Scenario**:
- User A caches dataset drill info
- User B updates dataset drill configuration
- User A continues seeing stale data until page refresh
### 3. **Cross-Component State Synchronization**
**Problem**: Components that modify data don't coordinate with components
that display cached versions of that data.
**Current Architecture**:
```
DatasourceModal (saves data) ←→ NO COORDINATION ←→ DrillDetailPane (uses
cached data)
```
**Missing**: Cache invalidation bridge between data mutations and data
consumers.
## Why This Wasn't a Problem Initially
### Historical Context
**Original drill-by scope**: The cache was likely designed for **read-only
drill operations** within a single session:
1. **Load dashboard** → Cache dataset drill info
2. **Perform drill-by operations** → Use cached data (performance benefit)
3. **Session ends** → Cache naturally cleared
**No data mutation**: Original drill-by didn't involve **editing** dataset
metadata during the session.
### SPA Navigation Evolution
**Pre-SPA behavior**: Page refreshes between dataset editing and dashboard
viewing would naturally clear caches.
**Current SPA behavior**: React Router navigation preserves JavaScript
state, exposing the cache staleness issue.
## Current Cache Usage Analysis
**Cache is used in 4 locations**:
1. `hooks/apiResources/datasets.ts` - Dataset drill info loading
2. `nativeFilters/.../DatasetSelect.tsx` - Dataset selection for filters
3. `nativeFilters/.../ColumnSelect.tsx` - Column selection for filters
4. `nativeFilters/.../FiltersConfigForm.tsx` - Filter configuration
**Cache invalidation**: Only on errors, no systematic invalidation strategy.
## Recommended Solutions
### Short-term (Current PR)
Add cache invalidation to `DatasourceModal.onConfirmSave()`:
```typescript
import { supersetGetCache } from 'src/utils/cachedSupersetGet';
// After successful dataset save:
const drillInfoEndpoint =
`/api/v1/dataset/${currentDatasource.id}/drill_info/`;
supersetGetCache.delete(drillInfoEndpoint);
```
### Medium-term (Architecture Fix)
Replace custom cache with proper cache management:
- **React Query** with automatic invalidation
- **Server-side ETags** for cache validation
- **Cache tags system** for related data invalidation
### Long-term (Systematic Solution)
Implement cache invalidation coordination:
- **Event-driven invalidation** when datasets are modified
- **Cache versioning** with server-side coordination
- **Optimistic updates** with rollback on conflicts
## Impact Assessment
**Performance Impact**: The cache provides meaningful performance benefits
for drill operations by avoiding repeated API calls.
**Data Consistency Impact**: Cache staleness creates confusing UX where
configuration changes don't take effect until hard refresh.
**Trade-off**: Session performance vs. data freshness - current
implementation prioritizes performance over consistency.
## Conclusion
PR #23454's caching implementation represents a **performance optimization
that didn't account for data mutation scenarios**. The cache works well for
read-only operations but breaks down when dataset metadata becomes editable
during the same session.
This is a classic example of how **performance optimizations can introduce
subtle data consistency bugs** that only manifest in specific user workflows
involving cross-component data mutations.
The issue is **architectural rather than implementation-specific** - any
feature that modifies cached data will face similar staleness problems until
systematic cache invalidation is implemented.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]