justinpark opened a new pull request, #32608:
URL: https://github.com/apache/superset/pull/32608
### SUMMARY
Currently, the `recent_activity` query for the log table groups by the
dashboard ID and slice ID to extract a distinct list from the entire log table,
which leads to performance issues.
(In the case of Airbnb, more than 1 million logs are generated each day, and
grouping by dashboard and slice ID, even with indexing, significantly impacts
database performance as shown in the following log)
```
| 3793593 | superset | 100.117.121.20:41904 | superset_production |
Query | 1726 s| Sending data
| SELECT anon_1.dashboard_id AS anon_1_dashboard_id, anon_1.slice_id AS
anon_1_slice_id, anon_1.action AS anon_1_action, anon_1.dttm AS anon_1_dttm,
dashboards.slug AS dashboard_slug, dashboards.dashboard_title AS
dashboards_dashboard_title, slices.slice_name AS slices_slice_name
FROM (SELECT logs.dashboard_id AS dashboard_id, logs.slice_id AS slice_id,
logs.action AS action, max(logs.dttm) AS dttm
FROM logs
WHERE logs.action IN ('explore', 'dashboard') AND logs.user_id = 13295 AND
logs.dttm > '2024-03-06 18:00:12.201653' AND (logs.dashboard_id IS NOT NULL OR
logs.slice_id IS NOT NULL) GROUP BY logs.dashboard_id, logs.slice_id,
logs.action) AS anon_1 LEFT OUTER JOIN dashboards ON dashboards.id =
anon_1.dashboard_id LEFT OUTER JOIN slices ON slices.id = anon_1.slice_id
WHERE dashboards.dashboard_title != '' OR slices.slice_name != '' ORDER BY
anon_1.dttm DESC
LIMIT 0, 6 |
```
To resolve this issue, it would be appropriate to create a materialized
activity statistics view through a daily/hourly batch job. However,
functionally, the main purpose of recent activity is to display only a few of
the most recently visited items. Therefore, we improved performance by changing
the approach to fetch the latest (including duplicates) log entries (by
`distinct: false`) and extract a distinct list using LRU cache in the frontend
side.
### TESTING INSTRUCTIONS
specs
### ADDITIONAL INFORMATION
<!--- Check any relevant boxes with "x" -->
<!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in
[SIP-59](https://github.com/apache/superset/issues/13351))
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]