Re: [D] Virtual dataset "sync columns" requires too much manual Intervention [superset]

via GitHub Thu, 03 Apr 2025 09:40:12 -0700


GitHub user u35253 added a comment to the discussion: Virtual dataset "sync 
columns" requires too much manual Intervention


# Empathy, agreement

I mainly want to say that I understand and empathize with both issues, from 
experience: 1) the rigmarole of entering the dataset editor twice, once to save 
a new query definition and again to sync columns, and 2) temporarily swapping 
in a fake query to sync columns, then changing back to the real query and 
saving that after syncing.

I understand the logical design as-is; it is simple, straightforward, 
consistent, and makes sense.

# Higher level topics

IMO, the issues themselves point to two slightly broader issues: (a) one in 
dashboard design/maintenance, where some greater complexity is called for, and 
(b) one where Superset could potentially fill a gap by taking on that greater 
complexity itself for this usage pattern.

## Possible alternative to temporary/fake queries

An alternative solution to the fake query is as follows: expand the query to 
return some data, always, even if that means using Jinja to detect that the 
query is executing outside of a dashboard, such as in Dataset Editor or SQL 
Editor (e.g., try to detect filters, and if they are absent, conditionally run 
other logic that will return the 'fake data'--whether that is read from a table 
or hardcoded, etc.).  This avoids manually replacing the query during column 
sync, by building the non-dashboard execution code into the same query itself.  
But expanding the query this way takes additional thought, and for rapid or 
even routine changes it is often more expedient to just put in a fake query 
temporarily.
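
A minimal sketch of that idea, written in plain Python against an in-memory 
SQLite database rather than as real Jinja inside a Superset dataset.  The 
`sales` table, its columns, and the `build_dataset_sql` helper are hypothetical 
and only stand in for the branching; in Superset the same branch would live in 
the dataset SQL itself, as a Jinja conditional on whatever signals that 
dashboard filters are present.  The point it tries to show is that the fallback 
branch must return the same column names as the real one, so a column sync sees 
the same shape either way.

```python
import sqlite3

# Hypothetical names throughout: `sales`, its columns, and
# `build_dataset_sql` are illustrative, not part of Superset.


def build_dataset_sql(region_filter):
    """Return (sql, params): the real query if a filter is present,
    otherwise a stand-in query with the same column names."""
    if region_filter:
        placeholders = ", ".join("?" for _ in region_filter)
        sql = (
            "SELECT region, amount FROM sales "
            f"WHERE region IN ({placeholders})"
        )
        return sql, list(region_filter)
    # No filter detected (e.g. running outside a dashboard):
    # return one hardcoded row so the query always yields data.
    return "SELECT 'n/a' AS region, 0.0 AS amount", []


def column_names(conn, sql, params):
    """Execute the query and read column names from the cursor."""
    cur = conn.execute(sql, params)
    return [d[0] for d in cur.description]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('EMEA', 10.5)")

# "Dashboard" execution: a filter value is present.
dashboard_cols = column_names(conn, *build_dataset_sql(["EMEA"]))
# "Editor" execution: no filter, so the fallback branch runs.
editor_cols = column_names(conn, *build_dataset_sql([]))
```

Both branches expose `region` and `amount`, which is what lets the column sync 
run without swapping the query by hand.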

## On the Superset fast-column-sync-proposal feature concept

Indeed (as @gidinetapp mentioned in the DataGrip example, I believe), Superset 
could automatically execute a `select * from <dataset> limit 1;` on the "just 
saved" dataset to learn the "new" column names and data types, and then propose 
a fast column sync immediately after the dataset SQL is saved.  This proposed 
flow (perhaps optional, but enabled by default?) would detect column changes 
and let the user fast-save an auto-proposed column sync if they accept it, 
removing the need to re-enter the Edit Dataset dialog.

- If the SQL Edit did not change the column names or types, there is no need to 
auto-propose a sync.
- If a SQL Edit does change the column names or types, it would be nice to 
auto-detect that and offer a low-click, fast way to make it take effect.

There could be a downside: we may not want to run the `select * limit 1` on 
each dataset edit for whatever reason, e.g. if the database is not performant; 
or we may know our charts are fragile to column syncs in general, so that 
Column Syncs should not run outside of some other control process.  A 
hypothetical DETECT_AND_PROPOSE_COLUMN_SYNCS_ON_DATASET_SAVE setting could 
allow this behavior to be disabled through a feature flag.
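
To make the proposed flow concrete, here is a rough sketch in plain Python 
using SQLite.  Nothing in it is Superset code: `probe_columns`, `propose_sync`, 
and `on_dataset_save` are hypothetical helpers, the flag is the hypothetical 
setting named above, and column "types" are approximated by the Python type of 
the single sampled row, which a real implementation would presumably take from 
the database driver instead.

```python
import sqlite3

# Hypothetical setting from the discussion; not an existing Superset flag.
DETECT_AND_PROPOSE_COLUMN_SYNCS_ON_DATASET_SAVE = True


def probe_columns(conn, dataset_sql):
    """Run the dataset SQL with LIMIT 1; return {column name: type name}.

    `dataset_sql` must not end with a semicolon, since it is wrapped
    in a subquery. Types are None when the query returns no row.
    """
    cur = conn.execute(f"SELECT * FROM ({dataset_sql}) LIMIT 1")
    names = [d[0] for d in cur.description]
    row = cur.fetchone()
    if row is None:
        return {name: None for name in names}
    return {name: type(value).__name__ for name, value in zip(names, row)}


def propose_sync(saved_columns, probed_columns):
    """Return the detected changes, or None if names and types match."""
    saved, probed = set(saved_columns), set(probed_columns)
    changes = {
        "added": sorted(probed - saved),
        "removed": sorted(saved - probed),
        "retyped": sorted(
            name
            for name in saved & probed
            if saved_columns[name] != probed_columns[name]
        ),
    }
    return changes if any(changes.values()) else None


def on_dataset_save(conn, dataset_sql, saved_columns):
    """Called after the SQL is saved; returns a proposal or None."""
    if not DETECT_AND_PROPOSE_COLUMN_SYNCS_ON_DATASET_SAVE:
        return None
    return propose_sync(saved_columns, probe_columns(conn, dataset_sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('EMEA', 10.5)")

saved = {"region": "str", "amount": "float"}

# Same columns as before: nothing to propose.
unchanged = on_dataset_save(conn, "SELECT region, amount FROM sales", saved)
# A new column appears: a sync would be proposed.
changed = on_dataset_save(
    conn, "SELECT region, amount, amount * 2 AS doubled FROM sales", saved
)
```

A `None` result corresponds to the first bullet above (no proposal needed); a 
non-empty result is what the UI would offer as a one-click sync.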

GitHub link: 
https://github.com/apache/superset/discussions/31573#discussioncomment-12716353
