AnzhiZhang commented on issue #3683: URL: https://github.com/apache/texera/issues/3683#issuecomment-3231875282
Based on our discussion today, here is the summary.

### User Perspective Design

- A user can create a dataset with a name that another user has already used.
- A user cannot create a dataset with a name they have already used themselves.
- The dataset file URI (e.g., `/texera/tweets-500/v3/500.csv`) can be used in UDFs; its segments are, in order, the user email, dataset name, version, and file name.

### Changes

Database

- Introduce a new column `repository_name` in the `dataset` table that maps to the LakeFS repository name.
- Update the relevant DDLs.

Backend

- Use the repository name to create and read a dataset.
- Implement dataset renaming (PR 2).

Frontend

- Update the error message shown when a user creates a dataset with a duplicate name.

### PR Plan

1. Separate the dataset name from the LakeFS repository name, so that different users can use the same dataset name.
2. Allow users to change their dataset names.
3. (Maybe) Add documentation on how to use the dataset file URI, and improve the frontend.
4. (Maybe) Validate the dataset name before submission, for a better user experience.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
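To illustrate the dataset file URI layout from the summary, here is a minimal Python sketch that splits a URI into its four segments. The names `DatasetFileRef` and `parse_dataset_file_uri` are hypothetical and not part of Texera. The sketch also assumes that everything after the version segment is the file name, which the discussion did not specify.

```python
from typing import NamedTuple


class DatasetFileRef(NamedTuple):
    """Hypothetical container for the four URI segments."""
    owner: str         # first segment (the user email, per the summary)
    dataset_name: str  # second segment
    version: str       # third segment, e.g. "v3"
    file_name: str     # remainder of the URI (assumption: may contain "/")


def parse_dataset_file_uri(uri: str) -> DatasetFileRef:
    """Split '/<owner>/<dataset>/<version>/<file>' into its parts."""
    # maxsplit=3 keeps any further slashes inside the file segment.
    parts = uri.lstrip("/").split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected /owner/dataset/version/file, got {uri!r}")
    return DatasetFileRef(*parts)


ref = parse_dataset_file_uri("/texera/tweets-500/v3/500.csv")
print(ref.dataset_name, ref.version, ref.file_name)
```

Because the owner is part of the URI, two users can each have a dataset named `tweets-500` without their URIs colliding, which matches the first point of the user-facing design.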
