AnzhiZhang commented on issue #3683:
URL: https://github.com/apache/texera/issues/3683#issuecomment-3231875282

   Based on our discussion today, here is the summary.
   
   ### User Perspective Design
   
   - A user can create a dataset with a name that another user has already 
used.
   - A user cannot create a dataset with a name they have already used 
themselves.
   - The dataset file URI (e.g., `/texera/tweets-500/v3/500.csv`) can be used 
in UDFs. Its segments are, in order, the user email, dataset name, version, 
and file name.
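
To make the URI layout concrete, here is a minimal sketch of how a UDF might split such a URI into its four segments. The `DatasetFileUri` class and `parse_dataset_file_uri` function are hypothetical names for illustration, not part of the Texera API, and allowing `/` inside the file segment is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetFileUri:
    """Hypothetical container for the four URI segments described above."""
    owner: str      # first segment (the user email segment in the summary)
    dataset: str    # dataset name
    version: str    # version label, e.g. "v3"
    file_name: str  # file name (assumed here to possibly contain "/")


def parse_dataset_file_uri(uri: str) -> DatasetFileUri:
    """Split "/owner/dataset/version/file" into its segments."""
    # Drop the leading slash, then split into at most four pieces so that
    # any further "/" stays inside the file segment (an assumption).
    parts = uri.lstrip("/").split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected /owner/dataset/version/file, got: {uri!r}")
    return DatasetFileUri(*parts)


if __name__ == "__main__":
    print(parse_dataset_file_uri("/texera/tweets-500/v3/500.csv"))
```

Because the owner is the first segment, two users can each have a dataset named `tweets-500` without their URIs colliding.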
   
   ### Changes
   
   Database
   
   - Introduce a new column `repository_name` in the `dataset` table that 
maps each dataset to its LakeFS repository name.
   - Update relevant DDLs.
   
   Backend
   
   - Use the repository name to create and read a dataset.
   - Implement dataset renaming (PR 2)
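
The database and backend changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Texera schema or code: it uses an in-memory SQLite database so that it is self-contained, the columns `did` and `owner_uid` and the `dataset-<uuid>` repository naming scheme are hypothetical, and only `dataset` and `repository_name` come from the summary.

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified table; the real DDL has more columns.
    conn.execute(
        """
        CREATE TABLE dataset (
            did             INTEGER PRIMARY KEY,
            owner_uid       INTEGER NOT NULL,
            name            TEXT    NOT NULL,
            repository_name TEXT    NOT NULL UNIQUE,
            UNIQUE (owner_uid, name)  -- a name is unique per owner only
        )
        """
    )


def create_dataset(conn: sqlite3.Connection, owner_uid: int, name: str) -> str:
    # Assumed scheme: the repository name is generated, so it never
    # depends on the user-facing dataset name.
    repository_name = f"dataset-{uuid.uuid4().hex}"
    conn.execute(
        "INSERT INTO dataset (owner_uid, name, repository_name) VALUES (?, ?, ?)",
        (owner_uid, name, repository_name),
    )
    return repository_name


def rename_dataset(conn: sqlite3.Connection, owner_uid: int, old: str, new: str) -> None:
    # Renaming touches only the display name; repository_name is unchanged.
    conn.execute(
        "UPDATE dataset SET name = ? WHERE owner_uid = ? AND name = ?",
        (new, owner_uid, old),
    )


def repository_of(conn: sqlite3.Connection, owner_uid: int, name: str) -> str:
    # Reads resolve (owner, name) to the repository name first.
    row = conn.execute(
        "SELECT repository_name FROM dataset WHERE owner_uid = ? AND name = ?",
        (owner_uid, name),
    ).fetchone()
    if row is None:
        raise KeyError((owner_uid, name))
    return row[0]
```

With this separation, a duplicate name from the same owner fails on the `(owner_uid, name)` constraint (raised as `sqlite3.IntegrityError` in this sketch), which is the case the frontend error message needs to cover, while the same name from a different owner succeeds.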
   
   Frontend
   
   - Update the error message shown when a user creates a dataset with a 
duplicate name.
   
   ### PR Plan
   
   1. Separate the dataset name from the LakeFS repository name, so that 
different users can use the same dataset name.
   2. Allow user to change their dataset name.
   3. (Maybe) Add documentation on how to use the dataset file URI, and 
improve the frontend.
   4. (Maybe) Add validation of the dataset name before submission, for a 
better user experience.

