compasses opened a new pull request, #15839: URL: https://github.com/apache/doris/pull/15839
# Proposed changes

Issue Number: just one part of #11640

## Problem summary

Describe your changes.

## Checklist(Required)

1. Does it affect the original behavior:
    - [ ] Yes
    - [ ✓] No
    - [ ] I don't know
2. Has unit tests been added:
    - [ ] Yes
    - [ ] No
    - [ ✓] No Need
3. Has document been added or modified:
    - [ ] Yes
    - [ ✓] No
    - [ ] No Need
4. Does it need to update dependencies:
    - [ ] Yes
    - [ ✓] No
5. Are there any changes that cannot be rolled back:
    - [ ] Yes (If Yes, please explain WHY)
    - [ ✓] No

## Further comments

This PR is one part of our bulk load implementation. It provides a tool that builds the segment files of a tablet externally. The tool supports both local and HDFS data, which means you need to provide the meta file and the data files like this:

```
./segment_builder --meta_file=/path/to/hdr/88409.hdr --data_path=/path/to/data/file --format=parquet --is_remote=false
ll /path/to/data/file
xxx1..gz.parquet
xxx2..gz.parquet
...
```

If the files all come from HDFS, the path should be an HDFS path. Currently only parquet is supported. Since we use a privately owned HDFS library internally, *** the HDFS-related code in this PR may not work ***. I don't have an open-source HDFS environment to test it.

From the picture above you can see the final workflow:

1. Read the hdr file from the meta path, then do some validation and system initialization.
2. Build the HDFS scanner, read the parquet files directly from HDFS, and generate the segment files on local disk.
3. Finally, upload the segment files to HDFS, to the same path as the hdr file. All of these files will then be used by the load segment statement.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
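As a rough illustration of the invocation described in the PR text above, the following Python sketch assembles the `segment_builder` command line from the four flags shown in the example (`--meta_file`, `--data_path`, `--format`, `--is_remote`). The helper name `build_segment_builder_command` is hypothetical and not part of the PR. The `true` spelling for `--is_remote` is an assumption, since the PR only shows `false`. The format check simply mirrors the statement that only parquet is currently supported.

```python
import shlex

# The PR text says only parquet is supported for now.
SUPPORTED_FORMATS = {"parquet"}


def build_segment_builder_command(meta_file, data_path, fmt="parquet",
                                  is_remote=False, binary="./segment_builder"):
    """Return the argv list for one segment_builder run.

    Hypothetical helper for illustration only; it uses just the flags
    that appear in the PR example.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return [
        binary,
        f"--meta_file={meta_file}",
        f"--data_path={data_path}",
        f"--format={fmt}",
        # The PR example spells the boolean as lowercase "false";
        # "true" for the remote case is an assumption.
        f"--is_remote={'true' if is_remote else 'false'}",
    ]


if __name__ == "__main__":
    argv = build_segment_builder_command(
        "/path/to/hdr/88409.hdr", "/path/to/data/file"
    )
    # Print a shell-ready command line; run it with subprocess.run(argv)
    # once the segment_builder binary is actually available.
    print(shlex.join(argv))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems if a meta or data path contains spaces.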
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org