morningman opened a new pull request, #18074: URL: https://github.com/apache/doris/pull/18074
# Proposed changes

Issue Number: close #xxx

## Problem summary

Problem:

1. FE splits each parquet file into splits, so a single file can have several splits.
2. BE scans each split and reads the footer of the parquet file.
3. If two splits belong to the same parquet file, the footer of that file is read twice.

This PR mainly changes:

1. Use a kv cache to cache the footer of the parquet file.
2. The kv cache belongs to a scan node, so all parquet readers belonging to this scan node share the same kv cache.
3. In the cache, the key is "meta_file_path" and the value is the parsed thrift footer.

In my test, for a query with 26 splits, the footer parse time was reduced from 3s to 1s.

## Checklist(Required)

* [ ] Does it affect the original behavior
* [ ] Have unit tests been added
* [ ] Has documentation been added or modified
* [ ] Does it need to update dependencies
* [ ] Does this PR support rollback (If NO, please explain WHY)

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...