wombatu-kun opened a new pull request, #16788:
URL: https://github.com/apache/iceberg/pull/16788

   ADLSLocation is constructed once per FileIO operation (newInputFile, 
newOutputFile, deleteFile, listPrefix, deletePrefix all call new 
ADLSLocation(path)), so its constructor is on the metadata path of every scan 
and commit.
   
   The constructor split the authority with String.split("@", -1) and then 
extracted the storage account with host.split("\\.", -1)[0]. The second split 
is the wasteful part: to read the account it materializes every dotted segment 
of the host (account.dfs.core.windows.net produces 5 strings plus the backing 
array) and then discards all but the first. This change parses the same fields 
with indexOf/substring, dropping the intermediate ArrayList, array, and unused 
substrings. The regex match is kept for scheme validation, so parsing and 
error behavior are unchanged for valid URIs.
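
   As a rough illustration of the storage-account extraction described above, 
the sketch below contrasts the two shapes. The class and method names 
(StorageAccountSketch, accountViaSplit, accountViaIndexOf) are hypothetical and 
are not taken from ADLSLocation; only the split("\\.", -1)[0] expression and 
the indexOf/substring approach come from this description.

```java
// Hypothetical helper for illustration only; not the actual ADLSLocation code.
final class StorageAccountSketch {
  private StorageAccountSketch() {}

  // Old shape: split the host on every dot, then keep only the first segment.
  static String accountViaSplit(String host) {
    return host.split("\\.", -1)[0];
  }

  // New shape: locate the first dot and take the prefix, with no segment array.
  static String accountViaIndexOf(String host) {
    int dot = host.indexOf('.');
    // indexOf returns -1 when the host has no dot; the whole host is the account then.
    return dot < 0 ? host : host.substring(0, dot);
  }

  public static void main(String[] args) {
    String[] hosts = {"account.dfs.core.windows.net", "account", ""};
    for (String host : hosts) {
      // Prints whether both shapes agree for this host.
      System.out.println(accountViaSplit(host).equals(accountViaIndexOf(host)));
    }
  }
}
```

   Both methods return the same account for dotted and dot-free hosts; the 
second simply avoids building the segments it would throw away.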
   
   Behavior is identical for every valid abfs/abfss/wasb/wasbs URI (zero or one 
@, host with or without dots); only pathological multi-@ authorities, which are 
already invalid, would differ. The storage-account extraction now guards the 
no-dot case (indexOf returns -1) explicitly, which a new test pins.
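
   A minimal sketch of where the two authority-parsing shapes agree and where 
they diverge. It assumes the old code read the container from parts[0] and the 
host from parts[1] after the split, and that the new code cuts at the first @; 
neither detail is confirmed here, and AuthoritySketch with its methods is a 
hypothetical name used only for this example.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the actual ADLSLocation code.
final class AuthoritySketch {
  private AuthoritySketch() {}

  // Assumed old shape: split on every '@'. Returns {container-or-null, host}.
  static String[] viaSplit(String authority) {
    String[] parts = authority.split("@", -1);
    if (parts.length > 1) {
      return new String[] {parts[0], parts[1]};
    }
    return new String[] {null, parts[0]};
  }

  // Assumed new shape: cut once at the first '@'. Returns {container-or-null, host}.
  static String[] viaIndexOf(String authority) {
    int at = authority.indexOf('@');
    if (at < 0) {
      return new String[] {null, authority};
    }
    return new String[] {authority.substring(0, at), authority.substring(at + 1)};
  }

  public static void main(String[] args) {
    String[] authorities = {"host.example", "container@host.example", "a@b@c"};
    for (String authority : authorities) {
      boolean same = Arrays.equals(viaSplit(authority), viaIndexOf(authority));
      System.out.println(authority + " -> same result: " + same);
    }
  }
}
```

   Under these assumptions the zero-@ and one-@ cases match exactly, while an 
authority such as a@b@c yields host b from the split and b@c from the first-@ 
cut, which is the only kind of input where the two differ.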
   
   Local JMH results (AverageTime, 5x1s warmup, 5x1s measurement, 1 fork, gc 
profiler; gc.alloc.rate.norm is deterministic):
   
   | URI shape | ns/op before | ns/op after | B/op before | B/op after |
   | --- | --- | --- | --- | --- |
   | container@account.dfs.core.windows.net/... | 441 | 254 | 1072 | 656 |
   | account.dfs.core.windows.net/... | 381 | 240 | 864 | 504 |
   
   That is roughly 39-42% less allocation and about 1.6-1.7x faster per parse. 
The benchmark was run locally and is not included (the azure module has no JMH 
source set).
   
   Tests: TestADLSLocation already covers the abfs/abfss/wasb/wasbs, 
with/without container, no-path and host-extraction cases; added 
testHostWithoutDot to pin the no-dot storage-account branch.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]