wombatu-kun opened a new pull request, #16788:
URL: https://github.com/apache/iceberg/pull/16788
ADLSLocation is constructed once per FileIO operation (newInputFile,
newOutputFile, deleteFile, listPrefix, deletePrefix all call new
ADLSLocation(path)), so its constructor is on the metadata path of every scan
and commit.
The constructor split the authority with String.split("@", -1) and then
extracted the storage account with host.split("\\.", -1)[0]. The second split
is the wasteful part: to read the account it materializes every dotted segment
of the host (account.dfs.core.windows.net produces 5 strings plus the backing
array) and then discards all but the first. This change parses the same fields
with indexOf/substring, dropping the intermediate ArrayList, array, and unused
substrings. The regex match is kept for scheme validation, so scheme validation
and error reporting are unchanged.
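A minimal sketch of the two parsing strategies, for illustration only. The class and method names (AuthorityParseSketch, parseWithSplit, parseWithIndexOf) are hypothetical. The branching (the segment before @ is the container, the rest is the host) is an assumption about ADLSLocation and is not copied from its actual source:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: class, method names and branching are assumptions,
// not the actual ADLSLocation source.
final class AuthorityParseSketch {

  // Before: split on '@', then split the whole host on '.' only to read segment 0.
  // Returns {container (null if absent), host, storageAccount}.
  static String[] parseWithSplit(String authority) {
    String[] parts = authority.split("@", -1);
    String container = parts.length > 1 ? parts[0] : null;
    String host = parts.length > 1 ? parts[1] : parts[0];
    // Materializes every dotted segment, then keeps only the first.
    String account = host.split("\\.", -1)[0];
    return new String[] {container, host, account};
  }

  // After: find the separators with indexOf and cut only the substrings that are kept.
  static String[] parseWithIndexOf(String authority) {
    int at = authority.indexOf('@');
    String container = at >= 0 ? authority.substring(0, at) : null;
    String host = at >= 0 ? authority.substring(at + 1) : authority;
    int dot = host.indexOf('.');
    // No dot: the whole host is the account, matching split(...)[0].
    String account = dot >= 0 ? host.substring(0, dot) : host;
    return new String[] {container, host, account};
  }

  public static void main(String[] args) {
    List<String> authorities = Arrays.asList(
        "container@account.dfs.core.windows.net",
        "account.dfs.core.windows.net",
        "container@account",
        "account");
    for (String authority : authorities) {
      String[] before = parseWithSplit(authority);
      String[] after = parseWithIndexOf(authority);
      // Arrays.equals compares element-wise and treats two nulls as equal.
      System.out.println(authority + " -> " + Arrays.toString(after)
          + " same=" + Arrays.equals(before, after));
    }
  }
}
```

For authorities with zero or one @ the two methods return identical fields. The indexOf version builds no intermediate list or per-segment array along the way.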
Behavior is identical for every valid abfs/abfss/wasb/wasbs URI (zero or one
@, host with or without dots); only pathological multi-@ authorities, which are
already invalid, would differ. The storage-account extraction now guards the
no-dot case (indexOf returns -1) explicitly, which a new test pins.
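The two edge cases in the previous paragraph can be illustrated with another hypothetical sketch. It assumes the old code took parts[0] as the container and parts[1] as the host. EdgeCaseSketch and its methods are illustrative names, not Iceberg APIs:

```java
import java.util.Objects;

// Hypothetical sketch of the no-dot and multi-'@' edge cases; assumes the old
// code used parts[0] as container and parts[1] as host.
final class EdgeCaseSketch {

  // Old approach: split(...)[0] returns the whole host when there is no dot.
  static String accountWithSplit(String host) {
    return host.split("\\.", -1)[0];
  }

  // New approach: indexOf returns -1 when there is no dot, so that case needs an
  // explicit guard; substring(0, -1) would throw StringIndexOutOfBoundsException.
  static String accountWithIndexOf(String host) {
    int dot = host.indexOf('.');
    return dot < 0 ? host : host.substring(0, dot);
  }

  // Old host extraction: the segment between the first and second '@'.
  static String hostWithSplit(String authority) {
    String[] parts = authority.split("@", -1);
    return parts.length > 1 ? parts[1] : parts[0];
  }

  // New host extraction: everything after the first '@'.
  static String hostWithIndexOf(String authority) {
    int at = authority.indexOf('@');
    return at < 0 ? authority : authority.substring(at + 1);
  }

  public static void main(String[] args) {
    // No-dot host: both approaches agree, thanks to the explicit guard.
    System.out.println(accountWithSplit("account") + " " + accountWithIndexOf("account"));
    // Multi-'@' authority (already invalid): the two approaches pick different hosts.
    String bad = "a@b@account.dfs.core.windows.net";
    System.out.println(hostWithSplit(bad) + " vs " + hostWithIndexOf(bad));
    System.out.println(Objects.equals(hostWithSplit(bad), hostWithIndexOf(bad)));
  }
}
```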
Local JMH results (AverageTime, 5x1s warmup, 5x1s measurement, 1 fork, gc
profiler; gc.alloc.rate.norm is deterministic):
| URI shape | ns/op before | ns/op after | B/op before | B/op after |
| --- | --- | --- | --- | --- |
| container@account.dfs.core.windows.net/... | 441 | 254 | 1072 | 656 |
| account.dfs.core.windows.net/... | 381 | 240 | 864 | 504 |
That is roughly 40% less allocation (39% and 42%) and about 1.6-1.7x faster
per parse. The benchmark was run locally and is not included (the azure module
has no JMH source set).
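The summary ratios can be recomputed directly from the table. This small helper (BenchmarkSummary is an illustrative name) only divides the before/after columns:

```java
// Recomputes speedup and allocation reduction from the JMH table rows.
final class BenchmarkSummary {

  // Percentage reduction from before to after, e.g. bytes per op.
  static double reductionPercent(double before, double after) {
    return 100.0 * (before - after) / before;
  }

  // Speedup factor: how many times faster "after" is than "before".
  static double speedup(double beforeNs, double afterNs) {
    return beforeNs / afterNs;
  }

  public static void main(String[] args) {
    // Rows: {ns/op before, ns/op after, B/op before, B/op after}, copied from the table.
    double[][] rows = {{441, 254, 1072, 656}, {381, 240, 864, 504}};
    String[] labels = {"with container", "without container"};
    for (int i = 0; i < rows.length; i++) {
      double[] r = rows[i];
      System.out.printf("%s: %.2fx faster, %.1f%% less allocation%n",
          labels[i], speedup(r[0], r[1]), reductionPercent(r[2], r[3]));
    }
  }
}
```

For the two rows this gives about 1.74x and 1.59x faster, and about 39% and 42% less allocation.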
Tests: TestADLSLocation already covers the abfs/abfss/wasb/wasbs schemes, with
and without a container, plus the no-path and host-extraction cases; this PR
adds testHostWithoutDot to pin the no-dot storage-account branch.
--
This is an automated message from the Apache Git Service.