Description:
The time complexities of ${v/...} and ${v^...} are expected to be
linear with respect to the length of the variable contents.
However, the time complexity of ${v/...} is quadratic if there is no
replacement when the length is larger than about 10000. Also, the
time complexity of ${v^} becomes quadratic if the variable contains
many newlines.
They are caused by the implementations that call `strlen' for the
variable contents inside loops. `strlen' itself has the linear
complexity, so the `strlen' inside the loops results in the
quadratic complexity.
Repeat-By:
I attach a sample script: `test.sh'. In this script, variable
content is constructed as
A=({00000..99999})
IFS=$'\n' eval 'v="${A[*]}"'
and then the time is measured as, e.g.,
time a=${v^}
By changing the size of array A, the time complexity may be checked.
I also attach plots, `r0026{a,b,c}.png', based on measurements on my
Linux host. `r0026{a,b,c}' are for ${v^}, ${v@U} and ${v//A},
respectively. The horizontal and vertical axes are the length of
the variable content and the time, respectively. The red points
show the results for the devel branch. The dashed lines are fitted
to the points in the range [20000, 500000] to extract the power. The
extracted power is written in the legend. These results clearly
show the quadratic complexity of these parameter expansions.
Fix:
I attach patches:
* r0026-ifdef-DEBUG.patch: This is not the fix for the above
problem, but a fix for non-DEBUG build of the devel branch.
`itrace' (which are introduced for the new treatment of array
subscripts and the new proc_comsub) are used outside `#ifdef
DEBUG'. Maybe these `itrace's are supposed to be removed before
the next Bash is released, but it is annoying to every time remove
them when we want to test the performance of the non-DEBUG build
of the devel Bash. In this patch, we enclose the new `itrace's in
`#ifdef DEBUG'
* r0026-fix-superlinear-sh_modcase.patch: The function `cval'
(lib/sh/casemod.c) is called from `sh_modcase' (lib/sh/casemod.c)
for each character in the string. However, the function `cval'
calls `strlen' by itself to determine the length of the string.
This causes quadratic complexity. In this patch, we pass the
length of the string from `sh_modcase' to `cval'.
* r0026-fix-superlinear-pat_subst.patch: At the end of `pat_subst'
(subst.c), before appending the remaining content of the variable,
the string buffer for the result is extended using the macro
`RESIZE_MALLOCED_BUFFER'. In RESIZE_MALLOCED_BUFFER, the new
buffer size is calculated by a while loop in which the macro
parameter for the previous buffer size is expanded. `pat_subst'
specifies the expression `STRLEN(str) + 1' to the macro parameter
of the previous buffer size, which is directly expanded in the
loop and repeatedly evaluated in the loop. As a result, the macro
`RESIZE_MALLOCED_BUFFER' becomes the quadratic complexity. In
this patch, we specify the pre-calculated value of `strlen' to the
macro `RESIZE_MALLOCED_BUFFER'. Also, the buffer size
determination of `RESIZE_MALLOCED_BUFFER' can be implemented
without using a loop.
The blue points and lines in `r0026{a,b,c}.png' show the time
complexity after the patches. They are now linear as expected.
--
Koichi
#!/bin/bash
function test1_1 {
local sep=$1 mod=$3
local max; printf -v max %06d $(($2-1))
eval "A=({000000..$max})"
IFS=$sep eval 'u="${A[*]}"'; time eval "a=\${u$mod}"
}
function test1 {
local mod=$1
test1_1 $'\n' 10000 "$mod"
test1_1 $'\n' 20000 "$mod"
test1_1 $'\n' 50000 "$mod"
test1_1 $'\n' 100000 "$mod"
test1_1 $'\n' 200000 "$mod"
test1_1 $'\n' 500000 "$mod"
}
TIMEFORMAT='%R %S %U'
test1 '^'
test1 '@U'
test1 '//A'
r0026-ifdef-DEBUG.patch
Description: Binary data
r0026-fix-superlinear-sh_modcase.patch
Description: Binary data
r0026-fix-superlinear-pat_subst.patch
Description: Binary data
