Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -flto=auto -ffat-lto-objects -flto=auto
-ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security
-Wall
uname output: Linux maxwell 5.19.0-50-generic #50-Ubuntu SMP PREEMPT_DYNAMIC
Mon Jul 10 18:24:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu
Bash Version: 5.1
Patch Level: 16
Release Status: release
Description:
The current implementation of the command history takes a long time to
load
a large HISTFILE when history is stifled. This is because there is a
memmove() for every line of HISTFILE once HISTSIZE lines have been loaded.
If N is the number of lines in HISTFILE then the cost of the memmoves ends
up being something like (N - HISTSIZE) * HISTSIZE, peaking when HISTSIZE
is
roughly N/2.
In my case, the history file is over 300,000 lines. If the history is
unstifled, it takes about 0.1 seconds to load. With HISTSIZE around 200,000
it takes over 10 seconds to load, i.e., roughly 100 times as long.
Repeat-By:
Create a text file 'alt-history.txt' with a few hundred thousand lines, say
300k. The content doesn't matter much. For example, you can just cat
several copies of your actual history file to make one large file.
In a clean shell, execute:
HISTFILE=alt-history.txt
HISTSIZE=15
history -r
and then observe how long the last command runs before returning.
For comparison, in another clean shell, execute the same commands, with
unstifled history:
HISTFILE=alt-history.txt
HISTSIZE=-1
history -r
and then observe how long this takes.
If you repeat with larger and larger history files, you will observe the
runtime's quadratic growth.
Fix:
My proposed fix consists of a couple of components, which are included in
the patched history.c that is attached below:
1. Currently, history reallocation is amortized by requesting 50
(DEFAULT_HISTORY_GROW_SIZE) extra slots at once. We can do something
similar with a stifled history: allocate extra slots when HISTSIZE has
been reached. Each time a new line is added, simply free the first
(oldest) entry and increment 'the_history' so the last entry is now
available. Slide the window by one each time and only do a memmove()
when the extra slots are all gone. Even allocating only 50 extra slots
reduces the memmove() work by a factor of 50, which brings the time down
to friendly territory.
2. Vary the number of extra slots allocated as the history gets longer.
The
current value of 50 is adequate when the history is small, but for a
long
history it would be good to increase it so the work (both memmove() and
realloc()) amortizes better. For example, a value of 1024 would only
preallocate 8KB on a 64-bit machine and 4KB on a 32-bit machine. If
memory is a concern for some, this value could be made a configure-time
constant. A better, more universal, approach would be to dynamically
choose the number of extra slots based on the length (N) of the history.
For example, if the extra slots are roughly equal to sqrt(N), with a
minimum of 50, the memmove() work becomes roughly N*log(N) instead of
N^2. This works well for small values, but for absurdly large ones,
too.
I include below a patched history.c that implements both of these. The
array management is all handled by new functions I called advance_history()
and hist_shift_resize(). Each time a reallocation happens, it grows by
roughly 2*sqrt(N).
The tarball also contains a second version of the file, which includes some
error checking code and a realloc() if HISTSIZE is substantially reduced.
Those pieces may be worth consideration, but I feel less strongly about
them. I based these commits on git commit
f3b6bd19457e260b65d11f2712ec3da56cef463f from Github.
In making these patches, I have endeavored to maintain the existing style
and functionality, touching as few functions as possible. Thank you for
your consideration and I welcome any feedback you may have.
Casey Johnson
-- BEGIN base64-encoded bash-history.tar.gz --
H4sIA+08a3fbNpb9Gv0KND3TSLb8kPPoNI7d4yRyol3XzrGdJtlORkOLkMSEIlWS8qNt
9rfvfQAgAFJy0uk+5ix15kxqEri49+K+L4g4utjKZBDGUSK3plFepNnN5uirP/W3Db9HDx7gv73v
Hm7b/8Lvwf2H2ztf9R48fPjdg++2Hzy4/9V27+FO7/5XYvvPRaP+t8iLIBPiqw/pNMnTZOm4297/
i/621oTZdbGxIYAbSRjEaSL1cxFHF1kA/65ttVow/Fk6v8miybQQ7Wcd0fv+r99v7Gzv9MRhJqU4
S8fFVZBJcZguAE4RpUlXDJLRZqslhDgHkGIcxVKM0qQIoiQXxVSKF8evxUu12JFarK0edLoiELks
RDpGCFm6KEBSAUqaiVmQBJMomRCQQl7jIDHP5GWULvL4Rh