On 09/25/2016 05:39 PM, Chet Ramey wrote: >> Description: >> If the history file (`.bash_history`) starts with a timestamp >> (`HIST_TIMESTAMP_START`), and contains lines that have been written >> without timestamps, then reading the history file is (a) very slow >> because (b) consecutive lines without timestamps are merged into a >> single history entry with quadratic complexity. >>
> > The appending behavior isn't really quadratic: the code simply keeps > reallocating the line and appending to it. You can imagine how long it > takes to append a million commands to a single buffer. You've managed > to identify the most degenerate case. That depends on how you do reallocation. Let's try a simple example: lets say you are processing a list of 128 items. If you allocate 1 additional slot on each iteration through the loop, you are performing 128 allocations [O(n)], but you are ALSO performing 128*127/2 moves [8128, O(n^2)] as you copy the previous iteration's contents into the newly-allocated larger array. THAT is where the quadratic behavior comes from - using a constant-size allocation pattern requires a quadratic-size copying. But if instead switch things to do geometric allocation, you amortize the costs. On the first iteration, you allocate 1 row. On the second iteration, you allocate 2 additional rows (twice the allocation size as last time), giving you the third iteration free. On the fourth iteration, you allocate 4 additional rows (again, double the allocation size), giving you three more iterations free. Overall, you perform only 8 allocations [O(log n)], and only 127 moves [O(n)]. As the array sizes get larger, the effects get more pronounced. Appending a million commands to a buffer absolutely needs geometric allocation size growth when you reallocate the array, rather than a constant growth factor, or you quickly become bogged down in the processing time required to copy the contents on each iteration that has to reallocate. ANY good allocation algorithm, where the size of the list is not known a priori, will use a geometric allocation growth mechanism (doubling the allocation size is easiest, but a 3/2 growth ratio also works; the point is that each growth is larger than the last, so that you get progressively more iterations where you don't have to reallocate and copy). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature