Re: bash history with mixed timestamps slow and broken
On 09/25/2016 05:39 PM, Chet Ramey wrote:
>> Description:
>> If the history file (`.bash_history`) starts with a timestamp
>> (`HIST_TIMESTAMP_START`), and contains lines that have been written
>> without timestamps, then reading the history file is (a) very slow
>> because (b) consecutive lines without timestamps are merged into a
>> single history entry with quadratic complexity.
>
> The appending behavior isn't really quadratic: the code simply keeps
> reallocating the line and appending to it. You can imagine how long it
> takes to append a million commands to a single buffer. You've managed
> to identify the most degenerate case.

That depends on how you do the reallocation. Take a simple example: say you are processing a list of 128 items. If you allocate one additional slot on each iteration through the loop, you perform 128 allocations [O(n)], but you ALSO perform 128*127/2 = 8128 moves [O(n^2)] as you copy the previous iteration's contents into the newly allocated, larger array. THAT is where the quadratic behavior comes from: a constant-size allocation pattern requires a quadratic amount of copying.

But if you switch to geometric allocation instead, you amortize the costs. On the first iteration, you allocate 1 slot. On the second iteration, you allocate 2 additional slots (twice the previous allocation), giving you the third iteration for free. On the fourth iteration, you allocate 4 additional slots (again doubling), giving you three more iterations for free. Overall, you perform only 8 allocations [O(log n)] and only 127 moves [O(n)]. As the array sizes get larger, the effect gets more pronounced.

Appending a million commands to a buffer absolutely needs geometric growth of the allocation size when you reallocate the array, rather than a constant growth increment, or you quickly become bogged down in the processing time required to copy the contents on each iteration that has to reallocate.
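To make the arithmetic concrete, here is a small bash sketch (illustrative only, not bash's actual allocator) that counts allocations and element moves for n = 128 appends under both growth strategies:

```shell
#!/usr/bin/env bash
# Count allocations and element moves for n appends under
# constant (+1 slot) vs geometric (double capacity) growth.
n=128

# Constant growth: every append reallocates and copies all previous contents.
const_allocs=$n
const_moves=$(( n * (n - 1) / 2 ))

# Geometric growth: double the capacity whenever the buffer fills up.
geo_allocs=0 geo_moves=0 cap=0 len=0
for (( i = 0; i < n; i++ )); do
  if (( len == cap )); then
    (( geo_allocs++ ))                # one realloc
    (( geo_moves += len ))            # copy existing elements into new buffer
    (( cap = cap ? cap * 2 : 1 ))
  fi
  (( len++ ))
done

printf 'constant:  %d allocations, %d moves\n' "$const_allocs" "$const_moves"
printf 'geometric: %d allocations, %d moves\n' "$geo_allocs" "$geo_moves"
# constant:  128 allocations, 8128 moves
# geometric: 8 allocations, 127 moves
```

The same loop with n set to a million makes the gap dramatic: roughly 5*10^11 moves for constant growth versus about 10^6 for doubling.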
ANY good allocation algorithm, where the size of the list is not known a priori, will use a geometric allocation growth mechanism (doubling the allocation size is easiest, but a 3/2 growth ratio also works; the point is that each growth is larger than the last, so that you get progressively more iterations where you don't have to reallocate and copy).

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: Magnitude of Order "For Loop" performance deltas based on syntax change
Would an array of pointers to structs of key-value pairs be better here? It should be faster in the common cases, even though it may mean some wasted space and reallocs depending on how you decide to grow the array. A linear search through an array for an index should be faster than linked-list traversal. https://youtu.be/YQs6IC-vgmo (why every std::vector implementation uses arrays; really it's true of the analogues in most non-functional languages).

Also, bash itself makes it hard to use sparse arrays efficiently regardless of the implementation. In the case of lists, one usually wants to address elements by ordinal position, but neither the index in `arr[index]` nor the offset in `${arr[@]:offset:length}` allows it, which means random insertion requires a linear search even though the implementation is a linked list. That also makes the "length" inconsistent with everything else that looks at the value of the index, though at least length does what I really wish offset did.

On top of that, `${!arr[@]:offset:len}` doesn't even work. None of the parameter expansions work with `${!arr[@]}`, so to calculate an ordinal index, a second array containing the indexes of the first is needed. Plus, pre-existing items are overwritten on assignment, so insertion means having to save and restore all the indexes and values, at least in the region being inserted into, which means more index calculation.
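A short demonstration of the offset behavior described above, using a throwaway sparse array (the array contents and the helper name `keys` are mine, chosen for the sketch):

```shell
#!/usr/bin/env bash
# ${arr[@]:offset} selects by index VALUE, not by ordinal position.
arr=([2]=a [10]=b [30]=c)

# Offset 10 starts at the first set element whose index is >= 10,
# not at the 10th element of the list.
echo "${arr[@]:10}"       # b c

# To recover ordinal positions, a second array holding the indexes
# of the first is needed, since ${!arr[@]} takes no offset/length.
keys=("${!arr[@]}")       # (2 10 30)
echo "${keys[1]}"         # 10 -- index of the second set element
```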
Re: Magnitude of Order "For Loop" performance deltas based on syntax change
On 9/26/16 11:47 AM, Dan Douglas wrote:
> Would an array of pointers to structs of key-value pairs be better
> here? It should be faster in the common cases even though it may mean
> some wasted space and reallocs depending on how you decide to grow the
> array. A linear search through an array for an index should be faster
> than linked-list traversal. https://youtu.be/YQs6IC-vgmo (why every
> std::vector implementation uses arrays, really it's true of analogues
> in most non-functional langs).

I made a change that makes for-loop 2 nearly twice as fast as for-loop 1, mostly because it avoids storing and retrieving variables.

> Also bash itself makes it hard to use sparse arrays efficiently regardless
> of the implementation. In the case of lists, one usually wants to address
> elements by ordinal position, but both the index in `arr[index]` and the
> offset in `${arr[@]:offset:length}` don't allow it, which means random
> insertion requires a linear search despite being linked-lists. That also
> makes the "length" inconsistent with everything else that looks at the
> value of the index, though at least length does what I really wish
> offset did.

So you want offset N to be the nth element in the array instead of the element with index N? Huh. Well, you probably want another data structure. In any event, random insertion can be optimized in the same way that random access is.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU   c...@case.edu   http://cnswww.cns.cwru.edu/~chet/
Re: Magnitude of Order "For Loop" performance deltas based on syntax change
On Mon, Sep 26, 2016 at 3:32 PM, Chet Ramey wrote:
> So you want offset N to be the nth element in the array instead of the
> element with index N? Huh.

Maybe, not always. Both would be nice. The offset isn't the element with the index N. It's the next set element whose index is >= that of the selected offset. There's no simple way of knowing how many set elements come before or after the offset - possibly zero. In order to insert something after the first element, you have to find the index of the first element.

> Well, you probably want another data structure.

Yes please. Everybody wants more data structures. (I know... patches accepted.)
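A minimal sketch of the "insert after the first element" problem mentioned above, with an invented two-element sparse array (and assuming the adjacent index is free, which a real script would have to check):

```shell
#!/usr/bin/env bash
# Inserting after the "first" element of a sparse bash array requires
# discovering its index first, since indexes need not start at 0.
arr=([5]=x [20]=y)

keys=("${!arr[@]}")
first=${keys[0]}           # 5 -- found only by expanding all the indexes

arr[first+1]=new           # insert after it (assumes index 6 is unset)
echo "${arr[@]}"           # x new y
```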