Re: incremental history i/o? (was Re: A Feature Request for History)

Marcel (Felix) Giannelia Fri, 17 Jun 2011 14:58:18 -0700

I wonder how much de-duping the really old history would help. It seems that HISTCONTROL='erasedups' only affects the history of the current bash process (i.e. commands that were typed since you started that shell), and it leaves all the stuff it loaded from .bash_history alone.

As a quick test, removing duplicates from a 4 MB history file reduced the number of commands in it from 125236 to 36937, so about 70% of the entries in that file were duplicates (somewhat less by size, 'cause the longer and more interesting commands mostly stayed...). Doing that to your 11 MB file might get rid of that loading delay.
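A minimal Python sketch of that kind of de-duplication, keeping only the most recent copy of each command (roughly what erasedups does within a session). It assumes one command per line, i.e. no HISTTIMEFORMAT timestamp comment lines and no multi-line commands; dedupe_history and dedupe_history_file are hypothetical helper names, not part of bash.

```python
def dedupe_history(lines):
    """Return lines with duplicates removed, keeping the most recent copy.

    Assumes one command per line (no '#<epoch>' timestamp lines and no
    multi-line commands), which is a simplification of real history files.
    """
    seen = set()
    kept = []
    # Walk newest-to-oldest so the latest occurrence of each command wins.
    for line in reversed(lines):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    kept.reverse()  # restore chronological (oldest-first) order
    return kept


def dedupe_history_file(src, dst):
    """Write a de-duplicated copy of history file src to dst."""
    # surrogateescape lets arbitrary bytes round-trip through str.
    with open(src, encoding="utf-8", errors="surrogateescape") as f:
        lines = f.read().splitlines()
    with open(dst, "w", encoding="utf-8", errors="surrogateescape") as f:
        for line in dedupe_history(lines):
            f.write(line + "\n")
```

For example, dedupe_history(["ls", "cd ~", "ls", "make"]) returns ["cd ~", "ls", "make"]. Keeping the newest copy rather than the oldest means recently used commands stay near the end of the file, where a reverse search finds them first.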

Of course, de-duplicating the history destroys its role of "accurately record everything I've done", so if you also use your history for that it's not a good idea. For that latter use though, I can't think of a good reason for loading it on shell start, so maybe those roles should be split -- .bash_log and .bash_commands? The log is write-only, never clobbered, and has the equivalent of a HISTTIMEFORMAT set; the commands file is an efficiently stored hash table of unique commands, maybe with tweakable parameters for how "interesting" a command has to be to go in it (store "mount -o loop,ro,uid=1000 -t vfat /some/file /mnt/temp" but ignore "cd ~" 'cause you really don't need Ctrl+R to remember the latter).
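The proposed split could look roughly like this in-memory Python sketch. Everything here is hypothetical: nothing like it exists in bash, SplitHistory, is_interesting, min_length and the ignore list are illustrative names, the "interestingness" test is just a length threshold plus a skip list, and the log and commands attributes merely stand in for the proposed .bash_log and .bash_commands files.

```python
import time


class SplitHistory:
    """Hypothetical split of history into an append-only log and a hash
    table of unique 'interesting' commands. Illustrative only."""

    def __init__(self, min_length=8, ignore=("cd ~", "ls", "pwd", "exit")):
        self.log = []       # stands in for .bash_log: append-only, never clobbered
        self.commands = {}  # stands in for .bash_commands: unique command -> last use
        self.min_length = min_length
        self.ignore = set(ignore)

    def is_interesting(self, cmd):
        # Tunable heuristic: skip short or explicitly ignored commands.
        return len(cmd) >= self.min_length and cmd not in self.ignore

    def record(self, cmd, when=None):
        when = time.time() if when is None else when
        self.log.append((when, cmd))  # every command is logged with a timestamp
        if self.is_interesting(cmd):
            # dicts keep insertion order; pop + reinsert moves cmd to the newest slot
            self.commands.pop(cmd, None)
            self.commands[cmd] = when

    def reverse_search(self, needle):
        # Newest-first substring match over unique commands, like Ctrl+R.
        for cmd in reversed(self.commands):
            if needle in cmd:
                return cmd
        return None
```

A dict gives the hash-table behaviour (constant-time lookup of a command), and re-inserting on reuse keeps the most recent use at the end, so the reverse search sees recent commands first while the log still records everything.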


~Felix.


On 16/06/11 12:55, Bradley M. Kuhn wrote:

I agree with Marcel's points about keeping a big bash history, although
I wasn't sure if discussing "why" users keep a big bash history was on
topic or not.

Marcel (Felix) Giannelia wrote at 13:16 (EDT) on Tuesday:

A .bash_history file going back years and years is still only a few
megs,

Actually, this relates to a thing I'd been looking into recently.  My
bash history is 11MB now, and on some machines I have a noticeable load
time as it reads the history.  I'd thought about adding support for
incremental read to bash history/readline code.  Basically, it would
load only the parts of the history it needed based on the history
requested.  Obviously running "history" would read it all, but if
reverse-search was requested, it could perhaps be read incrementally
somehow.
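One way to read history incrementally for a reverse search is to scan the file backwards in fixed-size blocks and stop at the first match. This Python sketch only illustrates that I/O pattern; it is not how bash or readline read history, iter_lines_reversed, reverse_search and block_size are hypothetical names, and it assumes one command per line and skips empty lines.

```python
import os


def iter_lines_reversed(path, block_size=64 * 1024):
    """Yield non-empty lines of a file from last to first, reading
    fixed-size blocks from the end so only as much as needed is read."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek returns the new offset: the file size
        tail = b""  # start of a line whose beginning lies in an earlier block
        while pos > 0:
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos)
            chunk = f.read(read_size) + tail
            lines = chunk.split(b"\n")
            tail = lines.pop(0)  # may be a partial line; completed next round
            for line in reversed(lines):
                if line:
                    yield line.decode("utf-8", "surrogateescape")
        if tail:
            yield tail.decode("utf-8", "surrogateescape")


def reverse_search(path, needle):
    """Ctrl+R-style lookup: return the newest line containing needle,
    stopping as soon as it is found instead of loading the whole file."""
    for line in iter_lines_reversed(path):
        if needle in line:
            return line
    return None
```

With this pattern, a reverse search for a recently used command only reads the tail of the file, while listing the full history (as "history" does) still has to read all of it.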

Given that this would be a big change (esp. to make it seamless to
existing readline API users), and would provide a feature that clearly
isn't universally desired (ability to have really big history files),
I'm asking, albeit with some trepidation, if such a rewrite of the
history reading/writing code would likely be accepted, and if so what it
would need to look like to be an acceptable patch.

I noticed someone previously attempted to implement mmap() in the
history code, but it's #ifdef'd out (IIRC from my investigations a few
weeks ago).  I theorized that it was #ifdef'd out because implementing
mmap() didn't help anything: the history reading code immediately
goes through the whole array of history, so the way the code currently
operates, the entire file gets read into RAM even if you mmap() it.  In
other words, just slapping mmap() in place wouldn't work (in fact, it
seems to have been tried and abandoned); more in-depth changes would
be needed.
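The mmap() point can be illustrated with a small Python sketch: mapping the file is cheap, but what matters is how much of the mapping gets touched. The hypothetical helper tail_lines_mmap walks backwards from the end with rfind(), so roughly only the pages near the end of the file are read in; a loader that split the whole mapping into lines up front would touch every page, mmap or not. It assumes one command per line.

```python
import mmap
import os


def tail_lines_mmap(path, n):
    """Return the last n lines of a file (oldest first) via mmap, finding
    line boundaries with rfind() from the end so only the tail is touched."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return []  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            if mm[end - 1:end] == b"\n":
                end -= 1  # ignore the trailing newline
            lines = []
            while len(lines) < n and end > 0:
                start = mm.rfind(b"\n", 0, end) + 1  # 0 if no earlier newline
                lines.append(mm[start:end].decode("utf-8", "surrogateescape"))
                end = start - 1  # step over the newline before this line
            lines.reverse()
            return lines
```

The contrast with an eager loader is the whole point: replacing the loop with mm[:].split(b"\n") would produce the same lines but fault in the entire file first.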

Thoughts on this idea?
