https://bugs.kde.org/show_bug.cgi?id=406395

Mariusz Glebocki <m...@arccos-1.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m...@arccos-1.net
             Status|REPORTED                    |CONFIRMED
     Ever confirmed|0                           |1

--- Comment #1 from Mariusz Glebocki <m...@arccos-1.net> ---
Konsole v18.12.03, Ubuntu 18.04
Test environment: default config (ran with empty $HOME) + infinite scrollback +
store history in /tmp.


Before:
   0  tmp/konsole-J14446.history
   0  tmp/konsole-M14446.history
   0  tmp/konsole-T14446.history

Running: base64 -w 511 /dev/urandom | head -n $((1024 * 1024))

After:
 48M  tmp/konsole-L14446.history
8.0G  tmp/konsole-S14446.history
6.0M  tmp/konsole-n14446.history

The command above prints 512 MiB of characters (1024 * 1024 lines of 511
characters plus a newline, assuming 1 character = 1 B). The Character struct is
16 B, so the 8.0G history file is consistent with the Konsole code.
For reference: this is slightly above 6.7 million lines of 80 characters each.
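
The arithmetic behind these numbers can be checked with a short sketch. It
assumes the newline of each line also occupies one 16 B cell, which is a
simplification made here for illustration; the constant and function names are
made up for this example and are not Konsole identifiers.

```python
CHAR_STRUCT_BYTES = 16  # size of the Character struct, as stated above


def expected_history_bytes(lines: int, chars_per_line: int) -> int:
    """Bytes needed to store every character cell uncompressed."""
    return lines * chars_per_line * CHAR_STRUCT_BYTES


lines = 1024 * 1024       # head -n $((1024 * 1024))
chars_per_line = 511 + 1  # base64 -w 511, plus the newline (assumed 1 cell)

total_chars = lines * chars_per_line
size = expected_history_bytes(lines, chars_per_line)

print(total_chars // 2**20, "Mi characters")            # 512 Mi characters
print(size / 2**30, "GiB")                              # 8.0 GiB
print(total_chars / 80 / 1e6, "million 80-char lines")  # 6.7108864 million
```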


Let's try compressing the history file (i.e. single-format random alphanumeric
characters).

Algorithm: LZ4, 4MB block, fast compression (1)
Result: 8.0G reduced to 1.5G => 3 B/character.

Characters that share a single format have most of the struct repeated, which
is why this data compresses well despite the random content.
Additionally, most people do not read half a GB of random characters (I hope).
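
A rough way to see the effect of repeated per-character structure is the sketch
below. It is not the experiment above: it uses zlib level 1 from the Python
standard library as a stand-in for LZ4, and the 16-byte cell layout (a 2-byte
code unit followed by 14 identical format bytes) is a hypothetical layout chosen
for illustration, not Konsole's actual Character struct. Absolute numbers will
therefore differ from the LZ4 result.

```python
import random
import string
import struct
import zlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the experiment above


def make_cells(n: int, seed: int = 0) -> bytes:
    """Build n hypothetical 16-byte cells: 2-byte code unit + 14 format bytes."""
    rng = random.Random(seed)
    alphabet = (string.ascii_letters + string.digits).encode()
    fmt = bytes(14)  # identical "format" for every cell (illustrative only)
    return b"".join(struct.pack("<H", rng.choice(alphabet)) + fmt
                    for _ in range(n))


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Compress each block independently so it can be decoded on its own."""
    return [zlib.compress(data[i:i + block_size], 1)
            for i in range(0, len(data), block_size)]


def decompress_blocks(blocks: list) -> bytes:
    """Inverse of compress_blocks: decode every block and concatenate."""
    return b"".join(zlib.decompress(b) for b in blocks)


n_chars = 100_000
raw = make_cells(n_chars)
blocks = compress_blocks(raw)
packed = sum(len(b) for b in blocks)

assert decompress_blocks(blocks) == raw  # round trip is lossless
print(f"{len(raw) / n_chars:.1f} B/char raw, "
      f"{packed / n_chars:.2f} B/char compressed")
```

Compressing in independent blocks keeps random access cheap: only the block
containing the requested lines has to be decompressed when scrolling back.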


More realistic input:

Running: find src tests tools \( -name '*.cpp' -or -name '*.h' -or -name '*.py' \) -exec pygmentize {} \;

  24M /tmp/konsole-F14446.history
  47K /tmp/konsole-V14446.history
 374K /tmp/konsole-a14446.history

This outputs all Konsole source files, colorized, though only lightly
(keywords, function names in definitions, strings, primitive types,
preprocessor directives). It is a good simulation of typical use: a fancy
prompt, colorful grep and ls output here and there, compiler errors/warnings,
and mostly regular text.

Algorithm: The same as above
Result: 24M reduced to 3.8M => 2.5B/character.
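
The B/character figures follow from dividing the compressed size by the number
of stored characters (uncompressed size / 16 B). A minimal sketch of that
calculation, using the sizes reported above and treating M and G as binary
units (an assumption about how the sizes were listed); the helper name is made
up for this example:

```python
CHAR_STRUCT_BYTES = 16  # size of the Character struct, as stated above


def bytes_per_char(raw_bytes: float, compressed_bytes: float) -> float:
    """Compressed bytes spent per stored character."""
    chars = raw_bytes / CHAR_STRUCT_BYTES
    return compressed_bytes / chars


M = 2**20
G = 2**30

random_run = bytes_per_char(8.0 * G, 1.5 * G)  # base64 /dev/urandom test
source_run = bytes_per_char(24 * M, 3.8 * M)   # pygmentize test

print(f"random input: {random_run:.2f} B/character")  # 3.00
print(f"source files: {source_run:.2f} B/character")  # 2.53
```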

This might be even better with another algorithm; LZ4 is just the first fast
algorithm I thought of.


Actually, it would be great to compress the in-memory history as well. I'll
probably implement compression after finishing my current tasks, unless someone
else wants to do it.
