> I have never seen neither the papers from Lempel and Ziv nor the paper from Storer and Szymanski,

How can you even say that, then quote wikipedia, and know it is true?

> LZ77 algorithms achieve compression by replacing repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream.

This is a very general idea, and in general it is true, but the very next sentence is untrue:

> A match is encoded by a pair of numbers called a length-distance pair

Half true; but, well, a half truth is not the whole truth, so in effect it is not true at all. To dig out the essence, one has to analyze every single piece of it.

The LZ77 scheme is not based on a `length-distance pair': such a pair can only encode a reference to the past. And how is the past itself encoded? According to this `definition', it is not.

One cannot use wikipedia as a source of information. Any information, in fact. There is one thing they do well, though: they give references for their claims. And there one finds the original Lempel and Ziv paper [1]. It is only 7 pages long, it gives no concept of a literal, and the only possible encoding is as triplets: `distance, length, character'. The pseudocode you quoted says exactly that in line 14:

```
 output (d, l, c)
```

So let me give an example of coding with LZ77. The string below is 31 characters, i.e. 31*8 = 248 bits, long. Let us encode it with the LZ77 scheme, according to the Lempel-Ziv paper and wikipedia itself, as (distance, length, character) == (d,l,c):

Str:  "Alice has cat and cat has Alice"

LZ77: (0,0,A)(0,0,l)(0,0,i)(0,0,c)(0,0,e)(0,0, )(0,0,h)(0,0,a)(0,0,s)(0,0, )(0,0,c)(0,0,a)(0,0,t)(0,0, )(0,0,a)(0,0,n)(0,0,d)(8,5,h)(16,3,A)(26,3,e)

(Note that every triplet, match or not, carries a trailing character, so each match is one character shorter than the repeated string itself.)

If the distance is 5 bits, the length 3 bits, and the character 8 bits, then 20*(5+3+8) = 320 bits. Not much of an improvement over 248 bits, is it?
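These triplets can be checked mechanically. Here is a minimal decoder sketch in Python (my own illustration, not code from the paper), assuming the usual (d,l,c) semantics: copy l characters starting d positions back, then append the literal c:

```python
def lz77_decode(triplets):
    """Decode LZ77 (distance, length, char) triplets back into a string."""
    out = []
    for d, l, c in triplets:
        start = len(out) - d
        # copy character by character, so overlapping matches (d < l) also work
        for k in range(l):
            out.append(out[start + k])
        out.append(c)  # every triplet carries exactly one trailing literal
    return "".join(out)

literals = [(0, 0, ch) for ch in "Alice has cat and"]
matches = [(8, 5, "h"), (16, 3, "A"), (26, 3, "e")]
print(lz77_decode(literals + matches))  # Alice has cat and cat has Alice
```

The character-by-character copy matters: a triplet like (1,3,b) after a single "a" legally expands to "aaaab", referencing bytes it is producing itself.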
Now let me give an example of coding using Storer and Szymanski's method, described in their paper [2]. It is much longer, but I will save you time and take the example from page 3 (page 930 of the journal), with the string I used before:

Str:  "Alice has cat and cat has Alice"

LZSS: Alice has cat and(8,5)(16,4)(26,5)

Again, 31*8 = 248 bits. If the distance is 5 bits, the length 3 bits, and the literal/match flag 1 bit, then 17*(1+8) + 3*(1+5+3) = 180 bits.

Subtle difference.
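To see where those pairs come from, here is a minimal greedy LZSS encoder sketch in Python (my own illustration, not code from the paper). The minimum match length of 2 is my assumption for the break-even point: with these bit sizes a reference costs 1+5+3 = 9 bits, exactly one literal's worth, so only matches of length 2 or more pay off:

```python
# Break-even assumption: a reference costs 1+5+3 = 9 bits, the same as
# a single literal (1+8), so length-1 matches are never worth emitting.
MIN_MATCH = 2

def lzss_encode(s, max_len=7, max_dist=31):
    """Greedy LZSS: emit a back-reference when it beats literals, else a literal."""
    out = []  # tokens: ("lit", char) or ("ref", distance, length)
    i, n = 0, len(s)
    while i < n:
        best_len, best_dist = 0, 0
        limit = min(max_len, n - i)  # 3-bit length field caps matches at 7
        for j in range(max(0, i - max_dist), i):
            l = 0
            while l < limit and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:
                best_len, best_dist = l, i - j
        if best_len >= MIN_MATCH:
            out.append(("ref", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", s[i]))
            i += 1
    return out

tokens = lzss_encode("Alice has cat and cat has Alice")
print([t for t in tokens if t[0] == "ref"])
print(9 * len(tokens))  # every token is 9 bits under these parameters
```

With the string above this yields 17 literals plus the references (8,5), (16,4), (26,5): 20 tokens of 9 bits each, the 180 bits counted above.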
As you can see, wikipedia's statement

> The main difference between LZ77 and LZSS is that in LZ77 the dictionary reference could actually be longer than the string it was replacing.

although true, does not show the difference between the methods, but the result of that difference. An effect, not a cause. The cause is explained in the next sentences:

> In LZSS, such references are omitted if the length is less than the "break even" point. Furthermore, LZSS uses one-bit flags to indicate whether the next chunk of data is a literal (byte) or a reference to an offset/length pair.

which do explain the real difference between the methods.

LZ77 demonstrates the general idea that backreferencing strings
can be a method of compression, but on its own it is practically inapplicable; it took LZSS to turn the scheme into a practical, useful tool. If I prove that an electrically heated wire can be a source of light, am I suddenly the inventor of the light bulb? Far from it. Until Storer and Szymanski solved the problem of inefficient encoding, LZ77 was impractical. In fact, their invention of the literal, and of the single-bit flag that distinguishes a literal from a match, was the launch pad for the method.

LZMA is LZSS + 
Markov chains + arithmetic coding. In comparison, Deflate is LZSS + prefix coding (Huffman-derived, but rarely actually Huffman).

As, hopefully, demonstrated, if not by me then by prof. Bird, whom I quoted before and who should be an authority here [3][4], LZMA belongs with the rest: "PKZip, ARJ, RAR, ZOO, LHarc use LZSS rather than LZ77 as the primary compression algorithm; the encoding of literal characters and of length-distance pairs varies".

PS. One cannot treat
wikipedia as a source of any kind. There are good articles among them, but the majority are written by laymen and amateurs. Specialists do not have the time to correct amateur, not to say amateurish, articles in popular media.
Therefore, following the references is a must.

----

1. Ziv, Lempel. A Universal Algorithm for Sequential Data Compression. https://scholar.archive.org/work/gfdvh3vf5rb55a62mgsrvory2i

2. Storer, Szymanski. Data compression via textual substitution. Journal of the ACM, Volume 29, Issue 4, pp. 928–951; 1 Oct 1982. https://doi.org/10.1145/322344.322346 ; PDF: https://dl.acm.org/doi/pdf/10.1145/322344.322346

3. LZ schemes (video). https://www.youtube.com/watch?v=VDXBnmr8AY0&t=16m16s

4. LZ schemes (slide). https://encode.su/attachment.php?attachmentid=10843&d=1697027101
