> I have never seen neither the papers from Lempel and Ziv nor the paper from Storer and Szymanski,

How can you say that, then quote wikipedia, and know that what it says is true?

> LZ77 algorithms achieve compression by replacing repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream.

That is a very general idea, and in general it is true, but the very next sentence is untrue:

> A match is encoded by a pair of numbers called a length-distance pair

Half true, but, well, a half truth is not the whole truth; in fact it is not true at all. To dig out the essence, one has to analyze it piece by piece. The LZ77 scheme is not based on a 'length-distance pair': such a pair can only encode a reference to the past, and how is the data that has no earlier occurrence encoded? According to this 'definition', it is not.

One cannot use wikipedia as a source of information. Any information, in fact. There is one thing they do well, though - they give references for their claims. And there is the original Lempel and Ziv paper [1]. It is only 7 pages long, it gives no concept of a literal, and the only possible encoding is the triplet (distance, length, character). The pseudocode you quoted says exactly that in line 14:

```
output (d, l, c)
```

So let me give an example of coding with LZ77. The string below is 31*8 = 248 bits long. Let us encode it with the LZ77 scheme, according to the Lempel-Ziv paper and to wikipedia itself, as (distance, length, character) == (d,l,c):

```
Str:  "Alice has cat and cat has Alice"
LZ77: (0,0,A)(0,0,l)(0,0,i)(0,0,c)(0,0,e)(0,0, )(0,0,h)(0,0,a)(0,0,s)(0,0, )(0,0,c)(0,0,a)(0,0,t)(0,0, )(0,0,a)(0,0,n)(0,0,d)(8,5,c)(16,4,h)(26,5,A)
```

If the distance is 5 bits, the length 3 bits, and the character 8 bits, then each triplet takes 16 bits and the 20 triplets take 20*(5+3+8) = 320 bits. Not much of an improvement over the original 248 bits, is it? It is actually an expansion.
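To make the triplet scheme concrete, here is a minimal greedy LZ77-style encoder sketch in Python. The function name and the 31/7 limits are just the illustrative 5-bit/3-bit choices used above, and a greedy parser may pick slightly different triplets than the hand-worked parse (it will happily take even a length-1 match, since pure LZ77 has no literal to fall back on), but every token it emits is the same (d, l, c) triplet:

```python
def lz77_encode(data, max_dist=31, max_len=7):
    """Greedy LZ77 parse: emit (distance, length, next_char) triplets.
    max_dist/max_len mirror the 5-bit distance and 3-bit length fields
    assumed in the example above (illustrative only)."""
    out, i, n = [], 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        for j in range(max(0, i - max_dist), i):          # candidate match starts
            l = 0
            # stop one short of the end so the explicit character always exists
            while l < max_len and i + l < n - 1 and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_len, best_dist = l, i - j
        out.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return out

triplets = lz77_encode("Alice has cat and cat has Alice")
print(len(triplets), "triplets,", len(triplets) * (5 + 3 + 8), "bits")
```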
Now let me give an example of coding using Storer and Szymanski's method, described in their paper [2]. It is much longer, but I will save you the time and take an example from page 3 (page 930 of the journal) with the string I used before:

```
Str:  "Alice has cat and cat has Alice"
LZSS: Alice has cat and(8,5)(16,4)(26,5)
```

Again, the original is 31*8 = 248 bits. If the distance is 5 bits, the length 3 bits, and the literal/match flag 1 bit, then 17*(1+8) + 3*(1+5+3) = 180 bits. Subtle difference.
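The same arithmetic, as a hypothetical Python sketch with the field widths assumed above: every token, literal or match, is preceded by a one-bit flag, and the total follows directly from the token counts of the parse above:

```python
# Tokens of the LZSS parse above: 17 flagged literals, then three flagged
# (distance, length) pairs. Field widths are the assumed illustrative ones.
literals = list("Alice has cat and")
matches  = [(8, 5), (16, 4), (26, 5)]

FLAG, CHAR, DIST, LEN = 1, 8, 5, 3

literal_bits = len(literals) * (FLAG + CHAR)        # 17 * 9 = 153 bits
match_bits   = len(matches)  * (FLAG + DIST + LEN)  #  3 * 9 =  27 bits
print(literal_bits + match_bits, "bits vs", 31 * 8, "uncompressed")  # 180 vs 248
```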
As you can see, wikipedia's statement

> The main difference between LZ77 and LZSS is that in LZ77 the dictionary reference could actually be longer than the string it was replacing.

although true, does not show the difference between the methods, but the result of that difference. An effect, not a cause. The cause is explained in the next sentences:

> In LZSS, such references are omitted if the length is less than the "break even" point. Furthermore, LZSS uses one-bit flags to indicate whether the next chunk of data is a literal (byte) or a reference to an offset/length pair.

which explains the real difference between the methods.
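To put the "break even" rule in the same hypothetical Python terms: with the field widths assumed above, a flagged (distance, length) pair costs 9 bits no matter how long the match is, so a match of length 1 costs exactly as much as a flagged literal and is never worth emitting; only matches of two or more characters pay off:

```python
FLAG, CHAR, DIST, LEN = 1, 8, 5, 3       # same assumed field widths as above

def worth_encoding(match_len):
    """LZSS break-even test: emit a (distance, length) pair only if it is
    cheaper than writing the same characters as flagged literals."""
    match_cost   = FLAG + DIST + LEN          # 9 bits, independent of match_len
    literal_cost = match_len * (FLAG + CHAR)  # 9 bits per character
    return match_cost < literal_cost

for n in (1, 2, 3):
    print(n, worth_encoding(n))   # 1 -> False (break-even), 2 -> True, 3 -> True
```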
LZ77 demonstrates the general idea that backreferencing strings can be a method of compression, but on its own it is practically inapplicable; it is LZSS that makes the scheme a practical, useful tool. If I prove that an electrically heated wire can be a source of light, am I suddenly the inventor of the light bulb? Far from it. Until Storer and Szymanski solved the problem of inefficient encoding, LZ77 had little practical use. In fact, their invention of the literal, and of telling a literal from a match with a single flag bit, was the launch pad for the method.

LZMA is LZSS + Markov chains + arithmetic coding. In comparison, Deflate is LZSS + prefix coding (Huffman-derived, but rarely actual Huffman). As hopefully demonstrated, if not by me then by prof. Bird, whom I quoted before and who would be an authority here [3][4], LZMA uses LZSS rather than LZ77, just as "PKZip, ARJ, RAR, ZOO, LHarc use LZSS rather than LZ77 as the primary compression algorithm; the encoding of literal characters and of length-distance pairs varies".

PS. One cannot treat wikipedia as a source of any kind. There are good articles among them, but the majority are written by laymen and amateurs. Specialists do not have time to correct amateur, not to say amateurish, articles in popular media.
Therefore, referring to the references is a must.

----

1. A Universal Algorithm for Sequential Data Compression; Ziv, Lempel. https://scholar.archive.org/work/gfdvh3vf5rb55a62mgsrvory2i
2. Data Compression via Textual Substitution; Storer, Szymanski; Journal of the ACM, Volume 29, Issue 4, pp. 928–951; 1 Oct 1982. https://doi.org/10.1145/322344.322346 (PDF: https://dl.acm.org/doi/pdf/10.1145/322344.322346)
3. LZ schemes (video): https://www.youtube.com/watch?v=VDXBnmr8AY0&t=16m16s
4. LZ schemes (slide): https://encode.su/attachment.php?attachmentid=10843&d=1697027101