Tim Peters <[email protected]> added the comment:
difflib generally synchs on the longest contiguous matching subsequence that
doesn't contain a "junk" element. By default, `ndiff()`'s optional `charjunk`
argument considers blanks and tabs to be junk characters.
In the strings:
"drwxrwxr-x 2 2000 2000\n"
"drwxr-xr-x 2 2000 2000\n"
the longest matching substring not containing whitespace is "rwxr-x", of length
6, starting at index 4 in the first string and at index 1 in the second. So
it's aligning the strings like so:
"drwxrwxr-x 2 2000 2000\n"
   "drwxr-xr-x 2 2000 2000\n"
     123456
That's why it wants to delete the 1:4 slice ("rwx") in the first string and
insert "r-x" after the longest matching substring.
The default is aimed at improving results for human-readable text, like prose
and Python code, where stuff between whitespace is often read "as a whole"
(words, keywords, identifiers, ...).
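As a rough check of the default alignment described above, the sketch below
calls `SequenceMatcher` directly with `IS_CHARACTER_JUNK` as the junk
predicate, which is the default `charjunk` for `ndiff()`. The names `a`, `b`,
`sm` and `m` are only for illustration.

```python
from difflib import IS_CHARACTER_JUNK, SequenceMatcher

a = "drwxrwxr-x 2 2000 2000\n"
b = "drwxr-xr-x 2 2000 2000\n"

# Blanks and tabs count as junk, as with ndiff()'s default charjunk.
sm = SequenceMatcher(IS_CHARACTER_JUNK, a, b)

# Longest matching block that contains no junk character.
m = sm.find_longest_match(0, len(a), 0, len(b))
print(m, repr(a[m.a:m.a + m.size]))

# The edit script built around that block.
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, repr(a[i1:i2]), repr(b[j1:j2]))
```

The opcodes should show a delete of a[1:4] before the "rwxr-x" block and an
insert of "r-x" after it.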
For cases like this one, where character-by-character differences are
important, it's often better to pass `charjunk=None`. Then the longest
matching substring is "xr-x 2 2000 2000\n", starting at index 6 and running to
the end of both strings, and you get the output you're expecting.
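A minimal sketch of that suggestion, using the same two strings: passing
`charjunk=None` to `ndiff()` turns off the whitespace-as-junk heuristic. The
names `a`, `b`, `m` and `diff` are illustrative only.

```python
from difflib import SequenceMatcher, ndiff

a = "drwxrwxr-x 2 2000 2000\n"
b = "drwxr-xr-x 2 2000 2000\n"

# With no junk predicate, the longest match is the shared tail.
m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
print(m, repr(a[m.a:m.a + m.size]))

# ndiff() takes sequences of lines; charjunk=None disables the junk heuristic.
diff = list(ndiff([a], [b], charjunk=None))
print("".join(diff), end="")
```

The "?" guide lines in the printed diff should then mark only the single
differing character instead of a three-character delete and insert.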
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35955>
_______________________________________