I'm unsure this fix addresses the problem reported - perhaps due to my
choice of title!

The more extensive commit message from
https://fossies.org/linux/misc/e2fsprogs-1.44.5.tar.gz/e2fsprogs-1.44.5/doc/RelNotes/v1.44.5.txt
is:

"Use 64-bit counters to track the number of files that are
defragmented using in e4defrag, to avoid overflows when more than
2**32 files are defragmented.  (Addresses Debian Bug: #888899)"

However, this issue does not concern an overflow due to processing
over 2**32 files - although *technically* that might happen [even
though ext4's limit is (2**32)-1], if the issue reported is combined
with a filesystem that already has almost 2**32 files.

This is a logic issue deriving from the assumption that total_count >=
succeed_cnt, which may not be the case if files have been created
during the defrag process, as when nginx is writing - generally,
filesystems requiring defrag may have lots of writes.

Put another way, the filesystem is not static after the original call
to calc_entry_counts() to determine total_count; so, you may succeed
in defragmenting more files than exist... therefore, logically, you
*failed* to defragment fewer than zero files. :-)

In the original case there seem to have been 6195 newly-created files
defragmented, leading to a failure count calculation of -6195, which
is 4294961101 as an unsigned int. (In practice, even more may have
been created, while others - perhaps including newly-defragmented
files - were deleted by nginx to maintain its configured usage
limits.)
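
The wraparound described above can be reproduced in isolation. This is
a minimal sketch, not the e4defrag code: failure_count_u32() is a
hypothetical helper, and the totals passed to it are made-up values
chosen only so that 6195 more files succeed than were counted.

```c
#include <stdint.h>

/* Hypothetical helper (not from e4defrag): computes a failure count as
 * total - succeeded using 32-bit unsigned arithmetic. */
static uint32_t failure_count_u32(uint32_t total, uint32_t succeeded)
{
    /* Unsigned subtraction wraps modulo 2**32 when succeeded > total. */
    return total - succeeded;
}
```

With an assumed total of 100000 and 106195 successes,
failure_count_u32(100000, 106195) returns 4294961101, i.e. 2**32 - 6195.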

I called this an "overflow" even though my instinct was to say
"underflow" because this is apparently the correct terminology for a
value which is "too big" in a negative sense:
https://en.wikipedia.org/wiki/Integer_overflow
https://en.wikipedia.org/wiki/Arithmetic_underflow

Looking at the diff, the counters and associated printfs have been
widened to unsigned long long. However, the counters are still
unsigned, and so rather than UINT_MAX I suspect it will output
something closer to ULLONG_MAX after subtracting succeed_cnt from
total_count in the problem situation.

i.e. with succeed_cnt = 779729, total_count = 779728 it might output:
"Failure: [18446744073709551615/779728]"
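
As a rough check of that prediction, the same subtraction can be
sketched with 64-bit unsigned counters. Both helpers below are
hypothetical and not taken from e2fsprogs; failure_count_clamped() is
only one possible guard, shown on the assumption that reporting zero
failures is acceptable when more files succeed than were initially
counted.

```c
#include <stdint.h>

/* Hypothetical helper: the same subtraction with 64-bit unsigned
 * counters. It still wraps, now modulo 2**64. */
static uint64_t failure_count_u64(uint64_t total, uint64_t succeeded)
{
    return total - succeeded;
}

/* Hypothetical guard (an assumption, not the upstream fix): report
 * zero failures when succeeded exceeds total, so the subtraction can
 * never wrap. */
static uint64_t failure_count_clamped(uint64_t total, uint64_t succeeded)
{
    return succeeded > total ? 0 : total - succeeded;
}
```

With the values above, failure_count_u64(779728, 779729) returns
18446744073709551615, while failure_count_clamped(779728, 779729)
returns 0.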

I have not tried this yet though; I'll do so once it becomes available
via backports.

Best regards, and apologies for the misleading title and
MAX_INT/UINT_MAX confusion,
-- 
Laurence "GreenReaper" Parry
http://www.greenreaper.co.uk/
