On 12/01/2014 02:31 PM, Shannon Dealy wrote:
> On Mon, 1 Dec 2014, Nikolaus Rath wrote:
>
> [snip]
>> I think at this point I can probably write you a patch to get the file
>> system functional again, but I'd very much like to find out what's
>> happening here.
>>
>> Would you be able to run fsck with --backend-options no-ssl, and capture
>> the traffic using Wireshark?
>
> Hi Nikolaus,
>
> I performed several runs of fsck.s3ql while experimenting with wireshark
> (it has been years since I've used it or tcpdump) to get the settings
> right. Each time, fsck.s3ql failed in the same manner. Then when I did
> what was to be the final run/capture, it ran to completion without errors.
>
> Given the behavior above, the first thing that leaps to mind is possibly
> a race condition. I would usually expect more inconsistency (such as
> failing at different objects each time) if it was simply uninitialized
> data or corruption, though those may be possibilities too.

That's interesting, but it actually fits my hypothesis. fsck.s3ql is
single-threaded, so it's not a race condition. However, when retrieving
the object list from S3, S3QL has to issue several requests, because S3
forces the list to be paginated to at most 1000 entries per request. I
suspect there might be a bug in the pagination handling that causes an
object to be listed twice. Most likely, this is only triggered when S3
does something uncommon-but-legal in its responses.

But the only way to check this is to get a dump of the raw server
response. So if you could try this again a few more times (use the
--force option to force an fsck), that would be fantastic.

> I still have the bucket that was copied using the aws command line
> tools, and am in the process of copying that to a new bucket for testing
> so we don't lose the corrupt version, but won't get to testing it
> tonight. I have not tried to use the original file system since
> fsck.s3ql succeeded and am not entirely sure if I trust it without
> knowing what was wrong with fsck.s3ql

At this point, I am pretty confident that this is a bug related to
object listing. Object listing is only used by fsck.s3ql, so I think
that using the file system normally (i.e. with mount.s3ql) should not
result in any problems.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
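
To illustrate the kind of failure suspected in the message above, here is a
minimal sketch of marker-based pagination and a check for keys that are
listed twice. It is not S3QL's actual code and it does not talk to S3:
fetch_page, make_fake_backend and the "inclusive" mode are hypothetical
stand-ins invented for this example, and the inclusive mode is only an
assumed way a server response could surprise the client, not a claim about
what S3 really does.

```python
from collections import Counter


def list_all_keys(fetch_page):
    """Collect keys across pages.

    fetch_page(marker) is a hypothetical callable returning
    (keys_on_page, is_truncated) for keys sorting after `marker`.
    """
    keys = []
    marker = ''
    while True:
        page, truncated = fetch_page(marker)
        keys.extend(page)
        if not truncated or not page:
            break
        # Resume after the last key seen. If the server treats the
        # marker differently than assumed, keys may be repeated.
        marker = page[-1]
    return keys


def find_duplicates(keys):
    """Return the sorted list of keys that appear more than once."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


def make_fake_backend(all_keys, page_size, inclusive=False):
    """Hypothetical in-memory stand-in for a paginated listing.

    inclusive=True simulates a response that starts AT the marker
    instead of after it (an assumption made for illustration only).
    Use page_size >= 2 in that mode, or the listing never advances.
    """
    all_keys = sorted(all_keys)

    def fetch_page(marker):
        if inclusive:
            rest = [k for k in all_keys if k >= marker]
        else:
            rest = [k for k in all_keys if k > marker]
        return rest[:page_size], len(rest) > page_size

    return fetch_page


if __name__ == '__main__':
    names = ['a', 'b', 'c', 'd', 'e']
    good = list_all_keys(make_fake_backend(names, 2))
    bad = list_all_keys(make_fake_backend(names, 2, inclusive=True))
    print('well-behaved listing, duplicates:', find_duplicates(good))
    print('surprising listing, duplicates:  ', find_duplicates(bad))
```

Running a check like find_duplicates over the keys parsed from a captured
server response would show whether a repeated object comes from the server
or is introduced by the client when it stitches the pages together.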