No problem, here is the reply to the Debian bug and my coworkers.

Kind regards
Jose M Calhariz

On Thu, Jun 17, 2021 at 08:26:57PM -0500, Andrew Deason wrote:
> That's a question for Ben. Can you reply to 969...@bugs.debian.org
> instead? (or ask him directly via ka...@mit.edu ). I was replying to you
> directly, so the reply-to didn't go to the debian bug, sorry.
>
> Hi
>
> Thank you for the follow-up. The machine where this error is coming
> from has a complicated pattern of access to AFS that I cannot
> reproduce, so I will try 1.8.8pre1 on that machine. Is an experimental
> package for Debian near? I can try to package it myself, but I would
> prefer a package from the maintainer.
>
> Kind regards
> Jose M Calhariz
>
>
> On Thu, Jun 17, 2021 at 05:39:05PM -0500, Andrew Deason wrote:
> > Sorry, I may have accidentally not included you in this reply (not sure
> > if you got this?). I didn't mean to exclude you, so just in case, here
> > it is directly:
> >
> > > On Tue, Sep 01, 2020 at 03:43:37PM +0100, Jose M Calhariz wrote:
> > > > An example of the errors in the logs:
> > > >
> > > > afs: disk cache read error in CacheItems slot 350195 off
> > > > 28015620/35000020 code -4/80
> > > > afs: Error while alloc'ing cache slot for file 204:536874423.964.4794;
> > > > failing with an i/o error
> >
> > Hi, I'm the person that mentioned this briefly during the AFS workshop
> > this week. These messages are not in themselves a problem; they are just
> > reporting that we got an error code from the Linux kernel when trying to
> > read from the disk cache.
> >
> > On Tue, 1 Sep 2020 16:07:55 -0700
> > Benjamin Kaduk <ka...@mit.edu> wrote:
> >
> > > This error message is supposed to indicate that a read from the cache
> > > filesystem got EIO, which in turn is supposed to indicate a physical
> > > problem with the drive. That said, I'm not going to jump to conclusions
> > > and try to blame your drive, as there are several other things that could
> > > be coming into play.
> >
> > The code logged is -4, which is EINTR (EIO would be -5). The most likely
> > trigger of this is a process that got a SIGKILL signal (or other fatal
> > signal) while we were reading from the disk cache. Traditionally we
> > wouldn't get errors in that case, but Linux started returning errors in
> > that situation after some version (possibly depending on the local fs in
> > use? but I don't recall exactly).
> >
> > If you think these messages happen when some other bug or problem is
> > happening, that's possible, but the messages themselves are not a
> > problem. If you want to avoid the situation that causes these messages,
> > you can try to avoid SIGKILL'ing the relevant processes, if you know
> > what's causing that. The message you've shown doesn't log the pid, but
> > there is already a change in 1.8.8pre1 to log the pid and some other
> > information in that log message.
> >
> > If you want the specific patch to add some more info to that log
> > message, it's here (gerrit 14437):
> >
> > https://git.openafs.org/?p=openafs.git;a=patch;h=5d863b4f6e817b1cc2615265c7747e17a2037ae6
> >
> > I know of at least one bug that can be triggered by the log message
> > you've mentioned, which is fixed by gerrit 14451 here:
> >
> > https://git.openafs.org/?p=openafs.git;a=patch;h=c55607d732a65f8acb1dfc6bf93aee0f4409cecf
> >
> > That's also in 1.8.8pre1, so if it's feasible for you to just try
> > 1.8.8pre1, that's probably easiest. The messages will still appear with
> > 1.8.8pre1, but they may be more informative, and some other related bugs
> > may be fixed. If you are seeing some other problematic behavior with
> > 1.8.8pre1, I can take a look if you provide some details.
> >
> >
>
> --
> --
>
> People change through what they achieve
>
> --Wystan Hugh Auden

--
--

People change through what they achieve

--Wystan Hugh Auden
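
For reference, the errno reading discussed in the thread (-4 is EINTR, EIO would be -5) can be checked with a small Python sketch. The helper name decode_cache_error and the regular expression are illustrative assumptions and not part of OpenAFS. The second number in the "code -4/80" field is left uninterpreted, since the thread does not say what it means. The numeric values assume Linux errno numbering.

```python
import errno
import re


def decode_cache_error(line):
    """Hypothetical helper: return the symbolic errno name for the first
    number after "code" in an afs disk cache log line, or None if the
    line has no such field."""
    match = re.search(r"code (-?\d+)/(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    # Linux kernel code reports errors as negative errno values,
    # so -4 corresponds to errno 4.
    return errno.errorcode.get(abs(code), "unknown")


# Log line taken from the original report, joined onto one line.
line = ("afs: disk cache read error in CacheItems slot 350195 off "
        "28015620/35000020 code -4/80")
print(decode_cache_error(line))
```

A log line carrying "code -5/..." would decode to EIO under the same assumptions, which is the case Ben's earlier reply described.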