No problem, here is the reply to the Debian bug and my coworkers.

Kind regards
Jose M Calhariz

On Thu, Jun 17, 2021 at 08:26:57PM -0500, Andrew Deason wrote:
> That's a question for Ben. Can you reply to 969...@bugs.debian.org
> instead? (or ask him directly via ka...@mit.edu ). I was replying to you
> directly, so the reply-to didn't go to the debian bug, sorry.
> 



> Hi
>
> Thank you for the follow-up.  The machine where this error is coming
> from has a complicated pattern of access to AFS that I cannot
> reproduce, so I will try 1.8.8pre1 on that machine.  Is an experimental
> package for Debian coming soon?  I can try to package it myself, but I
> would prefer a package from the maintainer.
>
> Kind regards
> Jose M Calhariz
>
>
> On Thu, Jun 17, 2021 at 05:39:05PM -0500, Andrew Deason wrote:
> > Sorry, I may have accidentally not included you in this reply (not sure
> > if you got this?). I didn't mean to exclude you, so just in case, here
> > it is directly:
> >
> > > On Tue, Sep 01, 2020 at 03:43:37PM +0100, Jose M Calhariz wrote:
> > > > An example of the errors in the logs:
> > > >
> > > > afs: disk cache read error in CacheItems slot 350195 off 28015620/35000020 code -4/80
> > > > afs: Error while alloc'ing cache slot for file 204:536874423.964.4794; failing with an i/o error
> >
> > Hi, I'm the person that mentioned this briefly during the AFS workshop
> > this week. These messages are not in themselves a problem; they are just
> > reporting that we got an error code from the Linux kernel when trying to
> > read from the disk cache.
> >
> > On Tue, 1 Sep 2020 16:07:55 -0700
> > Benjamin Kaduk <ka...@mit.edu> wrote:
> >
> > > This error message is supposed to indicate that a read from the cache
> > > filesystem got EIO, which in turn is supposed to indicate a physical
> > > problem with the drive.  That said, I'm not going to jump to conclusions
> > > and try to blame your drive, as there are several other things that could
> > > be coming into play.
> >
> > The code logged is -4, which is EINTR (EIO would be -5). The most likely
> > trigger of this is a process that got a SIGKILL signal (or other fatal
> > signal) while we were reading from the disk cache. Traditionally we
> > wouldn't get errors in that case, but Linux started returning errors in
> > that situation after some version (possibly depending on the local fs in
> > use? but I don't recall exactly).
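
For anyone who wants to check which errno a logged code corresponds to, here is a minimal sketch in Python. It pulls the code out of a "disk cache read error" line like the one quoted above and looks up the errno name. The regular expression and the field names (slot, offset, size, code, nbytes) are assumptions made for this illustration based only on the one sample line; they are not taken from the OpenAFS source, and the helper names errno_name and describe are hypothetical.

```python
import errno
import os
import re

# Pattern for the "disk cache read error" line quoted in this thread.
# The field names (slot, offset, size, code, nbytes) are guesses made
# for this sketch; they are not taken from the OpenAFS source.
CACHE_READ_ERROR = re.compile(
    r"afs: disk cache read error in CacheItems slot (?P<slot>\d+) "
    r"off (?P<offset>\d+)/(?P<size>\d+) code (?P<code>-?\d+)/(?P<nbytes>\d+)"
)


def errno_name(code):
    """Map a kernel-style negative return value (e.g. -4) to an errno name."""
    return errno.errorcode.get(abs(code), "unknown")


def describe(line):
    """Return (slot, code, errno name, description), or None if no match."""
    match = CACHE_READ_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return (
        int(match.group("slot")),
        code,
        errno_name(code),
        os.strerror(abs(code)),
    )


if __name__ == "__main__":
    # The sample is the log line from this thread, joined onto one line.
    sample = (
        "afs: disk cache read error in CacheItems slot 350195 "
        "off 28015620/35000020 code -4/80"
    )
    print(describe(sample))
```

On Linux, errno_name(-4) gives EINTR and errno_name(-5) gives EIO, which matches the distinction made above. The lookup only names the error; it says nothing about which process was interrupted, which is what the extra logging in 1.8.8pre1 is meant to help with.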
> >
> > If you think these messages happen when some other bug or problem is
> > happening, that's possible, but the messages themselves are not a
> > problem. If you want to avoid the situation that causes these messages,
> > you can try to avoid SIGKILL'ing the relevant processes, if you know
> > what's causing that. The message you've shown doesn't log the pid, but
> > there is already a change in 1.8.8pre1 to log the pid and some other
> > information in that log message.
> >
> > If you want the specific patch to add some more info to that log
> > message, it's here (gerrit 14437):
> >
> > https://git.openafs.org/?p=openafs.git;a=patch;h=5d863b4f6e817b1cc2615265c7747e17a2037ae6
> >
> > I know of at least one bug that can be triggered by the log message
> > you've mentioned, which is fixed by gerrit 14451 here:
> >
> > https://git.openafs.org/?p=openafs.git;a=patch;h=c55607d732a65f8acb1dfc6bf93aee0f4409cecf
> >
> > That's also in 1.8.8pre1, so if it's feasible for you to just try
> > 1.8.8pre1, that's probably easiest. The messages will still appear with
> > 1.8.8pre1, but they may be more informative, and some other related bugs
> > may be fixed. If you are seeing some other problematic behavior with
> > 1.8.8pre1, I can take a look if you provide some details.
> >
> >
>
> --
> --
>
> As pessoas mudam através do que alcançam
>
> --Wystan Hugh Auden



-- 
--

As pessoas mudam através do que alcançam

--Wystan Hugh Auden
