Jose,
Indeed I would be interested! Certainly worth a try.
Thanks,
Lou
On 2023-07-25 9:34 a.m., Jose M Calhariz wrote:
Hi,
If I understand your problem correctly, I found it in 3.5.1, and I have a
patch that fixes it, from the previous owner of amanda. The patch has been
in use in Debian's amanda package for several years.
I can publish the patch here if you are interested.
Kind regards
Jose M Calhariz
On Wed, Jul 19, 2023 at 09:34:21AM -0700, Lou Hafer wrote:
Nuno,
Thanks for the reply! And apologies for not being quite clear.
I'm quite sure the offending hosts are powered down, so no chance of
partial response. When I look at the planner.<timestamp>.debug log, I can
see sendsize requests going out to the hosts that are powered up and
responsive, and I can see their responses arrive. There are two hosts
powered down, gallifrey and jpt. The requests go out to gallifrey, then jpt.
When the request to gallifrey times out, planner sees exit status 255 from
SSH and aborts with the EOF error. It doesn't even wait for the timeout on
jpt.
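One way to confirm which disklist hosts are actually unreachable before a run is to test ssh directly: ssh(1) exits with the remote command's status, or with 255 when ssh itself fails (host down, connection refused, authentication failure). The script below is a hypothetical helper, not part of amanda; `probe` and `classify_ssh_status` are illustrative names. It assumes key-based ssh run as the same user amanda connects as, so `BatchMode=yes` fails fast instead of waiting on a password prompt.

```python
import subprocess
import sys

# Per ssh(1): exit status is the remote command's status, or 255 on ssh error.
SSH_ERROR = 255


def classify_ssh_status(returncode):
    """Map an ssh exit status to a coarse reachability label."""
    if returncode == 0:
        return "reachable"
    if returncode == SSH_ERROR:
        return "ssh-failed"  # host down, connection refused, auth failure, ...
    return "remote-error"  # ssh connected, but the remote command failed


def probe(host, connect_timeout=5):
    """Run a no-op command on `host` over ssh and classify the outcome.

    Assumes key-based (non-interactive) ssh; BatchMode=yes makes ssh fail
    instead of prompting for a password.
    """
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={connect_timeout}",
        host,
        "true",
    ]
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=connect_timeout + 10,  # hard cap in case ssh hangs later
        )
    except subprocess.TimeoutExpired:
        return "ssh-failed"
    return classify_ssh_status(result.returncode)


if __name__ == "__main__":
    # Each command-line argument is treated as a hostname to probe.
    for host in sys.argv[1:]:
        print(f"{host}: {probe(host)}")
```

For example, running it with `gallifrey.ivriel jpt` as arguments should report `ssh-failed` for both while they are powered down, which is the same 255 status planner is seeing.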
If I go back and look at some old logs, I can see planner continue past
the `EOF on read' error. So I'm really starting to think this is a new bug
in 3.5.3.
For what it's worth, I'd interpret your error,
ERROR Request to MACHINE failed: Connection refused
as the machine was powered up and responsive but actively refused the
connection for some reason.
I'm puzzled by another thing: we're using the same version of amanda
(3.5.3) and I run backups to disk, no tape drive involved, but I've never
seen the error you mentioned in April, where the backup aborts after the
first machine/disk in the disklist. The obvious difference is Fedora 37
versus Fedora 38, but that really shouldn't cause this much difference in
behaviour.
Bah! Sometimes staying up-to-date is a bit painful. I'll see if anyone
else chimes in before I report this as a bug.
Spent two weeks in Porto and the Douro Valley in Fall 2022. Loved the
country!
Lou
On 2023-07-19 2:17 a.m., Nuno Dias wrote:
Hi Lou,
I'm using the same version as you, although on Fedora 37
(amanda-3.5.3-1.fc37.x86_64), and I don't see that behaviour. I have some
machines that are down, and the rest of the backups were made.
In my case I have this
planner: ERROR Request to MACHINE failed: Connection refused
From what you wrote, it seems gallifrey.ivriel is not down but is
responding, and has some problem reporting the size.
Maybe this page will help
https://www.zmanda.com/knowledge-base/eof-on-read-error-from-a-client/
Although if it is aborting the whole planner run, it seems to be a bug, or
there is some other reason for the planner aborting; maybe check that
etimeout is not set very low.
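To make the etimeout suggestion concrete: as I read amanda.conf(5), a positive `etimeout` is a per-disklist-entry limit in seconds, so the total wait for one client grows with its number of DLEs, while a negative value is taken as a total for the whole client. The function below is a hypothetical illustration of that arithmetic only (`estimate_wait_budget` is not an amanda name); check the man page shipped with your version before relying on it.

```python
def estimate_wait_budget(etimeout, n_dles):
    """Rough upper bound, in seconds, on how long planner waits for one client.

    Assumption based on amanda.conf(5): positive etimeout applies per
    disklist entry (DLE); negative etimeout is a total for the client.
    """
    if n_dles < 0:
        raise ValueError("n_dles must be non-negative")
    if etimeout >= 0:
        return etimeout * n_dles  # per-DLE limit, summed over the client
    return -etimeout  # negative value: one total limit for the client
```

For instance, `etimeout 300` with four DLEs on one client gives a 1200-second budget, while `etimeout -600` caps that client at 600 seconds regardless of how many DLEs it has.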
Cheers,
Nuno
On Tue, 2023-07-18 at 13:50 -0700, Lou Hafer wrote:
Folks,
I've been using amanda for several years on a simple home network.
Hosts are often powered down. Up through amanda 3.5.2, this worked like
a charm. If the host didn't respond, it was simply skipped. Hosts that
responded were properly backed up.
With amanda 3.5.3, the behaviour has changed. If a host doesn't
respond to the planner size request, the planner aborts the entire
backup with the error
planner: ERROR Request to gallifrey.ivriel failed:
EOF on read from gallifrey.ivriel
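The change can be modelled as two policies for handling a failed size estimate. This is a toy sketch, not amanda's planner code: `plan`, `request_estimate` and `EstimateError` are hypothetical names, and the real planner talks to clients over the network rather than calling a Python function.

```python
class EstimateError(Exception):
    """Raised by a (hypothetical) estimate function when a client fails."""


def plan(hosts, request_estimate, abort_on_error=False):
    """Collect size estimates for each host in disklist order.

    abort_on_error=False: unreachable hosts are skipped (the 3.5.2 behaviour).
    abort_on_error=True: the first failure aborts the run (the 3.5.3 report).
    Returns (estimates, skipped) where skipped lists (host, reason) pairs.
    """
    estimates, skipped = {}, []
    for host in hosts:
        try:
            estimates[host] = request_estimate(host)
        except EstimateError as err:
            if abort_on_error:
                raise  # one bad client kills the whole planning pass
            skipped.append((host, str(err)))
    return estimates, skipped
```

With `abort_on_error=False` the unreachable hosts end up in `skipped` and the rest are planned as usual; with `abort_on_error=True` the run stops at the first failure, before later hosts such as jpt are even considered.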
I've confirmed that my configuration is generally correct: as long as
all hosts in the disklist respond to the size request, the backup
succeeds.
Is this a bug? Do I need to change some parameter in my configuration to
persuade planner to soldier on? Any thoughts would be appreciated.
As context, this problem came about with an upgrade from Fedora 37 to
Fedora 38, with a matching upgrade from amanda 3.5.2 to amanda 3.5.3.
Thanks,
Lou