Quoting "Steven R. Gerber" <[email protected]>:
> On 3/24/2011 4:33 PM, [email protected] wrote:
> > Quoting "Steven R. Gerber" <[email protected]>:
> >
> >> On 3/24/2011 2:36 PM, [email protected] wrote:
> >>> Quoting "Steven R. Gerber" <[email protected]>:
> >>>
> >>>> -------- Original Message --------
> >>>> Subject: Re: rdist times out but will not die
> >>>> Date: Thu, 24 Mar 2011 21:49:01 +1300
> >>>> From: Richard Toohey <[email protected]>
> >>>> To: Steven R. Gerber <[email protected]>
> >>>> CC: [email protected]
> >>>>
> >>>> On 24/03/2011, at 4:06 PM, Steven R. Gerber wrote:
> >>>>
> >>>>> On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
> >>>>>> I want to do local/remote mirror/backup (or should that be
> >>>>>> local-mirror / offsite-backup).
> >>>>>> So a two-part question:
> >>>>>> 1. Even if there is a timeout, shouldn't the job/process exit?
> >>>>>>
> >>>>>> *****************************************************************************
> >>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from rdist:operator to cdripper:operator
> >>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown from rdist:operator to root:operator
> >>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5: updating
> >>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso: installing
> >>>>>> rdist@thedump: LOCAL ERROR: Response time out
> >>>>>> rdist@thedump: updating of rdist@thedump finished
> >>>>>> $ ps -ax|grep rdist
> >>>>>> 26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
> >>>>>> 11059 ?? I 0:00.01 rdist -f /etc/Distfile
> >>>>>> 28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
> >>>>>> 7795 ?? I 1:10.32 ssh -l rdist thedump r
> >>>>>> 13045 p0 S+ 0:00.00 grep rdist
> >>>>>>
> >>>>>> *****************************************************************************
> >>>>>> 2. I know that they happen from time to time. How can I avoid/prevent
> >>>>>> timeouts? The default is 900 sec AKA 15 min? How can this happen
> >>>>>> between two local machines?
> >>>>
> >>>> How big is the file?
> >>>
> >>> So, how big is the file that it times out on?
> >>>
> >>> More than 2 GB? Guess so if a movie file?
> >>>
> >>> I might be barking up the wrong tree, but it will take you two seconds
> >>> to see if there's anything in this > 2 GB idea, and if I'm wrong, move on.
> >>>
> >>> Regardless of that, yes, put more debugging on - it might give you some
> >>> more clues.
> >>>
> >>> OpenBSD helps those who help themselves.
> >> Richard,
> >> Thanks for the help.
> >> I had already read the IBM note 'LOCAL ERROR: response time out' (from
> >> 2006). (Google is not my enemy?)
> >> I had already checked: the file is > 2 GB (4.4 GB).
> >> I ASSUMED that I can't be the only one who has tried to push large files
> >> with rdist. I searched the OpenBSD list archives (mine go back to 2006)
> >> and found nothing significant/useful. Maybe I missed something?
> >> I immediately moved to the misc list per your suggestion.
> >> I did a (manual) run of rdist with "-D" and got similar results -- I am
> >> still analyzing those messages.
> >> I usually do not compile OpenBSD, so it will take a while to review the
> >> rdist source code (client.c?).
> >
> > Thanks ... never assume anything, eh? 8-)
> >
> > If your files are > 2 GB, then that IBM link seems to be spot on, and
> > answers (maybe) number 2 on your list: why would you get a timeout on a
> > local transfer? (If it were hardware related, you'd expect sftp to fail,
> > or other noticeable issues.)
> >
> > I've not used rdist before, but I don't mind having a look now that I
> > know your files are > 2 GB. But it's going to be a quiet (ha!) evening
> > project, so no promises (and maybe someone else will blow the theory out
> > of the water and provide a different answer/fix).
> >
> > The IBM note suggests that both client & server need to be amended, IF I
> > am on the right track.
> >
> > This is all purely speculative on my part, but it does SEEM to match what
> > you are seeing, doesn't it?
> >
> > Thanks.
> [SNIP]
>
> You are right on it! Thanks!
> Not to be greedy, but ...
> What do you think of the other issue that rdist logs a "finished"
> message but does not exit?
>
> Thanks.
>
>
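First, to make the > 2 GB theory concrete: 2 GB is roughly where a byte count stops fitting in a signed 32-bit integer (INT32_MAX = 2147483647). The sketch below is NOT rdist code; the function names are made up, and it simply assumes, without my having checked the source, that a file size somewhere ends up in a 32-bit field.

```c
#include <stdint.h>

/*
 * Hypothetical illustration only (not taken from rdist): shows what a
 * 64-bit file size looks like if it has to pass through 32 bits.
 */

/* Returns 1 if `size` cannot be represented in a signed 32-bit int. */
int
size_overflows_int32(int64_t size)
{
	return size > (int64_t)INT32_MAX || size < (int64_t)INT32_MIN;
}

/*
 * Keep only the low 32 bits, as a careless 32-bit size field would.
 * Converting to uint32_t is well defined in C (reduction modulo 2^32).
 */
int64_t
low32_as_unsigned(int64_t size)
{
	return (int64_t)(uint32_t)size;
}
```

For a 4,400,000,000-byte file (roughly the 4.4 GB one in the log), low32_as_unsigned() gives 105032704, i.e. about 100 MB. If one end of the transfer believed that figure while the other sent the whole file, the two would fall out of step, and a response timeout would be one plausible symptom. Whether rdist really stores the size like that is what the IBM note and the source would have to confirm.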
More guessing (I'm already out on a limb ... the branch is about to break) ...
Is "something" unhappy because of the timeout?
What messages are in the debug output - do you see "finish() called" as per the
code in common.c below? What's the rest of the message(s)?
What happens if you temporarily move all the > 2 GB files out of the way and
re-run (obviously I don't know how practical this is)? Does it finish normally?
Or, if that doesn't suit, how about creating a test directory with 20 files in
it (each < 2 GB), running it, then dropping a big file (> 2 GB) in and
re-running? If it fails, then I'd say I was on to something (I don't know
anything about rdist, so I don't know how to set up this test environment).
Remove the big file, or truncate it down to < 2 GB, and re-run. If that works,
I get a cookie.
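For the "drop a big file in, then truncate it below 2 GB" test, something like this minimal sketch could make the test file. The function names and the idea of using ftruncate() are my assumptions, not anything rdist provides. ftruncate() extends a file with zero bytes, so on most filesystems the source file is sparse and costs almost no disk space; the copy rdist writes on the other machine may well take the full size. Only point it at a scratch file, never at a real movie: shrinking with ftruncate() throws the tail of the data away.

```c
/* 64-bit off_t on 32-bit Linux; harmless on OpenBSD, where off_t is 64-bit. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create `path` if needed and set its length to exactly `size` bytes,
 * growing it with zeros or cutting it short. Returns 0 on success,
 * -1 on error with errno set.
 */
int
make_sized_file(const char *path, off_t size)
{
	int fd, saved;

	if ((fd = open(path, O_WRONLY | O_CREAT, 0644)) == -1)
		return -1;
	if (ftruncate(fd, size) == -1) {
		saved = errno;	/* keep the ftruncate() error, not close()'s */
		(void)close(fd);
		errno = saved;
		return -1;
	}
	return close(fd);
}

/* Return the current size of `path` in bytes, or -1 on error. */
off_t
file_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) == -1)
		return -1;
	return st.st_size;
}
```

Usage would be make_sized_file("/path/to/testdir/big.bin", (off_t)2147483648LL) for just over the limit, then make_sized_file() again on the same path with something like 2000000000 to drop it back under (the path here is hypothetical).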
common.c (lines 154-184):

void
finish(void)
{
	extern jmp_buf finish_jmpbuf;

	debugmsg(DM_CALL,
	    "finish() called: do_fork = %d amchild = %d isserver = %d",
	    do_fork, amchild, isserver);
	cleanup(0);

	/*
	 * There's no valid finish_jmpbuf for the rdist master parent.
	 */
	if (!do_fork || amchild || isserver) {

		if (!setjmp_ok) {
#ifdef DEBUG_SETJMP
			error("attemping longjmp() without target");
			abort();
#else
			exit(1);
#endif
		}

		longjmp(finish_jmpbuf, 1);
		/*NOTREACHED*/
		error("Unexpected failure of longjmp() in finish()");
		exit(2);
	} else
		exit(1);
}
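The interesting branch above: in a forked child or in the server, finish() does not exit by itself when setjmp_ok is set; it longjmp()s back to wherever finish_jmpbuf was set, so whether the process goes away depends on the code at that target. Here is a toy model of the pattern, NOT rdist code; every name in it (toy_finish, run_with_finish_target, work_ok, work_bails) is made up for illustration.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf finish_target;		/* stands in for finish_jmpbuf */
static volatile int target_ok = 0;	/* stands in for setjmp_ok */

/* Like finish() in a non-DEBUG_SETJMP build: exit if no target, else jump. */
static void
toy_finish(void)
{
	if (!target_ok)
		exit(1);
	longjmp(finish_target, 1);
}

/*
 * Arm a finish target, then run `work`. Returns 0 if `work` returned
 * normally, 1 if it bailed out through toy_finish(). What happens after
 * the return is up to the caller, just as in the real code.
 */
int
run_with_finish_target(void (*work)(void))
{
	if (setjmp(finish_target) != 0) {
		target_ok = 0;	/* the target is dead once we return */
		return 1;
	}
	target_ok = 1;
	work();
	target_ok = 0;
	return 0;
}

/* Sample work routines. */
void
work_ok(void)
{
}

void
work_bails(void)
{
	toy_finish();
}
```

run_with_finish_target(work_bails) returns 1 instead of exiting, which is the point: after the jump, the caller decides what comes next. I haven't traced where rdist sets finish_jmpbuf, so this only suggests where to look, not a diagnosis.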
Thanks.