Quoting "Steven R. Gerber" <[email protected]>:

> On 3/24/2011 5:00 PM, [email protected] wrote:
> > Quoting "Steven R. Gerber" <[email protected]>:
> >
> >> On 3/24/2011 4:33 PM, [email protected] wrote:
> >>> Quoting "Steven R. Gerber" <[email protected]>:
> >>>
> >>>> On 3/24/2011 2:36 PM, [email protected] wrote:
> >>>>> Quoting "Steven R. Gerber" <[email protected]>:
> >>>>>
> >>>>>> -------- Original Message --------
> >>>>>> Subject: Re: rdist times out but will not die
> >>>>>> Date: Thu, 24 Mar 2011 21:49:01 +1300
> >>>>>> From: Richard Toohey <[email protected]>
> >>>>>> To: Steven R. Gerber <[email protected]>
> >>>>>> CC: [email protected]
> >>>>>>
> >>>>>> On 24/03/2011, at 4:06 PM, Steven R. Gerber wrote:
> >>>>>>
> >>>>>>> On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
> >>>>>>>> I want to do local/remote mirror/backup (or should that be
> >>>>>>>> local-mirror / offsite-backup).
> >>>>>>>> So a two-part question:
> >>>>>>>> 1. Even if there is a timeout, shouldn't the job/process exit?
> >>>>>>>>
> >>>>>>>> ****************************************************************
> >>>>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown
> >>>>>>>> from rdist:operator to cdripper:operator
> >>>>>>>> rdist@thedump: thedump:
> >>>>>>>> /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown
> >>>>>>>> from rdist:operator to root:operator
> >>>>>>>> rdist@thedump:
> >>>>>>>> /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5:
> >>>>>>>> updating
> >>>>>>>> rdist@thedump:
> >>>>>>>> /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso:
> >>>>>>>> installing
> >>>>>>>> rdist@thedump: LOCAL ERROR: Response time out
> >>>>>>>> rdist@thedump: updating of rdist@thedump finished
> >>>>>>>> $ ps -ax|grep rdist
> >>>>>>>> 26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
> >>>>>>>> 11059 ?? I 0:00.01 rdist -f /etc/Distfile
> >>>>>>>> 28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
> >>>>>>>> 7795 ?? I 1:10.32 ssh -l rdist thedump r
> >>>>>>>> 13045 p0 S+ 0:00.00 grep rdist
> >>>>>>>> ****************************************************************
> >>>>>>>> 2. I know that they happen from time to time. How can I avoid/prevent
> >>>>>>>> timeouts? The default is 900 sec AKA 15 min? How can this happen
> >>>>>>>> between two local machines?
> >>>>>>
> >>>>>> How big is the file?
> >>>>>
> >>>>> So, how big is the file that it times out on?
> >>>>>
> >>>>> More than 2Gb? Guess so if a movie file?
> >>>>>
> >>>>> I might be barking up the wrong tree, but it will take you two seconds
> >>>>> to see if there's anything in this > 2Gb idea and if I'm wrong, move on.
> >>>>>
> >>>>> Regardless of that, yes, put more debugging on - might give you some
> >>>>> more clues.
> >>>>>
> >>>>> OpenBSD helps those who help themselves.
> >>>> Richard,
> >>>> Thanks for the help.
> >>>> I had already read the IBM note 'LOCAL ERROR: response time out' (from
> >>>> 2006). (Google is not my enemy?)
> >>>> I had already checked: the file is >2GB (4.4GB).
> >>>> I ASSUMED that I can't be the only one who has tried to push large files
> >>>> with rdist. I searched the OpenBSD list archives (mine go back to 2006)
> >>>> and found nothing significant/useful. Maybe I missed something?
> >>>> I immediately moved to the misc list per your suggestion.
> >>>> I did a (manual) run of rdist with "-D" and got similar results -- I am
> >>>> still analyzing those messages.
> >>>> I usually do not compile OpenBSD, so it will take a while to review the
> >>>> rdist source code (client.c?).
> >>>
> >>> Thanks ... never assume anything, eh? 8-)
> >>>
> >>> If your files are > 2Gb, then that IBM link seems to be spot on, and
> >>> answers (maybe) number 2 on your list - why would you get a timeout on a
> >>> local transfer (if hardware related, you'd expect sftp to fail, or there
> >>> to be other noticeable issues)?
> >>>
> >>> I've not used rdist before, but I don't mind having a look now that I know
> >>> your files are > 2Gb. But it's going to be a quiet (ha!) evening project, so
> >>> no promises (and maybe someone else will blow the theory out of the water
> >>> and provide a different answer/fix.)
> >>>
> >>> The IBM note suggests that both client & server need to be amended, IF I am
> >>> on the right track.
> >>>
> >>> This is all purely speculative on my part, but it does SEEM to match what
> >>> you are seeing, doesn't it?
> >>>
> >>> Thanks.
> >> [SNIP]
> >>
> >> You are right on it! Thanks!
> >> Not to be greedy, but ...
> >> What do you think of the other issue that rdist logs a "finished"
> >> message but does not exit?
> >>
> >> Thanks.
> >>
> > More guessing (I'm already out on a limb ... the branch is about to break) ...
> > "something" is unhappy because of the time out?
> >
> > What messages are in the debug output - do you see "finish() called" as per
> > the code in common.c below? What's the rest of the message(s)?
> >
> > What happens if you move all the > 2Gb files out of the way temporarily and
> > re-run (obviously I don't know how practical this is)? Does it finish normally?
> >
> > Or if that doesn't suit, how about creating a test directory with 20 (<2 Gb
> > each) files in it, run it, then drop a big file (>2 Gb) in, re-run. If it
> > fails, then I'd say I was on to something (I don't know anything about rdist,
> > so I do not know how to set up this test environment.) Remove the big file,
> > or truncate it down to < 2Gb and re-run. If that works, I get a cookie.
> >
> > common.c
> >
> > void
> > finish(void)
> > {
> >     extern jmp_buf finish_jmpbuf;
> >
> >     debugmsg(DM_CALL,
> >          "finish() called: do_fork = %d amchild = %d isserver = %d",
> >          do_fork, amchild, isserver);
> >     cleanup(0);
> >
> >     /*
> >      * There's no valid finish_jmpbuf for the rdist master parent.
> >      */
> >     if (!do_fork || amchild || isserver) {
> >
> >         if (!setjmp_ok) {
> > #ifdef DEBUG_SETJMP
> >             error("attemping longjmp() without target");
> >             abort();
> > #else
> >             exit(1);
> > #endif
> >         }
> >
> >         longjmp(finish_jmpbuf, 1);
> >         /*NOTREACHED*/
> >         error("Unexpected failure of longjmp() in finish()");
> >         exit(2);
> >     } else
> >         exit(1);
> > }
> >
> > Thanks.
> >
> I am getting the "finish() called" etc.
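A note on the "logs finished but won't die" part, going only by the finish() code quoted above: when do_fork is off, or the process is a child (amchild) or the server, finish() does not call exit() once setjmp_ok is set; it longjmp()s back to finish_jmpbuf and lets that caller carry on. Only the forking master parent exits directly. The hypothetical miniature below is not rdist code (demo_finish(), run_child() and the two transfer functions are made up; only finish_jmpbuf and setjmp_ok borrow rdist's names), but it shows that control flow:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/*
 * Hypothetical miniature of the guard in rdist's finish() (common.c).
 * Not rdist code: with a valid jump target, "finishing" unwinds back
 * to the caller instead of ending the process.
 */
static jmp_buf finish_jmpbuf;
static int setjmp_ok;               /* non-zero only while finish_jmpbuf is valid */

static void demo_finish(void)
{
    if (!setjmp_ok)
        exit(1);                    /* no target: give up, as finish() does */
    longjmp(finish_jmpbuf, 1);      /* otherwise unwind; this never returns */
}

/* Returns 1 if work() ended through demo_finish(), 0 if it returned. */
static int run_child(void (*work)(void))
{
    assert(work != NULL);
    if (setjmp(finish_jmpbuf) != 0) {
        setjmp_ok = 0;              /* arrived here via longjmp() */
        return 1;
    }
    setjmp_ok = 1;
    work();
    setjmp_ok = 0;
    return 0;
}

static void timed_out_transfer(void) { demo_finish(); }
static void normal_transfer(void) { }
```

run_child(timed_out_transfer) returns 1 and the process keeps running, so a "finish() called" line on its own doesn't show that anything has exited; what happens next depends on the code around the real setjmp().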
> I now have a theory (your "something" unhappy guess): rdist times out,
> but the child process does not and is still trying to get the
> end-of-file. The child is basically in an infinite loop: it does not
> time out because the dump does respond, but it keeps retrieving from the
> first part of the file -- it never reaches past the miscalculated size.
>
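Putting numbers on the "miscalculated size": on OpenBSD/i386 `long` is 32 bits, so the unpatched `(long) stb->st_size` on the sending side cannot carry a size of 2^31 bytes or more (and strtol() on the receiving side would clamp an oversized value to LONG_MAX rather than wrap). A sketch with a hypothetical helper, not rdist code, modelling the two's-complement wrap GCC performs for that cast; since converting an out-of-range value to a signed type is implementation-defined in C, the wrap is written out by hand:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not rdist code): the value `(long) size` ends up
 * with on an ILP32 system such as OpenBSD/i386, assuming the usual
 * two's-complement wrap. Done by hand because the real conversion is
 * implementation-defined.
 */
static long long as_ilp32_long(long long size)
{
    assert(size >= 0);                      /* st_size is never negative */
    uint32_t low = (uint32_t) size;         /* keeps the low 32 bits; well-defined */
    if (low < UINT32_C(0x80000000))
        return (long long) low;             /* top bit clear: reads as positive */
    return (long long) low - (1LL << 32);   /* top bit set: reads as negative */
}
```

For the 4812800000-byte file that `dd if=/dev/zero bs=1k count=4700000` produces, as_ilp32_long() gives 517832704. If both ends trusted that figure, the receiver would expect exactly 4294967296 bytes (2^32) less than the file holds. How each end reacts to that is an assumption here, but it fits the picture of one side waiting on the other.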
My diffs will no doubt get mangled by my webmail, and I don't know enough about
rdist (or the rdist protocol) to know if these are correct. Hopefully they are
a step in the right direction.

Basic idea from https://www-304.ibm.com/support/docview.wss?uid=isg1IY85396
(I was going to look at FreeBSD's version for inspiration, but it looks like
they ditched rdist in 2003.)

Basically strtol to strtoll, %ld to %lld, and (int)/(long) to (off_t) to cope
with files bigger than 2Gb.

Works for me on i386 - without these patches I see the reported behaviour,
with the patches I see the 4Gb file transferred.

With patches - it works:

$ cat rdist.conf
HOSTS = (172.16.1.111)
FILES = (/home/richard.toohey/rdist-test)
${FILES} -> ${HOSTS}
$ rdist -f rdist.conf
172.16.1.111: updating host 172.16.1.111
[email protected]'s password:
172.16.1.111: /home/richard.toohey/rdist-test/zerofile.tst: installing
172.16.1.111: updating of 172.16.1.111 finished

zerofile.tst created with:

dd if=/dev/zero of=zerofile.tst bs=1k count=4700000

HTH.

/usr/src/usr.bin/rdist/client.c
===============================
# diff -uw /home/richard.toohey/obsd-src/usr.bin/rdist/client.c client.c
--- /home/richard.toohey/obsd-src/usr.bin/rdist/client.c Thu Oct 29 17:34:06 2009
+++ client.c Fri Mar 25 14:54:32 2011
@@ -399,8 +399,8 @@
      */
     ENCODE(ername, rname);
 
-    (void) sendcmd(C_RECVREG, "%o %04o %ld %ld %ld %s %s %s",
-               opts, stb->st_mode & 07777, (long) stb->st_size,
+    (void) sendcmd(C_RECVREG, "%o %04o %lld %ld %ld %s %s %s",
+               opts, stb->st_mode & 07777, (off_t) stb->st_size,
                stb->st_mtime, stb->st_atime,
                user, group, ername);
     if (response() < 0) {
@@ -409,8 +409,8 @@
     }
 
 
-    debugmsg(DM_MISC, "Send file '%s' %ld bytes\n", rname,
-         (long) stb->st_size);
+    debugmsg(DM_MISC, "Send file '%s' %lld bytes\n", rname,
+         (off_t) stb->st_size);
 
     /*
      * Set remote time out alarm handler.
@@ -666,8 +666,8 @@
      * Gather and send basic link info
      */
     ENCODE(ername, rname);
-    (void) sendcmd(C_RECVSYMLINK, "%o %04o %ld %ld %ld %s %s %s",
-               opts, stb->st_mode & 07777, (long) stb->st_size,
+    (void) sendcmd(C_RECVSYMLINK, "%o %04o %lld %ld %ld %s %s %s",
+               opts, stb->st_mode & 07777, (off_t) stb->st_size,
                stb->st_mtime, stb->st_atime,
                user, group, ername);
     if (response() < 0)
@@ -869,7 +869,7 @@
     /*
      * Parse size
      */
-    size = (off_t) strtol(cp, (char **)&cp, 10);
+    size = (off_t) strtoll(cp, (char **)&cp, 10);
     if (*cp++ != ' ') {
         error("update: size not delimited");
         return(US_NOTHING);
@@ -921,8 +921,8 @@
     debugmsg(DM_MISC, "update(%s,) local mode %04o remote mode %04o\n",
          rname, lmode, rmode);
 
-    debugmsg(DM_MISC, "update(%s,) size %ld mtime %d owner '%s' grp '%s'\n",
-         rname, (long) size, mtime, owner, group);
+    debugmsg(DM_MISC, "update(%s,) size %lld mtime %d owner '%s' grp '%s'\n",
+         rname, (off_t) size, mtime, owner, group);
 
     if (statp->st_mtime != mtime) {
         if (statp->st_mtime < mtime && IS_ON(opts, DO_YOUNGER)) {
@@ -935,8 +935,8 @@
     }
 
     if (statp->st_size != size) {
-        debugmsg(DM_MISC, "size does not match (%ld != %ld).\n",
-             (long) statp->st_size, (long) size);
+        debugmsg(DM_MISC, "size does not match (%lld != %lld).\n",
+             (off_t) statp->st_size, (off_t) size);
         return(US_OUTDATE);
     }
 

/usr/src/usr.bin/rdistd/server.c
================================
# diff -uw /home/richard.toohey/obsd-src/usr.bin/rdistd/server.c server.c
--- /home/richard.toohey/obsd-src/usr.bin/rdistd/server.c Thu Oct 29 17:34:06 2009
+++ server.c Fri Mar 25 14:49:18 2011
@@ -391,7 +391,7 @@
 #else
     /*
      * We use MT_NOTICE instead of MT_CHANGE because this function is
-     * sometimes called by other functions that are suppose to return a
+     * sometimes called by other functions that are supposed to return a
      * single ack() back to the client (rdist). This is a kludge until
      * the Rdist protocol is re-done. Sigh.
      */
@@ -656,8 +656,8 @@
     case S_IFIFO:
 #endif
 #endif
-        (void) sendcmd(QC_YES, "%ld %ld %o %s %s",
-                   (long) stb.st_size, stb.st_mtime,
+        (void) sendcmd(QC_YES, "%lld %ld %o %s %s",
+                   (off_t) stb.st_size, stb.st_mtime,
                    stb.st_mode & 07777,
                    getusername(stb.st_uid, target, options),
                    getgroupname(stb.st_gid, target, options));
@@ -1420,7 +1420,7 @@
     /*
      * Get file size
      */
-    size = strtol(cp, &cp, 10);
+    size = strtoll(cp, &cp, 10);
     if (*cp++ != ' ') {
         error("recvit: size not delimited");
         return;
@@ -1523,7 +1523,7 @@
      */
     if (min_freespace || min_freefiles) {
         /* Convert file size to kilobytes */
-        long fsize = (long) (size / 1024);
+        off_t fsize = (off_t) (size / 1024);
 
         if (getfilesysinfo(target, &freespace, &freefiles) != 0)
             return;

Thanks.
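For anyone trying a patch like this, the part that matters is the size field surviving a round trip through the text protocol. A sketch with hypothetical helpers (format_size(), parse_size() and copy_prefix() are made up, not rdist functions), assuming a POSIX system with a 64-bit off_t: the sender prints the size with %lld, and the receiver reads it back with strtoll(), rejecting a field that isn't space-terminated much like the "size not delimited" checks do.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical: format a size field the way the patched sendcmd() calls do. */
static int format_size(char *buf, size_t len, off_t size)
{
    /* Cast to long long so the argument always matches %lld. */
    return snprintf(buf, len, "%lld ", (long long) size);
}

/* Hypothetical: parse a space-terminated size; 0 on success, -1 otherwise. */
static int parse_size(const char *field, off_t *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(field, &end, 10);
    if (end == field || *end != ' ' || errno == ERANGE || v < 0)
        return -1;                  /* cf. "size not delimited" */
    *out = (off_t) v;
    return 0;
}

/*
 * Hypothetical: the '*' precision in "%.*s" consumes an int, so the
 * length needs an (int) cast even when the variable itself is off_t.
 */
static int copy_prefix(char *dst, size_t len, const char *src, off_t n)
{
    assert(n >= 0 && n <= (off_t) INT_MAX);
    return snprintf(dst, len, "%.*s", (int) n, src);
}
```

Two details worth keeping in any patch along these lines. Casting to (long long) rather than (off_t) keeps %lld matched to its argument on systems where off_t is a different type (on 64-bit Linux it is long). And because the `*` precision in "%.*s" always takes an int, a length passed there keeps its (int) cast; an (off_t) there hands snprintf() a 64-bit value where it expects an int.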

