A few weeks ago one of my five rsync backup processes that runs from a daily cron 03:00 am has started failing with this exact issue.
I can only manually reproduce it when executing my rsync command with tripe verbosity: rsync -avvv --delete /source_dir /target_dir Manually calling rsync with less than triple verbosity runs the whole process as expected. However the hanging command used in cron is $ rsync -aq --delete /source_dir /target_dir > /dev/null $ strace -p 16857 16857 16:38:37 lstat("Saved/SavedArks/retracted.profilebak", {st_mode=S_IFREG|0600, st_size=22986, ...}) = 0 16857 16:38:37 write(1, "[sender] make_file(Saved/SavedArks/retracted.profilebak,*,2)\n", 69) = 69 16857 16:38:37 lstat("Saved/SavedArks/retracted.arkprofile", {st_mode=S_IFREG|0600, st_size=1810, ...}) = 0 16857 16:38:37 write(1, "[sender] make_file(Saved/SavedArks/retracted.arkprofile,*,2)\n", 69) = 69 16857 16:38:37 select(5, [], [4], [], {60, 0} <unfinished ...> 16858 16:39:36 <... select resumed> ) = 0 (Timeout) 16858 16:39:36 select(2, [], [1], [], {60, 0} <unfinished ...> 16859 16:39:37 <... select resumed> ) = 0 (Timeout) 16859 16:39:37 select(5, [], [4], [], {60, 0} <unfinished ...> 16857 16:39:37 <... select resumed> ) = 0 (Timeout) 16857 16:39:37 select(5, [], [4], [], {60, 0} $ lsof -p 15678 -ad 4,5 # note: I used multiple processes over multiple days, the PID will not be constant COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME rsync 15678 pamel 4u unix 0xffff8f88f9fec400 0t0 44294 type=STREAM rsync 15678 pamel 5u unix 0xffff8f88f9614800 0t0 44295 type=STREAM $ ls -l /proc/15678/fd/4 lrwx------ 1 pamel pamel 64 Mar 6 13:35 /proc/15678/fd/4 -> socket:[44294] $ ls -l /proc/15678/fd/5 lrwx------ 1 pamel pamel 64 Mar 6 13:35 /proc/15678/fd/5 -> socket:[44295] $ ss -nxp | grep -E '44294|44295' u_str ESTAB 0 59136 * 44296 * 44295 users:(("rsync",pid=15679,fd=1)) u_str ESTAB 0 97024 * 44294 * 44293 users:(("rsync",pid=15678,fd=4)) u_str ESTAB 82063 0 * 44293 * 44294 users:(("rsync",pid=15680,fd=0)) u_str ESTAB 53885 0 * 44295 * 44296 users:(("rsync",pid=15678,fd=5)) I have this issue on Ubuntu 17.04 with rsync 3.1.2, Debian 9.3 with rsync 3.1.2 and 3.1.3 (all as VM on ESXi 6.5). The only constants are the files that are being transferred: 6.6GB over 11'437 files. I am also certain that it does not hang on a specific file as it's sometimes a log file and sometimes a profile. I would say it happens 0.5s into the process after transferring about 44 files. $ rsync -avvv --delete /srv/ark/redacted/ShooterGame/Saved /srv/ark/backup/redacted/ [...] [sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2) [sender] make_file(Saved/SavedArks/redacted.arktribe,*,2) [sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2) send_files(29, /srv/ark/redacted/ShooterGame/Saved/Logs) Saved/Logs/ send_files(30, /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log) send_files mapped /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log of size 65472 calling match_sums /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log ^C [...] recv_file_name(Saved/SavedArks/redacted.profilebak) recv_file_name(Saved/SavedArks/redacted.arkprofile) recv_file_name(Saved/SavedArks/redacted.arkprofile) rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(504) [generator=3.1.3] [generator] _exit_cleanup(code=20, file=io.c, line=504): about to call exit(20) rsync: [generator] write error: Broken pipe (32) $ md5sum /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log 52596de045ba1759c68dfb8c1cd2d970 /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log $ id uid=1001(pamel) gid=1001(pamel) groups=1001(pamel),60(games) $ ls -lA /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log -rw------- 1 pamel pamel 65472 Apr 1 2017 /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log $ ls -lA /srv/ark/backup/ total 4 [...] drwxr-xr-x 3 pamel pamel 4096 Mar 6 13:25 redacted I've found other occurences of this issue but mostly people say "this sounds like a network issue". I'm only rsyncing between two local directories so I really hope it has nothing to do with network. Is there anything else I can do to further debug this issue? I can't really send 6.6GB of confidential data to replicate this issue and it's a really big issue that now has started to occur daily and requires me to manually abort the process and restart the services every morning. -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to rsync in Ubuntu. https://bugs.launchpad.net/bugs/1528921 Title: rsync hangs on select(5, [], [4], [], {60, 0} Status in rsync package in Ubuntu: Confirmed Bug description: In the last few months my home directory backup stopped completing. I've been able to reproduce the problem on a single subdirectory although I had to add the --debug=all flag to reproduce it on that smaller directory. Specifically, this command never completes: rsync --debug=all -avz /tmp/html2 /tmp/rsynctest/ The html2 directory is a copy of gnuradio-3.7.8.1/build/docs/doxygen/html . When I strace the command, I see this: write(1, "sender finished /tmp/html2/atsc_"..., 58sender finished /tmp/html2/atsc__interleaver_8h__incl.md5 ) = 58 write(1, "send_files(338, /tmp/html2/atsc_"..., 59send_files(338, /tmp/html2/atsc__interleaver_8h__incl.png) ) = 59 open("html2/atsc__interleaver_8h__incl.png", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0664, st_size=264657, ...}) = 0 write(1, "html2/atsc__interleaver_8h__incl"..., 37html2/atsc__interleaver_8h__incl.png ) = 37 read(3, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\n\253\0\0\2\233\10\6\0\0\0h\242\""..., 262144) = 262144 select(6, [5], [4], [5], {60, 0}) = 2 (in [5], out [4], left {59, 999996}) read(5, "\0\0\0\0\0\0\0\1\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\240\0\0\0"..., 95) = 95 write(4, "r\311\0\7\177\377\232\237\264\272e\300\300\240\316\264&\314\301\252*\37\256y\225g\373^\315j\370\350"..., 51574) = 51574 select(5, [], [4], [], {60, 0}) = 1 (out [4], left {59, 999997}) write(4, "\7\320\0\7\177\377\234|\7X\223Y\273\255c\27\25f\306\212\202\214#E\272\212t\1\225A\fU"..., 53259) = 53259 select(5, [], [4], [], {60, 0} The select command times out over and over. I get the same behavior when trying to back up my entire home directory but I don't need the --debug=all flag in that case. lsb_release -rd Description: Ubuntu 14.04.3 LTS Release: 14.04 apt-cache policy rsync rsync: Installed: 3.1.0-2ubuntu0.1 Candidate: 3.1.0-2ubuntu0.1 Version table: *** 3.1.0-2ubuntu0.1 0 500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main i386 Packages 500 http://security.ubuntu.com/ubuntu/ trusty-security/main i386 Packages 100 /var/lib/dpkg/status 3.1.0-2 0 500 http://us.archive.ubuntu.com/ubuntu/ trusty/main i386 Packages ProblemType: Bug DistroRelease: Ubuntu 14.04 Package: rsync 3.1.0-2ubuntu0.1 ProcVersionSignature: Ubuntu 3.13.0-74.118-generic 3.13.11-ckt30 Uname: Linux 3.13.0-74-generic i686 NonfreeKernelModules: nvidia ApportVersion: 2.14.1-0ubuntu3.19 Architecture: i386 CurrentDesktop: KDE Date: Wed Dec 23 09:44:17 2015 EcryptfsInUse: Yes InstallationDate: Installed on 2010-09-18 (1922 days ago) InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Beta i386 (20100901.1) SourcePackage: rsync UpgradeStatus: Upgraded to trusty on 2014-12-27 (361 days ago) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/rsync/+bug/1528921/+subscriptions -- Mailing list: https://launchpad.net/~touch-packages Post to : touch-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~touch-packages More help : https://help.launchpad.net/ListHelp