Hi ML-followers,

I occasionally observe stalled scp connections while copying logfile archives 
from a debian server to cygwin on Windows 7. Besides 
https://sourceware.org/ml/cygwin/2015-02/msg00575.html I did not find similar 
issues, but there is no blocking/non-blocking switch involved in the middle of 
the data transmission (at random places in subsequent tests) through the pipe, 
where some data is lost (pls see below).

Pls find the redacted output of my "cygcheck -svr" attached. I also noticed 
that issue also before upgrading my cygwin to that latest version.

As I'm not into the details of read/write functions, I was only able to track 
down the issue as follows - I hope it is reproducible or otherwise helpful - so 
now I need your help:


1) I added some instrumentation in scp/ssh (both binaries copied to 
/usr/local/bin/ and used below):

user@host /usr/src/openssh-6.8p1-1.src/openssh-6.8p1-debug
$ grep -n -C4 TK: *.c
channels.c-1724-
channels.c-1725-                if (c->datagram) {
channels.c-1726-                        /* ignore truncated writes, datagrams 
might get lost */
channels.c-1727-                        len = write(c->wfd, buf, dlen);
channels.c:1728:logit("TK: channels.c: channel_handle_wfd(1): write: len=%d, 
buf=%.16s", len, buf);
channels.c-1729-                        free(data);
channels.c-1730-                        if (len < 0 && (errno == EINTR || errno 
== EAGAIN ||
channels.c-1731-                            errno == EWOULDBLOCK))
channels.c-1732-                                return 1;
--
channels.c-1745-                        dlen = MIN(dlen, 8*1024);
channels.c-1746-#endif
channels.c-1747-
channels.c-1748-                len = write(c->wfd, buf, dlen);
channels.c:1749:logit("TK: channels.c: channel_handle_wfd(2): write: len=%d, 
buf=%.16s", len, buf);
channels.c-1750-                if (len < 0 &&
channels.c-1751-                    (errno == EINTR || errno == EAGAIN || errno 
== EWOULDBLOCK))
channels.c-1752-                        return 1;
channels.c-1753-                if (len <= 0) {
--
channels.c-2368-                return 0;
channels.c-2369-
channels.c-2370-        /* Get the data. */
channels.c-2371-        data = packet_get_string_ptr(&data_len);
channels.c:2372:logit("TK: channels.c: channel_input_data: data_len=%d, 
buf=%.16s", data_len, data);
channels.c-2373-        win_len = data_len;
channels.c-2374-        if (c->datagram)
channels.c-2375-                win_len += 4;  /* string length header */
channels.c-2376-
--
dispatch.c-107-                         return r;
dispatch.c-108-                 if (type == SSH_MSG_NONE)
dispatch.c-109-                         return 0;
dispatch.c-110-         }
dispatch.c:111:logit("TK: dispatch.c: ssh_dispatch_run: type=%d", type);
dispatch.c-112-         if (type > 0 && type < DISPATCH_MAX &&
dispatch.c-113-             ssh->dispatch[type] != NULL) {
dispatch.c-114-                 if (ssh->dispatch_skip_packets) {
dispatch.c-115-                         debug2("skipped packet (type %u)", 
type);
--
scp.c-1110-                     count += amt;
scp.c-1111-                     do {
scp.c-1112-                             j = atomicio6(read, remin, cp, amt,
scp.c-1113-                                 scpio, &statbytes);
scp.c:1114:logit("TK: scp.c: sink: amt=%d, j=%d, i=%d, size=%d, count=%d, 
buf=%.16s", amt, j, i, size, count, cp);
scp.c-1115-                             if (j == 0) {
scp.c-1116-                                     run_err("%s", j != EPIPE ?
scp.c-1117-                                         strerror(errno) :
scp.c-1118-                                         "dropped connection");


2) Generated easily debuggable (numbered 16k ascii blocks) HUGEFILE on a debian 
server (file size taken from an original log archive):

$ for BLOCK in `seq -w 1 1 94795`; do for ELEM in `seq -w 1 1 1024`; do echo -n 
"B0$BLOCK-N00$ELEM-"; done; done > TEST.txt


3) Pulled that file in a loop from the debian server to the local cygwin on 
Windows 7:

$ while true; do /usr/local/bin/scp -v -C -o BatchMode=yes 
user@10.IP2.IP3.IP4:logDownload_m/TEST.txt 93_KO_TEST.txt 2>&1 | tee 
93_KO_scp_trace.txt; echo; date; echo; sleep 15; done


4) In case of stalled scp (in my env, it does not take that much time to get 
it, maybe 10-to-30 tries), collect and compare the "KO" output to a previously 
recorded and prepared "OK" output:

$ split -b 163840 -a 5 93_KO_TEST.txt 93_KO_TEST.txt.split_


$ for FILE in 93_OK_TEST.txt.split_*; do echo "$FILE"; diff -q $FILE `echo 
$FILE | sed -re "s/93_OK_/93_KO_/g"` || break; done
. . .
93_OK_TEST.txt.split_aanqv
93_OK_TEST.txt.split_aanqw
Files 93_OK_TEST.txt.split_aanqw and 93_KO_TEST.txt.split_aanqw differ


$ sdiff <(cat 93_OK_TEST.txt.split_aanqw | sed -re 
"s/(B......-N001024-)/\1\n/g") <(cat 93_KO_TEST.txt.split_aanqw | sed -re 
"s/(B......-N001024-)/\1\n/g")
B092261-N000001-B092261-N000002-B092261-N000003-B092261-N0000   
B092261-N000001-B092261-N000002-B092261-N000003-B092261-N0000
B092262-N000001-B092262-N000002-B092262-N000003-B092262-N0000   
B092262-N000001-B092262-N000002-B092262-N000003-B092262-N0000
B092263-N000001-B092263-N000002-B092263-N000003-B092263-N0000   
B092263-N000001-B092263-N000002-B092263-N000003-B092263-N0000
B092264-N000001-B092264-N000002-B092264-N000003-B092264-N0000   
B092264-N000001-B092264-N000002-B092264-N000003-B092264-N0000
B092265-N000001-B092265-N000002-B092265-N000003-B092265-N0000   
B092265-N000001-B092265-N000002-B092265-N000003-B092265-N0000
B092266-N000001-B092266-N000002-B092266-N000003-B092266-N0000   
B092266-N000001-B092266-N000002-B092266-N000003-B092266-N0000
B092267-N000001-B092267-N000002-B092267-N000003-B092267-N0000 <
B092268-N000001-B092268-N000002-B092268-N000003-B092268-N0000   
B092268-N000001-B092268-N000002-B092268-N000003-B092268-N0000
B092269-N000001-B092269-N000002-B092269-N000003-B092269-N0000   
B092269-N000001-B092269-N000002-B092269-N000003-B092269-N0000
B092270-N000001-B092270-N000002-B092270-N000003-B092270-N0000   
B092270-N000001-B092270-N000002-B092270-N000003-B092270-N0000
                                                              > 
B092271-N000001-B092271-N000002-B092271-N000003-B092271-N0000


... So, the 16k block "B092267" is missing in the "KO" case, lets check the 
scp/ssh logs, where we do have 3 steps where the HUGEFILE data is handled:

a) Writing it to the channel buffer in the ssh process after reading it from 
the socket and after decryption and decompression: "TK: channels.c: 
channel_input_data:"
b) Reading it from the channel buffer and writing it to the pipe towards the 
scp parent process in ssh process: "TK: channels.c: channel_handle_wfd(2):"
c) Reading it from the pipe in the scp process to write it to the target file: 
"TK: scp.c: sink:"

The logs show that the buffer is handled in a) and b), but does not show up in 
c), it is simply lost...

$ for TRACKER in TK:.channels.c:.channel_input_data: 
TK:.channels.c:.channel_handle_wfd.2.: TK:.scp.c:.sink:; do echo; cat 
93_KO_scp_trace.txt | grep "$TRACKER" | grep "B0922[67]"; done

TK: channels.c: channel_input_data: data_len=16384, buf=B092260-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092261-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092262-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092263-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092264-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092265-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092266-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092267-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092268-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092269-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092270-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092271-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092272-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092273-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092274-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092275-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092276-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092277-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092278-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B092279-N000001-

TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092260-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092261-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092262-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092263-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092264-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092265-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092266-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092267-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092268-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092269-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092270-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092271-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092272-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092273-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092274-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092275-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=32768, buf=B092276-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092278-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B092279-N000001-

TK: scp.c: sink: amt=65536, j=65536, i=1511587840, size=1553121293, 
count=65536, buf=B092261-N000001-
TK: scp.c: sink: amt=65536, j=65536, i=1511653376, size=1553121293, 
count=65536, buf=B092265-N000001-
TK: scp.c: sink: amt=65536, j=65536, i=1511718912, size=1553121293, 
count=65536, buf=B092270-N000001-
TK: scp.c: sink: amt=65536, j=65536, i=1511784448, size=1553121293, 
count=65536, buf=B092274-N000001-
TK: scp.c: sink: amt=65536, j=65536, i=1511849984, size=1553121293, 
count=65536, buf=B092278-N000001-

The 3rd line of c) "TK: scp.c: sink:" should be "buf=B092269-N000001-" if all 
of the 64k data of the 2nd line "count=65536, buf=B092265-N000001-" would have 
been read correctly from the pipe, but it is: "buf=B092270-N000001-", i.e. one 
16k block is written to the pipe by ssh ("TK: channels.c: 
channel_handle_wfd(2): write: len=16384, buf=B092267-N000001-"), but does not 
show up on teh other end of the pipe in scp.


In case bigger blocks are written to the pipe by ssh (e.g. "len=32768"), the 
bigger blocks get lost and does not show up in scp, e.g.:

93_KOOK_TEST.txt/93_OK_TEST.txt.split_aacmz VS 
92_KO_TEST.txt/92_KO_TEST.txt.split_aacmz
B016891-N000001-B016891-N000002-B016891-N000003-B016891-N0000 <
B016892-N000001-B016892-N000002-B016892-N000003-B016892-N0000 <
B016893-N000001-B016893-N000002-B016893-N000003-B016893-N0000   
B016893-N000001-B016893-N000002-B016893-N000003-B016893-N0000
. . . 
B016900-N000001-B016900-N000002-B016900-N000003-B016900-N0000   
B016900-N000001-B016900-N000002-B016900-N000003-B016900-N0000
                                                              > 
B016901-N000001-B016901-N000002-B016901-N000003-B016901-N0000
                                                              > 
B016902-N000001-B016902-N000002-B016902-N000003-B016902-N0000

92_KO_TEST.txt/92_KO_scp_trace.txt
TK: channels.c: channel_input_data: data_len=16384, buf=B016890-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B016891-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B016892-N000001-
TK: channels.c: channel_input_data: data_len=16384, buf=B016893-N000001-

TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B016890-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=32768, buf=B016891-N000001-
TK: channels.c: channel_handle_wfd(2): write: len=32768, buf=B016893-N000001-

TK: scp.c: sink: amt=65536, j=65536, i=276627456, size=1553121293, count=65536, 
buf=B016885-N000001-
TK: scp.c: sink: amt=65536, j=65536, i=276692992, size=1553121293, count=65536, 
buf=B016889-N000001-
TK: scp.c: sink: amt=65536, j=65536, i=276758528, size=1553121293, count=65536, 
buf=B016895-N000001-
TK: scp.c: sink: amt=65536, j=65536, i=276824064, size=1553121293, count=65536, 
buf=B016899-N000001-


Any help is much appreciated, thanks in advance!
T.

Attachment: cygcheck.REDACTED
Description: Binary data

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply via email to