Hello again, I have prepared two small patches.
1) Helper function to set proper type flag in dt based on type string. 2) In the gethttp function, there is an effective loop that reads from the socket until it receives a status code other than 10x. It is currently implemented with a label and a goto statement. I made an actual loop. I believe that these patches are two small steps in the direction that would make gethttp function shorter and easier to analyze. Now I would like to work on eliminating the other label from gethttp (used for http authentication). I was thinking about eventually moving this logic to http_loop (the socket is closed in this case anyway). Ie. on HTTP_STATUS_UNAUTHORIZED, the function would return to http_loop, and appropriate actions would be taken in that function (ie. another call to gethttp, with modified arguments). Later on, the code handling each http status code could be perhaps isolated and separated into smaller functions. Am I moving in the right directions? Do you have any suggestions? And BTW what do you think about initializing some pointer variables (like `head` or `resp`) to NULL in order to simplify resources management? W dniu 25.03.2015 o 23:16, Hubert Tarasiuk pisze: > I see. I will give it a try and will get back to you. > > Thank you for your assistance. > > Hubert > > W dniu 25.03.2015 o 22:49, Giuseppe Scrivano pisze: >> Please keep bug-wget in CC as it will include more people in the >> discussion. >> >> My refactoring wasn't very logical, I have just moved code outside of >> gethttp. I would like that each of these functions would make more sense >> and maybe be testable. Makes sense? >> >> Regards, >> Giuseppe >> >> Il 25 marzo 2015 22:11:47 CET, Hubert Tarasiuk >> <[email protected]> ha scritto: >> >> Hi Giuseppe, >> >> Yes, I can compile wget from repository. >> >> I've looked at the function you mentioned. I could do some more cleanup >> - mainly by factoring out parts of the code into separate functions - as >> I saw in your commits. Would that be appropriate, or did you mean >> something different? >> >> (Not sure if this part of conversation is relevant for entire bug-wget, >> so excluded it from CC. Let me know if I should keep CCing that list.) >> >> Best regards, >> Hubert >> >> >> W dniu 25.03.2015 o 21:00, Giuseppe Scrivano pisze: >> >> Hubert Tarasiuk <[email protected]> writes: >> >> Hello to everyone, >> >> My name is Hubert Tarasiuk and I am a CS student at the >> University of >> Warsaw, Poland (expected graduation date July 2015, B.Sc.). >> >> I am working on my Google Summer of Code proposal for two of >> the GNU >> wget ideas': >> >> Speed up Wget's Download Mechanism >> Improve Wget's Security >> >> >> As the deadline is approaching, I would like to start >> working on my >> patches. After checking the list on GitHub I see, that just >> one of the >> bugs appears to be good for me: 41002. Others are either >> closed/in >> progress/in discussion, or already have a patch proposed. >> >> My question: is the bug I mentioned still good for me, or >> perhaps is >> someone working on it already? >> If latter, can you provide an alternative? Or should I browse >> Savannah/Coverity and pick something myself? >> >> >> I think that bug was already fixed and there is not anything >> left to do. >> >> Were you already able to build wget from the git repository? >> >> I've started working on some cleanup for the gethttp function in the >> http.c file, but still there is much left to do, maybe it is >> easier for >> some fresh eyes to identify better how to make it more readable, >> would >> you be interested in that? >> >> Regards, >> Giuseppe >> >> >
From a1c00b354b7952dd35a7f5e26232b4d273b921eb Mon Sep 17 00:00:00 2001 From: hutabert <[email protected]> Date: Fri, 27 Mar 2015 14:00:33 +0100 Subject: [PATCH 1/2] Factor out set_content_type function from gethttp --- src/http.c | 37 ++++++++++++++++++++++--------------- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/src/http.c b/src/http.c index 53c9818..ff7f58f 100644 --- a/src/http.c +++ b/src/http.c @@ -2330,6 +2330,27 @@ open_output_stream (struct http_stat *hs, int count, FILE **fp) return RETROK; } +/* Set proper type flags based on type string. */ +static void +set_content_type (int *dt, const char *type) +{ + /* If content-type is not given, assume text/html. This is because + o f the multitud*e of broken CGI's that "forget" to generate the + content-type. */ + if (!type || + 0 == strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)) || + 0 == strncasecmp (type, TEXTXHTML_S, strlen (TEXTXHTML_S))) + *dt |= TEXTHTML; + else + *dt &= ~TEXTHTML; + + if (type && + 0 == strncasecmp (type, TEXTCSS_S, strlen (TEXTCSS_S))) + *dt |= TEXTCSS; + else + *dt &= ~TEXTCSS; +} + /* Retrieve a document through HTTP protocol. It recognizes status code, and correctly handles redirections. It closes the network socket. If it receives an error from the functions below it, it @@ -2957,21 +2978,7 @@ read_header: } } - /* If content-type is not given, assume text/html. This is because - of the multitude of broken CGI's that "forget" to generate the - content-type. */ - if (!type || - 0 == strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)) || - 0 == strncasecmp (type, TEXTXHTML_S, strlen (TEXTXHTML_S))) - *dt |= TEXTHTML; - else - *dt &= ~TEXTHTML; - - if (type && - 0 == strncasecmp (type, TEXTCSS_S, strlen (TEXTCSS_S))) - *dt |= TEXTCSS; - else - *dt &= ~TEXTCSS; + set_content_type (dt, type); if (opt.adjust_extension) { -- 2.3.4
From abd70bc6332669f7d32937b1978ed1c6cfdabce6 Mon Sep 17 00:00:00 2001 From: hutabert <[email protected]> Date: Fri, 27 Mar 2015 15:35:57 +0100 Subject: [PATCH 2/2] Transform read_header label and goto into a loop --- src/http.c | 109 ++++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 60 insertions(+), 49 deletions(-) diff --git a/src/http.c b/src/http.c index ff7f58f..5ee5f6b 100644 --- a/src/http.c +++ b/src/http.c @@ -2597,55 +2597,66 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, /* warc_write_request_record has also closed warc_tmp. */ } - -read_header: - head = read_http_response_head (sock); - if (!head) - { - if (errno == 0) - { - logputs (LOG_NOTQUIET, _("No data received.\n")); - CLOSE_INVALIDATE (sock); - request_free (req); - return HEOF; - } - else - { - logprintf (LOG_NOTQUIET, _("Read error (%s) in headers.\n"), - fd_errstr (sock)); - CLOSE_INVALIDATE (sock); - request_free (req); - return HERR; - } - } - DEBUGP (("\n---response begin---\n%s---response end---\n", head)); - - resp = resp_new (head); - - /* Check for status line. */ - message = NULL; - statcode = resp_status (resp, &message); - if (statcode < 0) - { - char *tms = datetime_str (time (NULL)); - logprintf (LOG_VERBOSE, "%d\n", statcode); - logprintf (LOG_NOTQUIET, _("%s ERROR %d: %s.\n"), tms, statcode, - quotearg_style (escape_quoting_style, - _("Malformed status line"))); - CLOSE_INVALIDATE (sock); - resp_free (resp); - request_free (req); - xfree (head); - return HERR; - } - - if (H_10X (statcode)) - { - DEBUGP (("Ignoring response\n")); - resp_free (resp); - xfree (head); - goto read_header; - } + /* Repeat while we receive a 10x response code. */ + { + bool _repeat; + + do + { + head = read_http_response_head (sock); + if (!head) + { + if (errno == 0) + { + logputs (LOG_NOTQUIET, _("No data received.\n")); + CLOSE_INVALIDATE (sock); + request_free (req); + return HEOF; + } + else + { + logprintf (LOG_NOTQUIET, _("Read error (%s) in headers.\n"), + fd_errstr (sock)); + CLOSE_INVALIDATE (sock); + request_free (req); + return HERR; + } + } + DEBUGP (("\n---response begin---\n%s---response end---\n", head)); + + resp = resp_new (head); + + /* Check for status line. */ + message = NULL; + statcode = resp_status (resp, &message); + if (statcode < 0) + { + char *tms = datetime_str (time (NULL)); + logprintf (LOG_VERBOSE, "%d\n", statcode); + logprintf (LOG_NOTQUIET, _("%s ERROR %d: %s.\n"), tms, statcode, + quotearg_style (escape_quoting_style, + _("Malformed status line"))); + CLOSE_INVALIDATE (sock); + resp_free (resp); + request_free (req); + xfree (head); + return HERR; + } + + if (H_10X (statcode)) + { + xfree (head); + resp_free (resp); + _repeat = true; + DEBUGP (("Ignoring response\n")); + } + else + { + _repeat = false; + } + } + while (_repeat); + } xfree(hs->message); hs->message = xstrdup (message); -- 2.3.4
signature.asc
Description: OpenPGP digital signature
