Package: approx
Version: 3.0.0+b1
Severity: normal
Tags: patch

I'm using http://ftp.se.debian.org/debian/ as a mirror, and it seems that this site does a 302 redirection on some (but not all) requests. Unfortunately this doesn't work well with approx, because approx does not seem to handle redirections properly. My approx configuration file /etc/approx/approx.conf contains the following lines:
$verbose true
$debug true

debian http://ftp.se.debian.org/debian

For example, the command "apt-get source unzip" gives, at the time of writing, the following output. (Note that this example might become invalid, since it seems to vary over time which requests the server redirects.)

# apt-get source unzip
Reading package lists... Done
Building dependency tree
Reading state information... Done
Skipping already downloaded file 'unzip_5.52-10.dsc'
Skipping already downloaded file 'unzip_5.52-10.diff.gz'
Need to get 1140kB of source archives.
Err http://localhost testing/main unzip 5.52-10 (tar)
  404 Not Found
Failed to fetch http://localhost:9999/debian/pool/main/u/unzip/unzip_5.52.orig.tar.gz  404 Not Found
E: Failed to fetch some archives.

Looking at the syslog gives an indication of what went wrong:

Mar  5 22:05:25 haakonst1 approx: Version: 3.0.0+b1
Mar  5 22:05:25 haakonst1 approx: Interface: any
Mar  5 22:05:25 haakonst1 approx: Port: 9999
Mar  5 22:05:25 haakonst1 approx: Max rate: unlimited
Mar  5 22:05:25 haakonst1 approx: User: approx
Mar  5 22:05:25 haakonst1 approx: Group: approx
Mar  5 22:05:25 haakonst1 approx: Syslog: daemon
Mar  5 22:05:25 haakonst1 approx: Pdiffs: true
Mar  5 22:05:25 haakonst1 approx: Offline: false
Mar  5 22:05:25 haakonst1 approx: Max wait: 10
Mar  5 22:05:25 haakonst1 approx: Verbose: true
Mar  5 22:05:25 haakonst1 approx: Debug: true
Mar  5 22:05:30 haakonst1 approx: Connection from 127.0.0.1:60339
Mar  5 22:05:30 haakonst1 approx: Request /debian/pool/main/u/unzip/unzip_5.52.orig.tar.gz
Mar  5 22:05:30 haakonst1 approx: Host: localhost:9999
Mar  5 22:05:30 haakonst1 approx: Connection: keep-alive
Mar  5 22:05:30 haakonst1 approx: User-Agent: Debian APT-HTTP/1.3 (0.7.11)
Mar  5 22:05:30 haakonst1 approx: => cache miss
Mar  5 22:05:30 haakonst1 approx: Command: /usr/bin/curl --fail --silent --location --include "http://ftp.se.debian.org/debian/pool/main/u/unzip/unzip_5.52.orig.tar.gz"
Mar  5 22:05:31 haakonst1 approx: HTTP/1.1 302 Found
Mar  5 22:05:31 haakonst1 approx: Date: Wed, 05 Mar 2008 21:05:30 GMT
Mar  5 22:05:31 haakonst1 approx: Server: Apache/2.2.6 (Unix)
Mar  5 22:05:31 haakonst1 approx: Location: http://saimei.acc.umu.se/debian/pool/main/u/unzip/unzip_5.52.orig.tar.gz
Mar  5 22:05:31 haakonst1 approx: Content-Length: 373
Mar  5 22:05:31 haakonst1 approx: Content-Type: text/html; charset=iso-8859-1
Mar  5 22:05:36 haakonst1 approx: Unexpected status code: 302
Mar  5 22:05:36 haakonst1 approx: http://ftp.se.debian.org/debian/pool/main/u/unzip/unzip_5.52.orig.tar.gz: not found

Approx invokes curl with the --location option, which causes curl to follow the redirect. However, running this curl command exactly as approx invoked it gives the following initial output:

# /usr/bin/curl --fail --silent --location --include \
    http://ftp.se.debian.org/debian/pool/main/u/unzip/unzip_5.52.orig.tar.gz \
    | head -n 20
HTTP/1.1 302 Found
Date: Wed, 05 Mar 2008 21:15:15 GMT
Server: Apache/2.2.6 (Unix)
Location: http://saimei.acc.umu.se/debian/pool/main/u/unzip/unzip_5.52.orig.tar.gz
Content-Length: 373
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Date: Wed, 05 Mar 2008 21:15:15 GMT
Server: Apache/2.2.6 (Unix)
Last-Modified: Tue, 01 Mar 2005 14:47:04 GMT
ETag: "3ae57-116643-7852ea00"
Accept-Ranges: bytes
Content-Length: 1140291
Age: 2489
Content-Type: application/x-gzip
[...]
The problem seems to be that, since the --include option is also specified, curl outputs the HTTP headers of both responses (this is documented in the curl man page), but the functions in url.ml that parse curl's output apparently don't take this into account.

I've attached two alternative patches that attempt to fix this problem. The first tries to handle a possible additional set of headers, depending on the status code of each HTTP response that curl outputs; for this I also moved some of the header-parsing code from approx.ml into url.ml. This became quite unwieldy and required a lot of changes, so the second patch instead handles redirects in approx itself and drops the --location option when curl is invoked.

In any case, I'll admit I don't really know the OCaml language, so even though these patches work for me, I won't be surprised if there's some problem with them or if they can be discarded entirely for something better.

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.22-2-686 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages approx depends on:
ii  adduser   3.106     add and remove users and groups
ii  bzip2     1.0.4-3   high-quality block-sorting file co
ii  curl      7.18.0-1  Get a file from an HTTP, HTTPS or
ii  libc6     2.7-9     GNU C Library: Shared libraries
ii  libpcre3  7.6-2     Perl 5 Compatible Regular Expressi
ii  lsb-base  3.2-3     Linux Standard Base 3.2 init scrip

approx recommends no packages.

-- no debconf information
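To make the parsing issue concrete, here is a minimal standalone sketch of the idea that the first patch implements inside url.ml: with both --include and --location, curl prints one complete header block per response in the redirect chain, each ended by a blank (CRLF) line, so a parser has to skip the blocks that belong to 3xx responses and hand only the final response's status line and headers to a callback. This is not the approx code; all the names (strip_cr, status_code, iter_final_headers) and the Release URL in the example are made up, and it assumes only curl's documented output format.

(* Sketch only: skip the header blocks of 3xx responses in curl's combined
   --include/--location output, and hand the final response's status line
   and header lines to a callback.  Not the approx code. *)

(* Strip the trailing '\r' that input_line leaves on CRLF-terminated headers. *)
let strip_cr line =
  let n = String.length line in
  if n > 0 && line.[n - 1] = '\r' then String.sub line 0 (n - 1) else line

(* Return Some code if [line] is an HTTP status line, None otherwise. *)
let status_code line =
  try Scanf.sscanf line "HTTP/%s %d" (fun _version code -> Some code)
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Apply [proc] to the lines of the final (non-redirect) header block. *)
let iter_final_headers proc chan =
  let next () = try Some (strip_cr (input_line chan)) with End_of_file -> None in
  (* Consume lines up to the blank line terminating the current block. *)
  let rec skip () =
    match next () with
    | Some line when line <> "" -> skip ()
    | _ -> ()
  in
  let rec loop () =
    match next () with
    | None -> ()
    | Some "" -> ()                        (* blank line ends the final block *)
    | Some line ->
        (match status_code line with
         | Some code when code >= 300 && code < 400 -> skip ()  (* drop redirect headers *)
         | _ -> proc line);
        loop ()
  in
  loop ()

(* Example use (hypothetical URL); requires the unix library: *)
let () =
  let cmd = "/usr/bin/curl --fail --silent --location --include \
             http://ftp.se.debian.org/debian/dists/sid/Release" in
  let chan = Unix.open_process_in cmd in
  iter_final_headers print_endline chan;
  ignore (Unix.close_process_in chan)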
diff -Naur a/approx.ml b/approx.ml
--- a/approx.ml 2007-12-07 23:17:06.000000000 +0000
+++ b/approx.ml 2008-03-05 20:58:55.000000000 +0000
@@ -254,35 +254,20 @@
   close_cache resp.length resp.last_modified;
   if resp.length >= 0L then Delivered else Cached
 
-let with_pair rex str proc =
-  match Pcre.extract ~rex ~full_match: false str with
-  | [| a; b |] -> proc (a, b)
-  | _ -> assert false
-
-let status_re = Pcre.regexp "^HTTP/\\d+\\.\\d+\\s+(\\d{3})\\s+(.*?)\\s*$"
-let header_re = Pcre.regexp "^(.*?):\\s*(.*?)\\s*$"
-
-let process_header resp str =
-  let do_status (code, _) =
-    resp.status <- int_of_string code
-  in
-  let do_header (header, value) =
-    match String.lowercase header with
-    | "content-length" ->
-        (try resp.length <- Int64.of_string value
-         with Failure _ ->
-           error_message "Cannot parse Content-Length %s" value)
-    | "last-modified" ->
-        (try resp.last_modified <- Url.time_of_string value
-         with Invalid_argument _ ->
-           error_message "Cannot parse Last-Modified date %s" value)
-    | _ -> ()
-  in
-  if debug then debug_message "  %s" str;
-  try with_pair header_re str do_header
-  with Not_found -> (* e.g., status line or CRLF *)
-    try with_pair status_re str do_status
-    with Not_found -> error_message "Unrecognized response: %s" str
+let process_header resp field =
+  match field with
+  | Url.Status code -> (resp.status <- code)
+  | Url.Header (header, value) ->
+      (match String.lowercase header with
+       | "content-length" ->
+           (try resp.length <- Int64.of_string value
+            with Failure _ ->
+              error_message "Cannot parse Content-Length %s" value)
+       | "last-modified" ->
+           (try resp.last_modified <- Url.time_of_string value
+            with Invalid_argument _ ->
+              error_message "Cannot parse Last-Modified date %s" value)
+       | _ -> ())
 
 let process_body resp cgi str pos len =
   if resp.status = 200 then
diff -Naur a/url.ml b/url.ml
--- a/url.ml 2007-12-07 23:17:06.000000000 +0000
+++ b/url.ml 2008-03-05 20:58:55.000000000 +0000
@@ -57,26 +57,58 @@
 
 let head_command = curl_command ["--head"]
 
+let with_pair rex str proc =
+  match Pcre.extract ~rex ~full_match: false str with
+  | [| a; b |] -> proc (a, b)
+  | _ -> assert false
+
+let status_re = Pcre.regexp "^HTTP/\\d+\\.\\d+\\s+(\\d{3})\\s+(.*?)\\s*$"
+let header_re = Pcre.regexp "^(.*?):\\s*(.*?)\\s*$"
+
+let next chan =
+  try
+    let header = (input_line chan) in
+    let n = String.length header in
+    if n > 0 && header.[n - 1] = '\r' then
+      Some (String.sub header 0 (n - 1))
+    else
+      begin
+        error_message "Unexpected header: %s" header;
+        None
+      end
+  with End_of_file -> None
+
+type header_field = Status of int | Header of string * string
+
 let iter_headers proc chan =
-  let next () =
-    try Some (input_line chan)
-    with End_of_file -> None
+  let rec skip () = (* skip headers until we see just CRLF *)
+    match next chan with
+    | Some header -> if String.length header > 0 then skip () else ()
+    | None -> ()
   in
   let rec loop () =
-    match next () with
+    match next chan with
     | Some header ->
-        let n = String.length header in
-        if n > 0 && header.[n - 1] = '\r' then
-          if n > 1 then
-            begin
-              proc (String.sub header 0 (n - 1));
-              loop ()
-            end
-          else
-            (* CRLF terminates headers *)
-            ()
-        else
-          error_message "Unexpected header: %s" header
+        if String.length header > 0 then
+          let do_header (header, value) =
+            proc (Header (header, value))
+          in
+          let do_status (status, _) =
+            let code = int_of_string status in
+            match code with
+            | 301 | 302 | 303 | 307 -> skip ()
+            | _ -> proc (Status code)
+          in
+          if debug then debug_message "  %s" header;
+          (try with_pair header_re header do_header
+           with Not_found -> (* e.g., status line or CRLF *)
+             try with_pair status_re header do_status
+             with Not_found ->
+               error_message "Unrecognized response: %s" header);
+          loop ()
+        else
+          (* CRLF terminates headers *)
+          ()
     | None -> ()
   in
   loop ()
diff -Naur a/url.mli b/url.mli
--- a/url.mli 2007-12-07 23:17:06.000000000 +0000
+++ b/url.mli 2008-03-05 20:58:55.000000000 +0000
@@ -22,10 +22,12 @@
 
 val protocol : string -> protocol
 
+type header_field = Status of int | Header of string * string
+
 (* Perform HTTP HEAD (or equivalent for FTP and FILE)
    on the given URL and apply a callback to each header that is returned *)
 
-val head : string -> (string -> unit) -> unit
+val head : string -> (header_field -> unit) -> unit
 
 (* Download the specified URL with optional request headers,
    then apply callbacks to the headers and body chunks *)
@@ -33,7 +35,7 @@
 
 val download :
   string -> ?headers:string list ->
-  ?header_callback:(string -> unit) ->
+  ?header_callback:(header_field -> unit) ->
   (string -> int -> int -> unit) -> unit
 
 (* Download a file from a remote repository *)
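As a small usage illustration of the interface change in this first patch: with the new Url.header_field type, a caller of Url.head pattern-matches on the variant instead of parsing raw header strings. Something along these lines (hypothetical example; the URL is made up):

let print_field = function
  | Url.Status code -> Printf.printf "status: %d\n" code
  | Url.Header (name, value) -> Printf.printf "%s: %s\n" name value

let () = Url.head "http://ftp.se.debian.org/debian/dists/sid/Release" print_field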
diff -Naur a/approx.ml b/approx.ml
--- a/approx.ml 2007-12-07 23:17:06.000000000 +0000
+++ b/approx.ml 2008-03-05 20:59:10.000000000 +0000
@@ -37,6 +37,8 @@
   done;
   if not !foreground then use_syslog ()
 
+let max_redirects = 5
+
 let stat_file name =
   try Some (stat name)
   with Unix_error (ENOENT, "stat", _) -> None
@@ -239,6 +241,7 @@
     mutable status : int;
     mutable length : int64;
     mutable last_modified : float;
+    mutable location : string;
     mutable body_seen : bool }
 
 let initial_response_state =
@@ -246,6 +249,7 @@
     status = 0;
     length = -1L;
     last_modified = 0.;
+    location = "?";
     body_seen = false }
 
 let new_response name = { initial_response_state with name = name }
@@ -276,6 +280,7 @@
         (try resp.last_modified <- Url.time_of_string value
          with Invalid_argument _ ->
           error_message "Cannot parse Last-Modified date %s" value)
+    | "location" -> (resp.location <- value)
     | _ -> ()
   in
   if debug then debug_message "  %s" str;
@@ -306,15 +311,26 @@
   let headers =
     if ims > 0. then [ "If-Modified-Since: " ^ Url.string_of_time ims ] else []
   in
-  let resp = new_response name in
-  let header_callback = process_header resp in
-  let body_callback = process_body resp cgi in
-  Url.download url ~headers ~header_callback body_callback;
-  match resp.status with
-  | 200 -> finish_delivery resp
-  | 304 -> Not_modified
-  | 404 -> File_not_found
-  | n -> error_message "Unexpected status code: %d" n; File_not_found
+  let rec download_redirect redirects url =
+    let resp = new_response name in
+    let header_callback = process_header resp in
+    let body_callback = process_body resp cgi in
+    Url.download url ~headers ~header_callback body_callback;
+    match resp.status with
+    | 200 -> finish_delivery resp
+    | 301 | 302 | 303 | 307 ->
+        if redirects <= max_redirects then
+          download_redirect (redirects+1) resp.location
+        else
+          begin
+            error_message "Too many redirections";
+            File_not_found
+          end
+    | 304 -> Not_modified
+    | 404 -> File_not_found
+    | n -> error_message "Unexpected status code: %d" n; File_not_found
+  in
+  download_redirect 0 url
 
 (* Download a file from an FTP repository *)
 
diff -Naur a/url.ml b/url.ml
--- a/url.ml 2007-12-07 23:17:06.000000000 +0000
+++ b/url.ml 2008-03-05 20:59:10.000000000 +0000
@@ -52,7 +52,7 @@
   | str -> "--limit-rate " ^ str
 
 let curl_command options url =
-  sprintf "/usr/bin/curl --fail --silent --location %s %s %s"
+  sprintf "/usr/bin/curl --fail --silent %s %s %s"
     rate_option (String.concat " " options) (quoted_string url)
 
 let head_command = curl_command ["--head"]
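The essential control flow that this second patch adds to approx.ml can also be shown as a tiny standalone sketch. Here fetch stands for a single curl invocation without --location that reports the status code and the Location header; it and the other names are hypothetical, not approx code:

let max_redirects = 5

type outcome =
  | Delivered of string                   (* URL that finally answered 200 *)
  | Failed of string                      (* error description *)

(* Follow 3xx responses ourselves, giving up after max_redirects hops. *)
let rec follow fetch redirects url =
  match (fetch url : int * string option) with
  | 200, _ -> Delivered url
  | (301 | 302 | 303 | 307), Some location ->
      if redirects <= max_redirects then follow fetch (redirects + 1) location
      else Failed "too many redirections"
  | (301 | 302 | 303 | 307), None -> Failed "redirect without a Location header"
  | status, _ -> Failed (Printf.sprintf "unexpected status code %d" status)

(* Example with a fake fetch function standing in for curl: *)
let () =
  let fake_fetch = function
    | "http://ftp.se.debian.org/debian/x" -> 302, Some "http://saimei.acc.umu.se/debian/x"
    | _ -> 200, None
  in
  match follow fake_fetch 0 "http://ftp.se.debian.org/debian/x" with
  | Delivered url -> print_endline ("delivered from " ^ url)
  | Failed msg -> prerr_endline msg

The point of this approach is that curl's output then only ever contains a single header block, so the existing header-parsing code in url.ml can stay as it is.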