Source: latex2html
Version: 2015-debian1-1
Severity: wishlist
Tags: upstream
User: reproducible-bui...@lists.alioth.debian.org
Usertags: toolchain timestamps username randomness
X-Debbugs-Cc: reproducible-bui...@lists.alioth.debian.org
Control: block -1 by 827115

Dear Maintainer,

While working on the "reproducible builds" effort [1], we have noticed
that some packages (including latex2html itself) use latex2html in their
build process, which leads to the following reproducibility issues:

* Perl hash keys are iterated in an unspecified order, so the output
order varies between runs. See reproducible-output.patch, which sorts
the keys to get a reproducible order.
* A timestamp is included in the output. See
honour-SOURCE_DATE_EPOCH.patch, which uses the SOURCE_DATE_EPOCH
environment variable when it is set [2]. This way, the timestamps
correspond to the source date instead of the build date.
* The user name is included in the output. See
suppress-username-from-output.patch, which strips it.
* The index keys are not fully ordered when their cleaned values are
equal. See idx-sort-all.patch, which falls back to comparing the
original keys.
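
To illustrate the two ordering issues above, here is a minimal,
standalone sketch. It is not taken from latex2html: the hash contents
and the helper names (stylesheet_lines, clean_key, sorted_index_keys)
are hypothetical, and clean_key is only a simplified stand-in for the
cleaning done in keysort. It shows iterating over "sort keys" instead
of "each", and a comparator that falls back to the raw keys when the
cleaned values are equal.

```perl
use strict;
use warnings;

# Hypothetical data standing in for a hash such as %env_style.
my %env_style = (
    theorem  => 'font-style: italic',
    abstract => 'margin: 1em',
    proof    => 'color: black',
);

# Emit one stylesheet line per key, always in sorted key order.
# Iterating with each() or a bare keys() would follow Perl's internal
# hash order, which can differ from one run to the next.
sub stylesheet_lines {
    my ($styles) = @_;
    my @lines = map { "DIV.$_\t\t{ $styles->{$_} }\n" } sort keys %$styles;
    return @lines;
}

# Simplified stand-in for the index key cleaning: lowercase the key
# and drop everything that is not a letter or a digit.
sub clean_key {
    my ($key) = @_;
    my $clean = lc $key;
    $clean =~ s/[^a-z0-9]//g;
    return $clean;
}

# Compare the cleaned keys first; when they are equal, fall back to
# the raw keys so that the result is a total order.
sub sorted_index_keys {
    my (@keys) = @_;
    my @sorted = sort { (clean_key($a) cmp clean_key($b)) || ($a cmp $b) } @keys;
    return @sorted;
}

print stylesheet_lines(\%env_style);
print join(' ', sorted_index_keys('foo', 'f-oo', 'Foo', 'bar')), "\n";
```

Without the fallback comparison, 'foo', 'f-oo' and 'Foo' all compare
as equal, so their relative order in the output depends on the order
in which they were handed to sort.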

Once these patches are applied, and once https://bugs.debian.org/827115
is fixed, latex2html can be built reproducibly in our current
experimental framework.

Regards,
Alexis Bienvenüe.

[1] https://wiki.debian.org/ReproducibleBuilds
[2] https://reproducible-builds.org/specs/source-date-epoch/
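
As an illustration of the SOURCE_DATE_EPOCH handling described in [2],
here is a minimal, standalone sketch. It is not the patched get_date
itself: the helper name build_date_iso is hypothetical, and the check
for a purely numeric value is an assumption of this sketch (the patch
below tests the variable for truthiness only, so a value of 0 falls
back to the local time there).

```perl
use strict;
use warnings;

# Hypothetical helper: return the date as YYYY-MM-DD.
# If SOURCE_DATE_EPOCH holds a non-negative integer, use it and
# interpret it in UTC, so the result does not depend on the build
# time or on the time zone of the build machine. Otherwise fall back
# to the current local time.
sub build_date_iso {
    my $epoch = $ENV{SOURCE_DATE_EPOCH};
    my @lt = (defined $epoch && $epoch =~ /^\d+$/)
        ? gmtime($epoch)
        : localtime();
    my ($d, $m, $y) = @lt[3, 4, 5];
    return sprintf("%4d-%02d-%02d", 1900 + $y, $m + 1, $d);
}

{
    # 1465564845 is 2016-06-10 13:20:45 UTC.
    local $ENV{SOURCE_DATE_EPOCH} = 1465564845;
    print build_date_iso(), "\n";    # prints 2016-06-10
}
```

Using gmtime rather than localtime for the fixed timestamp matters:
with localtime, two builders in different time zones could still
print different dates for the same SOURCE_DATE_EPOCH value.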

diff -Nru latex2html-2015-debian1/debian/changelog latex2html-2015-debian1/debian/changelog
--- latex2html-2015-debian1/debian/changelog	2016-01-19 19:24:18.000000000 +0100
+++ latex2html-2015-debian1/debian/changelog	2016-06-10 15:20:45.000000000 +0200
@@ -1,3 +1,9 @@
+latex2html (2015-debian1-1.0~reproducible1) UNRELEASED; urgency=medium
+
+  * Reproducible output.
+
+ -- Alexis Bienvenüe <p...@passoire.fr>  Fri, 10 Jun 2016 15:20:45 +0200
+
 latex2html (2015-debian1-1) unstable; urgency=medium
 
   * New upstream release (Closes: #647433)
diff -Nru latex2html-2015-debian1/debian/patches/honour-SOURCE_DATE_EPOCH.patch latex2html-2015-debian1/debian/patches/honour-SOURCE_DATE_EPOCH.patch
--- latex2html-2015-debian1/debian/patches/honour-SOURCE_DATE_EPOCH.patch	1970-01-01 01:00:00.000000000 +0100
+++ latex2html-2015-debian1/debian/patches/honour-SOURCE_DATE_EPOCH.patch	2016-06-10 15:47:57.000000000 +0200
@@ -0,0 +1,22 @@
+Description: Honour SOURCE_DATE_EPOCH
+ Honour the SOURCE_DATE_EPOCH environment variable, to make the output
+ reproducible.
+ See https://reproducible-builds.org/specs/source-date-epoch/
+Author: Alexis Bienvenüe <p...@passoire.fr>
+
+--- latex2html-2015-debian1.orig/latex2html.pin
++++ latex2html-2015-debian1/latex2html.pin
+@@ -15006,7 +15006,12 @@ sub brackets { ($OP, $CP);}
+ 
+ sub get_date {
+     local($format,$order) = @_;
+-    local(@lt) = localtime;
++    local(@lt);
++    if($ENV{SOURCE_DATE_EPOCH}) {
++        @lt = gmtime($ENV{SOURCE_DATE_EPOCH})
++    } else {
++        @lt = localtime;
++    }
+     local($d,$m,$y) = @lt[3,4,5];
+     if ($format =~ /ISO/) {
+ 	sprintf("%4d-%02d-%02d", 1900+$y, $m+1, $d);
diff -Nru latex2html-2015-debian1/debian/patches/idx-sort-all.patch latex2html-2015-debian1/debian/patches/idx-sort-all.patch
--- latex2html-2015-debian1/debian/patches/idx-sort-all.patch	1970-01-01 01:00:00.000000000 +0100
+++ latex2html-2015-debian1/debian/patches/idx-sort-all.patch	2016-06-13 14:49:30.000000000 +0200
@@ -0,0 +1,16 @@
+Description: Sort all index keys
+ Sort index keys, even if they are the same after being cleaned, to
+ get a reproducible output.
+Author: Alexis Bienvenüe <p...@passoire.fr>
+
+--- latex2html-2015-debian1.orig/latex2html.pin
++++ latex2html-2015-debian1/latex2html.pin
+@@ -8536,7 +8536,7 @@ sub keysort {
+     # Put alphabetic characters after symbols; already downcased
+     $x =~ s/^([a-z])/~~~$1/;
+     $y =~ s/^([a-z])/~~~$1/;
+-    $x cmp $y;
++    ($x cmp $y) || ($a cmp $b);
+ }
+ 
+ sub index_key_eq {
diff -Nru latex2html-2015-debian1/debian/patches/reproducible-output.patch latex2html-2015-debian1/debian/patches/reproducible-output.patch
--- latex2html-2015-debian1/debian/patches/reproducible-output.patch	1970-01-01 01:00:00.000000000 +0100
+++ latex2html-2015-debian1/debian/patches/reproducible-output.patch	2016-06-13 09:50:57.000000000 +0200
@@ -0,0 +1,260 @@
+Description: Make the output reproducible.
+ Sort Perl hash keys to make the output reproducible.
+ See https://wiki.debian.org/ReproducibleBuilds/
+Author: Alexis Bienvenüe <p...@passoire.fr>
+
+Index: latex2html-2015-debian1/latex2html.pin
+===================================================================
+--- latex2html-2015-debian1.orig/latex2html.pin
++++ latex2html-2015-debian1/latex2html.pin
+@@ -1049,7 +1049,7 @@ sub restore_critical_variables {
+     # undef any renewed-commands...
+     # so the new defs are read from %new_command
+     local($cmd,$key,$code);
+-    foreach $key (keys %renew_command) {
++    foreach $key (sort keys %renew_command) {
+ 	$cmd = "do_cmd_$key";
+ 	$code = "undef \&$cmd"; eval($code) if (defined &$cmd);
+ 	if ($@) { print "\nundef \&do_cmd_$cmd failed"}
+@@ -1673,7 +1673,7 @@ sub make_comment {
+ 
+ sub wrap_other_environments {
+     local($key, $env, $start, $end, $opt_env, $opt_start);
+-    foreach $key (keys %other_environments) {
++    foreach $key (sort keys %other_environments) {
+ 	# skip bogus entries
+ 	next unless ($env = $other_environments{$key});
+ 	$key =~ s/:/($start,$end)=($`,$');':'/e;
+@@ -3849,7 +3849,8 @@ sub make_off_line_images {
+         print "\n\n*** LaTeXERROR\n"; return();
+     }
+ 
+-    while ( ($name, $page_num) = each %new_id_map) {
++    for $name (sort keys %new_id_map) {
++        $page_num = $new_id_map{$name};
+ 	# Extract the page, convert and save it
+ 	&extract_image($page_num,$orig_name_map{$page_num});
+     }
+@@ -3952,7 +3953,8 @@ sub make_images {
+ 	    if (s/$PREFIX$img_rx\.new/$PREFIX$1.$IMAGE_TYPE/go);
+     }
+     print "\n *** removing unnecessary images ***\n" if ($VERBOSITY > 1);
+-    while ( ($name, $page_num) = each %id_map) {
++    for $name (sort keys %id_map) {
++        $page_num = $id_map{$name};
+ 	$contents = $latex_body{$name};
+ 	if ($page_num =~ /^\d+\#\d+$/) { # If it is a page number
+ 	    do {		# Extract the page, convert and save it
+@@ -5130,8 +5132,8 @@ sub substitute_meta_cmds {
+     #
+     # Now substitute the new commands and environments:
+     # (must do them all together because of cross definitions)
+-    $new_cmd_rx = &make_new_cmd_rx(keys %new_command);
+-    $new_cmd_no_delim_rx = &make_new_cmd_no_delim_rx(keys %new_command);
++    $new_cmd_rx = &make_new_cmd_rx(sort keys %new_command);
++    $new_cmd_no_delim_rx = &make_new_cmd_no_delim_rx(sort keys %new_command);
+     $new_env_rx = &make_new_env_rx;
+     $new_end_env_rx = &make_new_end_env_rx;
+ #    $new_cnt_rx = &make_new_cnt_rx(keys %new_counter);
+@@ -5140,7 +5142,8 @@ sub substitute_meta_cmds {
+     $new_cmd_or_env_rx =~ s/^ \||\|$//;
+ 
+     print STDOUT "\nnew commands:\n" if ($VERBOSITY > 2);
+-    while (($cmd, $body) = each %new_command) {
++    for $cmd (sort keys %new_command) {
++        $body = $new_command{$cmd};
+ 	unless ($expanded{"CMD$cmd"}++) {
+ 	    print STDOUT ".$cmd " if ($VERBOSITY > 2);
+ 	    $new_command{$cmd} = &expand_body;
+@@ -5150,7 +5153,8 @@ sub substitute_meta_cmds {
+     }
+ 
+     print STDOUT "\nnew environments:\n" if ($VERBOSITY > 2);
+-    while (($cmd, $body) = each %new_environment) {
++    for $cmd (sort keys %new_environment) {
++        $body = $new_environment{$cmd};
+ 	unless ($expanded{"ENV$cmd"}++) {
+ 	    print STDOUT ".$cmd" if ($VERBOSITY > 2);
+ 	    $new_environment{$cmd} = &expand_body;
+@@ -5160,39 +5164,42 @@ sub substitute_meta_cmds {
+ 
+     print STDOUT "\nnew counters and dependencies:\n" if ($VERBOSITY > 2);
+     &clear_mydb("dependent") if ($DEBUG);     #avoids appending to a previous version
+-    while (($cmd, $body) = each %dependent) {
++    for $cmd (sort keys %dependent) {
++        $body = $dependent{$cmd};
+ 	print STDOUT ".($cmd,$body)" if ($VERBOSITY > 2);
+         &write_mydb("dependent", $cmd, $dependent{$cmd});
+     }
+     &clear_mydb("img_style") if ($DEBUG);     #avoids appending to a previous version
+-    while (($cmd, $body) = each %img_style) {
++    for $cmd (sort keys %img_style) {
+         &write_mydb("img_style", $cmd, $img_style{$cmd});
+     }
+ 
+     &clear_mydb("depends_on") if ($DEBUG);     #avoids appending to a previous version
+-    while (($cmd, $body) = each %depends_on) {
++    for $cmd (sort keys %depends_on) {
+        $body = $depends_on{$cmd};
+ 	print STDOUT ".($cmd,$body)" if ($VERBOSITY > 2);
+         &write_mydb("depends_on", $cmd, $depends_on{$cmd});
+     }
+ 
+ 
+     &clear_mydb("styleID") if ($DEBUG);     #avoids appending to a previous version
+-    while (($cmd, $body) = each %styleID) {
++    for $cmd (sort keys %styleID) {
+         &write_mydb("styleID", $cmd, $styleID{$cmd});
+     }
+ 
+     &clear_mydb("env_style") if ($DEBUG);     #avoids appending to a previous version
+-    while (($cmd, $body) = each %env_style) {
++    for $cmd (sort keys %env_style) {
+         &write_mydb("env_style", $cmd, $env_style{$cmd});
+     }
+     &clear_mydb("txt_style") if ($DEBUG);     #avoids appending to a previous version
+-    while (($cmd, $body) = each %txt_style) {
++    for $cmd (sort keys %txt_style) {
+         &write_mydb("txt_style", $cmd, $txt_style{$cmd});
+     }
+ 
+     print STDOUT "\ntheorem counters:\n" if ($VERBOSITY > 2);
+     &clear_mydb("new_theorem") if ($DEBUG);     #avoids appending to a previous version
+-    while (($cmd, $body) = each %new_theorem) {
++    for $cmd (sort keys %new_theorem) {
++        $body = $new_theorem{$cmd};
+ 	print STDOUT ".($cmd,$body)" if ($VERBOSITY > 2);
+         &write_mydb("new_theorem", $cmd, $new_theorem{$cmd});
+     }
+@@ -6522,7 +6529,7 @@ sub parse_keyvalues {
+ #	    s/(^|,)\s*([a-zA-Z]+)\s*\=\s*(\"([^"]*)\"|\'([^\']*)\'|([#%&@;:+-\/\w\d]*))\s*/
+ 	    s/(^|,)\s*([a-zA-Z]+)\s*\=\s*(\"([^"]*)\"|\'([^\']*)\'|([^<>,=\s]*))\s*/
+ 		$attributes{$2}=($4?$4:($5?$5:$6));' '/eg;
+-	    foreach $key (keys %attributes){ 
++	    foreach $key (sort keys %attributes){ 
+ 		$KEY = $key;
+ 		$KEY =~ tr/a-z/A-Z/;
+ 		if ($taglist =~ /,$KEY,/i) {	        
+@@ -6564,7 +6571,7 @@ sub parse_keyvalues {
+ 	# with no tags provided, just list the key-value pairs
+ 	$_ = $saved;
+ 	s/\s*(\w+)\s*=\s*\"?(\w+)\"?\s*,?/$attributes{$1}=$2;''/eg;
+-	foreach $key (keys %attributes){ 
++	foreach $key (sort keys %attributes){ 
+ 	    $KEY = $key;
+ 	    $KEY =~ tr/a-z/A-Z/;
+ 	    $atts = $attributes{$key};
+@@ -6633,7 +6640,7 @@ sub extract_attributes {
+ 	if ($$name) { $taglist = $$name }
+     }
+     s/\s*(\w+)\s*=\s*\"?(\w+)\"?\s*,?/$attributes{$1}=$2;''/eg;
+-    foreach $key (keys %attributes){ 
++    foreach $key (sort keys %attributes){ 
+ 	if ($taglist =~ /\,$key\,/) {
+ 	    $attribs .= " $key=\"$attributes{$key}\"";
+ 	    &write_warnings("valid attribute $key for $tag\n");
+@@ -7197,7 +7204,8 @@ TD.eqno			{ }	/* equation-number cells *
+ EOF
+     }
+     print "\n *** Adding document-specific styles *** ";
+-    while (($env,$style) = each %env_style) {
++    for $env (sort keys %env_style) {
++        $style = $env_style{$env};
+         if ($env =~ /\./) {
+             $env =~ s/\.$//;
+             print STYLESHEET "$env\t\t{ $style }\n";
+@@ -7213,10 +7221,12 @@ EOF
+             print STYLESHEET "DIV.$env\t\t{ $style }\n";
+         }
+     }
+-    while (($env,$style) = each %txt_style) {
++    for $env (sort keys %txt_style) {
++        $style = $txt_style{$env};
+         print STYLESHEET "SPAN.$env\t\t{ $style }\n";
+     }
+-    while (($env,$style) = each %img_style) {
++    for $env (sort keys %img_style) {
++        $style = $img_style{$env};
+         print STYLESHEET "IMG.$env\t\t{ $style }\n";
+     }
+ 
+@@ -8832,8 +8842,9 @@ sub replace_cite_marks {
+     #
+     #RRM: Associate the cite_key with  $citefile , for use by other segments.
+     if ($citefile) {
+-	local($cite_key, $cite_ref);
+-	while (($cite_key, $cite_ref) = each %cite_info) {
++        local($cite_key, $cite_ref);
++        for $cite_key (sort keys %cite_info) {
++            $cite_ref = $cite_info{$cite_key};
+ 	    if ($ref_files{'cite_'."$cite_key"} ne $citefile) {
+ 		$ref_files{'cite_'."$cite_key"} = $citefile;
+ 		$changed = 1; }
+@@ -9802,7 +9813,7 @@ sub replace_word {
+ # for use in regular expressions;
+ sub get_current_sections {
+     local($_, $key);
+-    foreach $key (keys %section_commands) {
++    foreach $key (sort keys %section_commands) {
+ 	if ($key =~ /star/) {
+ 	    $_ = $key . "|" . $_}
+ 	else {
+@@ -10220,8 +10231,9 @@ sub save_array_in_file {
+ 	} else {
+ 	    print FILE "# LaTeX2HTML $TEX2HTMLVERSION\n";
+ 	    print FILE "# Associate $type original text with physical files.\n\n";
+-	}
+-	while (($uutxt,$file) = each %array) {
++        }
++        for $uutxt (sort keys %array) {
++            $file = $array{$uutxt};
+ 	    $uutxt =~ s|/|\\/|g;
+ 	    $uutxt =~ s|\\\\/|\\/|g;
+ 
+@@ -10676,7 +10688,8 @@ sub do_cmd_mbox {
+ sub generate_declaration_subs {
+     local($key, $val, $pre, $post, $code );
+     print "\n *** processing declarations ***\n";
+-    while ( ($key, $val) = each %declarations) {
++    for $key (sort keys %declarations) {
++        $val = $declarations{$key};
+ 	if ($val) {
+ 	    ($pre,$post) = ('','');
+ 	    $val =~ m|</.*$|;
+@@ -10698,7 +10711,8 @@ sub generate_declaration_subs {
+ # *Generates* subroutines to handle each of the sectioning commands.
+ sub generate_sectioning_subs {
+     local($key, $val, $cmd, $body);
+-    while ( ($key, $val) = each %standard_section_headings) {
++    for $key (sort keys %standard_section_headings) {
++        $val = $standard_section_headings{$key};
+ 	$numbered_section{$key} = 0;
+ 	eval "sub do_cmd_$key {"
+ 	    . 'local($after,$ot) = @_;'
+@@ -13329,7 +13343,7 @@ sub do_cmd_textohtmlindex {
+ # when using  makeidx.perl
+ sub make_index_labels {
+     local($key, @keys);
+-    @keys = keys %index_labels;
++    @keys = sort keys %index_labels;
+     foreach $key (@keys) {
+ 	if (($ref_files{$key}) && !($ref_files{$key} eq "$idxfile")) {
+ 	    local($tmp) = $ref_files{$key};
+@@ -13345,7 +13359,7 @@ sub make_preindex { &make_real_preindex
+ sub make_real_preindex {
+     local($key, @keys, $head, $body);
+     $head = "<HR>\n<H4>Legend:</H4>\n<DL COMPACT>";
+-    @keys = keys %index_segment;
++    @keys = sort keys %index_segment;
+     foreach $key (@keys) {
+ 	local($tmp) = "segment$key";
+ 	$tmp = $ref_files{$tmp};
+@@ -16777,7 +16791,7 @@ sub addto_languages {
+ sub make_raw_arg_cmd_rx {
+     # $1 or $2 : commands to be processed in latex (with arguments untouched)
+     # $4 : delimiter
+-    $raw_arg_cmd_rx = &make_new_cmd_rx(keys %raw_arg_cmds);
++    $raw_arg_cmd_rx = &make_new_cmd_rx(sort keys %raw_arg_cmds);
+     $raw_arg_cmd_rx;
+ }
+ 
diff -Nru latex2html-2015-debian1/debian/patches/series latex2html-2015-debian1/debian/patches/series
--- latex2html-2015-debian1/debian/patches/series	2016-01-19 19:15:15.000000000 +0100
+++ latex2html-2015-debian1/debian/patches/series	2016-06-13 14:48:36.000000000 +0200
@@ -2,3 +2,7 @@
 debian-install.patch
 perl5.22-defined-array.patch
 perl5.22-unescaped-left-braces.patch
+reproducible-output.patch
+honour-SOURCE_DATE_EPOCH.patch
+suppress-username-from-output.patch
+idx-sort-all.patch
diff -Nru latex2html-2015-debian1/debian/patches/suppress-username-from-output.patch latex2html-2015-debian1/debian/patches/suppress-username-from-output.patch
--- latex2html-2015-debian1/debian/patches/suppress-username-from-output.patch	1970-01-01 01:00:00.000000000 +0100
+++ latex2html-2015-debian1/debian/patches/suppress-username-from-output.patch	2016-06-10 15:50:45.000000000 +0200
@@ -0,0 +1,25 @@
+Description: Strip username from output,
+ to make the output reproducible.
+ See https://reproducible-builds.org/
+Author: Alexis Bienvenüe <p...@passoire.fr>
+
+--- latex2html-2015-debian1.orig/latex2html.pin
++++ latex2html-2015-debian1/latex2html.pin
+@@ -186,7 +186,7 @@ $PARTITION_PREFIX = 'part_' unless $PART
+ 
+ # Author address
+ @address_data = &address_data('ISO');
+-$ADDRESS = "$address_data[0]\n$address_data[1]";
++$ADDRESS = "$address_data[1]";
+ 
+ # ensure non-zero defaults
+ $MAX_SPLIT_DEPTH = 4 unless ($MAX_SPLIT_DEPTH);
+@@ -14088,7 +14088,7 @@ sub default_textohtmlinfopage {
+ 	, "<STRONG>latex2html</STRONG> <TT>$argv</TT>\n"
+ 	, (($SHOW_INIT_FILE && ($INIT_FILE ne ''))?
+ 	   "\n<P>with initialization from: <TT>$INIT_FILE</TT>\n$init_file_mark\n" :'')
+-	, "<P>The translation was initiated by $address_data[0] on $address_data[1]"
++	, "<P>The translation was initiated on $address_data[1]"
+ 	, $open_all, $_)
+       : join('', $close_all, "$INFO\n", $open_all, $_));
+     $_;
