On 08.04.2013, Junio C Hamano wrote:
> [email protected] (Jürgen Kreileder) writes:
>
>> Fixes the encoding for several _plain actions and for text/* and */*+xml
>> blobs.
>>
>> Signed-off-by: Jürgen Kreileder <[email protected]>
I see that this patch does (or tries to do) two _independent_ things,
and should be split into at least two separate commits:
>> ---
>
> Thanks, will queue but not hold until I hear something from Jakub.
>
>> gitweb/gitweb.perl | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 1309196..9cfe5b5 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -3823,7 +3823,7 @@ sub blob_contenttype {
>> my ($fd, $file_name, $type) = @_;
>>
>> $type ||= blob_mimetype($fd, $file_name);
>> - if ($type eq 'text/plain' && defined $default_text_plain_charset) {
>> + if (($type =~ m!^text/\w[-\w]*$! || $type =~
>> m!^\w[-\w]*/\w[-\w]*\+xml$!) && defined $default_text_plain_charset) {
>> $type .= "; charset=$default_text_plain_charset";
>> }
First, it extends adding "; charset=$default_text_plain_charset" to
other mimetypes for 'blob_plain' view to all 'text/*' and '*/*+xml'
mimetypes (without changing name of variable... though this is more
complicated as it is configuration variable and we would want to
preserve backward compatibility, but at least a comment would be,
I think, needed).
Originally it applied only to 'text/plain' files, which can be
displayed inline by web browser, and which need charset in
'Content-Type:' HTTP header to be displayed correctly, as they
do not include such information inside the file.
'text/html' and 'application/xhtml+xml' can include such information
inside them (meta http-equiv for 'text/html' and <?xml ...> for
'application/xhtml+xml'). I don't know what browser does when there
is conflicting information about charset, i.e. which one wins, but
it is certainly something to consider.
It might be a good change; I don't know what web browser do when
serving 'text/css', 'text/javascript', 'text/xml' to client to
view without media type known.
BTW I have noticed that we do $prevent_xss dance outside
blob_contenttype(), in it's only caller i.e. git_blob_plain()...
which for example means that 'text/html' converted to 'text/plain'
don't get '; charset=...' added. I guess that it *might* be
what prompted this part of change... but if it is so, it needs
to be fixed at source, e.g. by moving $prevent_xss to
blob_contenttype() subroutine.
About implementation:
>> + if (($type =~ m!^text/\w[-\w]*$! || $type =~
>> m!^\w[-\w]*/\w[-\w]*\+xml$!) && defined $default_text_plain_charset) {
>> $type .= "; charset=$default_text_plain_charset";
would be better with line split
>> + if (($type =~ m!^text/\w[-\w]*$! || $type =~
>> m!^\w[-\w]*/\w[-\w]*\+xml$!) &&
>> + defined $default_text_plain_charset) {
>> $type .= "; charset=$default_text_plain_charset";
Also: do we need to be strict with '\w[-\w]*', or would '.*' suffice?
>> @@ -7637,7 +7637,9 @@ sub git_blobdiff {
>> last if $line =~ m!^\+\+\+!;
>> }
>> local $/ = undef;
>> + binmode STDOUT, ':raw';
>> print <$fd>;
>> + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>> close $fd;
>> }
>> }
>> @@ -7884,12 +7886,16 @@ sub git_commitdiff {
>>
>> } elsif ($format eq 'plain') {
>> local $/ = undef;
>> + binmode STDOUT, ':raw';
>> print <$fd>;
>> + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>> close $fd
>> or print "Reading git-diff-tree failed\n";
>> } elsif ($format eq 'patch') {
>> local $/ = undef;
>> + binmode STDOUT, ':raw';
>> print <$fd>;
>> + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>> close $fd
>> or print "Reading git-format-patch failed\n";
>> }
Second it changes 'blobdiff_plain', 'commitdiff_plain' (which I think
that should be abandoned in favor of 'patch' view; but that is
a separate issue) and 'patch' views so they use binary-safe output.
Note that in all cases (I think) we use
$cgi->header(
-type => 'text/plain',
-charset => 'utf-8',
...
);
promising web browser that output is as whole in 'utf-8' encoding.
It is not explained in the commit message what is the reason for this
change. Is it better handing of a situation where files being diff-ed
being in other encoding (like for example in commit that changes
encoding of files from legacy encoding such like e.g. iso-8859-2
or cp1250 to utf-8)? But what about -charset => 'utf-8' then?
About implementation: I think after this change common code crosses
threshold for refactoring it into subroutine, for example:
sub dump_fh_raw {
my $fh = shift;
local $/ = undef;
binmode STDOUT, ':raw';
print <$fh>;
binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
return $fh;
}
--
Jakub Narębski
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html