martin f krafft wrote:
> I have a UTF-8 system, but occasionally, some filenames will still
> be non-UTF-8, e.g.
> 
>   touch ü $(echo ü | iconv -t latin1)
> 
> If I use vidir on that directory, the tmpfile will be encoded with
> latin1, which breaks the UTF-8 symbols.

Hmm, when I try it, after configuring vim not to force utf-8
(I had fileencodings=utf-8 in my .vimrc), vim displays the latin1 file
properly and mis-displays the utf-8 one as two bytes. But when I save,
both filenames are encoded as before and come out uncorrupted.

If only the utf-8 file exists, then vim displays it correctly in vidir;
I assume it's autodetecting utf-8 in that case.

With fileencodings=utf-8, vim warns about an illegal byte in line 2
(the latin1 file), displays it as '?', and makes the temp file read-only.
You have to force it to write to cause the broken encoding to be written
out.

> I think it should instead
> force the editor to use a UTF-8 encoding (e.g. by running the file
> through `iconv -t utf-8` before spawning the editor

I don't think that will behave better than it does currently. One encoding
or the other will be screwed up. If I apply the change below, the utf-8
encoded filename will come through ok, but the other will be forced to
utf-8 and be mangled (to '�') in the process.

--- a/vidir
+++ b/vidir
@@ -79,6 +79,7 @@ Licensed under the GNU GPL.
 use File::Spec;
 use File::Temp;
 use Getopt::Long;
+use Encode;
 
 my $error=0;
 
@@ -120,6 +121,7 @@ my $c=0;
 foreach (@dir) {
        next if /^(.*\/)?\.$/ || /^(.*\/)?\.\.$/;
        $item{++$c}=$_;
+       $_=encode_utf8(decode_utf8($_));
        print OUT "$c\t$_\n";
 }
 @dir=();
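
To see that mangling in isolation, here is a minimal standalone sketch of
the same round trip. It is not part of vidir; the helper names
roundtrip_lossy and hexbytes and the two sample byte strings are made up
for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Same round trip as the patch above: no CHECK argument, so malformed
# input is substituted instead of rejected.
sub roundtrip_lossy {
    my ($bytes) = @_;
    return encode_utf8(decode_utf8($bytes));
}

# Render a byte string as space-separated hex for display.
sub hexbytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Hypothetical sample names: "ü" as UTF-8 bytes and as a single latin1 byte.
for my $name ("\xC3\xBC", "\xFC") {
    printf "%s -> %s\n", hexbytes($name), hexbytes(roundtrip_lossy($name));
}
```

The UTF-8 name should come back byte for byte, while the latin1 byte
should come back as the UTF-8 encoding of U+FFFD, so the original name
can no longer be recovered from the temp file.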

If I instead apply this patch, vidir will die with an error
("utf8 "\xFC" does not map to Unicode"), which seems better than
corrupting data, but does not seem better than the current situation.

diff --git a/vidir b/vidir
index a77739f..fa49cc3 100755
--- a/vidir
+++ b/vidir
@@ -79,6 +79,7 @@ Licensed under the GNU GPL.
 use File::Spec;
 use File::Temp;
 use Getopt::Long;
+use Encode;
 
 my $error=0;
 
@@ -120,6 +121,7 @@ my $c=0;
 foreach (@dir) {
        next if /^(.*\/)?\.$/ || /^(.*\/)?\.\.$/;
        $item{++$c}=$_;
+       $_=encode_utf8(decode_utf8($_, 1));
        print OUT "$c\t$_\n";
 }
 @dir=();
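
The strict variant can be sketched standalone the same way. Again this is
only an illustration, not vidir code: roundtrip_strict is a made-up helper,
and the eval is there just so the sketch can report the error instead of
exiting.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Strict round trip, as in the second patch: CHECK=1 (Encode::FB_CROAK)
# makes decode_utf8 die on malformed input. With CHECK set, Encode may
# modify its source argument, so the sub works on its own copy.
sub roundtrip_strict {
    my ($bytes) = @_;
    return encode_utf8(decode_utf8($bytes, 1));
}

# Hypothetical sample names: "ü" as UTF-8 bytes and as a single latin1 byte.
for my $name ("\xC3\xBC", "\xFC") {
    my $out = eval { roundtrip_strict($name) };
    if (defined $out) {
        print "ok: round trip succeeded\n";
    } else {
        print "died: $@";
    }
}
```

Without the eval, the first non-UTF-8 name aborts the whole run, which is
the behavior described above.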

-- 
see shy jo
