On Mon, May 21, 2007 at 11:50:32AM -0400, [EMAIL PROTECTED] wrote:
> 
> I'm getting the same error.  Bittornado has had the same problem for a while 
> now.
> 
> 
> 
> The cause is in how Unicode is handled in the .torrent file.  When a torrent 
> has files in it with Unicode characters, two versions of the file name are 
> stored.  One in UTF-8 format, the other stripped to ASCII.  Some broken 
> clients **cough** BitComet **cough** put garbage characters in the ASCII 
> string when they generate a torrent with Unicode characters.  When cfv hits 
> these characters, it barfs.
> 

Actually there are two different conventions for how non-ASCII text
should be handled in a torrent file.  One is to include an 'encoding'
field in the torrent metainfo which specifies what encoding is used for
all the strings.  Cfv supports this.
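
To illustrate the first method, here is a minimal Python sketch.  It is
not cfv's actual code: it assumes the metainfo has already been
bdecoded into a dict with bytes keys, and the function names
(declared_encoding, file_paths) are made up for this example.

```python
def declared_encoding(meta, default="ascii"):
    """Return the charset named by the top-level 'encoding' key, if any."""
    raw = meta.get(b"encoding")
    if raw is None:
        return default  # nothing declared: fall back to the safe value
    return raw.decode("ascii", "replace")


def file_paths(meta):
    """Decode every file path using the encoding the torrent declares."""
    enc = declared_encoding(meta)
    info = meta[b"info"]
    if b"files" not in info:
        # single-file torrent: the name is the only path
        return [info[b"name"].decode(enc)]
    return ["/".join(part.decode(enc) for part in f[b"path"])
            for f in info[b"files"]]


# Hypothetical metainfo declaring latin-1 for all of its strings.
meta = {
    b"encoding": b"latin-1",
    b"info": {
        b"name": b"dir",
        b"files": [{b"length": 1, b"path": [b"caf\xe9.txt"]}],
    },
}
paths = file_paths(meta)
```

With no 'encoding' key the same path would fail to decode as ASCII,
which is the kind of error being reported here.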

The other is as you mention, where all the filenames contain two
versions, one that's useless (in unknown encoding, or maybe just
garbage), and the .utf-8 version.  Cfv doesn't currently support this,
so it just tries to read the normal filename and, since it doesn't know
the charset, it defaults to the safe value, ASCII.  I'll add handling
for this type of torrent to my todo list.
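
Handling this second scheme could look roughly like the sketch below.
It is only an assumption about how such support might be written, not
something cfv does today: it assumes the extra key is named
'path.utf-8' next to the normal 'path' in each file entry, and
utf8_path is a hypothetical helper name.

```python
def utf8_path(file_entry):
    """Return the path from the 'path.utf-8' key, or None if it is absent."""
    parts = file_entry.get(b"path.utf-8")
    if parts is None:
        return None  # caller has to fall back to the plain 'path' key
    return "/".join(p.decode("utf-8") for p in parts)


# Hypothetical file entry: the plain name is garbage, the UTF-8 one is good.
entry = {
    b"length": 1,
    b"path": [b"caf?.txt"],
    b"path.utf-8": [b"caf\xc3\xa9.txt"],
}
name = utf8_path(entry)
```

The point is simply to look at the .utf-8 key first and never try to
decode the useless version when the good one is present.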

(Actually there is a third method, which lamer/older clients use: just
include the raw strings in whatever random encoding and not specify
anything.  Not much can be done about that, though, other than letting
the user specify an encoding to use.  The cfv 2.x devel code does allow
that.)
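
As a rough sketch of that last fallback (not the actual cfv 2.x code or
its option name; decode_name and user_encoding are hypothetical names):
try ASCII first, and only use the user's encoding when that fails.

```python
def decode_name(raw, user_encoding=None):
    """Decode a raw filename, using a user-supplied charset as a last resort."""
    try:
        return raw.decode("ascii")  # the safe default
    except UnicodeDecodeError:
        if user_encoding is None:
            raise  # nothing declared and nothing supplied: give up
        # undecodable bytes become U+FFFD rather than aborting the run
        return raw.decode(user_encoding, errors="replace")


plain = decode_name(b"readme.txt")
forced = decode_name(b"caf\xe9.txt", user_encoding="latin-1")
```

If the user guesses the wrong encoding the names will come out wrong,
but that is the best that can be done with this kind of torrent.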

-- 
Matthew Mueller
[EMAIL PROTECTED]


