Bug#425040: id3v2: unicode support is broken and results in id3 tag data loss

Marius Mikučionis Tue, 18 Mar 2008 16:24:36 -0700

2008/3/17, Ben Hutchings <[EMAIL PROTECTED]>:
> The conversion process does *not* remove ID3v1 tags, so you may be able
>  to recover by deleting the ID3v2 tags (id3v2 -d).


No, somehow this does not recover the information. See the test below.

>  What encoding was used in the ID3v1 tags?  ID3v1 does not have any flag
>  to indicate encoding and is normally assumed to use ISO 8859-1.  Text
>  with this encoding seems to be converted correctly.

I am not 100% sure, but I think I have a mixture of UTF-8 and ISO 8859-13 files.
Amarok, easytag and others detect and display the text correctly without any
assistance.
If they use the LANG environment variable to guess the encoding, then it must
be UTF-8, perhaps with some kind of smart fallback to ISO 8859-13 when reading
(if ID3v1 *really* lacks the encoding info).
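
The kind of fallback guessed at above could look roughly like the sketch
below. It is only an illustration of the idea, not what Amarok or easytag
actually do: the function name decode_tag_text and the choice of ISO 8859-13
as the fallback are assumptions made for this example.

```python
def decode_tag_text(raw: bytes, fallback: str = "iso8859-13") -> str:
    """Decode an ID3v1 text field whose encoding is unknown.

    Hypothetical helper: try strict UTF-8 first and fall back to a
    legacy single-byte encoding only if the bytes are not valid UTF-8.
    """
    # ID3v1 fields are fixed width and padded with NUL bytes or spaces.
    data = raw.rstrip(b"\x00 ")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


if __name__ == "__main__":
    name = "Mikučionis"
    print(decode_tag_text(name.encode("utf-8")))
    print(decode_tag_text(name.encode("iso8859-13")))
```

Trying UTF-8 first works because text in a legacy 8-bit encoding that
contains non-ASCII characters is rarely valid UTF-8, so the strict decode
fails and the fallback is used.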

I did the following test:
1) recorded a blank mp3
2) added/edited the tag with amarok (amarok and easytag display it
correctly; id3v2 shows that only an ID3v1 tag is present, and the UTF-8
characters are broken because they are interpreted as ISO 8859-1)
3) did the conversion with "id3v2 -C" (id3v2 shows that both ID3v1 and
ID3v2 tags are present; all tools show broken UTF-8 characters)
4) stripped the ID3v2 tag with "id3v2 -d" (id3v2 shows that only an ID3v1
tag is present; all tools still show broken UTF-8)
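
The symptom in these steps, UTF-8 text shown as if it were ISO 8859-1, can be
reproduced in a few lines. This is only a sketch of the misinterpretation
itself; it says nothing about what "id3v2 -C" actually writes to the file,
and the function names are made up for the example.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Show how UTF-8 encoded text looks when decoded as ISO 8859-1."""
    return text.encode("utf-8").decode("iso8859-1")


def undo_misread(broken: str) -> str:
    """Reverse the misreading, assuming the underlying bytes are unchanged."""
    return broken.encode("iso8859-1").decode("utf-8")


if __name__ == "__main__":
    original = "Mikučionis"
    broken = misread_utf8_as_latin1(original)
    # repr() makes the invisible control character in the result visible.
    print(repr(broken))
    print(undo_misread(broken) == original)
```

The reversal only works while the original bytes are still on disk. Whether
that is the case after the conversion is exactly what the test above puts in
doubt.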

So my conclusion is that the other tools somehow know what the correct
encoding is and interpret it correctly, but id3v2 overwrites this
information, effectively killing the method used by the other tools.
Interestingly, easytag offers to save some(?) tag information on
broken-by-id3v2 files, although I did not change anything. My blind
guess is that it found the encoding information missing and
wants to write something generic there, although the results do not
improve (for obvious reasons).

I've put the files from the test here (perhaps you can dig into them with a hex dump):
http://www.cs.aau.dk/~marius/id3v2
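
For comparing the files at the byte level, the ID3v1 tag is easy to pull
out by hand: it is the last 128 bytes of the file and starts with "TAG". The
sketch below prints the raw bytes of each field without decoding them, so
the actual encoding can be judged by eye. The function name and the file
name in the usage line are placeholders.

```python
import os
import sys


def read_id3v1_raw(path):
    """Return the raw ID3v1 fields of a file as bytes, or None if absent."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 128:
            return None
        # The ID3v1 tag occupies the final 128 bytes of the file.
        f.seek(-128, os.SEEK_END)
        block = f.read(128)
    if block[:3] != b"TAG":
        return None
    return {
        "title": block[3:33],
        "artist": block[33:63],
        "album": block[63:93],
        "year": block[93:97],
        "comment": block[97:127],
        "genre": block[127:128],
    }


if __name__ == "__main__":
    # Usage: python read_id3v1.py some-file.mp3
    fields = read_id3v1_raw(sys.argv[1])
    if fields is None:
        print("no ID3v1 tag found")
    else:
        for name, raw in fields.items():
            print(f"{name:8} {raw.hex(' ')}")
```

Running it on the file from each step shows whether the title bytes
themselves change between steps or only their interpretation does.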


-- 
Marius Mikučionis
