Package: file
Version: 4.26-2
Severity: normal

UTF-32BE files that begin with a Byte Order Mark are not detected,
because the Unicode magic does not match a correctly encoded 32-bit
big-endian Byte Order Mark. The current pattern matches FE FF 00 00,
but it should match 00 00 FE FF. The attached diff adds a new patch
that fixes magic/Magdir/unicode.
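
As a quick sanity check of the byte order described above, here is a
minimal Python sketch. It is not part of the attached patch, and the
names OLD_MAGIC, NEW_MAGIC, utf32be_bom and octal_escape are
hypothetical helpers made up for this illustration. It encodes U+FEFF
as UTF-32BE and prints the result in the octal-escape style used by
the magic file.

```python
import codecs

# Byte sequences quoted in this report (names are illustrative only).
OLD_MAGIC = bytes([0xFE, 0xFF, 0x00, 0x00])  # pattern currently in magic/Magdir/unicode
NEW_MAGIC = bytes([0x00, 0x00, 0xFE, 0xFF])  # pattern proposed by the patch


def utf32be_bom() -> bytes:
    """Encode U+FEFF (the Byte Order Mark) as big-endian UTF-32."""
    return "\ufeff".encode("utf-32-be")


def octal_escape(data: bytes) -> str:
    """Render bytes as backslash-octal escapes, as written in magic files."""
    return "".join("\\%03o" % b for b in data)


if __name__ == "__main__":
    bom = utf32be_bom()
    print("UTF-32BE BOM:", bom.hex(" "))
    print("magic string:", octal_escape(bom))
    print("matches old pattern:", bom == OLD_MAGIC)
    print("matches new pattern:", bom == NEW_MAGIC)
    print("agrees with codecs.BOM_UTF32_BE:", bom == codecs.BOM_UTF32_BE)
```

Under these assumptions, the encoded BOM should agree with the
proposed pattern and not with the current one.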

This was initially reported on Ubuntu's bug tracker, and has since
been confirmed against current Debian git.

https://bugs.launchpad.net/ubuntu/+source/file/+bug/285309
From 2856f3bb0451b9cf648374199b68a9d94b1201d9 Mon Sep 17 00:00:00 2001
From: Adam Buchbinder <adam.buchbin...@gmail.com>
Date: Thu, 29 Jan 2009 16:11:37 -0500
Subject: [PATCH] Fix UTF-32BE BOM magic.

The UTF-32 BOM is wrong for big-endian files. According to the
Unicode FAQ [http://unicode.org/faq/utf_bom.html#bom4], it should
be 00 00 FE FF.
---
 debian/patches/00list                       |    1 +
 debian/patches/101-magic-fix-utf32be.dpatch |   22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)
 create mode 100644 debian/patches/101-magic-fix-utf32be.dpatch

diff --git a/debian/patches/00list b/debian/patches/00list
index 82c74bf..61b1234 100644
--- a/debian/patches/00list
+++ b/debian/patches/00list
@@ -1,3 +1,4 @@
+101-magic-fix-utf32be.dpatch
 202-magic-update-awk.dpatch
 203-magic-update-reiserfs.dpatch
 204-magic-update-asf.dpatch
diff --git a/debian/patches/101-magic-fix-utf32be.dpatch b/debian/patches/101-magic-fix-utf32be.dpatch
new file mode 100644
index 0000000..a7c7667
--- /dev/null
+++ b/debian/patches/101-magic-fix-utf32be.dpatch
@@ -0,0 +1,22 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+## 101-magic-fix-utf32be.dpatch by Adam Buchbinder <adam.buchbin...@gmail.com>
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: UTF-32BE text is detected by the presence of the Byte Order Mark, in
+## DP: UTF-32BE encoding. The stock version of the BOM is incorrect; it should
+## DP: read 00 00 FE FF, according to the Unicode FAQ.[1] (LP: #285309)
+## DP: 
+## DP: [1] http://unicode.org/faq/utf_bom.html#bom4
+
+@DPATCH@
+diff -urNad file-4.24~/magic/Magdir/unicode file-4.24/magic/Magdir/unicode
+--- file-4.24~/magic/Magdir/unicode	2008-02-28 13:57:35.000000000 -0500
++++ file-4.24/magic/Magdir/unicode	2009-01-29 15:31:01.000000000 -0500
+@@ -9,6 +9,6 @@
+ 0	string	+/v+			Unicode text, UTF-7
+ 0	string	+/v/			Unicode text, UTF-7
+ 0	string	\335\163\146\163	Unicode text, UTF-8-EBCDIC
+-0	string	\376\377\000\000	Unicode text, UTF-32, big-endian
++0	string	\000\000\376\377	Unicode text, UTF-32, big-endian
+ 0	string	\377\376\000\000	Unicode text, UTF-32, little-endian
+ 0	string	\016\376\377		Unicode text, SCSU (Standard Compression Scheme for Unicode)
-- 
1.5.6.3
