Guido Schnepp created VFS-637:
---------------------------------
Summary: Zip files with legacy encoding and special characters let
VFS crash
Key: VFS-637
URL: https://issues.apache.org/jira/browse/VFS-637
Project: Commons VFS
Issue Type: Bug
Environment: Windows 10 64 Bit, Java 8
Reporter: Guido Schnepp
Oracle reworked java.util.zip.ZipFile in Java 7. Since then, the
single-argument constructor used by commons-vfs2 2.1 is more restrictive than
it was in Java 6. ZipFile now has a constructor with a second parameter
(Charset) that explicitly specifies the legacy charset to use when the ZIP file
does not declare UTF-8 compliance internally. This affects all ZIP files whose
file names are encoded in a legacy charset rather than in UTF-8, as is common
today. An example is a ZIP file whose archived file names contain German
umlauts or Russian characters.
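The behaviour described above can be reproduced with the JDK alone. The
following is a minimal sketch, not code from commons-vfs2: the class name
LegacyZipNames and its helper methods are made up for illustration, and it
assumes the IBM437 charset is available in the running JDK. It writes a ZIP
whose entry name is encoded in code page 437 and then lists the entries with an
explicit charset.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of commons-vfs2.
class LegacyZipNames {

    // Writes a temporary ZIP whose single entry name is encoded with the given charset.
    static File writeLegacyZip(String entryName, Charset cs) throws IOException {
        File zip = File.createTempFile("legacy", ".zip");
        zip.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip), cs)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(1); // one byte of dummy content
            out.closeEntry();
        }
        return zip;
    }

    // Lists entry names, decoding names that are not flagged as UTF-8 with the given charset.
    static List<String> listNames(File zip, Charset cs) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip, cs)) {
            for (ZipEntry e : Collections.list(zf.entries())) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Charset cp437 = Charset.forName("IBM437");
        File zip = writeLegacyZip("M\u00fcller.txt", cp437);
        System.out.println(listNames(zip, cp437));
    }
}
```

Opening the same file with new ZipFile(zip) instead is expected to fail, but
the exact exception and the point at which it is thrown differ between Java
versions, so the sketch does not depend on that.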
To support this new parameter with (more or less) default values, the class
org.apache.commons.vfs2.provider.zip.ZipFileSystem needs to be extended with a
default charset parameter, exposed through a getter and setter or similar (as
you like), so that the setting can be forwarded to the java.util.zip.ZipFile
constructor.
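One possible shape for such a setting is sketched below. This is only an
illustration under assumptions: ZipCharsetSetting is a hypothetical standalone
class, not an existing commons-vfs2 API, and the UTF-8 default is chosen
because that is what the single-argument ZipFile constructor uses.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.zip.ZipFile;

// Hypothetical holder for a configurable ZIP charset; not part of commons-vfs2.
class ZipCharsetSetting {

    // Default mirrors the behaviour of new ZipFile(File), which decodes names as UTF-8.
    private Charset charset = StandardCharsets.UTF_8;

    Charset getCharset() {
        return charset;
    }

    void setCharset(Charset charset) {
        this.charset = Objects.requireNonNull(charset, "charset");
    }

    // Forwards the configured charset to the two-argument ZipFile constructor.
    ZipFile open(File file) throws IOException {
        return new ZipFile(file, charset);
    }
}
```

A caller that knows its archives use a legacy encoding would call
setCharset(Charset.forName("IBM437")) before open(file); everyone else keeps
the current behaviour.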
My quick workaround was to create a new OwnZipFileProvider that refers to an
equally new OwnZipFileSystem (extending ZipFileSystem) with the following
modified method. The change is highlighted:
{{
protected ZipFile createZipFile(final File file) throws FileSystemException {
    try {
        return new ZipFile(file{color:red}*, Charset.forName("IBM437")*{color});
    } catch (final IOException ioe) {
        throw new FileSystemException("vfs.provider.zip/open-zip-file.error", file, ioe);
    }
}
}}
Presetting code page 437 as the legacy default charset seems to be a good
workaround, as stated in Appendix D here:
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT :
"D.1 The ZIP format has historically supported only the original IBM PC
character encoding set, commonly referred to as IBM Code Page 437. This limits
storing file name characters to only those within the original MS-DOS range of
values and does not properly support file names in other character encodings,
or languages. [...]"
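To illustrate why the charset matters for such names, here is a small sketch
that decodes the same raw bytes with code page 437 and with strict UTF-8. The
class name Cp437Decoding and the sample bytes are made up for this example; in
code page 437 the byte 0x81 is the letter ü, while on its own it is not valid
UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of commons-vfs2.
class Cp437Decoding {

    // Decodes raw bytes and throws instead of silently replacing bad input.
    static String decodeStrict(byte[] raw, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    public static void main(String[] args) {
        // "Müller" as stored by a tool that writes code page 437 file names.
        byte[] raw = {'M', (byte) 0x81, 'l', 'l', 'e', 'r'};
        try {
            System.out.println(decodeStrict(raw, Charset.forName("IBM437")));
            System.out.println(decodeStrict(raw, StandardCharsets.UTF_8));
        } catch (CharacterCodingException e) {
            System.out.println("not decodable: " + e);
        }
    }
}
```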
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)