[
https://issues.apache.org/jira/browse/VFS-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057955#comment-16057955
]
Bernd Eckenfels commented on VFS-637:
-------------------------------------
What do you think about StandardCharsets.US_ASCII or ISO_8859_1 (Latin-1) as the
default? Or would you leave it absent (which throws for archives that are not
marked as UTF-8 and contain non-UTF-8 names)?
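To make the trade-off between these defaults concrete, here is a minimal sketch that strictly decodes one CP437-encoded entry name with each candidate charset. It assumes ZipFile decodes names strictly (errors reported rather than replaced) and that the runtime provides the IBM437 charset. The class name `DefaultCharsetComparison` and the entry name are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DefaultCharsetComparison {

    /** Decodes strictly (errors are reported, not replaced); returns null if rejected. */
    static String decodeStrict(byte[] raw, Charset cs) {
        try {
            return cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return null; // corresponds to the case where ZipFile would throw
        }
    }

    public static void main(String[] args) {
        // Entry name as a legacy archiver stores it: the u-umlaut becomes byte 0x81 in code page 437
        byte[] raw = "M\u00fcller.txt".getBytes(Charset.forName("IBM437"));
        Charset[] candidates = {
            StandardCharsets.UTF_8,      // what ZipFile(File) assumes
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            Charset.forName("IBM437")
        };
        for (Charset cs : candidates) {
            System.out.println(cs.name() + " -> " + decodeStrict(raw, cs));
        }
    }
}
```

Under these assumptions, UTF-8 and US-ASCII both reject the 0x81 byte, ISO-8859-1 accepts it but yields the control character U+0081 instead of the umlaut, and IBM437 recovers the original name.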
> Zip files with legacy encoding and special characters let VFS crash
> -------------------------------------------------------------------
>
> Key: VFS-637
> URL: https://issues.apache.org/jira/browse/VFS-637
> Project: Commons VFS
> Issue Type: Bug
> Environment: Windows 10 64 Bit, Java 8
> Reporter: Guido Schnepp
> Labels: easyfix
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Oracle reworked the ZipFile class in Java 7. Since then, the single-argument
> constructor ZipFile(File) used by commons-vfs2 2.1 is more restrictive than
> it was in Java 6. ZipFile now also has a constructor with a second parameter
> (Charset) that explicitly specifies the legacy charset to use when the archive
> does not flag its entries as UTF-8 encoded. This affects all ZIP files that
> encode filenames in a legacy charset instead of the UTF-8 encoding common
> today, for example a ZIP file whose archived filenames contain German umlauts
> or Russian characters.
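A minimal, JDK-only reproduction of the behavior described above: it writes a one-entry archive whose name is encoded in CP437 (ZipOutputStream is assumed to set the UTF-8 flag only when its own charset is UTF-8), then opens it once with ZipFile(File) and once with ZipFile(File, Charset). The class name, temporary file and entry name are illustrative, the runtime is assumed to provide IBM437, and the exact exception for the default constructor is hedged because it may differ between JDK versions.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class LegacyZipDemo {

    static final Charset CP437 = Charset.forName("IBM437");

    /** Writes a one-entry archive whose entry name is encoded in CP437. */
    static File writeLegacyZip(String entryName) throws IOException {
        File file = File.createTempFile("legacy", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(file), CP437)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write("hello".getBytes(StandardCharsets.US_ASCII));
            out.closeEntry();
        }
        return file;
    }

    /** Returns the first entry name; charset == null means plain new ZipFile(file). */
    static String firstEntryName(File file, Charset charset) throws IOException {
        try (ZipFile zip = (charset == null) ? new ZipFile(file) : new ZipFile(file, charset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            return entries.nextElement().getName();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = writeLegacyZip("M\u00fcller.txt");
        try {
            System.out.println("default: " + firstEntryName(file, null));
        } catch (IllegalArgumentException | IOException e) {
            // Java 8 throws IllegalArgumentException("MALFORMED") while reading the name;
            // newer JDKs may instead reject the archive with a ZipException when opening it.
            System.out.println("default: failed with " + e);
        }
        System.out.println("IBM437:  " + firstEntryName(file, CP437));
    }
}
```

On the Java 8 environment from this report, the first call is expected to fail while the second prints the correctly decoded name.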
> To support this new parameter with (more or less) sensible defaults, the class
> org.apache.commons.vfs2.provider.zip.ZipFileSystem has to be extended with a
> default charset parameter (via a getter/setter or however you prefer) that is
> forwarded to the java.util.zip.ZipFile constructor.
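The proposed change could look roughly like the following sketch. `CharsetAwareZipOpener` and its `getCharset`/`setCharset` methods are hypothetical names, not Commons VFS API, and the class deliberately avoids VFS types so that it compiles on its own. Leaving the charset unset (null) falls back to `new ZipFile(file)`, i.e. the current behavior.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipFile;

/**
 * Hypothetical stand-in for the proposed ZipFileSystem change (not Commons VFS API).
 * It only shows how a configurable legacy charset would be forwarded to ZipFile.
 */
class CharsetAwareZipOpener {

    // null means "not configured": fall back to new ZipFile(file)
    private Charset charset;

    Charset getCharset() {
        return charset;
    }

    void setCharset(Charset charset) {
        this.charset = charset;
    }

    /** Same shape as ZipFileSystem.createZipFile(File), but throwing plain IOException. */
    protected ZipFile createZipFile(File file) throws IOException {
        return (charset == null) ? new ZipFile(file) : new ZipFile(file, charset);
    }
}
```

Whether the value is set through a plain setter like this or passed in via FileSystemOptions is a design choice this sketch leaves open.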
> As a quick workaround, I created a new OwnZipFileProvider that refers to an
> equally new OwnZipFileSystem (extending ZipFileSystem) with the following
> modified method. The change is marked with a comment:
> {code:java}
> protected ZipFile createZipFile(final File file) throws FileSystemException {
>     try {
>         // Changed: pass the legacy charset explicitly (was: new ZipFile(file))
>         return new ZipFile(file, Charset.forName("IBM437"));
>     } catch (final IOException ioe) {
>         throw new FileSystemException("vfs.provider.zip/open-zip-file.error", file, ioe);
>     }
> }
> {code}
> Presetting IBM code page 437 as the legacy default charset seems to be a good
> workaround, as stated in appendix D of the ZIP specification (APPNOTE.TXT):
> https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
> "D.1 The ZIP format has historically supported only the original IBM PC
> character encoding set, commonly referred to as IBM Code Page 437. This
> limits storing file name characters to only those within the original MS-DOS
> range of values and does not properly support file names in other character
> encodings, or languages. [...]"
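The "UTF-8 compliance" that ZipFile looks for is the language encoding flag (EFS), general purpose bit 11, which the same appendix of APPNOTE.TXT goes on to define as marking an entry's name and comment as UTF-8. The sketch below checks that bit in the first local file header of two in-memory archives written by ZipOutputStream, one with UTF-8 and one with IBM437. The class and helper names are illustrative, it assumes the runtime provides IBM437, and it only inspects the first entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class EfsFlagCheck {

    static final int LOCAL_HEADER_SIG = 0x04034b50; // "PK\3\4"
    static final int EFS_BIT = 1 << 11;             // general purpose bit 11: UTF-8 names

    /** Reads a little-endian unsigned 16-bit value. */
    static int u16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    /** Reads a little-endian 32-bit value. */
    static int u32(byte[] b, int off) {
        return u16(b, off) | (u16(b, off + 2) << 16);
    }

    /** True if the first local file header has the language encoding flag set. */
    static boolean firstEntryIsUtf8(byte[] zip) {
        if (u32(zip, 0) != LOCAL_HEADER_SIG) {
            throw new IllegalArgumentException("not a local file header");
        }
        int flags = u16(zip, 6); // general purpose bit flag follows signature and version
        return (flags & EFS_BIT) != 0;
    }

    /** Builds a one-entry archive in memory, encoding the name with the given charset. */
    static byte[] zipWith(Charset cs, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, cs)) {
            out.putNextEntry(new ZipEntry(name));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String name = "M\u00fcller.txt";
        System.out.println("UTF-8 flagged:  " + firstEntryIsUtf8(zipWith(StandardCharsets.UTF_8, name)));
        System.out.println("IBM437 flagged: " + firstEntryIsUtf8(zipWith(Charset.forName("IBM437"), name)));
    }
}
```

An archive without this bit is exactly the case in which ZipFile falls back to the charset passed to its constructor (UTF-8 when none is given).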
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)