Re: WAR FileSystem for fast nested JAR access?

Konstantin Kolinko Wed, 04 Mar 2015 03:51:59 -0800

2015-03-04 8:20 GMT+03:00 Jeremy Boynes <jboy...@apache.org>:
> In https://bz.apache.org/bugzilla/show_bug.cgi?id=57251, Mark Thomas wrote:
>
>> The fix for bug 57472 might shave a few seconds off the deployment time but
>> it doesn't appear to make a significant difference.
>>
>> The fundamental problem when running from a packed WAR is that to access any
>> resource in a JAR, Tomcat has to do the following:
>> - open the WAR
>> - get the entry for the JAR
>> - get the InputStream for the JAR entry
>> - Create a JarInputStream
>> - Read the JarInputStream until it finds the entry it wants
>>
>> This is always going to be slow.
>>
>> The reason that it is fast in Tomcat 7 and earlier took some digging. If
>> unpackWARs is false in Tomcat 7, it unpacks the JARs anyway into the work
>> directory and uses them from there. Performance is therefore comparable with
>> unpackWARs="true".
>
> Has anyone looked into using a NIO2 FileSystem for this? It may offer a way 
> to avoid having to stream the entry in order to be able to locate a resource. 
> ZipFile is fast, I believe, because it has random access to the archive and 
> can seek directly to an entry's location based on the zip index; the jar: 
> FileSystem seems to be able to do the same.
>
> However, neither can cope with nested entries: ZipFile because its 
> constructor takes a File rather than a Path and uses native code, and ZipFS 
> because it relies on URIs and can't cope with a jar: URI based on another 
> jar: URI (ye olde problem with jar: URL syntax).
>
> What a FileSystem can do differently is return a FileChannel which supports 
> seek operations over the archive's content. IOW, if ZipFS can work given a 
> random access channel to bytes on disk, the same approach could be adopted 
> with a random access channel to bytes on a virtual FileSystem.
>
> I imagine that would get pretty hairy for write operations but fortunately we 
> would not need to deal with that.
>
> If no-one’s looked at it yet I'll take a shot.
> Cheers
> Jeremy
>
> FWIW, this could also be exposed to web applications e.g.
>   FileSystem webappFS = servletContext.getFileSystem();
>   Path resource = webappFS.getPath(request.getPathInfo());
>   Files.copy(resource, response.getOutputStream());
>


The fundamental issue is how the data of the JAR file (as a whole) is
made available via the API.

To be able to use random access with the JAR you technically have to:

1) Jump to the end of the JAR file and read the ZIP index ("Central
directory") that is located there. See the image at:
http://en.wikipedia.org/wiki/Zip_%28file_format%29

2) Jump to the specific file.
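
To illustrate step 1, here is a minimal sketch (my own illustration, not
Tomcat code) of locating the index by seeking to the end of a file on disk.
The class and method names are hypothetical, and ZIP64 archives are not
handled. The point is that it needs a seekable channel, which a compressed
entry inside a WAR does not give you.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: finds the ZIP central directory by seeking to the end of the file. */
public final class ZipIndexLocator {
    private static final int EOCD_SIGNATURE = 0x06054b50; // "End of central directory" record
    private static final int EOCD_MIN_SIZE = 22;          // record size without a comment
    private static final int MAX_COMMENT = 0xFFFF;        // longest possible archive comment

    private ZipIndexLocator() {
    }

    /** Returns {entryCount, centralDirectoryOffset, centralDirectorySize}. ZIP64 is not handled. */
    public static long[] locateCentralDirectory(Path zip) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(zip)) {
            long size = ch.size();
            if (size < EOCD_MIN_SIZE) {
                throw new IOException("Too small to be a ZIP file");
            }
            // Only the tail of the file is read, never the entry data before it.
            int tailLen = (int) Math.min(size, (long) EOCD_MIN_SIZE + MAX_COMMENT);
            ByteBuffer tail = ByteBuffer.allocate(tailLen).order(ByteOrder.LITTLE_ENDIAN);
            ch.position(size - tailLen);
            while (tail.hasRemaining()) {
                if (ch.read(tail) < 0) {
                    throw new IOException("Unexpected end of file");
                }
            }
            // Search backwards for the record signature (a comment may follow the record).
            for (int i = tailLen - EOCD_MIN_SIZE; i >= 0; i--) {
                if (tail.getInt(i) == EOCD_SIGNATURE) {
                    long entries = tail.getShort(i + 10) & 0xFFFF;
                    long cdSize = tail.getInt(i + 12) & 0xFFFFFFFFL;
                    long cdOffset = tail.getInt(i + 16) & 0xFFFFFFFFL;
                    return new long[] {entries, cdOffset, cdSize};
                }
            }
            throw new IOException("End of central directory record not found");
        }
    }
}
```

Step 2 would then read the central directory starting at the returned
offset and seek to the local header of the wanted entry.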

As the JAR itself is stored compressed inside the WAR, there is no real
API to jump to a position in it, besides maybe InputStream.skip(). That
skip() has to decompress everything it skips, so it involves the same
overhead as the current implementation that scans the JAR, unless the
WAR is stored with zero compression.
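
For reference, the scanning approach I am comparing against looks roughly
like the sketch below. It follows the steps Mark listed in the quoted
comment; it is my own illustration, not the actual Tomcat code, and the
class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: reads a resource from a JAR nested in a WAR by streaming. */
public final class NestedJarScan {

    private NestedJarScan() {
    }

    /** Returns the bytes of resourceName inside the nested JAR, or null if not found. */
    public static byte[] read(String warPath, String jarPathInWar, String resourceName)
            throws IOException {
        try (ZipFile war = new ZipFile(warPath)) {              // open the WAR
            ZipEntry jar = war.getEntry(jarPathInWar);          // get the entry for the JAR
            if (jar == null) {
                return null;
            }
            try (InputStream raw = war.getInputStream(jar);     // stream for the JAR entry
                 JarInputStream jis = new JarInputStream(raw)) {
                JarEntry entry;
                // Linear scan: every entry before the wanted one is inflated and discarded.
                while ((entry = jis.getNextJarEntry()) != null) {
                    if (entry.getName().equals(resourceName)) {
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        byte[] buf = new byte[8192];
                        int n;
                        while ((n = jis.read(buf)) != -1) {
                            out.write(buf, 0, n);
                        }
                        return out.toByteArray();
                    }
                }
            }
        }
        return null; // the whole JAR was scanned without a match
    }
}
```

Only the outer lookup (war.getEntry) benefits from the zip index here. The
inner lookup costs time proportional to the position of the entry in the
JAR, and a miss costs a scan of the whole JAR.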


Also:
1. Reading the zip index takes time, so it is better cached. That
is the issue behind
https://bz.apache.org/bugzilla/show_bug.cgi?id=52448

2. It makes sense to cache the list of directories (packages) in the
zip file. Scanning the whole jar for a class that is not present there
is the worst case.  A bonus is that it can improve handling of JARs
that do not have explicit entries for directories.
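
A rough sketch of what I mean in point 2 (the class name is hypothetical
and this is only one possible shape for such a cache): collect the
directory of every entry once, then use that set to reject lookups for
packages the JAR does not contain.

```java
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical cache of the directories (packages) present in one JAR. */
public final class PackageIndex {
    private final Set<String> directories = new HashSet<>();

    public PackageIndex(ZipFile jar) {
        Enumeration<? extends ZipEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            String name = entries.nextElement().getName();
            // Derive directories from entry names, so JARs without explicit
            // directory entries are handled the same way as JARs that have them.
            int slash = name.lastIndexOf('/');
            while (slash > 0) {
                if (!directories.add(name.substring(0, slash + 1))) {
                    break; // already known, and so are all of its parents
                }
                slash = name.lastIndexOf('/', slash - 1);
            }
        }
    }

    /** False means the JAR cannot contain the resource, so no scan is needed. */
    public boolean mayContain(String resourceName) {
        int slash = resourceName.lastIndexOf('/');
        return slash < 0 || directories.contains(resourceName.substring(0, slash + 1));
    }
}
```

A true result is only a hint that the package exists; the actual lookup
still has to happen. A false result lets the class loader skip the JAR
without scanning it.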

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org
