On Mar 4, 2015, at 3:49 AM, Konstantin Kolinko <knst.koli...@gmail.com> wrote:
>
> 2015-03-04 8:20 GMT+03:00 Jeremy Boynes <jboy...@apache.org>:
>> In https://bz.apache.org/bugzilla/show_bug.cgi?id=57251, Mark Thomas wrote:
>>
>>> The fix for bug 57472 might shave a few seconds off the deployment time but
>>> it doesn't appear to make a significant difference.
>>>
>>> The fundamental problem when running from a packed WAR is that to access any
>>> resource in a JAR, Tomcat has to do the following:
>>> - open the WAR
>>> - get the entry for the JAR
>>> - get the InputStream for the JAR entry
>>> - Create a JarInputStream
>>> - Read the JarInputStream until it finds the entry it wants
>>>
>>> This is always going to be slow.
>>>
>>> The reason that it is fast in Tomcat 7 and earlier took some digging. If
>>> unpackWARs is false in Tomcat 7, it unpacks the JARs anyway into the work
>>> directory and uses them from there. Performance is therefore comparable with
>>> unpackWARs="true".
>>
>> Has anyone looked into using a NIO2 FileSystem for this? It may offer a way
>> to avoid having to stream the entry in order to be able to locate a
>> resource. ZipFile is fast, I believe, because it has random access to the
>> archive and can seek directly to an entry's location based on the zip index;
>> the jar: FileSystem seems to be able to do the same.
>>
>> However, neither can cope with nested entries: ZipFile because its
>> constructor takes a File rather than a Path and uses native code, and ZipFS
>> because it relies on URIs and can't cope with a jar: URI based on another
>> jar: URI (ye olde problem with jar: URL syntax).
>>
>> What a FileSystem can do differently is return a FileChannel which supports
>> seek operations over the archive's content. IOW, if ZipFS can work given a
>> random access channel to bytes on disk, the same approach could be adopted
>> with a random access channel to bytes on a virtual FileSystem.
>>
>> I imagine that would get pretty hairy for write operations but fortunately
>> we would not need to deal with that.
>>
>> If no-one’s looked at it yet I'll take a shot.
>> Cheers
>> Jeremy
>>
>> FWIW, this could also be exposed to web applications e.g.
>>    FileSystem webappFS = servletContext.getFileSystem();
>>    Path resource = webappFS.getPath(request.getPathInfo());
>>    Files.copy(resource, response.getOutputStream());
>>
>
> The fundamental issue is how the data of JAR file (as a whole) is
> available via API.
>
> To be able to use random access with the JAR you technically have to
>
> 1) Jump to the end of the JAR file and read the ZIP index ("Central
> directory") that is located there. See the image at:
> http://en.wikipedia.org/wiki/Zip_%28file_format%29
>
> 2) Jump to the specific file.
>
> As JAR itself is compressed, there is no real API to jump to a
> position in it, besides maybe InputStream.skip(). This skip() will
> involve the same overhead as the current implementation that scans the
> jar, unless the war has zero compression.
>
>
> Also
> 1. Reading the zip index takes time and would better be cached. That
> is the issue behind
> https://bz.apache.org/bugzilla/show_bug.cgi?id=52448
>
> 2. It makes sense to cache the list of directories (packages) in the
> zip file. Scanning the whole jar for a class that is not present there
> is the worst case. A bonus is that it can improve handling of JARs
> that do not have explicit entries for directories.

I agree caching would help, but I'm not convinced the lack of it is the main cause of the speed issue here. From Mark's description above, "Read the JarInputStream until it finds the entry it wants" sounds more problematic. "Open the WAR" and "get the entry for the JAR" can use ZipFile, which uses random access to locate the bytes for the nested JAR. However, ZipFile only provides access to those bytes as an InputStream, so we need to stream through the nested JAR to locate the resource entry.

As an aside, there's also the issue that zip archives can have zombie entries: entries left in the stream but removed from the central directory. The only way to know whether an entry should actually be returned is to read as far as the directory, which happens to be at the end. AIUI, ZipInputStream will return those zombies as it proceeds. This is seldom an issue for JARs as they typically don't have zombies.

My suggestion for using an NIO2 FileSystem is because its API provides for nesting and for random access to the entries in the filesystem. Something like:

    // "war:" is a hypothetical scheme; no such provider ships with the JRE
    Path war = FileSystems.getDefault().getPath("real/path/of/application.war");
    FileSystem warFS = FileSystems.newFileSystem(URI.create("war:" + war.toUri()), Collections.emptyMap());
    Path nestedJar = warFS.getPath("/WEB-INF/lib/some.jar");
    FileSystem jarFS = FileSystems.newFileSystem(URI.create("jar:" + nestedJar.toUri()), Collections.emptyMap());
    Path resource = jarFS.getPath("some/resource.txt");
    return Files.newInputStream(resource); // or newFileChannel(resource) etc.

There are two requirements on the archive FileSystem implementation for this to work:
* Support for nesting in the URI
* Functioning implementation of newByteChannel or newFileChannel

Unfortunately the jar: provider that comes with the JRE won't do that. It has ye olde jar: URL nesting issues and requires that the archive Path be provided by the default FileSystem. Its newByteChannel() returns a SeekableByteChannel that is not seekable (doh!) and newFileChannel() works by extracting the entry to a temp file.

The former problem seems easy to work around.
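
For illustration only, here is a minimal sketch of the nested lookup using nothing but the stock zip provider. It is not the provider-level fix I have in mind: it assumes the stock provider insists on an archive Path from the default FileSystem, so it copies the nested JAR out to a temp file first, which is exactly the extraction a custom provider would avoid. The names NestedArchiveSketch, buildSampleWar and readNested are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class NestedArchiveSketch {

    /** Builds a throwaway WAR holding WEB-INF/lib/some.jar, which holds some/resource.txt. */
    static Path buildSampleWar(String content) throws IOException {
        ByteArrayOutputStream jarBytes = new ByteArrayOutputStream();
        try (ZipOutputStream jar = new ZipOutputStream(jarBytes)) {
            jar.putNextEntry(new ZipEntry("some/resource.txt"));
            jar.write(content.getBytes(StandardCharsets.UTF_8));
            jar.closeEntry();
        }
        Path war = Files.createTempFile("application", ".war");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(war))) {
            out.putNextEntry(new ZipEntry("WEB-INF/lib/some.jar"));
            out.write(jarBytes.toByteArray());
            out.closeEntry();
        }
        return war;
    }

    /** Reads a resource from a JAR nested in a WAR by copying the JAR to the default FileSystem. */
    static byte[] readNested(Path war, String jarPath, String resourcePath) throws IOException {
        Path extracted = Files.createTempFile("nested", ".jar");
        // The (ClassLoader) cast keeps the call unambiguous on newer JDKs that add overloads.
        try (FileSystem warFS = FileSystems.newFileSystem(war, (ClassLoader) null)) {
            // Assumption: the nested archive must live on the default FileSystem, hence the copy.
            Files.copy(warFS.getPath(jarPath), extracted, StandardCopyOption.REPLACE_EXISTING);
            try (FileSystem jarFS = FileSystems.newFileSystem(extracted, (ClassLoader) null)) {
                return Files.readAllBytes(jarFS.getPath(resourcePath));
            }
        } finally {
            // Both file systems are closed by now, so the temp copy can go.
            Files.deleteIfExists(extracted);
        }
    }
}
```

Calling readNested(buildSampleWar("hello"), "WEB-INF/lib/some.jar", "some/resource.txt") should hand back the bytes of "hello". Both lookups go through the zip index, but only because the JAR was extracted first.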

Supporting a seekable channel without extraction would be trickier, as you would need to map channel positions to actual positions in the compressed data, which would mean digging into the compression block structure. However, I think the naive approach of scanning the entry data and then caching the block offsets would still be quicker than inflating to a temp file.

--
Jeremy
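
FWIW, the streaming lookup Mark describes can be sketched roughly as below. This is my reading of the steps in his list, not Tomcat's actual code, and the names StreamingLookupSketch, buildSampleWar and findInNestedJar are invented for the example. The point is that the outer lookup goes through ZipFile's index while the inner one has to walk the JarInputStream entry by entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class StreamingLookupSketch {

    /** Builds a throwaway WAR whose nested JAR has the wanted resource as its last entry. */
    static Path buildSampleWar() throws IOException {
        ByteArrayOutputStream jarBytes = new ByteArrayOutputStream();
        try (ZipOutputStream jar = new ZipOutputStream(jarBytes)) {
            String[][] entries = {{"a.txt", "A"}, {"b.txt", "B"}, {"some/resource.txt", "hello"}};
            for (String[] e : entries) {
                jar.putNextEntry(new ZipEntry(e[0]));
                jar.write(e[1].getBytes(StandardCharsets.UTF_8));
                jar.closeEntry();
            }
        }
        Path war = Files.createTempFile("application", ".war");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(war))) {
            out.putNextEntry(new ZipEntry("WEB-INF/lib/some.jar"));
            out.write(jarBytes.toByteArray());
            out.closeEntry();
        }
        return war;
    }

    /** Returns the resource bytes, or null if the JAR or the resource is not found. */
    static byte[] findInNestedJar(Path war, String jarName, String resourceName) throws IOException {
        try (ZipFile warFile = new ZipFile(war.toFile())) {          // open the WAR
            ZipEntry jarEntry = warFile.getEntry(jarName);           // indexed lookup of the JAR entry
            if (jarEntry == null) {
                return null;
            }
            // From here on there is no index: the nested JAR is only available as a stream.
            try (JarInputStream jar = new JarInputStream(warFile.getInputStream(jarEntry))) {
                JarEntry entry;
                while ((entry = jar.getNextJarEntry()) != null) {    // read until the entry turns up
                    if (entry.getName().equals(resourceName)) {
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        byte[] buf = new byte[8192];
                        int n;
                        while ((n = jar.read(buf)) != -1) {
                            out.write(buf, 0, n);
                        }
                        return out.toByteArray();
                    }
                }
            }
        }
        return null; // a miss costs a scan of the whole nested JAR
    }
}
```

Note that a lookup for a resource that is not in the JAR has to read the nested stream to the end before it can return null, which is the worst case Konstantin mentions.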