On Mar 6, 2015, at 7:43 AM, Mark Thomas <ma...@apache.org> wrote:
> Interesting. The deciding factor for me will be performance. Keep in
> mind that we might not need all the API. As long as there is enough to
> implement WebResourceSet and WebResource, we probably have all we need.

I ran a micro-benchmark using the greenhouse WAR associated with the original 
bug. I instrumented JarWarResource to log all resources opened during startup 
and record the time. On my system it took ~21,000ms to start the application,
of which ~16,000ms was spent in getJarInputStreamWrapper(). 2935 resources were
opened, primarily class files.

I then replayed the log against the sandbox FS. With the current implementation 
it took ~300ms to open the war, ~350ms to open all the jars, and ~5ms to open 
all the entries with newInputStream().

I interpret that to mean that a fairly constant time is taken to inflate
15MB of data: the ~300ms to scan the archive plus the ~350ms to scan the jars
within it (those that were used, at least). The speed-up comes because we only
scan each archive once; the downside is the extra memory used to store the
inflated data.

This is promising enough to me that I’m going to keep exploring.

Konstantin’s patch, AIUI, creates an index for each jar which eliminates the 
need to scan jars on the classpath that don’t contain the class being 
requested. However, once the classloader has determined the jar file to use we 
still need to stream through that jar until we reach the desired entry.

I think we can avoid that here by digging into the zip file’s internal
metadata. Where I am currently streaming the jar to build the directory, with
random access I can build an index just by reading the central directory
structure. An index entry would contain the name, metadata, and the offset in
the archive of the entry’s data. When an entry is opened we would inflate the
data so that it could be used to underpin the channel. When the channel is
closed the memory would be released.
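
To make the idea concrete, here is a rough sketch of such an index. It is not
the sandbox code: the class name CentralDirectoryIndex, its Entry type, and the
read()/open() methods are all made up for illustration. It assumes a plain
single-disk archive with no ZIP64 records or encryption and entries under 2GB.
Note that the central directory records the offset of each entry's local
header, so open() skips that header to reach the data.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class CentralDirectoryIndex {

    /** Hypothetical index entry: name, metadata, and local header offset. */
    public static final class Entry {
        public final String name;
        public final int method; // 0 = stored, 8 = deflated
        public final int compressedSize;
        public final int size;
        public final long localHeaderOffset;

        Entry(String name, int method, int compressedSize, int size, long offset) {
            this.name = name;
            this.method = method;
            this.compressedSize = compressedSize;
            this.size = size;
            this.localHeaderOffset = offset;
        }
    }

    /** Builds the index from the central directory only; no entry data is read. */
    public static Map<String, Entry> read(RandomAccessFile file) throws IOException {
        // The end-of-central-directory record is 22 bytes plus a comment of up to 65535.
        long length = file.length();
        int tailSize = (int) Math.min(length, 22 + 65535);
        byte[] tail = new byte[tailSize];
        file.seek(length - tailSize);
        file.readFully(tail);
        ByteBuffer end = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = -1;
        for (int i = tailSize - 22; i >= 0 && eocd < 0; i--) {
            if (end.getInt(i) == 0x06054b50) eocd = i;
        }
        if (eocd < 0) throw new IOException("End of central directory not found");
        int count = end.getShort(eocd + 10) & 0xffff;
        int cdSize = end.getInt(eocd + 12);
        long cdOffset = end.getInt(eocd + 16) & 0xffffffffL;

        byte[] cd = new byte[cdSize];
        file.seek(cdOffset);
        file.readFully(cd);
        ByteBuffer dir = ByteBuffer.wrap(cd).order(ByteOrder.LITTLE_ENDIAN);
        Map<String, Entry> index = new LinkedHashMap<>();
        int pos = 0;
        for (int i = 0; i < count; i++) {
            if (dir.getInt(pos) != 0x02014b50) throw new IOException("Bad central header");
            int nameLen = dir.getShort(pos + 28) & 0xffff;
            int extraLen = dir.getShort(pos + 30) & 0xffff;
            int commentLen = dir.getShort(pos + 32) & 0xffff;
            String name = new String(cd, pos + 46, nameLen, StandardCharsets.UTF_8);
            index.put(name, new Entry(name, dir.getShort(pos + 10) & 0xffff,
                    dir.getInt(pos + 20), dir.getInt(pos + 24),
                    dir.getInt(pos + 42) & 0xffffffffL));
            pos += 46 + nameLen + extraLen + commentLen;
        }
        return index;
    }

    /** Seeks straight to one entry and inflates it; earlier entries are never read. */
    public static byte[] open(RandomAccessFile file, Entry e) throws IOException {
        byte[] header = new byte[30];
        file.seek(e.localHeaderOffset);
        file.readFully(header);
        ByteBuffer lh = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (lh.getInt(0) != 0x04034b50) throw new IOException("Bad local header");
        int skip = (lh.getShort(26) & 0xffff) + (lh.getShort(28) & 0xffff);
        file.seek(e.localHeaderOffset + 30 + skip);
        byte[] raw = new byte[e.compressedSize + 1]; // +1: dummy byte for raw inflate
        file.readFully(raw, 0, e.compressedSize);
        if (e.method == 0) return Arrays.copyOf(raw, e.size);
        if (e.method != 8) throw new IOException("Unsupported method " + e.method);
        Inflater inflater = new Inflater(true); // true = raw deflate, as used in zip
        try {
            inflater.setInput(raw);
            byte[] out = new byte[e.size];
            int off = 0;
            while (off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) throw new IOException("Truncated entry " + e.name);
                off += n;
            }
            return out;
        } catch (DataFormatException ex) {
            throw new IOException(ex);
        } finally {
            inflater.end();
        }
    }
}
```

The byte[] returned by open() is what would underpin the channel; dropping the
reference when the channel closes is what releases the memory.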

In general, I don’t think there’s a need for the FileSystem to retain inflated 
data after the channel is closed. This would be particularly true for the leaf 
resources which are not likely to be reused; for example, once a ClassLoader 
has used the .class file to define the Class or once a framework has processed 
a .xml config file then neither will need it again.

However, I think the WAR ClassLoader would benefit from keeping the JAR files
on the classpath open to avoid re-inflating them. The access pattern, though,
would be bursty, e.g. lots of class loads during startup followed by
quiescence. I can think of two ways to handle that:
1) The FileSystem maintains a cache of inflated entries, much like a disk
   filesystem has buffers. The FileSystem would be responsible for evictions,
   perhaps on an LRU or timed basis.
2) The classloader keeps the JARs opened/mounted after loading a resource
   until such time as it thinks quiescence is reached. It would then unmount
   JARs to free the memory.
We could do both as they don’t conflict.
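
For option 1, a minimal sketch of an LRU cache bounded by total inflated bytes
might look like the following. The class name InflatedEntryCache and its
methods are hypothetical, and it only covers LRU eviction (not timed eviction);
it leans on LinkedHashMap's access-order mode to track recency.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class InflatedEntryCache {
    private final long maxBytes;
    private long currentBytes;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    public InflatedEntryCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Returns the inflated data, or null on a miss; a hit counts as a use. */
    public synchronized byte[] get(String path) {
        return entries.get(path);
    }

    /** Stores inflated data, evicting least recently used entries if over budget. */
    public synchronized void put(String path, byte[] data) {
        byte[] old = entries.put(path, data);
        if (old != null) {
            currentBytes -= old.length;
        }
        currentBytes += data.length;
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            // An entry larger than the whole budget ends up evicting itself too.
            currentBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    public synchronized long bytes() {
        return currentBytes;
    }
}
```

Bounding by bytes rather than entry count seems the better fit here, since a
single inflated JAR can be far larger than a single class file.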

Next step will be to look into building the index directly from the archive’s 
central directory rather than by streaming it.

Cheers
Jeremy
