Jeremy,

On 3/17/15 2:39 AM, Jeremy Boynes wrote:
> On Mar 7, 2015, at 10:13 AM, Jeremy Boynes <jboy...@apache.org> wrote:
>>
>> On Mar 6, 2015, at 7:43 AM, Mark Thomas <ma...@apache.org> wrote:
>>> Interesting. The deciding factor for me will be performance. Keep in
>>> mind that we might not need all the API. As long as there is enough to
>>> implement WebResourceSet and WebResource, we probably have all we need.
>>
>> I ran a micro-benchmark using the greenhouse WAR associated with the
>> original bug. I instrumented JarWarResource to log all resources opened
>> during startup and record the time. On my system it took ~21,000ms to start
>> the application of which ~16,000ms was spent in getJarInputStreamWrapper().
>> 2935 resources were opened, primarily class files.
>>
>> I then replayed the log against the sandbox FS. With the current
>> implementation it took ~300ms to open the war, ~350ms to open all the jars,
>> and ~5ms to open all the entries with newInputStream().
>>
>> I interpret that to mean that there is pretty constant time taken to inflate
>> 15MB of data - the 300ms to scan the archive and the ~350ms to scan each of
>> the jars within (each one that was used at least). The speedup here comes
>> because we only scan each archive once; the downside is the extra memory
>> used to store the inflated data.
>>
>> This is promising enough to me that I’m going to keep exploring.
>>
>> Konstantin’s patch, AIUI, creates an index for each jar which eliminates the
>> need to scan jars on the classpath that don’t contain the class being
>> requested. However, once the classloader has determined the jar file to use
>> we still need to stream through that jar until we reach the desired entry.
>>
>> I think we can avoid that here by digging into the zip file’s internal
>> metadata. Where I am currently streaming the jar to build the directory,
>> with random access I can build an index just by reading the central
>> directory structure. An index entry would contain the name, metadata, and
>> the offset in the archive of the entry’s data. When an entry is opened we
>> would inflate the data so that it could be used to underpin the channel.
>> When the channel is closed the memory would be released.
>>
>> In general, I don’t think there’s a need for the FileSystem to retain
>> inflated data after the channel is closed. This would be particularly true
>> for the leaf resources which are not likely to be reused; for example, once
>> a ClassLoader has used the .class file to define the Class or once a
>> framework has processed a .xml config file then neither will need it again.
>>
>> However, I think the WAR ClassLoader would benefit from keeping the JAR
>> files on the classpath open to avoid re-inflating them. The pattern though
>> would be bursty e.g. lots of class loads during startup followed by
>> quiescence. I can think of two ways to handle that:
>> 1) FileSystem maintains a cache of inflated entries much like a disk
>> filesystem has buffers.
>> The FileSystem would be responsible for evictions, perhaps on a LRU or
>> timed basis.
>> 2) Having the classloader keep the JARs opened/mounted after loading a
>> resource until such time as it thinks quiescence is reached. It would then
>> unmount JARs to free the memory.
>> We could do both as they don’t conflict.
>>
>> Next step will be to look into building the index directly from the
>> archive’s central directory rather than by streaming it.
>
> Next step was actually just to verify that we could make a URLClassLoader
> work with this API. I got this to work by turning the path URIs into
> collection URLs (ending in ‘/’) which prevented the classloader from trying
> to open them as JarFiles.
>
> The classloader works but the classpath search is pretty inefficient, relying
> on URLConnection#getInputStream throwing an Exception to detect if a resource
> exists. Using it to load the 2935 resources from before took ~1900ms even
> after the jars had been indexed. getInputStream() was called ~120,000 times
> as the classpath was scanned, i.e. 15us per check with an average of ~40
> checks per resource which seems about right for a classpath that contains 73
> jars.
>
> An obvious solution to avoid the repeated search would be to union the jars’
> directories into a single index. I may try this with a PathClassLoader that
> operates using a list of Paths rather than URLs.
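
The union-index idea in the last quoted paragraph could look roughly like
the sketch below. It is only an illustration under stated assumptions: the
jars are plain files on disk (not nested inside a WAR as in the sandbox FS),
UnionIndex and find() are made-up names rather than anything in Tomcat or
the sandbox code, and java.util.zip.ZipFile stands in for whatever index the
FileSystem builds from the central directory. The first jar on the classpath
that contains a name wins, which mirrors normal classpath ordering.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical helper: unions the directories of all jars on a classpath
 * into one map from resource name to the jar that owns it.
 */
final class UnionIndex {
    // Resource name -> first jar (in classpath order) that contains it.
    private final Map<String, Path> owners = new LinkedHashMap<>();

    UnionIndex(List<Path> classpath) throws IOException {
        for (Path jar : classpath) {
            // Listing entries does not inflate any entry data.
            try (ZipFile zip = new ZipFile(jar.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    if (!entry.isDirectory()) {
                        // putIfAbsent keeps the earlier jar when names collide.
                        owners.putIfAbsent(entry.getName(), jar);
                    }
                }
            }
        }
    }

    /** Returns the jar that should serve the resource, or null if no jar has it. */
    Path find(String resourceName) {
        return owners.get(resourceName);
    }

    /** Number of distinct resource names across the whole classpath. */
    int size() {
        return owners.size();
    }
}
```

With an index like this, a lookup is a single map probe rather than one
failed getInputStream() call per jar, and a miss returns null without
touching any archive. The cost is the memory for one map entry per resource
name across the classpath.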

I just wanted to let you know that I'm reading these with interest. I'm
anxious to find out if this is going to pan out.

-chris