dweiss commented on PR #14651: URL: https://github.com/apache/lucene/pull/14651#issuecomment-2875398128
I dug deep into this and it's fascinating. The problem is indeed in the release check flag. Here is a flame graph from a slowed-down execution of ecj:

[flame graph image]

You'll notice a lot of time spent in `Files.exists` there, which in turn is called from `ClasspathJep247Jdk12`. JEP 247 (https://openjdk.org/jeps/247) introduced the `--release` option for the compiler and also introduced an (undocumented?) `ct.sym` file, which is part of the JDK and is a zip file with class signatures from previous Java versions. I couldn't find docs on this file anywhere, and ecj's source is a fun read too:

```
 * Abstraction to the ct.sym file access (see https://openjdk.java.net/jeps/247). The ct.sym file is required to
 * implement JEP 247 feature (compile with "--release" option against class stubs for older releases) and is currently
 * (Java 15) a jar file with undocumented internal structure, currently existing in at least two different format
 * versions (pre Java 12 and Java 12 and later).
 * <p>
 * The only documentation known seem to be the current implementation of
 * com.sun.tools.javac.platform.JDKPlatformProvider and probably some JDK build tools that construct ct.sym file. Root
 * directories inside the file are somehow related to the Java release number, encoded as single digit or letter (single
 * digits for releases 7 to 9, capital letters for 10 and higher).
 * <p>
 * If a release directory contains "system-modules" file, it is a flag that this release files are not inside ct.sym
 * file because it is the current release, and jrt file system should be used instead.
 * <p>
 * All other release directories contain encoded signature (*.sig) files with class stubs for classes in the release.
 * <p>
 * Some directories contain files that are shared between different releases, exact logic how they are distributed is
 * not known.
 * <p>
 * ...
```

Anyway, the slowdown in ecj is caused by many repeated calls to the zip filesystem provider, which doesn't seem to be very efficient.
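
To make the repeated-lookup cost concrete, here is a minimal sketch of memoizing `Files.exists` results in front of a zip filesystem. It is not ecj's code: `ZipExistsCache`, `releaseCode` and `createSampleZip` are hypothetical names, the zip layout is made up for the demo and is not the real `ct.sym` structure, and the base-36 release encoding is an assumption that merely matches the "digits for 7 to 9, capital letters for 10 and higher" description quoted above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not how ecj's CtSym.java is written.
class ZipExistsCache {
    private final FileSystem zipFs;
    private final Map<String, Boolean> known = new HashMap<>();
    int providerCalls = 0; // lookups that actually reached the zip filesystem provider

    ZipExistsCache(FileSystem zipFs) {
        this.zipFs = zipFs;
    }

    // Answers from the map when possible; otherwise asks the provider once and remembers.
    boolean exists(String path) {
        Boolean cached = known.get(path);
        if (cached == null) {
            providerCalls++;
            cached = Files.exists(zipFs.getPath(path));
            known.put(path, cached);
        }
        return cached;
    }

    // Assumed encoding: release number in base 36, upper-cased (9 -> "9", 10 -> "A").
    static String releaseCode(int release) {
        return Integer.toString(release, Character.MAX_RADIX).toUpperCase(Locale.ROOT);
    }

    // Builds a tiny stand-in zip with one empty .sig entry; the layout is invented for the demo.
    static FileSystem createSampleZip() {
        try {
            Path zip = Files.createTempDirectory("ctsym-sketch").resolve("sample.zip");
            FileSystem fs = FileSystems.newFileSystem(
                    URI.create("jar:" + zip.toUri()), Map.of("create", "true"));
            Path dir = fs.getPath("/" + releaseCode(17), "java.base", "java", "lang");
            Files.createDirectories(dir);
            Files.write(dir.resolve("Object.sig"), new byte[0]);
            return fs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (FileSystem fs = createSampleZip()) {
            ZipExistsCache cache = new ZipExistsCache(fs);
            String sig = "/" + releaseCode(17) + "/java.base/java/lang/Object.sig";
            for (int i = 0; i < 10_000; i++) {
                cache.exists(sig); // only the first iteration touches the provider
            }
            System.out.println(sig + " exists: " + cache.exists(sig));
            System.out.println("provider calls: " + cache.providerCalls);
        }
    }
}
```

The counter stays at one after the loop because every later lookup of the same path is served from the map. A cache like this only helps when the same paths are probed repeatedly; probing many distinct paths still hits the provider each time.
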
There do seem to be some caches around it in `CtSym.java`, but evidently the slowdown depends not only on the ecj version but also on the content of the `ct.sym` file: how many patch releases there were for the target version (all patch sub-directories need to be checked). It's quite insane. I would leave release checking out of ecj.