Hello Bram,

Some of what you are sharing confuses me. I don't think wall-clock time is pertinent for background threads -- and I assume those Jetty HttpClients are in the background doing nothing. Yes, CoreContainer creates a Jetty HttpClient that goes unused in embedded mode.

Curious: are you creating lots of CoreContainers (perhaps indirectly, by creating an EmbeddedSolrServer per test)? Maybe we have a regression there. I suspect a test environment would do this -- creating a CoreContainer for each test, basically. Solr's tests do this too! And a slowdown as big as the one you show sounds like something we'd notice... most likely. On the other hand, if your CI/tests create very few CoreContainers and you still see all this slowdown, then CoreContainer startup is mostly irrelevant.
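One cheap way to check whether idle Jetty client threads are piling up across CoreContainer creations is to count live threads from inside the test JVM. Here is a minimal JDK-only sketch; note that the "qtp"/"HttpClient" thread-name prefixes are my assumption about Jetty's usual naming, so adjust the filter to whatever you actually see in a thread dump:

```java
import java.util.function.Predicate;

public class ThreadCensus {

    // Count live threads whose names match the given predicate.
    static long countThreads(Predicate<String> nameFilter) {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(nameFilter)
                .count();
    }

    public static void main(String[] args) {
        // Jetty's QueuedThreadPool typically names threads "qtp<hash>-<n>",
        // and HttpClient-owned threads often contain "HttpClient" -- both
        // are assumptions here, not guaranteed by any API.
        long jettyLooking = countThreads(
                n -> n.startsWith("qtp") || n.contains("HttpClient"));
        System.out.println("Jetty-looking threads: " + jettyLooking);
        System.out.println("Total live threads:    "
                + Thread.getAllStackTraces().size());
    }
}
```

Calling this before and after a batch of tests (or comparing the counts on your Solr 8 and Solr 10 branches) would tell you quickly whether the thread population grows with the number of CoreContainers created.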
We do have a benchmark that should capture a slowdown in this area --
https://github.com/apache/solr/blob/9c911e7337cd1026accc1a825e26906039982328/solr/benchmark/src/java/org/apache/solr/bench/lifecycle/SolrStartup.java
(its scope is a bit larger, but it's good enough). However, we don't have continuous benchmarking across releases to make relative comparisons. We've been talking about that, but the recent discussions are unlikely to produce something that covers embedded Solr. I've been working on this benchmark code lately as well.

*Anyway*, I recommend that you try this benchmark, starting with its great README, which mostly documents JMH itself. If you do that and find anything curious or suspicious, I'd love to hear more!

On Tue, Mar 24, 2026 at 3:51 AM Bram Luyten <[email protected]> wrote:
>
> Hi all,
>
> Disclaimer: I am a DSpace developer, not a Solr/Jetty internals
> expert. Much of the profiling and analysis below was done with heavy
> assistance from Claude. I'm sharing this because the data seems
> significant, but I may be misinterpreting some of it. Corrections
> and guidance are very welcome.
>
>
> CONTEXT
> ---------------
>
> We are upgrading DSpace (open-source repository software) from
> Spring Boot 3 / Solr 8 to Spring Boot 4 / Solr 10. Our integration
> test suite uses embedded Solr via solr-core as a test dependency
> (EmbeddedSolrServer style, no HTTP traffic -- everything is
> in-process in a single JVM).
>
> After the upgrade, our IT suite went from ~31 minutes to ~2 hours
> in CI. We spent considerable time profiling and eliminating other
> causes (Hibernate 7, Spring 7, H2 database, GC, lock contention,
> caching). Wall-clock profiling with async-profiler ultimately
> pointed to embedded Solr as the primary bottleneck.
>
> Note: we previously reported the Solr 10 POM issue with missing
> Jackson 2 dependency versions (solr-core, solr-solrj, solr-api).
> We have the workaround in place (explicit dependency declarations),
> so the embedded Solr 10 has a complete classpath.
>
>
> THE PROBLEM
> ----------------------
>
> Wall-clock profiling (async-profiler -e wall) of the same test class
> (DiscoveryRestControllerIT, 83 tests) on both branches shows:
>
> Component       Main (Solr 8)   SB4 (Solr 10)   Difference
> ----------------------------------------------------------------
> Solr total      3.6s            11.5s           +7.9s
> Hibernate       0.2s            0.2s            0.0s
> H2 Database     0.1s            0.1s            0.0s
> Spring          0.1s            0.1s            0.0s
> Test total      68.4s           84.3s           +15.9s
>
> Solr accounts for 50% of the total wall-clock difference (7.9s out
> of 15.9s). Hibernate, H2, and Spring are essentially unchanged.
>
>
> THE ROOT CAUSE
> ---------------------------
>
> Breaking down the Solr wall-clock time by operation:
>
> Operation                                   Main         SB4
> ---------------------------------------------------------------
> Jetty EatWhatYouKill.produce()              2558 (58%)   --
> Jetty AdaptiveExecutionStrategy.produce()   --           12786 (91%)
> DirectUpdateHandler2.commit()               522 (12%)    707 (5%)
> SpellChecker.newSearcher()                  119 (3%)     261 (2%)
>
> (Numbers are async-profiler wall-clock samples.)
>
> The dominant operation is Jetty's NIO selector execution strategy:
>
> - Solr 8 / Jetty 9: EatWhatYouKill.produce(): 2558 samples (58%)
> - Solr 10 / Jetty 12: AdaptiveExecutionStrategy.produce(): 12786
>   samples (91%)
> - That is a 5x increase in wall-clock time
>
> The full stack trace shows:
>
> ThreadPoolExecutor
>   -> MDCAwareThreadPoolExecutor
>   -> ManagedSelector (Jetty NIO selector)
>   -> AdaptiveExecutionStrategy.produce()
>   -> AdaptiveExecutionStrategy.tryProduce()
>   -> AdaptiveExecutionStrategy.produceTask()
>   -> ... -> KQueue.poll (macOS NIO)
>
> This is the Jetty HTTP client's NIO event loop. Even though we use
> EmbeddedSolrServer (no HTTP traffic), Solr 10's CoreContainer
> appears to create an internal Jetty HTTP client (likely for
> inter-shard communication via HttpJettySolrClient).
> In embedded single-node mode, this client has no work to do, but
> its NIO selector thread still runs, and
> AdaptiveExecutionStrategy.produce() idles much less efficiently
> than Jetty 9's EatWhatYouKill did.
>
> On macOS this manifests as busy-polling in KQueue.poll. The impact
> may differ on Linux (epoll).
>
>
> PROFILING METHODOLOGY
> -----------------------------------------
>
> - Tool: async-profiler 4.3 (wall-clock mode, safepoint-free)
> - JDK: OpenJDK 21.0.9
> - Both branches use the same H2 2.4.240 test database
> - Both branches use the same test code and Solr schema/config
> - The only Solr-related difference is the Solr version (8.11.4 vs 10.0.0)
> - Profiling was done on macOS (Apple Silicon), but the CI slowdown
>   (GitHub Actions, Ubuntu) shows the same pattern at larger scale
>
>
> WHAT WE RULED OUT
> ---------------------------------
>
> Before identifying the Solr/Jetty issue, we investigated and ruled
> out many other causes:
>
> - Hibernate 7 overhead: SQL query count is similar (fewer on SB4),
>   query execution time is <40ms total for 1400+ queries
> - H2 database: same version (2.4.240) on both branches, negligible
>   wall-clock difference
> - GC pauses: only +0.7s extra on SB4 (1.4% of total difference)
> - Lock contention: main actually has MORE lock contention than SB4
> - Hibernate session.clear(): tested with/without, no effect
> - JaCoCo coverage: tested with/without, no effect
> - Hibernate caching (L2, query cache): disabled both, no effect
> - Hibernate batch fetch size: tested, no effect
>
>
> QUESTIONS FOR THE SOLR TEAM
> --------------------------------------------------
>
> 1. Does embedded mode (EmbeddedSolrServer / CoreContainer without
>    an HTTP listener) need to create a Jetty HTTP client at all?
>    If the client is only for shard-to-shard communication, it
>    seems unnecessary in single-node embedded testing.
>
> 2. If the HTTP client is required, can its NIO selector / thread
>    pool be configured with minimal resources for embedded mode?
>    (e.g., fewer selector threads, smaller thread pool, or an
>    idle-friendly execution strategy)
>
> 3. Is there a Solr configuration (solr.xml property, system
>    property, or CoreContainer API) that we can use from the
>    consuming application to reduce this overhead?
>
> 4. Is this specific to macOS (KQueue) or does it also affect
>    Linux (epoll)? Our CI runs on Ubuntu and shows a larger
>    slowdown (3.8x) than local macOS (1.28x), which could be
>    related.
>
>
> ENVIRONMENT
> -----------------------
>
> Solr: 10.0.0 (solr-core as test dependency for embedded server)
> Jetty: 12.0.x (pulled in transitively by Solr 10)
> JDK: 21
> OS: macOS (profiled), Ubuntu (CI where the 4x slowdown manifests)
> Project: DSpace (https://github.com/DSpace/DSpace)
> PR: https://github.com/DSpace/DSpace/pull/11810
>
> Happy to provide the full async-profiler flame graph files or
> additional profiling data if useful.
>
> Thanks,
> Bram Luyten, Atmire

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
