Re: Custom Schedulers use-case

Robert Engels Fri, 17 Oct 2025 16:56:07 -0700

I am pretty sure you can resolve that with your own blocking mutex implementation - essentially you can control who wakes up when and how many are running at a given time. There may be some additional latency, but it is doubtful that it will register in a db-oriented system.
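
A minimal sketch of that idea, under stated assumptions: the class name FairGate, its methods, and the "wake the tenant with the fewest running tasks" policy are all hypothetical and only illustrate the shape of such a gate. It caps how many callers run at once and decides which waiter is woken when a slot frees up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical admission gate: at most maxRunning callers hold a slot at a time. */
final class FairGate {
    private static final class Waiter {
        final String tenant;
        final Condition ready;
        boolean admitted;
        Waiter(String tenant, Condition ready) { this.tenant = tenant; this.ready = ready; }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, Integer> running = new HashMap<>(); // slots held per tenant
    private final List<Waiter> waiters = new ArrayList<>();       // in arrival order
    private final int maxRunning;
    private int totalRunning;

    FairGate(int maxRunning) { this.maxRunning = maxRunning; }

    /** Blocks until the caller is admitted on behalf of the given tenant. */
    void acquire(String tenant) {
        lock.lock();
        try {
            if (totalRunning < maxRunning) { admit(tenant); return; }
            Waiter w = new Waiter(tenant, lock.newCondition());
            waiters.add(w);
            while (!w.admitted) w.ready.awaitUninterruptibly();
        } finally {
            lock.unlock();
        }
    }

    /** Frees a slot and wakes exactly one waiter, chosen by the gate rather than by the JVM. */
    void release(String tenant) {
        lock.lock();
        try {
            totalRunning--;
            running.merge(tenant, -1, Integer::sum);
            Waiter next = null;
            for (Waiter w : waiters) {
                // Assumed policy: prefer the tenant with the fewest running tasks;
                // the strict comparison keeps the earliest arrival on ties.
                if (next == null || runningOf(w.tenant) < runningOf(next.tenant)) next = w;
            }
            if (next != null) {
                waiters.remove(next);
                admit(next.tenant);
                next.admitted = true;
                next.ready.signal();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Number of callers currently blocked in acquire. */
    int waiting() {
        lock.lock();
        try { return waiters.size(); } finally { lock.unlock(); }
    }

    private void admit(String tenant) { totalRunning++; running.merge(tenant, 1, Integer::sum); }
    private int runningOf(String tenant) { return running.getOrDefault(tenant, 0); }

    public static void main(String[] args) throws InterruptedException {
        FairGate gate = new FairGate(2);
        List<Thread> threads = new ArrayList<>();
        for (String tenant : new String[] {"A", "A", "A", "B"}) {
            Thread t = new Thread(() -> {
                gate.acquire(tenant);
                try { System.out.println("running for tenant " + tenant); }
                finally { gate.release(tenant); }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
    }
}
```

The demo uses platform threads so it runs on older JDKs; virtual threads blocked on a java.util.concurrent lock park rather than pin their carrier, so the same gate should behave similarly there. Each work slice would have to bracket itself with acquire/release, which is where the extra latency would come from.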

On Oct 17, 2025, at 6:13 PM, Man Cao <[email protected]> wrote:

Thanks for the detailed response. We hope OpenJDK will commit to supporting custom schedulers at some point. For "self deadlock" cases, the API could provide some guidelines on what users should not do inside custom schedulers.

Our colleague David Gay also provided more details on the multi-tenancy use case:

Firestore is a cloud-based database, implemented with a multi-tenant (i.e., a single job serves many customers) architecture. Multi-tenancy allows us to serve small-scale customers very cheaply, but brings isolation challenges: traffic to a single Firestore database can potentially affect the performance and availability of other databases by consuming all or most of the resources in one or more components. Thus, providing isolation between customers sending traffic to the same task in a job is critical.

Specifically for Java: Firestore backends are implemented in Java, currently using a custom asynchronous programming framework which basically:
- provides all the usual Java control structures (try-catch, loops, etc)
- "automatic" suspension at (manually identified) blocking points
- scheduling of 'slices' of these asynchronous computations via a fair scheduler (we're using a stride scheduler)
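
For readers unfamiliar with stride scheduling, here is a minimal textbook-style sketch. It is not Firestore's implementation, which this thread does not describe; the names StrideScheduler, addTenant and nextSlice and the ticket values are hypothetical. Each tenant gets a stride inversely proportional to its tickets, and the tenant with the smallest accumulated "pass" value runs the next slice.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Textbook stride scheduler sketch: slices are handed out in proportion to tickets. */
final class StrideScheduler {
    private static final long STRIDE1 = 1_000_000L; // large constant divided by tickets

    private static final class Tenant {
        final String name;
        final long stride; // cost charged per slice; smaller means scheduled more often
        final long seq;    // arrival order, used only to break ties deterministically
        long pass;         // accumulated cost so far

        Tenant(String name, int tickets, long seq) {
            this.name = name;
            this.stride = STRIDE1 / tickets;
            this.seq = seq;
            this.pass = this.stride;
        }
    }

    private final PriorityQueue<Tenant> queue = new PriorityQueue<>(
            Comparator.comparingLong((Tenant t) -> t.pass).thenComparingLong(t -> t.seq));
    private long nextSeq;

    void addTenant(String name, int tickets) {
        if (tickets <= 0) throw new IllegalArgumentException("tickets must be positive");
        queue.add(new Tenant(name, tickets, nextSeq++));
    }

    /** Picks the tenant with the smallest pass value and charges it for one slice. */
    String nextSlice() {
        Tenant t = queue.poll();
        if (t == null) throw new IllegalStateException("no tenants registered");
        t.pass += t.stride;
        queue.add(t);
        return t.name;
    }

    public static void main(String[] args) {
        StrideScheduler s = new StrideScheduler();
        s.addTenant("A", 3); // hypothetical weights
        s.addTenant("B", 1);
        for (int i = 0; i < 8; i++) System.out.print(s.nextSlice() + " ");
        System.out.println();
    }
}
```

With these weights, tenant A receives about three slices for every slice given to B, regardless of how much work either tenant has queued. A real scheduler would also need to handle tenants that go idle and come back, which this sketch omits.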

Replacing our custom asynchronous programming framework with virtual threads is obviously highly desirable - much more readable and efficient code (and I can stop getting confused by continuations), but the fair scheduling of slices is an absolute requirement. We did experiments comparing the performance impact of an antagonistic workload from customer A on an 'innocent' workload from customer B:
- without fair scheduling, B sees two orders of magnitude worse latency (p50 and p99)
- with fair scheduling, B sees essentially no p50 latency impact and tolerable p99 impact
The 'without fair scheduling' measurements are effectively measuring how the Linux kernel schedules our threads - I would expect broadly similar results from the default virtual thread scheduler, as neither has any information on which customer owns which traffic to appropriately prioritise scheduling.

The above is partly summarised from https://research.google/pubs/firestore-the-nosql-serverless-database-for-the-application-developer/ - specifically see:
- section IV.C for the overview of our isolation approach
- section V.C and Figure 11 for the isolation benchmark

-Man

On Fri, Oct 10, 2025 at 1:01 AM Alan Bateman <[email protected]> wrote:
On 09/10/2025 22:11, Man Cao wrote:
> Hi loom developers,
>
> Official support for custom schedulers is highly valuable to some of
> our Java applications such as our colleague David Gay's use case.
>
> Are there any major concerns or obstacles to official support for
> custom schedulers?
>

There are some workloads that are not suited to a work stealing
scheduler. We've seen this with workloads that have low concurrency, not
a lot going on, and the scanning to "find work" consuming additional CPU
cycles that nobody wants to pay for. There may be merit in having the
JDK provide a different scheduler for such cases, but more
experimentation is required.

There are folks that want to do things like using the AWT event thread,
or the JavaFX application thread, as the carrier. They've seen
coroutines used on UI threads in other systems and want to experiment
doing something similar. Early explorations into this did not go very far.

There are other folks that are interested in thread affinity, binding
virtual threads to specific carriers, and carriers to specific cores in
NUMA nodes. Some of this exploration is about integration with existing
systems that use event loops. We are looking forward to a write-up of
these explorations and any findings.

Beyond this there are folks doing fun things with simulation and other
experimentation.

I'm not familiar with David Gay's work except for Liam's mail to say
that they are doing something in the area of multi-tenancy. If a
write-up or a summary of the explorations and findings could be sent to
loom-dev then it would be useful.

To your question, the topic of custom schedulers is an
exploration/research topic. The JDK has to be cautious. Calling out to a
custom scheduler (= arbitrary code) from core/sensitive parts of the
runtime is very scary. It's very easy to "self deadlock" - we've seen
folks trying to use locks to coordinate between mounted virtual threads
and their carrier. We are also concerned that the API surface for
schedulers will grow.

There are two prototypes in the loom repo at this time; this is what
Liam linked to. We are hoping that folks that are interested in this
topic will try one or both and come back with their findings. More data,
esp. from real world usage, will help inform this project on whether
there is merit in going further with either direction or whether there
are other directions that might be more fruitful.

-Alan
