Hi, I've found two problems with the current use of lock_object/lock_completed in libpager:
- When a translator wants to synchronize the entire contents of a pager, it must call pager_sync(), which in turn calls m_o_lock_object for the entire length of the object and waits for m_o_lock_completed if sync==1. This asks the kernel to return all dirty pages of that object through m_o_data_return. Syncing a large object generates a lot of requests in a short amount of time, so a lot of threads are created to deal with them. Splitting the lock request into multiple calls for large objects helps a bit, since it gives the pager the chance to deal with some requests between calls (especially when using a queue, as I commented in a previous mail), but it is not a proper solution.

- sync() doesn't really wait for the contents to be written to disk. Receiving an m_o_lock_completed from the kernel only means that it has dispatched the data to a translator, which still needs to write it to the storage. This could be related to bug #29292.

To solve these issues, I think pagers should be able to know which pages are currently dirty (by implementing a new RPC?) before calling m_o_lock_object. This way, they could wait for the pages to be returned by the kernel and make sure they are properly written before returning from sync(). They could also (indirectly) throttle the number of threads created, by knowing how many m_o_data_return requests are going to be received for each m_o_lock_object call.

Other ideas?
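
To make the proposal a bit more concrete, here is a rough, self-contained sketch of the bookkeeping I have in mind. It is only a simulation under stated assumptions: every name in it (sync_tracker, demo_object, demo_count_dirty, demo_request_return, demo_sync) is hypothetical and none of it exists in libpager or Mach. demo_count_dirty stands in for the new "which pages are dirty?" RPC, and demo_request_return stands in for m_o_lock_object plus the m_o_data_return handlers it triggers, which here run inline instead of in separate threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-object bookkeeping; not part of libpager. */
struct sync_tracker {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    size_t pending;            /* pages requested but not yet on disk */
};

void tracker_init(struct sync_tracker *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->drained, NULL);
    t->pending = 0;
}

/* Record how many data returns the next lock request will produce. */
void tracker_expect(struct sync_tracker *t, size_t ndirty)
{
    pthread_mutex_lock(&t->lock);
    t->pending += ndirty;
    pthread_mutex_unlock(&t->lock);
}

/* Called by a data-return handler once a page has reached the store. */
void tracker_page_written(struct sync_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    if (t->pending > 0 && --t->pending == 0)
        pthread_cond_broadcast(&t->drained);
    pthread_mutex_unlock(&t->lock);
}

/* Block until every expected page has been written. */
void tracker_wait(struct sync_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->pending > 0)
        pthread_cond_wait(&t->drained, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

size_t tracker_pending(struct sync_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    size_t n = t->pending;
    pthread_mutex_unlock(&t->lock);
    return n;
}

#define DEMO_PAGES 10

/* Toy memory object: a dirty flag per page and a count of written pages. */
struct demo_object {
    int dirty[DEMO_PAGES];
    size_t written;
    struct sync_tracker tracker;
};

/* Stand-in for the proposed dirty-page query RPC. */
size_t demo_count_dirty(const struct demo_object *o, size_t first, size_t n)
{
    size_t count = 0;
    for (size_t i = first; i < first + n; i++)
        count += o->dirty[i] ? 1 : 0;
    return count;
}

/* Stand-in for m_o_lock_object and the resulting m_o_data_return calls.
   The "handlers" run inline here; a real pager would run them elsewhere. */
void demo_request_return(struct demo_object *o, size_t first, size_t n)
{
    for (size_t i = first; i < first + n; i++) {
        if (o->dirty[i]) {
            o->dirty[i] = 0;
            o->written++;
            tracker_page_written(&o->tracker);
        }
    }
}

/* Sync the object in chunks of at most `chunk` pages, waiting for each
   chunk to be fully written before requesting the next one.
   Returns the number of pages written. */
size_t demo_sync(struct demo_object *o, size_t chunk)
{
    assert(chunk > 0);
    size_t total = 0;
    for (size_t first = 0; first < DEMO_PAGES; first += chunk) {
        size_t n = DEMO_PAGES - first < chunk ? DEMO_PAGES - first : chunk;
        size_t dirty = demo_count_dirty(o, first, n);
        if (dirty == 0)
            continue;          /* nothing to request for this chunk */
        tracker_expect(&o->tracker, dirty);
        demo_request_return(o, first, n);
        tracker_wait(&o->tracker);
        total += dirty;
    }
    return total;
}
```

Since each chunk is only requested after the previous one has drained, the number of outstanding m_o_data_return requests is bounded by the chunk size, and the sync only returns once every page has been counted as written rather than merely dispatched.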