All these sound like very nice improvements!

On Fri, Aug 17, 2018 at 4:13 AM, Robert Middleton <osfan6...@gmail.com>
wrote:

> Interesting results!  I haven't experienced any problems with logging
> myself, but I've also never benchmarked it.
>
> I don't have much experience with memory pools, but if you're still
> getting blockages on a call to new/malloc, you may be able to get
> around it by using the memory pool and then calling the constructor
> in-place (I have never done this before, but I know it is possible
> to do) [1][2].  Or there may be some way to do it with a custom
> allocator; that's something I have never done either.
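>
> Roughly what I mean, as an untested sketch (the malloc here just
> stands in for whatever pre-allocated pool the slot would really come
> from):
>
> #include <new>      // placement new
> #include <cstdlib>  // std::malloc / std::free
>
> struct LogEvent {
>     explicit LogEvent(int id) : id(id) {}
>     int id;
> };
>
> int main() {
>     // In a real pool this slot would be carved out of a block that
>     // was allocated up front, so nothing is allocated on the hot path.
>     void* slot = std::malloc(sizeof(LogEvent));
>
>     LogEvent* ev = new (slot) LogEvent(42);  // construct in-place
>
>     ev->~LogEvent();   // placement new means calling the dtor manually
>     std::free(slot);   // i.e. hand the slot back to the pool
> }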
>
> -Robert Middleton
>
> [1]: https://stackoverflow.com/questions/519808/call-a-constructor-on-a-already-allocated-memory
> [2]: https://stackoverflow.com/questions/222557/what-uses-are-there-for-placement-new
>
> On Thu, Aug 16, 2018 at 11:58 AM, Matt Sicker <boa...@gmail.com> wrote:
> > I don't know much about the state of the log4cxx architecture, but nearly
> > all your points (other than the lock in stringstream) are points we
> > optimize for in log4j2, at least. Even the stringstream optimization
> > sounds similar to StringBuffer versus StringBuilder in Java. As for the
> > queue used in async logging, I'm not sure what guarantees you get in C++
> > memory models, but I'm curious whether the disruptor queue pattern made
> > its way over to Boost?
> >
> > On Thu, 16 Aug 2018 at 10:37, Denys Smolianiuk <
> > denys.smolian...@harmonicinc.com> wrote:
> >
> >> Hello All,
> >>
> >> I'd like to share some experience, as well as some patches, with regard
> >> to using log4cxx in a timing-critical application. First, a few words
> >> about our requirements: it's a service which must generate some network
> >> packets with up to a hundred microseconds of precision. Thus, it's very
> >> important to have predictable code timing. One could argue that log4cxx
> >> is not very well suited for such applications, but surprisingly it works
> >> pretty well after some light tuning.
> >>
> >> So, what were the issues?
> >> Basically, from the library user's point of view they all looked the
> >> same: all of a sudden, logging done with the LOG4CXX_DEBUG() macro could
> >> take an unexpectedly long time to complete. For example, the same trace
> >> which takes several μs 99% of the time would sometimes take hundreds of
> >> microseconds or even a few milliseconds. Further investigation traced
> >> this down to a few root causes:
> >>
> >> 1. The async logger (which we have been using, of course) has an
> >> internal queue to pass log entries to the background disk-writer thread.
> >> This queue is mutex-protected, which might seem fine until you think a
> >> little more about it. First of all, someone calling LOG4CXX_DEBUG() to
> >> simply put something into the log might not expect to be blocked waiting
> >> for a mutex at all. Second, although measures were taken to minimize the
> >> time the disk thread holds that lock, OS schedulers often work in such a
> >> way that a thread blocked on a mutex gets de-scheduled. With a normal
> >> scheduler quantum, that means the logging thread can be preempted for
> >> milliseconds.
> >>
> >> 2. There are mutexes protecting the internal state of both loggers and
> >> appenders. This means that two separate threads calling LOG4CXX_DEBUG()
> >> can block each other. Even if they are using different loggers, they
> >> would still block on the appender! This has the same consequences for
> >> execution timing and performance as described above.
> >>
> >> 3. The std::stringstream constructor takes some internal locks of its
> >> own. Unfortunately, each MessageBuffer has its own instance of this
> >> class, and the MessageBuffer is created inside the LOG4CXX_DEBUG()
> >> macro. There is an optimization that skips creating the stringstream
> >> when logging simple strings, but as soon as your log statement contains
> >> a single '<<' operator, it is created.
> >>
> >> 4. Dynamic memory allocations. Unfortunately there are still quite a few
> >> of them, even though a memory pool is used in some other places. Thus,
> >> hidden calls to new and malloc introduce unpredictable delays.
> >>
> >> So, what did we do to mitigate these problems?
> >>
> >> 1. The natural solution for this issue was a lock-free queue. There are
> >> a few of them available, but we chose boost::lockfree::queue, as it can
> >> serve as a drop-in replacement, allowing us to keep all the present
> >> functionality.
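> >>
> >> A simplified sketch of the idea (the event type and the capacity are
> >> made up for illustration; the real patch keeps the existing appender
> >> interface):
> >>
> >> #include <boost/lockfree/queue.hpp>
> >> #include <iostream>
> >>
> >> struct LoggingEvent { int id; };  // stand-in for the real event type
> >>
> >> int main() {
> >>     // Fixed-capacity, lock-free multi-producer/multi-consumer queue.
> >>     // Elements must be trivially copyable, hence the raw pointers.
> >>     boost::lockfree::queue<LoggingEvent*> q(1024);
> >>
> >>     // Producer side (logging thread): push never blocks on a mutex;
> >>     // it returns false when the queue is full.
> >>     LoggingEvent* e = new LoggingEvent{1};
> >>     if (!q.push(e)) {
> >>         delete e;  // queue full: drop rather than stall the caller
> >>     }
> >>
> >>     // Consumer side (disk-writer thread): drain without locking.
> >>     LoggingEvent* ev = nullptr;
> >>     while (q.pop(ev)) {
> >>         std::cout << "event " << ev->id << "\n";
> >>         delete ev;
> >>     }
> >> }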
> >>
> >> 2. After looking deeper into the code, it turned out that two concurrent
> >> calls to LOG4CXX_DEBUG() from different threads are not harmful, because
> >> the internal structures of the logger and appender are not modified
> >> there. The only thing that really requires protection is concurrency
> >> between logging and configuring. Thus we came to a solution: read-write
> >> locks, where logging calls act as readers and configuration/shutdown
> >> calls act as writers. With this approach, multiple threads calling
> >> LOG4CXX_DEBUG() became free of any contention.
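> >>
> >> Conceptually (shown here with C++17's std::shared_mutex purely for
> >> illustration; the patch itself hides the choice behind the SHARED_MUTEX
> >> and LOCK_R/LOCK_W macros mentioned below):
> >>
> >> #include <mutex>
> >> #include <shared_mutex>
> >>
> >> std::shared_mutex state_mutex;  // guards logger/appender configuration
> >>
> >> void log_event() {
> >>     // Reader: any number of logging threads may hold this at once,
> >>     // so LOG4CXX_DEBUG() calls no longer contend with each other.
> >>     std::shared_lock<std::shared_mutex> read_lock(state_mutex);
> >>     // ... format and append; internal state is only read here ...
> >> }
> >>
> >> void reconfigure() {
> >>     // Writer: configuration (or shutdown) takes exclusive ownership
> >>     // and waits for in-flight logging calls to drain.
> >>     std::unique_lock<std::shared_mutex> write_lock(state_mutex);
> >>     // ... mutate logger/appender state ...
> >> }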
> >>
> >> 3. This problem also has a simple solution: make one static
> >> std::stringstream object per thread using the thread_local keyword.
> >> Unfortunately, we found one drawback: thread_local memory is not
> >> released if a thread is neither detached nor joined. As there is some
> >> code which does neither, we made the static stringstream an XML-file
> >> configuration option. Also, there could be an issue with using multiple
> >> MessageBuffer instances from within a single thread, but LOG4CXX_DEBUG()
> >> does not do that.
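> >>
> >> The idea in miniature (the helper function is made up for illustration;
> >> in the patch this sits behind the configuration option):
> >>
> >> #include <sstream>
> >> #include <string>
> >>
> >> // One stream per thread, constructed exactly once, then reused by
> >> // every MessageBuffer in that thread instead of building a fresh
> >> // std::stringstream (and taking its constructor locks) per statement.
> >> std::ostringstream& thread_stream() {
> >>     thread_local std::ostringstream stream;
> >>     stream.str(std::string());  // clear contents between statements
> >>     stream.clear();             // reset any error flags
> >>     return stream;
> >> }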
> >>
> >> 4. At this time, we didn't do anything to address the dynamic memory
> >> allocation issue.
> >>
> >> So, if you want to give our patch a try, it is attached. It's based on
> >> log4cxx 0.10.0-12 as found in Debian. There are new SHARED_MUTEX and
> >> LOCK_R/LOCK_W macros defined in mutex.h and synchronized.h for easy
> >> switching between RW and conventional mutexes for benchmarking. Also,
> >> there is a test application which spawns two threads doing logging with
> >> a small sleep between iterations; it prints to stdout whenever a trace
> >> statement took more than 500 microseconds. It might look familiar to you
> >> because it's based on one of the examples from StackOverflow.
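> >>
> >> The measurement in the test application amounts to something like this
> >> (a self-contained sketch: the LOG4CXX_DEBUG() call under test is left as
> >> a comment, the 500 μs threshold matches the description above, and the
> >> sleep duration is arbitrary):
> >>
> >> #include <chrono>
> >> #include <iostream>
> >> #include <thread>
> >>
> >> void traced_log() {
> >>     using clock = std::chrono::steady_clock;
> >>     const auto start = clock::now();
> >>     // LOG4CXX_DEBUG(logger, "payload " << 42);  // the call under test
> >>     const auto us = std::chrono::duration_cast<std::chrono::microseconds>(
> >>         clock::now() - start).count();
> >>     if (us > 500)
> >>         std::cout << "slow trace: " << us << " us\n";
> >>     std::this_thread::sleep_for(std::chrono::microseconds(100));
> >> }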
> >>
> >> In our testing, the modified log4cxx has about 20% better performance
> >> overall, but more importantly for us, it has far fewer cases where a log
> >> statement takes excessive time to complete. The second part is only true
> >> for CPUs with >2 cores, where all threads can physically run in
> >> parallel. Unfortunately, we still see rare cases of prolonged logging
> >> calls; we have traced those down to the dynamic memory allocation issue.
> >>
> >> Any thoughts?
> >>
> >> Denys Smolianiuk
> >>
> >> Senior SW Engineer
> >> Harmonic Inc.
> >>
> >>
> >
> > --
> > Matt Sicker <boa...@gmail.com>
>
