Hi Matt,
I don't think Boost has a queue following the disruptor pattern. But hopefully
we don't need multiple consumers/producers here.
Otherwise, I'm glad to hear I'm not the only one seeing such issues :)
Thanks,
Denys Smolianiuk
I don't know much about the state of log4cxx architecture, but nearly all
your points (other than the lock in stringstream) are points we optimize for
in log4j2 at least. Even the stringstream optimization sounds similar to
StringBuffer versus StringBuilder in Java. As for the queue used in async
logging, I'm not sure what guarantees you get in C++ memory models, but I'm
curious if the disruptor queue pattern made its way over to Boost?
On Thu, 16 Aug 2018 at 10:37, Denys Smolianiuk <denys.smolian...@harmonicinc.com> wrote:
> Hello All,
>
> I'd like to share some experience, as well as some patches, with regard to
> using log4cxx in a timing-critical application. First, a few words about our
> requirements: it's a service which must generate network packets with up to
> a hundred microseconds of precision. Thus, it's very important to have
> predictable code timing. One can argue that log4cxx is not very well suited
> for such applications, but surprisingly it works pretty well after some
> light tuning.
>
> So, what were the issues?
> Basically, from the library user's point of view they all looked the same:
> all of a sudden, logging done with the LOG4CXX_DEBUG() macro could take an
> unexpectedly long time to complete. For example, the same trace which takes
> several μs 99% of the time would sometimes take hundreds of microseconds or
> even a few milliseconds. After further investigation this was traced down to
> a few root causes:
>
> 1. The async logger (which we have been using, of course) has an internal
> queue to pass log entries to the background disk-writer thread. This queue
> is mutex-protected, which might seem fine unless you think a little bit more
> about it. First of all, someone calling LOG4CXX_DEBUG() to simply put
> something into the log might not expect to be blocked waiting for a mutex at
> all. The second point is that, although measures were taken to minimize the
> time the disk thread holds that lock, OS schedulers often de-schedule a
> thread that blocks on a mutex. With a normal OS-scheduler quantum, that
> means the logging thread can be preempted for milliseconds.
>
> 2. There are some mutexes protecting the internal state of both loggers and
> appenders. This means that two separate threads calling LOG4CXX_DEBUG() can
> block each other. Even if they are using different loggers, they would still
> block on the appender! This has the same consequences for execution timing
> and performance as described above.
>
> 3. The std::stringstream constructor takes some internal locks of its own.
> Unfortunately, each MessageBuffer has its own instance of this class, and
> MessageBuffer is created inside the LOG4CXX_DEBUG() macro. There is an
> optimization that avoids creating the stringstream when logging simple
> strings, but as soon as your log statement has a single '<<' operator, it is
> created.
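>
> (If I read the MessageBuffer code right, the mechanism is roughly the one
> sketched below. This is a simplified illustration, not the actual log4cxx
> class: plain C-string appends stay on a cheap path, and the stringstream is
> constructed as soon as anything else is streamed, which is where the hidden
> constructor cost comes from.)
>
>   #include <memory>
>   #include <sstream>
>   #include <string>
>
>   // Simplified stand-in for MessageBuffer (illustrative only).
>   class TinyMessageBuffer {
>   public:
>       TinyMessageBuffer& operator<<(const char* s) {
>           if (stream_) *stream_ << s; else plain_ += s;    // cheap path
>           return *this;
>       }
>       template <typename T>
>       TinyMessageBuffer& operator<<(const T& value) {
>           if (!stream_) {                          // first non-string operand:
>               stream_.reset(new std::ostringstream);   // expensive ctor runs here
>               *stream_ << plain_;
>           }
>           *stream_ << value;
>           return *this;
>       }
>       std::string str() const { return stream_ ? stream_->str() : plain_; }
>   private:
>       std::string plain_;
>       std::unique_ptr<std::ostringstream> stream_;
>   };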
>
> 4. Dynamic memory allocations. Unfortunately there are still quite
few of
> them even though memory pool is used in some other places. Thus,
hidden
> calls to new and malloc induce unpredictable delays.
>
> So, what did we do to mitigate these problems?
>
> 1. The natural solution for this issue was to use an atomic (lock-free)
> queue. There are a few of them available, but we made use of
> boost::lockfree::queue, as it can serve as a drop-in replacement, allowing
> us to keep all present functionality.
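>
> To make the shape of that change concrete, here is a minimal sketch of the
> producer/consumer pattern we ended up with. The names are illustrative and
> this is not the actual patch (the real queue carries log4cxx LoggingEvent
> objects, and capacity/shutdown handling is more involved):
>
>   #include <boost/lockfree/queue.hpp>
>   #include <atomic>
>   #include <string>
>   #include <thread>
>
>   struct Event { std::string text; };    // stand-in for a logging event
>
>   // boost::lockfree::queue needs a trivially copyable element type,
>   // so raw pointers are queued and the consumer deletes them.
>   boost::lockfree::queue<Event*> pending(1024);
>   std::atomic<bool> done{false};
>
>   void loggingThread() {                 // never blocks on a mutex
>       for (int i = 0; i < 1000; ++i)
>           pending.push(new Event{"message"});
>   }
>
>   void diskWriterThread() {              // background consumer
>       Event* e = nullptr;
>       bool stop = false;
>       while (!stop) {
>           stop = done.load();            // read the flag first, then drain,
>           while (pending.pop(e)) {       // so no event is left behind
>               /* format and write to disk here */
>               delete e;
>           }
>           if (!stop) std::this_thread::yield();
>       }
>   }
>
>   int main() {
>       std::thread writer(diskWriterThread);
>       std::thread logger(loggingThread);
>       logger.join();
>       done = true;
>       writer.join();
>   }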
>
> 2. After looking further into the code, it turned out that two concurrent
> calls to LOG4CXX_DEBUG() from different threads are not harmful, because the
> internal structures of the logger and appender are not modified there. What
> really requires protection is concurrency between logging and configuring.
> Thus, we came to a solution: read-write locks, where logging calls act as
> readers and configuration/exit calls act as writers. With this approach,
> multiple threads calling LOG4CXX_DEBUG() became free of any contention.
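>
> The gist of the locking discipline is sketched below, shown with
> std::shared_mutex for brevity rather than the macros the patch actually
> introduces (described further down); these are not the real log4cxx classes:
>
>   #include <shared_mutex>
>   #include <string>
>
>   class Appender {
>   public:
>       void append(const std::string& msg) {
>           // Many logging threads can hold the shared lock at once, so
>           // concurrent LOG4CXX_DEBUG() calls no longer serialize here.
>           std::shared_lock<std::shared_mutex> lock(mutex_);
>           writeToDevice(msg);            // must not mutate shared state
>       }
>
>       void reconfigure() {
>           // Rare: takes the exclusive lock and waits for in-flight
>           // append() calls to drain before touching internal state.
>           std::unique_lock<std::shared_mutex> lock(mutex_);
>           /* swap layout, reopen files, ... */
>       }
>
>   private:
>       void writeToDevice(const std::string&) { /* format + write */ }
>       std::shared_mutex mutex_;
>   };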
>
> 3. This problem also has a simple solution: make one static
> std::stringstream object per thread using the thread_local keyword.
> Unfortunately, we found one drawback: thread_local memory is not released if
> the thread is neither detached nor joined. As there is some code which does
> neither of these, we made the static stringstream an XML-file configuration
> option. Also, there could be an issue with using multiple MessageBuffer
> instances from within a single thread, but LOG4CXX_DEBUG() does not do that.
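>
> Roughly, the idea is the one below: a minimal sketch, assuming a thread
> never interleaves two buffers (which, as noted, LOG4CXX_DEBUG() does not
> do); the helper name here is made up:
>
>   #include <sstream>
>   #include <string>
>
>   // Each thread constructs its ostringstream exactly once and then
>   // reuses it, so the constructor's internal locking is paid only on
>   // the first log statement of that thread.
>   inline std::ostringstream& threadLocalStream() {
>       thread_local std::ostringstream stream;
>       stream.str(std::string());         // drop previous contents
>       stream.clear();                    // reset error/eof flags
>       return stream;
>   }
>
>   // Hypothetical use inside the logging macro body:
>   //   std::ostringstream& oss = threadLocalStream();
>   //   oss << "value=" << value;
>   //   appender.append(oss.str());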
>
> 4. At this time we didn't do anything to address the dynamic memory
> allocation issue.
>
> So, if you want to give our patch a try, it is attached. It's based on
> log4cxx 0.10.0-12 as found in Debian. There are new SHARED_MUTEX and
> LOCK_R/LOCK_W macros defined in mutex.h and synchronized.h for easy
> switching between RW and conventional mutexes for benchmarking. Also, there
> is a test application which spawns two threads doing logging with a small
> sleep between iterations. It prints to stdout if a trace statement took more
> than 500 microseconds. It might look familiar to you because it's based on
> one of the examples from StackOverflow.
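>
> For reference, the test is shaped roughly like the sketch below (the exact
> thresholds, iteration counts and configuration differ in the attached code):
>
>   #include <log4cxx/logger.h>
>   #include <log4cxx/basicconfigurator.h>
>   #include <chrono>
>   #include <iostream>
>   #include <thread>
>
>   int main() {
>       log4cxx::BasicConfigurator::configure();   // the real test loads the async XML config
>       log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("latency.test");
>
>       auto worker = [&](int id) {
>           using clock = std::chrono::steady_clock;
>           for (int i = 0; i < 100000; ++i) {
>               auto start = clock::now();
>               LOG4CXX_DEBUG(logger, "thread " << id << " iteration " << i);
>               auto us = std::chrono::duration_cast<std::chrono::microseconds>(
>                             clock::now() - start).count();
>               if (us > 500)
>                   std::cout << "thread " << id << ": log call took " << us << " us\n";
>               std::this_thread::sleep_for(std::chrono::microseconds(50));
>           }
>       };
>
>       std::thread t1(worker, 1), t2(worker, 2);
>       t1.join();
>       t2.join();
>   }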
>
> In our testing the modified log4cxx has about 20% better performance
> overall, but more importantly for us it has far fewer cases where a log
> statement takes excessive time to complete. The second part is only true for
> CPUs with more than 2 cores, where all threads can physically run in
> parallel. Unfortunately, we still see rare cases of prolonged logging calls;
> we have traced those down to the dynamic memory allocation issue.
>
> Any thoughts?
>
> Denys Smolianiuk
>
> Senior SW Engineer
> Harmonic Inc.
>
>
--
Matt Sicker <boa...@gmail.com>