Re: [graylog2] Extractors with Wildcards Cause High CPU/Load Average

Pete GS Sun, 14 Jun 2015 18:56:51 -0700

Just wanted to post a follow-up to this...

I've finally gotten my head around Grok patterns and how to use them with 
extractors, and I have replaced all my extractors with just two that achieve 
the same set of extractions.


Load average on the Graylog servers is now in the 0.08 - 0.12 range while 
processing the same number of messages/second and extracting the exact same 
data.
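
For anyone landing on this thread later, here is a rough sketch of the idea 
Kay describes below, written in plain Python rather than Grok. The log lines, 
field names and patterns are all made up for illustration and are not the 
extractors attached to the original post. Python's regex engine is also not 
the one Graylog uses, so treat this as a picture of the approach, not a 
benchmark.

```python
import re

# Hypothetical VMkernel-style lines, invented for this example.
SAMPLE_LINES = [
    "cpu3:4242)ScsiDeviceIO: device naa.600a0b80 latency of 4500 microseconds",
    "cpu1:5678)NMP: path vmhba2:C0:T1:L4 is down",
    "cpu0:100)Unrelated heartbeat message",
]

# Old style: one regex per field, each with a leading wildcard.
# Every pattern scans the whole message again, one after the other.
SEQUENTIAL = [
    re.compile(r".*latency of (?P<latency_us>\d+) microseconds"),
    re.compile(r".*path (?P<path>vmhba\S+) is down"),
]

# New style: the alternatives are joined with | into a single expression
# with named groups, so each message is handed to one compiled pattern.
COMBINED = re.compile(
    r"latency of (?P<latency_us>\d+) microseconds"
    r"|path (?P<path>vmhba\S+) is down"
)


def drop_unset(groups):
    """Keep only the named groups that actually captured something."""
    return {name: value for name, value in groups.items() if value is not None}


def extract_sequential(line):
    """Run every single-purpose pattern against the line in turn."""
    fields = {}
    for pattern in SEQUENTIAL:
        match = pattern.match(line)
        if match:
            fields.update(drop_unset(match.groupdict()))
    return fields


def extract_combined(line):
    """Run the one combined pattern and return whichever branch matched."""
    match = COMBINED.search(line)
    return drop_unset(match.groupdict()) if match else {}


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(extract_combined(line))
```

One difference to be aware of: the combined pattern returns the fields of the 
first alternative that matches, so a message containing both kinds of data 
would only yield one of them, whereas the sequential version would yield both.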

So once again thanks Kay!

Cheers, Pete

On Saturday, 6 June 2015 06:16:54 UTC+10, Pete GS wrote:
>
> Ah thanks Kay!
>
> I've never looked into Grok patterns, but that sounds like they could help 
> a great deal.
>
> As you've pointed out in my extractors, there's only a very small number 
> of specific log lines I need to identify and these contain all the fields I 
> wish to extract relating to the potential issues, so a Grok pattern sounds 
> like a perfect solution for that.
>
> I don't think I need any data type conversions but I'm planning on 
> upgrading the test lab to 1.1 next week anyway.
>
> Thanks for your help, I have some reading to do!
>
> Cheers, Pete
>
> On Friday, 5 June 2015 16:12:21 UTC+10, Kay Röpke wrote:
>>
>> Pete, 
>>
>> The extractors themselves do not look too bad. However, whenever you 
>> use leading wildcards to extract similar data, the work that the extractors 
>> have to do is repeated, since they are executed one after the other.
>>
>> If there's no better way to extract that data, you might want to look 
>> into Grok patterns, as those will be executed "in parallel".
>> For example, if you have multiple patterns that could potentially match, 
>> and then use | to combine those patterns, they get compiled down into a 
>> single regular expression.
>> That should be faster, even though the overall expression is larger.
>>
>> The upside is that you can extract multiple named fields at once with 
>> Grok and can apply data type conversions in 1.1.
>>
>> You'll find examples in our documentation. Please note that the type 
>> conversions are a new feature in 1.1.
>>
>> Best,
>> Kay
>>
>> On Fri, Jun 5, 2015, 2:45 AM Pete GS <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I've finally discovered the source of my excess CPU load and high load 
>>> averages on my Graylog nodes!
>>>
>>> I've got a bunch of extractors that I use to pull information from my 
>>> vSphere platform's VMKernel logs.
>>>
>>> The catch with these is that a lot of items in the message string vary 
>>> quite a bit, so finding a regex to match is quite difficult (read: pretty 
>>> much impossible for my limited regex skills :)
>>>
>>> The way I've worked around this is to use wildcards in the regex strings, 
>>> and that seems to be causing my load average to go from ~0.4 to ~2 or even 
>>> more, with the CPUs regularly peaking at 100%.
>>>
>>> Is this expected behaviour?
>>>
>>> I recall an issue with earlier versions of Graylog where wildcards in 
>>> stream rules would cause this but I believe that was much improved in the 
>>> 1.0 release and I have noticed that difference. I'm running 1.0.2 at 
>>> present.
>>>
>>> Is there a similar improvement with extractors in 1.1 or is it being 
>>> worked on perhaps?
>>>
>>> I intend to put 1.1 into my test lab early next week but it doesn't see 
>>> anywhere near as many messages/sec as Production so I won't really see any 
>>> indications until I get it into Production.
>>>
>>> I've attached my current extractors.
>>>
>>> Any feedback on this would be great, and in the meantime I'll start 
>>> trying to optimise my extractors a bit more to see if I can remove some 
>>> wildcards.
>>>
>>> Cheers, Pete
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "graylog2" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

