Re: [graylog2] Extractors with Wildcards Cause High CPU/Load Average

Pete GS Fri, 05 Jun 2015 13:17:31 -0700

Ah thanks Kay!

I've never looked into Grok patterns, but that sounds like they could help 
a great deal.


As you've pointed out in my extractors, there's only a very small number of 
specific log lines I need to identify and these contain all the fields I 
wish to extract relating to the potential issues, so a Grok pattern sounds 
like a perfect solution for that.
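
For reference, the idea Kay describes in the quoted reply below (joining the alternatives with | so that a single expression does the work) can be sketched roughly as follows. This is plain Python rather than Grok, the log lines and field names (scsi_device, nmp_path) are made up for illustration, and it shows only the structure of the approach, not how Graylog performs:

```python
import re

# Hypothetical VMkernel-style lines; real messages will differ.
LINES = [
    "cpu3:1234)ScsiDeviceIO: 2338: Cmd 0x2a to dev naa.600a failed H:0x0 D:0x2",
    "cpu1:5678)NMP: nmp_ThrottleLogForDevice: path vmhba2:C0:T0:L1 is down",
    "cpu0:42)unrelated message",
]

# One pattern per field, each with a leading wildcard, run one after another.
SEPARATE = [
    re.compile(r".*to dev (?P<scsi_device>\S+) failed"),
    re.compile(r".*path (?P<nmp_path>\S+) is down"),
]

# The same alternatives joined with | into a single expression.
COMBINED = re.compile(
    r"to dev (?P<scsi_device>\S+) failed"
    r"|path (?P<nmp_path>\S+) is down"
)


def extract_separately(line):
    """Apply every pattern in turn and merge whatever fields matched."""
    fields = {}
    for pattern in SEPARATE:
        match = pattern.match(line)
        if match:
            fields.update(match.groupdict())
    return fields


def extract_combined(line):
    """Scan the line once with the combined expression."""
    match = COMBINED.search(line)
    if match is None:
        return {}
    # Groups from the alternative that did not match come back as None.
    return {name: value for name, value in match.groupdict().items()
            if value is not None}


for line in LINES:
    print(extract_combined(line))
```

Both functions return the same fields for these sample lines. The difference is that the first makes one pass over the message per pattern, while the second makes a single pass.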

I don't think I need any data type conversions but I'm planning on 
upgrading the test lab to 1.1 next week anyway.

Thanks for your help, I have some reading to do!

Cheers, Pete

On Friday, 5 June 2015 16:12:21 UTC+10, Kay Röpke wrote:
>
> Pete, 
>
> The extractors themselves do not look too bad, but whenever you 
> use leading wildcards to extract similar data, the work that the extractors 
> have to do is repeated, since they are executed one after the other.
>
> If there's no better way to extract that data, you might want to look into 
> Grok patterns, as those will be executed "in parallel".
> For example, if you have multiple patterns that could potentially match, 
> and then use | to combine those patterns, they get compiled down into a 
> single regular expression.
> That should be faster, even though the overall expression is larger.
>
> The upside is that you can extract multiple named fields at once with Grok 
> and can apply data type conversions in 1.1.
>
> You'll find examples in our documentation. Please note that the type 
> conversions are a new feature in 1.1.
>
> Best,
> Kay
>
> On Fri, Jun 5, 2015, 2:45 AM Pete GS <[email protected]> wrote:
>
>> Hi all,
>>
>> I've finally discovered the source of my excess CPU load and high load 
>> averages on my Graylog nodes!
>>
>> I've got a bunch of extractors that I use to pull information from my 
>> vSphere platform's VMKernel logs.
>>
>> The catch with these is that a lot of items in the message string vary 
>> quite a bit, so finding a regex to match is quite difficult... read: pretty 
>> much impossible for my limited regex skills :)
>>
>> The way I've worked around this is to use wildcards in the regex strings, 
>> and that seems to be causing my load average to go from ~0.4 to ~2 or even 
>> more, and the CPUs regularly peak at 100%.
>>
>> Is this expected behaviour?
>>
>> I recall an issue with earlier versions of Graylog where wildcards in 
>> stream rules would cause this but I believe that was much improved in the 
>> 1.0 release and I have noticed that difference. I'm running 1.0.2 at 
>> present.
>>
>> Is there a similar improvement with extractors in 1.1 or is it being 
>> worked on perhaps?
>>
>> I intend to put 1.1 into my test lab early next week but it doesn't see 
>> anywhere near as many messages/sec as Production so I won't really see any 
>> indications until I get it into Production.
>>
>> I've attached my current extractors.
>>
>> Any feedback on this would be great, and in the meantime I'll start 
>> trying to optimise my extractors a bit more to see if I can remove some 
>> wildcards.
>>
>> Cheers, Pete
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "graylog2" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>

