Ah thanks Kay! I've never looked into Grok patterns, but that sounds like they could help a great deal.
As you've pointed out in my extractors, there's only a very small number of specific log lines I need to identify, and these contain all the fields I wish to extract relating to the potential issues, so a Grok pattern sounds like a perfect solution for that.

I don't think I need any data type conversions, but I'm planning on upgrading the test lab to 1.1 next week anyway.

Thanks for your help, I have some reading to do!

Cheers, Pete

On Friday, 5 June 2015 16:12:21 UTC+10, Kay Röpke wrote:
>
> Pete,
>
> The extractors themselves do not look too bad, but whenever you
> use leading wildcards to extract similar data, the work that the extractors
> have to do is repeated, since they are executed one after the other.
>
> If there's no better way to extract that data, you might want to look into
> Grok patterns, as those will be executed "in parallel".
> For example, if you have multiple patterns that could potentially match,
> and then use | to combine those patterns, they get compiled down into a
> single regular expression.
> That should be faster, even though the overall expression is larger.
>
> The upside is that you can extract multiple named fields at once with Grok
> and can apply data type conversions in 1.1.
>
> You'll find examples in our documentation. Please note that the type
> conversions are a new feature in 1.1.
>
> Best,
> Kay
>
> On Fri, Jun 5, 2015, 2:45 AM Pete GS <[email protected]> wrote:
>
>> Hi all,
>>
>> I've finally discovered the source of my excess CPU load and high load
>> averages on my Graylog nodes!
>>
>> I've got a bunch of extractors that I use to pull information from my
>> vSphere platform's VMKernel logs.
>>
>> The catch with these is that a lot of items in the message string vary
>> quite a bit, so finding a regex to match is quite difficult... read: pretty
>> much impossible for my limited regex skills :)
>>
>> The way I've worked around this is to use wildcards in the regex strings
>> and that seems to be causing my load average to go from ~0.4 to ~2 or even
>> more and the CPUs regularly peak at 100%.
>>
>> Is this expected behaviour?
>>
>> I recall an issue with earlier versions of Graylog where wildcards in
>> stream rules would cause this but I believe that was much improved in the
>> 1.0 release and I have noticed that difference. I'm running 1.0.2 at
>> present.
>>
>> Is there a similar improvement with extractors in 1.1 or is it being
>> worked on perhaps?
>>
>> I intend to put 1.1 into my test lab early next week but it doesn't see
>> anywhere near as many messages/sec as Production so I won't really see any
>> indications until I get it into Production.
>>
>> I've attached my current extractors.
>>
>> Any feedback on this would be great, and in the meantime I'll start
>> trying to optimise my extractors a bit more to see if I can remove some
>> wildcards.
>>
>> Cheers, Pete
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "graylog2" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
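
A rough sketch of the alternation idea Kay describes in the quoted reply: several patterns joined with | and compiled into one expression that pulls out named fields. This is plain Python `re`, not Graylog's Grok engine, and the message formats, patterns, and field names are all made up for illustration; they are not real VMkernel log formats.

```python
import re

# Hypothetical patterns and field names, invented for illustration only.
# Python's re module does not allow the same group name in two branches,
# so each branch uses its own names.
PATTERNS = [
    r"latency of (?P<latency_ms>\d+) ms on device (?P<latency_device>\S+)",
    r"lost path to device (?P<lost_device>\S+)",
]

# Join the branches with | and compile once, so each message is handed to
# a single expression instead of one extractor after another.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))


def extract(message):
    """Return the named fields from the branch that matched, or {}."""
    match = COMBINED.search(message)
    if match is None:
        return {}
    # Groups that belong to branches which did not match are None; drop them.
    return {k: v for k, v in match.groupdict().items() if v is not None}


if __name__ == "__main__":
    print(extract("warning: latency of 250 ms on device naa.123 detected"))
    print(extract("lost path to device naa.456"))
```

Using `search` on specific literal text avoids the need for leading `.*` wildcards. Whether the combined form is actually faster depends on the regex engine and the patterns, so it is worth measuring against real traffic.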
