Hi Dustin,
That's great information. Thanks for posting such a detailed response. Both
of you have given me some ideas for things down the road.
Regards,
Brandon
On Sunday, December 11, 2016 at 1:35:06 PM UTC-6, Dustin Tennill wrote:
>
> The feedback on this post has been excellent - I had forgotten about the
> possibility of ram disks as a solution.
>
> We don't have the budget to dedicate large nodes to es/gl, and decided to
> use a combo of older workstations as "slow" nodes and VMs on SSD as the
> "fast" nodes. The workstation nodes worked out fairly well, with enough
> room for three large SATA disks and 16g of ram. The original plan was to
> configure the workstation nodes and see if they have enough gas to support
> our incoming logs, then make SSD decisions once we had better data. After
> spending a few days troubleshooting full output buffers on the graylog
> side, we decided to go ahead and implement two SSD nodes to help with
> incoming messages.
>
> Anyway, hardware in use:
> Two VM Graylog Servers (2.1.2)
> -- Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 16 cores
> -- 40G Ram, 20G Heap
>
> Ten ElasticSearch Nodes (2.4.2)
> - Six Workstation nodes
> -- Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
> -- 16g of ram (12G heap)
> -- 18T raw space (three 6T drives)
> - Two VM nodes with SSD backed storage
> -- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
> -- 16g of ram, 12G heap
> -- 500g of SSD usable
> - One dedicated master node
> -- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
> -- 16g of ram
>
> We understand the heap should be about half our ram on the ES nodes, but
> at 8g ES would crash fairly often. 12g was the sweet spot for us.
> We handle 5000-6000 msgs per second average today, expecting this to be
> about 15,000 per second when the project is completed.
>
> Graylog settings (not the whole set, just what we routinely look at first
> when troubleshooting throughput):
> output_batch_size = 800
> output_flush_interval = 1
> processbuffer_processors = 12
> outputbuffer_processors = 16
> processor_wait_strategy = blocking
> ring_size = 1048576
> inputbuffer_ring_size = 65536
> inputbuffer_processors = 4
> inputbuffer_wait_strategy = blocking
>
> Elasticsearch settings:
>
> ## SSD Nodes
> cluster.name: mycluster
> node.name: ssdnode1
> node.master: false
> node.data: true
> node.box_type: ssd
> path.data: /elasticdata1,/elasticdata2
> path.conf: /etc/elasticsearch
> bootstrap.mlockall: true
> network.host: 192.168.2.2
> transport.tcp.port: 9300
> http.port: 9200
>
> discovery.zen.ping.multicast.enabled: false
> discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
> discovery.zen.ping.timeout: 60s
> discovery.zen.ping.retries: 6
> discovery.zen.ping.interval: 5s
>
> threadpool.bulk.type: fixed
> threadpool.bulk.size: 2
> threadpool.bulk.queue_size: 500
> indices.store.throttle.max_bytes_per_sec: "100mb" ## Allow SSD nodes
> to write faster
>
> ## Workstation Nodes
> cluster.name: mycluster
> node.name: slownode1
> node.master: false
> node.data: true
> node.box_type: slow
> path.data: /elasticdata1,/elasticdata2,/elasticdata3
> path.conf: /etc/elasticsearch
> bootstrap.mlockall: true
> network.host: 192.168.2.5
> transport.tcp.port: 9300
> http.port: 9200
> discovery.zen.ping.multicast.enabled: false
> discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
> discovery.zen.ping.timeout: 60s
> discovery.zen.ping.retries: 6
> discovery.zen.ping.interval: 5s
> index.refresh_interval: 60s  ## This made our workstation nodes work well.
> indices.fielddata.cache.size: 5%
> threadpool.bulk.type: fixed
> threadpool.bulk.size: 2
> threadpool.bulk.queue_size: 500
>
> To achieve hot-warm, we tagged the fast nodes as "ssd" and added an index
> template to elasticsearch so all new data would be created on those nodes.
> Then we installed curator on our master node, and added a crontab entry
> that runs a bash script each night.
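> For anyone reproducing this, the crontab entry could look something like the following (the script path and run time are just examples, not what we actually named ours):

```shell
# Run the curator allocation script nightly at 01:00 as the
# user that owns the curator install. Edit with `crontab -e`.
0 1 * * * /usr/local/bin/move-old-indices-to-slow.sh
```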
>
> To create the template:
>
> curl -XPUT elasticsearch_node_in_your_cluster:9200/_template/graylog_1 -d '{
>   "template": "graylog2*",
>   "settings": {
>     "index.routing.allocation.require.box_type": "ssd"
>   }
> }'
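> If you want to confirm the template took effect before relying on it, a quick read-back works (host and port here are just the ones used elsewhere in this thread; adjust for your cluster):

```shell
# Fetch the template back from the cluster and eyeball the
# routing allocation setting; requires a reachable ES node.
curl -XGET '192.168.2.10:9200/_template/graylog_1?pretty'
```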
>
> Contents of the bash script that runs curator:
> #!/bin/bash
> curator_cli --logfile /var/log/curator.log --loglevel INFO --logformat default \
>   --host 192.168.2.10 --port 9200 \
>   allocation --key box_type --value slow \
>   --filter_list '{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":2}'
>
> Roughly: find all indices more than two days old and update their allocation
> setting so their shards must now be located on "slow" storage.
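> Under the hood, curator is just updating each matching index's allocation filter. If you ever need to move a single index by hand, the equivalent direct API call would be something like this (the index name here is illustrative):

```shell
# Re-tag one index so ES relocates its shards onto nodes whose
# elasticsearch.yml sets node.box_type: slow. Relocation happens
# in the background after the setting is applied.
curl -XPUT '192.168.2.10:9200/graylog2_42/_settings' -d '{
  "index.routing.allocation.require.box_type": "slow"
}'
```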
>
> I would love to see hot-warm be handled by Graylog itself - it is a little
> tedious.
>
> Dustin Tennill
> Eastern Kentucky University
>
>
> On Saturday, December 3, 2016 at 10:13:51 AM UTC-5, Dustin Tennill wrote:
>>
>> All,
>>
>> We just finished implementing
>> https://www.elastic.co/blog/hot-warm-architecture for our
>> Graylog environment. After weeks of troubleshooting elasticsearch
>> performance issues with our budget ES nodes, the addition of two small
>> SSD nodes REALLY made a difference. Our output buffers had been filling up
>> from time to time, and this appears to have resolved that issue.
>>
>> If anyone is interested, we will post our config information.
>>
>> Dustin Tennill
>> EKU
>>
>>
--
You received this message because you are subscribed to the Google Groups
"Graylog Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/graylog2/0bc5f27a-2bf9-4c43-8903-1a9d75d2ded8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.