Hi Dustin,

That's great information. Thanks for posting such a detailed response. Both 
of you have given me some ideas for things down the road.

Regards,
Brandon

On Sunday, December 11, 2016 at 1:35:06 PM UTC-6, Dustin Tennill wrote:
>
> The feedback on this post has been excellent - I had forgotten about the 
> possibility of ram disks as a solution. 
>
> We don't have the budget to dedicate large nodes to es/gl, and decided to 
> use a combo of older workstations as "slow" nodes and VMs on SSD as the 
> "fast" nodes. The workstation nodes worked out fairly well, with enough 
> room for three large SATA disks and 16g of ram. The original plan was to 
> configure the workstation nodes and see if they have enough gas to support 
> our incoming logs, then make SSD decisions once we had better data. After 
> spending a few days troubleshooting full output buffers on the graylog 
> side, we decided to go ahead and implement two SSD nodes to help with 
> incoming messages. 
>
> Anyway, hardware in use: 
> Two VM Graylog Servers (2.1.2)
> -- Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 16 cores 
> -- 40G Ram, 20G Heap
>
> Ten ElasticSearch Nodes (2.4.2)
> - Six Workstation nodes
> -- Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
> -- 16g of ram (12G heap)
> -- 18T raw space (three 6T drives)
> - Two VM nodes with SSD backed storage
> -- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
> -- 16g of ram, 12G heap
> -- 500g of SSD usable
> - One dedicated master node
> -- Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
> -- 16g of ram
>
> We understand the heap should be about half our ram on the ES nodes, but 
> at 8g ES would crash fairly often. 12g was the sweet spot for us. 
> We handle 5,000-6,000 messages per second on average today, and expect 
> about 15,000 per second when the project is completed. 
>
> Graylog settings (not the whole set, just what we routinely look at first 
> when troubleshooting throughput):
>     output_batch_size = 800
>     output_flush_interval = 1
>     processbuffer_processors = 12 
>     outputbuffer_processors = 16
>     processor_wait_strategy = blocking
>     ring_size = 1048576 
>     inputbuffer_ring_size = 65536
>     inputbuffer_processors = 4
>     inputbuffer_wait_strategy = blocking
>
> Elasticsearch settings: 
>
> ## SSD Nodes
>     cluster.name: mycluster
>     node.name: ssdnode1
>     node.master: false
>     node.data: true
>     node.box_type: ssd 
>     path.data: /elasticdata1,/elasticdata2
>     path.conf: /etc/elasticsearch
>     bootstrap.mlockall: true
>     network.host: 192.168.2.2
>     transport.tcp.port: 9300
>     http.port: 9200
>
>     discovery.zen.ping.multicast.enabled: false
>     discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
>     discovery.zen.ping.timeout: 60s
>     discovery.zen.ping.retries: 6
>     discovery.zen.ping.interval: 5s
>
>     threadpool.bulk.type: fixed
>     threadpool.bulk.size: 2
>     threadpool.bulk.queue_size: 500  
>     indices.store.throttle.max_bytes_per_sec: "100mb" ## Allow SSD nodes 
> to write faster
>
> ## Workstation Nodes
>     cluster.name: mycluster
>     node.name: slownode1
>     node.master: false 
>     node.data: true 
>     node.box_type: slow 
>     path.data: /elasticdata1,/elasticdata2,/elasticdata3
>     path.conf: /etc/elasticsearch
>     bootstrap.mlockall: true
>     network.host: 192.168.2.5
>     transport.tcp.port: 9300
>     http.port: 9200
>     discovery.zen.ping.multicast.enabled: false
>     discovery.zen.ping.unicast.hosts: 192.168.2.10:9300
>     discovery.zen.ping.timeout: 60s
>     discovery.zen.ping.retries: 6
>     discovery.zen.ping.interval: 5s
>     index.refresh_interval: 60s ## This made our workstation nodes work 
> well. 
>     indices.fielddata.cache.size: 5%
>     threadpool.bulk.type: fixed
>     threadpool.bulk.size: 2
>     threadpool.bulk.queue_size: 500
>
> To achieve hot-warm, we tagged the fast nodes as "ssd" and added an index 
> template to elasticsearch so all new data would be created on those nodes. 
> Then we installed curator on our master node, and added a crontab entry 
> that runs a bash script each night. 
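> For reference, the crontab entry might look something like this (the 
> script path and schedule here are assumptions, not our exact setup):

```shell
# Run the curator allocation script every night at 01:30
# (script path is hypothetical -- adjust to wherever the script lives)
30 1 * * * /usr/local/bin/curator-allocate.sh >> /var/log/curator-cron.log 2>&1
```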
>
> To create the template: 
>
> curl -XPUT elasticsearch_node_in_your_cluster:9200/_template/graylog_1 -d 
> '{
>   "template": "graylog2*",
>   "settings": {
>     "index.routing.allocation.require.box_type": "ssd"
>   }
> }'
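> One easy way to catch typos before PUTting the template is to validate 
> the JSON body locally (assumes python3 is available on the box):

```shell
# Validate the template body is well-formed JSON before sending it to ES
# (catches missing commas/quotes that would make the PUT fail)
python3 -m json.tool > /dev/null <<'EOF' && echo "template JSON OK"
{
  "template": "graylog2*",
  "settings": {
    "index.routing.allocation.require.box_type": "ssd"
  }
}
EOF
```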
>
> Contents of the bash script that runs curator: 
> #!/bin/bash
> curator_cli --logfile /var/log/curator.log --loglevel INFO --logformat 
> default --host 192.168.2.10 --port 9200 allocation --key box_type --value 
> slow --filter_list 
> '{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":2}'
>
> Roughly: find all indices more than two days old and re-tag them so their 
> shards must now be allocated to "slow" storage. 
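> To spot-check that the nightly run actually moved data, you can look at 
> where each index's shards have landed (using the master node address 
> from the config above; adjust for your cluster):

```shell
# Show shard-to-node placement; indices older than two days should be
# sitting on the workstation ("slow") nodes after the script has run
curl -XGET 192.168.2.10:9200/_cat/shards?v
```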
>
> I would love to see hot-warm be handled by Graylog itself - it is a little 
> tedious. 
>
> Dustin Tennill
> Eastern Kentucky University
>
>
> On Saturday, December 3, 2016 at 10:13:51 AM UTC-5, Dustin Tennill wrote:
>>
>> All,
>>
>> We just finished implementing 
>> https://www.elastic.co/blog/hot-warm-architecture for our 
>> Graylog environment. After weeks of troubleshooting Elasticsearch 
>> performance issues with our budget ES nodes, the addition of two small 
>> SSD nodes REALLY made a difference. Our output buffers had been filling up 
>> from time to time, and this appears to have resolved that issue. 
>>
>> If anyone is interested, we will post our config information. 
>>
>> Dustin Tennill
>> EKU
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Graylog Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/graylog2/0bc5f27a-2bf9-4c43-8903-1a9d75d2ded8%40googlegroups.com.