If you would like to do it in a more dynamic way, you can also use a
service registry/key-value store.
For example, the configuration could be stored in Consul and the servers
(namenode, datanode) could be started with consul-template
(https://github.com/hashicorp/consul-template).
When the configuration changes, the servers are refreshed and
restarted automatically.
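To make it concrete, a minimal consul-template sketch (the key names,
paths and the service name are made up here, just to show the idea):
a template file core-site.xml.ctmpl like

  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>{{ key "hadoop/core-site/fs.defaultFS" }}</value>
    </property>
  </configuration>

and on every node

  consul-template -template \
    "core-site.xml.ctmpl:/etc/hadoop/conf/core-site.xml:systemctl restart hadoop-namenode"

consul-template watches the key in Consul, rewrites the file and runs
the restart command whenever the value changes.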
Marton
PS: I use a very similar (but much more complex) approach when I run
Hadoop in the cloud.
1. I provision the VMs with Terraform.
2. After that I install the basic infrastructure (e.g. Consul, Nomad
servers and Weave Scope monitoring) with Ansible. The inventory file is
generated from the Terraform state file.
3. I start the Hadoop daemons (e.g. the namenode) from Docker.
Containers are scheduled with Nomad (job definitions:
http://github.com/flokkr/runtime-nomad, generic docs about the
containers: https://github.com/flokkr/flokkr). A job sketch follows
this list.
4. Configuration is stored in a git repository
(https://github.com/flokkr/configuration) in a simplified format. During
a preprocessing step it is uploaded to Consul in its final form. It also
supports profiles: for example, I can easily switch between HA and
non-HA configuration with a single flag. (A small upload sketch also
follows this list.)
5. A consul-template-like script (but simpler:
https://github.com/elek/consul-launcher) is part of my Docker images
(https://github.com/flokkr/docker-baseimage). It listens for changes in
Consul and restarts the servers when the configuration changes.
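To make step 3 concrete, a Nomad job definition for the namenode could
look roughly like this (the image name and resource numbers are
illustrative, not the actual flokkr values; see the runtime-nomad repo
for the real ones):

  job "namenode" {
    datacenters = ["dc1"]

    group "namenode" {
      count = 1

      task "namenode" {
        driver = "docker"

        config {
          # illustrative image name, not the real flokkr one
          image        = "flokkr/hadoop:latest"
          network_mode = "host"
        }

        resources {
          cpu    = 500   # MHz
          memory = 1024  # MB
        }
      }
    }
  }

Nomad then schedules the container on one of the clients and restarts
it if it dies.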
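And for step 4, the upload itself is just writing key-value pairs into
the Consul KV store. In Python (with the python-consul package) it is a
few lines; the input file format and the key prefix below are my
assumptions for the sketch, not the actual flokkr layout:

  import consul

  c = consul.Consul()  # talks to the local Consul agent on port 8500

  # one "name=value" Hadoop property per line,
  # e.g. fs.defaultFS=hdfs://namenode:9000
  with open("core-site.properties") as f:
      for line in f:
          line = line.strip()
          if line and not line.startswith("#") and "=" in line:
              name, value = line.split("=", 1)
              c.kv.put("hadoop/core-site/" + name, value)

Every node watching that prefix (step 5) picks up the change
immediately.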
There are many small pieces, so it is most probably a more complex
solution than what you need. But if you are familiar with these small
tools, it's not so hard to build a low-level (and lightning-fast)
configuration-management/service-registry solution from the existing
devops tools.
On 09/22/2017 12:42 PM, Sanel Zukan wrote:
Hi,
For this number of nodes, I'd go with automation tools like
Ansible[1]/Puppet[2]/Rex[3]. They can install the necessary packages,
set up /etc/hosts and apply per-node settings.
Ansible has a nice playbook
(https://github.com/analytically/hadoop-ansible) you can start with, and
Puppet isn't short of options either (https://forge.puppet.com/tags/hadoop).
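For example, with a minimal inventory file (the hostnames are just
placeholders):

  [namenodes]
  nn1.example.com

  [datanodes]
  dn[01:20].example.com

a single run of

  ansible-playbook -i hosts site.yml

(site.yml standing in for whatever playbook you use) configures every
node, and adding a machine later is just one new inventory line.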
Best,
Sanel
[1] https://ansible.com
[2] https://puppet.com
[3] https://rexify.org
"Zaki SEc." <[email protected]> writes:
[I am sorry if this mail was sent twice; it was not intentional.]
Hi!
I'm fairly new to Hadoop, but I've been browsing the documentation and
how-tos for some time now.
My question is as follows: how can one set up a cluster where the
nodes aren't static?
What I mean is, I want to be able to run a cluster of, say, 20 machines,
where each node has Hadoop installed and they 'recognize' each other,
saving me from having to manually set their hostnames and configure
their '/etc/hosts' files.
I did look into Apache Ambari, hoping that it would give me an easy
solution to the above problem, but it does not support Ubuntu 16.04,
which I have to work with, and it failed to build for various reasons.
I have also looked into Cloudera's CDH distribution (the manual
installation), but it has the same problem: it asks me to manually
configure these settings for each node.
It seemed to me that "Rack Awareness" could potentially solve my
problem, but after some reading I realized that it is for something
entirely different.
So now it looks like I'm out of options.
Lately, I have been wondering about writing an external script that
would update the settings on each node automatically, based on one
central 'list' hosted on, for example, the NameNode. While this isn't
nearly on the level of a real dynamic setup, it would make my job
significantly easier.
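Something like this rough sketch is what I have in mind (Python; the
URL and the marker comment are of course placeholders):

  #!/usr/bin/env python3
  import urllib.request

  # central list served from the NameNode, one "ip hostname" pair per line
  LIST_URL = "http://namenode.example.com/cluster-hosts.txt"
  MARKER = "# hadoop-cluster"

  entries = urllib.request.urlopen(LIST_URL).read().decode().splitlines()

  # drop the previously managed lines, then append the fresh ones
  with open("/etc/hosts") as f:
      kept = [line for line in f if MARKER not in line]
  with open("/etc/hosts", "w") as f:
      f.writelines(kept)
      for entry in entries:
          if entry.strip():
              f.write("%s %s\n" % (entry.strip(), MARKER))

Each node would then run it from cron (or on some trigger) against the
central list.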
Thanks in advance,
Zaki
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]