If you would like to do it in a more dynamic way, you can also use a
service registry/key-value store.
For example, the configuration could be stored in Consul and the servers
(namenode, datanode) could be started with consul-template
(https://github.com/hashicorp/consul-template).
When the configuration changes, the servers are refreshed and
restarted automatically.
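To make it concrete, a minimal consul-template sketch (the key names,
paths and the service name are made up here, just to show the idea):
a template file core-site.xml.ctmpl like

  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>{{ key "hadoop/core-site/fs.defaultFS" }}</value>
    </property>
  </configuration>

and on every node

  consul-template -template \
    "core-site.xml.ctmpl:/etc/hadoop/conf/core-site.xml:systemctl restart hadoop-namenode"

consul-template watches the key in Consul, rewrites the file and runs
the restart command whenever the value changes.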
Marton
PS: I use a very similar (but much more complex) approach when I run
Hadoop in the cloud.
1. I provision the VMs with Terraform.
2. After that I install the basic infrastructure (e.g. Consul, Nomad
servers and Weave Scope monitoring) with Ansible. The inventory file is
generated from the Terraform state file.
3. I start the Hadoop daemons (e.g. the namenode) from Docker.
Containers are scheduled with Nomad (job definitions:
http://github.com/flokkr/runtime-nomad, generic docs about the
containers: https://github.com/flokkr/flokkr). A job sketch follows
this list.
4. Configuration is stored in a git repository
(https://github.com/flokkr/configuration) in a simplified format. During
a preprocessing step it is uploaded to Consul in its final form. It also
supports profiles: for example, I can easily switch between HA and
non-HA configuration with a single flag. (A small upload sketch also
follows this list.)
5. A consul-template-like script (but simpler:
https://github.com/elek/consul-launcher) is part of my Docker images
(https://github.com/flokkr/docker-baseimage). It listens for changes in
Consul and restarts the servers when the configuration changes.
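To make step 3 concrete, a Nomad job definition for the namenode could
look roughly like this (the image name and resource numbers are
illustrative, not the actual flokkr values; see the runtime-nomad repo
for the real ones):

  job "namenode" {
    datacenters = ["dc1"]

    group "namenode" {
      count = 1

      task "namenode" {
        driver = "docker"

        config {
          # illustrative image name, not the real flokkr one
          image        = "flokkr/hadoop:latest"
          network_mode = "host"
        }

        resources {
          cpu    = 500   # MHz
          memory = 1024  # MB
        }
      }
    }
  }

Nomad then schedules the container on one of the clients and restarts
it if it dies.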
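And for step 4, the upload itself is just writing key-value pairs into
the Consul KV store. In Python (with the python-consul package) it is a
few lines; the input file format and the key prefix below are my
assumptions for the sketch, not the actual flokkr layout:

  import consul

  c = consul.Consul()  # talks to the local Consul agent on port 8500

  # one "name=value" Hadoop property per line,
  # e.g. fs.defaultFS=hdfs://namenode:9000
  with open("core-site.properties") as f:
      for line in f:
          line = line.strip()
          if line and not line.startswith("#") and "=" in line:
              name, value = line.split("=", 1)
              c.kv.put("hadoop/core-site/" + name, value)

Every node watching that prefix (step 5) picks up the change
immediately.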
There are many small pieces, so it is most probably a more complex
solution than what you need. But if you are familiar with these small
tools, it's not so hard to build a low-level (and lightning-fast)
configuration-management/service-registry solution from the existing
devops tools.
On 09/22/2017 12:42 PM, Sanel Zukan wrote:
Hi,
For this number of nodes, I'd go with automation tools like
Ansible[1]/Puppet[2]/Rex[3]. They can install the necessary packages,
set up /etc/hosts and apply per-node settings.
Ansible has a nice playbook
(https://github.com/analytically/hadoop-ansible) you can start with, and
Puppet isn't short of options either (https://forge.puppet.com/tags/hadoop).
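For example, with a minimal inventory file (the hostnames are just
placeholders):

  [namenodes]
  nn1.example.com

  [datanodes]
  dn[01:20].example.com

a single run of

  ansible-playbook -i hosts site.yml

(site.yml standing in for whatever playbook you use) configures every
node, and adding a machine later is just one new inventory line.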
Best,
Sanel
[1] https://ansible.com
[2] https://puppet.com
[3] https://rexify.org
"Zaki SEc." <[email protected]> writes:
[I am sorry if this mail was sent twice; it was not intentional.]
Hi!
I'm fairly new to Hadoop, but I've been browsing the documentation and
how-tos for some time now.
My question is as follows: how can one set up a cluster where the
nodes aren't static?
What I mean is, I want to be able to run a cluster of, say, 20 machines,
where each node has Hadoop installed and they 'recognize' each other,
saving me from having to manually set their hostnames and configure
their '/etc/hosts' files.
I did look into Apache Ambari, hoping that it would give me an easy
solution to the above problem, but it does not support Ubuntu 16.04,
which I have to work with, and it failed to build for various reasons.
I have also looked into Cloudera's CDH distribution (the manual
installation), but it has the same problem: it asks me to manually
configure these settings for each node.
It seemed to me that "Rack Awareness" could potentially solve my
problem, but after some reading I realized that it is for something
entirely different.
So now it looks like I'm out of options.
Lately, I have been wondering about writing an external script that
would update the settings on each node automatically, based on one
central 'list' hosted on, for example, the NameNode. While this isn't
nearly on the level of a real dynamic setup, it would make my job
significantly easier.
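Something like this rough sketch is what I have in mind (Python; the
URL and the marker comment are of course placeholders):

  #!/usr/bin/env python3
  import urllib.request

  # central list served from the NameNode, one "ip hostname" pair per line
  LIST_URL = "http://namenode.example.com/cluster-hosts.txt"
  MARKER = "# hadoop-cluster"

  entries = urllib.request.urlopen(LIST_URL).read().decode().splitlines()

  # drop the previously managed lines, then append the fresh ones
  with open("/etc/hosts") as f:
      kept = [line for line in f if MARKER not in line]
  with open("/etc/hosts", "w") as f:
      f.writelines(kept)
      for entry in entries:
          if entry.strip():
              f.write("%s %s\n" % (entry.strip(), MARKER))

Each node would then run it from cron (or on some trigger) against the
central list.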
Thanks in advance,
Zaki
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]