Hi Konstantinos,

Nice documentation! Wishing you every success in expanding to Hadoop HA mode.

I'd say the JournalNodes should be co-located on machines that run other Hadoop master daemons, for example the NameNodes, the YARN ResourceManager, etc. These machines are attractive because they are already well provisioned and see little unpredictable user activity. Their daemons are also generally light on disk usage compared to worker nodes (DataNode, NodeManager, etc.). In general, dedicating a disk drive on each of these machines to the JournalNode helps avoid disk spindle contention with the other daemons.

Sorry, I don't have any reports with me right now. Perhaps others on the list can chime in with performance benchmark results, if they have any.

For the ZooKeeper server, you can refer to the http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html and https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview pages.

Thanks,
Rakesh

On Fri, Aug 12, 2016 at 5:56 PM, Konstantinos Tsakalozos wrote:
> + the hadoop list
>
> On Fri, Aug 12, 2016 at 3:25 PM, Konstantinos Tsakalozos wrote:
>
>> Hi Rakesh,
>>
>> Thank you for your prompt reply.
>>
>> In the Juju big data team we bundle Hadoop and a set of "peripheral"
>> helper services so that any interested user can easily deploy the full
>> environment in an automated way. The deployment bundle looks like this:
>> https://jujucharms.com/hadoop-processing/
>> On the right side of the bundle you see a client service that can be
>> replaced with any other service the user wishes (e.g. Hive, Pig, etc.). We
>> also decided to go with Ganglia and rsyslog for monitoring. Would you
>> like to see anything more there? In the next release we will be adding
>> Apache ZooKeeper, which will give us HA, and this is why I am asking where
>> it would be best to place the JournalNodes.
>>
>> In our case it would be preferable to "waste" one more "namenode"
>> machine (a machine is a "unit" in Juju terminology) to run the third
>> JournalNode by itself. The deployment would be cleaner and easier to reach.
>>
>> Also, I very much appreciate your advice on dedicated storage. Are there
>> any performance benchmarks showing what bandwidth we can sustain with
>> shared vs. dedicated storage for the JournalNodes?
>>
>> Thank you,
>> Konstantinos
>>
>> On Fri, Aug 12, 2016 at 2:26 PM, Rakesh Radhakrishnan wrote:
>>
>>> Hi Konstantinos,
>>>
>>> The typical deployment is three JournalNodes (JNs), with two of the three
>>> JNs collocated on the machines where the two NameNodes (NNs) are running.
>>> The third one can be deployed to a machine where a ZK server is running
>>> (assuming the ZK ensemble has 3 nodes). I'd recommend a dedicated disk for
>>> each JN server to use as the edit log path, since edit logs are written
>>> continuously.
>>>
>>> It would be helpful if you could give more details about your Hadoop
>>> cluster size and components, including the ZK service, etc.
>>>
>>> Thanks,
>>> Rakesh
>>>
>>> On Fri, Aug 12, 2016 at 3:12 PM, Konstantinos Tsakalozos wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> In an HA setup, do you tend to co-host the JournalNode service with other
>>>> services instead of placing it on separate dedicated machines? If so,
>>>> which services do you pack together?
>>>>
>>>> Thank you,
>>>> Konstantinos
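For anyone wiring up the layout Rakesh describes (three JournalNodes, two on the NameNode machines and one on a ZooKeeper server, each with a dedicated edits disk), here is a minimal Python sketch of how that placement maps onto the `qjournal://` value of `dfs.namenode.shared.edits.dir`, plus how many JournalNode failures a given count tolerates. Assumptions, all hypothetical: the hostnames `nn1.example.com`, `nn2.example.com`, `zk1.example.com`, the journal ID `mycluster`, the `/data/jn` mount, and the `JournalNodePlacement` type and helper names. Port 8485 is the default JournalNode RPC port. The sketch only builds strings and does not contact Hadoop.

```python
from dataclasses import dataclass

# Default JournalNode RPC port (dfs.journalnode.rpc-address defaults to 0.0.0.0:8485).
DEFAULT_JN_PORT = 8485


@dataclass(frozen=True)
class JournalNodePlacement:
    """Hypothetical record of where one JournalNode runs."""
    host: str        # machine hosting the JournalNode
    co_located: str  # master daemon already running on that machine
    edits_dir: str   # dedicated-disk path intended for dfs.journalnode.edits.dir


def shared_edits_uri(placements, journal_id, port=DEFAULT_JN_PORT):
    """Build the qjournal:// URI used for dfs.namenode.shared.edits.dir."""
    hosts = ";".join(f"{p.host}:{port}" for p in placements)
    return f"qjournal://{hosts}/{journal_id}"


def tolerated_failures(n_journal_nodes):
    """Each edit needs a majority of JNs, so N JNs survive (N - 1) // 2 failures."""
    if n_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return (n_journal_nodes - 1) // 2


if __name__ == "__main__":
    # Two JNs next to the NameNodes, the third next to a ZooKeeper server.
    placements = [
        JournalNodePlacement("nn1.example.com", "NameNode", "/data/jn"),
        JournalNodePlacement("nn2.example.com", "NameNode", "/data/jn"),
        JournalNodePlacement("zk1.example.com", "ZooKeeper server", "/data/jn"),
    ]
    print(shared_edits_uri(placements, "mycluster"))
    print(tolerated_failures(len(placements)))
```

Run as a script, this prints `qjournal://nn1.example.com:8485;nn2.example.com:8485;zk1.example.com:8485/mycluster` and `1`. The majority rule is why a third JournalNode is worth placing even though there are only two NameNodes: three JNs survive one failure, whereas four would still survive only one, so odd counts are the usual choice.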
