My first suggestion would be to run the JournalNodes on separate machines, possibly alongside the NameNodes. If that is not possible, at least give the JournalNode process dedicated disks that are not shared with your DataNode process.
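To check whether an existing host already has this problem, you can compare the device IDs of the JournalNode edits directory and the DataNode data directories. Below is a minimal Python sketch. It assumes you take the paths from `dfs.journalnode.edits.dir` and `dfs.datanode.data.dir` in hdfs-site.xml, and the function names are hypothetical helpers, not part of Hadoop. One caveat: `st_dev` identifies the filesystem, not the physical disk. Two partitions on the same spindle report different IDs but still compete for I/O, so this catches the shared-mount-point case but not every contention case.

```python
import os
import shutil
from typing import Iterable, List


def shared_device_dirs(edits_dir: str, data_dirs: Iterable[str]) -> List[str]:
    """Return the data dirs that sit on the same filesystem as edits_dir.

    os.stat().st_dev identifies the filesystem a path lives on, so a match
    means the JournalNode edits and the DataNode blocks share a mount.
    """
    edits_dev = os.stat(edits_dir).st_dev
    return [d for d in data_dirs if os.stat(d).st_dev == edits_dev]


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 2**30
```

For example, `shared_device_dirs("/hadoop/journal", ["/data/1/dfs/dn", "/data/2/dfs/dn"])` (hypothetical paths) should return an empty list when the edits directory has a mount of its own. `free_gib` on the edits directory gives a rough headroom figure to compare against the size of the edits.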
> Is it expected to grow very large and/or does it need to be in a separate partition?

It is not the size of the journals that will hurt you. The DataNode is a very high-bandwidth application: it writes lots of data but can afford to be slower. JournalNodes, by contrast, do not write much data, but if they are left waiting for I/O to complete because of DataNode I/O, your NameNodes can become slow, which in turn slows down the whole cluster. In other words, JournalNode I/O is latency sensitive.

Thanks
Anu

From: Francisco de Freitas
Date: Wednesday, April 18, 2018 at 1:07 AM
Subject: Journal node edits directory

We currently run journalnodes together with datanodes, and they share the same mount point for both the data dir and the edits dir. We ran into an issue where this shared volume, used by the datanode, got full, and as a result the journal node was unable to start due to insufficient space.

How would you go about placing the journal node edits? Is it expected to grow very large and/or does it need to be in a separate partition? Or can I use e.g. tmpfs for it?

With a 1 PB namespace and 5 journal nodes, we see journal node edits of about 5.4 GB on each journal node.

Thanks for any tips and best practices.
