I would start by putting the JournalNodes on separate machines, perhaps 
alongside the NameNodes.
If that is not possible, at least give the JournalNode process dedicated disks 
that are not shared with your DataNode process.
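
As a quick sanity check, something like the following can tell you whether two directories sit on the same filesystem. This is only a sketch: the function name is mine, and you would pass in whatever you have configured for dfs.journalnode.edits.dir and dfs.datanode.data.dir. Note that different device IDs only prove the filesystems differ; two partitions can still share one physical disk.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths live on the same mounted filesystem.

    Compares the device IDs reported by stat(); this does not detect
    two separate partitions that share one physical disk.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, same_filesystem("/data/journal", "/data/dfs/dn") with your own paths (these two are made up) returning True would mean the journal edits and the DataNode blocks compete for the same volume.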

>Is it expected to grow very large and/or needs to be in a separate partition?
It is not the size of the journals that will hurt you. The DataNode is a very 
high-bandwidth application: it writes lots of data but can afford to be 
slower.
JournalNodes, by contrast, do not write much data, but if they have to wait 
for I/O to complete because of DataNode I/O, 
your NameNodes may become slow, which means that your whole cluster 
will be slower. In other words, journal I/O is latency sensitive.
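
If you want a rough feel for this on your own hardware, you could time small synced writes in a candidate directory while the DataNode is busy. The sketch below is an assumption-laden proxy, not how the JournalNode itself writes edits: the function name, write count and write size are all made up for illustration.

```python
import os
import tempfile
import time


def fsync_latencies(directory, writes=50, size=4096):
    """Time small write+fsync calls in `directory`.

    Returns a list with the elapsed seconds for each synced write.
    The temporary file is removed when the function returns.
    """
    payload = b"\0" * size
    latencies = []
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(writes):
            start = time.perf_counter()
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data to stable storage
            latencies.append(time.perf_counter() - start)
    return latencies
```

Comparing max(fsync_latencies(d)) for a disk shared with the DataNode against a dedicated disk should give a hint of how much the journal would wait on DataNode I/O.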

Thanks
Anu

From: Francisco de Freitas <[email protected]>
Date: Wednesday, April 18, 2018 at 1:07 AM
To: "[email protected]" <[email protected]>
Subject: Journal node edits directory

We currently run journalnodes together with datanodes and they share the same 
mount point for both the data dir and edits dir.

We ran into an issue where the shared volume used by the datanode filled up, 
and as a result the journal node was unable to start due to 
insufficient space.

How would you decide where to place the journal node edits? Is it expected to 
grow very large and/or does it need to be in a separate partition? Or can I use e.g. 
tmpfs for it? With a 1PB namespace and 5 journal nodes, we see a journal node 
edits size of about 5.4GB on each journal node.

Thanks for any tips and best practices.
