[https://issues.apache.org/jira/browse/HADOOP-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280617#comment-14280617]
Laurie Denness commented on HADOOP-10181:
-----------------------------------------
+1 for this one. We have 350 nodes, so they span two subnets/VLANs, but we
forward the Ganglia multicast traffic between them. We just upgraded to CDH5
(Hadoop 2.5), and now we've lost the Ganglia metrics for half of the cluster,
because the TTL is set to 1 and the packets will not cross between the two
subnets. Which half is missing depends on which side of the cluster you query.
(Given that we now have a tonne of missing metrics, I would say this is more
than Minor, as the alternative is to use unicast Ganglia, which is not ideal
for other reasons.)
> GangliaContext does not work with multicast ganglia setup
> ---------------------------------------------------------
>
> Key: HADOOP-10181
> URL: https://issues.apache.org/jira/browse/HADOOP-10181
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Andrew Otto
> Priority: Minor
> Labels: ganglia, hadoop, metrics, multicast
>
> The GangliaContext class which is used to send Hadoop metrics to Ganglia uses
> a DatagramSocket to send these metrics. This works fine for Ganglia
> multicast setups that are all on the same VLAN. However, when working with
> multiple VLANs, a packet sent via DatagramSocket to a multicast address will
> end up with a TTL of 1. Multicast TTL indicates the number of network hops
> for which a particular multicast packet is valid. As a result, the packets
> sent by GangliaContext never reach Ganglia aggregators that are in the same
> multicast group but on a different VLAN.
> To fix, we'd need a configuration property that specifies that multicast is
> to be used, and another that allows setting of the multicast packet TTL.
> With these set, we could then use MulticastSocket.setTimeToLive() instead of
> just plain ol' DatagramSocket.
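A minimal sketch of the proposed fix follows. It assumes the sink's configuration is available as a plain Map and uses the hypothetical property names "ganglia.multicast" and "ganglia.multicast.ttl", which are not GangliaContext's actual keys. Only java.net.MulticastSocket, setTimeToLive() and getTimeToLive() are real JDK APIs. When multicast is enabled, it returns a MulticastSocket with the configured TTL. Otherwise it falls back to the plain DatagramSocket behaviour described above.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.MulticastSocket;
import java.util.Map;

public class GangliaSocketSketch {
    // Hypothetical property names, for illustration only.
    static final String MULTICAST_KEY = "ganglia.multicast";
    static final String MULTICAST_TTL_KEY = "ganglia.multicast.ttl";
    // Multicast packets default to a TTL of 1, which keeps them on the local subnet.
    static final int DEFAULT_TTL = 1;

    /**
     * Returns a MulticastSocket with the configured TTL when multicast is
     * enabled, otherwise a plain DatagramSocket (the current behaviour).
     */
    static DatagramSocket createSocket(Map<String, String> conf) throws IOException {
        boolean multicast = Boolean.parseBoolean(conf.getOrDefault(MULTICAST_KEY, "false"));
        if (!multicast) {
            return new DatagramSocket();
        }
        int ttl = Integer.parseInt(
                conf.getOrDefault(MULTICAST_TTL_KEY, String.valueOf(DEFAULT_TTL)));
        MulticastSocket socket = new MulticastSocket();
        // setTimeToLive rejects values outside 0..255 with IllegalArgumentException.
        socket.setTimeToLive(ttl);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // A TTL of 3 lets packets cross up to two routed hops between VLANs.
        try (DatagramSocket s = createSocket(Map.of(MULTICAST_KEY, "true", MULTICAST_TTL_KEY, "3"))) {
            System.out.println("ttl=" + ((MulticastSocket) s).getTimeToLive());
        }
    }
}
```

The TTL has to be large enough to cover every router hop between the sending nodes and the aggregators. For a cluster split across two VLANs with multicast forwarded between them, a small value such as 2 or 3 should be enough. The right number depends on the network topology.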
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)